Scaling DB Access for 100s of billions of queries per day

25.06.2019


TLDR - Hera

Open Source

High Efficiency Reliable Access to data stores

Out Of DB Connections?

• Scaling

• Multiplexing

• Sharding

• Resiliency

• Manageability

• Our switch to Golang


About Petrica

• Works @ PayPal

• Developing the database proxy

• Database-backed publish-subscribe messaging system

• PayPal Here backend

• Golang contributor

• Previously @

• Worked on distributed systems, embedded systems, desktop apps

©2019 PayPal Inc. Confidential and proprietary. 3


About Kenneth

• Works @ PayPal

• paypal.de

• PayPal Verisign Two Factor Authentication

• Coded shared memory for sharding

• Migrated to Go (hera)

• Previously @

• Visualization of Genomic Metabolism, Artificial Chemistry, and EPICS Monitoring


About PayPal – 2018 facts

• 100+ Currencies
• 200+ Markets
• 267M Active Customer Accounts
• 9.9B Payments Transactions
• $578B Total Payments Volume
• Up 27% YoY* Total Payment Volume
• 2,700 Applications
• 4,500 Engineers
• 17,000 Releases


PayPal OLTP Environment scale

• 2000+ Database Instances
• 100+ Billion Calls/day
• 3000+ Database Nodes
• 74 PB Total Storage
• 300+ Schemas
• 10+ Availability Zones
• 200,000+ Application Servers
• 1+ Billion Calls/hour


Key Take-aways

• Massive scale

• Complexity

• Growth

Typical Database Access by Application

[Diagram: a microservice receives a request, connects to the database, issues three requests, disconnects, and returns a response.]

• connect: 100 ms
• each of the three requests: 1 ms
• total: 103 ms

Typical Database Access by Application, with a pool

[Diagram: a microservice receives a request, leases a connection from the pool, issues three requests, releases the connection, and returns a response.]

• connect: 100 ms, done by the pool ahead of time
• lease: ~0 ms
• each of the three requests: 1 ms
• total: ~3 ms

Connection is pre-established, cached and re-used for each request
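The lease and release cycle of a pooled connection can be sketched in a few lines. This is only an illustration under stated assumptions: `ConnectionPool` and `FakeConnection` are hypothetical names, not part of Hera or any PayPal library, and the connect step is a stand-in that does no real network work.

```python
import queue


class FakeConnection:
    """Hypothetical stand-in for a real database connection."""

    opened = 0  # counts how many times the expensive connect step ran

    def __init__(self):
        FakeConnection.opened += 1

    def execute(self, sql):
        return "ok: " + sql


class ConnectionPool:
    """Pre-establishes connections once, then leases and releases them."""

    def __init__(self, size, connect=FakeConnection):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())  # pay the connect cost up front

    def lease(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        self._idle.put(conn)


pool = ConnectionPool(size=2)
for sql in ("select 1", "select 2", "select 3"):
    conn = pool.lease()
    try:
        conn.execute(sql)
    finally:
        pool.release(conn)
```

Three requests run here, yet the connect step runs only twice, once per pooled connection, which is where the drop from 103 ms to about 3 ms comes from.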

From Few to Thousands of Microservices

[Diagram, built up over several slides: more and more microservices, each with its own pool of database connections.]

• Configure the pool cache large enough for availability
• Add more services when scaling up
• Hit a connection limit

Solutions?

• Lower the connection pool size in each microservice node
• Improve microservice code to use the connection more efficiently
• Buy better database hardware

Partitioned Connection Pool

[Diagram: two microservices, each with its own connection pool, receiving requests.]

• While servicing requests, the microservices use only some DB connections
• In some nodes all the connections can be temporarily busy
• Client requests will incur latency even though the database has enough capacity

Solutions?

Scale with a shared pool!

Shared connection pool

[Diagram: two microservices send their requests through one shared pool.]
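A small calculation shows why the same number of connections goes further when shared. The demand figures and pool size below are made-up numbers for illustration, and both function names are hypothetical.

```python
def waiting_partitioned(demand_per_node, pool_size_per_node):
    """Requests left waiting when every node can use only its own pool."""
    return sum(max(0, demand - pool_size_per_node) for demand in demand_per_node)


def waiting_shared(demand_per_node, pool_size_per_node):
    """Requests left waiting when the same connections form one shared pool."""
    total_connections = pool_size_per_node * len(demand_per_node)
    return max(0, sum(demand_per_node) - total_connections)


demand = [5, 1]  # concurrent requests on node A and node B (made-up numbers)
print(waiting_partitioned(demand, 3))  # 2: node A is full while node B sits idle
print(waiting_shared(demand, 3))       # 0: six requests fit in six connections
```

With partitioned pools, node A queues two requests although three database connections are free elsewhere; the shared pool serves all six at once.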

Typical service request

[Diagram: between the service request and the service response, a service sends four SQL requests over one DB connection, with gaps between them.]

Database connection is allocated but mostly idle

[Diagram: the same DB connection, with further SQL requests filling the gaps.]

Serving other requests when client seems idle?

Multiplexing

[Diagram: two microservices send their requests through a shared pool.]

Services perceive that they have a database connection
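The idea can be sketched as a multiplexer that binds a backend connection to a client only while a request runs. This is a simplified illustration, not Hera's implementation: the `Multiplexer` class and the connection names are hypothetical, and it assumes every request is independent (a real proxy would have to keep the same backend for the length of a transaction).

```python
import queue


class Multiplexer:
    """Lets many client sessions share a few backend connections."""

    def __init__(self, backend_connections):
        self._free = queue.Queue()
        for conn in backend_connections:
            self._free.put(conn)
        self.served = {}  # backend name -> list of client ids it served

    def execute(self, client_id, sql):
        backend = self._free.get()  # bind a backend only for this request
        try:
            self.served.setdefault(backend, []).append(client_id)
            return "%s ran %r for %s" % (backend, sql, client_id)
        finally:
            self._free.put(backend)  # the client keeps its session, not the backend


mux = Multiplexer(["db-conn-0", "db-conn-1"])
for client_id in ("svc-a", "svc-b", "svc-c", "svc-d"):
    mux.execute(client_id, "select 1")
```

Four client sessions are served by two backend connections, while each client behaves as if it held a connection of its own.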

Hera – Highly Efficient Reliable Access to data stores

[Diagram: two microservices send their requests to Hera.]

Shared connection pool multiplexer

HERA: Microservice-First DB proxy

Principles and constraints

• Correctness

• Resiliency, -ilities, performance

• Minimal configuration

• Thin client

• Minimal change to wire protocol


Let’s talk about other features!

Scale Features

• Multiplexing

• Read write split

• Sharding

Read-Write Split

Oracle RAC - Real Application Cluster

[Diagram: the RAC nodes, each marked RW (read-write).]

Each database is exchanging lock information

[Diagram: a Hera node in front of the cluster; some RAC nodes are marked RW and the others R (read).]

• Separate write queries and direct to one node
• Reduces traffic between DB nodes
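A routing rule of this kind might look like the sketch below. It is an assumption-laden illustration: classifying a query by its first keyword is a simplification chosen for this example, not a description of how Hera decides, and `ReadWriteRouter` and the node names are hypothetical.

```python
import itertools


def is_read_query(sql):
    """Naive check: a plain SELECT counts as a read (sketch assumption)."""
    lowered = sql.strip().lower()
    return lowered.startswith("select") and "for update" not in lowered


class ReadWriteRouter:
    """Sends writes to one node and spreads reads over the other nodes."""

    def __init__(self, write_node, read_nodes):
        self._write_node = write_node
        self._read_nodes = itertools.cycle(read_nodes)

    def route(self, sql):
        if is_read_query(sql):
            return next(self._read_nodes)
        return self._write_node


router = ReadWriteRouter("rac-1", ["rac-2", "rac-3"])
print(router.route("select * from loan"))        # rac-2
print(router.route("update loan set state = 1")) # rac-1
print(router.route("  SELECT 1"))                # rac-3
```

Keeping all writes on one node is what reduces the lock traffic between the database nodes.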

Sharding

• Client vs Server

• Finding the Shard

• Sample Query Run

• Convert Legacy App

• Move Scuttle Bin’s Data

Sharding: Client vs Server

[Diagram: a C++ client uses a C++ library to reach Database (shard 0), Database (shard 1) and Database (shard 2) directly.]

With the logic in the client library, every new client language needs its own copy:

• Same logic in Java client (Java Lib)
• Duplicate logic in Go client (Go Lib)
• Copy logic in Python (Python Lib)
• Node.js client too (Node.js Lib)

Sharding: Keep Logic in Server

[Diagram: the C++, Java, Go, Python and Node.js clients go through their libraries (C++ Lib, Hera Java Lib, Hera Go Lib, Python Lib, Node.js Lib) to the Hera Multiplexer, whose workers connect to Database (shard 0), Database (shard 1) and Database (shard 2).]

Sharding: Implementation

[Diagram: Shard Key, then Hash function (shard key value) % K, then scuttle bin to logical shard mapping (Shard 0, Shard 1, ... Shard M), then logical to physical shard mapping (DB Shard 0 ... DB Shard N).]

1. Shard Key: account_id=2000

2. Murmur3Hash(2000)%1024=280

3. Select shard_id from hera_shard_map where scuttle_id = 280;

   shard_id
   ---------
   1

4. TWO_TASK_1 = LOAN_SH1
   TWO_TASK_1='tcp(loan-sh1:3306)/loan'
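The lookup chain from shard key to physical database can be sketched as below. Several parts are assumptions made for illustration: CRC32 stands in for Murmur3 (which is not in the Python standard library, so the bins it produces differ from the slide's), the in-memory dictionaries stand in for the `hera_shard_map` table and the `TWO_TASK` settings, unmapped bins are assumed to fall back to shard 0, and the host `loan-sh0` is made up.

```python
import zlib

NUM_SCUTTLE_BINS = 1024  # K in the diagram; 1024 in the worked example

# Scuttle bin -> logical shard. Bins missing from the map default to shard 0
# (an assumption of this sketch).
SHARD_MAP = {280: 1}

# Logical shard -> physical database. "loan-sh0" is a made-up host name.
PHYSICAL = {
    0: "tcp(loan-sh0:3306)/loan",
    1: "tcp(loan-sh1:3306)/loan",
}


def placeholder_hash(value):
    """CRC32 stand-in; the slides use Murmur3, which is not in the stdlib."""
    return zlib.crc32(str(value).encode("utf-8"))


def resolve(shard_key_value, hash_fn=placeholder_hash):
    """Return (scuttle bin, logical shard, physical DSN) for a shard key value."""
    scuttle_id = hash_fn(shard_key_value) % NUM_SCUTTLE_BINS
    shard_id = SHARD_MAP.get(scuttle_id, 0)
    return scuttle_id, shard_id, PHYSICAL[shard_id]


# Pretend the hash lands on bin 280, as in the worked example.
print(resolve(2000, hash_fn=lambda value: 280))
```

The two levels of mapping mean a scuttle bin can be reassigned to another logical shard, or a logical shard to another database, without changing the hash function.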


Sharding: Converting Application Query

- Shard Key: account_id

- select * from loan, appfile where loan.id = ? and appfile.loan_id = loan.id


Sharding Compatible Query

- Shard Key: account_id

- select * from loan, appfile where loan.id = ? and appfile.loan_id = loan.id and loan.account_id = ? and appfile.account_id = loan.account_id


Sharding: Hera JDBC SQL Rewrite

- Shard Key: account_id

- select * from loan, appfile where loan.id = :loan_id and appfile.loan_id = loan.id and loan.account_id = :account_id and appfile.account_id = loan.account_id

- Hera can now pick out the shard key name and value
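To see why named binds help, here is a rough sketch of pulling the shard key value out of a query and its bind values. It is not Hera's parser: the regular expression and both function names are assumptions made for this example, and the bind values are made up.

```python
import re


def shard_key_bind_name(sql, shard_key):
    """Return the bind name compared against the shard key column, or None."""
    pattern = r"\b" + re.escape(shard_key) + r"\s*=\s*:(\w+)"
    match = re.search(pattern, sql, re.IGNORECASE)
    return match.group(1) if match else None


def shard_key_value(sql, binds, shard_key):
    """Look up the shard key value among the bind values, or return None."""
    name = shard_key_bind_name(sql, shard_key)
    return binds.get(name) if name else None


sql = (
    "select * from loan, appfile where loan.id = :loan_id "
    "and appfile.loan_id = loan.id and loan.account_id = :account_id "
    "and appfile.account_id = loan.account_id"
)
print(shard_key_value(sql, {"loan_id": 7, "account_id": 2000}, "account_id"))
```

A query that never binds `account_id`, such as the original legacy query, yields `None`, which is the case a proxy would need to log or reject.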

Sharding: Legacy App Conversion

[Diagram: the shard key passes through the hash function (% K), the scuttle bin to logical shard mapping and the logical to physical shard mapping; only Shard 0 is in use.]

• All queries directed to shard 0
• Logs queries that don’t bind to shard key

[Diagram: a whitelist is added next to the hash function and routes to Shard 1.]

• Whitelist sends one value to a specific shard
• Limits risk of failures to 1 value
• Hera uses Shard 1, but same DB
• Fast rollback on errors
• Repeat for larger sets

Sharding: Physical Whitelist

• Validates permissions and data copy

Sharding: Logical Move for One Scuttle Bin

• If successful, do a physical data move next

Sharding: Moving Scuttle Bin

[Diagram, built up over several slides: the shard key passes through the hash function (% K), the scuttle bin to logical shard mapping (Shard 0, Shard 1, ... Shard M) and the logical to physical shard mapping (DB Shard 0 ... DB Shard N).]

• Start data copy
• Typically, tables are partitioned by scuttle bin
• Block writes
• Data fully copied
• Update map
• Use scuttle bin in new location
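The order of operations for moving one scuttle bin can be sketched as a single function. Everything here is hypothetical: `move_scuttle_bin`, the `copy_bin` callback and the in-memory map are illustrations only, and modelling "data fully copied" as a second catch-up copy after writes are blocked is an assumption of this sketch.

```python
def move_scuttle_bin(scuttle_id, new_shard, shard_map, blocked_bins, copy_bin):
    """Walk one scuttle bin through copy, write block, final copy, map update."""
    old_shard = shard_map[scuttle_id]
    copy_bin(scuttle_id, old_shard, new_shard)      # start data copy
    blocked_bins.add(scuttle_id)                    # block writes
    try:
        copy_bin(scuttle_id, old_shard, new_shard)  # catch up: data fully copied
        shard_map[scuttle_id] = new_shard           # update map
    finally:
        blocked_bins.discard(scuttle_id)            # writes resume
    return old_shard


shard_map = {280: 0}
blocked = set()
copies = []
move_scuttle_bin(280, 1, shard_map, blocked,
                 lambda b, src, dst: copies.append((b, src, dst)))
print(shard_map)  # {280: 1}
```

If the catch-up copy fails, the map is left untouched and writes are unblocked, so the bin simply stays where it was.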


Sharding: Take Away

• Queries must bind shard key & value

• Manage risk converting legacy code

• Controlled data redistribution

Resiliency Principles

• Deal with exceptional cases
  ▪ Surge in load
  ▪ Slow SQL requests
  ▪ Issues with the DB
• Limit or avoid impact
• Self heal

Resiliency Features

• Bouncer
• Surge Queue
• Slow Query Eviction
• Read Replica
• Transparent Application Failover

Resiliency: Oversubscription - Bouncer

[Diagram: Services 1 to 4 connect to a Hera node.]

Service 4's connection is TCP-accepted and then closed without an SSL handshake.
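The bouncer's decision can be sketched as a simple admission check at accept time. The `Bouncer` class, its method names and the limit of three clients are assumptions for illustration, not Hera's actual code or configuration.

```python
class Bouncer:
    """Admits client connections up to a limit and turns the rest away early."""

    def __init__(self, max_clients):
        self.max_clients = max_clients
        self.active = set()

    def on_tcp_accept(self, client_id):
        if len(self.active) >= self.max_clients:
            return "close"       # close right away, before any handshake work
        self.active.add(client_id)
        return "handshake"       # proceed with the normal connection setup

    def on_disconnect(self, client_id):
        self.active.discard(client_id)


bouncer = Bouncer(max_clients=3)
decisions = [bouncer.on_tcp_accept("service-%d" % n) for n in (1, 2, 3, 4)]
print(decisions)  # ['handshake', 'handshake', 'handshake', 'close']
```

Closing before the handshake keeps the cost of refusing a client low when the node is already oversubscribed.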

Resiliency: Oversubscription – Surge Queue

[Diagram: Services 1 to 3 connect to a Hera node.]

• Service 3 can make a request
• It waits, queued
• Generally, 1 ms later, Service 3 gets run
• Waiting more than 1 s gives an error to Service 3
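A bounded wait of this kind can be sketched with a semaphore and a timeout. `WorkerGate` and `SurgeQueueTimeout` are hypothetical names, and the demo uses a 0.05 s limit instead of 1 s only so that it finishes quickly.

```python
import threading


class SurgeQueueTimeout(Exception):
    """Raised when a request waited too long for a free database worker."""


class WorkerGate:
    """Hands out a fixed number of workers; extra requests wait in line."""

    def __init__(self, workers, max_wait_seconds=1.0):
        self._free = threading.Semaphore(workers)
        self._max_wait = max_wait_seconds

    def acquire(self):
        # Waits in line; gives up once the wait exceeds the limit.
        if not self._free.acquire(timeout=self._max_wait):
            raise SurgeQueueTimeout("no free worker within %.2fs" % self._max_wait)

    def release(self):
        self._free.release()


gate = WorkerGate(workers=1, max_wait_seconds=0.05)  # short limit for the demo
gate.acquire()            # Service 1 takes the only worker
try:
    gate.acquire()        # Service 3 queues, then times out
    outcome = "ran"
except SurgeQueueTimeout:
    outcome = "error"
gate.release()            # Service 1 finishes and frees the worker
```

In the common case a worker frees up within the limit and the queued request simply runs a moment later; the error path only triggers when the backlog does not clear.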

Resiliency: Slow Query Eviction

[Diagram: Services 1 to 3 connect to a Hera node; one slow query is marked 1s.]

• Applies when Hera is overloaded
• Hundreds of queries can run once the slow query is removed
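One way to picture the eviction decision is the sketch below. The selection rule (evict the longest-running query once it passes a threshold, and only when no worker is free) is an assumption made for this example, as are the function name and the sample timings.

```python
def pick_query_to_evict(running, free_workers, slow_after_seconds=1.0):
    """Return the id of the longest-running slow query, or None.

    running maps query id -> seconds the query has been running.
    Nothing is evicted while free workers remain.
    """
    if free_workers > 0 or not running:
        return None
    query_id, elapsed = max(running.items(), key=lambda item: item[1])
    return query_id if elapsed >= slow_after_seconds else None


running = {"q1": 0.002, "q2": 4.2, "q3": 0.010}
print(pick_query_to_evict(running, free_workers=0))  # q2
print(pick_query_to_evict(running, free_workers=2))  # None
```

A query that has held a worker for seconds occupies capacity that could have served hundreds of millisecond-scale queries, which is the trade the slide describes.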

Monitoring Usage

[Diagram: Services 1 to 3 connect to a Hera node.]

[Chart: free DB connections (acpt) over time, one line per Hera node.]

Resiliency: Eviction – Actual event

[Chart: free DB connections (acpt) over time, one line per Hera node.]

• Happened a few years back, when automatic eviction went unnoticed.
• Evicting an expensive query makes space for many typical queries.

Read Replicas

[Diagram: in the application, the business logic calls a failover module that holds Pool 1 and Pool 2. Each pool connects to a Hera multiplexer (MUX) with its own replica pools, one in front of Read Replica 1 and one in front of Read Replica 2. A Write DB is also shown.]

R1 & R2: different data replication lag

Transparent Application Failover – Read Replica Retry

[Diagram: the application keeps only its business logic and a single pool. One Hera multiplexer (MUX) holds the R1 and R2 pools in front of Read Replica 1 and Read Replica 2. A Write DB is also shown.]

• Build failover on server side
• Also failover when query is slow
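Server-side retry across read replicas can be sketched as trying each replica in turn. This is an illustration only: `ReplicaError`, `query_with_failover` and the two replica functions are hypothetical, and a timeout is modelled simply as the replica raising an error.

```python
class ReplicaError(Exception):
    """Hypothetical error for a failed or too-slow replica."""


def query_with_failover(replicas, sql):
    """Try each read replica in order and return the first successful result."""
    last_error = None
    for run_query in replicas:
        try:
            return run_query(sql)
        except ReplicaError as error:
            last_error = error  # fall through to the next replica
    raise ReplicaError("all read replicas failed") from last_error


def replica_1(sql):
    raise ReplicaError("replica 1 timed out")


def replica_2(sql):
    return "replica 2: " + sql


print(query_with_failover([replica_1, replica_2], "select 1"))
```

Because the retry happens behind the proxy, the application issues one query and never needs its own failover module.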

Manageability

• Maintenance without customer impact
• Recommend clients recycle often
• Oracle RAC maintenance

Recycle Connections for Manageability

[Diagram, shown in three steps: Service 2 and Service 3 connect through a TCP load balancer to three Hera nodes.]

• Node going away
• Recycle connections
• Node returns

DB Maintenance

Oracle RAC - Real Application Cluster

[Diagram: a Hera node in front of Oracle RAC nodes 1 to 4, with an RW label.]

• Preparing Oracle RAC node 3 maintenance
• DBAs remove node 3 from Oracle configs
• Insert into hera_maint (inst_id, status, status_time, module, machine) values (3, 'F', [unix epoch]+2, [hera pool name], [host])
• After [unix epoch]+2

Take Aways

• Keep controls near those who need them
• Avoid being unnecessarily involved


Transition to Go


Previous version

- Written in C++, asynchronous code
- Scalable
- Efficient
- Complex

New changes

- New functional requirements
- New non-functional requirements (efficiency)

How about writing in Go?

- Go programming is synchronous
- Scales efficiently out of the box
- Does it meet our requirements?
- Is stop-the-world garbage collection time an issue?

Going with Go

- Proof-of-concept
- 1st release: no sharding
- Full feature with C++ dependencies
- Full feature, all Go

Mission Accomplished

- Running in production for 1 year
- Latency and CPU met the requirements
- No client changes
- Had some issues and learnings

Learning Example

Spawn 10000 goroutines that each sleep for 10 seconds to simulate processing.

Running on my 2-core test machine.

    package main

    import (
        "sync"
        "time"
    )

    // routine simulates 10 seconds of processing with a runtime-managed sleep.
    func routine() {
        time.Sleep(time.Second * 10)
    }

    func main() {
        var wg sync.WaitGroup
        routines := 10000
        wg.Add(routines)
        for i := 0; i < routines; i++ {
            go func() {
                routine()
                wg.Done()
            }()
        }
        wg.Wait() // block until every goroutine has finished
    }

How many OS threads do I expect at runtime?


$> ./mytest &

[1] 12345

$> ps -L -C mytest --no-headers | wc -l

6

Learning Example

Spawn 10000 goroutines that each sleep for 10 seconds to simulate processing, this time sleeping in a system call.

Running on my 2-core test machine.

    package main

    import (
        "sync"
        "syscall"
    )

    // routine blocks in the select(2) system call for 10 seconds.
    func routine() {
        // time.Sleep(time.Second * 10)
        syscall.Select(0, nil, nil, nil, &syscall.Timeval{Sec: 10})
    }

    func main() {
        var wg sync.WaitGroup
        routines := 10000
        wg.Add(routines)
        for i := 0; i < routines; i++ {
            go func() {
                routine()
                wg.Done()
            }()
        }
        wg.Wait() // block until every goroutine has finished
    }

Using a system call to sleep, how many OS threads do I expect at runtime?


$ ./mytest

runtime: program exceeds 10000-thread limit

fatal error: thread exhaustion

runtime stack:

….

Learning Example

... “A Go program creates a new thread only when a goroutine is ready to run but all the existing threads are blocked in system calls, cgo calls, or are locked to other goroutines due to use of runtime.LockOSThread”

From the documentation of func SetMaxThreads in package runtime/debug


Key Take Away

• Blocking system calls tie up the OS thread. Ex: os.File read

• When using a C/C++ library, be aware of the system calls it makes (mutexes, file reads, socket reads, sleep)

• Golang locks, socket reads, and sleep do not tie up the OS thread

• In your load tests, verify the number of OS threads

Open source

github.com/paypal/hera

✓ Code, documentation, examples

✓ Standard clients

• Java: JDBC driver

• Golang: database/sql driver

Google group heradatalink

[Diagram: github.com/paypal/hera, Internal Mirror, Build, Deploy, PayPal Config]

Acknowledgements

Kamlakar

Mukundan

Liana

Sid

Yaping

Shuping

Stephanie

Varun

Qing

Chun

Dwain

Paresh

Srini

Dheeraj

Pradeep

Pramod

Samrat

Akash

Ashwin

and countless others

Next Steps

Need more database connections?

Hera can help scale

• Multiplexing

• Sharding

• Resiliency

• Manageability

Open Source github.com/paypal/hera

Google Groups – heradatalink

Our Future

• Scaling Tools

• More Clients


github.com/paypal/hera Google group: heradatalink