Scaling DB Access for 100s of billions of queries per day · 25.06.2019  · Microservice connect...

Preview:

Citation preview

Scaling DB Access for 100s of billions of queries per day

1

TLDR - Hera

Open Source

High Eff iciency

Reliable Access

to data stores

2

Out Of DB Connections?

• Scaling

• Multiplexing

• Sharding

• Resiliency

• Manageability

• Our switch to Go Lang

About Petrica

• Works @ PayPal

• developing the database proxy.

• database backed publish-subscriber messaging system

• PayPal Here backend

• Golang contributor

• Previously @

• Worked on distributed systems, embedded system, desktop apps

©2019 PayPal Inc. Confidential and proprietary. 3

About Kenneth

• Works @ PayPal

• paypal.de

• PayPal Verisign Two Factor Authentication

• Coded shared memory for sharding

• Migrated to Go (hera)

• Previously @

• Visualization of Genomic Metabolism, Artificial Chemistry, and EPICS

Monitoring

©2019 PayPal Inc. Confidential and proprietary. 4

©2019 PayPal Inc. Confidential and proprietary. 5

About PayPal – 2018 facts

©2019 PayPal Inc. Confidential and proprietary. 6

100+Currencies

200+Markets

267MActive Customer Accounts

9.9BPayments Transactions

$578B Total Payments Volume

Up 27% YoY*Total Payment Volume

2,700Applications

4,500Engineers

17,000Releases

PayPal OLTP Environment scale

©2015 PayPal Inc. Confidential and proprietary. 7

2000+ Database Instances

100+ Billion Calls/day

3000+Database Nodes

74 PB Total Storage

300+ Schemas

10 +Availability Zones

200,000 +Application Servers

1+ Billion Calls/hour

8

Key Take-aways

• Massive scale

• Complexity

• Growth

©2019 PayPal Inc. Confidential and proprietary. 9

Mic ro se rv i ce

connect

request

request

request

disconnect

request

response

Typical Database Access by Application

©2019 PayPal Inc. Confidential and proprietary. 10

Mic ro se rv i ce

connect

request

request

request

disconnect

request

response

100 ms

1 ms

1 ms

1 ms

103 ms

Typical Database Access by Application

©2019 PayPal Inc. Confidential and proprietary. 11

Mic ro se rv i ce

connect

request

request

request

request

response

100 ms

~0 ms

1 ms

1 ms

~3 ms

Po o l

1 ms

Connection is pre-established, cached and re-used for each request

release

lease

Typical Database Access by Application

From Few to Thousands Microservices

©2019 PayPal Inc. Confidential and proprietary. 12

Mic ro se rv i ce

Configure the pool cache large

enough for availability

From Few to Thousands Microservices

©2019 PayPal Inc. Confidential and proprietary. 13

Mic ro se rv i ce

M i c ro se rv i ce

Configure the pool cache large

enough for availability

Add more service when scaling up

©2019 PayPal Inc. Confidential and proprietary. 14

Mic ro se rv i ce

Mic ro se rv i ce

M i c ro se rv i ce

Configure the pool cache large

enough for availability

Add more service when scaling up

From Few to Thousands Microservices

©2019 PayPal Inc. Confidential and proprietary. 15

Mic ro se rv i ce

Mic ro se rv i ce

M i c ro se rv i ce

M i c ro se rv i ce

Configure the pool cache large

enough for availability

Add more service when scaling up

From Few to Thousands Microservices

©2019 PayPal Inc. Confidential and proprietary. 16

Mic ro se rv i ce

Mic ro se rv i ce

Mic ro se rv i ce

M i c ro se rv i ce

M i c ro se rv i ce

Configure the pool cache large

enough for availability

Add more service when scaling up

Hit a connection limit

From Few to Thousands Microservices

17

Solutions?

• Lower the connection pool

size in each microservice

node

• Improve microservice code

to use the connection more

efficiently

• Buy better database

hardware

Partitioned Connection Pool

©2019 PayPal Inc. Confidential and proprietary. 18

Mic ro se rv i ce

M i c ro se rv i ce

Request

Request

While servicing requests, the

microservices use only some

DB connections

Partitioned Connection Pool

©2019 PayPal Inc. Confidential and proprietary. 19

Mic ro se rv i ce

M i c ro se rv i ce

Request

Request

While servicing requests, the

microservices use only some

DB connections

In some node all the

connections can be temporary

busy

Partitioned Connection Pool

©2019 PayPal Inc. Confidential and proprietary. 20

Mic ro se rv i ce

While servicing requests, the

microservices use only some

DB connections

In some node all the

connections can be temporary

busy

Client requests will incur

latency though the database

has enough capacity

Mic ro se rv i ce

Request

Request

21

Solutions?

Scale with shared pool !!!

S h a r e d c o n n e c t i o n p o o l

©2019 PayPal Inc. Confidential and proprietary. 22

Mic ro se rv i ce

M i c ro se rv i ce

Request

Request

Shared pool

©2019 PayPal Inc. Confidential and proprietary. 23

Se rv i ce

Service request

Service

response

SQL request

SQL request

SQL request

SQL request

DB connection

Database connection is

allocated but mostly idle

Typical service request

©2019 PayPal Inc. Confidential and proprietary. 24

Se rv i ce

Service request

Service

response

SQL request

SQL request

SQL request

SQL request

SQL request

SQL request

SQL request

DB connection

Serving other requests when client seems idle?

Multiplexing

©2019 PayPal Inc. Confidential and proprietary. 25

Mic ro se rv i ce

M i c ro se rv i ce

Request

RequestSha red Po o l

Services perceive that they have a database connection

Multiplexing

©2019 PayPal Inc. Confidential and proprietary. 26

Mic ro se rv i ce

M i c ro se rv i ce

Request

RequestSha red Po o l

Services perceive that they have a database connection

Hera – Highly Efficient Reliable Access

©2019 PayPal Inc. Confidential and proprietary. 27

Mic ro se rv i ce

M i c ro se rv i ce

Request

Request

Hera

Shared connection pool multiplexer

to data s tores

HERAMicroservice-

First

DB proxy

28

Principles and constraints

• Correctness

• Resiliency, -ilities, performance

• Minimal configuration

• Thin client

• Minimal change to wire protocol

29

Let’s talk about other features!

Scale

Features

30

• Multiplexing

• Read write split

• Sharding

Read-Write Split

©2019 PayPal Inc. Confidential and proprietary. 31

RW

RW RW

RW

Oracle RAC - Real

Application Cluster

Each database is exchanging

lock information

Read-Write Split

©2019 PayPal Inc. Confidential and proprietary. 32

RW RW

Oracle RAC - Real

Application Cluster

R

R R

Hera No de

Separate write queries

and direct to one node

Reduces traffic between

DB nodes

Sharding

33

• Client vs Server

• Finding the Shard

• Sample Query Run

• Convert Legacy App

• Move Scuttle Bin’s Data

Sharding: Client vs Server

©2019 PayPal Inc. Confidential and proprietary. 34

C++ Client

C++ Lib

Database (shard 0) Database (shard 1) Database (shard 2)

Sharding: Same Logic in Java Client

©2019 PayPal Inc. Confidential and proprietary. 35

C++ Client

C++ Lib

Database (shard 0) Database (shard 1) Database (shard 2)

Java Client

Java Lib

Sharding: Duplicate Logic in Go Client

©2019 PayPal Inc. Confidential and proprietary. 36

C++ Client

C++ Lib

Database (shard 0) Database (shard 1) Database (shard 2)

Java Client Go Client

Java Lib Go Lib

Sharding: Copy Logic in Python

©2019 PayPal Inc. Confidential and proprietary. 37

C++ Client

C++ Lib

Database (shard 0) Database (shard 1) Database (shard 2)

Java Client Go Client

Java Lib Go Lib

Python Client

Python Lib

Sharding: Node.js Client Too

©2019 PayPal Inc. Confidential and proprietary.38

Node.js ClientC++ Client

C++ Lib

Database (shard 1) Database (shard 2)

Java Client Go Client

Java Lib Go Lib

Python Client

Python Lib Node.js Lib

Database (shard 0)

WorkerWorker

WorkerWorker

WorkerWorker

WorkerWorker

WorkerWorker

WorkerWorker

Node.js Client

Sharding: Keep Logic in Server

© 2019 PayPal Inc. Confidential and proprietary. 39

Hera Multiplexer

C++ Client

C++ Lib

Database (shard 0) Database (shard 1) Database (shard 2)

Java Client Go Client

Hera Java Lib Hera Go Lib

Python Client

Python Lib Node.js Lib

Sharding: Implementation

©2019 PayPal Inc. Confidential and proprietary. 40

1. Shard Key: account_id=2000

Sha rd 0 Sha rd 1 Sha rd M

Hash function (shard key value) %K

Scuttle bin to logical shard mapping

Logical to physical shard mapping

Shard Key

DB Shard

0

DB Shard

N

©2019 PayPal Inc. Confidential and proprietary. 41

Sha rd 0 Sha rd 1 Sha rd M

Hash function (shard key value) %K

Scuttle bin to logical shard mapping

Logical to physical shard mapping

Shard Key

DB Shard

0

DB Shard

N

1. Shard Key: account_id=2000

2. Murmur3Hash(2000)%1024=280

Sharding: Implementation

©2019 PayPal Inc. Confidential and proprietary. 42

Sha rd 0 Sha rd 1 Sha rd M

Hash function (shard key value) %K

Scuttle bin to logical shard mapping

Logical to physical shard mapping

Shard Key

DB Shard

0

DB Shard

N

1. Shard Key: account_id=2000

2. Murmur3Hash(2000)%1024=280

3. Select shard_id from

hera_shard_map where

scuttle_id = 280;

shard_id

---------

1

Sharding: Implementation

©2019 PayPal Inc. Confidential and proprietary. 43

Sha rd 0 Sha rd 1 Sha rd M

Hash function (shard key value) %K

Scuttle bin to logical shard mapping

Logical to physical shard mapping

Shard Key

DB Shard

0

DB Shard

N

1. Shard Key: account_id=2000

2. Murmur3Hash(2000)%1024=280

3. Select shard_id from

hera_shard_map where scuttle_id =

280;

shard_id

---------

1

4. TWO_TASK_1 = LOAN_SH1

TWO_TASK_1=‘tcp(loan-sh1:

3306)/loan’

Sharding: Implementation

Sharding: Converting Application Query

- Shard Key: account_id

- select * from loan, appfile where loan.id = ? and appfile.loan_id = loan.id

©2019 PayPal Inc. Confidential and proprietary. 44

Sharding Compatible Query

- Shard Key: account_id

- select * from loan, appfile where loan.id = ? and appfile.loan_id = loan.id and loan.account_id = ? and appfile.account_id = loan.account_id

©2019 PayPal Inc. Confidential and proprietary. 45

Sharding: Hera JDBC SQL Rewrite

- Shard Key: account_id

- select * from loan, appfile where loan.id = :loan_id and appfile.loan_id = loan.id and loan.account_id = :account_id and appfile.account_id = loan.account_id

- Hera can now pick out the shard key name and value

©2019 PayPal Inc. Confidential and proprietary. 46

Sharding: Legacy App Conversion

©2019 PayPal Inc. Confidential and proprietary. 47

• All queries directed to shard 0

• Logs queries that don’t bind to shard

key

Sha rd 0

Hash function (shard key value) %K

Scuttle bin to logical shard mapping

Logical to physical shard mapping

Shard Key

DB Shard

0

DB Shard

N

Sharding: Legacy App Conversion

©2019 PayPal Inc. Confidential and proprietary. 48

• Whitelist sends one value to a

specific shard

• Limits risk of failures to 1 value

• Hera uses Shard 1, but same DB

• Fast rollback on errors

• Repeat for larger sets

Sha rd 0

Hash function (shard key value)

%K

Scuttle bin to logical shard

mapping

Logical to physical shard mapping

Shard

Key

DB Shard

0

DB Shard

N

Whitelist

Sha rd 1

Sharding: Physical Whitelist

©2019 PayPal Inc. Confidential and proprietary. 49

• Validates permissions and data

copy

Sha rd 0

Hash function (shard key value)

%K

Scuttle bin to logical shard

mapping

Logical to physical shard mapping

Shard

Key

DB Shard

0

DB Shard

N

Whitelist

Sha rd 1

Sharding: Logical Move for One Scuttle Bin

©2019 PayPal Inc. Confidential and proprietary. 50

• If successful, do a physical data move next

Sha rd 0

Hash function (shard key value)

%K

Scuttle bin to logical shard

mapping

Logical to physical shard mapping

Shard

Key

DB Shard

0

DB Shard

N

Whitelist

Sha rd 1

Sharding: Moving Scuttle Bin

©2019 PayPal Inc. Confidential and proprietary. 51

• Start data copy

• Typically, tables are partitioned by

scuttle bin

Sha rd 0 Sha rd 1 Sha rd M

Hash function (shard key value) %K

Scuttle bin to logical shard mapping

Logical to physical shard mapping

Shard Key

DB Shard

0

DB Shard

N

Sharding: Moving Scuttle Bin

©2019 PayPal Inc. Confidential and proprietary. 52

• Start data copy

• Typically, tables are partitioned by

scuttle bin

• Block writes

Sha rd 0 Sha rd 1 Sha rd M

Hash function (shard key value) %K

Scuttle bin to logical shard mapping

Logical to physical shard mapping

Shard Key

DB Shard

0

DB Shard

N

Sharding: Moving Scuttle Bin

©2019 PayPal Inc. Confidential and proprietary. 53

• Start data copy

• Typically, tables are partitioned by

scuttle bin

• Block writes

• Data fully copied

Sha rd 0 Sha rd 1 Sha rd M

Hash function (shard key value) %K

Scuttle bin to logical shard mapping

Logical to physical shard mapping

Shard Key

DB Shard

0

DB Shard

N

Sharding: Moving Scuttle Bin

©2019 PayPal Inc. Confidential and proprietary. 54

• Start data copy

• Typically, tables are partitioned by

scuttle bin

• Block writes

• Data fully copied

• Update map

• Use scuttle bin in new location

Sha rd 0 Sha rd 1 Sha rd M

Hash function (shard key value) %K

Scuttle bin to logical shard mapping

Logical to physical shard mapping

Shard Key

DB Shard

0

DB Shard

N

55

Sharding: Take Away

• Queries must bind shard key & value

• Manage risk converting legacy code

• Controlled data redistribution

Resiliency

Principles

56

• Deal with exceptional

cases

▪ Surge in load

▪ Slow SQL requests

▪ Issues with the DB

Resiliency

Principles

57

• Deal with exceptional

cases

▪ Surge in load

▪ Slow SQL requests

▪ Issues with the DB

• Limit or avoid impact

• Self heal

Resiliency

Features

58

• Bouncer

• Surge Queue

• Slow Query Eviction

• Read Replica

• Transparent Application

Failover

Resiliency: Oversubscription - Bouncer

©2019 PayPal Inc. Confidential and proprietary. 59

Se rv i ce 1

Hera No de

Se rv i ce 2

Se rv i ce 3

Se rv i ce 4

Service 4 connection is

tcp accepted and then

closed without SSL.

Resiliency: Oversubscription – Surge Queue

©2019 PayPal Inc. Confidential and proprietary. 60

Se rv i ce 1

Hera No de

Se rv i ce 2

Se rv i ce 3

Service 3 can make a

request

It waits--queued

Resiliency: Oversubscription – Surge Queue

©2019 PayPal Inc. Confidential and proprietary. 61

Se rv i ce 1

Hera No de

Se rv i ce 2

Se rv i ce 3

Generally, 1ms later,

Service 3 gets run.

Resiliency: Oversubscription – Surge Queue

©2019 PayPal Inc. Confidential and proprietary. 62

Se rv i ce 1

Hera No de

Se rv i ce 2

Se rv i ce 3

Waiting more than 1s gives

an error to Service 3.

Resiliency: Slow Query Eviction

©2019 PayPal Inc. Confidential and proprietary. 63

Se rv i ce 1

Hera No de

Se rv i ce 2

Se rv i ce 3

1s

Slow Query

When Hera is overloaded

Resiliency: Slow Query Eviction

©2019 PayPal Inc. Confidential and proprietary. 64

Se rv i ce 1

Hera No de

Se rv i ce 2

Se rv i ce 3

1s

Slow Query

Hundred of queries can run

once slow query is

removed.

Monitoring Usage

©2019 PayPal Inc. Confidential and proprietary. 65

Se rv i ce 1

Hera No de

Se rv i ce 2

Se rv i ce 3

Each Hera NodeFree DB

Connections

(acpt)

Time

Resiliency: Eviction – Actual event

©2019 PayPal Inc. Confidential and proprietary. 66

Each Hera NodeFree DB

Connections

(acpt)

Time

Happened few years back, when automatic eviction went unnoticed.

Evicting expensive query makes space for many typical queries.

Read Replicas

©2019 PayPal Inc. Confidential and proprietary. 67

MUX R1 Pool

Po o l 1

R1 PoolR1 Pool

MUX R1 PoolR1 Pool

R2 Pool

Po o l 2

Hera

Appl icat ion

Read Repl ica 1

Read Repl ica 2

Wri te DB

F a i l o v e r

m o d u l e

B u s i n e s s

l o g i c

R1 & R2: different data replication lag

Transparent Application Failover – Read Replica Retry

©2019 PayPal Inc. Confidential and proprietary. 68

MUX

R1 PoolR1 Pool

R1 Pool

R1 PoolR1 Pool

R2 Pool

Po o l

Hera

Appl icat ion

B u s i n e s s

l o g i c

Read Repl ica 1

Read Repl ica 2

Write DB

• Build failover on server side

• Also failover when query is slow

Manageability

69

• Maintenance without customer

impact

• Recommend clients recycle

often

• Oracle RAC maintenance

Recycle Connections– Node Going Away

©2019 PayPal Inc. Confidential and proprietary. 70

Hera No de

Se rv i ce 2

Se rv i ce 3

Hera No de

Hera No de

TCP Load

Balancer

Recycle Connections for Manageability

©2019 PayPal Inc. Confidential and proprietary. 71

Hera No de

Se rv i ce 2

Se rv i ce 3

Hera No de

Hera No de

TCP Load

Balancer

Recycle Connections– Node Returns

©2019 PayPal Inc. Confidential and proprietary. 72

Hera No de

Se rv i ce 2

Se rv i ce 3

Hera No de

TCP Load

Balancer

Hera No de

DB Maintenance

©2019 PayPal Inc. Confidential and proprietary. 73

RW 2

Oracle RAC - Real

Application Cluster

3

4 1

Hera No de

• Preparing Oracle RAC node 3

maintenance

• DBAs remove node 3 from Oracle

configs

• Insert into hera_maint (inst_id,

status, status_time, module,

machine) values (3, ‘F’, [unix

epoch]+2, [hera pool name], [host]

DB Maintenance

©2019 PayPal Inc. Confidential and proprietary. 74

RW 2

Oracle RAC -Real

Application Cluster

3

4 1

Hera No de

• After [unix epoch] +2

Take Aways

• Keep controls near those who need

it

• Avoid being unnecessarily involved

75

Transition to Go

- Written in C++, asynchronous code

- Scalable

- Efficient

- Complex

©2019 PayPal Inc. Confidential and proprietary. 76

Previous version

Transition to Go

- Written in C++, asynchronous code

- Scalable

- Efficient

- Complex

©2019 PayPal Inc. Confidential and proprietary. 77

Previous version

Transition to GoNew changes

- New functional requirements

- New non functional requirements (efficiency)

- Written in C++, asynchronous code

- Scalable

- Efficient

- Complex

©2019 PayPal Inc. Confidential and proprietary. 78

Previous version

Transition to GoNew changes

- New functional requirements

- New non functional requirements (efficiency)

- Go programing is synchronous

- Scale efficiently out of the box

- Does it meet our requirements?

- Is stop-the-world garbage collection time an issue?

How about writing in Go?

Proof-of-concept

1st Release: No Sharding

Full feature with C++

dependencies

Full feature all Go

©2019 PayPal Inc. Confidential and proprietary. 79

Going with Go

Transition to Go

- Running in production for 1 year

- Latency and CPU met the requirements

- No clients change

- Had some issues and learnings

©2019 PayPal Inc. Confidential and proprietary. 80

Mission Accomplished

func routine() {

time.Sleep(time.Second * 10)

}

func main() {

routines := 10000

wg.Add(routines)

for i := 0; i < routines; i++ {

go func() {

routine(i)

wg.Done()

}()

}

wg.Wait()

}

©2019 PayPal Inc. Confidential and proprietary. 81

Spawn 10000 go-routines sleep for 10 seconds to simulate processing

Running on my 2 cores test machine

How many OS threads I expect at runtime?

Learning Example

func routine() {

time.Sleep(time.Second * 10)

}

func main() {

routines := 10000

wg.Add(routines)

for i := 0; i < routines; i++ {

go func() {

routine(i)

wg.Done()

}()

}

wg.Wait()

}

©2019 PayPal Inc. Confidential and proprietary. 82

Spawn 10000 go-routines sleep for 10 seconds to simulate processing

Running on my 2 cores test machine

Learning Example

$> ./mytest &

[1] 12345

$> ps –L –C mytest –no-headers | wc –l

6

$>

func routine() {

//time.Sleep(time.Second * 10)

syscall.Select(0, nil, nil, nil, &syscall.Timeval{Sec: 10})

}

func main() {

routines := 10000

wg.Add(routines)

for i := 0; i < routines; i++ {

go func() {

routine(i)

wg.Done()

}()

}

wg.Wait()

}

©2019 PayPal Inc. Confidential and proprietary. 83

Learning ExampleSpawn 10000 go-routines sleep for

10 seconds to simulate processing

Running on my 2 cores test machine

Using system call to sleep how many OS threads I expect at runtime?

func routine() {

//time.Sleep(time.Second * 10)

syscall.Select(0, nil, nil, nil, &syscall.Timeval{Sec: 10})

}

func main() {

routines := 10000

wg.Add(routines)

for i := 0; i < routines; i++ {

go func() {

routine(i)

wg.Done()

}()

}

wg.Wait()

}

©2019 PayPal Inc. Confidential and proprietary. 84

Learning ExampleSpawn 10000 go-routines sleep for

10 seconds to simulate processing

Running on my 2 cores test machine

$ ./mytest

runtime: program exceeds 10000-thread limit

fatal error: thread exhaustion

runtime stack:

….

. . . “A Go program creates a new thread only when a goroutine is

ready to run but all the existing threads are blocked in system calls,

cgo calls, or are locked to other goroutines due to use of

runtime.LockOSThread”

©2019 PayPal Inc. Confidential and proprietary. 85

from “runtime/debug” func SetMaxThreads()

Learning Example

86

Key Take Away • System calls lock the OS

thread. Ex: os.File read

• When using C/C++ library,

be aware of the systems

calls (mutexes, file reads,

socket reads, sleep)

• Golang locks, socket reads,

sleep are not locking the OS

thread

• In your load tests, verify the

number of OS threads

Open source

github.com/paypal/hera

✓ Code, documentation, examples

✓ Standard clients

• Java: JDBC driver

• Golang: database/sql driver

Google group heradatalink

©2019 PayPal Inc. Confidential and proprietary. 87

github.com/paypal/hera

Internal Mirror

Build

Deploy

PayPal Config

Acknowledgements

Kamlakar

Mukundan

Liana

Sid

Yaping

Shuping

Stephanie

Varun

Qing

Chun

Dwain

©2019 PayPal Inc. Confidential and proprietary. 88

Paresh

Srini

Dheeraj

Pradeep

Pramod

Samrat

Akash

Ashwin

and countless others

Next Steps

Need more database connections?

Hera can help scale

• Multiplexing

• Sharding

• Resiliency

• Manageability

©2019 PayPal Inc. Confidential and proprietary. 89

Open Source github.com/paypal/hera

Google Groups – heradatalink

Our Future

• Scaling Tools

• More Clients

©2019 PayPal Inc. Confidential and proprietary. 90

github.com/paypal/hera Google group: heradatalink

Recommended