Constraints of Highly Scalable Databases

Lightning talk: highly scalable databases and the PACELC theorem




1. Traditional Databases

Recap of the ACID constraints


“Traditional” databases operate on the transaction paradigm, which guarantees certain properties:

● (A) Atomicity
● (C) Consistency
● (I) Isolation
● (D) Durability


The ACID Guarantees

1. Atomicity

Each transaction must be “all or nothing” - if any part fails, the whole transaction must be rolled back as if it never happened.

2. Consistency

The end-state of a transaction must follow all the rules defined in the database: data constraints, cascades, triggers etc.

3. Isolation

The result of concurrent transactions should be the same as if they had run one after another in some sequential order.

4. Durability

A transaction, once committed, will survive permanently even if the system fails.

This includes disk crashes, power outages, etc.
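As a minimal sketch of atomicity and consistency working together, the example below uses Python's built-in `sqlite3` module. The `accounts` table, the `CHECK` constraint, and the `transfer` helper are hypothetical names chosen for illustration, not anything from the talk.

```python
import sqlite3

# Hypothetical schema: the CHECK constraint plays the role of a database "rule".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK rule, so the earlier credit is undone too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok = transfer(conn, "alice", "bob", 30)       # both updates apply
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice: rolled back
```

After both calls, `balances(conn)` should show only the first transfer: the credit to `bob` from the failed attempt does not survive.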


How do they do this?

Locking

● Read / write / range locks

Concurrency Control

● 2-phase commit (2PC), 3PC protocols
● Distributed locks
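The two-phase commit idea can be sketched in a few lines. This is a toy, in-process illustration only: `Participant`, its `can_commit` flag, and `two_phase_commit` are hypothetical names, and a real coordinator would also need timeouts, logging, and recovery.

```python
class Participant:
    """A toy resource manager; `can_commit` is a hypothetical knob to force a 'no' vote."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote yes only if the local work could be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Return True if every participant committed, False if all rolled back."""
    # Phase 1 (voting): ask every participant to prepare.
    votes = [p.prepare() for p in participants]
    # Phase 2 (completion): commit only on a unanimous yes.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

A single "no" vote aborts the transaction everywhere, which is what makes the protocol safe but also what makes it slow: every participant waits on the slowest one.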


But then came the 2000s


And Scale Happened


Traditional RDBMSs were not designed for the needs of modern web applications

Global Scale

Netflix knows which movies you watched, when, at what point(s) you paused and for how long, etc.

It then replicates that data across 3 global data centers.

Volume

In 2008, Facebook had only 100 million users and needed 8,000 shards of MySQL.

Today it has ~1.86 billion users.

Speed

In 2013 Twitter hit a peak of roughly 143,000 new tweets/second (the average was closer to 5,700/second).


What to do?

Scale up! (?)
- Increase memory, cores, CPU
- Cache reads with memcached
- Master-slave replication
- Sharding
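As one illustration of the last item, sharding is often done by hashing a key to pick a shard. The sketch below assumes a simple hash-modulo scheme; `shard_for` and the `user:` key format are hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (hash-modulo sharding)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer so the result is stable across runs.
    return int.from_bytes(digest[:8], "big") % num_shards


# The same key always lands on the same shard.
home = shard_for("user:42", 8)
```

One weakness of this scheme is that changing `num_shards` remaps most keys, so growing the cluster means moving a lot of data.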


NOT ENOUGH


2. Redefining Constraints

Replacing ACID with BASE


“DBMS research is about ACID (mostly). But we forfeit “C” and “I” for availability, graceful degradation, and performance. This tradeoff is fundamental.”

- Eric Brewer, 2000


Eric Brewer proposed a new set of properties: BASE

Basically Available

System is always available for clients (but may not be consistent)

Soft State

Database is no longer in charge of “valid” data state. The app is now responsible.

Eventual Consistency

If all goes well, all clients will eventually see the same thing. Probably.
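To make "eventually see the same thing" concrete, here is a minimal sketch of two replicas that accept writes independently and converge later. It assumes last-write-wins conflict resolution with caller-supplied logical timestamps, which is only one of several possible strategies; `Replica` and `anti_entropy` are hypothetical names.

```python
class Replica:
    """A toy replica that keeps the newest version of each key."""

    def __init__(self):
        self.store = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        current = self.store.get(key)
        # Last-write-wins: only accept the write if it is newer than what we hold.
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def anti_entropy(a, b):
    """Exchange entries so both replicas end up with the newest version of each key."""
    for src, dst in ((a, b), (b, a)):
        for key, (ts, value) in list(src.store.items()):
            dst.write(key, value, ts)


r1, r2 = Replica(), Replica()
r1.write("cart", ["book"], timestamp=1)         # client A writes to replica 1
r2.write("cart", ["book", "pen"], timestamp=2)  # client B writes to replica 2
stale = r1.read("cart")                         # replica 1 still serves old data
anti_entropy(r1, r2)                            # background sync brings them together
```

Between the second write and the sync, the two replicas disagree; that window is the "soft state" the application has to tolerate.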


In the world of BASE parameters, a different set of priorities rules:

● Availability is most important
● Weak consistency (i.e. stale data) is okay
● Approximate answers are okay
● Aggressive (optimistic) algorithms are okay
● Simple, fast, easy evolution of the schema is important


A new set of constraints: the CAP Theorem

It is impossible for a distributed computer system to simultaneously provide more than 2 of these 3 guarantees:

● Consistency
● Availability
● Partition tolerance

(Eric Brewer, 1998-2000)


The CAP Parameters

1. Consistency*

All clients get the same view of the data, or they get an error

(i.e. every read receives the most recent write)

2. Availability

All clients can always read and always write

(i.e. every request receives a non-error response)

3. Partition tolerance

The system functions even if some nodes are unavailable

(i.e. system operates despite an arbitrary number of messages being dropped by the network between nodes)
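The three definitions above can be illustrated with a toy two-replica store that is configured to favour either consistency or availability when the replicas cannot talk to each other. Everything here (`TwoNodeStore`, the `"CP"`/`"AP"` modes, the `partitioned` flag) is a hypothetical simplification, not a model of any real database.

```python
class Unavailable(Exception):
    """Raised by the CP configuration when it refuses to answer."""


class TwoNodeStore:
    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.partitioned = False  # flip to True to simulate a network partition
        self.nodes = [{}, {}]

    def write(self, node, key, value):
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse rather than let replicas diverge.
                raise Unavailable("cannot reach the other replica")
            # Availability first: accept locally, replicas may now disagree.
            self.nodes[node][key] = value
        else:
            for replica in self.nodes:
                replica[key] = value

    def read(self, node, key):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("cannot confirm the latest write")
        return self.nodes[node].get(key)


ap = TwoNodeStore("AP")
ap.write(0, "x", 1)
ap.partitioned = True
ap.write(0, "x", 2)  # accepted, but node 1 never hears about it
```

In the AP configuration, `ap.read(0, "x")` and `ap.read(1, "x")` now return different values; the CP configuration would have raised `Unavailable` instead.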


All NoSQL databases live somewhere on this spectrum, based on how they’re tuned

[Diagram: a spectrum running from ACID to BASE]

● What levels of availability do you choose to provide?
● What levels of consistency do you choose to provide?
● What do you do when a partition is detected?
● How do you recover from a partition event?


But wait… we’re not through yet


2010: Daniel Abadi (Yale) says CAP is misleading

The trade-offs defined by CAP’s “pick any 2” are misleading:

● The only time you need to make a trade-off is when there is a partition event (P)

● Systems that sacrifice C must do so all the time

● But systems that sacrifice A only need to do so when there’s a partition

Most importantly, you don’t give up C to gain A

You give up C to get another missing ingredient: L


LATENCY

Latency = how long must a client request wait for your response?


Imagine replicating data across global data centers

[Diagram: Data Center 1, Data Center 2, Data Center 3, Data Center 4, Data Center 5, ... Data Center n]


“A high availability requirement implies that the system must replicate data. But as soon as a distributed system replicates data, a tradeoff between consistency and latency arises.”

- Abadi, 2010
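A small worked example shows why replication forces this tradeoff even when nothing is broken. The sketch assumes replicas are contacted in parallel and that the client waits for a chosen number of acknowledgements; `write_latency_ms` and all the millisecond figures are made-up illustrative values.

```python
def write_latency_ms(local_ms, replica_rtts_ms, acks_required):
    """Latency a client observes when a write must be acknowledged by
    `acks_required` remote replicas before the response is sent."""
    if not 0 <= acks_required <= len(replica_rtts_ms):
        raise ValueError("acks_required must be between 0 and the replica count")
    if acks_required == 0:
        # Fully asynchronous replication: answer after the local write only.
        return local_ms
    # Replicas are contacted in parallel, so we wait for the k-th fastest ack.
    return local_ms + sorted(replica_rtts_ms)[acks_required - 1]


# Hypothetical round-trip times to three remote data centers.
rtts = [70, 150, 250]
async_ms = write_latency_ms(2, rtts, acks_required=0)  # fast, replicas may be stale
one_ack_ms = write_latency_ms(2, rtts, acks_required=1)
all_acks_ms = write_latency_ms(2, rtts, acks_required=3)  # consistent, slowest
```

With these numbers the client waits 2 ms, 72 ms, or 252 ms depending on how many replicas must confirm the write: stronger consistency is paid for directly in latency.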


The PACELC theorem (Abadi, 2010)

In a system that replicates data:

● If a partition (P) is detected, how does the system trade off
  ○ (A) Availability or
  ○ (C) Consistency
● Else (E), how does the system trade off
  ○ (L) Latency or
  ○ (C) Consistency
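The if/else shape of PACELC maps naturally onto a small data structure. The sketch below is only a way of writing the classification down; `Goal`, `PacelcProfile`, and `favours` are hypothetical names, and the example profile does not describe any particular database.

```python
from dataclasses import dataclass
from enum import Enum


class Goal(Enum):
    AVAILABILITY = "A"
    CONSISTENCY = "C"
    LATENCY = "L"


@dataclass(frozen=True)
class PacelcProfile:
    on_partition: Goal  # what the system favours during a partition: A or C
    otherwise: Goal     # what it favours in normal operation: L or C

    def __post_init__(self):
        if self.on_partition not in (Goal.AVAILABILITY, Goal.CONSISTENCY):
            raise ValueError("on_partition must be A or C")
        if self.otherwise not in (Goal.LATENCY, Goal.CONSISTENCY):
            raise ValueError("otherwise must be L or C")

    def favours(self, partition_detected: bool) -> Goal:
        """If Partition then (A or C), Else (L or C)."""
        return self.on_partition if partition_detected else self.otherwise

    @property
    def label(self) -> str:
        return f"P{self.on_partition.value}/E{self.otherwise.value}"


# A hypothetical system that stays available under partition and fast otherwise.
profile = PacelcProfile(Goal.AVAILABILITY, Goal.LATENCY)
```

Here `profile.label` is `"PA/EL"`, and `profile.favours(False)` returns `Goal.LATENCY`, the choice that CAP alone has no way to express.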


Comparing NoSQL databases using PACELC

[Table: columns DDBS, P+A, P+C, E+L, E+C; rows "Dynamo, Cassandra, Riak", "Mongo, H-Store, VoltDB", "Yahoo! PNUTS"; cell markings not preserved]


References

● Images and title ideas from:
  ○ http://blog.nahurst.com/visual-guide-to-nosql-systems
  ○ http://digbigdata.com/know-thy-cap-theorem-for-nosql/
● Detailed references at:
  ○ http://www.bardoloi.com/blog/2017/03/06/pacelc-theorem/


thanks!

Any questions?

You can find me at @bardoloi