Database Consistency Models


Page 1: Database Consistency Models

Database Consistency Models


ACID

● Atomicity: each transaction is "all or nothing" (Commit or rollback)

● Consistency: any transaction will bring the database from one valid state to another (Preserves relational integrity)

● Isolation: concurrent execution of transactions results in a system state that would be obtained if transactions were executed serially

● Durability: once a transaction commits, its effects are persisted to disk (a reboot doesn't cause data loss, for example)
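As a minimal sketch of the "all or nothing" behavior described above, the following uses Python's built-in `sqlite3` module with an in-memory database. The `accounts` table, the `transfer` helper, and the balances are assumptions made up for this illustration; they are not from the original slides.

```python
import sqlite3

# Illustrative schema: the CHECK constraint is the integrity rule to preserve.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; return True if the transaction committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the credit to dst was rolled back along with the failed debit


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


ok = transfer(conn, "alice", "bob", 30)       # valid: commits
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice: rolls back
print(ok, failed, balances(conn))  # True False {'alice': 70, 'bob': 80}
```

The failed transfer credits bob before the debit violates the constraint, so the unchanged balances show that the partial update was undone rather than left half-applied.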


Examples

● Traditional relational databases:

● Oracle

● SQL Server

● MySQL

● Etc.

● Some NewSQL databases:

● VoltDB

● Altibase


Deficiencies of ACID

● Difficult to maintain high availability & fault tolerance in distributed scenarios

● CAP Theorem

● Huge performance overhead in distributed synchronization

● Huge performance overhead to maintain integrity


CAP Theorem (Brewer's conjecture)

● In plain English:

"...during a network partition, a distributed system must choose either Consistency or Availability." -- foundationdb.com


CAP Theorem (Brewer's conjecture)

● Assume that you want strong consistency.

● This implies synchronous, blocking updates.

● Assume you also want availability

● This implies multiple nodes with redundancies.

● When you update one node, you need to broadcast synchronously to all other nodes and wait for successful confirmations (very slow!!!)

● So far so good... But now a node failed to connect to the others (network failure)!

● If you don't wait for it to come back, you've sacrificed consistency. If you block on it, you've sacrificed availability.
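The trade-off in the steps above can be sketched as a toy simulation. The `Replica` class, the `write` function, and the `"CP"` / `"AP"` mode names are hypothetical, introduced only for this illustration; a real system would use timeouts, quorums, and retries rather than a boolean flag.

```python
class Replica:
    """One node holding a copy of a single value."""

    def __init__(self, name):
        self.name = name
        self.value = None
        self.reachable = True  # False simulates a network partition


def write(replicas, value, mode):
    """Synchronously replicate `value` to every replica.

    mode="CP": refuse the write unless every replica can confirm it.
    mode="AP": accept the write on the reachable replicas and move on.
    Returns True if the write was accepted.
    """
    partitioned = any(not r.reachable for r in replicas)
    if partitioned and mode == "CP":
        return False  # availability sacrificed, but no replica diverges
    for r in replicas:
        if r.reachable:
            r.value = value
    return True  # under a partition this leaves a stale replica behind


def consistent(replicas):
    """True when every replica holds the same value."""
    return len({r.value for r in replicas}) == 1


nodes = [Replica("a"), Replica("b"), Replica("c")]
write(nodes, 1, "CP")          # no partition: accepted everywhere
nodes[2].reachable = False     # node "c" drops off the network
print(write(nodes, 2, "CP"), consistent(nodes))  # False True
print(write(nodes, 2, "AP"), consistent(nodes))  # True False
```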


BASE

● Basically available: there will be a response to any request, but that response could still be ‘failure’ to obtain the requested data or the data may be in an inconsistent or changing state.

● Soft state: even during times without input there may be changes going on due to ‘eventual consistency,’ thus the state of the system is always ‘soft.’

● Eventually consistent: "the storage system guarantees that if no new updates are made to the object, eventually all accesses will return the last updated value." -- Werner Vogels, CTO of Amazon.com


Safety versus Liveness

● Liveness: a value distributed across systems eventually converges to be the same across those same systems (generally the last update value).

● "Something good eventually happens"

● Safety: the system is at all times consistent.

● "Nothing bad ever happens"

● Eventual consistency is purely a liveness guarantee (reads eventually return the same value) and does not make safety guarantees: an eventually consistent system can return any value before it converges.


Safety versus Liveness

● To be clear: in eventual consistency, by default, two concurrent read/write increments of a standard counter can potentially increase it by only 1.

● The last write wins, but there is no guarantee with regard to what happened in between (both writers may have read the value before it was consistent)

● This is what happens when you don't have any safety guarantee, as in eventual consistency.
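The lost increment can be reproduced in a few lines. This is a sketch under assumptions: each replica is modeled as a `(timestamp, value)` pair and `lww_merge` is a hypothetical last-write-wins rule, not the API of any particular database.

```python
def lww_merge(a, b):
    """Last-write-wins: keep the (timestamp, value) pair with the later timestamp."""
    return max(a, b)


# Both replicas start from the same state: counter = 10, written at time 0.
replica_a = (0, 10)
replica_b = (0, 10)

# Two clients concurrently read 10 and each write back 11, on different replicas.
replica_a = (1, replica_a[1] + 1)
replica_b = (2, replica_b[1] + 1)

# The replicas later exchange state and converge on a single value.
converged = lww_merge(replica_a, replica_b)
print(converged)  # (2, 11): two increments were issued, but the counter rose by 1
```

The system does converge (the liveness property holds), but the converged value is not the one a serial execution would have produced.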


Examples

● Most big social media websites

● Google Cloud Datastore

● Most NoSQL databases:

● Riak, Redis, Hadoop (without HBase), Couchbase, MongoDB (in some configurations), Cassandra (in some configurations)

● Etc.

● Amazon's DynamoDB

● DNS (Domain Name System)


Deficiencies of BASE

● Delay in convergence

● No safety guarantee

● You don't have the same update semantics as in ACID transactions


Solutions to BASE's Problems

● Application developers can write compensation logic

● Okay in small, simple applications

● Quickly becomes unmanageable in complex applications

● ACID 2.0 design principles that guarantee ACID-like consistency even with an eventual consistency mechanism.


Mutable shared states are the root of all evil.


ACID 2.0

● Associativity & Commutativity: the messages in the queue can be processed in any order.

● Idempotence: the message queue can use at-least-once-delivery guarantees (retry logic). Duplicate processing of the same message doesn't matter.

● Distributed: refers to the fact that ACID 2.0 applies to distributed systems.
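A small sketch of why these properties matter: if the operation applied for each message is associative, commutative, and idempotent, then reordered and duplicated deliveries all lead to the same final state. The "high score" register and the names `merge_score` and `process` are assumptions chosen for illustration.

```python
import itertools


def merge_score(state, message):
    """Fold a reported score into a high-score register.

    max() is associative, commutative, and idempotent, so neither the
    delivery order nor duplicate deliveries can change the final result.
    """
    return max(state, message)


def process(messages, initial=0):
    """Apply every message in the order it was delivered."""
    state = initial
    for m in messages:
        state = merge_score(state, m)
    return state


messages = [12, 7, 30, 18]

# Try every delivery order, each with its first message redelivered at the end
# (as an at-least-once queue might do after a retry).
results = {
    process(list(order) + [order[0]])
    for order in itertools.permutations(messages)
}
print(results)  # {30}
```

By contrast, an operation such as "add 1 to the stored value" is commutative but not idempotent, so a retried message would be counted twice.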


What does it mean?

● Unlike ACID and BASE, ACID 2.0 doesn't tell you what the guarantees are; instead, it identifies design principles that are immune to transactional integrity issues.

● In particular, immutable data structures that you transform are easier to handle than mutable shared states (as most functional programming languages have understood)


The CALM Theorem

● Consistency as Logical Monotonicity

● Logically monotonic: intuitively, a monotonic program (or data structure) makes forward progress over time: it never "retracts" an earlier conclusion in the face of new information.

● Implementation is usually through a class of data structures referred to as CRDTs (conflict-free replicated data types)
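One of the simplest monotonic structures is a grow-only set, often called a G-Set in the CRDT literature. The class below is a minimal sketch written for this illustration, not the implementation of any specific library: elements can be added but never removed, and merging two replicas is a set union.

```python
class GSet:
    """Grow-only set: the state only ever gains elements."""

    def __init__(self, items=()):
        self.items = frozenset(items)

    def add(self, item):
        """Return a new GSet that also contains `item`."""
        return GSet(self.items | {item})

    def merge(self, other):
        """Join two replicas; union is commutative, associative, idempotent."""
        return GSet(self.items | other.items)

    def __contains__(self, item):
        return item in self.items

    def __eq__(self, other):
        return isinstance(other, GSet) and self.items == other.items

    def __hash__(self):
        return hash(self.items)


# Two replicas accept different updates while disconnected from each other.
replica_a = GSet().add("x").add("y")
replica_b = GSet().add("z")

# Merging in either direction yields the same state.
merged_ab = replica_a.merge(replica_b)
merged_ba = replica_b.merge(replica_a)
print(merged_ab == merged_ba)  # True
print("x" in merged_ab)        # True: nothing known earlier is ever retracted
```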


Example: the PN-Counter

● Counts the number of increment and decrement calls per transaction (or "actor", or "node")

● When the value is read, it's calculated on the fly by summing up the increment "marks" and subtracting the sum of the decrement "marks"
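A state-based PN-Counter along these lines might look as follows. This is a sketch under assumptions: the class layout, the `node_id` parameter, and the per-node `max` merge rule follow the common textbook formulation rather than anything shown on the slides.

```python
from collections import Counter


class PNCounter:
    """Counter CRDT: per-node increment and decrement tallies."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.increments = Counter()  # node id -> increments made by that node
        self.decrements = Counter()  # node id -> decrements made by that node

    def increment(self, n=1):
        self.increments[self.node_id] += n

    def decrement(self, n=1):
        self.decrements[self.node_id] += n

    def value(self):
        """Computed on read: total increments minus total decrements."""
        return sum(self.increments.values()) - sum(self.decrements.values())

    def merge(self, other):
        """Take the per-node maximum, which is safe to repeat or reorder."""
        for node, count in other.increments.items():
            self.increments[node] = max(self.increments[node], count)
        for node, count in other.decrements.items():
            self.decrements[node] = max(self.decrements[node], count)


a, b = PNCounter("a"), PNCounter("b")
a.increment()  # concurrent increment on node a
b.increment()  # concurrent increment on node b
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # 2 2
```

Because each node only ever writes its own tally, two concurrent increments are both preserved after the merge instead of one overwriting the other.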


Example: Bitcoin

● The Bitcoin transaction ledger is a CRDT. It's an append-only structure.

● The ledger contains the history of all transactions ever made. It is a replicated dataset, updated by appending new transactions in a peer-to-peer "eventual consistency" framework.


Example: Apache Spark RDDs

● Spark is a high-performance distributed computing framework

● Big Data analytics

● Machine learning (MLlib)

● Distributed graph processing (GraphX)

● Spark SQL

● It is often used in place of Hadoop MapReduce (reported to be about 30 to 100 times faster on some workloads)

● The essence of the Spark framework is a type of data structure called a Resilient Distributed Dataset (which is a CRDT).


Example: Apache Spark RDDs

● RDD features:

● Immutable

● Distributed / Replicated

● Expose map(), filter(), reduce(), join() operations to produce new derived RDDs (very "functional" rather than object-oriented – written in Scala)

● Logs "lineage" information (how the RDD was constructed) across partitions, rather than the data itself, for efficiency. If a network fault occurs, it can reconstruct the data through that lineage. This way the cost of data replication isn't generally incurred (only in fault recovery scenarios).
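The lineage idea can be illustrated with a toy class in plain Python. `ToyRDD` and its `lose_data` method are hypothetical names invented for this sketch; it is not Spark's API and ignores partitioning and distribution entirely. It only shows how a derived dataset can be rebuilt from the recipe that produced it.

```python
class ToyRDD:
    """Immutable dataset that records how it was derived (its lineage)."""

    def __init__(self, source=None, parent=None, transform=None, label="source"):
        self._source = tuple(source) if source is not None else None
        self._parent = parent
        self._transform = transform
        self.label = label
        self._cache = None  # materialized rows; may be lost and rebuilt

    def map(self, fn):
        """Derive a new dataset; the original is left untouched."""
        return ToyRDD(parent=self, transform=lambda rows: [fn(r) for r in rows], label="map")

    def filter(self, pred):
        return ToyRDD(parent=self, transform=lambda rows: [r for r in rows if pred(r)], label="filter")

    def lineage(self):
        """The chain of steps needed to rebuild this dataset."""
        chain = [] if self._parent is None else self._parent.lineage()
        return chain + [self.label]

    def collect(self):
        """Materialize the rows, recomputing from lineage if they are missing."""
        if self._cache is None:
            if self._parent is None:
                self._cache = list(self._source)
            else:
                self._cache = self._transform(self._parent.collect())
        return list(self._cache)

    def lose_data(self):
        """Simulate a fault that wipes the materialized rows."""
        self._cache = None


base = ToyRDD(source=range(10))
even_squares = base.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
print(even_squares.collect())  # [0, 4, 16, 36, 64]
even_squares.lose_data()       # the rows are gone, but the recipe is not
print(even_squares.collect())  # [0, 4, 16, 36, 64]
print(even_squares.lineage())  # ['source', 'filter', 'map']
```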


Other examples

● Apache Kafka message queue

● Riak vector clocks for synchronization

● The game League of Legends uses Riak CRDTs for its in-game chat system

● TreeDoc and Logoot: for collaborative text editing

● SoundCloud uses a CRDT set for streaming, implemented on top of Redis


Deficiencies of CRDTs

● Not a universal solution: doesn't cover all possible applications

● Garbage collection issues (append-only means it consumes increasing amounts of space!)

● Complex to design


Some solutions

● Bloom programming language

● Provides a "framework" for developing in a commutative, order-insensitive way that favors CRDT-style data structures.

● Existing distributed computing platforms do the complicated work for us (Apache Spark, for example)

● We still need to accept locking ACID or weakly consistent BASE for some parts of the system. We can also resort to better "compromises" such as causal consistency.