NewSQL - The Future of Databases?

Elvis Saravia & Dau-Heng Hsu

23/11/2015

Outline
● Introducing NewSQL
● Architecture
● Drawbacks of NewSQL
● Conclusion
● Q&A

What is NewSQL?
“...NewSQL is a class of modern relational database management systems that seek to provide the same scalable performance of NoSQL systems for online transaction processing (read-write) workloads while still maintaining the ACID guarantees of a traditional database system…”
- Wikipedia

OLTP (Online Transaction Processing)

(Figure: Old OLTP vs. New OLTP)

OldSQL for New OLTP
● Too slow
● Does not scale

NoSQL for New OLTP
● Cannot guarantee consistency

NewSQL for New OLTP
● Fast, scalable and consistent
● Supports SQL

State of the Database

(Figure: comparison of RDBMS (OldSQL), NoSQL and NewSQL on ACID transactions, SQL support, standardization, horizontal scaling and high availability)

A more comprehensive look

● Traditional OldSQL
  ○ SQL
  ○ ACID compliant
  ○ Must be rewritten and re-architected to scale (sharding, denormalizing, distributed caching)

● NoSQL
  ○ Scalability and availability
  ○ Schema-less (great for non-transactional systems)
  ○ Gives up SQL
  ○ Gives up ACID transactions (not fit for OLTP systems)

● NewSQL
  ○ SQL
  ○ Scalable, shared-nothing architecture
  ○ ACID compliant
  ○ Schema-based

Why do we need NewSQL (Summary)?
● Provide the scalable performance of NoSQL for OLTP while still maintaining ACID guarantees.
● Keep relations and SQL.

NewSQL Categories
1. New architectures: VoltDB, NuoDB
2. SQL engines: TokuDB, ScaleDB
3. Transparent sharding: ScaleBase, dbShards

Source: Wikipedia

1. Architecture: New architectures
● Provide their own concurrency control schemes.
● Traditional relational database concurrency control:
  ○ Two-phase locking (2PL)
● NewSQL database concurrency control:
  ○ MVCC (Multi-Version Concurrency Control)
  ○ Basic Timestamp Concurrency Control
  ○ Optimistic Concurrency Control
  ○ T/O (timestamp ordering) with Partition-Level Locking
  ○ And others.
● e.g. Google Spanner, VoltDB, MemSQL

MVCC (Multi-Version Concurrency Control)
● Reads data without blocking updates.
● Each transaction reads from a snapshot of the database.
● By reading the snapshot, it gets a consistent view of the database.
● Cost:
  ○ Garbage collection of old snapshots (versions).

(Figure: snapshots over time)
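
A minimal single-threaded Python sketch of the idea, assuming a simple key-value store: every write appends a new timestamped version instead of overwriting, a transaction's snapshot is just the latest commit timestamp when it begins, and `gc` drops versions that no active snapshot can still see. The class and method names (`MVCCStore`, `begin`, `read`, `write`, `gc`) are hypothetical, and write-write conflict detection is deliberately omitted.

```python
class MVCCStore:
    """Toy multi-version key-value store (hypothetical, single-threaded)."""

    def __init__(self):
        self._versions = {}  # key -> [(commit_ts, value), ...] in ascending commit_ts
        self._clock = 0      # timestamp of the most recent commit

    def begin(self):
        # A snapshot is identified by the latest commit timestamp at start time.
        return self._clock

    def write(self, key, value):
        # Each write commits immediately as a new version; old versions are kept.
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def read(self, key, snapshot_ts):
        # Return the newest version committed at or before the snapshot,
        # so later writers never block or disturb this reader.
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None

    def gc(self, oldest_active_ts):
        # Keep the newest version visible to the oldest active snapshot
        # plus everything newer; older versions can never be read again.
        for key, versions in self._versions.items():
            keep_from = 0
            for i, (commit_ts, _) in enumerate(versions):
                if commit_ts <= oldest_active_ts:
                    keep_from = i
            self._versions[key] = versions[keep_from:]


store = MVCCStore()
store.write("x", 1)                    # version committed at ts=1
snap = store.begin()                   # snapshot taken at ts=1
store.write("x", 2)                    # newer version at ts=2
print(store.read("x", snap))           # 1: the old snapshot is unaffected
print(store.read("x", store.begin()))  # 2: a new snapshot sees the update
```

The version list makes the cost on the slide visible: without `gc`, every update to a key leaves another version behind.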

Basic Timestamp Concurrency Control
● Timestamps are kept on each tuple (last read and last write).
● For a read or write operation:
  ○ Reject if the transaction's timestamp is less than the timestamp of the last write to that tuple.
● For a write operation:
  ○ Also reject if the transaction's timestamp is less than the timestamp of the last read of that tuple.
● Cost:
  ○ Each site maintains a logical clock, which must be kept accurate.
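
A single-site Python sketch of these two rules, assuming each transaction's timestamp comes from a logical clock and that an aborted transaction is restarted with a new timestamp (restart logic not shown). `Row`, `read`, `write` and `Abort` are hypothetical names; refinements such as the Thomas write rule and cross-site clock synchronization are left out.

```python
class Abort(Exception):
    """Raised when an operation arrives too late in timestamp order."""


class Row:
    def __init__(self, value):
        self.value = value
        self.read_ts = 0   # largest timestamp of any transaction that read it
        self.write_ts = 0  # timestamp of the transaction that last wrote it


def read(row, ts):
    # Rule for reads: a younger transaction already overwrote the value.
    if ts < row.write_ts:
        raise Abort(f"read by T{ts} after write by T{row.write_ts}")
    row.read_ts = max(row.read_ts, ts)
    return row.value


def write(row, ts, value):
    # Rules for writes: reject if a younger transaction already wrote or read it.
    if ts < row.write_ts:
        raise Abort(f"write by T{ts} after write by T{row.write_ts}")
    if ts < row.read_ts:
        raise Abort(f"write by T{ts} after read by T{row.read_ts}")
    row.value = value
    row.write_ts = ts


row = Row("a")
print(read(row, ts=5))              # a (read_ts becomes 5)
try:
    write(row, ts=3, value="b")     # T3 is older than the last reader T5
except Abort as exc:
    print("abort:", exc)            # abort: write by T3 after read by T5
```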

Optimistic Concurrency Control
● Tracks each transaction's reads and writes; all write operations are stored in a private workspace.
● At commit time, the system checks whether the transaction's read set overlaps with the write set of any concurrent transaction.
● Because transactions write their updates to shared memory only at commit time, the contention period is short.
● Cost:
  ○ Rollback (a transaction that fails validation is aborted).
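
A single-threaded Python sketch of the read, validate and write phases, assuming backward validation against transactions that committed after this one started. `OCCDatabase` and `Transaction` are hypothetical names; a real system must make validation plus the write phase atomic, which holds trivially here because everything runs on one thread.

```python
class OCCDatabase:
    def __init__(self):
        self.data = {}         # shared, committed state
        self.commit_count = 0  # number of committed transactions so far
        self.history = []      # [(commit_number, write_set), ...]


class Transaction:
    def __init__(self, db):
        self.db = db
        self.start = db.commit_count  # commits after this point are concurrent
        self.read_set = set()
        self.workspace = {}           # private writes, invisible to others

    def read(self, key):
        self.read_set.add(key)
        if key in self.workspace:     # read your own buffered writes
            return self.workspace[key]
        return self.db.data.get(key)

    def write(self, key, value):
        self.workspace[key] = value   # buffered until commit

    def commit(self):
        # Validation: did a transaction that committed after we started
        # write something we read? If so, our reads may be stale: roll back.
        for commit_no, write_set in self.db.history:
            if commit_no > self.start and write_set & self.read_set:
                self.workspace.clear()
                return False
        # Write phase: publish the buffered writes to shared state.
        self.db.data.update(self.workspace)
        self.db.commit_count += 1
        self.db.history.append((self.db.commit_count, set(self.workspace)))
        return True


db = OCCDatabase()
t1, t2 = Transaction(db), Transaction(db)
t1.read("x")
t1.write("y", 1)
t2.write("x", 5)
print(t2.commit())  # True: nothing conflicting committed before it
print(t1.commit())  # False: t2 wrote "x", which t1 had read
```

Nothing blocks while the transactions run; the price is paid only at validation, when `t1` has to be rolled back.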

T/O with Partition-Level Locking
● The database is divided into disjoint subsets, called partitions.
● Each partition has:
  ○ A lock.
  ○ A single-threaded execution engine.
● Each transaction is assigned a timestamp and added to the queues of the partitions it needs.
● Each partition executes the transaction with the oldest timestamp in its queue.
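
A Python sketch of one partition's engine, assuming single-partition transactions only: each transaction is a plain function over the partition's data, timestamps come from a hypothetical global counter, and the queue is a min-heap so the oldest timestamp always runs next. Acquiring the locks of several partitions for a multi-partition transaction is not shown.

```python
import heapq
import itertools

clock = itertools.count(1)  # hypothetical global timestamp source


class Partition:
    """One disjoint subset of the data, served by a single thread."""

    def __init__(self):
        self.data = {}
        self.queue = []                # min-heap of (timestamp, seq, txn)
        self._seq = itertools.count()  # tie-breaker so txn functions are never compared

    def submit(self, ts, txn):
        heapq.heappush(self.queue, (ts, next(self._seq), txn))

    def run_pending(self):
        # While this partition's lock is held, nothing else touches self.data;
        # transactions run one at a time, oldest timestamp first.
        results = []
        while self.queue:
            ts, _, txn = heapq.heappop(self.queue)
            results.append((ts, txn(self.data)))
        return results


def deposit(amount):
    # Hypothetical transaction: add `amount` to a balance in this partition.
    def txn(data):
        data["balance"] = data.get("balance", 0) + amount
        return data["balance"]
    return txn


p = Partition()
ts_old, ts_new = next(clock), next(clock)
p.submit(ts_new, deposit(50))   # arrives first, but is younger
p.submit(ts_old, deposit(100))  # arrives second, but is older
print(p.run_pending())          # [(1, 100), (2, 150)]
```

Because each partition is single-threaded, transactions inside one partition need no row-level locks at all.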

2. Architecture: SQL engines
● Provide highly optimized storage engines for SQL.
  ○ MySQL Cluster is used as an example here.
● Nodes are separated into 3 kinds:
  ○ Data node
    ■ Stores the data.
  ○ Management node
    ■ Configures and monitors the cluster.
  ○ Application node or SQL node
    ■ Connects to all of the data nodes and performs data storage and retrieval.
● Consistency is controlled by the application nodes.
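
A toy Python model of the three node roles, assuming the SQL node picks a data node by hashing the primary key (CRC32 here, chosen only for illustration; it is not MySQL Cluster's actual partitioning function or API). `DataNode`, `ManagementNode` and `SQLNode` are hypothetical classes that only mirror the division of responsibilities listed above.

```python
import zlib


class DataNode:
    """Stores the rows for the keys assigned to it."""

    def __init__(self, name):
        self.name = name
        self.rows = {}


class ManagementNode:
    """Holds the cluster configuration and reports on node status."""

    def __init__(self, data_nodes):
        self.data_nodes = data_nodes

    def status(self):
        return {node.name: len(node.rows) for node in self.data_nodes}


class SQLNode:
    """Connects to every data node and routes storage/retrieval requests."""

    def __init__(self, mgmt):
        self.data_nodes = mgmt.data_nodes  # topology taken from the management node

    def _node_for(self, key):
        # Hypothetical routing: hash the primary key onto one data node.
        return self.data_nodes[zlib.crc32(key.encode()) % len(self.data_nodes)]

    def insert(self, key, row):
        self._node_for(key).rows[key] = row

    def select(self, key):
        return self._node_for(key).rows.get(key)


mgmt = ManagementNode([DataNode("data1"), DataNode("data2")])
sql = SQLNode(mgmt)
sql.insert("alice", {"age": 30})
print(sql.select("alice"))  # {'age': 30}
print(mgmt.status())        # row count per data node
```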

3. Architecture: Transparent sharding
● Uses sharding middleware.
● All nodes can connect to the middleware.
● The middleware controls the whole process to ensure consistency.
● e.g. dbShards and ScaleBase.
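
A Python sketch of the middleware's routing role, using in-memory SQLite databases from the standard `sqlite3` module as stand-in shards and assuming rows are routed by `id % number_of_shards`. `ShardingMiddleware` and its methods are hypothetical and do not reflect the actual interfaces of dbShards or ScaleBase; cross-shard transactions are not handled.

```python
import sqlite3


class ShardingMiddleware:
    """Routes each query to the shard that owns the row's id."""

    def __init__(self, num_shards):
        self.shards = [sqlite3.connect(":memory:") for _ in range(num_shards)]
        for conn in self.shards:
            conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

    def _shard_for(self, user_id):
        return self.shards[user_id % len(self.shards)]

    def insert_user(self, user_id, name):
        conn = self._shard_for(user_id)
        with conn:  # commits on success, rolls back if the INSERT raises
            conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (user_id, name))

    def get_user(self, user_id):
        cur = self._shard_for(user_id).execute(
            "SELECT name FROM users WHERE id = ?", (user_id,))
        row = cur.fetchone()
        return row[0] if row else None

    def count_users(self):
        # Queries that span shards are fanned out and the results merged.
        return sum(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
                   for conn in self.shards)


mw = ShardingMiddleware(num_shards=3)
for uid, name in enumerate(["ann", "bo", "cy", "di"], start=1):
    mw.insert_user(uid, name)
print(mw.get_user(4))    # di (stored on shard 4 % 3 = 1)
print(mw.count_users())  # 4
```

The application only talks to `mw`; which shard holds a row stays hidden behind the middleware, which is what makes the sharding transparent.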

Main drawback
● Write latency.
  ○ Concurrency control needs extra time to make sure the data is consistent.
● In-memory mechanisms can help reduce latency, but capacity is then restricted by memory size.

(Figure: write latency for a read/write workload)
Source: http://www.planetcassandra.org/nosql-performance-benchmarks/

Conclusion
● A database trend to watch.
● NewSQL systems are ACID-compliant, SQL-based, scalable, distributed and highly available RDBMSs.
● NewSQL databases are in growing demand due to the rise of data-oriented industries (e.g. IoT).

Something to think about: in fact, both NoSQL and NewSQL databases can offer a degree of consistency and availability, as well as partition tolerance.

Q&A
