NewSQL: The Future of Databases?
Elvis Saravia & Dau-Heng Hsu
23/11/2015
Outline
● Introducing NewSQL
● Architecture
● Drawbacks of NewSQL
● Conclusion
● Q&A
What is NewSQL?
“...NewSQL is a class of modern relational database management systems that seek to provide the same scalable performance of NoSQL systems for online transaction processing (read-write) workloads while still maintaining the ACID guarantees of a traditional database system…”
- Wikipedia
OLTP (Online Transaction Processing)
Old OLTP vs. New OLTP
OldSQL for New OLTP
● Too slow
● Does not scale
NoSQL for New OLTP
● Cannot guarantee consistency
NewSQL for New OLTP
● Fast, scalable, and consistent
● Supports SQL
State of the Database
RDBMS (OldSQL)
● ACID transactions
● SQL support
● Standardized
NoSQL
● Horizontal scaling
● High availability
NewSQL
● ACID transactions
● Horizontal scaling
● High availability
● SQL support
● Standardized
A more comprehensive look
● Traditional OldSQL
  ○ SQL
  ○ ACID compliant
  ○ Re-write and re-architect to scale (sharding, denormalizing, distributed caching)
● NoSQL
  ○ Scalability and availability
  ○ Schema-less (great for non-transactional systems)
  ○ Give up SQL
  ○ Give up ACID transactions (not fit for OLTP systems)
● NewSQL
  ○ SQL
  ○ Scalable, shared-nothing architecture
  ○ ACID compliant
  ○ Schema
Why do we need NewSQL (Summary)?
● Provide the same scalable performance as NoSQL for OLTP while still maintaining ACID guarantees.
● Keep relations and SQL.
NewSQL Categories
1. New architectures: VoltDB, NuoDB
2. SQL engines: TokuDB, ScaleDB
3. Transparent sharding: ScaleBase, dbShards
Source: Wikipedia
1. Architecture: New architectures
● Provide concurrency control.
● Traditional relational database concurrency control
  ○ Two-phase locking
● NewSQL database concurrency control
  ○ MVCC (Multi Version Concurrency Control)
  ○ Basic Timestamp Concurrency Control
  ○ Optimistic Concurrency Control
  ○ T/O (timestamp ordering) with Partition-Level Locking
  ○ And others.
● e.g. Google Spanner, VoltDB, MemSQL
MVCC (Multi Version Concurrency Control)
● Reads do not block updates.
● Each transaction keeps a snapshot.
● By reading its snapshot, a transaction gets a consistent view of the database.
● Cost:
  ○ Garbage collection of old snapshots.
[Figure: snapshots over time]
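The snapshot idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular NewSQL engine implements MVCC: the class `MVCCStore`, its single logical commit counter, and the `garbage_collect` policy are all hypothetical names and choices made for this example.

```python
class MVCCStore:
    """Toy multi-version store: every write appends a new version (hypothetical)."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # logical commit counter (an assumption of this sketch)

    def begin(self):
        # A reader's snapshot is simply the commit timestamp at which it started.
        return self._clock

    def write(self, key, value):
        # Writers never overwrite in place, so they never block readers.
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def read(self, key, snapshot_ts):
        # Return the newest version committed at or before the snapshot.
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None

    def garbage_collect(self, oldest_active_snapshot):
        # The cost of MVCC: versions no active snapshot can see must be purged.
        for key, versions in self._versions.items():
            visible = [v for v in versions if v[0] <= oldest_active_snapshot]
            newer = [v for v in versions if v[0] > oldest_active_snapshot]
            self._versions[key] = visible[-1:] + newer
```

A reader that called `begin()` before a later `write` keeps seeing the older value, while a reader that starts afterwards sees the new one.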
Basic Timestamp Concurrency Control
● Each tuple carries a timestamp.
● For a read or a write:
  ○ Reject the operation if its timestamp is less than the timestamp of the last write to that tuple.
● For a write operation:
  ○ Also reject it if its timestamp is less than the timestamp of the last read of that tuple.
● Cost:
  ○ Each site maintains a logical clock, which needs to be accurate.
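The two rejection rules above can be written down directly. The following is a minimal single-process sketch; `TimestampedTuple` and `Rejected` are hypothetical names, and transaction timestamps are assumed to be plain integers handed out by some logical clock that is not shown.

```python
class Rejected(Exception):
    """Raised when an operation arrives too late; the transaction must restart."""


class TimestampedTuple:
    def __init__(self, value=None):
        self.value = value
        self.read_ts = 0   # timestamp of the last read of this tuple
        self.write_ts = 0  # timestamp of the last write to this tuple

    def read(self, txn_ts):
        # A read is rejected if a newer transaction has already written the tuple.
        if txn_ts < self.write_ts:
            raise Rejected("read is older than the last write")
        self.read_ts = max(self.read_ts, txn_ts)
        return self.value

    def write(self, txn_ts, value):
        # A write is rejected if a newer transaction has already read or written it.
        if txn_ts < self.write_ts or txn_ts < self.read_ts:
            raise Rejected("write is older than the last read or write")
        self.value = value
        self.write_ts = txn_ts
```

For example, after a write at timestamp 5 and a read at timestamp 7, a write at timestamp 6 is rejected because a newer transaction has already read the tuple.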
Optimistic Concurrency Control
● Tracks each transaction's read and write sets; stores all write operations in a private workspace.
● At commit, the system determines whether the transaction’s read set overlaps with the write set of any concurrent transaction.
● Transactions write their updates to shared memory only at commit time, so the contention period is short.
● Cost:
  ○ Rollback when validation fails.
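A rough sketch of this validate-at-commit flow is shown below. It assumes a single process and treats "concurrent" as "committed after this transaction began"; the names `OCCStore`, `Transaction`, and `commit_log` are hypothetical and chosen only for illustration.

```python
class OCCStore:
    def __init__(self):
        self.data = {}        # shared, committed state
        self.commit_log = []  # list of (commit_number, set of keys written)
        self.commit_count = 0

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self.store = store
        self.start = store.commit_count  # commits seen when this transaction began
        self.read_set = set()
        self.writes = {}                 # private workspace

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        self.read_set.add(key)
        return self.store.data.get(key)

    def write(self, key, value):
        self.writes[key] = value  # not visible to others until commit

    def commit(self):
        # Validation: did a transaction that committed after we started
        # write anything that we read?
        for commit_no, keys in self.store.commit_log:
            if commit_no > self.start and keys & self.read_set:
                return False  # rollback: the private workspace is discarded
        # Write phase: publish the private workspace to shared state.
        self.store.data.update(self.writes)
        self.store.commit_count += 1
        self.store.commit_log.append((self.store.commit_count, set(self.writes)))
        return True
```

If two transactions read the same key and both try to update it, the first to commit succeeds and the second fails validation and has to be retried.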
T/O with Partition-Level Locking
● The database is divided into disjoint subsets called partitions.
● Each partition has:
  ○ A lock.
  ○ A single-threaded execution engine.
● Assign a timestamp to each transaction and add it to the queues.
● Execute the transaction with the oldest timestamp in the queue.
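The queue-per-partition idea might look like the sketch below. It is an illustration under assumptions: `Partition` and `PartitionedStore` are hypothetical names, timestamps are unique integers from a shared counter, each transaction touches a single partition, and running transactions one at a time stands in for holding the partition lock.

```python
import heapq
import itertools


class Partition:
    def __init__(self):
        self.data = {}
        self.queue = []  # min-heap of (timestamp, work); timestamps assumed unique

    def enqueue(self, ts, work):
        heapq.heappush(self.queue, (ts, work))

    def run_all(self):
        # Single-threaded engine: one transaction at a time, oldest timestamp first.
        executed = []
        while self.queue:
            ts, work = heapq.heappop(self.queue)
            work(self.data)
            executed.append(ts)
        return executed


class PartitionedStore:
    def __init__(self, num_partitions):
        self.partitions = [Partition() for _ in range(num_partitions)]
        self._next_ts = itertools.count(1)  # shared logical clock (assumption)

    def partition_for(self, key):
        return self.partitions[hash(key) % len(self.partitions)]

    def submit(self, key, work):
        # Stamp the transaction, then queue it on the partition that owns the key.
        ts = next(self._next_ts)
        self.partition_for(key).enqueue(ts, work)
        return ts
```

Because each partition executes serially, transactions that stay within one partition need no finer-grained locking; the sketch leaves out multi-partition transactions entirely.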
2. Architecture: SQL engines
● Provide highly optimized storage engines for SQL.
  ○ MySQL Cluster is used as an example here.
● Nodes are separated into three kinds:
  ○ Data node
    ■ Stores the data.
  ○ Management node
    ■ Handles configuration and monitoring of the cluster.
  ○ Application node or SQL node
    ■ Connects to all of the data nodes and performs data storage and retrieval.
● Consistency is controlled by the application nodes.
3. Architecture: Transparent sharding
● Use sharding middleware.
● All nodes connect to the middleware.
● The middleware controls the whole process to ensure consistency.
● e.g. dbShards and ScaleBase.
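To make the middleware role concrete, here is a small sketch of a router that hides the shard layout from its callers. It is not based on how dbShards or ScaleBase work internally; the class `ShardingMiddleware`, the use of plain dictionaries as stand-ins for backend databases, and the CRC32 hash routing are all assumptions made for illustration.

```python
import zlib


class ShardingMiddleware:
    def __init__(self, num_shards):
        # Each dict stands in for one backend database node.
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key):
        # A stable hash sends the same key to the same shard every time.
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key):
        return self._shard_for(key).get(key)

    def count(self):
        # A query that spans shards is fanned out and merged by the middleware.
        return sum(len(shard) for shard in self.shards)
```

Callers only see `put`, `get`, and `count`, so the application does not need to know how many shards exist or where a given key lives.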
Main drawback
● Write latency.
  ○ Concurrency control needs extra time to make sure the data is consistent.
● An in-memory mechanism can reduce latency, but it is restricted by memory size.
Figure: Write latency for the Read/Write workload.
Source: http://www.planetcassandra.org/nosql-performance-benchmarks/
Conclusion
● A database trend to watch.
● NewSQL is an ACID-compliant, SQL-based, scalable, distributed, highly available RDBMS.
● NewSQL databases are increasingly in demand due to the rise of data-oriented industries (e.g. IoT).
Something to think about: In fact, both NoSQL and NewSQL databases can offer a degree of consistency and availability, as well as partition tolerance.
References
1. http://www.informationweek.com/big-data/big-data-analytics/16-nosql-newsql-databases-to-watch/d/d-id/1269559
2. https://en.wikipedia.org/wiki/NewSQL
3. https://github.com/cockroachdb/cockroach
4. https://voltdb.com/
5. https://451research.com
Q&A