Introduction to NoSQL Databases
Jianfeng Zhan, 2012.6.25
A quick introduction to DB
Models of Reality

[Diagram: REALITY (structures, processes) modeled by a DATABASE SYSTEM (DATABASE, DDL, DML)]

A database is a model of structures of reality
The use of a database reflects processes of reality
A database system is a software system which supports the definition and use of a database
DDL: Data Definition Language
DML: Data Manipulation Language
Data Modeling
[Diagram: REALITY (structures, processes) mapped by data modeling to a MODEL inside the DATABASE SYSTEM]
The model represents a perception of structures of reality
The data modeling process is to fix a perception of structures of reality and represent this perception
In the data modeling process we select aspects and we abstract
Process Modeling
[Diagram: REALITY (structures, processes) mapped by process modeling to a MODEL inside the DATABASE SYSTEM; processes reach the model through programs (PROG) and DML]

The use of the model reflects processes of reality
Processes may be represented by programs with embedded database queries and updates
Processes may be represented by ad-hoc database queries and updates at run-time
Data Model
A data model consists of notations for expressing:
 data structures
 integrity constraints
 operations
Data Model - Data Structures
All data models have notation for defining:
 attribute types
 entity types
 relationship types

FLIGHT-SCHEDULE
FLIGHT#  AIRLINE       WEEKDAY  PRICE
101      delta         mo       156
545      american      we       110
912      scandinavian  fr       450
242      usair         mo       231

DEPT-AIRPORT
FLIGHT#  AIRPORT-CODE
101      atl
912      cph
545      lax
Data Model - Constraints
Constraints express rules that cannot be expressed by the data structures alone:
 Static constraints apply to database states
 Dynamic constraints apply to changes of database state
 E.g., "All FLIGHT-SCHEDULE entities must have precisely one DEPT-AIRPORT relationship"

FLIGHT-SCHEDULE
FLIGHT#  AIRLINE       WEEKDAY  PRICE
101      delta         mo       156
545      american      we       110
912      scandinavian  fr       450
242      usair         mo       231

DEPT-AIRPORT
FLIGHT#  AIRPORT-CODE
101      atl
912      cph
545      lax
242      bos
Data Model - Operations
Operations support change and retrieval of data:

insert FLIGHT-SCHEDULE(97, delta, tu, 258);
insert DEPT-AIRPORT(97, atl);

select FLIGHT#, WEEKDAY
from FLIGHT-SCHEDULE
where AIRLINE='delta';

FLIGHT-SCHEDULE
FLIGHT#  AIRLINE       WEEKDAY  PRICE
101      delta         mo       156
545      american      we       110
912      scandinavian  fr       450
242      usair         mo       231
97       delta         tu       258

DEPT-AIRPORT
FLIGHT#  AIRPORT-CODE
101      atl
912      cph
545      lax
242      bos
97       atl
Data Model - Operations from Programs
declare C cursor for
  select FLIGHT#, WEEKDAY
  from FLIGHT-SCHEDULE
  where AIRLINE='delta';
open C;
repeat
  fetch C into :FLIGHT#, :WEEKDAY;
  do your thing;
until done;
close C;

FLIGHT-SCHEDULE
FLIGHT#  AIRLINE       WEEKDAY  PRICE
101      delta         mo       156
545      american      we       110
912      scandinavian  fr       450
242      usair         mo       231
97       delta         tu       258
Keys and Identifiers
Keys (or identifiers) are uniqueness constraints
A key on FLIGHT# in FLIGHT-SCHEDULE will force all FLIGHT#s to be unique in FLIGHT-SCHEDULE
Consider the possible keys on DEPT-AIRPORT: a key on FLIGHT# alone, on AIRPORT-CODE alone, on the pair (FLIGHT#, AIRPORT-CODE), or separate keys on each attribute

DEPT-AIRPORT
FLIGHT#  AIRPORT-CODE
101      atl
912      cph
545      lax
242      bos

FLIGHT-SCHEDULE
FLIGHT#  AIRLINE       WEEKDAY  PRICE
101      delta         mo       156
545      american      we       110
912      scandinavian  fr       450
242      usair         mo       231
Integrity and Consistency
 Integrity: does the model reflect reality well?
 Consistency: is the model without internal conflicts?
 A FLIGHT# in FLIGHT-SCHEDULE cannot be null, because it models the existence of an entity in the real world
 A FLIGHT# in DEPT-AIRPORT must exist in FLIGHT-SCHEDULE, because it doesn't make sense for a non-existing FLIGHT-SCHEDULE entity to have a DEPT-AIRPORT

DEPT-AIRPORT
FLIGHT#  AIRPORT-CODE
101      atl
912      cph
545      lax
242      bos

FLIGHT-SCHEDULE
FLIGHT#  AIRLINE       WEEKDAY  PRICE
101      delta         mo       156
545      american      we       110
912      scandinavian  fr       450
242      usair         mo       231
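The referential-integrity rule above can be sketched in a few lines (illustrative Python over in-memory stand-ins for the two tables, not any DBMS's actual mechanism):

```python
# Hypothetical in-memory versions of the slides' tables.
flight_schedule = {101, 545, 912, 242}                      # set of FLIGHT# keys
dept_airport = {101: "atl", 912: "cph", 545: "lax", 242: "bos"}

def referential_integrity_ok(schedule_keys, dept_rows):
    """Every FLIGHT# in DEPT-AIRPORT must exist in FLIGHT-SCHEDULE."""
    return all(f in schedule_keys for f in dept_rows)

# Inserting a DEPT-AIRPORT row for a non-existing flight violates the rule.
```

A real DBMS enforces this as a foreign-key constraint at insert/update time rather than by scanning after the fact.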
Triggers and Stored Procedures
Triggers can be defined to enforce constraints on a database, e.g.,

DEFINE TRIGGER DELETE-FLIGHT-SCHEDULE
ON DELETE FROM FLIGHT-SCHEDULE WHERE FLIGHT#='X'
ACTION DELETE FROM DEPT-AIRPORT WHERE FLIGHT#='X';

DEPT-AIRPORT
FLIGHT#  AIRPORT-CODE
101      atl
912      cph
545      lax
242      bos

FLIGHT-SCHEDULE
FLIGHT#  AIRLINE       WEEKDAY  PRICE
101      delta         mo       156
545      american      we       110
912      scandinavian  fr       450
242      usair         mo       231
Normalization
Unnormalized: WEEKDAYS is multi-valued
FLIGHT-SCHEDULE
FLIGHT#  AIRLINE       WEEKDAYS  PRICE
101      delta         mo,fr     156
545      american      mo,we,fr  110
912      scandinavian  fr        450

Flattened: one row per weekday, with AIRLINE and PRICE repeated redundantly
FLIGHT-SCHEDULE
FLIGHT#  AIRLINE       WEEKDAY  PRICE
101      delta         mo       156
101      delta         fr       156
545      american      mo       110
545      american      we       110
545      american      fr       110
912      scandinavian  fr       450

Normalized decomposition
FLIGHT-SCHEDULE
FLIGHT#  AIRLINE       PRICE
101      delta         156
545      american      110
912      scandinavian  450

FLIGHT-WEEKDAY
FLIGHT#  WEEKDAY
101      mo
545      mo
912      fr
101      fr
545      we
545      fr
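The decomposition above is lossless: joining the two normalized relations on FLIGHT# recovers every original fact. A minimal sketch in Python (table contents taken from the slides; the set comprehensions stand in for relational projection and natural join):

```python
# Unnormalized rows: WEEKDAYS holds a comma-separated list (non-atomic).
unnormalized = [
    (101, "delta", "mo,fr", 156),
    (545, "american", "mo,we,fr", 110),
    (912, "scandinavian", "fr", 450),
]

# Decompose into two normalized relations (projections).
flight_schedule = {(f, airline, price) for f, airline, days, price in unnormalized}
flight_weekday = {(f, d) for f, _, days, _ in unnormalized for d in days.split(",")}

# A natural join on FLIGHT# recovers the flattened one-row-per-weekday form.
joined = {(f, airline, d, price)
          for f, airline, price in flight_schedule
          for f2, d in flight_weekday if f == f2}
```

Each of the six (flight, weekday) facts appears exactly once in FLIGHT-WEEKDAY, while AIRLINE and PRICE are stored once per flight instead of once per weekday.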
ANSI/SPARC 3-Level DB Architecture - separating concerns
[Diagram: a database system = schema + data; the DDL defines the schema, the DML manipulates the data]

A database is divided into schema and data
 the schema describes the intension (types)
 the data describes the extension (instances)
Why? Effective! Efficient!
ANSI/SPARC 3-Level DB Architecture - separating concerns

[Diagram: the schema is refined step by step into external schemas, a conceptual schema, and an internal schema sitting above the data]
ANSI/SPARC 3-Level DB Architecture
[Diagram: external schema 1, 2, 3 on top; conceptual schema in the middle; internal schema above the database]

• external schema: use of data
• conceptual schema: meaning of data
• internal schema: storage of data
Conceptual Schema
Describes all conceptually relevant, general, time-invariant structural aspects of the universe of discourse
Excludes aspects of data representation, physical organization, and access

CUSTOMER
NAME  ADDR  SEX  AGE

An object-oriented conceptual schema would also describe all process aspects
External Schema
Describes parts of the information in the conceptual schema in a form convenient to a particular user group's view
Is derived from the conceptual schema

CUSTOMER
NAME  ADDR  SEX  AGE

MALE-TEEN-CUSTOMER
NAME  ADDR

MALE-TEEN-CUSTOMER(X, Y) = CUSTOMER(X, Y, S, A) WHERE SEX=M AND 12<A<20;
Internal Schema
Describes how the information described in the conceptual schema is physically represented, to provide the overall best performance

CUSTOMER
NAME  ADDR  SEX  AGE

[Storage example: a B+-tree on AGE and an index on NAME, each holding pointers (PTR) into the stored CUSTOMER records]
Indexing
Why bother?
 Disk access time: 0.01-0.03 sec
 Memory access time: 0.000001-0.000003 sec
 Databases are I/O bound
 Rate of improvement of (memory access time)/(disk access time) >> 1
 Things won't get better anytime soon!

Indexing helps reduce I/O!
Indexing (cont.)
Clustering vs. non-clustering
 A clustering index reorders the data blocks to match the index, so the row data is stored in index order
 With a non-clustering index the data stays in arbitrary order; only the logical ordering is specified by the index

Primary and secondary indices
 A primary index is defined on the ordering field of the file
 A secondary index is defined on fields that are neither ordering fields nor key fields

I/O cost for lookup (N records, n index entries):
 Heap: N/2
 Sorted file: log2(N)
 Single-level index: log2(n)+1
 Multi-level index / B+-tree: logfanout(n)+1
 Hashing: 2-3
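Plugging illustrative numbers into the cost formulas above shows why indexes matter (the record count and fanout are assumptions for the example, not from the slides):

```python
import math

N = 1_000_000   # records in the file (assumed)
n = 1_000_000   # index entries, one per record (assumed)
fanout = 100    # assumed B+-tree fanout

heap = N / 2                      # scan half the file on average: 500,000 I/Os
sorted_file = math.log2(N)        # binary search: ~20 I/Os
single_level = math.log2(n) + 1   # binary search on the index, then the data block
btree = math.log(n, fanout) + 1   # one I/O per tree level, then the data block: ~4
```

With a fanout of 100 the B+-tree is only 3 levels deep for a million entries, so a lookup costs about 4 I/Os versus 500,000 for a heap scan.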
Concurrency Control
reserv(flight#, date, customer#)
flight-inst(flight#, date, #avail-seats)

T1: read(flight-inst(flight#, date));
    seats := #avail-seats;
    if seats > 0 then {
      seats := seats - 1;
      write(reserv(flight#, date, customer1));
      write(flight-inst(flight#, date, seats)); }

T2: read(flight-inst(flight#, date));
    seats := #avail-seats;
    if seats > 0 then {
      seats := seats - 1;
      write(reserv(flight#, date, customer2));
      write(flight-inst(flight#, date, seats)); }

overbooking!
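The bad interleaving can be reproduced directly: both transactions read #avail-seats before either writes, so one decrement is lost (a sketch in Python; variable names follow the slide):

```python
# One seat left; T1 and T2 each try to reserve it.
avail_seats = 1
reservations = []

# Interleaving: both transactions READ before either WRITES.
t1_seats = avail_seats          # T1 reads 1
t2_seats = avail_seats          # T2 reads 1 (stale once T1 commits)

if t1_seats > 0:                # T1's check and write
    avail_seats = t1_seats - 1
    reservations.append("customer1")
if t2_seats > 0:                # T2's check passes on the stale read
    avail_seats = t2_seats - 1
    reservations.append("customer2")

# Two reservations were made for a single seat: overbooking.
```

A serializable schedule would force T2's read to happen after T1's write, so T2 would see zero seats and abort its reservation.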
ACID Transactions
An ACID transaction is a sequence of database operations that has the following properties:
 Atomicity
  Either all operations are carried out, or none is
  This property is the responsibility of the concurrency control and recovery sub-systems
 Consistency
  A transaction maps a correct database state to another correct state
  This requires that the transaction is correct, which is the responsibility of the application programmer
Concurrency Control (cont.)
 Isolation
  Although multiple transactions execute concurrently (i.e., interleaved, not parallel), they appear to execute sequentially
  This is the responsibility of the concurrency control sub-system
 Durability
  The effect of a completed transaction is permanent
  This is the responsibility of the recovery manager
Concurrency Control (cont.)
Serializability is a good definition of correctness
A variety of concurrency control protocols exist:
 Two-phase locking (2PL)
  deadlock and livelock are possible
  deadlock prevention: wait-die, wound-wait
  deadlock detection: roll back a transaction
 Optimistic protocols: proceed optimistically; back up and repair if needed
 Pessimistic protocols: do not proceed until knowing that no back-up is needed
Recovery
Storage types:
 Volatile: main memory
 Nonvolatile: disk
Errors:
 Logical error: transaction fails; e.g., bad input, overflow
 System error: transaction fails; e.g., deadlock
 System crash: power failure; main memory lost, disk survives
 Disk failure: head crash, sabotage, fire; disk lost
What to do?
Recovery (cont.)
 Deferred update (NO-UNDO/REDO):
  don't change the database until ready to commit
  write ahead to a log on disk, then change the database
 Immediate update (UNDO/NO-REDO):
  write ahead to a log on disk; update the database anytime
  commit not allowed until the database is completely updated
 Immediate update (UNDO/REDO):
  write ahead to a log on disk; update the database anytime
  commit allowed before the database is completely updated
 Shadow paging (NO-UNDO/NO-REDO):
  write ahead to a log on disk
  keep a shadow page; update a copy only; swap at commit
Parallel Databases
A database in which a single query may be executed by multiple processors working together in parallel
There are three types of systems:
 Shared memory
 Shared disk
 Shared nothing
Parallel Databases - Shared Memory
 processors share memory via a bus
 extremely efficient processor communication via memory writes
 the bus becomes the bottleneck: not scalable beyond 32 or 64 processors

[Diagram: processors (P) connected by a bus to shared memory (M) and disk]
Parallel Databases - Shared Disk
 processors share disk via an interconnection network
 memory bus is not a bottleneck
 fault tolerance wrt. processor or memory failure
 scales better than shared memory
 the interconnection network to the disk subsystem is a bottleneck

[Diagram: each processor (P) with private memory (M), all connected through a network to shared disks]
Parallel Databases - Shared Nothing
 scales better than shared memory and shared disk
 main drawbacks: higher processor communication cost, higher cost of non-local disk access
 used in the Teradata database machine

[Diagram: processor-plus-memory (PM) nodes, each with its own disk, connected by a network]
OUTLINE
NoSQL Definition
Motivation
Data Store Introduction
-- Key-value Stores
-- Document Stores
-- Extensible Record Stores
-- New Relational Databases
Conclusion
NoSQL: The Name
A changing environment:
 Internet-scale network latency grows
 Time-outs
 Partial failures
 Network partitions
RDBMS
Web apps can (usually) do without:
-- Transactions / strong consistency / integrity
-- Complex queries
Web apps have different needs (than the apps that RDBMS were designed for):
-- Scalability & elasticity (at low cost)
-- High availability
-- Flexible schemas / semi-structured data
-- Geographic distribution (multiple datacenters)
NoSQL Systems
No declarative query language: more programming
Relaxed consistency: fewer guarantees
NoSQL Systems
The idea behind NoSQL: by giving up ACID constraints, one can achieve much higher performance and scalability.
ACID = Atomicity, Consistency, Isolation, and Durability
BASE = Basically Available, Soft state, Eventually consistent
CAP Theorem
A distributed system can have only two out of three of the following properties: consistency, availability, and partition tolerance.
CAP details
Consistency (all nodes see the same data at the same time)
Availability (a guarantee that every request receives a response about whether it was successful or failed)
Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)
The easiest way to understand CAP
Consider two nodes on opposite sides of a network partition:
 Allowing at least one node to update state will cause the nodes to become inconsistent: forfeiting C.
 Likewise, if the choice is to preserve consistency, one side of the partition must act as if it is unavailable: forfeiting A.
 Only when the nodes can communicate is it possible to preserve both consistency and availability: forfeiting P.
Classification of NoSQL systems and tradeoffs (1).
Read performance versus write performance
 HBase is optimized for write performance: records on disk are never overwritten; instead, updates are written to a buffer in memory, and the entire buffer is written sequentially to disk.
Latency versus durability
 Durable: writes are synced to disk before the system returns success to the user.
 Low latency: writes are stored in memory at write time and synced to disk later.
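The write-optimized path described above (buffer updates in memory, flush the whole buffer sequentially, never overwrite in place) can be sketched as a toy memtable. This is a minimal illustration of the log-structured idea, not HBase's actual classes or API; all names are invented:

```python
class WriteBuffer:
    """Toy log-structured store: in-memory buffer + immutable flushed segments."""

    def __init__(self, flush_threshold=3):
        self.memtable = {}        # recent writes, held in memory
        self.segments = []        # each flush becomes an immutable sorted segment
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # The entire buffer is written out sequentially; old data is never
        # overwritten, a newer segment simply shadows it.
        self.segments.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:          # freshest data first
            return self.memtable[key]
        for segment in reversed(self.segments):   # then newest segment wins
            for k, v in segment:
                if k == key:
                    return v
        return None
```

Real systems add a write-ahead log for durability and background compaction to merge segments; both are omitted here.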
Classification of NoSQL systems and tradeoffs (2).
Synchronous versus asynchronous replication
 Replication improves system availability, avoids data loss, and can improve performance.
Data partitioning
 Row-based storage: efficient access to an entire record.
 Column storage: efficient for accessing a subset of the columns.
Design decisions of various systems.
Types of NoSQL Databases
Key-value stores
Document stores
Extensible record stores
NoSQL systems differ mainly in their data models
Specific implementations differ in the persistence mechanism and additional functionality:
 Replication
 Versioning
 Locking
 Transactions
 etc.
Key-Value Stores
• Global collection of key/value pairs
• Inspired by Amazon's Dynamo and distributed hash tables
• Operations:
 • void Put(string key, byte[] data);
 • byte[] Get(string key);
 • void Remove(string key);
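The three-operation interface above is the entire data model. A minimal in-memory stand-in makes that concrete (illustrative Python only; a real store distributes keys across nodes and replicates them):

```python
class KeyValueStore:
    """Toy in-memory key-value store exposing Put/Get/Remove."""

    def __init__(self):
        self._data = {}

    def put(self, key, data):
        # Opaque bytes in, opaque bytes out: the store never interprets values.
        self._data[key] = data

    def get(self, key):
        return self._data.get(key)      # None if the key is absent

    def remove(self, key):
        self._data.pop(key, None)       # removing a missing key is a no-op
```

Because the store cannot look inside a value, there are no secondary indexes or queries over value contents, only lookups by key.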
Key-Value Stores: Examples
Project Voldemort
 Advanced key-value store
 Created by LinkedIn, now open source
 Written in Java
 Provides MVCC (multiversion concurrency control)
 Asynchronous replication
 Sharding + consistent hashing
 Automatic failure detection and recovery
MVCC
Snapshot one:
 Time  Object 1  Object 2
 t1    "Hello"   "Bar"
 t0    "Foo"     "Bar"

Snapshot two:
 Time  Object 1  Object 2   Object 3
 t2    "Hello"   (deleted)  "Foo-Bar"
 t1    "Hello"   "Bar"
 t0    "Foo"     "Bar"
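The snapshots above fall out of keeping every version of an object tagged with its write time: a read at time t sees the newest version no later than t. A toy multiversion store (illustrative Python; deletion handling and garbage collection of old versions are omitted):

```python
class MVCCStore:
    """Toy MVCC: writes append versions; reads see a consistent snapshot."""

    def __init__(self):
        self.versions = {}            # key -> list of (timestamp, value)

    def write(self, key, value, ts):
        # Never overwrite: append a new version instead.
        self.versions.setdefault(key, []).append((ts, value))

    def read(self, key, ts):
        # Snapshot read: newest version with timestamp <= ts.
        visible = [(t, v) for t, v in self.versions.get(key, []) if t <= ts]
        return max(visible)[1] if visible else None
```

Using the slides' data: Object 1 is "Foo" in the t0 snapshot and "Hello" in the t1 snapshot, with neither reader blocking the writer.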
A Solution: Hashing
Example: y = ax + b (mod n)
Intuition: assigns items to "random" caches, so there are few items per cache
It is easy to compute which cache holds an item

[Diagram: items assigned to caches by the hash function; users use the hash to compute the cache for an item]
Adding Caches: why consistent hashing?
Suppose a new cache arrives. How do we work it into the hash function?
Natural change: y = ax + b (mod n+1)
Problem: this changes the bucket for every item
 every cache will be flushed
 servers get swamped with new requests
Goal: when we add a bucket, few items should move
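Consistent hashing achieves that goal by hashing both caches and items onto the same ring: an item belongs to the first cache clockwise from it, so adding a cache only claims the items between it and its predecessor. A minimal sketch (illustrative Python; real implementations add virtual nodes for better balance):

```python
import bisect
import hashlib

def _h(s):
    """Stable hash onto the ring (md5 chosen only for determinism)."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, caches):
        self.ring = sorted((_h(c), c) for c in caches)

    def cache_for(self, item):
        # First cache clockwise from the item's position (wrapping around).
        keys = [h for h, _ in self.ring]
        i = bisect.bisect(keys, _h(item)) % len(self.ring)
        return self.ring[i][1]

    def add(self, cache):
        bisect.insort(self.ring, (_h(cache), cache))
```

The key property: after add(), every item either keeps its old cache or moves to the new one; no item is shuffled between two old caches.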
Project Voldemort
Operations:
value = store.get(key)
store.put(key, value)
store.delete(key)
Pros? & Cons?
Document Stores: Document?
What is a document?
 Semi-structured data
 Encapsulates and encodes data (or information) in some standard format or encoding
 Encodings: XML, YAML, JSON, BSON
 Binary forms: PDF, Microsoft Office documents, etc.
Documents are like rows or records in relational databases, BUT
Row (schema):
 FirstName: "Bob", Address: "5 Oak St.", Hobby: "sailing"
Document (no schema):
 FirstName: "Jonathan", Address: "15 Wanamassa Point Road", Children: [{Name: "Michael", Age: 10}, {Name: "Jennifer", Age: 8}, {Name: "Samantha", Age: 5}, {Name: "Elena", Age: 2}]
Document Stores
Similar to key-value stores, but with a major difference: the value is a document
 Generally support secondary indexes
 Flexible schema: any number of fields can be added
 Multiple types of documents (objects) and nested documents or lists
 Documents stored in JSON or Binary JSON (BSON)
 No ACID properties
Document Stores: Examples
Terrastore
CouchDB
 Apache project since 2008
 Schema-free, document-oriented database
 Documents are stored in JSON format
 Supports secondary indexes
 B-tree storage engine
 MVCC model, no locking
 No joins, no PK/FK
 Incremental replication
CouchDB
REST API
Libraries for various languages (Java, C, PHP, etc.) convert native API calls into RESTful calls

CRUD    HTTP
Create  PUT /db/docid
Read    GET /db/docid
Update  POST /db/docid
Delete  DELETE /db/docid
CouchDB: Views
Views
 filter, sort, "join", aggregate, report
 Map/Reduce based
 K/V pairs from Map/Reduce are also stored in the B-tree engine
 built on demand
 can be materialized & incrementally updated
CouchDB: Local Consistency
• CouchDB uses Multi-Version Concurrency Control (MVCC)
CouchDB: “Global” Consistency
• Incremental Replication
Extensible record stores
Extensible record stores are also called column stores
Each key is associated with multiple attributes (i.e., columns)
Hybrid row/column stores
Inspired by Google's BigTable
Examples: HBase, Cassandra
Column: HBase
 Based on Google's BigTable
 Apache top-level project (TLP)
 Cloudera support (certification, EC2 AMIs, etc.)
 Layered over HDFS (Hadoop Distributed File System)
 Input/output for MapReduce jobs
 APIs: Thrift, REST
Thrift API
 Thrift is an interface definition language that is used to define and create services for numerous languages.
 It is used as a remote procedure call (RPC) framework and was developed at Facebook for "scalable cross-language services development".
 It combines a software stack with a code-generation engine to build services that work efficiently, to a varying degree, and seamlessly between different languages.
 Although developed at Facebook, it is now an open source project in the Apache Software Foundation.
 To put it simply, Apache Thrift is a binary communication protocol.
Thrift Architecture
REST architecture style (1)
Client-server: separation of concerns
Stateless:
 The client-server communication is further constrained by no client context being stored on the server between requests.
 Each request from any client contains all of the information necessary to service the request, and any session state is held in the client.
REST architecture style (2)
Cacheable:
 As on the World Wide Web, clients can cache responses.
 Responses must therefore, implicitly or explicitly, define themselves as cacheable or not, to prevent clients from reusing stale or inappropriate data in response to further requests.
REST architecture style (3)
Layered system:
 A client cannot ordinarily tell whether it is connected directly to the end server or to an intermediary along the way.
Code on demand (optional):
 Servers are able to temporarily extend or customize the functionality of a client by transferring executable code.
Uniform interface
Column: HBase
 Automatic partitioning
 Automatic re-balancing / re-partitioning
 Fault tolerant
 -- HDFS
 -- multiple replicas
 Highly distributed
Column: Cassandra
 Created at Facebook for Inbox search
 Facebook -> Google Code -> ASF
 Commercial support available from Riptano
 Features taken from both Dynamo and BigTable:
 -- Dynamo: consistent hashing, partitioning, replication
 -- BigTable: column families, MemTables, SSTables
Column: Cassandra
 Symmetric nodes
 -- no single point of failure
 -- linearly scalable
 -- ease of administration
 Flexible/automated provisioning
 Flexible replica placement
 High availability
 -- eventual consistency
 -- however, consistency is tunable
Column: Cassandra
 Partitioning
 -- Random
 ---- good distribution of data between nodes
 ---- range scans not possible
 -- Order-preserving
 ---- can lead to unbalanced nodes
 ---- range scans, natural order
 Extremely fast reads/writes (low latency)
 Thrift API
Column-oriented NoSQL

Name        Producer    Data Model                                                        Querying
BigTable    Google      Set of (key, value) couples                                       Selection (by combination of row, column, and time stamp ranges)
HBase       Apache      Groups of columns (a BigTable clone)                              JRuby IRB-based shell (similar to SQL)
Hypertable  Hypertable  Like BigTable                                                     HQL (Hypertext Query Language)
Cassandra   Apache      Columns, groups of columns corresponding to a key (supercolumns)  Simple selection on key, range queries, column or column ranges
PNUTS       Yahoo       (Hashed or ordered) tables, typed arrays, flexible schema         Selection and projection from a single table (retrieve an arbitrary single record by primary key, range queries, complex predicates, ordering, top-k)
Scalable Relational Systems
 Also called NewSQL
 SQL
 ACID
 Performance and scalability through modern, innovative software architecture
Scalable Relational Systems
RDBMS can provide scalability if applications:
 use small-scope operations
 use small-scope transactions
MySQL Cluster
 Shared-nothing clusters
 NDB storage engine (replaces InnoDB)
 Replication (2PC)
 Horizontal data partitioning

Two-phase commit protocol
The commit-request phase (or voting phase):
 a coordinator process attempts to prepare all the transaction's participating processes to take the necessary steps for either committing or aborting the transaction, and to vote either "Yes" (commit) or "No" (abort)
The commit phase:
 based on the voting of the cohorts, the coordinator decides whether to commit (only if all have voted "Yes") or abort the transaction (otherwise), and notifies the result to all the cohorts. The participating processes then follow with the needed actions.
Horizontal data partitioning
Data within NDB tables is automatically partitioned across all of the data nodes in the system.
This is done with a hashing algorithm based on the table's PRIMARY KEY, and is transparent to the end application.
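The placement rule just described amounts to a pure function of the primary key, which is why it can stay transparent to the application. A sketch (illustrative Python; md5 is chosen only for a deterministic example and is not NDB's actual hash):

```python
import hashlib

def node_for(primary_key, n_data_nodes):
    """Map a row to a data node purely from its primary key."""
    digest = hashlib.md5(str(primary_key).encode()).hexdigest()
    return int(digest, 16) % n_data_nodes

# Every client computes the same node for the same key, with no lookup table,
# and a good hash spreads rows roughly evenly across the nodes.
```

The trade-off, as with Cassandra's random partitioner earlier, is that hash placement gives even distribution but makes primary-key range scans hit every node.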
CONCLUSION: NoSQL pros/cons
Advantages:
 Massive scalability
 High availability
 Lower cost (than competitive solutions at that scale)
 (Usually) predictable elasticity
 Schema flexibility, sparse & semi-structured data
Disadvantages:
 Limited query capabilities (so far)
 Eventual consistency is not intuitive to program for, and makes client applications more complicated
 No standardization, so portability might be an issue
CONCLUSION
For now, NoSQL databases still lag far behind mature database technologies
NoSQL will not replace traditional relational DBMS
NoSQL is good for specialized applications involving large, unstructured, distributed data with high scaling requirements
A reading list
E. Brewer, "CAP Twelve Years Later: How the 'Rules' Have Changed", IEEE Computer, 2012
Thank you