Dropping ACID: Wrapping Your Mind Around NoSQL Databases

Kyle Banerjee, Digital Services Program Manager, Orbis Cascade Alliance

Why should anyone care?

Great for the Web

• No schema – easy to store data that are really awkward to work with in RDBMS

• Much easier horizontal scalability than RDBMS

• Works great with huge amounts of data

• High fault tolerance

• Integration of both RESTful and cloud computing technologies

Examples of sites using NoSQL

There is no magic

• Databases are fast because they physically structure data so it can be accessed efficiently

• NoSQL achieves performance through tradeoffs that make sense in a Web environment

• RDBMS can be used in high performance applications

• Compromises (e.g. denormalization, sharding) that kill the advantage of having an RDBMS are often necessary

• Technically more complex (i.e. expen$ive) to implement/maintain

What is a NoSQL database?

A nonrelational data store

–Document Store

–Wide Column Store

–Key Value Store

–Graph

–XML

NoSQL databases differ significantly in what they are good for

What’s best depends on your data

[Figure: data size plotted against data complexity, showing key/value stores, wide column stores, document databases, and graph databases]

Your priorities

• What types of queries do you need to support?

• How much data?

• Optimized for reads, writes, or updates?

• Versioning

• How separate is data from app? Will other applications need to access it in future?

And how you want to interact with it

• RESTful interface

• Query API

• Non-SQL query languages

• Via indexed values, keys, nodes

• File access

Key value stores

• Basically a hash

• Focus on scaling to huge amounts of data

• Examples: Amazon SimpleDB, Voldemort, Dynomite, BerkeleyDB, Riak
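As a rough illustration of "basically a hash" and of how such a store can spread data over several machines, the Python sketch below hashes each key to pick one of several in-memory dictionaries. The class `TinyKeyValueStore` and its methods are hypothetical; they are not the API of any product listed above, and real stores add replication and rebalancing on top of this idea.

```python
import hashlib


class TinyKeyValueStore:
    """Toy key-value store: each 'node' is just a dict (hypothetical design)."""

    def __init__(self, node_count=3):
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # A stable hash of the key decides which node owns it.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key, default=None):
        return self._node_for(key).get(key, default)

    def delete(self, key):
        self._node_for(key).pop(key, None)


store = TinyKeyValueStore()
# The value is an opaque blob: the store never looks inside it.
store.put("pet:charley", b'{"name": "Charley"}')
print(store.get("pet:charley"))
```

Because the store only ever looks up by key, adding capacity is mostly a matter of adding nodes, which is why this family focuses on scale rather than on query features.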

Wide column stores

• Somewhat like column oriented relational databases

• Rows in the same table don’t have to have the same columns

• Examples: Cassandra, HBase (built on Hadoop)
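To make the "different columns per row" point concrete, here is a minimal Python sketch that models a table as a mapping from row key to that row's own set of columns. It is only an in-memory stand-in; the names `table` and `columns_of` and the `leg_count` column are invented for illustration and do not reflect how any particular product stores data.

```python
from collections import defaultdict

# row key -> {column name -> value}; a hypothetical in-memory stand-in
table = defaultdict(dict)

table["charley"]["name"] = "Charley"
table["charley"]["animal_type"] = "dog"

table["spidey"]["name"] = "Spidey"
table["spidey"]["leg_count"] = 8  # a column that only this row has


def columns_of(row_key):
    """Return the column names present in one row, sorted for display."""
    return sorted(table.get(row_key, {}))


print(columns_of("charley"))
print(columns_of("spidey"))
```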

Document databases

• Like key-value stores, but values have meaning to database

• Examples: CouchDB, MongoDB
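The difference from a plain key-value store can be sketched in a few lines of Python: because the store understands the structure of each value, it can answer questions about fields inside the documents. The `documents` dictionary and the `find` helper are hypothetical and stand in for whatever query or view mechanism a real document database offers.

```python
# document id -> document; a hypothetical in-memory collection
documents = {
    "charley": {"name": "Charley", "animal_type": "dog"},
    "abby": {"name": "Abby", "animal_type": "cat"},
    "bo": {"name": "Bo", "animal_type": "dog"},
}


def find(field, value):
    """Return the ids of documents whose field equals value."""
    return sorted(
        doc_id for doc_id, doc in documents.items() if doc.get(field) == value
    )


print(find("animal_type", "dog"))
```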

Graph databases

• Use nodes, relationships between nodes, and key-value properties

• Recursive structures in relational DBs require expensive joins

• Examples: Neo4j, VertexDB, AllegroGraph
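A small Python sketch may help show why recursive questions suit this model: following relationships is a traversal rather than a chain of self-joins. The data structures and the `reachable` function are hypothetical, and the relationship from Bo to Abby is invented purely so that the traversal has a second hop to follow.

```python
from collections import deque

# node id -> key-value properties
nodes = {
    "charley": {"animal_type": "dog"},
    "powder": {"animal_type": "dog"},
    "bo": {"animal_type": "dog"},
    "abby": {"animal_type": "cat"},
}

# (start node, relationship type, end node)
relationships = [
    ("charley", "LIKES", "powder"),
    ("charley", "LIKES", "bo"),
    ("bo", "LIKES", "abby"),  # invented for illustration
]


def reachable(start, rel_type):
    """Breadth-first traversal that follows one relationship type."""
    seen = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for src, rel, dst in relationships:
            if src == current and rel == rel_type and dst != start and dst not in seen:
                seen.append(dst)
                queue.append(dst)
    return seen


# Friends, and friends of friends, to any depth.
print(reachable("charley", "LIKES"))
```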

Things that simplify life

• JSON

• RESTful interface or easy API

• Multiversion Concurrency Control (MVCC)

Traditional RDBMS

animal_type
  animal_id: integer
  description: varchar

pet
  pet_id: integer
  animal_id: integer
  name: varchar

likes
  pet_id: integer
  friend_id: integer

pet      animal_type  likes   animal_type
Charley  dog          Powder  dog
Charley  dog          Bo      dog

hates
  pet_id: integer
  animal_id: integer

pet      animal_type  hates   animal_type
Charley  dog          Abby    cat
Charley  dog          Spidey  tarantula
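To show what the relational layout costs at query time, the sketch below builds the `animal_type`, `pet`, and `likes` tables in an in-memory SQLite database using Python's standard `sqlite3` module and reproduces the "likes" listing. The numeric ids are invented, the `hates` table is left out for brevity, and the query is one plausible way to write it rather than the presenter's own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE animal_type (animal_id INTEGER PRIMARY KEY, description VARCHAR);
    CREATE TABLE pet (pet_id INTEGER PRIMARY KEY, animal_id INTEGER, name VARCHAR);
    CREATE TABLE likes (pet_id INTEGER, friend_id INTEGER);

    -- ids below are invented for illustration
    INSERT INTO animal_type VALUES (1, 'dog'), (2, 'cat'), (3, 'tarantula');
    INSERT INTO pet VALUES (1, 1, 'Charley'), (2, 1, 'Powder'), (3, 1, 'Bo');
    INSERT INTO likes VALUES (1, 2), (1, 3);
""")

# Four joins are needed just to print two human-readable rows.
rows = conn.execute("""
    SELECT p.name, pt.description, f.name, ft.description
    FROM likes AS l
    JOIN pet AS p ON p.pet_id = l.pet_id
    JOIN animal_type AS pt ON pt.animal_id = p.animal_id
    JOIN pet AS f ON f.pet_id = l.friend_id
    JOIN animal_type AS ft ON ft.animal_id = f.animal_id
    ORDER BY l.friend_id
""").fetchall()

for row in rows:
    print(row)
```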

JSON Example

{
  "name": "Charley",
  "animal_type": "dog",
  "likes": [
    {"name": "Powder", "animal_type": "dog"},
    {"name": "Bo", "animal_type": "dog"}
  ],
  "hates": [
    {"name": "Abby", "animal_type": "cat"},
    {"name": "Spidey", "animal_type": "tarantula"}
  ]
}
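For comparison with the relational version, here is a short Python sketch that reads the same record with the standard `json` module. Everything about Charley arrives in one document, so no joins are needed; the variable names are arbitrary.

```python
import json

text = """
{
  "name": "Charley",
  "animal_type": "dog",
  "likes": [
    {"name": "Powder", "animal_type": "dog"},
    {"name": "Bo", "animal_type": "dog"}
  ],
  "hates": [
    {"name": "Abby", "animal_type": "cat"},
    {"name": "Spidey", "animal_type": "tarantula"}
  ]
}
"""

pet = json.loads(text)

# Related data is nested inside the document itself.
liked_names = [friend["name"] for friend in pet["likes"]]
hated_types = [enemy["animal_type"] for enemy in pet["hates"]]

print(pet["name"], "likes", liked_names)
print(pet["name"], "hates animals of type", hated_types)
```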

Why JSON?

• Lightweight, interoperable and open

• Can be composed in any text editor

• Syntax is crazy easy

• With RESTful API, can be used with any software that supports HTTP (even the user’s browser can make direct DB calls)

• Allows you to send and receive data as it is used

How easy can REST be?

Create: HTTP PUT /db/docid

Read: HTTP GET /db/docid

Update: HTTP POST /db/docid

Delete: HTTP DELETE /db/docid
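The mapping above can be sketched with Python's standard `urllib.request` module. The server address `http://db.example.org`, the database name `pets`, and the helper `build_request` are all made up for illustration, and the requests are only constructed, never sent. The verb for each action follows the list above; individual products differ (CouchDB, for one, also uses PUT with a revision for updates), so check the documentation of the store in question.

```python
import json
from urllib.request import Request

BASE_URL = "http://db.example.org"  # hypothetical server

# Verb mapping exactly as listed above
VERBS = {"create": "PUT", "read": "GET", "update": "POST", "delete": "DELETE"}


def build_request(action, db, doc_id, document=None):
    """Build (but do not send) the HTTP request for one CRUD action."""
    body = None if document is None else json.dumps(document).encode("utf-8")
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return Request(
        f"{BASE_URL}/{db}/{doc_id}",
        data=body,
        headers=headers,
        method=VERBS[action],
    )


req = build_request("create", "pets", "charley",
                    {"name": "Charley", "animal_type": "dog"})
print(req.get_method(), req.full_url)
# Sending would be urllib.request.urlopen(req); omitted because the server is made up.
```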

MVCC in a nutshell

• Creates new version each time an update is made

• Timestamps used to prevent conflicts

• Reads are always possible
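The three points above can be sketched as a toy versioned store in Python. Note one deliberate substitution: this sketch uses a revision counter instead of timestamps to detect conflicting updates, which keeps the example deterministic. The class `VersionedStore`, the exception `StaleRevision`, and the `based_on` parameter are hypothetical names, not any database's API.

```python
class StaleRevision(Exception):
    """Raised when a write is based on a version that is no longer the latest."""


class VersionedStore:
    def __init__(self):
        self._versions = {}  # doc_id -> list of documents, oldest first

    def read(self, doc_id):
        """Return (revision number, document) for the latest version."""
        versions = self._versions[doc_id]
        return len(versions), versions[-1]

    def write(self, doc_id, document, based_on=0):
        """Append a new version; refuse if the writer saw an older revision."""
        versions = self._versions.get(doc_id, [])
        if based_on != len(versions):
            raise StaleRevision(
                f"latest is {len(versions)}, write was based on {based_on}"
            )
        # Old versions are never overwritten, so readers are never blocked.
        self._versions[doc_id] = versions + [dict(document)]
        return len(versions) + 1


mvcc = VersionedStore()
rev = mvcc.write("charley", {"name": "Charley"})
rev2 = mvcc.write("charley", {"name": "Charley", "animal_type": "dog"}, based_on=rev)

try:
    # A second writer still holding the first revision loses the race.
    mvcc.write("charley", {"name": "Chuck"}, based_on=rev)
    conflict = False
except StaleRevision:
    conflict = True

print(mvcc.read("charley"), conflict)
```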

Disadvantages of NoSQL

• Performance and scalability achieved at the expense of feature support

• No joins. Grouping and ordering become more problematic

• No SQL

• No transactions

• Eventual consistency rather than strict consistency

• Tools are often lacking
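One practical consequence of the missing joins, grouping, and ordering is that this work tends to move into application code. The Python sketch below does by hand what `GROUP BY` and `ORDER BY` would do in SQL; the `pets` list is sample data assembled from the earlier example and is assumed to have been fetched from the store already.

```python
from collections import Counter

# Documents as they might come back from the store (sample data)
pets = [
    {"name": "Charley", "animal_type": "dog"},
    {"name": "Powder", "animal_type": "dog"},
    {"name": "Abby", "animal_type": "cat"},
    {"name": "Spidey", "animal_type": "tarantula"},
]

# Equivalent of: SELECT animal_type, COUNT(*) ... GROUP BY animal_type
counts = Counter(pet["animal_type"] for pet in pets)

# Equivalent of: ORDER BY COUNT(*) DESC, animal_type
ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

for animal_type, count in ordered:
    print(animal_type, count)
```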

The bottom line

• In a library context, NoSQL is appropriate when flexible schema or fast displays that contain related data are needed

• Understand the problem at hand as well as the pros/cons of your options before deciding on a solution

• Don’t ditch your RDBMS

Questions?

Kyle Banerjee

Orbis Cascade Alliance
