
NoSQL Smackdown!


DESCRIPTION

A whirlwind tour of a few NoSQL solutions, learning the very different ways they represent data and seeing their unique strengths and weaknesses in various kinds of applications. Along the way, we'll learn why new technologies must be introduced to address today's scaling challenges, and what compromises we'll have to make if we want to abandon the databases of our youth.


Page 1: NoSQL Smackdown!

Tim Berglund

NoSQL SMACKDOWN

Page 2: NoSQL Smackdown!

@tlberglund

#nosql

Page 3: NoSQL Smackdown!

Voldemort


Page 4: NoSQL Smackdown!

via negativa

Page 5: NoSQL Smackdown!

SQL

Page 6: NoSQL Smackdown!

tuple (n.)

relation (n.) An unordered set of tuples of the same type.

Page 7: NoSQL Smackdown!

tuple (n.) A function that maps attributes to values.

Page 8: NoSQL Smackdown!

tuple (n.) (A bundle of key-value pairs, but don’t tell anyone!)

Page 9: NoSQL Smackdown!

id  username    pwd_hash  born_at   monkey
1   mluther     d8c82af9  Nov 1483  FALSE
2   aaugustine  329b8dae  Nov 354   FALSE
3   gnyssa      e50ec9e0  Jun 335   FALSE
4   bonzo       330e01f2  Apr 2007  TRUE
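The definitions of relation and tuple, together with the users table, can be sketched in Python. This is only an illustration: the helper names `make_tuple` and `select` and the choice of `frozenset` are assumptions of the sketch, not anything from the talk.

```python
# A tuple in the relational sense: a mapping from attribute names to values.
# A frozenset of (attribute, value) pairs keeps it hashable so that it can
# live inside a set; that representation is an assumption of this sketch.
def make_tuple(**attributes):
    return frozenset(attributes.items())

# A relation: an unordered set of tuples that all have the same attributes.
users = {
    make_tuple(id=1, username="mluther", pwd_hash="d8c82af9", born_at="Nov 1483", monkey=False),
    make_tuple(id=2, username="aaugustine", pwd_hash="329b8dae", born_at="Nov 354", monkey=False),
    make_tuple(id=3, username="gnyssa", pwd_hash="e50ec9e0", born_at="Jun 335", monkey=False),
    make_tuple(id=4, username="bonzo", pwd_hash="330e01f2", born_at="Apr 2007", monkey=True),
}

def select(relation, **criteria):
    """Return the tuples whose attributes match every given criterion."""
    wanted = set(criteria.items())
    return {t for t in relation if wanted <= t}

monkeys = select(users, monkey=True)
monkey_names = sorted(dict(t)["username"] for t in monkeys)
```

Because the relation is a set, there is no row order to rely on; any ordering has to be imposed by the query, as `sorted` does here.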

Page 10: NoSQL Smackdown!

Relations


Page 11: NoSQL Smackdown!

Comparing “NoSQL” to “relational” is a bit of a shell game.

Eben Hewitt, author of Cassandra: The Definitive Guide

Page 12: NoSQL Smackdown!

Transactions


Page 13: NoSQL Smackdown!

CAP Theorem

[Diagram: the three CAP properties, labeled C, P, and A]

Page 14: NoSQL Smackdown!

Tradeoff Between

Consistency

Availability

Partition Tolerance

Page 15: NoSQL Smackdown!

Between what?


Page 16: NoSQL Smackdown!

consistency (n.) All clients always have the same view of the data.

Page 17: NoSQL Smackdown!

availability (n.) All clients can always read or write within some maximum latency.

Page 18: NoSQL Smackdown!

partition tolerance (n.) No set of failures less than total network failure is allowed to cause the system to respond incorrectly.

Page 19: NoSQL Smackdown!

[Diagram: four cluster nodes]

Page 20: NoSQL Smackdown!

[Diagram: four cluster nodes and a switch]

Page 21: NoSQL Smackdown!

[Diagram: four cluster nodes and a switch]

Page 22: NoSQL Smackdown!

CAP Theorem

[Diagram: the three CAP properties, labeled C, P, and A]

Page 23: NoSQL Smackdown!

Strongly Consistent

[Diagram: C, P, and A]

MongoDB, Cassandra

Page 24: NoSQL Smackdown!

Always Available

[Diagram: C, P, and A]

CouchDB, Riak, Voldemort, Cassandra

Page 25: NoSQL Smackdown!

Partition Intolerant

[Diagram: C, P, and A]

MySQL, Oracle, SQL Server, Neo4J, Redis

Page 26: NoSQL Smackdown!

[Diagram: C, P, and A]

Page 27: NoSQL Smackdown!

via negativa

a way forward

Page 28: NoSQL Smackdown!

NoSQL is a set of different approaches to storing and retrieving data.

Page 29: NoSQL Smackdown!

What’s Different?

Data models

Querying

Approaches to scale


Page 30: NoSQL Smackdown!

Tradeoffs

Complex transactions vs. scalability

Consistency vs. availability (often)

Performance vs. durability

Horizontal vs. vertical scale

Cheap writes vs. cheap reads


Page 31: NoSQL Smackdown!

Origin

License

Implementation language

Data model

How does it scale?

API/Query language

Deployments

Support and community


Page 33: NoSQL Smackdown!

Voldemort


Page 34: NoSQL Smackdown!

Cassandra

Page 35: NoSQL Smackdown!

Origin

- Facebook Inbox search, back in 2007

License

- Apache License 2.0

Page 36: NoSQL Smackdown!

Implementation Language

- Java 6

Data Model

- It’s a BigTable-based “column store.”

Page 37: NoSQL Smackdown!

Column

[Diagram: a Column is made up of a Name, a Value, and a Timestamp]

Page 39: NoSQL Smackdown!

Row

[Diagram: a Row is a Key followed by several Columns]

Page 40: NoSQL Smackdown!

Column Family

[Diagram: a Column Family holds rows, each a Key with its own set of Columns; the three example rows have three, two, and one Column]

Page 41: NoSQL Smackdown!

“Contacts” Column Family

050fe74e2  full_name  email  mobile
bbf77f01d  full_name  email
8b20d8f6   full_name

Page 42: NoSQL Smackdown!

SuperColumn

[Diagram: a SuperColumn has a Name and a set of Columns, each stored under a key]

Page 43: NoSQL Smackdown!

Contact Info SuperColumn

4145bfaf15f10c2e6033f8b9c3143297a36f5fe3

key          value             timestamp
full_name    Tim Berglund      20101011T120502Z
email        [email protected]
mobile       [redacted]        19940217T145637Z
postal_code  80123             20101011T120452Z

Page 44: NoSQL Smackdown!

SuperColumn Family

[Diagram: a SuperColumn Family holds rows, each a Key with one or more SuperColumns]

Page 45: NoSQL Smackdown!

Keyspace

[Diagram: a Keyspace contains Column Families and SuperColumn Families]

Page 46: NoSQL Smackdown!

A what?


Page 47: NoSQL Smackdown!

Nested Hash Table

Cluster.Keyspace.ColumnFamily[key] = <row>

Cluster.Keyspace.ColumnFamily[key1][key2] = <column>

...SuperColumnFamily[key] = <map of rows>

...SuperColumnFamily[key1][key2] = <row>

...SuperColumnFamily[key1][key2][key3] = <column>
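The nested-hash view can be tried out with plain Python dictionaries. This is a rough in-memory sketch, not a Cassandra client: the row key `row-1`, the supercolumn names `home` and `work`, and the sample values are made up for illustration, and the cluster and keyspace levels are left out.

```python
from collections import defaultdict

def column(name, value, timestamp):
    """A column is a name, a value, and a timestamp."""
    return {"name": name, "value": value, "timestamp": timestamp}

# Column family: row key -> column name -> column.
contacts = defaultdict(dict)
contacts["row-1"]["full_name"] = column("full_name", "Example Person", "20101011T120502Z")
contacts["row-1"]["postal_code"] = column("postal_code", "80123", "20101011T120452Z")

# SuperColumn family: row key -> supercolumn name -> column name -> column.
contact_info = defaultdict(lambda: defaultdict(dict))
contact_info["row-1"]["home"]["postal_code"] = column("postal_code", "80123", "20101011T120452Z")
contact_info["row-1"]["work"]["postal_code"] = column("postal_code", "80202", "20101011T120452Z")

row = contacts["row-1"]                                  # ColumnFamily[key] = <row>
one_column = contacts["row-1"]["postal_code"]            # ColumnFamily[key1][key2] = <column>
rows = contact_info["row-1"]                             # SuperColumnFamily[key] = <map of rows>
home = contact_info["row-1"]["home"]                     # SuperColumnFamily[key1][key2] = <row>
home_zip = contact_info["row-1"]["home"]["postal_code"]  # ...[key1][key2][key3] = <column>
```

Each extra pair of brackets descends one level, which is all the "column family" vocabulary amounts to in this picture.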

Page 48: NoSQL Smackdown!

Scalability

- Rock star! (see Amazon Dynamo)

[Diagram: ring of tokens 0000, 2000, 4000, 6000, 8000, A000, C000, E000]

Page 49: NoSQL Smackdown!

[Diagram: ring of tokens 0000, 2000, 4000, 6000, 8000, A000, C000, E000]

Page 50: NoSQL Smackdown!

Scalability

- Consistent hashing

- No distinguished nodes

- Add and remove nodes on a live cluster
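The idea behind consistent hashing can be shown in a few lines of Python. This is a minimal sketch under stated assumptions, not Cassandra's partitioner: the `HashRing` class, the 16-bit token space (chosen to echo the 0000 to E000 ring), the MD5-based `key_position`, and the node names are all hypothetical. Every node is treated the same, and adding a node moves only the keys that fall into the new node's range.

```python
import bisect
import hashlib

def key_position(key):
    """Map a key to a position on a 16-bit ring (an assumption of this sketch)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % 0x10000

class HashRing:
    def __init__(self):
        self._tokens = []   # sorted token positions
        self._owners = {}   # token -> node name

    def add_node(self, node, token):
        if token in self._owners:
            raise ValueError("token already taken")
        bisect.insort(self._tokens, token)
        self._owners[token] = node

    def remove_node(self, token):
        self._tokens.remove(token)
        del self._owners[token]

    def node_for(self, key):
        # The owner is the first token at or after the key's position,
        # wrapping around to the start of the ring.
        index = bisect.bisect_left(self._tokens, key_position(key))
        return self._owners[self._tokens[index % len(self._tokens)]]

ring = HashRing()
for i, token in enumerate((0x0000, 0x4000, 0x8000, 0xC000)):
    ring.add_node("node-%d" % i, token)

keys = ["user:%d" % n for n in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-new", 0x2000)   # add a node to the live ring
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]

ring.remove_node(0x2000)            # and take it away again
restored = {k: ring.node_for(k) for k in keys}
```

Every key in `moved` now belongs to `node-new` and previously belonged to the one node whose range was split; the other nodes are untouched.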

Page 51: NoSQL Smackdown!

API

- Thrift RPC

- Easy to fetch columns by key

- Hadoop integration

- Native clients

Page 52: NoSQL Smackdown!

Deployments

Page 54: NoSQL Smackdown!

Voldemort


Page 55: NoSQL Smackdown!

MongoDB

Page 56: NoSQL Smackdown!

Origin

- Founders of DoubleClick were totally going to take over the Cloud

License

- Database: GNU Affero GPL 3.0

- Drivers: Apache License 2.0

Page 57: NoSQL Smackdown!

Implementation Language

- C++

Data Model

- JSON document database (this is so simple!)

Page 58: NoSQL Smackdown!

{
  "_id" : ObjectId("4cbd00455280f73d395922a4"),
  "contact" : {
    "tags" : ["man", "", "", ""],
    "firstName" : "Myron",
    "lastName" : "Dalton",
    "address1" : "4322 Maple Street",
    "city" : "Santa Ana",
    "state" : "CA",
    "postalCode" : "92705",
    "email" : "[email protected]"
  },
  "occupation" : "Long haul truck driver"
}

Page 59: NoSQL Smackdown!

Does it scale?

Well...it shards!


Page 60: NoSQL Smackdown!

API

- Native JavaScript console

- Binary drivers

- Ad-hoc query language (but it’s NOT SQL, okay?)

Page 61: NoSQL Smackdown!

db.address.find().limit(5)

db.contact.find({ "lastName": "Berglund" })

db.address.find({ "stateProvince": "CO" }).sort({ "city": 1 })

db.address.find({ "contact.city": "Chicago" })

db.address.remove({ _id: ObjectId("4cbcfd7df72291161b1d1bf2") })
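What a query-by-example filter with a dotted path does can be approximated in a few lines of Python. This is a simplified sketch and not how MongoDB is implemented: it handles equality only (no operators, no array matching), and `get_path`, `matches`, `find`, and the second sample document are hypothetical.

```python
def get_path(document, dotted_path):
    """Follow a dotted path such as "contact.city" through nested dicts."""
    current = document
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current

def matches(document, query):
    """True when every path in the query equals the value in the document."""
    return all(get_path(document, path) == expected
               for path, expected in query.items())

def find(collection, query=None, limit=None):
    """Return matching documents, optionally capped at `limit` results."""
    results = [doc for doc in collection if matches(doc, query or {})]
    return results if limit is None else results[:limit]

addresses = [
    {"contact": {"firstName": "Myron", "lastName": "Dalton",
                 "city": "Santa Ana", "state": "CA"},
     "occupation": "Long haul truck driver"},
    # Hypothetical second document, added so the filter has something to reject.
    {"contact": {"firstName": "Example", "lastName": "Person",
                 "city": "Chicago", "state": "IL"},
     "occupation": "Placeholder"},
]

in_chicago = find(addresses, {"contact.city": "Chicago"})
first_only = find(addresses, limit=1)
```

The query is itself a document, which is why no separate query grammar is needed for simple lookups.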

Page 62: NoSQL Smackdown!

API

- Can write MapReduce jobs in JavaScript

- Morphia for Java

- Mongoose for node.js

Page 63: NoSQL Smackdown!

Deployments

Page 65: NoSQL Smackdown!

Concerns

- Write durability?

- Sharding performance

- But everyone still wants to date her

Journaling coming in 1.8!

Page 66: NoSQL Smackdown!

Voldemort


Page 67: NoSQL Smackdown!

Neo4J

Page 68: NoSQL Smackdown!

Origin

- Neo Technology in 2003

- Malmö and San Francisco

Page 69: NoSQL Smackdown!

License

- GPL3, full-featured

- Commercial

  - $49/mo antiviral

  - $499/mo advanced

  - $1,999/mo enterprise

Page 70: NoSQL Smackdown!

Maturity

- Production since 2003

- 1.0 in Feb 2010

Implementation Language

- Java 6

- Easily embeddable!

Page 71: NoSQL Smackdown!

Data Metaphor

- Graph

- Nodes, relationships


Page 72: NoSQL Smackdown!

[Diagram: graph with nodes Tim, Matthew, Brian, and Hollywood Types, joined by relationships labeled "Knows", "Writes with", "Works with", "Speaks with", and "Engages in disputation with"]

All nodes and relationships have arbitrary properties
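A property graph of this kind is easy to mock up in Python. This is a toy in-memory sketch, not the Neo4j API: the `Graph` class, the node ids, and every property are hypothetical, and which node is connected to which is an assumption of the example.

```python
class Graph:
    """Nodes and relationships, each carrying an arbitrary property map."""

    def __init__(self):
        self.nodes = {}          # node id -> properties
        self.relationships = []  # (start id, type, end id, properties)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def relate(self, start, rel_type, end, **properties):
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both ends of a relationship must exist")
        self.relationships.append((start, rel_type, end, properties))

    def neighbors(self, start, rel_type):
        """Follow outgoing relationships of one type from a node."""
        return [end for s, t, end, _ in self.relationships
                if s == start and t == rel_type]

graph = Graph()
graph.add_node("tim", name="Tim", occupation="Developer")
graph.add_node("matthew", name="Matthew")
graph.add_node("brian", name="Brian")

# Relationships are first-class: they have a type and their own properties.
graph.relate("tim", "KNOWS", "matthew", since=2005)
graph.relate("tim", "KNOWS", "brian")
graph.relate("matthew", "WORKS_WITH", "brian", project="example")

known_by_tim = graph.neighbors("tim", "KNOWS")
coworkers_of_friends = [other
                        for friend in known_by_tim
                        for other in graph.neighbors(friend, "WORKS_WITH")]
```

A traversal such as `coworkers_of_friends` follows relationships hop by hop instead of joining tables.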

Page 73: NoSQL Smackdown!

Query Model

- REST/JSON

- Java traversal API

- JTA/JTS XA

- Bindings in Clojure, Ruby, Python, PHP, Scala, Grails

Page 74: NoSQL Smackdown!

Scale Idiom

- Traditionally focused on single-node performance

- Recent HA support

- Master/slave

- ZooKeeper master election

- Writeable slaves

Page 75: NoSQL Smackdown!

Support

- Neo Technology

Deployments

- Box.net

- ThoughtWorks

Page 76: NoSQL Smackdown!

Voldemort


Page 77: NoSQL Smackdown!

Riak

Page 78: NoSQL Smackdown!

Origin

- Internal datastore for Basho’s Salesforce.com apps

(Hey, it seemed like a good idea at the time!)

Page 79: NoSQL Smackdown!

License

- Apache License 2.0 for OSS version

- Closed-source “Enterprise DS” version

Page 80: NoSQL Smackdown!

Implementation Language

- Erlang, C, SpiderMonkey JavaScript VM

Data Model

- Key/value store, but with buckets!

Page 81: NoSQL Smackdown!

[Diagram: a Key paired with a Value]

That’s it.

Page 82: NoSQL Smackdown!

[Diagram: Bucket A and Bucket B, each holding four Key/Value pairs]

Page 83: NoSQL Smackdown!

Bucket A

name        Tim
occupation  Developer
birthday    061972
city        Littleton

Bucket B

name        Aurelius
occupation  Bishop
birthday    110354
city        Hippo
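A key/value store with buckets amounts to two levels of lookup, which a short Python sketch can mimic. The `BucketStore` class and its method names are hypothetical and are not the Riak client API; the data is the Bucket A and Bucket B example.

```python
class BucketStore:
    """Toy key/value store in which every key lives inside a named bucket."""

    def __init__(self):
        self._buckets = {}   # bucket name -> {key: value}

    def put(self, bucket, key, value):
        self._buckets.setdefault(bucket, {})[key] = value

    def get(self, bucket, key):
        return self._buckets.get(bucket, {}).get(key)

    def delete(self, bucket, key):
        self._buckets.get(bucket, {}).pop(key, None)

    def keys(self, bucket):
        return sorted(self._buckets.get(bucket, {}))

store = BucketStore()
for key, value in [("name", "Tim"), ("occupation", "Developer"),
                   ("birthday", "061972"), ("city", "Littleton")]:
    store.put("A", key, value)
for key, value in [("name", "Aurelius"), ("occupation", "Bishop"),
                   ("birthday", "110354"), ("city", "Hippo")]:
    store.put("B", key, value)

# The same key can exist in both buckets without colliding.
name_in_a = store.get("A", "name")
name_in_b = store.get("B", "name")
```

The bucket acts as a namespace, so `name` in Bucket A and `name` in Bucket B are unrelated entries.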

Page 84: NoSQL Smackdown!

Does it scale?

- Like a boss!

- No distinguished node

- Tunable consistency, replication

- Add nodes without taking the cluster down

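In Dynamo-style stores, tunable consistency is usually expressed with three numbers: N replicas per key, R replicas that must answer a read, and W replicas that must acknowledge a write. A read is guaranteed to overlap the latest acknowledged write when R + W > N. The function below is a small sketch of that rule; its name and the example settings are assumptions, not Riak configuration.

```python
def read_overlaps_latest_write(n, r, w):
    """True when every read quorum shares at least one replica with every write quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n

# Three replicas with majority reads and writes: quorums always overlap.
quorum = read_overlaps_latest_write(n=3, r=2, w=2)

# Three replicas with single-replica reads and writes: faster, but a read
# may hit a replica that has not seen the latest write yet.
fast_and_loose = read_overlaps_latest_write(n=3, r=1, w=1)
```

Lowering R or W buys latency and availability at the cost of possibly stale reads, which is the consistency versus availability tradeoff in miniature.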

Page 85: NoSQL Smackdown!

API

- HTTP interface (slow, but featureful)

- Protocol Buffers (a performance beast)

Page 86: NoSQL Smackdown!

API

- Key CRUD

- MapReduce in JavaScript

- Graph traversals translate to MapReduce

Page 87: NoSQL Smackdown!

Deployments

Page 89: NoSQL Smackdown!

Voldemort


Page 90: NoSQL Smackdown!

Redis

Page 91: NoSQL Smackdown!

Origin

- Salvatore Sanfilippo wrote it for his analytics site, lloogg.com

License

- Open Source-Brand open source

Page 92: NoSQL Smackdown!

Implementation Language

- ANSI C, baby

- Wants a POSIX OS

- 340kB download!


Page 93: NoSQL Smackdown!

Data Model

- Key/value store++

- Strings

- Hashes, Sets

- Lists

- Sorted Sets
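The value types in this list have close analogues among Python's built-in containers, with the sorted set being the one that needs a little code. What follows is an analogy rather than a Redis client: the `SortedSet` class, its method names, and the sample data are hypothetical.

```python
# Rough Python stand-ins for the value types listed above.
string_value = "NoSQL Smackdown!"
hash_value = {"field_one": "a", "field_two": "b"}
set_value = {"red", "green", "blue"}
list_value = ["first", "second", "third"]

class SortedSet:
    """Unique members, each with a score, always read back in score order."""

    def __init__(self):
        self._scores = {}   # member -> score

    def add(self, member, score):
        # Adding an existing member just updates its score.
        self._scores[member] = score

    def members(self):
        # Order by score, breaking ties by member name.
        return sorted(self._scores, key=lambda m: (self._scores[m], m))

    def range_by_score(self, low, high):
        return [m for m in self.members() if low <= self._scores[m] <= high]

leaderboard = SortedSet()
leaderboard.add("carol", 30)
leaderboard.add("alice", 10)
leaderboard.add("bob", 20)
leaderboard.add("alice", 25)   # same member, new score

ranking = leaderboard.members()
mid_scores = leaderboard.range_by_score(20, 25)
```

The point of the "++" is that the value behind a key is a real data structure that can be manipulated in place, not an opaque blob.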

Page 94: NoSQL Smackdown!

Does it scale?

- Vertically, sure

- Plus it’s really fast

- Master/slave options

- Technically a CA system


Page 95: NoSQL Smackdown!

API

- Binary socket interface

- Commands look like assembly language

- Drivers for 22+ languages

Page 100: NoSQL Smackdown!

Deployments

- craigslist

Page 101: NoSQL Smackdown!

Community/Support

- Officially sponsored by VMware


Page 102: NoSQL Smackdown!

Voldemort


Page 103: NoSQL Smackdown!

Do you need this?

Maybe.

Page 106: NoSQL Smackdown!

Further Reading

- Brewer’s Conjecture: http://www.podc.org/podc2000/

- Proof of Brewer’s Conjecture (the “CAP Theorem”): http://bit.ly/cap-theorem-proof

- Amazon Dynamo: http://bit.ly/amazon-dynamo and http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html

- Google BigTable: http://bit.ly/big-table

- The CAP Theorem Explained: http://www.julianbrowne.com/article/viewer/brewers-cap-theorem

- Visualizing NoSQL Databases on the CAP Venn Diagram: http://blog.nahurst.com/visual-guide-to-nosql-systems

- Redis: http://redis.io/

- Cassandra: http://cassandra.apache.org

- MongoDB: http://mongodb.org

Page 107: NoSQL Smackdown!

Further Reading

- CouchDB: http://couchdb.apache.org

- Riak: http://basho.com

- Voldemort: http://project-voldemort.com

- Neo4J: http://neo4j.org

- Pretty Much Everything About NoSQL: http://nosql.mypopescu.com

Page 108: NoSQL Smackdown!

Photo Credits

- Wrestlers: http://www.flickr.com/photos/stigster/4573851095

- Desert Road: http://www.flickr.com/photos/kenlund/2439199670

- Kindergarten Graduation: http://www.flickr.com/photos/moyermk/3102262394

- Clipboard: http://www.flickr.com/photos/wheatfields/264890076

- Winning Wrestler: http://www.flickr.com/photos/jrandallc/2259174414