NoSQL

Tuesday, March 22, 2011



DESCRIPTION

An alternative NoSQL talk: less theory, less detail on the architectural principles, and more code samples.


Page 1: NoSQL

NoSQL


Page 2: NoSQL

The Software Crisis

Writing correct, understandable, and verifiable computer programs is difficult.

Edsger Dijkstra


Page 3: NoSQL

“as long as there were no machines, programming was no problem at all; when we had a few weak computers, programming became a mild problem, and now we have gigantic computers, programming has become an equally gigantic problem.”

The Software Crisis


Page 4: NoSQL

IMS: The Hierarchical Database (1966)

Vern Watts

Page 5: NoSQL

“A Relational Model of Data for Large Shared Data Banks” (1970)

Ted Codd

Page 6: NoSQL

“In striving to make every user happy, a technology can actually leave the majority unhappy.”

“Every good idea is generalized to its level of inapplicability.” (Peter Principle)

Jim Gray


Page 7: NoSQL


Page 8: NoSQL

“NoSQL” Reintroduced (2009)

Eric Evans


Page 9: NoSQL

Total Cost of Ownership

• The price of a license

• The price of support

• The price of hardware

Oracle: roughly 47k per CPU?

Software updates / support: roughly 10k?


Page 10: NoSQL

Internet Scale

• Massive data collections

• Huge number of requests

• Coming from geographic areas across the globe

• 24/7


Page 11: NoSQL

Availability


Page 12: NoSQL

Data Models


Page 13: NoSQL

Data Models


Page 14: NoSQL

Column Oriented

[Diagram: a row key followed by a set of named columns]

• Column Family ≈ Table

• Empty cells are cheap (sparse table)

• Can grow “indefinitely”

• Schemaless

• No secondary indexes
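To make the bullets above concrete, here is a minimal in-memory sketch of one column family, assuming a row is simply a sorted map from column name to value. The class `ColumnFamily` and its methods are hypothetical illustrations, not the API of BigTable or any other store.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory model of a single column family:
// row key -> (column name -> value), with each row's columns sorted by name.
class ColumnFamily {
    private final Map<String, SortedMap<String, String>> rows = new HashMap<>();

    // Schemaless: any row may receive any column name at any time.
    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    Optional<String> get(String rowKey, String column) {
        SortedMap<String, String> row = rows.get(rowKey);
        return row == null ? Optional.empty() : Optional.ofNullable(row.get(column));
    }

    // Sparse: only written cells are stored, so an "empty" cell costs nothing.
    int storedCells(String rowKey) {
        SortedMap<String, String> row = rows.get(rowKey);
        return row == null ? 0 : row.size();
    }

    // No secondary indexes: finding rows by a column value means scanning every row.
    List<String> rowsWhere(String column, String value) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, SortedMap<String, String>> e : rows.entrySet()) {
            if (value.equals(e.getValue().get(column))) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

Two rows of the same family can end up with completely different column sets, which is exactly what "schemaless" and "sparse" mean here.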


Page 15: NoSQL

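```java
import com.google.appengine.api.datastore.*;

// Google App Engine datastore (built on BigTable): fetch one record by key
DatastoreService service = ...;                     // obtained elsewhere (elided on the slide)
Key key = KeyFactory.createKey(family, recordId);   // family = entity kind, recordId = row id
Entity entity = service.get(key);                   // throws EntityNotFoundException if absent
entity.getProperty("firstname");
entity.getProperty("surname");
```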

BigTable


Page 16: NoSQL

Column Oriented + Super Columns

[Diagram: a row key maps to Super Columns, each of which groups its own set of named columns]
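Super columns add one more level of nesting to the column-family model. A minimal sketch, assuming nested sorted maps (row key, then super column, then column); the class `SuperColumnFamily` is hypothetical and not the API of Cassandra or any other store.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical model of a super column family:
// row key -> super column name -> column name -> value.
// A super column is itself a sorted map of named columns.
class SuperColumnFamily {
    private final SortedMap<String, SortedMap<String, SortedMap<String, String>>> rows = new TreeMap<>();

    void put(String rowKey, String superColumn, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(superColumn, k -> new TreeMap<>())
            .put(column, value);
    }

    // Returns a copy of all columns grouped under one super column (empty if none).
    SortedMap<String, String> getSuperColumn(String rowKey, String superColumn) {
        SortedMap<String, SortedMap<String, String>> row = rows.get(rowKey);
        if (row == null || !row.containsKey(superColumn)) {
            return new TreeMap<>();
        }
        return new TreeMap<>(row.get(superColumn));
    }
}
```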


Page 17: NoSQL

Key Value Store

• Schemaless

• Versioning

[Diagram: a key mapped to an opaque binary value, e.g. 10110110]
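One possible reading of the "Versioning" bullet is a per-key version number that lets a client reject writes based on a stale read. The sketch below assumes exactly that; `VersionedStore`, `put` and `putIfVersion` are hypothetical names, and the plain counter is a simplification of the richer schemes (such as vector clocks) that some key-value stores use.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key-value store with per-key versions. Values are opaque byte
// arrays, so the store imposes no schema on them.
class VersionedStore {
    static final class Versioned {
        final byte[] value;
        final long version;
        Versioned(byte[] value, long version) { this.value = value; this.version = version; }
    }

    private final Map<String, Versioned> data = new HashMap<>();

    private long currentVersion(String key) {
        Versioned v = data.get(key);
        return v == null ? 0 : v.version;
    }

    // Unconditional write; returns the new version number (1 for a new key).
    long put(String key, byte[] value) {
        long next = currentVersion(key) + 1;
        data.put(key, new Versioned(value.clone(), next));
        return next;
    }

    // Optimistic write: succeeds only if the caller saw the latest version,
    // so a client working from a stale read cannot overwrite newer data.
    boolean putIfVersion(String key, byte[] value, long expectedVersion) {
        if (currentVersion(key) != expectedVersion) {
            return false;
        }
        put(key, value);
        return true;
    }

    // Returns null if the key is absent.
    Versioned get(String key) {
        return data.get(key);
    }
}
```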


Page 18: NoSQL

Kyoto Cabinet

```java
import kyotocabinet.*;

DB db = new DB(...);   // construction and opening elided on the slide
db.set("ws103177", "Wilfred Springer <[email protected]>");
db.get("ws103177");
```

1 million records in 0.9 s


Page 19: NoSQL

Graph Database

SPARQL
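A graph database stores facts as edges, and SPARQL queries them with triple patterns. A minimal sketch, assuming facts are (subject, predicate, object) strings and that `null` plays the role of a SPARQL variable; `TripleStore` is a hypothetical class, not the API of any RDF library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory triple store. match() treats null as a variable, which is
// the idea behind a single SPARQL triple pattern such as  ?who <knows> <bob> .
class TripleStore {
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    private final List<Triple> triples = new ArrayList<>();

    void add(String s, String p, String o) {
        triples.add(new Triple(s, p, o));
    }

    // Linear scan for brevity; real graph stores index triples instead.
    List<Triple> match(String s, String p, String o) {
        List<Triple> result = new ArrayList<>();
        for (Triple t : triples) {
            boolean ok = (s == null || s.equals(t.s))
                    && (p == null || p.equals(t.p))
                    && (o == null || o.equals(t.o));
            if (ok) result.add(t);
        }
        return result;
    }
}
```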


Page 20: NoSQL

Document Store

• Improved Indexing

• Server-side Processing

XML:

<persons><person><name>Wilfred</name><surname>Springer</surname></person>…</persons>

JSON:

[{ "Name" : "Wilfred", "Surname" : "Springer"}, …]


Page 21: NoSQL

[Diagram: structure of a Product document stored as JSON. Top-level fields: ASIN, DetailPageURL, SalesRank, ItemAttributes, CustomerReviews, EditorialReviews, SimilarProducts (ASIN, Title), LargeImage, MediumImage and SmallImage (each with URL, Width, Height), fs_author_terms, fs_browsenodes, fs_keywords_terms, fs_version. ItemAttributes holds fields such as Author, Title, Publisher, ReleaseDate, PublicationDate, Format, Binding, ProductGroup, ProductName, Label, Studio, Manufacturer, Languages (Type, Name) and ListPrice (Amount, CurrencyCode, FormattedPrice). CustomerReviews holds TotalReviews and Review entries (ASIN, Rating, Summary, Content, Date, HelpfulVotes, TotalVotes, Reviewer, CustomerId); EditorialReviews holds Review entries (Source, Content, IsLinkSuppressed).]


Page 22: NoSQL

[Diagram: the ItemAttributes sub-document, with fields Author, Title, Publisher, ReleaseDate, PublicationDate, Format, Binding, ProductGroup, ProductName, Label, Studio, Manufacturer, Languages (Type, Name) and ListPrice (Amount, CurrencyCode, FormattedPrice)]


Page 23: NoSQL

Various Queries

```javascript
// find all products
db.products.find()

// find products with 446 pages (slow: scans every document)
db.products.find({"ItemAttributes.NumberOfPages": 446})

// find products with 446 pages (fast: uses an index on the nested field)
db.products.ensureIndex({"ItemAttributes.NumberOfPages": 1})   // createIndex in later MongoDB versions
db.products.find({"ItemAttributes.NumberOfPages": 446})
```

[Diagram: path Product → ItemAttributes → NumberOfPages]
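Why does `ensureIndex` turn the slow query into a fast one? A minimal sketch, assuming documents are nested maps and a dotted path such as `ItemAttributes.NumberOfPages` walks into sub-documents. `ProductCollection` is a hypothetical class, and its hash map only stands in for MongoDB's B-tree indexes; the point is the difference between scanning every document and looking a value up directly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical document collection with optional single-field indexes.
class ProductCollection {
    private final List<Map<String, Object>> docs = new ArrayList<>();
    private final Map<String, Map<Object, List<Map<String, Object>>>> indexes = new HashMap<>();

    // Follows a dotted path into nested maps; null if any step is missing.
    static Object resolve(Map<String, Object> doc, String dottedPath) {
        Object current = doc;
        for (String part : dottedPath.split("\\.")) {
            if (!(current instanceof Map)) return null;
            current = ((Map<?, ?>) current).get(part);
        }
        return current;
    }

    void insert(Map<String, Object> doc) {
        docs.add(doc);
        // keep every existing index up to date
        for (Map.Entry<String, Map<Object, List<Map<String, Object>>>> idx : indexes.entrySet()) {
            idx.getValue().computeIfAbsent(resolve(doc, idx.getKey()), k -> new ArrayList<>()).add(doc);
        }
    }

    // Analogous to ensureIndex: build a value -> documents map for one path.
    void ensureIndex(String path) {
        if (indexes.containsKey(path)) return;
        Map<Object, List<Map<String, Object>>> index = new HashMap<>();
        for (Map<String, Object> doc : docs) {
            index.computeIfAbsent(resolve(doc, path), k -> new ArrayList<>()).add(doc);
        }
        indexes.put(path, index);
    }

    // Uses the index when one exists; otherwise scans every document.
    List<Map<String, Object>> find(String path, Object value) {
        Map<Object, List<Map<String, Object>>> index = indexes.get(path);
        if (index != null) return new ArrayList<>(index.getOrDefault(value, List.of()));
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            if (value.equals(resolve(doc, path))) result.add(doc);
        }
        return result;
    }
}
```

Both code paths return the same documents; only the amount of work per query differs.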


Page 24: NoSQL

Find books on “java”

```javascript
db.products.find(
    {"fs_keywords_terms": "java"},    // query: the keyword array contains "java"
    {"ItemAttributes.Title": 1})      // projection: return only the title (plus _id)
```

[Diagram: Product → ItemAttributes → Title; Product → fs_keywords_terms]

"_id" : ObjectId("…")

"Title" : "TCP/IP Sockets in Java, Second Edition: Practical Guide for Programmers"

"fs_keywords_terms" : [ "kenneth", "l", "calvert", "michael", "j", "donahoo", "tcp", "ip", "sockets", "java", "second", "edition", "practical", "guide", "programmers" ]


Page 25: NoSQL

... with the worst sales rank

```javascript
db.products.find(
    {"fs_keywords_terms": "java"},
    {"ItemAttributes.Title": 1})
  .sort({"SalesRank": -1})    // highest SalesRank number (worst seller) first
```

[Diagram: Product → ItemAttributes → Title; Product → fs_keywords_terms]

"_id" : ObjectId("…")

"Title" : "TCP/IP Sockets in Java, Second Edition: Practical Guide for Programmers"

"fs_keywords_terms" : [ "kenneth", "l", "calvert", "michael", "j", "donahoo", "tcp", "ip", "sockets", "java", "second", "edition", "practical", "guide", "programmers" ]
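The filter, sort and projection in this query can be mirrored on the client side to show what each part does. A minimal sketch, assuming a simplified product with only a title, a keyword list and a sales rank; `JavaBooks`, `Product` and `titlesByWorstSalesRank` are hypothetical names.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical client-side equivalent of the sorted query:
//   filter  -> {"fs_keywords_terms": "java"} (an array field matches if any element equals "java")
//   sort    -> {"SalesRank": -1} (highest rank number, i.e. worst seller, first)
//   project -> {"ItemAttributes.Title": 1} (keep only the title)
class JavaBooks {
    static final class Product {
        final String title;
        final List<String> keywords;
        final int salesRank;
        Product(String title, List<String> keywords, int salesRank) {
            this.title = title; this.keywords = keywords; this.salesRank = salesRank;
        }
    }

    static List<String> titlesByWorstSalesRank(List<Product> products, String keyword) {
        return products.stream()
                .filter(p -> p.keywords.contains(keyword))
                .sorted(Comparator.comparingInt((Product p) -> p.salesRank).reversed())
                .map(p -> p.title)
                .collect(Collectors.toList());
    }
}
```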


Page 26: NoSQL

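```javascript
// group() was deprecated in later MongoDB versions in favour of the aggregation framework
db.products.group({
    key: {"ItemAttributes.NumberOfPages": true},     // one group per page count
    cond: {},                                        // no filter: consider all products
    initial: {count: 0},                             // per-group accumulator
    reduce: function (obj, prev) { prev.count++; }   // called once per document in the group
})
```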

Count books per #pages
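The same "count per key" computation expressed with Java streams, as a minimal sketch: `groupingBy` takes the role of `key`, and `counting()` the role of `initial: {count: 0}` plus `prev.count++`. `PageCounts` is a hypothetical class, and books are reduced to their page counts for brevity. Unlike `group()`, `groupingBy` rejects null keys, so the input is assumed to contain no missing values.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class PageCounts {
    // Number of books per page count, sorted by page count.
    static Map<Integer, Long> countPerPages(List<Integer> numberOfPages) {
        return numberOfPages.stream()
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }
}
```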


Page 27: NoSQL

```javascript
db.runCommand({
  mapreduce: "DenormAggCollection",
  query: {
    filter1: { '$in': [ 'A', 'B' ] },
    filter2: 'C',
    filter3: { '$gt': 123 }
  },
  // grouped dimension columns become the map key
  map: function () {
    emit({ d1: this.Dim1, d2: this.Dim2 },
         { msum: this.measure1, recs: 1, mmin: this.measure1,
           mmax: this.measure2 < 100 ? this.measure2 : 0 });
  },
  // measures are aggregated by hand; seed the minimum from the first value, not from 0
  reduce: function (key, vals) {
    var ret = { msum: 0, recs: 0, mmin: vals[0].mmin, mmax: 0 };
    for (var i = 0; i < vals.length; i++) {
      ret.msum += vals[i].msum;
      ret.recs += vals[i].recs;
      if (vals[i].mmin < ret.mmin) ret.mmin = vals[i].mmin;
      if ((vals[i].mmax < 100) && (vals[i].mmax > ret.mmax)) ret.mmax = vals[i].mmax;
    }
    return ret;
  },
  // the average depends on the total record count, so it is computed last
  finalize: function (key, val) {
    val.mavg = val.msum / val.recs;
    return val;
  },
  out: 'result1',
  verbose: true
});
// map/reduce output documents have the shape { _id: key, value: {...} }
db.result1.find({ 'value.mmin': { '$gt': 0 } }).sort({ 'value.recs': -1 }).skip(4).limit(8);
```

```sql
SELECT Dim1, Dim2,
       SUM(Measure1) AS MSum,
       COUNT(*) AS RecordCount,
       AVG(Measure1) AS MAvg,   -- Measure1, matching msum / recs in finalize
       MIN(Measure1) AS MMin,
       MAX(CASE WHEN Measure2 < 100 THEN Measure2 END) AS MMax
FROM DenormAggTable
WHERE (Filter1 IN ('A','B')) AND (Filter2 = 'C') AND (Filter3 > 123)
GROUP BY Dim1, Dim2
HAVING (MMin > 0)
ORDER BY RecordCount DESC
LIMIT 4, 8
```

• Grouped dimension columns are pulled out as keys in the map function, reducing the size of the working set.

• Measures must be manually aggregated.

• Aggregates depending on record counts must wait until finalization.

• Measures can use procedural logic.

• Filters have an ORM/ActiveRecord-looking style.

• Aggregate filtering must be applied to the result set, not in the map/reduce.

• Ascending: 1; Descending: -1.

Revision 4, Created 2010-03-07

Rick Osborne, rickosborne.org

[Chart columns: mySQL, MongoDB]

SQL / Mongo
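The map, reduce and finalize phases of the job above can be traced in a single process. A minimal sketch, assuming rows with two dimensions and one measure, and a grouping key built by joining the dimensions with "|" (a simplification); `MapReduceSketch` and its members are hypothetical and only model MSum, RecordCount, MMin and the average.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-process sketch of map -> (group by key) -> reduce -> finalize.
class MapReduceSketch {
    static final class Row {
        final String dim1, dim2;
        final double measure1;
        Row(String dim1, String dim2, double measure1) {
            this.dim1 = dim1; this.dim2 = dim2; this.measure1 = measure1;
        }
    }

    static final class Agg {
        double msum, mmin, mavg;
        long recs;
        Agg(double msum, long recs, double mmin) {
            this.msum = msum; this.recs = recs; this.mmin = mmin;
        }
    }

    // map: one partial aggregate per row
    static Agg map(Row r) {
        return new Agg(r.measure1, 1, r.measure1);
    }

    // reduce: merge partials by hand; associative, so it may also be applied to its own output
    static Agg reduce(List<Agg> vals) {
        Agg ret = new Agg(0, 0, vals.get(0).mmin);
        for (Agg v : vals) {
            ret.msum += v.msum;
            ret.recs += v.recs;
            ret.mmin = Math.min(ret.mmin, v.mmin);
        }
        return ret;
    }

    // finalize: the average needs the total record count, so it is computed last
    static Agg finalizeAgg(Agg a) {
        a.mavg = a.msum / a.recs;
        return a;
    }

    static Map<String, Agg> run(List<Row> rows) {
        Map<String, List<Agg>> grouped = new HashMap<>();   // collect map output per key
        for (Row r : rows) {
            grouped.computeIfAbsent(r.dim1 + "|" + r.dim2, k -> new ArrayList<>()).add(map(r));
        }
        Map<String, Agg> out = new HashMap<>();
        for (Map.Entry<String, List<Agg>> e : grouped.entrySet()) {
            out.put(e.getKey(), finalizeAgg(reduce(e.getValue())));
        }
        return out;
    }
}
```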


Page 28: NoSQL

Availability versus Consistency


Page 29: NoSQL

CAP Theorem

Eric Brewer


Page 30: NoSQL

[Diagram: triangle with corners Availability, Consistency and Partition Tolerance]

Pick two


Page 31: NoSQL

Strong Consistency

[Diagram: at t = 0, value = "foo"; at t = 1, A writes value = "bar"; at t = 2, reads by A, B and C all return value = "bar"]

After the update, any subsequent access will return the updated value.


Page 32: NoSQL

Weak Consistency

[Diagram: at t = 0, value = "foo"; at t = 1, A writes value = "bar"; reads by A, B and C at t > 1 may return either "bar" or "foo"]

The system does not guarantee that, at any given point in the future, subsequent accesses will return the updated value.


Page 33: NoSQL

Eventual Consistency

If no updates are made to the object, eventually all accesses will return the last updated value.

[Diagram: at t = 0, value = "foo"; at t = 1, A writes value = "bar"; at some later time t ≥ 1, reads by A, B and C all return "bar"]


Page 34: NoSQL

Session Consistency

Within the “session”, the system guarantees read-your-writes consistency.

[Diagram: at t = 0, value = "foo"; at t = 1, A writes value = "bar" in Session 1; at t = 2, a read in Session 1 returns "bar", while a read in Session 2 still returns "foo"]


Page 35: NoSQL

Read-your-writes Consistency

After process A has updated a data item, it always accesses the updated value and never sees an older value.

[Diagram: at t = 0, value = "foo"; at t = 1, A writes value = "bar"; at t = 2, A reads value = "bar"]


Page 36: NoSQL

Monotonic Read Consistency

If a process has seen a particular value for the object, any subsequent access will never return any previous value.

[Diagram: reads at t = 0, 1 and 2 return value = "foo"; the value changes at t = 3; the read at t = 4 returns value = "bar"]
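Read-your-writes, session and monotonic-read consistency can all be enforced on the client side by remembering how new the data it has already seen is. A minimal sketch, assuming every replica stores a (version, value) pair and versions only grow; `Session`, `Replica`, `write` and `read` are hypothetical names, and replication between replicas is left out.

```java
import java.util.List;

// Hypothetical session that never accepts data older than what it has already
// written or read.
class Session {
    static final class Replica {
        long version;
        String value;
        Replica(long version, String value) { this.version = version; this.value = value; }
    }

    private long highestSeen = 0;

    // Write to a single replica; the others catch up later (not modelled here).
    void write(Replica target, long version, String value) {
        target.version = version;
        target.value = value;
        highestSeen = Math.max(highestSeen, version);
    }

    // Read from the first replica at least as new as anything this session has seen:
    // read-your-writes after a write, monotonic reads after a read.
    // Returns null when every replica offered is too stale.
    String read(List<Replica> replicas) {
        for (Replica r : replicas) {
            if (r.version >= highestSeen) {
                highestSeen = r.version;
                return r.value;
            }
        }
        return null;
    }
}
```

A second session that has not seen the write can still read "foo" from a lagging replica, just as Session 2 does in the session-consistency diagram.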


Page 37: NoSQL

Eventual Consistency in RDBMS

Eventual consistency is not just a property of NoSQL solutions.

[Diagram: A writes to the Primary, which ships its log asynchronously to a Backup replica (steps 1, 2, 3)]
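Asynchronous log shipping is enough to produce eventual consistency. A minimal sketch, assuming a primary and a single backup that only sees writes once the log has been shipped; `PrimaryBackup` and `shipLog` are hypothetical names, and in a real database shipping runs in the background rather than on an explicit call.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical primary/backup pair with asynchronous log shipping.
class PrimaryBackup {
    private final Map<String, String> primary = new HashMap<>();
    private final Map<String, String> backup = new HashMap<>();
    private final Queue<String[]> log = new ArrayDeque<>();

    // Writes hit the primary immediately and are queued in the log.
    void write(String key, String value) {
        primary.put(key, value);
        log.add(new String[] {key, value});
    }

    String readPrimary(String key) { return primary.get(key); }

    // May return a stale value until the log has been shipped.
    String readBackup(String key) { return backup.get(key); }

    // Ship the log and apply its entries on the backup, in order.
    void shipLog() {
        while (!log.isEmpty()) {
            String[] entry = log.poll();
            backup.put(entry[0], entry[1]);
        }
    }
}
```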


Page 38: NoSQL

No Strong Consistency in the Face Of...


Page 39: NoSQL

Network Partitions

[Diagram: A writes a new value, the new value is replicated to a second node, and a client reads the new value there]


Page 40: NoSQL

Network Partitions

[Diagram: the same write, replicate, read sequence, with the link between the nodes marked as broken (!)]


Page 41: NoSQL

Partition Tolerance

[Diagram: A writes a new value, the node fails to replicate the new value, and a client on the other side of the partition reads the old value]


Page 42: NoSQL

Partition Intolerance

[Diagram: A's attempt to write a new value fails, because the node fails to replicate the new value]
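The last two slides contrast what a replicated value can do while the link between its nodes is down. A minimal sketch, assuming two copies of one value and a flag that selects the behaviour; `ReplicatedRegister` and its methods are hypothetical names chosen for this illustration.

```java
// Hypothetical two-copy register. While the link is down, a partition-tolerant
// register accepts the write locally (the copies diverge and the remote side keeps
// serving the old value); a partition-intolerant one refuses the write instead.
class ReplicatedRegister {
    private String local;
    private String remote;
    private boolean linkUp = true;
    private final boolean acceptWritesDuringPartition;

    ReplicatedRegister(String initial, boolean acceptWritesDuringPartition) {
        this.local = initial;
        this.remote = initial;
        this.acceptWritesDuringPartition = acceptWritesDuringPartition;
    }

    void setLinkUp(boolean up) { linkUp = up; }

    // Returns true if the write was accepted.
    boolean write(String value) {
        if (linkUp) {                       // normal case: replicate synchronously
            local = value;
            remote = value;
            return true;
        }
        if (!acceptWritesDuringPartition) { // intolerant: fail rather than diverge
            return false;
        }
        local = value;                      // tolerant: stay available, copies diverge
        return true;
    }

    String readLocal() { return local; }
    String readRemote() { return remote; }
}
```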


Page 43: NoSQL

How to do better?


Page 44: NoSQL

Proper Replication Factor

N = 4, W = 3, R = 2

[Diagram: A writes to W = 3 of the N = 4 replicas; reads consult R = 2 replicas]
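Reading N as the number of replicas, W as the number that must acknowledge a write and R as the number consulted per read (the usual quorum reading, assumed here), N = 4, W = 3, R = 2 works because every possible read set overlaps every possible write set. A minimal sketch that checks the rule R + W > N against brute-force enumeration; the class `Quorum` is hypothetical.

```java
// Hypothetical check of the quorum rule: every read set of size R overlaps every
// write set of size W among N replicas exactly when R + W > N.
class Quorum {
    static boolean rule(int n, int w, int r) {
        return r + w > n;
    }

    // Replica sets are bitmasks over n nodes (fine for small n).
    static boolean everyReadSeesLatestWrite(int n, int w, int r) {
        for (int writeSet = 0; writeSet < (1 << n); writeSet++) {
            if (Integer.bitCount(writeSet) != w) continue;
            for (int readSet = 0; readSet < (1 << n); readSet++) {
                if (Integer.bitCount(readSet) != r) continue;
                if ((writeSet & readSet) == 0) {
                    return false;  // this read touches none of the replicas holding the new value
                }
            }
        }
        return true;
    }
}
```

With N = 4, W = 3, R = 2 the check holds (3 + 2 = 5 > 4); dropping W to 2 makes it fail, because two disjoint pairs of replicas exist.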


Page 45: NoSQL

Optimizations

• Optimize read: R = 1, N = W

• Optimize write: W = 1, N = R
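Assuming the usual quorum reading of the symbols (N replicas, W write acknowledgements, R replicas consulted per read), both settings still satisfy the overlap condition R + W > N; they only move the cost between reads and writes:

```latex
\begin{aligned}
\text{quorum condition:}\quad & R + W > N \\
\text{read-optimized } (R = 1,\ W = N):\quad & R + W = 1 + N > N \\
\text{write-optimized } (W = 1,\ R = N):\quad & R + W = N + 1 > N
\end{aligned}
```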


Page 46: NoSQL

Consistent Hashing

[Diagram: a hash ring with nodes A to H; key K is hashed to a position on the ring]


Page 47: NoSQL

[Diagram: the same ring of nodes A to H, with W = 3]
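A minimal consistent-hashing sketch, assuming the Dynamo-style convention that a key lives on the first node clockwise from its position and, with W = 3, on the next two nodes as well. `HashRing` and `preferenceList` are hypothetical names; `String.hashCode()` keeps the sketch short where real systems use a stronger hash and virtual nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical consistent-hash ring: nodes and keys share one circular space of
// non-negative ints; each node occupies a single position.
class HashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    static int position(String s) { return s.hashCode() & 0x7fffffff; }

    void addNode(String node) { ring.put(position(node), node); }
    void addNode(String node, int position) { ring.put(position, node); }

    // Only keys that mapped to this node move; all other keys keep their nodes.
    void removeNode(String node) { ring.values().remove(node); }

    // The first w distinct nodes clockwise from the key's position, wrapping around.
    List<String> preferenceList(int keyPosition, int w) {
        List<String> result = new ArrayList<>();
        if (ring.isEmpty()) return result;
        Map.Entry<Integer, String> e = ring.ceilingEntry(keyPosition);
        if (e == null) e = ring.firstEntry();
        while (result.size() < Math.min(w, ring.size())) {
            result.add(e.getValue());
            e = ring.higherEntry(e.getKey());
            if (e == null) e = ring.firstEntry();
        }
        return result;
    }

    List<String> preferenceList(String key, int w) {
        return preferenceList(position(key), w);
    }
}
```

With nodes A to H at positions 10, 20, ..., 80, a key at position 25 is stored on C, D and E; removing C only shifts that key to D, E and F.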


Page 48: NoSQL

No free ride

You need to consider giving up on:

• Avoiding redundancy

• Referential integrity

• Strong consistency

• Ad hoc queries

• Joins

• Ease of reporting

• ...


Page 49: NoSQL

NoSQL Today


Page 50: NoSQL

Resources

http://nosqltapes.com/

http://nosql-database.org/

http://nosqlsummer.org/


Page 51: NoSQL

Books
