NoSQL

An alternative NoSQL talk: less theory, fewer details on the architectural principles, and more code samples.

NoSQL

Tuesday, March 22, 2011

The Software Crisis

Writing correct, understandable, and verifiable computer programs is difficult.

Edsger Dijkstra


“as long as there were no machines, programming was no problem at all; when we had a few weak computers, programming became a mild problem, and now we have gigantic computers, programming has become an equally gigantic problem.”


IMS: The Hierarchical Database (1966)

Vern Watts

“A Relational Model of Data for Large Shared Data Banks” (1970)

Ted Codd

“In striving to make every user happy, a technology can actually leave the majority unhappy.”

“Every good idea is generalized to its level of inapplicability.” (Peter Principle)

Jim Gray


“NoSQL” Reintroduced (2008)

Eric Evans


Total Cost of Ownership

• The price of a license

• The price of support

• The price of hardware

Oracle: +/- 47k / CPU?

Software update / support: +/- 10k?


Internet Scale

• Massive data collections

• Huge number of requests

• Coming from geographic areas across the globe

• 24/7


Availability


Data Models


Column Oriented

[Diagram: a row key followed by a series of named columns]

• Column Family ≈ Table

• Empty cells are cheap (sparse table)

• Can grow “indefinitely”

• Schemaless

• No secondary indexes
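
The sparse-table idea can be sketched in a few lines of plain JavaScript. This is only an illustration under stated assumptions: the class `ColumnFamily`, its methods and the sample rows are hypothetical and do not correspond to BigTable or any real client API. The point is that a row stores only the columns it actually has, so empty cells take no space and no schema is declared up front.

```javascript
// Hypothetical in-memory model: row key -> (column name -> value).
class ColumnFamily {
  constructor() {
    this.rows = new Map();
  }

  // Store one cell; row and column are created on first use (no schema).
  put(rowKey, column, value) {
    if (!this.rows.has(rowKey)) {
      this.rows.set(rowKey, new Map());
    }
    this.rows.get(rowKey).set(column, value);
  }

  // Read one cell; a cell that was never written is simply absent.
  get(rowKey, column) {
    const row = this.rows.get(rowKey);
    return row === undefined ? undefined : row.get(column);
  }

  // Number of cells physically stored for a row.
  cellCount(rowKey) {
    const row = this.rows.get(rowKey);
    return row === undefined ? 0 : row.size;
  }
}

// Made-up sample rows: the second row has a column the first one lacks.
const persons = new ColumnFamily();
persons.put("ws103177", "firstname", "Wilfred");
persons.put("ws103177", "surname", "Springer");
persons.put("row2", "nickname", "example");

console.log(persons.get("ws103177", "surname"));
console.log(persons.get("row2", "surname"));
console.log(persons.cellCount("row2"));
```

Each row here is its own map, so adding a new column to one row costs nothing for the others.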


DatastoreService service = ...;
Key key = KeyFactory.createKey(family, recordId);
Entity entity = service.get(key);
entity.getProperty("firstname");
entity.getProperty("surname");

BigTable


Column Oriented + Super Columns

[Diagram: a row key followed by named columns, which are grouped into super columns]


Key Value Store

• Schemaless

• Versioning

[Diagram: a key mapped to an opaque binary value (10110110)]


Kyoto Cabinet

DB db = new DB(...);
db.set("ws103177", "Wilfred Springer <wilfredspringer@sun.com>");
db.get("ws103177");

1 million records in 0.9 s


Graph Database

SPARQL


Document Store

Improved Indexing

Server-side Processing

XML

<persons><person><name>Wilfred</name><surname>Springer</surname></person>…</persons>

JSON

[{ "Name" : "Wilfred", "Surname" : "Springer"}, …]


Product (JSON)

[Diagram: field tree of a Product document]

Fields shown include: ASIN, CustomerReviews, DetailPageURL, EditorialReviews, ItemAttributes, LargeImage, MediumImage, SmallImage, SalesRank, SimilarProducts, fs_author_terms, fs_browsenodes, fs_keywords_terms, fs_version.

Each image field has URL, Width and Height.

Other nested field names in the diagram: TotalReviews, Review, ASIN, HelpfulVotes, Rating, Summary, Content, TotalVotes, Date, Reviewer, CustomerId, Name, Source, IsLinkSuppressed, Title.


ItemAttributes

Publisher, ReleaseDate, Format, Binding, ProductGroup, Label, ProductName, Studio, PublicationDate, Title, Manufacturer, Author, ListPrice (Amount, CurrencyCode, FormattedPrice), Languages (Type, Name)


Various Queries

// find all products
db.products.find()

// find products with 446 pages (slow)
db.products.find({"ItemAttributes.NumberOfPages": 446})

// find products with 446 pages (fast)
db.products.ensureIndex({"ItemAttributes.NumberOfPages": 1})
db.products.find({"ItemAttributes.NumberOfPages": 446})

Product → ItemAttributes → NumberOfPages


Find books on “java”

db.products.find(
    {"fs_keywords_terms": "java"},
    {"ItemAttributes.Title": 1})

Product → ItemAttributes → Title, fs_keywords_terms

"_id" : ObjectId("…")
"Title" : "TCP/IP Sockets in Java, Second Edition: Practical Guide for Programmers"
"fs_keywords_terms" : ["kenneth","l","calvert","michael","j","donahoo","tcp","ip","sockets","java","second","edition","practical","guide","programmers"]


... with the worst sales rank

db.products.find(
    {"fs_keywords_terms": "java"},
    {"ItemAttributes.Title": 1}).sort({"SalesRank": -1})

Product → ItemAttributes → Title, fs_keywords_terms

"_id" : ObjectId("…")
"Title" : "TCP/IP Sockets in Java, Second Edition: Practical Guide for Programmers"
"fs_keywords_terms" : ["kenneth","l","calvert","michael","j","donahoo","tcp","ip","sockets","java","second","edition","practical","guide","programmers"]


Count books per #pages

db.products.group({
    key: {"ItemAttributes.NumberOfPages": true},
    cond: {},
    initial: {count: 0},
    reduce: function(obj, prev) { prev.count++ }
})
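
What the `group` call above computes can be sketched in plain JavaScript, outside of MongoDB. This is a rough illustration, not how the server executes the command: the helper names `getPath` and `group` and the three sample documents are made up, and only the `key` / `initial` / `reduce` roles mirror the slide.

```javascript
// Resolve a dotted path such as "ItemAttributes.NumberOfPages" in a document.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, field) => (value == null ? undefined : value[field]),
    doc
  );
}

// Group documents by the value found at `path`. Every group starts from a
// fresh copy of `initial` and is folded with `reduce(doc, accumulator)`.
function group(docs, path, initial, reduce) {
  const groups = new Map();
  for (const doc of docs) {
    const key = getPath(doc, path);
    if (!groups.has(key)) {
      groups.set(key, { ...initial });
    }
    reduce(doc, groups.get(key));
  }
  return groups;
}

// Made-up sample documents shaped like the products collection.
const products = [
  { ItemAttributes: { Title: "Book A", NumberOfPages: 446 } },
  { ItemAttributes: { Title: "Book B", NumberOfPages: 200 } },
  { ItemAttributes: { Title: "Book C", NumberOfPages: 446 } },
];

const counts = group(
  products,
  "ItemAttributes.NumberOfPages",
  { count: 0 },
  (obj, prev) => { prev.count++; }
);

for (const [pages, result] of counts) {
  console.log(pages, result.count);
}
```

The reducer is the same one-liner as on the slide; the rest is the bookkeeping the database does on the caller's behalf.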


SQL (MySQL)

SELECT
    Dim1, Dim2,
    SUM(Measure1) AS MSum,
    COUNT(*) AS RecordCount,
    AVG(Measure2) AS MAvg,
    MIN(Measure1) AS MMin,
    MAX(CASE WHEN Measure2 < 100 THEN Measure2 END) AS MMax
FROM DenormAggTable
WHERE (Filter1 IN ('A','B')) AND (Filter2 = 'C') AND (Filter3 > 123)
GROUP BY Dim1, Dim2
HAVING (MMin > 0)
ORDER BY RecordCount DESC
LIMIT 4, 8

Mongo (MongoDB)

db.runCommand({
    mapreduce: "DenormAggCollection",
    query: {
        filter1: { '$in': [ 'A', 'B' ] },
        filter2: 'C',
        filter3: { '$gt': 123 }
    },
    map: function() {
        emit(
            { d1: this.Dim1, d2: this.Dim2 },
            { msum: this.measure1, recs: 1, mmin: this.measure1,
              mmax: this.measure2 < 100 ? this.measure2 : 0 }
        );
    },
    reduce: function(key, vals) {
        var ret = { msum: 0, recs: 0, mmin: 0, mmax: 0 };
        for (var i = 0; i < vals.length; i++) {
            ret.msum += vals[i].msum;
            ret.recs += vals[i].recs;
            if (vals[i].mmin < ret.mmin) ret.mmin = vals[i].mmin;
            if ((vals[i].mmax < 100) && (vals[i].mmax > ret.mmax)) ret.mmax = vals[i].mmax;
        }
        return ret;
    },
    finalize: function(key, val) {
        val.mavg = val.msum / val.recs;
        return val;
    },
    out: 'result1',
    verbose: true
});
db.result1.
    find({ mmin: { '$gt': 0 } }).
    sort({ recs: -1 }).
    skip(4).
    limit(8);

Notes

1. Grouped dimension columns are pulled out as keys in the map function, reducing the size of the working set.

2. Measures must be manually aggregated.

3. Aggregates depending on record counts must wait until finalization.

4. Measures can use procedural logic.

5. Filters have an ORM/ActiveRecord-looking style.

6. Aggregate filtering must be applied to the result set, not in the map/reduce.

7. Ascending: 1; Descending: -1

Revision 4, Created 2010-03-06

Rick Osborne, rickosborne.org


Availability versus Consistency


CAP Theorem

Eric Brewer


Availability

Consistency

Partition Tolerance

Pick two


Strong Consistency

After the update, any subsequent access will return the updated value.

[Diagram: processes A, B and C. Time 0: value = "foo". Time 1: value = "bar". Time 2: every read returns value = "bar".]


Weak Consistency

The system does not guarantee that subsequent accesses will return the updated value at any given point in the future.

[Diagram: processes A, B and C. Time 0: value = "foo". Time 1: value = "bar". Any time > 1: a read may return "bar" or "foo".]


Eventual Consistency

If no updates are made to the object, eventually all accesses will return the last updated value.

[Diagram: processes A, B and C. Time 0: value = "foo". Time 1: value = "bar". From some time t ≥ 1: every read returns value = "bar".]


Session Consistency

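Within the “session”, the system guarantees read-your-writes consistency.

[Diagram: processes A, B and C, with Session 1 and Session 2. Time 0: value = "foo". Time 1: value = "bar" is written in one session. Time 2: that session reads value = "bar", while the other session may still read value = "foo".]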


Read-your-writes Consistency

Process A, after updating a data item, always accesses the updated value and never sees an older value.

[Diagram: processes A, B and C. Time 0: value = "foo". Time 1: A writes value = "bar". Time 2: A reads value = "bar".]


Monotonic Read Consistency

If a process has seen a particular value for the object, any subsequent access will never return any previous values.

[Diagram: processes A, B and C. Times 0, 1 and 2: value = "foo". Time 3: value = "bar". Time 4: value = "bar".]
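
One way a client could enforce monotonic reads over lagging replicas is sketched below. Everything in it is an assumption made for illustration: the class `MonotonicReader`, the idea that each replica exposes a `version` number next to its `value`, and the choice to fall back on the newest value already seen. A real system would more likely retry against another replica or pin the client to one.

```javascript
// Hypothetical client-side guard. Replicas are modelled as { version, value }.
class MonotonicReader {
  constructor() {
    this.lastSeen = null; // newest { version, value } this client has observed
  }

  // Read from a replica. If the replica is older than what this client has
  // already seen, keep answering with the newer value seen earlier.
  read(replica) {
    if (this.lastSeen === null || replica.version >= this.lastSeen.version) {
      this.lastSeen = { version: replica.version, value: replica.value };
    }
    return this.lastSeen.value;
  }
}

const stale = { version: 1, value: "foo" }; // replica that missed the update
const fresh = { version: 2, value: "bar" }; // replica that received it

const reader = new MonotonicReader();
const first = reader.read(stale);  // "foo": nothing newer has been seen yet
const second = reader.read(fresh); // "bar": the update becomes visible
const third = reader.read(stale);  // still "bar": never an older value

console.log(first, second, third);
```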


Eventual Consistency in RDBMS

Eventual consistency is not just a property of NoSQL solutions.

[Diagram: A writes to the primary, which ships its log asynchronously to a backup replica (steps 1 to 3).]


No Strong Consistency in the Face Of...


Network Partitions

A writes new value

replicates new value

reads new value


Network Partitions

A writes new value

replicates new value

reads new value

[Diagram: a network partition (!) interrupts replication]


Partition Tolerance

A writes new value

fails to replicate new value

reads old value


Partition Intolerance

A: failing attempt to write a new value

fails to replicate new value


How to do better?


Proper Replication Factor

N = 4, W = 3, R = 2

With R + W > N (here 2 + 3 > 4), every read overlaps the latest write on at least one replica.


Optimizations

• Optimize read: R = 1, N = W

• Optimize write: W = 1, N = R
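
The N / W / R settings above can be checked with a little arithmetic. The sketch below uses the usual quorum condition R + W > N; the function names and the last configuration (labelled "too weak") are made up for illustration and do not appear in the talk.

```javascript
// With N replicas, a write acknowledged by W of them and a read that asks R
// of them are guaranteed to share a replica whenever R + W > N.
function quorumsOverlap(n, w, r) {
  return r + w > n;
}

// Smallest number of replicas a read set and a write set can have in common
// (pigeonhole principle): max(0, R + W - N).
function minOverlap(n, w, r) {
  return Math.max(0, r + w - n);
}

const configs = [
  { name: "replication factor example", n: 4, w: 3, r: 2 },
  { name: "optimize read", n: 4, w: 4, r: 1 },
  { name: "optimize write", n: 4, w: 1, r: 4 },
  { name: "too weak (counterexample)", n: 4, w: 1, r: 1 },
];

for (const { name, n, w, r } of configs) {
  console.log(name, quorumsOverlap(n, w, r), minOverlap(n, w, r));
}
```

Both optimizations keep R + W = N + 1, so they move the cost between reads and writes without giving up the overlap.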


Consistent Hashing

[Diagram: nodes A to H arranged on a hash ring; key K maps to a position on the ring]


W = 3

[Diagram: the same ring of nodes A to H; the write for the key goes to 3 nodes]
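
A minimal consistent-hashing sketch, assuming a simplified ring with one position per node and no virtual nodes. The class `HashRing`, the method `preferenceList` and the use of 32-bit FNV-1a as the hash are choices made for this illustration, not something the talk prescribes. A key is stored on the first W nodes found walking clockwise from the key's position.

```javascript
// 32-bit FNV-1a hash. It hashes UTF-16 code units for simplicity, which is
// fine for the short ASCII names used here.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

class HashRing {
  constructor(nodes) {
    // Place every node at the ring position given by the hash of its name.
    this.ring = nodes
      .map((node) => ({ node, position: fnv1a(node) }))
      .sort((a, b) => a.position - b.position);
  }

  // The first `count` nodes found walking clockwise from the key's position.
  preferenceList(key, count) {
    const position = fnv1a(key);
    let start = this.ring.findIndex((entry) => entry.position >= position);
    if (start === -1) start = 0; // past the last node: wrap around the ring
    const result = [];
    for (let i = 0; i < Math.min(count, this.ring.length); i++) {
      result.push(this.ring[(start + i) % this.ring.length].node);
    }
    return result;
  }
}

const nodes = ["A", "B", "C", "D", "E", "F", "G", "H"];
const ring = new HashRing(nodes);
const replicas = ring.preferenceList("K", 3); // W = 3 nodes receive the write

console.log(replicas);
```

Removing a node that is not among the three chosen ones leaves the placement of key K untouched, which is the property that makes the ring attractive when nodes come and go.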


No free ride

You need to consider giving up on:

• Avoiding redundancy

• Referential integrity

• Strong consistency

• Ad hoc queries

• Joins

• Ease of reporting

• ...


NoSQL Today


Resources

http://nosqltapes.com/

http://nosql-database.org/

http://nosqlsummer.org/


Books


No SQL

wspringer@xebia.com
