wilfred-springer
DESCRIPTION
Alternative NoSQL talk. Less theory and less detail on the architectural principles. More code samples.
NoSQL
Tuesday, March 22, 2011
The Software Crisis
Writing correct, understandable, and verifiable computer programs is difficult.
Edsger Dijkstra
“as long as there were no machines, programming was no problem at all;
when we had a few weak computers, programming became a mild problem,
and now we have gigantic computers, programming has become an equally gigantic problem.”
The Software Crisis
IMS: The Hierarchical Database (1966)
Vern Watts
“A Relational Model of Data for Large Shared Data Banks” (1970)
Ted Codd
“In striving to make every user happy, a technology can actually leave the majority unhappy.”
“Every good idea is generalized to its level of inapplicability.”
(Peter Principle)
Jim Gray
“NoSQL” Reintroduced (2008)
Eric Evans
Total Cost of Ownership
• The price of a license
• The price of support
• The price of hardware
Oracle: +/- 47k / CPU?
Software update / support: +/- 10k?
Internet Scale
• Massive data collections
• Huge number of requests
• Coming from geographic areas across the globe
• 24/7
Availability
Data Models
Column Oriented
[Diagram: a row key followed by any number of named columns]
• Column Family ≈ Table
• Empty cells are cheap (sparse table)
• Can grow “indefinitely”
• Schemaless
• No secondary indexes
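The properties listed above can be illustrated with a small in-memory model. This is only a sketch under assumptions: the `ColumnFamily` class, its method names, and the second sample row are hypothetical and are not the API of BigTable or any other product. It shows why empty cells cost nothing (they are simply never stored) and why lookups go through the row key only (there is no secondary index).

```javascript
// Hypothetical in-memory model of a column family: row key -> (column name -> value).
// Only columns that were actually written occupy space, so empty cells cost nothing.
class ColumnFamily {
  constructor(name) {
    this.name = name;
    this.rows = new Map(); // row key -> Map of column name -> value
  }

  // Store one cell; the column is created on first use (no schema to alter).
  put(rowKey, column, value) {
    if (!this.rows.has(rowKey)) {
      this.rows.set(rowKey, new Map());
    }
    this.rows.get(rowKey).set(column, value);
  }

  // Read one cell by row key; a column that was never written yields undefined.
  get(rowKey, column) {
    const row = this.rows.get(rowKey);
    return row ? row.get(column) : undefined;
  }

  // List the columns present in a row; rows may differ from each other.
  columns(rowKey) {
    const row = this.rows.get(rowKey);
    return row ? [...row.keys()] : [];
  }
}

const persons = new ColumnFamily("persons");
persons.put("ws103177", "firstname", "Wilfred");
persons.put("ws103177", "surname", "Springer");
persons.put("jd000001", "firstname", "Jane"); // hypothetical second row
persons.put("jd000001", "nickname", "JD");    // a column no other row has
```

Finding all rows with a given surname would require scanning every row in this model, which is the practical meaning of "no secondary indexes".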
// Look up one record by key and read two of its columns
// (classes from com.google.appengine.api.datastore).
DatastoreService service = ...;
Key key = KeyFactory.createKey(family, recordId);
Entity entity = service.get(key);
entity.getProperty("firstname");
entity.getProperty("surname");
BigTable
Column Oriented + Super Columns
[Diagram: a row key whose named columns are grouped into super columns]
Key Value Store
• Schemaless
• Versioning
10110110
Kyoto Cabinet
DB db = new DB(...);
db.set("ws103177", "Wilfred Springer <[email protected]>");
db.get("ws103177");
1 million records in 0.9 s
Graph Database
SPARQL
Document Store
• Improved indexing
• Server-side processing
XML:
<persons><person><name>Wilfred</name><surname>Springer</surname></person>…</persons>
JSON:
[{ "Name" : "Wilfred", "Surname" : "Springer"}, …]
Product (JSON)
[Diagram: field tree of a Product document]
• Product: ASIN, CustomerReviews, DetailPageURL, EditorialReviews, ItemAttributes, LargeImage, MediumImage, SalesRank, SimilarProducts, SmallImage, fs_author_terms, fs_browsenodes, fs_keywords_terms, fs_version
• LargeImage, MediumImage, SmallImage: URL, Width, Height
• SimilarProducts: ASIN, Title
• ItemAttributes: Publisher, ReleaseDate, Format, Binding, ProductGroup, Label, ProductName, Studio, PublicationDate, Title, Manufacturer, Languages, ListPrice, Author
• ListPrice: Amount, CurrencyCode, FormattedPrice
• Languages: Type, Name
• EditorialReviews: Content, Source, IsLinkSuppressed
• CustomerReviews: TotalReviews, Review
• Review: ASIN, HelpfulVotes, Rating, Summary, Content, TotalVotes, Date, Reviewer
• Reviewer: CustomerId
ItemAttributes
• Publisher, ReleaseDate, Format, Binding, ProductGroup, Label, ProductName, Studio, PublicationDate, Title, Manufacturer, Author
• ListPrice: Amount, CurrencyCode, FormattedPrice
• Languages: Type, Name
Various Queries
// find all products
db.products.find()
// find products with 446 pages (slow)
db.products.find({"ItemAttributes.NumberOfPages": 446})
// find products with 446 pages (fast)
// (ensureIndex is the shell helper of the time; later versions use createIndex)
db.products.ensureIndex({"ItemAttributes.NumberOfPages": 1})
db.products.find({"ItemAttributes.NumberOfPages": 446})
[Diagram: Product → ItemAttributes → NumberOfPages]
Find books on “java”
db.products.find(
  {"fs_keywords_terms": "java"},
  {"ItemAttributes.Title": 1}
)
[Diagram: Product → ItemAttributes → Title; Product → fs_keywords_terms]
"_id" : ObjectId("…")
"Title" : "TCP/IP Sockets in Java, Second Edition: Practical Guide for Programmers"
fs_keywords_terms: ["kenneth","l","calvert","michael","j","donahoo","tcp","ip","sockets","java","second","edition","practical","guide","programmers"]
... with the worst sales rank
db.products.find(
  {"fs_keywords_terms": "java"},
  {"ItemAttributes.Title": 1}
).sort({"SalesRank": -1})
[Diagram: Product → ItemAttributes → Title; Product → fs_keywords_terms]
"_id" : ObjectId("…")
"Title" : "TCP/IP Sockets in Java, Second Edition: Practical Guide for Programmers"
fs_keywords_terms: ["kenneth","l","calvert","michael","j","donahoo","tcp","ip","sockets","java","second","edition","practical","guide","programmers"]
Count books per number of pages
db.products.group({
  key: {"ItemAttributes.NumberOfPages": true},
  cond: {},
  initial: {count: 0},
  reduce: function(obj, prev) { prev.count++ }
})
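To make the roles of `key`, `initial`, and `reduce` in the group call more concrete, here is a plain JavaScript stand-in that runs outside the database. It is a sketch only: the `groupBy` helper and the three sample documents are hypothetical, and the real server-side operation may differ in details such as how the grouped key is returned.

```javascript
// Plain JavaScript stand-in for a group operation: one accumulator per key,
// started from a copy of `initial` and updated by `reduce` for every document.
function groupBy(docs, keyOf, initial, reduce) {
  const groups = new Map();
  for (const doc of docs) {
    const key = keyOf(doc);
    if (!groups.has(key)) {
      groups.set(key, { ...initial }); // fresh accumulator per key
    }
    reduce(doc, groups.get(key));
  }
  return groups;
}

// Hypothetical sample documents shaped like the products collection.
const sampleProducts = [
  { ItemAttributes: { Title: "Book A", NumberOfPages: 446 } },
  { ItemAttributes: { Title: "Book B", NumberOfPages: 200 } },
  { ItemAttributes: { Title: "Book C", NumberOfPages: 446 } },
];

// Same reduce function shape as in the slide: increment a counter per group.
const countsPerPages = groupBy(
  sampleProducts,
  (doc) => doc.ItemAttributes.NumberOfPages,
  { count: 0 },
  (obj, prev) => { prev.count++; }
);
```

Each distinct page count ends up with its own `{count: ...}` accumulator, which is the shape of result the slide's query is after.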
MongoDB:
db.runCommand({
  mapreduce: "DenormAggCollection",
  query: {
    filter1: { '$in': [ 'A', 'B' ] },
    filter2: 'C',
    filter3: { '$gt': 123 }
  },
  map: function() {
    emit(
      { d1: this.Dim1, d2: this.Dim2 },
      { msum: this.measure1, recs: 1, mmin: this.measure1,
        mmax: this.measure2 < 100 ? this.measure2 : 0 }
    );
  },
  reduce: function(key, vals) {
    var ret = { msum: 0, recs: 0, mmin: 0, mmax: 0 };
    for (var i = 0; i < vals.length; i++) {
      ret.msum += vals[i].msum;
      ret.recs += vals[i].recs;
      if (vals[i].mmin < ret.mmin) ret.mmin = vals[i].mmin;
      if ((vals[i].mmax < 100) && (vals[i].mmax > ret.mmax)) ret.mmax = vals[i].mmax;
    }
    return ret;
  },
  finalize: function(key, val) {
    val.mavg = val.msum / val.recs;
    return val;
  },
  out: 'result1',
  verbose: true
});
db.result1.
  find({ mmin: { '$gt': 0 } }).
  sort({ recs: -1 }).
  skip(4).
  limit(8);

MySQL:
SELECT
  Dim1, Dim2,
  SUM(Measure1) AS MSum,
  COUNT(*) AS RecordCount,
  AVG(Measure2) AS MAvg,
  MIN(Measure1) AS MMin,
  MAX(CASE WHEN Measure2 < 100 THEN Measure2 END) AS MMax
FROM DenormAggTable
WHERE (Filter1 IN ('A','B'))
  AND (Filter2 = 'C')
  AND (Filter3 > 123)
GROUP BY Dim1, Dim2
HAVING (MMin > 0)
ORDER BY RecordCount DESC
LIMIT 4, 8

Notes:
1 Grouped dimension columns are pulled out as keys in the map function, reducing the size of the working set.
2 Measures must be manually aggregated.
3 Aggregates depending on record counts must wait until finalization.
4 Measures can use procedural logic.
5 Filters have an ORM/ActiveRecord-looking style.
6 Aggregate filtering must be applied to the result set, not in the map/reduce.
7 Ascending: 1; Descending: -1
Revision 4, Created 2010-03-06
Rick Osborne, rickosborne.org
Availability versus Consistency
CAP Theorem
Eric Brewer
• Availability
• Consistency
• Partition Tolerance
Pick two
Strong Consistency
After the update, any subsequent access will return the updated value.
[Diagram: processes A, B, and C; the value starts as "foo" (0), A updates it to "bar" (1), and every subsequent read (2) returns "bar"]
Weak Consistency
The system does not guarantee that at any given point in the future subsequent access will return the updated value.
[Diagram: processes A, B, and C; the value starts as "foo" (0), A updates it to "bar" (1), and later reads (>1) may return either "bar" or "foo"]
Eventual Consistency
If no updates are made to the object, eventually all accesses will return the last updated value.
[Diagram: processes A, B, and C; the value starts as "foo" (0), A updates it to "bar" (1), and from some time t ≥ 1 onward all reads return "bar"]
Session Consistency
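Within the “session”, the system guarantees read-your-writes consistency.
[Diagram: processes A, B, and C in Session 1 and Session 2; the value starts as "foo" (0) and is updated to "bar" (1); at step 2 a read inside the writing session returns "bar", while a read in the other session may still return "foo"]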
Read-your-writes Consistency
Process A, after updating a data item, always accesses the updated value and never sees an older value.
[Diagram: the value starts as "foo" (0), A updates it to "bar" (1), and A's own subsequent read (2) returns "bar"]
Monotonic Read Consistency
If a process has seen a particular value for the object, any subsequent access will never return any previous values.
[Diagram: reads 0, 1, and 2 return "foo"; once a read has returned "bar" (3), the following read (4) also returns "bar"]
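One common way to obtain read-your-writes and monotonic reads is to let each session remember the highest version it has written or read, and to refuse answers from replicas that lag behind it. The sketch below illustrates that idea only; the `Replica` and `Session` classes are hypothetical, versions are assigned naively under the assumption of a single writer, and replication between replicas is done by hand.

```javascript
// A replica holds one object as a (version, value) pair.
class Replica {
  constructor() {
    this.version = 0;
    this.value = "foo";
  }

  // Accept an update only if it is newer than what the replica already has.
  apply(version, value) {
    if (version > this.version) {
      this.version = version;
      this.value = value;
    }
  }
}

// A session remembers the highest version it has written or read and
// skips replicas that have not caught up to that version yet.
class Session {
  constructor(replicas) {
    this.replicas = replicas;
    this.minVersion = 0;
  }

  // Write to a single replica; propagation to the others is left to the caller.
  // Assumption: a single writer, so "current version + 1" is a safe new version.
  write(replica, value) {
    const version = replica.version + 1;
    replica.apply(version, value);
    this.minVersion = Math.max(this.minVersion, version); // read-your-writes
  }

  // Return the value from the first replica that is fresh enough for this session.
  read() {
    for (const replica of this.replicas) {
      if (replica.version >= this.minVersion) {
        this.minVersion = replica.version; // monotonic reads
        return replica.value;
      }
    }
    throw new Error("no replica is fresh enough for this session");
  }
}

const a = new Replica();
const b = new Replica();
const session1 = new Session([b, a]); // prefers b, which lags behind after the write
const session2 = new Session([b, a]);
session1.write(a, "bar");             // only replica a has the update so far
```

With this setup, `session1.read()` skips the stale replica `b` and is served by `a`, while `session2` may still be served the old value by `b` until the update is applied there with `b.apply(1, "bar")`.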
Eventual Consistency in RDBMS
Eventual consistency is not just a property of NoSQL solutions.
[Diagram: client A talks to a primary; the primary ships its log asynchronously to a backup replica (steps 1, 2, 3)]
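The effect of asynchronous log shipping can be shown with a toy model: the primary acknowledges a write immediately, and the backup only sees it once the log has been replayed. This is a sketch under assumptions; the `Primary` and `Backup` classes are hypothetical and do not reflect how any particular RDBMS implements replication.

```javascript
// The primary applies writes immediately and records them in a log.
class Primary {
  constructor() {
    this.data = new Map();
    this.log = [];
  }

  write(key, value) {
    this.data.set(key, value);
    this.log.push({ key, value }); // shipped to the backup later
  }
}

// The backup replays log entries it has not applied yet, in order.
class Backup {
  constructor() {
    this.data = new Map();
    this.applied = 0; // number of log entries already replayed
  }

  replay(log) {
    while (this.applied < log.length) {
      const { key, value } = log[this.applied];
      this.data.set(key, value);
      this.applied++;
    }
  }
}

const primary = new Primary();
const backup = new Backup();
primary.write("value", "foo");
backup.replay(primary.log);                  // backup is in sync
primary.write("value", "bar");               // acknowledged by the primary only
const staleRead = backup.data.get("value");  // backup has not seen the update yet
backup.replay(primary.log);                  // log shipping catches up
const freshRead = backup.data.get("value");  // backup has converged
```

A read served by the backup between the write and the next replay returns the old value, which is exactly the eventual-consistency window.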
No Strong Consistency in the Face of...
Network Partitions
A writes new value
replicates new value
reads new value
Network Partitions
A writes new value
replicates new value
reads new value
!
Partition Tolerance
A writes new value
fails to replicate new value
reads old value
Partition Intolerance
A: failing attempt to write a new value
fails to replicate new value
How to do better?
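Proper Replication Factor
• N = 4 (number of replicas)
• W = 3 (replicas that must acknowledge a write)
• R = 2 (replicas consulted on a read)
Because R + W > N, every read set overlaps the most recent write set.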
Optimizations
• Optimize read: R = 1, N = W
• Optimize write: W = 1, N = R
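The settings above all satisfy R + W > N, which is what guarantees that a read quorum and a write quorum share at least one replica. The following sketch makes that concrete for a single object; the helper names are hypothetical, and picking "the first W" and "the last R" replicas is a deliberate worst case for overlap, not how a real store selects replicas.

```javascript
// N replicas, each storing a (version, value) pair for a single object.
function makeReplicas(n) {
  return Array.from({ length: n }, () => ({ version: 0, value: "foo" }));
}

// A read quorum and a write quorum are guaranteed to share a replica when R + W > N.
function quorumsOverlap(n, r, w) {
  return r + w > n;
}

// Write to the first W replicas (a stand-in for "any W replicas that acknowledge").
function write(replicas, w, version, value) {
  for (const replica of replicas.slice(0, w)) {
    replica.version = version;
    replica.value = value;
  }
}

// Read from the last R replicas and keep the answer with the highest version.
// Reading from the opposite end is the worst case for overlap with the write above.
function read(replicas, r) {
  const answers = replicas.slice(replicas.length - r);
  return answers.reduce((best, cur) => (cur.version > best.version ? cur : best)).value;
}

const replicas = makeReplicas(4);      // N = 4
write(replicas, 3, 1, "bar");          // W = 3
const strongRead = read(replicas, 2);  // R = 2: 2 + 3 > 4, overlaps the write set
const weakRead = read(replicas, 1);    // R = 1: 1 + 3 is not > 4, may be stale
```

The two optimizations are the extreme ends of the same inequality: R = 1 with W = N makes reads cheap and writes expensive, and W = 1 with R = N does the opposite.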
Consistent Hashing
[Diagram: nodes A to H arranged on a hash ring, with key K hashed to a position on the ring]
[Diagram: the same ring of nodes A to H with W = 3; a write is replicated to three nodes on the ring]
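A minimal consistent-hashing ring can be sketched as follows. Several things here are assumptions rather than anything stated in the slides: the `HashRing` class is hypothetical, the hash is a simple FNV-1a-style function chosen to keep the example self-contained, there are no virtual nodes, and the W replicas are taken to be the key's owner plus its successors on the ring (the Dynamo-style convention).

```javascript
// FNV-1a-style 32-bit string hash, used to place nodes and keys on the ring.
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

class HashRing {
  constructor(nodes) {
    // Each node sits at the position given by the hash of its name.
    this.ring = nodes
      .map((node) => ({ node, position: fnv1a(node) }))
      .sort((x, y) => x.position - y.position);
  }

  // Index of the first node at or after the key's position, wrapping around.
  firstIndex(key) {
    const position = fnv1a(key);
    const index = this.ring.findIndex((entry) => entry.position >= position);
    return index === -1 ? 0 : index;
  }

  // The node that owns the key plus its successors, w nodes in total.
  nodesFor(key, w) {
    const start = this.firstIndex(key);
    const count = Math.min(w, this.ring.length);
    const result = [];
    for (let i = 0; i < count; i++) {
      result.push(this.ring[(start + i) % this.ring.length].node);
    }
    return result;
  }
}

const nodeNames = ["A", "B", "C", "D", "E", "F", "G", "H"];
const ring = new HashRing(nodeNames);
const replicasForK = ring.nodesFor("K", 3); // W = 3
```

The point of the ring is that adding or removing a node only changes the placement of keys near that node's position; keys whose replica set does not include the node are unaffected.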
No free ride
You need to consider giving up on:
• Avoiding redundancy
• Referential integrity
• Strong consistency
• Ad hoc queries
• Joins
• Ease of reporting
• ...
NoSQL Today
Resources
http://nosqltapes.com/
http://nosql-database.org/
http://nosqlsummer.org/
Books