Objectives, outcomes, and key concepts
Monday, April 6, 2015 10:34 AM

Objectives: describe a realistic NoSQL system, and compare it in detail with SQL.

Outcomes: Students should be able to:
- Describe the tradeoffs between SQL and NoSQL.
- Describe the specific advantages and disadvantages of MongoDB.

Key concepts:
- Key/value store.
- Structured blobs (aka 'documents').
- Concepts of NoSQL integrity.
- Embedded documents.
- References.
- How to cope with lack of joins.
MongoDB Page 1
Recall from last time
Monday, April 6, 2015 10:36 AM

The abstract pattern of a NoSQL system includes
- A "domain, key → value" storage system with abstract methods:
    put(domain, key, value)
    value = get(domain, key)
- An "eventual consistency" model of write.

Some things don't change:
- "domain" and "key" are typically strings.
- "domain" is usually set by you and is analogous to a "table".
- "key" may well be automatically generated.

But NoSQL systems vary widely in:
- The structure of the value.
- Built-in support for value queries, indexing, and editing.
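
The put/get pattern can be sketched in a few lines of JavaScript. This is only an in-memory illustration: the class name KeyValueStore, its fields, and the rule that a null key triggers automatic key generation are assumptions made for the example, and it does not model eventual consistency at all.

```javascript
// A minimal in-memory sketch of the abstract "domain, key -> value" pattern.
// KeyValueStore and its fields are hypothetical names for illustration.
class KeyValueStore {
  constructor() {
    this.domains = new Map(); // domain name -> Map(key -> value)
    this.nextKey = 1;         // counter for automatically generated keys
  }

  // put(domain, key, value): store a value; a missing key is generated.
  put(domain, key, value) {
    if (!this.domains.has(domain)) {
      this.domains.set(domain, new Map()); // domain is created on first write
    }
    const actualKey =
      key === null || key === undefined ? String(this.nextKey++) : key;
    this.domains.get(domain).set(actualKey, value);
    return actualKey;
  }

  // get(domain, key): return the stored value, or undefined if absent.
  get(domain, key) {
    const table = this.domains.get(domain);
    return table ? table.get(key) : undefined;
  }
}

const store = new KeyValueStore();
store.put('users', 'joe', { name: 'Joe Bookreader' });
const generatedKey = store.put('users', null, { name: 'Anonymous' });
```

Note that the value is an arbitrary object; the store itself knows nothing about its structure, which is exactly where real NoSQL systems differ from one another.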
Some examples of how values are interpreted
Monday, April 6, 2015 10:48 AM

NoSQL Database | Value structure and/or metaphor | Consistency
BigTable | table row | strong
XMLdb | XML document | strong
Google AppEngine | Java object serialization | strong
MongoDB | JavaScript object serialization | eventual
CouchDB | JavaScript object serialization | eventual
Neo4j | JavaScript object serialization | strong
What is a serialization?
Monday, April 6, 2015 6:08 PM

A serialization is any depiction of a memory
object (in Java, JavaScript, C++, ...) that can
be written to disk, read back, and
reconstructed.

Formal operations:
    string = Serialize(object)
    object = Unserialize(string)

The whole point:
    Unserialize(Serialize(object)) = object,
    Serialize(Unserialize(string)) = string

In JavaScript, if
    a = { 'b': 1, 'c': 2 }
then the serialization of a is '{"b":1,"c":2}'

The whole idea of JavaScript Object Notation
(JSON) is to create a subset of JavaScript
objects
- with a robust serialization
- that can be transmitted over the network
  and reconstructed with complete fidelity.

JSON objects
- have no circular references.
- are pure trees from a structural standpoint.
and thus their pprints are their
serializations(!).
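
A short sketch of the round trip, using the standard JSON.stringify and JSON.parse functions in the roles of Serialize and Unserialize. The variable names are chosen for the example only.

```javascript
const a = { b: 1, c: 2 };

// Serialize: object -> string.
const text = JSON.stringify(a);

// Unserialize: string -> object (a new object with the same content).
const copy = JSON.parse(text);

// Serializing the reconstructed object gives back the same string.
const sameString = JSON.stringify(JSON.parse(text)) === text;

// An object with a circular reference is not a tree, so it has no JSON form.
const loop = { name: 'loop' };
loop.self = loop;
let circularError = null;
try {
  JSON.stringify(loop);
} catch (err) {
  circularError = err; // JSON.stringify refuses circular structures
}
```

The last part shows why the "pure tree" restriction matters: the serializer has no way to write down a structure that refers back to itself.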
MongoDB
Monday, April 6, 2015 10:40 AM

MongoDB started as one thing and ended up as another.
- It started as a classical "XML db" in which
  documents are XML and keys point to
  documents.
- It evolved into a JavaScript database in
  which
  - Values are JavaScript Object Notation
    (JSON) objects (with limits!)
  - CRUD and queries are done in
    JavaScript.
- Thus, the language in the documentation
  can be quite confusing.

SQL | Abstract NoSQL | MongoDB
table | domain | collection
key | key | _id
row | value | document
column | (no default definition) | JavaScript object
A crash course in JavaScript objects
Monday, April 4, 2016 2:30 PM

x = [ 'a', 'b', 3] // an array
x[0] is 'a'
x[1] is 'b'
x[2] is 3

y = { goo: 'ber', humans: 10 } // a dictionary
y['goo'] (also y.goo) is 'ber'
y['humans'] (also y.humans) is 10

Indexes of dicts must be strings.
Otherwise, nested structures are possible, e.g.,
a = { // a dict of arrays
    name: 'Couch',
    addr: ['1600 Pennsylvania Avenue', 'Washington', 'DC'],
    phone: ['555-1212', '411']
}
After this,
a.addr[1] is 'Washington'
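
A small runnable check of the rules above. It also shows what "indexes of dicts must be strings" means in practice: a number used as an index on a plain object is converted to a string key. The key 42 is an arbitrary choice for the example.

```javascript
const y = { goo: 'ber', humans: 10 };

// Dot and bracket notation reach the same entry.
const sameEntry = y.goo === y['goo'];

// A numeric index is converted to a string key on a plain object.
y[42] = 'answer';
const keys = Object.keys(y);

// Nested access follows the structure from the outside in.
const a = {
  name: 'Couch',
  addr: ['1600 Pennsylvania Avenue', 'Washington', 'DC'],
};
const city = a.addr[1];
```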
CRUD in MongoDB
Monday, April 4, 2016 2:42 PM

CRUD = Create/Retrieve/Update/Delete
This is the minimal set of primitive operations
that make something a data store.

Create: (figure) From <https://docs.mongodb.org/manual/core/write-operations-introduction/>
Retrieve: (figure) From <https://docs.mongodb.org/manual/core/read-operations-introduction/>
Update: (figure) From <https://docs.mongodb.org/manual/core/write-operations-introduction/>
Delete: (figure) From <https://docs.mongodb.org/manual/core/write-operations-introduction/>
A MongoDB Rosetta Stone
Monday, April 6, 2015 1:55 PM

SQL:
    INSERT INTO users(user_id, age, status) VALUES ("bcd001", 45, "A")
MongoDB:
    db.users.insert( { user_id: "bcd001", age: 45, status: "A" } )

SQL:
    UPDATE users
    SET status = "C"
    WHERE age > 25
MongoDB:
    db.users.update(
        { age: { $gt: 25 } },
        { $set: { status: "C" } },
        { multi: true }
    )

SQL:
    DELETE FROM users WHERE status = "D"
MongoDB:
    db.users.remove( { status: "D" } )

SQL:
    SELECT user_id, status
    FROM users
    WHERE status = "A"
MongoDB:
    db.users.find(
        { status: "A" },
        { user_id: 1, status: 1, _id: 0 }
    )

SQL:
    SELECT COUNT(*) FROM users WHERE age > 30
MongoDB:
    db.users.find( { age: { $gt: 30 } } ).count()

SQL:
    EXPLAIN SELECT * FROM users
    WHERE status = "A"
MongoDB:
    db.users.find( { status: "A" } ).explain()

SQL:
    CREATE INDEX foo ON users(status)
MongoDB:
    db.users.createIndex( { status: 1 } )

http://docs.mongodb.org/manual/reference/sql-comparison/

Indexing works exactly like postgresql:
- default is BTREE clustered.
Planning works the same:
- seq versus indexed scan.
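
To make the idea of "a query is a JavaScript object" concrete, here is a simplified sketch of how a query document such as { age: { $gt: 30 } } can be interpreted against an array of documents. It is not MongoDB's implementation: the helpers matches and find are hypothetical, they handle only equality plus $gt and $lt on top-level fields, and the users other than bcd001 are made-up sample data.

```javascript
// Hypothetical helper: does one document satisfy a query document?
// Supports only exact equality and the $gt / $lt operators on top-level fields.
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const value = doc[field];
    if (condition !== null && typeof condition === 'object') {
      // An object condition is a set of operators, e.g. { $gt: 30 }.
      return Object.entries(condition).every(([op, operand]) => {
        if (op === '$gt') return value > operand;
        if (op === '$lt') return value < operand;
        throw new Error('unsupported operator: ' + op);
      });
    }
    return value === condition; // a plain value means equality
  });
}

// Hypothetical helper: filter an in-memory array the way find(query) filters a collection.
function find(collection, query) {
  return collection.filter((doc) => matches(doc, query));
}

const users = [
  { user_id: 'bcd001', age: 45, status: 'A' },
  { user_id: 'bcd002', age: 22, status: 'A' }, // made-up sample row
  { user_id: 'bcd003', age: 31, status: 'D' }, // made-up sample row
];

const active = find(users, { status: 'A' });      // WHERE status = "A"
const over30 = find(users, { age: { $gt: 30 } }); // WHERE age > 30
```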
JavaScript data structures
Monday, April 4, 2016 2:38 PM

You might ask how one implements common data structures, e.g., trees....
{ name: 'Alva',
  left: {name: 'Anselm'},
  right: {name: 'Donna',
          left: {name: 'Ben'}
  }
}
And graphs:
{ name: 'Alva',
  'friend-of': 'Anselm' }
{ name: 'Anselm',
  'friend-of': 'Donna' }
....
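
Because the tree is just nested objects, an ordinary recursive function can walk it. The sketch below assumes the tree shape Alva (left: Anselm, right: Donna, whose left child is Ben); the function name inOrder is chosen for the example.

```javascript
const tree = {
  name: 'Alva',
  left: { name: 'Anselm' },
  right: { name: 'Donna', left: { name: 'Ben' } },
};

// In-order traversal: left subtree, then the node itself, then right subtree.
function inOrder(node, out = []) {
  if (!node) return out; // a missing child ends the recursion
  inOrder(node.left, out);
  out.push(node.name);
  inOrder(node.right, out);
  return out;
}

const names = inOrder(tree);
```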
Dolphins versus humans
Monday, April 6, 2015 4:35 PM

... "For instance, on the planet Earth, man had always assumed that he was more
intelligent than dolphins because he had achieved so much -- the wheel, New York,
wars, and so on -- while all the dolphins had ever done was muck about in the water
having a good time. But conversely, the dolphins had always believed that they were
far more intelligent than man -- for precisely the same reasons."
Douglas Adams, "The Hitchhiker's Guide to the Galaxy"
Good news and bad news
Monday, April 6, 2015 10:46 AM

With MongoDB, the problem is that in many ways, the good
news is the bad news.

Good news | Bad news
You don't have to define structure of collections in advance | ... but there is no concept of structural consistency of collections.
Values are JavaScript objects | ... and unlimited in structure, with all the deleterious effects of that.
Queries are JavaScript objects | ... and it is particularly ugly and non-standard JavaScript.
You only have to know JavaScript. | ... but what if your application is not in JavaScript?
You don't have to deal with joins, | ... because you're expected to do without them completely and use object polymorphism instead.
You have to put all consistency logic into the application | ... and that is not good.
No triggers to deal with | ... ditto
Interacting with MongoDB
Monday, April 6, 2015 2:11 PM

- via the interactive Mongo Shell "mongo"
- via a REST interface (using similar notation)

ssh comp115-05
mongo
> use couch
switched to db couch
> j = { name : "mongo" }
{ "name" : "mongo" }
> k = { x : 3 }
{ "x" : 3 }
> db.testData.insert( j )
> db.testData.insert( k )
> db.testData.find()
{ "_id" : ObjectId("5522c510aa87fa5f85e52cba"), "name" : "mongo" }
{ "_id" : ObjectId("5522c511aa87fa5f85e52cbb"), "x" : 3 }

Note that:
- The two records have different schemas.
- MongoDB doesn't mind that.... your
  application might!
Interacting with MongoDB
Monday, April 6, 2015 1:27 PM

A rather confusing property:
- Databases and collections spring into
  existence automatically after you write to
  them.
- So, you had better not misspell anything...!
- This goes double for column names.

This translates to extreme brittleness of
application.

Caveat: always encapsulate writes in a
function so that you can't misspell keys!
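
One way to follow that caveat is sketched below. Everything here is an assumption for illustration: the allowed key list USER_KEYS, the helpers makeUser and insertUser, and the stand-in collection object, which only needs an insert(doc) method like the shell's collection objects.

```javascript
// Hypothetical list of the only keys this application allows in a user document.
const USER_KEYS = ['user_id', 'age', 'status'];

// Build a user document, rejecting misspelled or missing keys before any write.
function makeUser(fields) {
  for (const key of Object.keys(fields)) {
    if (!USER_KEYS.includes(key)) {
      throw new Error('unknown key: ' + key);
    }
  }
  for (const key of USER_KEYS) {
    if (!(key in fields)) {
      throw new Error('missing key: ' + key);
    }
  }
  return { ...fields };
}

// Hypothetical write wrapper: the only place in the application that inserts users.
function insertUser(collection, fields) {
  collection.insert(makeUser(fields));
}

// Stand-in for a collection; it just records what was inserted.
const fakeCollection = { docs: [], insert(doc) { this.docs.push(doc); } };
insertUser(fakeCollection, { user_id: 'bcd001', age: 45, status: 'A' });

let rejected = false;
try {
  // 'staus' is a deliberate misspelling of 'status'.
  insertUser(fakeCollection, { user_id: 'bcd002', age: 30, staus: 'A' });
} catch (err) {
  rejected = true; // the bad document never reaches the collection
}
```

The design choice is that the check happens in the application, before the write, because the database itself will accept the misspelled document without complaint.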
Joins versus embedded documents
Monday, April 6, 2015 2:09 PM

The SQL way to model one-many relationships is via
joins.
The MongoDB way is via so-called "embedded
documents".

For example, the relational schema:
{
    _id: "joe",
    name: "Joe Bookreader"
}
{
    patron_id: "joe",
    street: "123 Fake Street",
    city: "Faketon",
    state: "MA",
    zip: "12345"
}
{
    patron_id: "joe",
    street: "1 Some Other Street",
    city: "Boston",
    state: "MA",
    zip: "12345"
}
could be modeled via embedding as:
{
    _id: "joe",
    name: "Joe Bookreader",
    addresses: [
        {
            street: "123 Fake Street",
            city: "Faketon",
            state: "MA",
            zip: "12345"
        },
        {
            street: "1 Some Other Street",
            city: "Boston",
            state: "MA",
            zip: "12345"
        }
    ]
}
http://docs.mongodb.org/manual/tutorial/model-embedded-one-to-many-relationships-between-documents/

But this only works for trees. Otherwise one repeats
data.
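
The translation from the relational form to the embedded form can be sketched as a small in-memory transformation. The helper name embedAddresses is hypothetical, and the arrays stand in for the two relational tables.

```javascript
const patrons = [{ _id: 'joe', name: 'Joe Bookreader' }];
const addresses = [
  { patron_id: 'joe', street: '123 Fake Street', city: 'Faketon', state: 'MA', zip: '12345' },
  { patron_id: 'joe', street: '1 Some Other Street', city: 'Boston', state: 'MA', zip: '12345' },
];

// Hypothetical helper: fold each patron's address rows into an embedded array.
function embedAddresses(patronRows, addressRows) {
  return patronRows.map((patron) => ({
    ...patron,
    addresses: addressRows
      .filter((row) => row.patron_id === patron._id) // the "join" condition
      .map(({ patron_id, ...rest }) => rest),        // drop the foreign key
  }));
}

const embedded = embedAddresses(patrons, addresses);
```

The foreign key disappears because containment now expresses the relationship; the price is that an address shared by two patrons would have to be stored twice.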
Embedding references
Monday, April 6, 2015 2:26 PM

Another way of doing things is to embed references instead
of documents (typically to the automatically indexed _id
field).
{
    title: "MongoDB: The Definitive Guide",
    author: [ "Kristina Chodorow", "Mike Dirolf" ],
    published_date: ISODate("2010-09-24"),
    pages: 216,
    language: "English",
    publisher: {
        name: "O'Reilly Media",
        founded: 1980,
        location: "CA"
    }
}
{
    title: "50 Tips and Tricks for MongoDB Developer",
    author: "Kristina Chodorow",
    published_date: ISODate("2011-05-06"),
    pages: 68,
    language: "English",
    publisher: {
        name: "O'Reilly Media",
        founded: 1980,
        location: "CA"
    }
}
becomes
{
    name: "O'Reilly Media",
    founded: 1980,
    location: "CA",
    books: [123456789, 234567890, ...]
}
{
    _id: 123456789,
    title: "MongoDB: The Definitive Guide",
    author: [ "Kristina Chodorow", "Mike Dirolf" ],
    published_date: ISODate("2010-09-24"),
    pages: 216,
    language: "English"
}
{
    _id: 234567890,
    title: "50 Tips and Tricks for MongoDB Developer",
    author: "Kristina Chodorow",
    published_date: ISODate("2011-05-06"),
    pages: 68,
    language: "English"
}

First look up documents containing references.
Then follow the references and associate them in
memory.

Some caveats:
- You have to follow references yourself. There is no "join".
- There is no query optimization of joins.
- Every following of a reference is a query.
- References are mono-directional and costly to reverse; it's
  usually necessary to explicitly back-reference for
  efficiency, e.g., from book back to publisher.

Good news: you can unravel this via JavaScript.
Bad news: you must!

Caveat: following references requires a function which can
remember what references what!
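
A sketch of following references by hand. Plain arrays stand in for the publisher and book collections, and findById is a hypothetical stand-in for a lookup by _id; the counter is there only to make the point that each followed reference costs one query.

```javascript
// Stand-in collections: plain arrays instead of a live database.
const publishers = [
  { name: "O'Reilly Media", founded: 1980, location: 'CA', books: [123456789, 234567890] },
];
const books = [
  { _id: 123456789, title: 'MongoDB: The Definitive Guide', pages: 216 },
  { _id: 234567890, title: '50 Tips and Tricks for MongoDB Developer', pages: 68 },
];

// Hypothetical stand-in for a lookup by _id; against a real server each call
// would be a separate query.
let queryCount = 0;
function findById(collection, id) {
  queryCount += 1;
  return collection.find((doc) => doc._id === id);
}

// Follow every reference in publisher.books and attach the results in memory.
// The function has to know that "books" holds ids into the books collection.
function withBooks(publisher) {
  return { ...publisher, books: publisher.books.map((id) => findById(books, id)) };
}

const resolved = withBooks(publishers[0]);
```

Going the other way, from a book to its publisher, would require scanning every publisher's books array, which is why an explicit back-reference is usually stored as well.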
Modeling hierarchies
Monday, April 6, 2015 2:37 PM

Via parent references:
{
    _id: 234,
    parent: 123,
    name: 'Alva'
}
{
    _id: 456,
    parent: 123,
    name: 'Anselm'
}
{
    _id: 123,
    name: 'Linda'
}
Via array of child references:
{
    _id: 234,
    children: [],
    name: 'Alva'
}
{
    _id: 456,
    children: [],
    name: 'Anselm'
}
{
    _id: 123,
    children: [234, 456],
    name: 'Linda'
}
See http://docs.mongodb.org/manual/applications/data-models-tree-structures/
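
The two forms carry the same information, so one can be derived from the other. The sketch below converts parent references into arrays of child references in memory; the helper name toChildReferences is hypothetical.

```javascript
const people = [
  { _id: 234, parent: 123, name: 'Alva' },
  { _id: 456, parent: 123, name: 'Anselm' },
  { _id: 123, name: 'Linda' },
];

// Hypothetical helper: derive the child-reference form from parent references.
function toChildReferences(docs) {
  return docs.map(({ parent, ...rest }) => ({
    ...rest,
    // A document's children are the documents that name it as parent.
    children: docs.filter((d) => d.parent === rest._id).map((d) => d._id),
  }));
}

const withChildren = toChildReferences(people);
const linda = withChildren.find((d) => d._id === 123);
```

Which form to store depends on the direction the application usually walks: parent references make "who is my parent?" a single lookup, while child arrays make "who are my children?" a single lookup.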
Aside: a lot of the MongoDB documentation is
"patterns".
- A pattern is a skeleton of an algorithm
  along with blanks to fill in and a scope of
  application.
Ensuring atomicity
Monday, April 6, 2015 2:48 PM

Document modifications are atomic, while basically nothing else is.
So, to keep transformations atomic, coupled
data must be contained in the same
document!
So "MongoDB normal form" looks nothing like
4NF.
Where MongoDB excels
Monday, April 6, 2015 11:16 AM

When you want to reflect JavaScript objects for whatever reason, e.g.:
- ... your application has to talk JSON (to
  other applications, e.g.)
- ... you only know JavaScript, Node.js

Where you need significant server-side
computation, e.g.,
- ... the ability to compute in JavaScript
  before returning results is valuable.
- ... you need to translate data from one
  form to another on the server-side for
  speed and/or portability.

When you need to keep records of
polymorphic transactions whose structure is
not predictable.
- ... for troubleshooting other programs.
- ... for cataloguing web resources.

And it's acceptable if:
- ... you don't have relational structures to
  manage.
- ... or the query patterns for relational
  structures are known and permanent (so
  you can use the "embedded documents"
  pattern).

But it has great difficulty when:
- ... there are non-trivial or complex
  relationships between objects that should
  be represented:
  e.g., anything that's not a tree.
- ... one needs to look at data in several
  distinct ways.
A very practical application of MongoDB
Monday, April 6, 2015 3:11 PM

The Tethys application is a JavaScript
framework for scientific visualization that:
- Expects JSON input.
- Produces visualizations.

In this case, MongoDB is perfect, in that one
can:
- Precompute the JSON objects.
- Serve them up via MongoDB/REST.
- Make the client's life considerably easier.

Note that:
a) the objects have fixed and predetermined
   formats that can be computed in advance,
   so
b) data transformation can be done offline,
   and
c) we can utilize our own integrity
   mechanisms, stepping aside from
   MongoDB's lack of integrity mechanisms.

Caveats: I must be very careful to:
a) create MongoDB objects consistently.
b) maintain my own concept of data
   consistency separate from MongoDB's.
c) structure the application for robustness.
MongoDB guarantees
Monday, April 6, 2015 11:50 AM

- All updates to "documents" are atomic.
- "Write concerns" define guarantees to
  require.
- Writes are guaranteed to complete at the
  script level.
  - But scripts can be concurrent!
- Can limit concurrency (but only on one
  server).
- Can "simulate" two-phase commit.
  - But it's completely impractical.
Write concerns
Monday, April 6, 2015 11:26 AM

A strange concept... to solve a uniquely NoSQL problem.

A "write concern" controls how a write is processed in the
write action.
- Unacknowledged: continue without waiting for
  acknowledgement.
- Acknowledged: on a single server, the in-memory copy
  of the data is up to date and will be returned for
  subsequent reads.
- Journaled: As well, this write is committed to a journal
  and protected against power failures.
- Replica Acknowledged: As well, the write is replicated
  to one or more replica servers (specified).

In short, one can specify the level of resilience to assure
before continuing the script.
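
In the shell, a write concern is itself a small JavaScript object passed along with the write. The sketch below maps the four levels onto write concern documents using the w (number of acknowledgements) and j (journal) fields from the MongoDB manual. The table name WRITE_CONCERNS, the helper writeOptions, and the choice of w: 2 for the replica case are assumptions for illustration.

```javascript
// Write concern documents for the four levels, using the documented
// `w` (acknowledgement count) and `j` (journal) fields.
const WRITE_CONCERNS = {
  unacknowledged: { w: 0 },
  acknowledged: { w: 1 },
  journaled: { w: 1, j: true },
  replicaAcknowledged: { w: 2 }, // primary plus one replica; the count is a choice
};

// Hypothetical helper: build the options argument passed alongside a write.
function writeOptions(level) {
  if (!Object.prototype.hasOwnProperty.call(WRITE_CONCERNS, level)) {
    throw new Error('unknown level: ' + level);
  }
  return { writeConcern: WRITE_CONCERNS[level] };
}

const options = writeOptions('journaled');
// In the mongo shell this would be used as, for example:
//   db.users.insert({ user_id: "bcd001", age: 45, status: "A" }, options)
```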
Pictures of write concerns
Monday, April 6, 2015 4:44 PM

Unacknowledged: (figure) From <http://docs.mongodb.org/manual/core/write-concern/>
Acknowledged: (figure) From <http://docs.mongodb.org/manual/core/write-concern/>
Journaled: (figure) From <http://docs.mongodb.org/manual/core/write-concern/>
Replica Acknowledged: (figure) From <http://docs.mongodb.org/manual/core/write-concern/>

But note that
- Writes do not block reads.
- Reads continue -- on stale data -- while
  writes are being executed.
Thus this is eventually consistent.

All that write concerns tell you is whether the
write committed, but not when.
MongoDB and constraints
Monday, April 6, 2015 4:51 PM

Not many constraint options, but
- Indexing does allow one to declare
  columns "UNIQUE" as part of creating the
  index.
MongoDB and transactions
Monday, April 6, 2015 4:50 PM

Basically, MongoDB doesn't have transactions:
- Two execution modes for inserts: serial
  and parallel.
  - Serial execution mode stops after first
    integrity violation, without rollback.
  - Parallel execution mode keeps going
    regardless of the number of integrity
    violations rejected.
  - In both cases, no rollback.
- While you can "simulate" a two-phase
  commit, you have to do the rollback if the
  commit fails.

See
http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/
and weep!
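
The difference between the two insert modes can be simulated in memory. This is a sketch, not MongoDB code: a Map with unique _id keys stands in for the collection, the function insertMany here is a local hypothetical helper, and the loop runs sequentially in both modes, so "parallel" only means "does not stop at the first violation". (The MongoDB manual calls the two modes ordered and unordered.)

```javascript
// Simulated bulk insert into a collection with a unique constraint on _id.
function insertMany(docs, { serial }) {
  const stored = new Map();
  const errors = [];
  for (const doc of docs) {
    if (stored.has(doc._id)) {
      errors.push('duplicate _id: ' + doc._id);
      if (serial) break; // serial mode stops at the first violation
      continue;          // parallel mode skips the bad document and keeps going
    }
    stored.set(doc._id, doc); // earlier successful inserts are never undone
  }
  return { inserted: [...stored.keys()], errors };
}

const batch = [{ _id: 1 }, { _id: 2 }, { _id: 1 }, { _id: 3 }, { _id: 2 }, { _id: 4 }];
const serialResult = insertMany(batch, { serial: true });
const parallelResult = insertMany(batch, { serial: false });
```

In both results the documents inserted before a violation remain in place, which is the "no rollback" behavior: the batch is neither fully applied nor fully rejected.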
Sneaky MongoDB tricks
Monday, April 6, 2015 5:00 PM

- Capped Collections and Tailable Cursors
- Sharded clusters
Capped collections and tailable cursors
Monday, April 6, 2015 5:05 PM

A capped collection is the MongoDB version of a ring buffer:
- Arbitrary JSON content.
- First-in, first-out.
- Finite number of records.

A tailable cursor is a cursor that stays active
awaiting data from asynchronous sources:
- Only for capped collections.
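
For readers who have not met a ring buffer, here is a minimal sketch of the data structure the capped collection is being compared to. The class name CappedBuffer and its methods are hypothetical; a real capped collection is sized in bytes and lives on the server, which this example does not attempt to model.

```javascript
// Hypothetical CappedBuffer: keeps at most `capacity` records, oldest dropped first.
class CappedBuffer {
  constructor(capacity) {
    this.capacity = capacity;
    this.slots = new Array(capacity);
    this.count = 0; // total number of records ever inserted
  }

  insert(record) {
    // Once full, the slot being written is the one holding the oldest record.
    this.slots[this.count % this.capacity] = record;
    this.count += 1;
  }

  // Return the surviving records in insertion order (first in, first out).
  toArray() {
    const size = Math.min(this.count, this.capacity);
    const out = [];
    for (let i = this.count - size; i < this.count; i++) {
      out.push(this.slots[i % this.capacity]);
    }
    return out;
  }
}

const log = new CappedBuffer(3);
for (const n of [1, 2, 3, 4, 5]) log.insert({ n });
const survivors = log.toArray().map((r) => r.n);
```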
Sharding
Monday, April 6, 2015 5:06 PM

A mechanism for storing very large data sets.
(figure) From <http://docs.mongodb.org/manual/core/distributed-write-operations/>