
Objectives, outcomes, and key concepts
Monday, April 6, 2015 10:34 AM

Objectives: describe a realistic NoSQL system, and compare it in detail with SQL.

Outcomes: Students should be able to:
- Describe the tradeoffs between SQL and NoSQL.
- Describe the specific advantages and disadvantages of MongoDB.

Key concepts:
- Key/value store.
- Structured blobs (aka 'documents').
- Concepts of NoSQL integrity.
- Embedded documents.
- References.
- How to cope with lack of joins.

MongoDB Page 1


Recall from last time
Monday, April 6, 2015 10:36 AM

The abstract pattern of a NoSQL system includes:
- A "domain, key → value" storage system with abstract methods:
    put(domain, key, value)
    value = get(domain, key)
- An "eventual consistency" model of write.

Some things don't change:
- "domain" and "key" are typically strings.
- "domain" is usually set by you and is analogous to a "table".
- "key" may well be automatically generated.

But NoSQL systems vary widely in:
- The structure of the value.
- Built-in support for value queries, indexing, and editing.
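The abstract put/get pattern can be sketched in a few lines of plain JavaScript. This is only an in-memory illustration: the class name KeyValueStore and the rule for generating keys are assumptions made for the example, not any real product's API, and the "eventual consistency" part of the pattern is not modeled at all.

```javascript
// Hypothetical in-memory "domain, key -> value" store (illustration only).
class KeyValueStore {
  constructor() {
    this.domains = new Map(); // domain name -> Map of key -> value
    this.nextId = 1;          // counter used for automatically generated keys
  }

  // Store a value; when no key is given, generate one automatically.
  put(domain, key, value) {
    if (!this.domains.has(domain)) {
      this.domains.set(domain, new Map()); // domains are created on first write
    }
    const actualKey = key === undefined || key === null ? String(this.nextId++) : key;
    this.domains.get(domain).set(actualKey, value);
    return actualKey;
  }

  // Fetch a value; unknown domains and keys yield undefined.
  get(domain, key) {
    const entries = this.domains.get(domain);
    return entries ? entries.get(key) : undefined;
  }
}

const store = new KeyValueStore();
store.put('users', 'joe', { name: 'Joe Bookreader' });
const autoKey = store.put('users', null, { name: 'Alva' });
console.log(store.get('users', 'joe'), autoKey);
```

Note that the store never inspects the value: what the value looks like, and whether it can be queried or indexed, is exactly where real systems differ.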


Some examples of how values are interpreted
Monday, April 6, 2015 10:48 AM

NoSQL database     Value structure and/or metaphor    Consistency
BigTable           table row                          strong
XMLdb              XML document                       strong
Google AppEngine   Java object serialization          strong
MongoDB            JavaScript object serialization    eventual
CouchDB            JavaScript object serialization    eventual
Neo4j              JavaScript object serialization    strong


What is a serialization?
Monday, April 6, 2015 6:08 PM

A serialization is any depiction of a memory object (in Java, JavaScript, C++, ...) that can be written to disk, read back, and reconstructed.

Formal operations:
    string = Serialize(object)
    object = Unserialize(string)

The whole point:
    Unserialize(Serialize(object)) = object,
    Serialize(Unserialize(string)) = string

In JavaScript, if
    a = { 'b': 1, 'c': 2 }
then the serialization of a is "{ 'b': 1, 'c': 2 }"

The whole idea of JavaScript Object Notation (JSON) is to create a subset of JavaScript objects
- with a robust serialization
- that can be transmitted over the network and reconstructed with complete fidelity.

JSON objects
- have no circular references.
- are pure trees from a structural standpoint.
and thus their pretty-prints are their serializations(!).
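A short sketch of the round trip, using the standard JSON.stringify and JSON.parse functions as Serialize and Unserialize. The variable names are chosen for the example. Strict JSON uses double quotes around keys, so the serialized text differs slightly from the object literal as typed.

```javascript
// Serialize and unserialize a plain object with the built-in JSON functions.
const a = { b: 1, c: 2 };
const text = JSON.stringify(a); // '{"b":1,"c":2}'
const copy = JSON.parse(text);  // a new object with the same content

// Round trip: serializing the reconstructed object gives the same string back.
const roundTrips = JSON.stringify(copy) === text;

// An object with a circular reference is not a tree, so it has no JSON form.
const loop = { name: 'loop' };
loop.self = loop;
let circularRejected = false;
try {
  JSON.stringify(loop);
} catch (err) {
  circularRejected = err instanceof TypeError;
}

console.log(text, roundTrips, circularRejected);
```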


MongoDB
Monday, April 6, 2015 10:40 AM

MongoDB started as one thing and ended up as another.
- It started as a classical "XML db" in which documents are XML and keys point to documents.
- It evolved into a JavaScript database in which
    - Values are JavaScript Object Notation (JSON) objects (with limits!)
    - CRUD and queries are done in JavaScript.
- Thus, the language in the documentation can be quite confusing.

SQL       Abstract NoSQL             MongoDB
table     domain                     collection
key       key                        _id
row       value                      document
column    (no default definition)    JavaScript object


A crash course in JavaScript objects
Monday, April 4, 2016 2:30 PM

x = [ 'a', 'b', 3 ]   // an array
x[0] is 'a'
x[1] is 'b'
x[2] is 3

y = { goo: 'ber', humans: 10 }   // a dictionary
y['goo'] (also y.goo) is 'ber'
y['humans'] (also y.humans) is 10

Indexes of dicts must be strings.
Otherwise, nested structures are possible, e.g.,

a = {   // a dict of arrays
  name: 'Couch',
  addr: ['1600 Pennsylvania Avenue', 'Washington', 'DC'],
  phone: ['555-1212', '411']
}

After this,
a.addr[1] is 'Washington'
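The rule that dictionary indexes are strings can be checked directly: a numeric index is quietly converted to a string key. The snippet below is a small runnable check that reuses the objects from this page.

```javascript
const y = { goo: 'ber', humans: 10 };
y[1] = 'one';                        // the number 1 is stored under the string key '1'
const keys = Object.keys(y);         // integer-like keys are listed first
const sameSlot = y['1'] === y[1];    // both spellings reach the same entry

const a = {
  name: 'Couch',
  addr: ['1600 Pennsylvania Avenue', 'Washington', 'DC'],
  phone: ['555-1212', '411'],
};
const city = a.addr[1];              // index into the nested array

console.log(keys, sameSlot, city);
```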


CRUD in MongoDB
Monday, April 4, 2016 2:42 PM

CRUD = Create/Retrieve/Update/Delete
This is the minimal set of primitive operations that make something a data store.

Create: From <https://docs.mongodb.org/manual/core/write-operations-introduction/>
Retrieve: From <https://docs.mongodb.org/manual/core/read-operations-introduction/>
Update: From <https://docs.mongodb.org/manual/core/write-operations-introduction/>
Delete: From <https://docs.mongodb.org/manual/core/write-operations-introduction/>


A MongoDB Rosetta Stone
Monday, April 6, 2015 1:55 PM

SQL:
    INSERT INTO users(user_id, age, status)
    VALUES ("bcd001", 45, "A")
MongoDB:
    db.users.insert( { user_id: "bcd001", age: 45, status: "A" } )

SQL:
    UPDATE users
    SET status = "C"
    WHERE age > 25
MongoDB:
    db.users.update(
      { age: { $gt: 25 } },
      { $set: { status: "C" } },
      { multi: true }
    )

SQL:
    DELETE FROM users
    WHERE status = "D"
MongoDB:
    db.users.remove( { status: "D" } )

SQL:
    SELECT user_id, status
    FROM users
    WHERE status = "A"
MongoDB:
    db.users.find(
      { status: "A" },
      { user_id: 1, status: 1, _id: 0 }
    )

SQL:
    SELECT COUNT(*)
    FROM users
    WHERE age > 30
MongoDB:
    db.users.find( { age: { $gt: 30 } } ).count()

SQL:
    EXPLAIN SELECT *
    FROM users
    WHERE status = "A"
MongoDB:
    db.users.find( { status: "A" } ).explain()

SQL:
    CREATE INDEX foo ON users(status)
MongoDB:
    db.users.createIndex( { status: 1 } )

Source: http://docs.mongodb.org/manual/reference/sql-comparison/

Indexing works exactly like PostgreSQL:
- default is BTREE clustered.

Planning works the same:
- seq versus indexed scan.
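To see what the query and update objects mean without a running server, here is a hedged sketch of a tiny in-memory collection. TinyCollection is a hypothetical stand-in written for this example, not the MongoDB driver or shell: it understands only equality filters, { $gt: n } filters, and { $set: {...} } updates.

```javascript
// Hypothetical in-memory stand-in for a collection; NOT the MongoDB API.
class TinyCollection {
  constructor() {
    this.docs = [];
    this.nextId = 1;
  }

  // Does one document satisfy a filter such as { status: "A", age: { $gt: 25 } }?
  static matches(doc, filter) {
    return Object.entries(filter).every(([field, cond]) => {
      if (cond !== null && typeof cond === 'object' && '$gt' in cond) {
        return doc[field] > cond.$gt;
      }
      return doc[field] === cond;
    });
  }

  // INSERT: store a copy of the document with a generated _id.
  insert(doc) {
    const stored = { _id: this.nextId++, ...doc };
    this.docs.push(stored);
    return stored._id;
  }

  // SELECT ... WHERE: return every matching document.
  find(filter = {}) {
    return this.docs.filter((d) => TinyCollection.matches(d, filter));
  }

  // UPDATE ... SET ... WHERE: without { multi: true } only the first match changes.
  update(filter, change, options = {}) {
    let modified = 0;
    for (const doc of this.docs) {
      if (!TinyCollection.matches(doc, filter)) continue;
      Object.assign(doc, change.$set);
      modified += 1;
      if (!options.multi) break;
    }
    return modified;
  }

  // DELETE ... WHERE: drop every matching document.
  remove(filter) {
    const before = this.docs.length;
    this.docs = this.docs.filter((d) => !TinyCollection.matches(d, filter));
    return before - this.docs.length;
  }
}

const users = new TinyCollection();
users.insert({ user_id: 'bcd001', age: 45, status: 'A' });
users.insert({ user_id: 'bcd002', age: 20, status: 'A' });
users.insert({ user_id: 'bcd003', age: 31, status: 'D' });
users.insert({ user_id: 'bcd004', age: 52, status: 'B' });

const removed = users.remove({ status: 'D' });
const updated = users.update({ age: { $gt: 25 } }, { $set: { status: 'C' } }, { multi: true });
const olderThan30 = users.find({ age: { $gt: 30 } }).length;
const stillA = users.find({ status: 'A' }).map((d) => d.user_id);
console.log(removed, updated, olderThan30, stillA);
```

The point of the sketch is that a query is itself a JavaScript object that gets interpreted field by field, which is why the notation looks so unlike SQL.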


JavaScript data structures
Monday, April 4, 2016 2:38 PM

You might ask how one implements common data structures, e.g., trees....

{ name: 'Alva',
  left: { name: 'Anselm' },
  right: { name: 'Donna',
           left: { name: 'Ben' }
  }
}

And graphs:

{ name: 'Alva',
  'friend-of': 'Anselm' }
{ name: 'Anselm',
  'friend-of': 'Donna' }
....
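A minimal sketch of how an application might walk these two structures. The function names are invented for the example, and the record for 'Donna' in the graph is an assumption added so that the chain has an end; the notes elide it.

```javascript
// The tree is one nested object: children are embedded in their parent.
const tree = {
  name: 'Alva',
  left: { name: 'Anselm' },
  right: { name: 'Donna', left: { name: 'Ben' } },
};

// Pre-order walk: visit a node, then its left subtree, then its right subtree.
function names(node) {
  if (!node) return [];
  return [node.name, ...names(node.left), ...names(node.right)];
}

// The graph is a set of separate records that refer to each other by name.
const people = [
  { name: 'Alva', 'friend-of': 'Anselm' },
  { name: 'Anselm', 'friend-of': 'Donna' },
  { name: 'Donna' }, // assumed record, not in the original notes
];
const byName = new Map(people.map((p) => [p.name, p]));

// Follow 'friend-of' edges from a starting person, stopping on a repeat.
function friendChain(start) {
  const chain = [];
  const seen = new Set();
  let current = byName.get(start);
  while (current && !seen.has(current.name)) {
    chain.push(current.name);
    seen.add(current.name);
    current = byName.get(current['friend-of']);
  }
  return chain;
}

console.log(names(tree), friendChain('Alva'));
```

The tree needs no lookups because it is stored as a single value; the graph needs one lookup per edge, which is the cost the later pages on references discuss.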


Dolphins versus humans
Monday, April 6, 2015 4:35 PM

... "For instance, on the planet Earth, man had always assumed that he was more intelligent than dolphins because he had achieved so much -- the wheel, New York, wars, and so on -- while all the dolphins had ever done was muck about in the water having a good time. But conversely, the dolphins had always believed that they were far more intelligent than man -- for precisely the same reasons."

Douglas Adams, "The Hitchhiker's Guide to the Galaxy"


Good news and bad news
Monday, April 6, 2015 10:46 AM

With MongoDB, the problem is that in many ways, the good news is the bad news.

- Good news: You don't have to define structure of collections in advance
  Bad news: ... but there is no concept of structural consistency of collections.
- Good news: Values are JavaScript objects
  Bad news: ... and unlimited in structure, with all the deleterious effects of that.
- Good news: Queries are JavaScript objects
  Bad news: ... and it is particularly ugly and non-standard JavaScript.
- Good news: You only have to know JavaScript.
  Bad news: ... but what if your application is not in JavaScript?
- Good news: You don't have to deal with joins,
  Bad news: ... because you're expected to do without them completely and use object polymorphism instead.
- Good news: You have to put all consistency logic into the application
  Bad news: ... and that is not good.
- Good news: No triggers to deal with
  Bad news: ... ditto


Interacting with MongoDB
Monday, April 6, 2015 2:11 PM

- via the interactive Mongo Shell "mongo"
- via a REST interface (using similar notation)

ssh comp115-05
mongo
> use couch
switched to db couch
> j = { name : "mongo" }
{ "name" : "mongo" }
> k = { x : 3 }
{ "x" : 3 }
> db.testData.insert( j )
> db.testData.insert( k )
> db.testData.find()
{ "_id" : ObjectId("5522c510aa87fa5f85e52cba"), "name" : "mongo" }
{ "_id" : ObjectId("5522c511aa87fa5f85e52cbb"), "x" : 3 }

Note that:
- The two records have different schemas.
- MongoDB doesn't mind that.... your application might!


Interacting with MongoDB
Monday, April 6, 2015 1:27 PM

A rather confusing property:
- Databases and collections spring into existence automatically after you write to them.
- So, you had better not misspell anything...!
- This goes double for column names.

This translates to extreme brittleness of the application.

Caveat: always encapsulate writes in a function so that you can't misspell keys!
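One way to follow that caveat is sketched below. The field list, makeUser, and saveUser are hypothetical names for this example; the actual write is passed in as a function, so the sketch runs without a server (in the shell one might pass a function that calls db.users.insert).

```javascript
// Hypothetical allow-list of field names for a "users" collection.
const USER_FIELDS = new Set(['user_id', 'age', 'status']);

// Build a user document, refusing any key that is not on the allow-list.
function makeUser(fields) {
  for (const key of Object.keys(fields)) {
    if (!USER_FIELDS.has(key)) {
      throw new Error(`unknown field "${key}" for users`);
    }
  }
  return { ...fields };
}

// All writes go through here; `insert` is whatever function really stores the document.
function saveUser(insert, fields) {
  return insert(makeUser(fields));
}

// Demonstration with an array standing in for the collection.
const written = [];
const fakeInsert = (doc) => written.push(doc);

saveUser(fakeInsert, { user_id: 'bcd001', age: 45, status: 'A' });

let rejected = false;
try {
  saveUser(fakeInsert, { user_id: 'bcd002', stauts: 'A' }); // misspelled "status"
} catch (err) {
  rejected = true;
}
console.log(written.length, rejected);
```

The database would have accepted the misspelled document without complaint; the wrapper is what turns the typo into an error.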


Joins versus embedded documents
Monday, April 6, 2015 2:09 PM

The SQL way to model one-to-many relationships is via joins.
The MongoDB way is via so-called "embedded documents".

For example, the relational schema:

{
  _id: "joe",
  name: "Joe Bookreader"
}
{
  patron_id: "joe",
  street: "123 Fake Street",
  city: "Faketon",
  state: "MA",
  zip: "12345"
}
{
  patron_id: "joe",
  street: "1 Some Other Street",
  city: "Boston",
  state: "MA",
  zip: "12345"
}

could be modeled via embedding as:

{
  _id: "joe",
  name: "Joe Bookreader",
  addresses: [
    {
      street: "123 Fake Street",
      city: "Faketon",
      state: "MA",
      zip: "12345"
    },
    {
      street: "1 Some Other Street",
      city: "Boston",
      state: "MA",
      zip: "12345"
    }
  ]
}

http://docs.mongodb.org/manual/tutorial/model-embedded-one-to-many-relationships-between-documents/

But this only works for trees. Otherwise one repeats data.
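The transformation from the relational form to the embedded form can be written as a short function. This is a sketch over plain arrays; embedAddresses is a name invented for the example.

```javascript
// Relational-style rows: addresses point at their patron via patron_id.
const patrons = [{ _id: 'joe', name: 'Joe Bookreader' }];
const addresses = [
  { patron_id: 'joe', street: '123 Fake Street', city: 'Faketon', state: 'MA', zip: '12345' },
  { patron_id: 'joe', street: '1 Some Other Street', city: 'Boston', state: 'MA', zip: '12345' },
];

// Fold each patron's addresses into the patron document itself.
function embedAddresses(patronRows, addressRows) {
  return patronRows.map((patron) => ({
    ...patron,
    addresses: addressRows
      .filter((address) => address.patron_id === patron._id)
      .map(({ patron_id, ...rest }) => rest), // the foreign key is no longer needed
  }));
}

const embedded = embedAddresses(patrons, addresses);
console.log(JSON.stringify(embedded[0], null, 2));
```

In effect the join is performed once, when the document is written, instead of on every read.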


Embedding references
Monday, April 6, 2015 2:26 PM

Another way of doing things is to embed references instead of documents (typically to the automatically indexed _id field).

{
  title: "MongoDB: The Definitive Guide",
  author: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  publisher: {
    name: "O'Reilly Media",
    founded: 1980,
    location: "CA"
  }
}
{
  title: "50 Tips and Tricks for MongoDB Developer",
  author: "Kristina Chodorow",
  published_date: ISODate("2011-05-06"),
  pages: 68,
  language: "English",
  publisher: {
    name: "O'Reilly Media",
    founded: 1980,
    location: "CA"
  }
}

becomes

{
  name: "O'Reilly Media",
  founded: 1980,
  location: "CA",
  books: [123456789, 234567890, ...]
}
{
  _id: 123456789,
  title: "MongoDB: The Definitive Guide",
  author: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}
{
  _id: 234567890,
  title: "50 Tips and Tricks for MongoDB Developer",
  author: "Kristina Chodorow",
  published_date: ISODate("2011-05-06"),
  pages: 68,
  language: "English"
}

You have to follow references yourself. There is no "join".
- First look up documents containing references.
- Then follow the references and associate them in memory.

Some caveats:
- There is no query optimization of joins.
- Every following of a reference is a query.
- References are mono-directional and costly to reverse; it's usually necessary to explicitly back-reference for efficiency, e.g., from book back to publisher.

Good news: you can unravel this via JavaScript.
Bad news: you must!

Caveat: following references requires a function which can remember what references what!
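A sketch of what "following references yourself" looks like in application code. Plain arrays stand in for the two collections, findOne is a hypothetical helper that counts lookups, and the publisher _id value is invented for the example. Each call to findOne plays the role of one query.

```javascript
// In-memory stand-ins for the publishers and books collections.
const publishers = [
  { _id: 'oreilly', name: "O'Reilly Media", founded: 1980, location: 'CA',
    books: [123456789, 234567890] },
];
const books = [
  { _id: 123456789, title: 'MongoDB: The Definitive Guide', pages: 216 },
  { _id: 234567890, title: '50 Tips and Tricks for MongoDB Developer', pages: 68 },
];

// Hypothetical helper: one call stands for one query against a collection.
let queryCount = 0;
function findOne(collection, predicate) {
  queryCount += 1;
  return collection.find(predicate);
}

// The "join": look up the publisher, then follow each book reference in turn.
function publisherWithBooks(name) {
  const publisher = findOne(publishers, (p) => p.name === name);
  if (!publisher) return null;
  const resolved = publisher.books.map((id) => findOne(books, (b) => b._id === id));
  return { ...publisher, books: resolved };
}

const joined = publisherWithBooks("O'Reilly Media");
console.log(joined.books.map((b) => b.title), queryCount);
```

One publisher with two books costs three lookups, and going the other way (from a book to its publisher) would require scanning every publisher unless each book also stores a back-reference.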


Modeling hierarchies
Monday, April 6, 2015 2:37 PM

Via parent references:

{
  _id: 234,
  parent: 123,
  name: 'Alva'
}
{
  _id: 456,
  parent: 123,
  name: 'Anselm'
}
{
  _id: 123,
  name: 'Linda'
}

Via array of child references:

{
  _id: 234,
  children: [],
  name: 'Alva'
}
{
  _id: 456,
  children: [],
  name: 'Anselm'
}
{
  _id: 123,
  children: [234, 456],
  name: 'Linda'
}

See http://docs.mongodb.org/manual/applications/data-models-tree-structures/

Aside: a lot of the MongoDB documentation is "patterns".
A pattern is a skeleton of an algorithm along with blanks to fill in and a scope of application.
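The two hierarchy representations carry the same information, which a short sketch can make concrete. The function names are invented for the example, and an array stands in for the collection.

```javascript
// The hierarchy stored via parent references.
const people = [
  { _id: 234, parent: 123, name: 'Alva' },
  { _id: 456, parent: 123, name: 'Anselm' },
  { _id: 123, name: 'Linda' },
];

// With parent references, finding children means searching for a matching parent.
function childrenOf(docs, id) {
  return docs.filter((d) => d.parent === id).map((d) => d.name);
}

// Convert to the child-reference form: each document lists the _ids of its children.
function toChildReferences(docs) {
  return docs.map(({ parent, ...rest }) => ({
    ...rest,
    children: docs.filter((d) => d.parent === rest._id).map((d) => d._id),
  }));
}

const lindaChildren = childrenOf(people, 123);
const byChildren = toChildReferences(people);
console.log(lindaChildren, byChildren);
```

Which form is better depends on the direction in which the application walks the tree: parent references make "who is my parent?" a field read, child references do the same for "who are my children?".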


Ensuring atomicity
Monday, April 6, 2015 2:48 PM

Document modifications are atomic, while basically nothing else is.

So, to keep transformations atomic, coupled data must be contained in the same document!

So "MongoDB normal form" looks nothing like 4NF.
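As an illustration of keeping coupled data together, the sketch below stores a book's remaining copies and its checkout list in one document, so that a single document-level change updates both. The document layout and the checkOut function are assumptions for this example; the function only computes the replacement document in memory and does not talk to a server.

```javascript
// One document holds both coupled fields: copies available and who has them.
const book = {
  _id: 123456789,
  title: 'MongoDB: The Definitive Guide',
  available: 1,
  checkout: [],
};

// Compute the replacement document, or null when no copy is available.
// In the mongo shell (needs a running server), the same change could be a single update:
//   db.books.update( { _id: 123456789, available: { $gt: 0 } },
//                    { $inc: { available: -1 }, $push: { checkout: { by: "joe" } } } )
function checkOut(doc, patron) {
  if (doc.available < 1) return null;
  return {
    ...doc,
    available: doc.available - 1,
    checkout: [...doc.checkout, { by: patron }],
  };
}

const afterJoe = checkOut(book, 'joe');
const afterAnn = checkOut(afterJoe, 'ann'); // no copies left
console.log(afterJoe, afterAnn);
```

Had the checkout list lived in a separate collection, the count and the list would need two writes, and nothing would keep them in step if the second one failed.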


Where MongoDB excels
Monday, April 6, 2015 11:16 AM

When you want to reflect JavaScript objects for whatever reason, e.g.:
- ... your application has to talk JSON (to other applications, e.g.)
- ... you only know JavaScript, Node.js

Where you need significant server-side computation, e.g.,
- ... the ability to compute in JavaScript before returning results is valuable.
- ... you need to translate data from one form to another on the server-side for speed and/or portability.

When you need to keep records of polymorphic transactions whose structure is not predictable.
- ... for troubleshooting other programs.
- ... for cataloguing web resources.

And it's acceptable if:
- ... you don't have relational structures to manage.
- ... or the query patterns for relational structures are known and permanent (so you can use the "embedded documents" pattern).

But it has great difficulty when:
- ... one needs to look at data in several distinct ways.
- ... there are non-trivial or complex relationships between objects that should be represented:
  e.g., anything that's not a tree.


A very practical application of MongoDB
Monday, April 6, 2015 3:11 PM

The Tethys application is a JavaScript framework for scientific visualization that:
- Expects JSON input.
- Produces visualizations.

In this case, MongoDB is perfect, in that one can:
- Precompute the JSON objects.
- Serve them up via MongoDB/REST.
- Make the client's life considerably easier.

Note that:
a) the objects have fixed and predetermined formats that can be computed in advance, so
b) data transformation can be done offline, and
c) we can utilize our own integrity mechanisms, stepping aside from MongoDB's lack of integrity mechanisms.

Caveats: I must be very careful to:
a) create MongoDB objects consistently.
b) maintain my own concept of data consistency separate from MongoDB's.
c) structure the application for robustness.


MongoDB guarantees
Monday, April 6, 2015 11:50 AM

- All updates to "documents" are atomic.
- "Write concerns" define guarantees to require.
- Writes are guaranteed to complete at the script level.

But scripts can be concurrent!
- Can limit concurrency (but only on one server).
- Can "simulate" two-phase commit.
    - But it's completely impractical.


Write concerns
Monday, April 6, 2015 11:26 AM

A strange concept...
... to solve a uniquely NoSQL problem.

A "write concern" controls how a write is processed in the write action.
- Unacknowledged: continue without waiting for acknowledgement.
- Acknowledged: on a single server, the in-memory copy of the data is up to date and will be returned for subsequent reads.
- Journaled: As well, this write is committed to a journal and protected against power failures.
- Replica Acknowledged: As well, the write is replicated to one or more replica servers (specified).

In short, one can specify the level of resilience to assure before continuing the script.
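In MongoDB a write concern is itself written as a small document, using the option names w (how many servers must acknowledge) and j (wait for the journal). The sketch below maps the four levels above onto such documents. The table name WRITE_CONCERNS and the helper insertOptions are invented for the example, and the value 2 for the replica case is just one possible setting.

```javascript
// The four levels expressed as write-concern documents.
const WRITE_CONCERNS = {
  unacknowledged: { w: 0 },          // do not wait for any acknowledgement
  acknowledged: { w: 1 },            // wait for the server that took the write
  journaled: { w: 1, j: true },      // also wait for the journal commit
  replicaAcknowledged: { w: 2 },     // example: the primary plus one replica
};

// Hypothetical helper: build the options object passed alongside a write.
function insertOptions(level) {
  const concern = WRITE_CONCERNS[level];
  if (!concern) throw new Error(`unknown write concern level: ${level}`);
  return { writeConcern: { ...concern } };
}

// In the mongo shell (needs a running server) this might be used as:
//   db.users.insert( { user_id: "bcd001" }, insertOptions("journaled") )
const journaledOptions = insertOptions('journaled');
console.log(journaledOptions);
```

The stricter the level, the longer the script waits before it continues, which is the tradeoff the page describes.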


Pictures of write concerns
Monday, April 6, 2015 4:44 PM

[Pictures of the four write concerns: Acknowledged, Unacknowledged, Journaled, and Replica Acknowledged. From <http://docs.mongodb.org/manual/core/write-concern/>]

But note that:
- Writes do not block reads.
- Reads continue -- on stale data -- while writes are being executed.

Thus this is eventually consistent.

All that write concerns tell you is whether the write committed, but not when.


MongoDB and constraints
Monday, April 6, 2015 4:51 PM

Not many constraint options, but
- Indexing does allow one to declare columns "UNIQUE" as part of creating the index.


MongoDB and transactions
Monday, April 6, 2015 4:50 PM

Basically, MongoDB doesn't have transactions:
- Two execution modes for inserts: serial and parallel.
    - Serial execution mode stops after first integrity violation, without rollback.
    - Parallel execution mode keeps going regardless of the number of integrity violations rejected.
    - In both cases, no rollback.
- While you can "simulate" a two-phase commit, you have to do the rollback if the commit fails.

See http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/ and weep!
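The difference between the two insert modes, and the absence of rollback in both, can be simulated in memory. In the sketch below an array stands in for a collection with a unique constraint on user_id; bulkInsert and its ordered flag are written for this example (MongoDB's own documentation calls the two modes ordered and unordered).

```javascript
// Insert a batch into an array that must keep user_id unique.
// ordered: true  -> stop at the first violation (serial mode)
// ordered: false -> skip violations and keep going (parallel mode)
function bulkInsert(collection, docs, { ordered }) {
  const rejected = [];
  for (const doc of docs) {
    const duplicate = collection.some((d) => d.user_id === doc.user_id);
    if (duplicate) {
      rejected.push(doc.user_id);
      if (ordered) break; // earlier inserts stay in place: no rollback
      continue;
    }
    collection.push(doc);
  }
  return rejected;
}

const batch = [{ user_id: 'a' }, { user_id: 'a' }, { user_id: 'b' }];

const serial = [];
const serialRejected = bulkInsert(serial, batch, { ordered: true });

const parallel = [];
const parallelRejected = bulkInsert(parallel, batch, { ordered: false });

console.log(serial, serialRejected, parallel, parallelRejected);
```

In serial mode the batch ends half-applied (the first 'a' is stored, 'b' never is); in parallel mode everything that can be stored is stored. Either way the application has to work out what actually happened.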


Sneaky MongoDB tricks
Monday, April 6, 2015 5:00 PM

- Capped Collections and Tailable Cursors
- Sharded clusters


Capped collections and tailable cursors
Monday, April 6, 2015 5:05 PM

A capped collection is the MongoDB version of a ring buffer:
- Arbitrary JSON content.
- First-in, first-out.
- Finite number of records.

A tailable cursor is a cursor that stays active awaiting data from asynchronous sources:
- Only for capped collections.
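The ring-buffer behaviour can be sketched in memory as follows. CappedBuffer is a hypothetical class for this example, capped by record count only; a real capped collection is created on the server and is also bounded by size in bytes.

```javascript
// In the mongo shell (needs a running server), a capped collection might be created with:
//   db.createCollection( "log", { capped: true, size: 100000, max: 3 } )
// The class below only imitates the first-in, first-out eviction in memory.
class CappedBuffer {
  constructor(max) {
    this.max = max;   // finite number of records
    this.docs = [];
  }

  // Append a document; once full, the oldest document is dropped.
  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.max) {
      this.docs.shift();
    }
  }

  // Return the documents in insertion order, oldest first.
  find() {
    return [...this.docs];
  }
}

const log = new CappedBuffer(3);
for (const n of [1, 2, 3, 4, 5]) {
  log.insert({ event: n });
}
const kept = log.find().map((d) => d.event);
console.log(kept);
```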


Sharding
Monday, April 6, 2015 5:06 PM

A mechanism for storing very large data sets.

From <http://docs.mongodb.org/manual/core/distributed-write-operations/>
