mogens-heller-grabe
d60 developing smart software solutions
So you want to liberate your data? April 2012
Agenda
• Data, queries, etc.
• Concurrency
• Aggregation
• Deployment
• Durability
• Things to be aware of
MongoDB
• Document database
• Currently in v. 2.0.4
• Developed by 10gen
• Open source
– server is GNU AGPL v3
– the official clients are Apache v2
• Absolutely free to use
– you can get a commercial version of the db though
– it has support, SSL, and more security features
Conceptual data organization
MongoDB: process → database → collection → document
RDBMS: process → database → table → row
Data
Example 1
• Install
• Mongo Shell
• Show database contents
• Add and show a document
Queries
including several other query operators: $gt, $gte, $lt, $lte, $exists, $all, etc...
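The query examples from this slide did not survive in the transcript, so here is a rough sketch of what a few of the listed operators mean. It is plain JavaScript over an in-memory array, not MongoDB's implementation; the `matches` and `find` helpers and the sample documents are made up for illustration, and only top-level fields are handled.

```javascript
// Hypothetical in-memory matcher illustrating a few query operators.
const operators = {
  $gt: (value, arg) => value > arg,
  $gte: (value, arg) => value >= arg,
  $lt: (value, arg) => value < arg,
  $lte: (value, arg) => value <= arg,
  // $exists: true matches documents that have the field, false those that lack it
  $exists: (value, arg) => (value !== undefined) === arg,
  // $all: the array field must contain every listed element
  $all: (value, arg) => Array.isArray(value) && arg.every((x) => value.includes(x)),
};

function matches(doc, query) {
  return Object.keys(query).every((field) => {
    const condition = query[field];
    const value = doc[field];
    if (condition !== null && typeof condition === "object" && !Array.isArray(condition)) {
      // An operator document such as { $gt: 20, $lt: 30 }: every operator must hold.
      return Object.keys(condition).every((op) => operators[op](value, condition[op]));
    }
    // A plain value means equality.
    return value === condition;
  });
}

// Made-up sample data.
const people = [
  { name: "Ann", age: 34, tags: ["dev", "ops"] },
  { name: "Bob", age: 27, tags: ["dev"] },
  { name: "Cy", tags: ["ops"] },
];

const find = (query) => people.filter((doc) => matches(doc, query)).map((doc) => doc.name);

console.log(find({ age: { $gte: 30 } }));
console.log(find({ age: { $gt: 20, $lt: 30 } }));
console.log(find({ age: { $exists: false } }));
console.log(find({ tags: { $all: ["dev", "ops"] } }));
```

In the shell, the same query documents would be passed to `find` on a collection, e.g. `db.people.find({ age: { $gte: 30 } })` for a hypothetical `people` collection.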
Indexes
Updates
including several other update modifiers: $inc, $set, $addToSet, $rename, etc...
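The update examples are not in the transcript either. As a sketch of what the listed modifiers do to a document, here is a small hypothetical `applyUpdate` helper in plain JavaScript. It only handles top-level fields and is an illustration, not the server's logic.

```javascript
// Hypothetical helper: applies a few update modifiers to a copy of a document.
function applyUpdate(doc, update) {
  const result = JSON.parse(JSON.stringify(doc)); // work on a copy
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    result[field] = (result[field] || 0) + amount; // a missing field counts as 0
  }
  for (const [field, value] of Object.entries(update.$set || {})) {
    result[field] = value;
  }
  for (const [field, value] of Object.entries(update.$addToSet || {})) {
    const current = result[field] || [];
    if (!current.includes(value)) current.push(value); // only add if not already present
    result[field] = current;
  }
  for (const [oldName, newName] of Object.entries(update.$rename || {})) {
    if (oldName in result) {
      result[newName] = result[oldName];
      delete result[oldName];
    }
  }
  return result;
}

// Made-up sample document.
const before = { name: "Ann", visits: 1, tags: ["dev"] };
const after = applyUpdate(before, {
  $inc: { visits: 1 },
  $set: { city: "Aarhus" },
  $addToSet: { tags: "dev" }, // already there, so tags stays unchanged
  $rename: { name: "firstName" },
});
console.log(after);
```

The point of the modifiers is that the change is described to the server instead of reading the document, changing it on the client, and writing it back.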
Example 2
• Import some data
• Query
• Update
• Index
• Query
ACID?
• Atomic: Yeah well, per document.
• Consistent: Yeah well, can be.
• Isolated: Yeah well, per document.
• Durable: Yeah well, can be – not default though...
Concurrency
• Pushing it down the stack
Concurrency
• Preserve invariants with update preconditions
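A minimal sketch of the idea, simulated in memory: the precondition is part of the same operation as the change, so the invariant (stock never goes negative) cannot be broken between a check and a write. The `products` array and `reserve` function are hypothetical; against a real server this would be a single update along the lines of `db.products.update({ _id: 1, stock: { $gte: 1 } }, { $inc: { stock: -1 } })`.

```javascript
// In-memory stand-in for a collection.
const products = [{ _id: 1, name: "widget", stock: 1 }];

// Decrement stock only if the precondition (stock >= quantity) holds for the
// document. Returns the number of documents changed.
function reserve(productId, quantity) {
  const doc = products.find((p) => p._id === productId && p.stock >= quantity);
  if (!doc) return 0; // precondition failed, nothing was changed
  doc.stock -= quantity;
  return 1;
}

const first = reserve(1, 1);  // succeeds, stock goes from 1 to 0
const second = reserve(1, 1); // precondition fails, stock stays at 0
console.log(first, second, products[0].stock);
```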
Concurrency
• Use optimistic locking when replacing a document
(and then check whether n is 0 or 1...)
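Roughly, the pattern looks like this. The sketch below simulates it in memory and assumes a `version` field on each document, which is a convention picked for this example and not something MongoDB provides. The returned `n` plays the role of the number of documents the update touched.

```javascript
// In-memory stand-in for a collection, keyed by _id.
const store = new Map();
store.set("order-1", { _id: "order-1", version: 1, status: "new" });

// Replace the document only if the stored version still equals the one we read.
function replaceIfUnchanged(readDoc, changes) {
  const current = store.get(readDoc._id);
  if (!current || current.version !== readDoc.version) return { n: 0 };
  store.set(readDoc._id, { ...readDoc, ...changes, version: readDoc.version + 1 });
  return { n: 1 };
}

// Two clients read the same version of the document...
const readByAlice = { ...store.get("order-1") };
const readByBob = { ...store.get("order-1") };

// ...and both try to replace it. Only the first one wins.
const aliceResult = replaceIfUnchanged(readByAlice, { status: "paid" });
const bobResult = replaceIfUnchanged(readByBob, { status: "cancelled" });
console.log(aliceResult.n, bobResult.n, store.get("order-1"));
```

When n is 0, somebody else got there first: reload the document and retry, or report a conflict.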
Concurrency
• Use FindAndModify to “check out” documents
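A sketch of the "check out" idea, simulated in memory with a hypothetical `jobs` array: find a document in a given state and flip its state in one step, so two workers never get the same document. In the shell this would be something like `db.jobs.findAndModify({ query: { state: "ready" }, update: { $set: { state: "taken", owner: "w1" } }, new: true })`, where the collection and field names are assumptions.

```javascript
// In-memory stand-in for a job collection.
const jobs = [
  { _id: 1, state: "ready", owner: null },
  { _id: 2, state: "ready", owner: null },
];

// Find the first ready job and mark it as taken in one step. Returns a copy
// of the modified job, or null when nothing is left to check out.
function checkOut(workerName) {
  const job = jobs.find((j) => j.state === "ready");
  if (!job) return null;
  job.state = "taken";
  job.owner = workerName;
  return { ...job };
}

const a = checkOut("w1");
const b = checkOut("w2");
const c = checkOut("w3");
console.log(a, b, c);
```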
Aggregation
• Map/reduce
Aggregation
• Map/reduce
– Map: for each document: emit 0 or more (key, value) tuples
– Reduce: given a (key, value[]), return 1 value
Aggregation
m = function() {
    var doc = this;
    doc.appearances.forEach(function(a) {
        emit(a, { count: 1, names: [doc.firstName + " " + doc.lastName] });
    });
}
r = function(key, values) {
    var count = 0;
    var names = [];
    values.forEach(function(v) {
        count += v.count;
        names = names.concat(v.names);
    });
    return {count: count, names: names};
}
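To see what the two functions do without a server, here is a small in-memory harness. It is a sketch only: `runMapReduce` and the sample documents are made up, and the real thing runs inside mongod, where `emit` is provided for you.

```javascript
let emit; // assigned by the harness below; in MongoDB, emit is provided by the server

const m = function () {
  const doc = this;
  doc.appearances.forEach(function (a) {
    emit(a, { count: 1, names: [doc.firstName + " " + doc.lastName] });
  });
};

const r = function (key, values) {
  let count = 0;
  let names = [];
  values.forEach(function (v) {
    count += v.count;
    names = names.concat(v.names);
  });
  return { count: count, names: names };
};

// Hypothetical harness: group emitted values by key, then reduce each group.
function runMapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> array of emitted values
  emit = function (key, value) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => map.call(doc)); // "this" is the current document
  const results = {};
  groups.forEach((values, key) => {
    // A key with a single emitted value is not passed through reduce.
    results[key] = values.length === 1 ? values[0] : reduce(key, values);
  });
  return results;
}

// Made-up sample data using the field names from the map function.
const people = [
  { firstName: "Ann", lastName: "Lee", appearances: ["S01E01", "S01E02"] },
  { firstName: "Bob", lastName: "Ray", appearances: ["S01E01"] },
];

const results = runMapReduce(people, m, r);
console.log(JSON.stringify(results, null, 2));
```

Note that `r` returns a value of the same shape as the values `m` emits. That matters because the server may call reduce more than once for the same key, feeding earlier reduce results back in.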
Example 3
• Use map/reduce to collect information on who appeared in each episode
Aggregation
• Aggregation framework (not available until 2.2)
– declarative syntax for construction of an aggregation pipeline
Aggregation
• Aggregation framework (not available until 2.2)
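The example from this slide is not in the transcript. To give a feel for the declarative style, the sketch below describes a pipeline as data and runs it through a tiny made-up interpreter that supports only `$unwind` and `$group` with `{ $sum: <number> }`. The pipeline counts appearances per episode, like the map/reduce example; the `aggregate` function and sample documents are assumptions for illustration, not the framework itself.

```javascript
// The pipeline is plain data: first one document per array element, then group and count.
const pipeline = [
  { $unwind: "$appearances" },
  { $group: { _id: "$appearances", count: { $sum: 1 } } },
];

// Resolve a "$field" reference against a document (top-level fields only).
const fieldValue = (doc, ref) => doc[ref.slice(1)];

const stages = {
  // $unwind: emit one copy of the document per element of the array field
  $unwind: (docs, ref) =>
    docs.flatMap((doc) =>
      fieldValue(doc, ref).map((item) => ({ ...doc, [ref.slice(1)]: item }))
    ),
  // $group: only { $sum: <number> } accumulators are supported in this sketch
  $group: (docs, spec) => {
    const groups = new Map();
    for (const doc of docs) {
      const key = fieldValue(doc, spec._id);
      if (!groups.has(key)) groups.set(key, { _id: key });
      const group = groups.get(key);
      for (const [name, accumulator] of Object.entries(spec)) {
        if (name === "_id") continue;
        group[name] = (group[name] || 0) + accumulator.$sum;
      }
    }
    return [...groups.values()];
  },
};

// Feed the output of each stage into the next one.
function aggregate(docs, stageList) {
  return stageList.reduce((current, stage) => {
    const [name] = Object.keys(stage);
    return stages[name](current, stage[name]);
  }, docs);
}

// Made-up sample data.
const cast = [
  { firstName: "Ann", lastName: "Lee", appearances: ["S01E01", "S01E02"] },
  { firstName: "Bob", lastName: "Ray", appearances: ["S01E01"] },
];

const counts = aggregate(cast, pipeline);
console.log(counts);
```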
Deployment
• Several configurations
– we’ll check out replica sets and sharding
Replica sets
• Master-slave with automatic failover
– Each mongod should be started with the --replSet argument
– Additional nodes are added from the shell
– Make sure the number of nodes is odd, possibly by adding an arbiter
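Why an odd number? A primary needs a majority of the votes, and an even-sized set needs one more vote than the odd-sized set below it without tolerating any more failures. A quick bit of arithmetic (the function names are just for this sketch):

```javascript
// Votes needed to elect a primary in a set with the given number of voting members.
function majority(members) {
  return Math.floor(members / 2) + 1;
}

// How many members can be unreachable while a primary can still be elected.
function tolerableFailures(members) {
  return members - majority(members);
}

for (const members of [2, 3, 4, 5]) {
  console.log(members, "members:", "majority", majority(members),
    "tolerates", tolerableFailures(members), "failure(s)");
}
```

Going from 3 to 4 nodes buys nothing in terms of failover, which is why a cheap arbiter is the usual way to get back to an odd count.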
Replica sets
• Higher availability
• Scale out reads
• Backup without interfering with the primary
Sharding
• Auto-sharding
– happens by user-defined shard key
– can be defined per collection
– requires special nodes: mongos (the load balancer) and a mongod that is configured to be a configuration server
Sharding
• Scale out writes
• Limitations:
– Shard key is immutable
– All inserts/updates must include the shard key
– Cannot enforce (arbitrary) uniqueness across shards, only for shard key
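A rough sketch of why the shard key matters so much, assuming range-based chunks on a hypothetical numeric `userId` shard key. The chunk table, shard names and `targetShards` function are invented for illustration; the real routing is done by mongos using metadata from the config servers.

```javascript
// Hypothetical chunk table: each chunk covers [min, max) of the shard key.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard0" },
  { min: 1000, max: 2000, shard: "shard1" },
  { min: 2000, max: Infinity, shard: "shard2" },
];

const shardKey = "userId"; // assumed shard key for this sketch

// Decide which shards have to see an operation with the given query.
function targetShards(query) {
  if (shardKey in query) {
    const value = query[shardKey];
    const chunk = chunks.find((c) => value >= c.min && value < c.max);
    return [chunk.shard]; // exactly one shard owns this key value
  }
  // Without the shard key the router cannot narrow it down and must ask every shard.
  return [...new Set(chunks.map((c) => c.shard))];
}

console.log(targetShards({ userId: 1500 }));
console.log(targetShards({ email: "ann@example.com" }));
```

The same picture explains the uniqueness limitation: each shard only sees its own documents, so it cannot know whether some other shard already holds the same `email`.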
Sharding + replica sets
MongoDB’s durability story
• Memory-mapped files.
• fsync.
• Durability through replication
– pre 1.8
• Durability through journaling
– an option since 1.8
– replica sets still cool though
– default since 2.0
MongoDB’s durability story
• Inserts and updates are unsafe by default!!
– only purpose: get awesome benchmarks
– bad: bites you in the a**
• Exposed differently on drivers, but always maps to db.getLastError()
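To make "unsafe by default" concrete, here is an in-memory simulation. `FakeConnection` is entirely made up and only mimics the behaviour described above: a write returns without telling you anything, and you only learn about a failure if you ask afterwards.

```javascript
// Hypothetical fake connection; not a real driver API.
class FakeConnection {
  constructor() {
    this.ids = new Set();
    this.lastError = null;
  }

  // Fire-and-forget insert: never throws and never returns a result.
  insert(doc) {
    if (this.ids.has(doc._id)) {
      this.lastError = "duplicate key: " + doc._id;
      return;
    }
    this.ids.add(doc._id);
    this.lastError = null;
  }

  // The caller has to ask explicitly whether the previous operation failed.
  getLastError() {
    return this.lastError;
  }

  // "Safe mode": insert, then check, and raise on failure.
  safeInsert(doc) {
    this.insert(doc);
    const error = this.getLastError();
    if (error) throw new Error(error);
  }
}

const conn = new FakeConnection();
conn.insert({ _id: 1 });
conn.insert({ _id: 1 }); // fails, but the caller is not told
const unnoticed = conn.getLastError(); // the failure only shows up when asked for

let caught = null;
try {
  conn.safeInsert({ _id: 1 });
} catch (e) {
  caught = e.message;
}
console.log(unnoticed, caught);
```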
MongoDB’s durability story
• Conclusion: It’s cool that you can tweak it per operation, but it’s uncool that it’s unsafe.
Things to be aware of
• Safe mode off
• 32/64 bit
• Memory-mapped file
• Global write lock
• Indexes should always fit in RAM
Image credits
The world’s most interesting man: http://i.qkme.me/3mwy.jpg
Bison: http://www.flickr.com/photos/johan-gril/5632513228/
Tired Fry: http://cdn.memegenerator.net/instances/400x/18731987.jpg
Thanks for letting me borrow your awesome images – if you ever meet me, I’ll buy you a beer. Seriously, I will.