In this session, we'll examine schema design insights and trade-offs using real-world examples. We'll look at three example applications: building an email inbox, selecting a shard key for a large-scale web application, and using MongoDB to store user profiles. From these examples you should leave the session with an idea of the advantages and disadvantages of various approaches to modeling your data in MongoDB. Attendees should be well-versed in basic schema design and familiar with the concepts in the morning's basic schema design talk. No beginner topics will be covered in this session.
Schema Design: 3 Real World Use Cases
Alvin Richards, Technical Director, 10gen
@jonnyeight [email protected] alvinonmongodb.com
#MongoDBdays
I'm planning a Trip to LA…
Single Table En
Agenda
• Why is schema design important?
• 3 Real World Schemas
  – Inbox
  – Indexed Attributes
  – Multiple Identities
• Conclusions
Why is Schema Design important?
• Largest factor for a performant system
• Schema design with MongoDB is different
• RDBMS – "What answers do I have?"
• MongoDB – "What question will I have?"
• Must consider the use case when designing the schema
#1 - Message Inbox
Let’s get Social
Sending Messages
Reading my Inbox
Design Goals
• Efficiently send new messages to recipients
• Efficiently read inbox
3 Approaches (there are more)
• Fan out on Read
• Fan out on Write
• Fan out on Write with Bucketing
Fan out on read – Send Message
[Diagram: Shard 1 | Shard 2 | Shard 3]
Send Message
db.inbox.save( { to: [ "Bob", "Jane" ], … } )
Fan out on read – Inbox Read
[Diagram: Shard 1 | Shard 2 | Shard 3]
Read Inbox
db.inbox.find( { to: "Bob" } )
// Shard on "from"
sh.shardCollection( "mongodbdays.inbox", { from: 1 } )

// Make sure we have an index to handle inbox reads
db.inbox.ensureIndex( { to: 1, sent: 1 } )

msg = { from: "Joe", to: [ "Bob", "Jane" ],
        sent: new Date(), message: "Hi!" }

// Send a message
db.inbox.save( msg )

// Read my inbox
db.inbox.find( { to: "Bob" } ).sort( { sent: -1 } )
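To make the routing cost of fan out on read concrete, here is a minimal in-memory sketch in plain JavaScript. It is a model, not the MongoDB driver or shell API: `shardFor` is a hypothetical toy hash standing in for the cluster's chunk routing on the shard key `from`. Because an inbox read filters on `to`, which is not the shard key, the read has to visit every shard.

```javascript
// Minimal in-memory model of "fan out on read" (plain JavaScript, not
// the MongoDB driver). shardFor is a hypothetical stand-in for routing
// a document to a shard by its shard key ("from").
const NUM_SHARDS = 3;
const shards = Array.from({ length: NUM_SHARDS }, () => []);

function shardFor(key) {
  // Toy string hash, for illustration only
  let h = 0;
  for (const ch of String(key)) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % NUM_SHARDS;
}

// Send: one document per message, placed on the shard chosen by "from"
function sendMessage(msg) {
  shards[shardFor(msg.from)].push(msg);
}

// Read: "to" is not the shard key, so every shard has to be searched
function readInbox(user) {
  const messages = [];
  let shardsQueried = 0;
  for (const shard of shards) {
    shardsQueried++;
    for (const m of shard) {
      if (m.to.includes(user)) messages.push(m);
    }
  }
  messages.sort((a, b) => b.sent - a.sent); // newest first, like sort({ sent: -1 })
  return { messages, shardsQueried };
}

sendMessage({ from: "Joe", to: ["Bob", "Jane"], sent: new Date(2013, 3, 1), message: "Hi!" });
sendMessage({ from: "Ann", to: ["Bob"], sent: new Date(2013, 3, 2), message: "Lunch?" });

const inbox = readInbox("Bob");
console.log(inbox.messages.length, inbox.shardsQueried); // 2 3
```

The write is cheap (one document per message), while every read pays for all shards; that is the scatter-gather cost listed in the considerations.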
Fan out on read
Considerations
• 1 document per message sent
• Multiple recipients in an array key
• Reading the inbox finds all messages with my own name in the recipient field
✖ Requires scatter-gather on a sharded cluster
✖ Then a lot of random IO on a shard to find everything
Fan out on write – Send Message
[Diagram: Shard 1 | Shard 2 | Shard 3]
Send Message
db.inbox.save( { to: "Bob", …} )
Fan out on write – Read Inbox
[Diagram: Shard 1 | Shard 2 | Shard 3]
Read Inbox
db.inbox.find( { to: "Bob" } )
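// Shard on "to" (the single recipient of each copy) and "sent";
// an array field such as "recipient" cannot be used as a shard key
sh.shardCollection( "mongodbdays.inbox", { to: 1, sent: 1 } )

msg = { from: "Joe", recipient: [ "Bob", "Jane" ],
        sent: new Date(), message: "Hi!" }

// Send a message: insert a separate copy for each recipient
msg.recipient.forEach( function( recipient ) {
    db.inbox.insert( { from: msg.from, to: recipient,
                       sent: msg.sent, message: msg.message } )
} )

// Read my inbox
db.inbox.find( { to: "Joe" } ).sort( { sent: -1 } )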
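The same kind of in-memory model shows the trade-off of fan out on write. Again this is plain JavaScript, not the driver API; it assumes each copy is routed by its single recipient (`to`), and `shardFor` is a hypothetical toy hash. One message to two recipients becomes two documents, but an inbox read goes to exactly one shard.

```javascript
// Minimal in-memory model of "fan out on write" (not the MongoDB driver).
// shardFor is a hypothetical stand-in for routing on the per-copy "to" field.
const NUM_SHARDS = 3;
const shards = Array.from({ length: NUM_SHARDS }, () => []);

function shardFor(key) {
  let h = 0;
  for (const ch of String(key)) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % NUM_SHARDS;
}

// Send: write a separate copy for each recipient, routed by that recipient
function sendMessage(msg) {
  for (const recipient of msg.recipient) {
    const copy = { from: msg.from, to: recipient, sent: msg.sent, message: msg.message };
    shards[shardFor(recipient)].push(copy);
  }
}

// Read: the query contains the routing key, so only one shard is searched
function readInbox(user) {
  const messages = shards[shardFor(user)]
    .filter((m) => m.to === user)
    .sort((a, b) => b.sent - a.sent); // newest first
  return { messages, shardsQueried: 1 };
}

sendMessage({ from: "Joe", recipient: ["Bob", "Jane"], sent: new Date(2013, 3, 1), message: "Hi!" });

const totalDocs = shards.reduce((n, s) => n + s.length, 0);
console.log(totalDocs, readInbox("Bob").messages.length); // 2 1
```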
Fan out on write
Considerations
✖ 1 document per recipient per message
• Reading the inbox is finding all of the messages with me as the recipient
• Can shard on recipient, so inbox reads hit one shard
✖ But still lots of random IO on the shard
Fan out on write with buckets
• Each “inbox” document is an array of messages
• Append a message onto the “inbox” of the recipient
• Bucket inbox documents so there are not too many messages per document
• Can shard on recipient, so inbox reads hit one shard
• A few documents to read the whole inbox
Bucketed fan out on write - Send
[Diagram: Shard 1 | Shard 2 | Shard 3]
Send Message
db.inbox.update( { to: "Bob"}, { $push: { msg: … } })
Bucketed fan out on write - Read
[Diagram: Shard 1 | Shard 2 | Shard 3]
Read Inbox
db.inbox.find( { to: "Bob" } )
// Shard on "owner" and "sequence"
sh.shardCollection( "mongodbdays.inbox", { owner: 1, sequence: 1 } )
sh.shardCollection( "mongodbdays.users", { user_name: 1 } )

msg = { from: "Joe", to: [ "Bob", "Jane" ],
        sent: new Date(), message: "Hi!" }

// Send a message
msg.to.forEach( function( recipient ) {
    // Atomically increment and return the recipient's message count
    var count = db.users.findAndModify( {
        query: { user_name: recipient },
        update: { $inc: { msg_count: 1 } },
        upsert: true,
        new: true } ).msg_count;
    // 50 messages per bucket
    var sequence = Math.floor( count / 50 );
    db.inbox.update( { owner: recipient, sequence: sequence },
                     { $push: { messages: msg } },
                     { upsert: true } );
} )

// Read my inbox
db.inbox.find( { owner: "Joe" } ).sort( { sequence: -1 } ).limit( 2 )
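The bucket arithmetic is easy to check outside the database. The sketch below is an in-memory JavaScript model, not the driver API: a `Map` plays the role of the `users.msg_count` counter, another `Map` holds bucket documents keyed by owner and sequence, and `BUCKET_SIZE = 50` mirrors the `Math.floor(count / 50)` above. Reading the newest two buckets corresponds to the `sort( { sequence: -1 } ).limit( 2 )` query.

```javascript
// In-memory sketch of bucketed fan out on write (not the MongoDB driver).
const BUCKET_SIZE = 50;
const msgCount = new Map(); // stands in for users.msg_count
const inbox = new Map();    // "owner|sequence" -> bucket document

function sendMessage(msg) {
  for (const recipient of msg.to) {
    // Like findAndModify with $inc, upsert and new: true
    const count = (msgCount.get(recipient) || 0) + 1;
    msgCount.set(recipient, count);
    const sequence = Math.floor(count / BUCKET_SIZE);
    const key = `${recipient}|${sequence}`;
    // Like update(..., { $push: { messages: msg } }, { upsert: true })
    if (!inbox.has(key)) inbox.set(key, { owner: recipient, sequence, messages: [] });
    inbox.get(key).messages.push(msg);
  }
}

// Like find({ owner }).sort({ sequence: -1 }).limit(n)
function readLatestBuckets(owner, n) {
  return [...inbox.values()]
    .filter((b) => b.owner === owner)
    .sort((a, b) => b.sequence - a.sequence)
    .slice(0, n);
}

for (let i = 0; i < 120; i++) {
  sendMessage({ from: "Joe", to: ["Bob"], sent: new Date(2013, 0, 1, 0, i), message: `#${i}` });
}
console.log(readLatestBuckets("Bob", 2).map((b) => [b.sequence, b.messages.length]));
// [ [ 2, 21 ], [ 1, 50 ] ]
```

Because the counter is incremented before it is used, bucket 0 ends up holding 49 messages rather than 50; every later bucket holds a full 50.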
Fan out on write – with buckets
Considerations
• Fewer documents per recipient
• Reading the inbox is just finding a few buckets
• Can shard on recipient, so inbox reads hit one shard
✖ But still some random IO on the shard
But…
• What if I do not / cannot retain all history?
  – Space limited: Hours, Days, Weeks, $$$
  – Legislative limited: HIPAA, SOX, DPA
3 Approaches (there are more)
• Bucket by number of messages (just seen that)
• Fixed size Array
• Bucket by Date + TTL Collections
// Query with a date range
db.inbox.find( { owner: "Joe", messages: { $elemMatch: { sent: { $gte: ISODate("2013-04-04…") } } } } )

// Remove elements older than a date from all of the owner's buckets
db.inbox.update( { owner: "Joe" },
                 { $pull: { messages: { sent: { $lt: ISODate("2013-04-04…") } } } },
                 { multi: true } )
Inbox – Bucket by # messages
Considerations
• Limited to a known range of messages
✖ Shrinking documents
  – space can be reclaimed with db.runCommand( { compact: '<collection>' } )
✖ Removing the document after the last element in the array has been removed
  – { "_id" : …, "messages" : [ ], "owner" : "friend1", "sequence" : 0 }
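The two-step cleanup implied by these considerations can be modeled in memory. This JavaScript sketch is not the driver API and assumes retention means dropping messages older than a cutoff date; `pruneOlderThan` is a hypothetical helper. The first step mirrors a `$pull` with `sent: { $lt: cutoff }`, and the second deletes buckets left with an empty `messages` array (in the shell, that second step could be a `remove` on `{ messages: { $size: 0 } }`).

```javascript
// In-memory sketch of date-based pruning for bucketed inboxes
// (not the MongoDB driver). pruneOlderThan is a hypothetical helper.
function pruneOlderThan(buckets, owner, cutoff) {
  for (const b of buckets) {
    if (b.owner !== owner) continue;
    // Step 1, like $pull: { messages: { sent: { $lt: cutoff } } }
    b.messages = b.messages.filter((m) => m.sent >= cutoff);
  }
  // Step 2: drop documents whose messages array is now empty
  return buckets.filter((b) => !(b.owner === owner && b.messages.length === 0));
}

let buckets = [
  { owner: "friend1", sequence: 0, messages: [{ sent: new Date("2013-03-01T00:00:00Z") }] },
  {
    owner: "friend1",
    sequence: 1,
    messages: [
      { sent: new Date("2013-03-20T00:00:00Z") },
      { sent: new Date("2013-04-10T00:00:00Z") },
    ],
  },
];
buckets = pruneOlderThan(buckets, "friend1", new Date("2013-04-04T00:00:00Z"));
console.log(buckets.length, buckets[0].messages.length); // 1 1
```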
msg = { from: "Your Boss", to: [ "Bob" ], sent: new Date(), message: "CALL ME NOW!" }

// 2.4 introduces $each, $sort and $slice for $push
db.messages.update(
    { _id: 1 },
    { $push: { messages: { $each: [ msg ],
                           $sort: { sent: 1 },
                           $slice: -50 } } } )
Maintain the latest – Fixed Size Array
• Push this object onto the array
• Sort the resulting array by "sent"
• Keep only the last 50 elements, i.e. the 50 most recent
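The three steps can be modeled as one plain JavaScript function to check what ends up in the array. This is a sketch of the update's semantics, not the driver API; `pushSortSlice` is a hypothetical name, and numbers stand in for the `sent` dates.

```javascript
// In-memory model of { $push: { messages: { $each: [...],
// $sort: { sent: 1 }, $slice: -50 } } } (not the MongoDB driver).
function pushSortSlice(array, newItems, sortField, keepLast) {
  const result = array.concat(newItems);                // $each: append every new item
  result.sort((a, b) => a[sortField] - b[sortField]);   // $sort: { field: 1 } (ascending)
  return keepLast === 0 ? [] : result.slice(-keepLast); // $slice: -N keeps the last N
}

let messages = [];
for (let i = 0; i < 60; i++) {
  messages = pushSortSlice(messages, [{ sent: i, message: `msg ${i}` }], "sent", 50);
}
console.log(messages.length, messages[0].sent, messages[49].sent); // 50 10 59
```

Sorting before slicing matters: a late-arriving older message is sorted to the front and then sliced off, so the array always holds the newest 50.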
Considerations
• Limited to a known # of messages
✖ Need to compute the size of the array based on the retention period
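Computing that size is simple arithmetic: slots = messages per day × retention days, rounded up. The sketch below also checks the rough byte estimate against MongoDB's 16 MB maximum document size. The message rate, retention period and average message size are hypothetical inputs, and the byte estimate ignores BSON per-field overhead, so treat it as a lower bound.

```javascript
// Back-of-the-envelope sizing for the fixed-size array.
// All inputs are hypothetical; BSON overhead is ignored.
const MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024; // MongoDB's 16 MB document limit

function sizeFixedArray(messagesPerDay, retentionDays, avgMessageBytes) {
  const slots = Math.ceil(messagesPerDay * retentionDays); // value to use for $slice: -slots
  const approxBytes = slots * avgMessageBytes;
  return { slots, approxBytes, fitsInOneDocument: approxBytes < MAX_BSON_DOCUMENT_BYTES };
}

console.log(sizeFixedArray(20, 30, 1024));
// slots: 600, approxBytes: 614400, fitsInOneDocument: true
```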
// messages: one doc per user per day
db.inbox.findOne()
{
    _id: 1,
    to: "Joe",
    sequence: ISODate("2013-02-04T00:00:00.392Z"),
    messages: [ ]
}

// Auto-expire data after 31536000 seconds = 1 year
db.inbox.ensureIndex( { sequence: 1 }, { expireAfterSeconds: 31536000 } )
TTL Collections
Considerations
• Limited to a known range of messages
• Automatic purge of expired data
• No need to have a CRON task, etc. to do this
✖ Per collection basis
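A small JavaScript sketch (not the driver API) shows how a day bucket key and the TTL rule fit together. `dayBucket` and `isEligibleForExpiry` are hypothetical helpers: the first truncates a timestamp to midnight UTC to produce the per-day `sequence`, and the second applies the `expireAfterSeconds` rule. The server's background TTL task runs periodically (every 60 seconds), so actual removal can lag behind the moment a document becomes eligible.

```javascript
// Sketch of the "one document per user per day" key and the TTL rule
// (not the MongoDB driver). Helper names are hypothetical.
const ONE_YEAR_SECONDS = 31536000; // 365 * 24 * 60 * 60

// Truncate a timestamp to midnight UTC to get the day bucket
function dayBucket(date) {
  return new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
}

// A document becomes eligible once expireAfterSeconds have passed since its indexed date
function isEligibleForExpiry(sequence, now, expireAfterSeconds) {
  return now.getTime() >= sequence.getTime() + expireAfterSeconds * 1000;
}

const seq = dayBucket(new Date("2013-02-04T15:30:00Z"));
console.log(seq.toISOString()); // 2013-02-04T00:00:00.000Z
console.log(isEligibleForExpiry(seq, new Date("2014-02-03T00:00:00Z"), ONE_YEAR_SECONDS)); // false
console.log(isEligibleForExpiry(seq, new Date("2014-02-05T00:00:00Z"), ONE_YEAR_SECONDS)); // true
```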
#2 – Indexed Attributes
Design Goal
• Application needs to store a variable number of attributes, e.g.
  – User-defined forms
  – Metadata tags
• Queries needed
  – Equality
  – Range based
• Need to be efficient, regardless of the number of attributes
2 Approaches (there are more)
• Attributes
• Attributes as Objects in an Array
// Flexible set of attributes
db.files.insert( { _id:"mongod", attr: { type: "binary", size: 256, created: ISODate("2013-04-01T18:13:42.689Z") } } )
// Need to create an index for each item in the sub-document
db.files.ensureIndex( { "attr.type": 1 } )
db.files.find( { "attr.type": "text" } )

// Can perform range queries
db.files.ensureIndex( { "attr.size": 1 } )
db.files.find( { "attr.size": { $gt: 64, $lte: 16384 } } )
Attributes
Considerations
• Attributes can be queried via an index
• Equality & range queries supported
✖ Each attribute needs an index
✖ Each time you extend, you add an index
✖ Only a single index is used per query (unless you have $or)
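The "each attribute needs an index" cost grows with the data. This JavaScript sketch uses a hypothetical helper, `indexSpecsFor`, that collects the distinct attribute names across documents and derives one `{ "attr.<name>": 1 }` key pattern per name, i.e. the specs you would have to pass to `ensureIndex`. The second file and its `author` attribute are hypothetical sample data.

```javascript
// Sketch: with attributes in a sub-document, every distinct attribute
// name needs its own index. indexSpecsFor is a hypothetical helper.
function indexSpecsFor(docs) {
  const names = new Set();
  for (const doc of docs) {
    for (const name of Object.keys(doc.attr || {})) names.add(name);
  }
  // One { "attr.<name>": 1 } key pattern per distinct attribute name
  return [...names].sort().map((name) => ({ [`attr.${name}`]: 1 }));
}

const files = [
  { _id: "mongod", attr: { type: "binary", size: 256, created: new Date("2013-04-01T18:13:42.689Z") } },
  { _id: "README", attr: { type: "text", size: 1024, author: "alvin" } }, // hypothetical
];
const specs = indexSpecsFor(files);
console.log(specs.length); // 4
```

Adding one new attribute name to any document adds another index to create and maintain.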
// Flexible set of attributes, each attribute is an object
db.files.insert( { _id: "mongod", attr: [ { type: "binary" }, { size: 256 }, { created: ISODate("2013-04-01T18:13:42.689Z") } ] } )
db.files.ensureIndex( { attr: 1 } )
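Moving from the sub-document form to the array form is a mechanical transformation, sketched here in JavaScript with a hypothetical `toAttrArray` helper. The `matchesRange` function is a deliberately simplified model of a query like `attr: { $gt: { size: 64 }, $lte: { size: 16384 } }`: it only handles the same-field-name case, whereas the server compares whole BSON documents (including field types and names).

```javascript
// Hypothetical helper: { type: "binary", size: 256 } -> [ { type: "binary" }, { size: 256 } ]
function toAttrArray(attrDoc) {
  return Object.entries(attrDoc).map(([name, value]) => ({ [name]: value }));
}

// Simplified model of attr: { $gt: { name: lo }, $lte: { name: hi } }:
// some element has the field `name` with lo < value <= hi.
function matchesRange(attrArray, name, lo, hi) {
  return attrArray.some(
    (a) => Object.prototype.hasOwnProperty.call(a, name) && a[name] > lo && a[name] <= hi
  );
}

const attr = toAttrArray({ type: "binary", size: 256, created: new Date("2013-04-01T18:13:42.689Z") });
console.log(matchesRange(attr, "size", 64, 16384));  // true
console.log(matchesRange(attr, "size", 512, 16384)); // false
```

Because every attribute lives under the single `attr` field, the one index on `attr` serves all of them.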
Attributes as Objects in Array
// Range queries
db.files.find( { attr: { $gt: { size: 64 }, $lte: { size: 16384 } } } )
db.files.find( { attr: { $gte: { created: ISODate("2013-02-01T00:00:01.689Z") } } } )

// Multiple conditions: only the first predicate in the query can use the index,
// so ensure that it is the most selective.
// Index intersection will allow multiple indexes, see SERVER-3071
db.files.find( { $and: [ { attr: { $gte: { created: ISODate("2013-02-01T…") } } },
                         { attr: { $gt: { size: 128 }, $lte: { size: 16384 } } } ] } )

// Each $or clause can use an index
db.files.find( { $or: [ { attr: { $gte: { created: ISODate("2013-02-01T…") } } },
                        { attr: { $gt: { size: 128 }, $lte: { size: 16384 } } } ] } )
Queries
Considerations
• Attributes can be queried via a single index
• New attributes do not need extra indexes
• Equality & range queries supported
✖ $and can only use a single index
#3 – Multiple Identities
Design Goal
• Ability to look up by a number of different identities, e.g.
  – Username
  – Email address
  – FB handle
  – LinkedIn URL
2 Approaches (there are more)
• Multiple Identifiers in a single document
• Separate Identifiers from Content
db.users.findOne()
{
    _id: "joe",
    email: "[email protected]",
    fb: "joe.smith",      // facebook
    li: "joe.e.smith",    // linkedin
    other: {…}
}

// Shard collection by _id
sh.shardCollection( "mongodbdays.users", { _id: 1 } )

// Create indexes on each key
db.users.ensureIndex( { email: 1 } )
db.users.ensureIndex( { fb: 1 } )
db.users.ensureIndex( { li: 1 } )
Single Document by User
Read by _id (shard key)
[Diagram: Shard 1 | Shard 2 | Shard 3]
find( { _id: "joe"} )
Considerations
• Lookup by shard key is routed to 1 shard
✖ Lookup by other identifier is scatter-gathered across all shards
✖ Secondary keys cannot have a unique index
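An in-memory JavaScript model (not the driver API) illustrates both considerations. Users are placed on shards by `_id` via a hypothetical toy hash, `shardFor`. A lookup by `_id` touches one shard, while a lookup by email must visit every shard. And since each shard can only check uniqueness against the documents it holds, two users with the same email on different shards both get accepted, which illustrates why a unique index on a non-shard-key field is not offered. The addresses are placeholder sample data.

```javascript
// In-memory sketch (not the MongoDB driver): users placed on shards by _id.
// shardFor is a hypothetical toy hash.
const NUM_SHARDS = 3;
const shards = Array.from({ length: NUM_SHARDS }, () => []);

function shardFor(key) {
  let h = 0;
  for (const ch of String(key)) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % NUM_SHARDS;
}

// A per-shard uniqueness check is all a single shard can enforce locally
function insertUser(user) {
  const shard = shards[shardFor(user._id)];
  if (shard.some((u) => u.email === user.email)) throw new Error("duplicate email on this shard");
  shard.push(user);
}

// Routed: the shard key picks exactly one shard
function findById(id) {
  return { hits: shards[shardFor(id)].filter((u) => u._id === id), shardsQueried: 1 };
}

// Scatter-gather: any other field has to be checked on every shard
function findByField(field, value) {
  const hits = [];
  for (const shard of shards) for (const u of shard) if (u[field] === value) hits.push(u);
  return { hits, shardsQueried: NUM_SHARDS };
}

insertUser({ _id: "joe", email: "shared@example.com" });
insertUser({ _id: "ann", email: "shared@example.com" }); // different shard, so it is accepted
console.log(findByField("email", "shared@example.com").hits.length); // 2
```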
// Create a document that holds all the other user attributes
db.users.save( { _id: "1200-42", ... } )

// Shard collection by _id
sh.shardCollection( "mongodbdays.users", { _id: 1 } )

// Create a document for each of the user's identities
db.identities.save( { identifier : { hndl: "joe" }, user: "1200-42" } )
db.identities.save( { identifier : { email: "[email protected]" }, user: "1200-42" } )
db.identities.save( { identifier : { li: "joe.e.smith" }, user: "1200-42" } )

// Shard collection by identifier
sh.shardCollection( "mongodbdays.identities", { identifier : 1 } )

// Create unique index
// (users._id is already covered by the default unique _id index)
db.identities.ensureIndex( { identifier : 1 } , { unique: true } )
Document per Identity
Read requires 2 queries
[Diagram: Shard 1 | Shard 2 | Shard 3]
db.identities.find({"identifier" : { "hndl" : "joe" }})
db.users.find( { _id: "1200-42"} )
Considerations
• Multiple queries, but always routed
  – Lookup to Identities is a routed query
  – Lookup to Users is a routed query
• Unique indexes available
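The two-query read path can be sketched in memory as well. In this JavaScript model (not the driver API), two `Map`s stand in for the `identities` and `users` collections, and a hypothetical `identityKey` helper turns an identifier sub-document such as `{ hndl: "joe" }` into a single lookup key. Each step is an exact-key lookup, which corresponds to a routed query in the sharded design, and the duplicate check mirrors the unique index on `identifier`. The user's `name` field is hypothetical sample data.

```javascript
// In-memory sketch of "document per identity" (not the MongoDB driver).
const identities = new Map(); // identifier key -> user _id
const users = new Map();      // _id -> user document

// Hypothetical helper: { hndl: "joe" } -> "hndl:joe"
function identityKey(identifier) {
  const [[type, value]] = Object.entries(identifier);
  return `${type}:${value}`;
}

function addIdentity(identifier, userId) {
  const key = identityKey(identifier);
  // Stands in for the unique index on identities.identifier
  if (identities.has(key)) throw new Error(`duplicate identifier ${key}`);
  identities.set(key, userId);
}

// Two lookups, each by the exact key its collection is sharded on
function findUserBy(identifier) {
  const userId = identities.get(identityKey(identifier)); // query 1: identities
  if (userId === undefined) return null;
  return users.get(userId) || null;                       // query 2: users by _id
}

users.set("1200-42", { _id: "1200-42", name: "Joe" });
addIdentity({ hndl: "joe" }, "1200-42");
addIdentity({ li: "joe.e.smith" }, "1200-42");
console.log(findUserBy({ li: "joe.e.smith" })._id); // 1200-42
```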
Conclusion
Summary
• Multiple ways to model a domain problem
• Understand the key use cases of your app
• Balance between ease of query vs. ease of write
• Avoid Random IO
• Avoid Scatter / Gather query pattern
Thank You