47
Technical Director Jared Rosoff Schema Design

Webinar: Schema Design

  • Upload
    mongodb

  • View
    1.984

  • Download
    0

Embed Size (px)

DESCRIPTION

One of the challenges that comes with moving to MongoDB is figuring how to best model your data. While most developers have internalized the rules of thumb for designing schemas for relational databases, these rules don't always apply to MongoDB. The simple fact that documents can represent rich, schema-free data structures means that we have a lot of viable alternatives to the standard, normalized, relational model. Not only that, MongoDB has several unique features, such as atomic updates and indexed array keys, that greatly influence the kinds of schemas that make sense.

Citation preview

Page 1: Webinar: Schema Design

Technical Director

Jared Rosoff

Schema Design

Page 2: Webinar: Schema Design

Agenda

• Working with documents

• Evolving a Schema

• Queries and Indexes

• Common Patterns

Page 3: Webinar: Schema Design

Terminology

RDBMS MongoDB

Database ➜ Database

Table ➜ Collection

Row ➜ Document

Index ➜ Index

Join ➜ Embedded Document

Foreign Key ➜ Reference

Page 4: Webinar: Schema Design

Working with Documents

Page 5: Webinar: Schema Design

Modeling Data

Page 6: Webinar: Schema Design

DocumentsProvide flexibility and performance

Page 7: Webinar: Schema Design

Normalized Data

Page 8: Webinar: Schema Design

De-Normalized (embedded) Data

Page 9: Webinar: Schema Design

Relational Schema DesignFocus on data storage

Page 10: Webinar: Schema Design

Document Schema DesignFocus on data use

Page 11: Webinar: Schema Design

Schema Design Considerations• How do we manipulate the data?

– Dynamic Ad-Hoc Queries– Atomic Updates– Map Reduce

• What are the access patterns of the application?– Read/Write Ratio– Types of Queries / Updates– Data life-cycle and growth rate

Page 12: Webinar: Schema Design

Data Manipulation

• Conditional Query Operators– Scalar: $ne, $mod, $exists, $type, $lt, $lte, $gt,

$gte, $ne– Vector: $in, $nin, $all, $size

• Atomic Update Operators– Scalar: $inc, $set, $unset– Vector: $push, $pop, $pull, $pushAll, $pullAll,

$addToSet

Page 13: Webinar: Schema Design

Data Access

• Flexible Schemas

• Ability to embed complex data structures

• Secondary Indexes

• Multi-Key Indexes

• Aggregation Framework– $project, $match, $limit, $skip, $sort, $group,

$unwind

• No Joins

Page 14: Webinar: Schema Design

Getting Started

Page 15: Webinar: Schema Design

Library Management Application• Patrons

• Books

• Authors

• Publishers

Page 16: Webinar: Schema Design

An ExampleOne to One Relations

Page 17: Webinar: Schema Design

patron = { _id: "joe" name: "Joe Bookreader”}

address = { patron_id = "joe", street: "123 Fake St. ", city: "Faketon", state: "MA", zip: 12345}

Modeling Patrons

patron = { _id: "joe" name: "Joe Bookreader", address: { street: "123 Fake St. ", city: "Faketon", state: "MA", zip: 12345 }}

Page 18: Webinar: Schema Design

One to One Relations

• Mostly the same as the relational approach

• Generally good idea to embed “contains” relationships

• Document model provides a holistic representation of objects

Page 19: Webinar: Schema Design

An ExampleOne To Many Relations

Page 20: Webinar: Schema Design

patron = { _id: "joe" name: "Joe Bookreader", join_date: ISODate("2011-10-15"), addresses: [ {street: "1 Vernon St.", city: "Newton", state: "MA", …}, {street: "52 Main St.", city: "Boston", state: "MA", …}, ]}

Modeling Patrons

Page 21: Webinar: Schema Design

Publishers and Books

• Publishers put out many books

• Books have one publisher

Page 22: Webinar: Schema Design

MongoDB: The Definitive Guide,By Kristina Chodorow and Mike DirolfPublished: 9/24/2010Pages: 216Language: English

Publisher: O’Reilly Media, CA

Book

Page 23: Webinar: Schema Design

book = { title: "MongoDB: The Definitive Guide", authors: [ "Kristina Chodorow", "Mike Dirolf" ] published_date: ISODate("2010-09-24"), pages: 216, language: "English", publisher: { name: "O’Reilly Media", founded: "1980", location: "CA" }}

Modeling Books – Embedded Publisher

Page 24: Webinar: Schema Design

publisher = { name: "O’Reilly Media", founded: "1980", location: "CA"}

book = { title: "MongoDB: The Definitive Guide", authors: [ "Kristina Chodorow", "Mike Dirolf" ] published_date: ISODate("2010-09-24"), pages: 216, language: "English"}

Modeling Books & Publisher Relationship

Page 25: Webinar: Schema Design

publisher = { _id: "oreilly", name: "O’Reilly Media", founded: "1980", location: "CA"}

book = { title: "MongoDB: The Definitive Guide", authors: [ "Kristina Chodorow", "Mike Dirolf" ] published_date: ISODate("2010-09-24"), pages: 216, language: "English", publisher_id: "oreilly"}

Publisher _id as a Foreign Key

Page 26: Webinar: Schema Design

publisher = { name: "O’Reilly Media", founded: "1980", location: "CA" books: [ "123456789", ... ]}

book = { _id: "123456789", title: "MongoDB: The Definitive Guide", authors: [ "Kristina Chodorow", "Mike Dirolf" ] published_date: ISODate("2010-09-24"), pages: 216, language: "English"}

Book _id as a Foreign Key

Page 27: Webinar: Schema Design

Where do you put the foreign Key?

• Array of books inside of publisher– Makes sense when many means a handful of

items– Useful when items have bound on potential

growth

• Reference to single publisher on books– Useful when items have unbounded growth

(unlimited # of books)

• SQL doesn’t give you a choice, no arrays

Page 28: Webinar: Schema Design

Another ExampleOne to Many Relations

Page 29: Webinar: Schema Design

Books and Patrons

• Book can be checked out by one Patron at a time

• Patrons can check out many books (but not 1000s)

Page 30: Webinar: Schema Design

patron = { _id: "joe" name: "Joe Bookreader", join_date: ISODate("2011-10-15"), address: { ... }}

book = { _id: "123456789" title: "MongoDB: The Definitive Guide", authors: [ "Kristina Chodorow", "Mike Dirolf" ], ...}

Modeling Checkouts

Page 31: Webinar: Schema Design

patron = {

_id: "joe"

name: "Joe Bookreader",

join_date: ISODate("2011-10-15"),

address: { ... },

checked_out: [

{ _id: "123456789", checked_out: "2012-10-15" },

{ _id: "987654321", checked_out: "2012-09-12" },

...

]

}

Modeling Checkouts

Page 32: Webinar: Schema Design

De-normalize for speed

De-normalizationProvides data locality

Page 33: Webinar: Schema Design

patron = { _id: "joe" name: "Joe Bookreader", join_date: ISODate("2011-10-15"), address: { ... }, checked_out: [ { _id: "123456789", title: "MongoDB: The Definitive Guide", authors: [ "Kristina Chodorow", "Mike Dirolf" ], checked_out: ISODate("2012-10-15") }, { _id: "987654321" title: "MongoDB: The Scaling Adventure", ... }, ... ]}

Modeling Checkouts -- de-normalized

Page 34: Webinar: Schema Design

Referencing vs. Embedding• Embedding is a bit like pre-joined data

• Document level ops are easy for server to handle

• Embed when the “many” objects always appear with (viewed in the context of) their parents.

• Reference when you need more flexibility

Page 35: Webinar: Schema Design

An ExampleSingle Table Inheritance

Page 36: Webinar: Schema Design

book = { title: "MongoDB: The Definitive Guide", authors: [ "Kristina Chodorow", "Mike Dirolf" ] published_date: ISODate("2010-09-24"), kind: loanable locations: [ ... ] pages: 216, language: "English", publisher: { name: "O’Reilly Media", founded: "1980", location: "CA" }}

Single Table Inheritance

Page 37: Webinar: Schema Design

An ExampleMany to Many Relations

Page 38: Webinar: Schema Design

Relational Approach

Page 39: Webinar: Schema Design

book = { title: "MongoDB: The Definitive Guide", authors = [ { _id: "kchodorow", name: "K-Awesome” }, { _id: "mdirolf", name: "Batman Mike” } ] published_date: ISODate("2010-09-24"), pages: 216, language: "English"}

author = { _id: "kchodorow", name: "Kristina Chodorow", hometown: "New York"}

Books and Authors

Page 40: Webinar: Schema Design

An ExampleTrees

Page 41: Webinar: Schema Design

book = { title: "MongoDB: The Definitive Guide", authors: [ "Kristina Chodorow", "Mike Dirolf" ], published_date: ISODate("2010-09-24"), pages: 216, language: "English", category: "MongoDB"}

category = { _id: MongoDB, parent: Databases }category = { _id: Databases, parent: Programming }

Parent Links

Page 42: Webinar: Schema Design

book = { _id: 123456789, title: "MongoDB: The Definitive Guide", authors: [ "Kristina Chodorow", "Mike Dirolf" ], published_date: ISODate("2010-09-24"), pages: 216, language: "English"}

category = { _id: MongoDB, children: [ 123456789, … ] }category = { _id: Databases, children: [ MongoDB, Postgres }category = { _id: Programming, children: [ DB, Languages ] }

Child Links

Page 43: Webinar: Schema Design

Modeling Trees

• Parent Links

- Each node is stored as a document

- Contains the id of the parent

• Child Links

- Each node contains the id’s of the children

- Can support graphs (multiple parents / child)

Page 44: Webinar: Schema Design

book = { title: "MongoDB: The Definitive Guide", authors: [ "Kristina Chodorow", "Mike Dirolf" ], published_date: ISODate("2010-09-24"), pages: 216, language: "English", categories: [ Programming, Databases, MongoDB ]}

book = { title: "MySQL: The Definitive Guide", authors: [ ”Michael Kofler" ], published_date: ISODate("2010-09-24"), pages: 216, language: "English", parent: "MongoDB", ancestors: [ Programming, Databases, MongoDB ]}

Array of Ancestors

Page 45: Webinar: Schema Design

An ExampleQueues

Page 46: Webinar: Schema Design

book = { _id: 123456789, title: "MongoDB: The Definitive Guide", authors: [ "Kristina Chodorow", "Mike Dirolf" ], published_date: ISODate("2010-09-24"), pages: 216, language: "English", available: 3}

db.books.findAndModify({ query: { _id: 123456789, available: { "$gt": 0 } }, update: { $inc: { available: -1 } }})

Book Document

Page 47: Webinar: Schema Design

Technical Director

Jared Rosoff

Questions?