Introduction to MongoDB Sharding

Alberto Lerner, Software Engineer, 10gen

Alberto Lerner, Software Engineer at 10gen, presents at MongoUK in London, June 2010

What is it about?

• It’s not about sharding, it’s about resharding
• What can sharding do for you
• What you must do first to obtain it
• Use case

Sharding Basics

• To maintain the impression that things look like this:

[Figure: a single collection; search criteria are served either using an index or by scanning the collection]

Sharding Basics (cont)

• When they actually look like this:

[Figure: the same collection partitioned across shards; search criteria are again served using an index or by scanning the collection]

A Detail

• Partitioning a collection is relatively easy
• A bit of application logic to find a partition and that’s it
• Or is it?
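That “bit of application logic” can be as small as a modulo rule. Here is a minimal sketch. It assumes integer user_id keys and a fixed list of servers. SERVERS and partition_for are hypothetical names, not a MongoDB API.

```python
# Hypothetical application-side partitioning: a fixed server list plus a
# modulo rule mapping each integer user_id to one server.
SERVERS = ["db0", "db1", "db2"]  # hypothetical host names


def partition_for(user_id, servers):
    """Return the server that holds this user's documents."""
    return servers[user_id % len(servers)]


if __name__ == "__main__":
    for uid in (7, 42, 1001):
        print(uid, "->", partition_for(uid, SERVERS))  # 7 -> db1, 42 -> db0, 1001 -> db2
```

The lookup is trivial only because the number of servers never changes.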

The Certainty

• Things change
  – You get spotted, and your query volume grows
  – You build new functionality, and your access pattern changes
  – You buy new machines, and your fixed partitioning scheme goes out the window
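The last point is easy to check. Take a fixed modulo scheme, assumed here for illustration, with hypothetical helper names. Growing from three to four servers changes the home of most keys, and every one of those documents would have to be moved by hand.

```python
def partition_for(user_id, n_servers):
    """Fixed modulo scheme (hypothetical): index of the server holding user_id."""
    return user_id % n_servers


def moved_keys(keys, old_n, new_n):
    """Count keys whose server changes when the server count goes from old_n to new_n."""
    return sum(1 for k in keys if partition_for(k, old_n) != partition_for(k, new_n))


if __name__ == "__main__":
    keys = range(1000)
    print(moved_keys(keys, 3, 4), "of", len(keys), "keys move")  # 748 of 1000 keys move
```

A key stays put only if its remainders modulo 3 and modulo 4 agree. That happens when k mod 12 is 0, 1 or 2, so only about a quarter of the keys stay where they were.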

Insurance

• Sharding is not about partitioning. It’s about repartitioning without your having to ask
  – Adding or removing shards
  – Splitting and moving chunks*
  – The logic of finding a chunk is MongoDB’s, not the application’s

* Chunk: an (arbitrary) unit of data that moves between shards as a whole
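One way to picture “the logic of finding a chunk is MongoDB’s” is a routing table that maps shard-key ranges to shards. The sketch below is a toy model, not MongoDB’s internal metadata format. ChunkTable, shard_for and move_chunk are hypothetical names. Moving a chunk rewrites a single table entry, and code that calls shard_for does not change.

```python
import bisect


class ChunkTable:
    """Toy chunk map (hypothetical, not MongoDB's format).

    Chunk i covers shard-key values in [lower[i], lower[i + 1]); the last chunk
    is open-ended and lower[0] is -inf, so every key falls in exactly one chunk.
    """

    def __init__(self, lower_bounds, owners):
        assert lower_bounds[0] == float("-inf") and len(lower_bounds) == len(owners)
        self.lower = list(lower_bounds)
        self.owners = list(owners)

    def chunk_index(self, key):
        # Rightmost chunk whose lower bound is <= key.
        return bisect.bisect_right(self.lower, key) - 1

    def shard_for(self, key):
        return self.owners[self.chunk_index(key)]

    def move_chunk(self, index, new_shard):
        # Only one entry changes; callers of shard_for are unaffected.
        self.owners[index] = new_shard


if __name__ == "__main__":
    table = ChunkTable([float("-inf"), 1000, 2000], ["shardA", "shardA", "shardB"])
    print(table.shard_for(1500))   # shardA
    table.move_chunk(1, "shardC")  # e.g. after adding a new shard
    print(table.shard_for(1500))   # shardC
    print(table.shard_for(42))     # shardA
```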

What is it about?

• It’s not about sharding, it’s about resharding
• What can sharding do for you
• What you must do first to obtain it
• Use case

Starting to Shard

• You can load data into a sharded collection or shard an existing one*
  – Automatic range partitioning will take place
  – Data placement will be taken care of

• By default, the collection is sharded over _id, but you can specify a different sharding key
  – An index will be built automatically over that key

* MongoDB 1.6
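Automatic range partitioning can be modelled as “split a chunk when it grows too big”. The sketch below is simplified. It assumes unique integer shard-key values and a split threshold counted in documents, whereas MongoDB sizes chunks by data volume. Moving chunks between shards is not modelled. MAX_CHUNK_DOCS and insert are hypothetical names.

```python
import bisect

MAX_CHUNK_DOCS = 4  # hypothetical threshold; MongoDB sizes chunks by data volume


def insert(chunks, key, max_docs=MAX_CHUNK_DOCS):
    """Insert key into the chunk covering it, splitting that chunk at its median if too big.

    chunks is a list of (lower_bound, sorted_keys) pairs ordered by lower_bound;
    the first lower bound is -inf. Keys are assumed unique.
    """
    lows = [low for low, _ in chunks]
    i = bisect.bisect_right(lows, key) - 1
    low, keys = chunks[i]
    bisect.insort(keys, key)
    if len(keys) > max_docs:
        mid = len(keys) // 2
        chunks[i] = (low, keys[:mid])                  # left half keeps the old lower bound
        chunks.insert(i + 1, (keys[mid], keys[mid:]))  # right half starts at the split key


if __name__ == "__main__":
    chunks = [(float("-inf"), [])]
    for k in range(1, 11):
        insert(chunks, k)
    print([low for low, _ in chunks])         # [-inf, 3, 5, 7]
    print([len(keys) for _, keys in chunks])  # [2, 2, 2, 4]
```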

On Writes

• Write capacity becomes the sum of the shards’ capacities
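The sum holds when writes spread across chunks that live on different shards. Here is a toy illustration with a hypothetical three-shard range layout. Keys spread over the whole range split evenly. An ever-increasing key, such as a timestamp, sends every new write to the last chunk and so to a single shard.

```python
import bisect
from collections import Counter

# Hypothetical layout on the shard key: [-inf, 1000) -> A, [1000, 2000) -> B, [2000, inf) -> C
LOWER = [float("-inf"), 1000, 2000]
OWNER = ["A", "B", "C"]


def shard_for(key):
    """Shard owning the chunk whose range contains key."""
    return OWNER[bisect.bisect_right(LOWER, key) - 1]


def writes_per_shard(keys):
    """Count how many inserts each shard receives for a stream of shard-key values."""
    return Counter(shard_for(k) for k in keys)


if __name__ == "__main__":
    print(dict(writes_per_shard(range(3000))))        # {'A': 1000, 'B': 1000, 'C': 1000}
    print(dict(writes_per_shard(range(5000, 6000))))  # {'C': 1000}
```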

A digression

• A shard can actually live in a group of replicated servers
• Fault tolerance is obtained that way
• Our focus here is incremental scalability and aggregate performance

On Reads, I

• Lookup over the shard key or a prefix thereof
• Sharding at its best!
  – Search criteria can be satisfied by a single chunk
  – The lookup inside the chunk uses an index
  – May or may not need to access the collection

• Example:
  – Shard by user_id, return the user’s name
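Here is a sketch of the targeted case, using the slide’s example: shard by user_id and return the user’s name. The chunk bounds, shard names and documents are hypothetical. Each shard’s dict stands in for its shard-key index. Only one shard is consulted per lookup.

```python
import bisect

# Hypothetical chunk layout on user_id and the shard that owns each chunk.
LOWER = [float("-inf"), 1000, 2000]
OWNER = ["A", "B", "C"]

# Hypothetical documents; each shard's dict stands in for its index on user_id.
SHARDS = {
    "A": {7: {"user_id": 7, "name": "Ana"}},
    "B": {1500: {"user_id": 1500, "name": "Ben"}},
    "C": {2500: {"user_id": 2500, "name": "Cy"}},
}


def find_name(user_id):
    """Targeted read: the shard key selects one chunk, hence exactly one shard."""
    shard = OWNER[bisect.bisect_right(LOWER, user_id) - 1]
    doc = SHARDS[shard].get(user_id)  # index lookup on that shard only
    return shard, (doc["name"] if doc else None)


if __name__ == "__main__":
    print(find_name(1500))  # ('B', 'Ben')
    print(find_name(42))    # ('A', None)
```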

On Reads, II

• Lookup over a secondary index
• Not bad: the results from each shard are merged
• Example: {country: "UK"} with a secondary index over country
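For the {country: "UK"} example, the router cannot tell from the query which shard holds matching documents. It therefore asks all of them and merges the answers. Here is a toy sketch with hypothetical data, in which each shard keeps its own secondary index from country to documents.

```python
from collections import defaultdict

# Hypothetical documents, already placed on shards by user_id.
SHARD_DOCS = {
    "A": [{"user_id": 7, "country": "UK"}, {"user_id": 12, "country": "FR"}],
    "B": [{"user_id": 1500, "country": "UK"}],
    "C": [{"user_id": 2500, "country": "DE"}],
}


def build_country_index(docs):
    """Per-shard secondary index: country -> documents on this shard."""
    index = defaultdict(list)
    for doc in docs:
        index[doc["country"]].append(doc)
    return index


INDEXES = {shard: build_country_index(docs) for shard, docs in SHARD_DOCS.items()}


def find_by_country(country):
    """Scatter-gather: country says nothing about placement, so every shard is asked."""
    results = []
    for shard in sorted(INDEXES):
        results.extend(INDEXES[shard].get(country, []))  # each shard uses its own index
    return results


if __name__ == "__main__":
    print([d["user_id"] for d in find_by_country("UK")])  # [7, 1500]
```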

On Reads, III

• Lookups where indexes won’t help
• Traversing shards sequentially or in parallel?*

* MongoDB 1.6
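When no index helps, every document on every shard has to be examined. The open question on the slide is whether shards are visited one after another or at the same time. The sketch below shows both options with hypothetical data. Here the threads only show the shape of the parallel version; in a real cluster, each shard scan runs on a different machine.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical documents; "note" has no index on any shard.
SHARD_DOCS = {
    "A": [{"user_id": 7, "note": "likes mongodb"}, {"user_id": 12, "note": "likes sql"}],
    "B": [{"user_id": 1500, "note": "mongodb fan"}],
    "C": [{"user_id": 2500, "note": "new user"}],
}


def scan_shard(docs, predicate):
    """No usable index: examine every document on one shard."""
    return [d for d in docs if predicate(d)]


def scan_sequential(shards, predicate):
    """Visit shards one after another."""
    results = []
    for name in sorted(shards):
        results.extend(scan_shard(shards[name], predicate))
    return results


def scan_parallel(shards, predicate):
    """Scan all shards concurrently; pool.map keeps results in shard order."""
    names = sorted(shards)
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        parts = list(pool.map(lambda name: scan_shard(shards[name], predicate), names))
    return [d for part in parts for d in part]


if __name__ == "__main__":
    wants_mongodb = lambda d: "mongodb" in d["note"]
    print([d["user_id"] for d in scan_parallel(SHARD_DOCS, wants_mongodb)])  # [7, 1500]
```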

What is it about?

• It’s not about sharding, it’s about resharding
• What can sharding do for you
• What you must do first to obtain it
• Use case

The Sharding Key

• Choose wisely; you’re marrying it
• Often, you’re better off defining a unique key that stores data the application wants to query
• (The internally generated _id is really not it)

Mind Your Queries

• Sure, dynamic partitioning is automatic
• But, ultimately, the system’s response time and scalability depend on how your application queries it
• If your most important queries fall into category I (lookups over the shard key), the remaining ones into II (lookups over a secondary index), and hardly any query that matters into III (no index helps), you’ll be fine
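The three categories can be summarised as a rough classifier. This is a simplified model that covers equality predicates only, and query_category is a hypothetical name. It is not how MongoDB plans queries.

```python
def query_category(query_fields, shard_key, secondary_indexes):
    """Rough mapping of a query onto the slides' categories I, II and III.

    Simplified model (equality predicates only):
      query_fields: set of fields the query constrains
      shard_key: ordered tuple of shard-key fields, e.g. ("user_id",)
      secondary_indexes: list of ordered tuples of indexed fields
    """
    if shard_key and shard_key[0] in query_fields:
        return "I"    # shard key or a prefix of it: routed to the matching chunk(s)
    if any(index and index[0] in query_fields for index in secondary_indexes):
        return "II"   # all shards are asked, but each can use an index
    return "III"      # no index helps: every shard scans its documents


if __name__ == "__main__":
    key, indexes = ("user_id",), [("timestamp",)]
    for fields in ({"user_id"}, {"timestamp"}, {"url"}):
        print(sorted(fields), query_category(fields, key, indexes))
```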

Pick Your Indexes

• MongoDB allows secondary indexes in addition to the sharding index
• Critical queries that are not served by the sharding index can benefit from them
• Sometimes, you can’t help them all…
• Index selection is a trade-off between querying and updates/insertions/deletions

What is it about?

• It’s not about sharding, it’s about resharding
• What can sharding do for you
• What you must do first to obtain it
• Use case

Bit.ly History

• Users create shortened URLs
• Sharding is used to store all of a user’s past URLs
  – Sharding key: user_id
  – Index: timestamp (descending)

• Queries:
  – Shortened URLs by a given user
  – Last n URLs by any user
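Here is a toy model of the two bit.ly queries, with hypothetical chunk bounds and data. Each shard keeps its documents newest first, standing in for the timestamp(desc) index. “Shortened URLs by a given user” touches one shard. “Last n URLs by any user” asks every shard for its n newest documents and merges the partial results. ShardedUrlHistory and its methods are hypothetical names.

```python
import bisect
import heapq
import itertools


class ShardedUrlHistory:
    """Toy model of the use case; layout and data are hypothetical."""

    def __init__(self, lower_bounds, owners):
        self.lower = list(lower_bounds)  # chunk lower bounds on user_id (first is -inf)
        self.owner = list(owners)        # shard that holds each chunk
        self.shards = {name: [] for name in owners}

    def shard_for(self, user_id):
        return self.owner[bisect.bisect_right(self.lower, user_id) - 1]

    def save(self, user_id, ts, url):
        docs = self.shards[self.shard_for(user_id)]
        docs.append({"user_id": user_id, "ts": ts, "url": url})
        docs.sort(key=lambda d: d["ts"], reverse=True)  # stands in for the timestamp(desc) index

    def urls_by_user(self, user_id):
        # Query 1: only the shard owning user_id is consulted (filtered by a scan in this toy).
        docs = self.shards[self.shard_for(user_id)]
        return [d["url"] for d in docs if d["user_id"] == user_id]

    def last_n(self, n):
        # Query 2: each shard contributes its n newest documents; the router merges them.
        per_shard = [docs[:n] for docs in self.shards.values()]
        merged = heapq.merge(*per_shard, key=lambda d: d["ts"], reverse=True)
        return [d["url"] for d in itertools.islice(merged, n)]


if __name__ == "__main__":
    h = ShardedUrlHistory([float("-inf"), 1000], ["A", "B"])
    h.save(7, 100, "url-a")
    h.save(1500, 105, "url-b")
    h.save(7, 110, "url-c")
    h.save(2000, 120, "url-d")
    print(h.urls_by_user(7))  # ['url-c', 'url-a']
    print(h.last_n(3))        # ['url-d', 'url-c', 'url-b']
```

Each shard only needs to return its own n newest documents, because the overall top n must be among them.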

Take Away

• Picture to keep in mind

Questions?

www.mongodb.org