Introduction to Sharding
Tyler Brock, Software Engineer, 10gen
Wednesday, March 27, 2013

Sharding

  • Upload
    mongodb

  • View
    6.055

  • Download
    2

Embed Size (px)

DESCRIPTION

MongoDB was designed for humongous amounts of data, with the ability to scale horizontally via sharding. In this session, we’ll look at MongoDB’s approach to partitioning data, and the architecture of a sharded system. We’ll walk you through configuration of a sharded system, and look at how data is balanced across servers and requests are routed.

Page 1: Sharding

Software Engineer, 10gen

Tyler Brock

Introduction to Sharding

Wednesday, March 27, 13

Page 2: Sharding

Agenda

• Scaling Data
• MongoDB's Approach
• Architecture
• Configuration
• Mechanics

Page 3: Sharding

Scaling Data

Page 4: Sharding

Examining Growth

Page 6: Sharding

Examining Growth

• User Growth
  – 1995: 0.4% of the world’s population
  – Today: 30% of the world is online (~2.2B)
  – Emerging Markets & Mobile

• Data Set Growth
  – Facebook’s data set is around 100 petabytes
  – 4 billion photos taken in the last year (4x a decade ago)

Page 7: Sharding

Working Set Exceeds Physical Memory

Page 8: Sharding

Read/Write Throughput Exceeds I/O

Page 9: Sharding

Vertical Scalability (Scale Up)

Page 10: Sharding

Horizontal Scalability (Scale Out)

Page 11: Sharding

Data Store Scalability

• Custom Hardware
  – Oracle

• Custom Software
  – Facebook + MySQL
  – Google

Page 12: Sharding

Data Store Scalability Today

• MongoDB Auto-Sharding
• A data store that is
  – Free
  – Publicly available
  – Open Source (https://github.com/mongodb/mongo)
  – Horizontally scalable
  – Application independent

Page 13: Sharding

MongoDB's Approach to Sharding

Page 14: Sharding

Partitioning

• User defines shard key
• Shard key defines range of data
• Key space is like points on a line
• Range is a segment of that line
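
To make the "points on a line" picture concrete, here is a minimal JavaScript sketch, assuming a numeric shard key and an in-memory chunk table sorted by lower bound. The `chunks` array, its `min`/`max`/`shard` fields, and `findChunk` are hypothetical names for illustration, not MongoDB's internal structures. Each chunk owns the half-open segment [min, max), and -Infinity/Infinity stand in for MongoDB's MinKey/MaxKey bounds.

```javascript
// Hypothetical chunk table: each chunk owns the half-open range [min, max)
// of a numeric shard key. -Infinity / Infinity stand in for MinKey / MaxKey.
const chunks = [
  { min: -Infinity, max: 100, shard: "shard0000" },
  { min: 100, max: 200, shard: "shard0001" },
  { min: 200, max: Infinity, shard: "shard0000" },
];

// Binary search for the chunk whose segment contains `key`.
// Assumes the chunks are sorted, contiguous, and cover the whole line.
function findChunk(chunks, key) {
  let lo = 0;
  let hi = chunks.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const c = chunks[mid];
    if (key < c.min) {
      hi = mid - 1;
    } else if (key >= c.max) {
      lo = mid + 1;
    } else {
      return c;
    }
  }
  return null; // only reachable if the table has gaps
}

console.log(findChunk(chunks, 42).shard);  // shard0000
console.log(findChunk(chunks, 150).shard); // shard0001
console.log(findChunk(chunks, 200).shard); // shard0000 (200 is the lower bound of the last chunk)
```

Note that two non-adjacent segments can live on the same shard; a shard owns a set of ranges, not one contiguous slice of the line.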

Page 15: Sharding

Data Distribution

• Initially 1 chunk
• Default max chunk size: 64 MB
• MongoDB automatically splits & migrates chunks when max reached

Page 16: Sharding

Routing and Balancing

• Queries routed to specific shards

• MongoDB balances cluster
• MongoDB migrates data to new nodes

Page 17: Sharding

MongoDB Auto-Sharding

• Minimal effort required
  – Same interface as single mongod

• Two steps
  – Enable sharding for a database
  – Shard collection within database

Page 18: Sharding

Architecture

Page 19: Sharding

What is a Shard?

• Shard is a node of the cluster
• Shard can be a single mongod or a replica set

Page 20: Sharding

Meta Data Storage

• Config Server
  – Stores cluster chunk ranges and locations
  – Can have only 1 or 3 (production must have 3)
  – Not a replica set

Page 21: Sharding

Routing and Managing Data

• Mongos
  – Acts as a router / balancer
  – No local data (persists to config database)
  – Can have 1 or many

Page 22: Sharding

Sharding infrastructure

Page 23: Sharding

Configuration

Page 24: Sharding

Example Cluster

• Don’t use this setup in production!
  – Only one config server (no fault tolerance)
  – Shard not in a replica set (low availability)
  – Only one mongos and shard (no performance improvement)

Page 25: Sharding

Starting the Configuration Server

• mongod --configsvr
• Starts a configuration server on the default port (27019)

Page 26: Sharding

Start the mongos Router

• mongos --configdb <hostname>:27019
• For 3 configuration servers:

mongos --configdb <host1>:<port1>,<host2>:<port2>,<host3>:<port3>

• This is always how to start a new mongos, even if the cluster is already running
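
The rule that a cluster runs either one or three config servers can be captured in a small helper that assembles the `--configdb` value shown above. This is a hypothetical JavaScript sketch: `buildConfigDb` is not part of MongoDB, the host names are placeholders, and it assumes plain hostnames (an IPv6 literal containing ":" would be misread as already carrying a port).

```javascript
// Hypothetical helper: build the value passed to `mongos --configdb`.
// Hosts without an explicit port get the default config server port, 27019.
// Assumes plain hostnames; IPv6 literals are not handled.
const DEFAULT_CONFIG_PORT = 27019;

function buildConfigDb(hosts) {
  // This deployment model allows exactly 1 (testing) or 3 (production) config servers.
  if (hosts.length !== 1 && hosts.length !== 3) {
    throw new Error(`expected 1 or 3 config servers, got ${hosts.length}`);
  }
  return hosts
    .map((h) => (h.includes(":") ? h : `${h}:${DEFAULT_CONFIG_PORT}`))
    .join(",");
}

console.log(["mongos", "--configdb", buildConfigDb(["cfg1", "cfg2:27020", "cfg3"])].join(" "));
// mongos --configdb cfg1:27019,cfg2:27020,cfg3:27019
```

Every mongos should be given the same config server list, so generating it from one source avoids routers that disagree about where the metadata lives.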

Page 27: Sharding

Start the shard database

• mongod --shardsvr
• Starts a mongod with the default shard port (27018)
• Shard is not yet connected to the rest of the cluster
• Shard may have already been running in production

Page 28: Sharding

Add the Shard

• On mongos:
  – sh.addShard('<host>:27018')

• Adding a replica set:
  – sh.addShard('<replSetName>/<host>:27018')

Page 29: Sharding

Verify that the shard was added

• db.adminCommand({ listShards: 1 })

{
  "shards" : [ { "_id" : "shard0000", "host" : "<hostname>:27018" } ],
  "ok" : 1
}

Page 30: Sharding

Enabling Sharding

• Enable sharding on a database

sh.enableSharding("<dbname>")

• Shard a collection with the given key

sh.shardCollection("<dbname>.people", { "country": 1 })

• Use a compound shard key so that many documents do not share one shard key value

sh.shardCollection("<dbname>.cars", { "year": 1, "uniqueid": 1 })
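
With a compound key such as { year: 1, uniqueid: 1 }, documents are ordered by year first and by uniqueid within a year, so the cars from a single year can still be divided across chunk boundaries. A minimal sketch of that ordering, assuming ascending fields with numeric values; `compareShardKeys` is a hypothetical helper, not a MongoDB function.

```javascript
// Compare two documents by a compound shard key pattern such as
// { year: 1, uniqueid: 1 }: fields are compared in pattern order and
// the first field that differs decides. Assumes ascending (1) fields
// and values comparable with < and >.
function compareShardKeys(pattern, a, b) {
  for (const field of Object.keys(pattern)) {
    if (a[field] < b[field]) return -1;
    if (a[field] > b[field]) return 1;
  }
  return 0;
}

const pattern = { year: 1, uniqueid: 1 };
const cars = [
  { year: 2012, uniqueid: 7 },
  { year: 2011, uniqueid: 9 },
  { year: 2012, uniqueid: 3 },
];
cars.sort((a, b) => compareShardKeys(pattern, a, b));
console.log(cars.map((c) => `${c.year}/${c.uniqueid}`).join(" "));
// 2011/9 2012/3 2012/7
```

With year alone, every 2012 car has the same key value, so a chunk holding only 2012 cars has no split point; with the compound key a boundary such as { year: 2012, uniqueid: 5 } exists.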

Page 31: Sharding

Tag Aware Sharding

• Tag aware sharding allows you to control the distribution of your data

• Tag a range of shard keys
  – sh.addTagRange(<collection>, <min>, <max>, <tag>)

• Tag a shard
  – sh.addShardTag(<shard>, <tag>)
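
The combined effect of the two calls can be modeled as two lookup tables: tag ranges map a segment of the shard key to a tag, and shard tags say which shards may hold that tag. A minimal JavaScript sketch, assuming a numeric key; `tagRanges`, `shardTags`, `allowedShards`, and the "NYC"/"SFO" tags are hypothetical, and the real balancer enforces tags by migrating chunks rather than through a function like this.

```javascript
// Hypothetical model of tag-aware sharding.
// Tag ranges are half-open [min, max) segments of the shard key, like chunks.
const tagRanges = [
  { min: 0, max: 1000, tag: "NYC" },
  { min: 1000, max: 2000, tag: "SFO" },
];

// Shard -> set of tags, as if assigned with sh.addShardTag(<shard>, <tag>).
const shardTags = {
  shard0000: new Set(["NYC"]),
  shard0001: new Set(["SFO"]),
  shard0002: new Set(["NYC", "SFO"]),
};

// Return the shards allowed to hold `key`. Keys outside every tag range
// are untagged and may live on any shard.
function allowedShards(key) {
  const range = tagRanges.find((r) => key >= r.min && key < r.max);
  const all = Object.keys(shardTags);
  if (!range) return all;
  return all.filter((s) => shardTags[s].has(range.tag));
}

console.log(allowedShards(500).join(","));  // shard0000,shard0002
console.log(allowedShards(1500).join(",")); // shard0001,shard0002
console.log(allowedShards(5000).join(",")); // shard0000,shard0001,shard0002
```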

Page 32: Sharding

Mechanics

Page 33: Sharding

Partitioning

• Remember it's based on ranges

Page 34: Sharding

Chunk is a section of the entire range

Page 35: Sharding

Chunk splitting

• A chunk is split once it exceeds the maximum size
• There is no split point if all documents have the same shard key
• Chunk split is a logical operation (no data is moved)
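
The three rules above can be sketched as follows, assuming a chunk is represented by the sorted shard key values of its documents and that size is approximated by document count (MongoDB measures bytes against the 64 MB default). `chooseSplitPoint` and `splitChunk` are hypothetical helpers; splitting near the median is one reasonable strategy, not necessarily the exact one mongod uses.

```javascript
// Hypothetical split-point selection for a chunk whose shard key values
// are given in ascending order. Size is approximated by document count.
// Returns a key k such that [min, k) and [k, max) are both non-empty,
// or null when the chunk is small enough or no such key exists.
function chooseSplitPoint(sortedKeys, maxDocs) {
  const n = sortedKeys.length;
  if (n <= maxDocs) return null; // not over the limit yet
  const mid = Math.floor(n / 2);
  // The median value works if at least one key is strictly smaller.
  if (sortedKeys[0] < sortedKeys[mid]) return sortedKeys[mid];
  // Otherwise keys[0..mid] share one value: use the next distinct value.
  for (let i = mid + 1; i < n; i++) {
    if (sortedKeys[i] !== sortedKeys[mid]) return sortedKeys[i];
  }
  return null; // every document has the same shard key value
}

// Splitting only rewrites chunk metadata: both halves stay on the same
// shard, so no documents move.
function splitChunk(chunk, splitKey) {
  return [
    { min: chunk.min, max: splitKey, shard: chunk.shard },
    { min: splitKey, max: chunk.max, shard: chunk.shard },
  ];
}

console.log(chooseSplitPoint([1, 2, 3, 4, 5, 6], 4)); // 4
console.log(chooseSplitPoint([7, 7, 7, 7, 9], 4));    // 9
console.log(chooseSplitPoint([7, 7, 7, 7, 7], 4));    // null (no split point)
```

A chunk that cannot be split keeps growing past the maximum size, which is why low-cardinality shard keys are a problem.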

Page 36: Sharding

Balancing

• The balancer runs on mongos
• Once the difference in chunks between the most dense shard and the least dense shard is above the migration threshold, a balancing round starts
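
The trigger condition can be sketched as below. The threshold table (2 when the collection has fewer than 20 chunks, 4 for 20 to 79, 8 for 80 or more) is taken from the MongoDB 2.x documentation and is an assumption here, since later versions changed the rule. That documentation describes a round starting when the difference reaches the threshold, so the sketch uses `>=`; treat the exact comparison as an assumption too. `migrationThreshold` and `shouldBalance` are hypothetical.

```javascript
// Hypothetical sketch of the balancer's trigger check.
// ASSUMPTION: thresholds follow the MongoDB 2.x documentation table
// (fewer than 20 chunks -> 2, 20-79 -> 4, 80 or more -> 8).
function migrationThreshold(totalChunks) {
  if (totalChunks < 20) return 2;
  if (totalChunks < 80) return 4;
  return 8;
}

// chunkCounts maps shard name -> number of chunks of one collection on it.
// Returns the donor and recipient shards if a round should start, else null.
function shouldBalance(chunkCounts) {
  const entries = Object.entries(chunkCounts);
  const total = entries.reduce((sum, [, n]) => sum + n, 0);
  let most = entries[0];
  let least = entries[0];
  for (const e of entries) {
    if (e[1] > most[1]) most = e;
    if (e[1] < least[1]) least = e;
  }
  if (most[1] - least[1] < migrationThreshold(total)) return null;
  return { from: most[0], to: least[0] };
}

console.log(JSON.stringify(shouldBalance({ shard0000: 5, shard0001: 4 })));   // null
console.log(JSON.stringify(shouldBalance({ shard0000: 30, shard0001: 10 }))); // {"from":"shard0000","to":"shard0001"}
```

The threshold grows with the chunk count so that large collections are not constantly migrating over small imbalances.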

Page 37: Sharding

Acquiring the Balancer Lock

• The balancer on mongos takes out a “balancer lock”
• To see the status of these locks:

use config
db.locks.find({ _id: "balancer" })

Page 38: Sharding

Moving the chunk

• The mongos sends a moveChunk command to the source shard
• The source shard then notifies the destination shard
• The destination shard starts pulling documents from the source shard

Page 39: Sharding

Committing Migration

• When complete, the destination shard updates the config server
  – Provides new locations of the chunks

Page 40: Sharding

Cleanup

• Source shard deletes moved data
  – Must wait for open cursors to either close or time out
  – NoTimeout cursors may prevent the release of the lock

• The mongos releases the balancer lock after old chunks are deleted
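
The migration steps on these slides can be strung together in a toy, single-process simulation. Everything here is hypothetical: shards are in-memory arrays, the config server is a plain object, and network messages are function calls. It shows only the order of operations (lock, copy, commit, delete, unlock), not how mongod implements them, and it ignores writes that arrive during the copy and the wait for open cursors.

```javascript
// Toy model: each shard holds documents with a numeric shard key `k`.
const shards = {
  shard0000: [{ k: 1 }, { k: 5 }, { k: 150 }],
  shard0001: [],
};
// Toy config server: chunk metadata plus the balancer lock.
const config = {
  chunks: [
    { min: -Infinity, max: 100, shard: "shard0000" },
    { min: 100, max: Infinity, shard: "shard0000" },
  ],
  balancerLock: null,
};

const inChunk = (chunk) => (doc) => doc.k >= chunk.min && doc.k < chunk.max;

function moveChunk(chunk, to, lockHolder) {
  // 1. Take the balancer lock so only one migration round runs at a time.
  if (config.balancerLock !== null) throw new Error("balancer lock is held");
  config.balancerLock = lockHolder;
  try {
    const from = chunk.shard;
    // 2-3. Destination pulls (copies) the chunk's documents from the source.
    shards[to].push(...shards[from].filter(inChunk(chunk)).map((d) => ({ ...d })));
    // 4. Commit: config metadata now points the chunk at the destination.
    chunk.shard = to;
    // 5. Cleanup: source deletes its copy (real shards first wait for open cursors).
    shards[from] = shards[from].filter((d) => !inChunk(chunk)(d));
  } finally {
    // 6. Release the lock.
    config.balancerLock = null;
  }
}

moveChunk(config.chunks[1], "shard0001", "mongos-1");
console.log(shards.shard0000.map((d) => d.k).join(" ")); // 1 5
console.log(shards.shard0001.map((d) => d.k).join(" ")); // 150
```

Because the commit happens before the delete, a reader that follows the updated metadata always finds the documents on the destination.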

Page 41: Sharding

Routing Requests

Page 42: Sharding

Cluster Request Routing

• Targeted Queries
• Scatter Gather Queries
• Scatter Gather Queries with Sort
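
How mongos chooses between a targeted and a scatter-gather query can be sketched with a chunk table: if the query fixes the shard key to a value, only the shard owning that value is contacted; otherwise every shard is. A minimal sketch, assuming a collection sharded on a hypothetical numeric field `userId` and handling only exact matches; `chunkTable` and `targetShards` are hypothetical, and a real mongos can also target range predicates on the shard key.

```javascript
// Hypothetical chunk table for a collection sharded on { userId: 1 }.
const shardKey = "userId";
const chunkTable = [
  { min: -Infinity, max: 1000, shard: "shard0000" },
  { min: 1000, max: 2000, shard: "shard0001" },
  { min: 2000, max: Infinity, shard: "shard0002" },
];

// Decide which shards a find() filter must be sent to.
// Only exact numeric matches on the shard key are targeted in this sketch.
function targetShards(filter) {
  const allShards = [...new Set(chunkTable.map((c) => c.shard))];
  const value = filter[shardKey];
  if (typeof value !== "number") {
    return { type: "scatter-gather", shards: allShards };
  }
  const chunk = chunkTable.find((c) => value >= c.min && value < c.max);
  return { type: "targeted", shards: [chunk.shard] };
}

console.log(JSON.stringify(targetShards({ userId: 1234 })));
// {"type":"targeted","shards":["shard0001"]}
console.log(JSON.stringify(targetShards({ name: "Tyler" })));
// {"type":"scatter-gather","shards":["shard0000","shard0001","shard0002"]}
```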

Page 43: Sharding

Cluster Request Routing: Targeted Query

Page 44: Sharding

Routable request received

Page 45: Sharding

Request routed to appropriate shard

Page 46: Sharding

Shard returns results

Page 47: Sharding

Mongos returns results to client

Page 48: Sharding

Cluster Request Routing: Non-Targeted Query

Page 49: Sharding

Non-Targeted Request Received

Page 50: Sharding

Request sent to all shards

Page 51: Sharding

Shards return results to mongos

Page 52: Sharding

Mongos returns results to client

Page 53: Sharding

Cluster Request Routing: Non-Targeted Query with Sort

Page 54: Sharding

Non-Targeted request with sort received

Page 55: Sharding

Request sent to all shards

Page 56: Sharding

Query and sort performed locally

Page 57: Sharding

Shards return results to mongos

Page 58: Sharding

Mongos merges sorted results
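
Because each shard has already sorted its own results, mongos only has to merge sorted streams rather than re-sort everything. A minimal sketch of that merge, assuming each shard's results arrive as an in-memory array sorted ascending on one numeric field; `mergeSorted` is hypothetical, and a real router streams batches from shard cursors instead of holding whole arrays.

```javascript
// Merge k arrays, each already sorted ascending by `field`, into one sorted array.
function mergeSorted(shardResults, field) {
  const positions = shardResults.map(() => 0); // next unread index per shard
  const merged = [];
  for (;;) {
    let best = -1;
    // Pick the shard whose next document has the smallest sort value.
    for (let s = 0; s < shardResults.length; s++) {
      const doc = shardResults[s][positions[s]];
      if (doc === undefined) continue; // this shard is exhausted
      if (best === -1 || doc[field] < shardResults[best][positions[best]][field]) {
        best = s;
      }
    }
    if (best === -1) return merged; // every shard is exhausted
    merged.push(shardResults[best][positions[best]]);
    positions[best]++;
  }
}

const fromShards = [
  [{ age: 21 }, { age: 40 }],
  [{ age: 18 }, { age: 33 }, { age: 50 }],
  [{ age: 25 }],
];
console.log(mergeSorted(fromShards, "age").map((d) => d.age).join(" "));
// 18 21 25 33 40 50
```

With a handful of shards a linear scan over the heads is enough; a min-heap would make each step O(log k) for many shards.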

Page 59: Sharding

Mongos returns results to client

Page 60: Sharding

Shard Key

Page 61: Sharding

Shard Key

• Shard key is immutable
• Shard key values are immutable
• Shard key must be indexed
• Shard key limited to 512 bytes in size
• Shard key used to route queries
  – Choose a field commonly used in queries

• Only shard key can be unique across shards
  – _id field is only unique within individual shard

Page 62: Sharding

Shard Key Considerations

• Cardinality
• Write Distribution
• Query Isolation
• Reliability
• Index Locality
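
Write distribution is easiest to see with a monotonically increasing shard key such as a counter or timestamp: each new value is larger than every existing one, so every insert falls into the chunk bounded by MaxKey and lands on a single shard. The sketch counts inserts per shard for two hypothetical key sequences over a fixed chunk table; the table, the sequences, and `insertsPerShard` are illustrative only.

```javascript
// Hypothetical chunk table over a numeric shard key.
const table = [
  { min: -Infinity, max: 1000, shard: "shard0000" },
  { min: 1000, max: 2000, shard: "shard0001" },
  { min: 2000, max: Infinity, shard: "shard0002" },
];

// Count how many inserted keys land on each shard.
function insertsPerShard(keys) {
  const counts = Object.fromEntries(table.map((c) => [c.shard, 0]));
  for (const k of keys) {
    const chunk = table.find((c) => k >= c.min && k < c.max);
    counts[chunk.shard]++;
  }
  return counts;
}

// Monotonically increasing key (e.g. a counter that has already passed 2000).
const increasing = Array.from({ length: 9 }, (_, i) => 2500 + i);
// Key values spread across the whole range.
const spread = [10, 1500, 2600, 400, 1200, 3000, 700, 1900, 2100];

console.log(JSON.stringify(insertsPerShard(increasing))); // {"shard0000":0,"shard0001":0,"shard0002":9}
console.log(JSON.stringify(insertsPerShard(spread)));     // {"shard0000":3,"shard0001":3,"shard0002":3}
```

The increasing key concentrates all write load on one shard, while the spread key uses every shard.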

Page 63: Sharding

Conclusion

Page 64: Sharding

Read/Write Throughput Exceeds I/O

Page 65: Sharding

Working Set Exceeds Physical Memory

Page 66: Sharding

Sharding Enables Scale

• MongoDB’s Auto-Sharding
  – Easy to Configure
  – Consistent Interface
  – Free and Open Source

Page 68: Sharding

Software Engineer, 10gen

Tyler Brock

Thank You