Scaling Crittercism to 30,000 Requests Per Second and Beyond with MongoDB

Scaling to 30,000 Requests Per Second and Beyond with MongoDB

Mike Chesnut, Director of Operations Engineering

Crittercism

40,000

How a Startup Gets Started

● Pick something and go with it
● Make mistakes along the way
● Correct the mistakes you can
● Work around the ones you can’t

What I’ll Talk About

● Crittercism - Background and Architecture
● Router (mongos) Architecture
● Sharding Considerations
● The Balancing Act
● Q&A

Critter-What?

A Brief History...

Our Founders (Rob, Andrew, Jeeyun)

Let’s make a mobile app! It’ll be awesome!

(Unnamed Dating App)

Our app isn’t so awesome after all...

Architecture

[Diagram, built up over several slides. Labels in order of appearance: API; Feedback; Crashes; App Loads; Handled Exceptions; batch; Metadata; DynamoDB; a second API; Performance Data; Geo Data; 40,000 req/s.]

Growth

Router Architecture

[Diagram, built up over several slides: a MongoDB Cluster of three replica sets, each with three mongod servers; Client Application(s) made up of application servers, each running a client process and a mongos; and a group of config servers. Later slides abbreviate these as RS (replica set), conf (config servers), ms (mongos), and app (application server), and show many app/ms pairs.]

Single mongos per client problems we encountered:
● thousands of connections to config servers
● config server CPU load
● configdb propagation delays

We went from this...

[Diagram: app hosts (app) paired with mongos routers (ms), in front of three replica sets (RS) and the config servers (conf).]

To this.

[Diagram: app hosts (app), mongos routers (ms), replica sets (RS), and config servers (conf) in the new layout.]

[Diagram: Client Application(s) (application servers, each running a client process), a Router Tier of mongos processes, and the MongoDB Cluster (three replica sets of three mongod servers each).]

Separate mongos tier advantages:
● greatly reduced number of connections to each mongod
● far fewer hosts talking to the config servers
● much faster configdb propagation

Disadvantages:
● additional network hop
● host failure has a larger effect

mongos-per-host failure:

[Diagram sequence over three slides: app hosts (app), each with its own mongos (ms), in front of the replica sets (RS) and config servers (conf).]

Separate mongos tier failure:

[Diagram sequence over three slides: the same cluster with a separate tier of mongos routers (ms).]

So increase the number of mongos routers:

[Diagram over two slides: the separate-tier layout with additional mongos routers (ms).]
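Why adding routers softens a tier failure can be sketched numerically. This is a simplified model under stated assumptions, not something taken from the slides: clients are assumed to be spread evenly across the routers, and the function names and host counts are hypothetical.

```python
def failover_fraction(app_hosts, routers=None):
    """Fraction of application traffic disrupted when one mongos host dies."""
    if routers is None:
        # mongos-per-host: the mongos goes down with its own app server only.
        return 1 / app_hosts
    # Separate tier: assume clients are spread evenly over the routers.
    return 1 / routers


def surviving_router_load_factor(routers):
    """How much more traffic each remaining router carries after one fails."""
    if routers < 2:
        raise ValueError("need at least two routers to survive a failure")
    return routers / (routers - 1)


APP_HOSTS = 1000  # hypothetical fleet size

print(f"mongos-per-host: {failover_fraction(APP_HOSTS):.1%} of traffic disrupted")
for routers in (2, 4, 8, 16):
    print(
        f"tier of {routers:>2}: {failover_fraction(APP_HOSTS, routers):.1%} disrupted, "
        f"survivors carry {surviving_router_load_factor(routers):.2f}x load"
    )
```

In this model, each router added to the tier shrinks both the share of traffic that has to fail over and the extra load placed on the survivors.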

Router Architecture - Evolve!

Maybe at first, doing the mongos-per-host architecture is fine. And it will probably remain fine for quite a while.

[Diagram: the mongos-per-host layout, followed by the separate Router Tier layout.]

This is an area where you can and should be willing to adapt as you go (and as needed).

Sharding Considerations

Pick something you want to live with.

What could we have done differently?

The Balancing Act

Why wouldn’t you run the balancer in the first place?
● great question
● for us, it’s because we deleted some old data at one point, and left a bunch of holes
  ○ we turned it off while deleting this data
  ○ and then were unable to turn it back on
● but maybe you start without it
● or maybe you need to turn it off for maintenance and forget to turn it back on

Obviously, don’t do this. But if you do, here’s what happens...
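One way to guard against the "forget to turn it back on" case is to tie the restart to the end of the maintenance block. The sketch below is only an illustration: `balancer_paused` and the two callables are hypothetical names, and the stand-in lambdas just record calls. Against a real cluster they would issue the appropriate stop and start operations through a driver. It does not address the situation described above, where the balancer could not safely be re-enabled at all.

```python
from contextlib import contextmanager


@contextmanager
def balancer_paused(stop_balancer, start_balancer):
    """Pause the balancer for the duration of a block, then always restart it."""
    stop_balancer()
    try:
        yield
    finally:
        # Runs even if the maintenance code raises, so the balancer
        # is not left switched off by accident.
        start_balancer()


# Stand-in callables that only record what was requested (hypothetical).
calls = []
with balancer_paused(lambda: calls.append("stop"), lambda: calls.append("start")):
    calls.append("maintenance")

print(calls)  # ['stop', 'maintenance', 'start']
```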

Fresh, new, empty cluster… But no balancer running.

Now we’re pretty full, so let’s add another shard...

And keep inserting...

Suddenly we find ourselves with a very unbalanced cluster.

But if we enable the balancer, it will DoS the 5th shard!

The approximate effect looks something like this:

[Animated diagram over several slides.]

So what can we do?

1. add IOPS
2. make sure your config servers have plenty of CPU (and IOPS)
3. slowly move chunks manually
4. approach a balanced state
5. hold your breath
6. try re-enabling the balancer

How to manually balance:

1. determine a chunk on a hot shard
2. monitor effects on both the source and target shards
3. move the chunk
4. allow the system to settle
5. repeat
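The manual balancing steps can be sketched as a loop. This is a minimal illustration under assumptions, not the procedure Crittercism actually ran: it uses chunk count as a stand-in for "hot", and `move_chunk` and `settle` are hypothetical callables. On a real cluster, `move_chunk` would wrap MongoDB's `moveChunk` command and the monitoring around it, and `settle` would wait until load on both shards returns to normal.

```python
def rebalance_manually(chunk_counts, move_chunk, settle, max_spread=1):
    """Move one chunk at a time from the fullest shard to the emptiest one.

    chunk_counts: dict of shard name -> number of chunks (updated in place)
    move_chunk:   callable(source, target) that performs and monitors one move
    settle:       callable() that waits for the system to calm down
    Returns the number of chunks moved.
    """
    moves = 0
    while True:
        source = max(chunk_counts, key=chunk_counts.get)
        target = min(chunk_counts, key=chunk_counts.get)
        if chunk_counts[source] - chunk_counts[target] <= max_spread:
            return moves  # close enough to balanced
        move_chunk(source, target)
        chunk_counts[source] -= 1
        chunk_counts[target] += 1
        settle()
        moves += 1


# Hypothetical cluster: four full shards and a newly added, empty fifth one.
counts = {"shard1": 10, "shard2": 10, "shard3": 10, "shard4": 10, "shard5": 0}
history = []
moved = rebalance_manually(
    counts,
    move_chunk=lambda src, dst: history.append((src, dst)),
    settle=lambda: None,  # a real version would pause and watch metrics here
)
print(moved, counts)
```

Moving a single chunk per iteration and waiting in between keeps the extra write load on the target shard small, which is the point of doing this by hand instead of letting the balancer flood the new shard.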

The Balancing Act

Conclusion here:

Run the balancer!

Summary

● Design ahead of time
  ○ “NoSQL” lets you play it by ear
  ○ but some of these decisions will bite you later
● Be willing to correct past mistakes
  ○ dedicate time and resources to adapting
  ○ learn how to live with the mistakes you can’t correct

References

● MongoDB Blog post (details on shard migration): http://blog.mongodb.org/post/77278906988/crittercism-scaling-to-billions-of-requests-per-day-on
● MongoDB Webinar (details on manual chunk migrations): http://www.mongodb.com/presentations/webinar-back-basics-3-scaling-30000-requests-second-mongodb
● Documentation on mongos routers: http://docs.mongodb.org/master/core/sharded-cluster-query-routing/
● Documentation on the balancer: http://docs.mongodb.org/manual/tutorial/manage-sharded-cluster-balancer/
● Documentation on shard keys: http://docs.mongodb.org/manual/core/sharding-shard-key/

Crittercism: http://www.crittercism.com/ to learn more, and http://www.crittercism.com/careers/ if you want to help us!

Q&A

Thank You!
