Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track Angshuman Bagchi ([email protected] ) Technical Services Engineer

Webinar: Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track


DESCRIPTION

MongoDB Management Service (MMS) is a cloud-based suite of services for managing MongoDB deployments, providing both monitoring and backup capabilities. In this webinar we'll outline 5 alerts you should set up in MMS to keep your MongoDB deployment on track. We’ll explore what each alert means for a MongoDB instance, as well as how to calibrate the alert triggers to be relevant to your environment.


Page 1: Webinar: Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track

Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track

Angshuman Bagchi ([email protected]), Technical Services Engineer

Page 2: Webinar: Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track

Agenda

• What is MMS Monitoring?
• What are Alerts?
• How to pick an Alert?
• Five recommended Alerts
• Wrap up

Page 3: Webinar: Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track

What is MMS Monitoring?

Page 4: Webinar: Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track
Page 5: Webinar: Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track
Page 6: Webinar: Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track

Who uses MMS?

Page 7: Webinar: Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track

What are MMS alerts?

Page 8: Webinar: Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track

Source:http://www.cleanfunnypics.com/no-its-not-empty/#axzz2pqknJJbC

Page 9: Webinar: Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track
Page 10: Webinar: Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track

How to pick an Alert?

Page 11: Webinar: Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track

• Is there an absolute limit to alert on?
• What is normal (baseline)?
• What is worrying (warning)?
• What is a definite problem (critical)?
• Likelihood of false positives?

... there is no magic formula

Page 12: Webinar: Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track

Five recommended alerts

• Host Recovering (All, but by definition Secondary)
• Replication Lag (Secondary)
• Connections (All mongos, mongod)
• Lock % (Primary, Secondary)
• Replica (Primary, Secondary)

Page 13: Webinar: Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track

Host Recovering

• General alert triggered if any instance enters RECOVERING mode
• Required for all use-cases
• All Replica Sets should have this
• Sometimes, during maintenance, this may be expected
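
MMS raises this alert for you; the following is only a minimal sketch of the underlying check, not how MMS implements it. It assumes a document shaped like the output of the `replSetGetStatus` command (for example, what `client.admin.command("replSetGetStatus")` returns in PyMongo), where each entry in `members` carries `name` and `stateStr`. The function name and the sample host names are made up for illustration.

```python
def recovering_members(repl_status):
    """Return the names of replica set members currently in RECOVERING state."""
    return [
        member["name"]
        for member in repl_status.get("members", [])
        if member.get("stateStr") == "RECOVERING"
    ]


# Hand-written sample in the shape of replSetGetStatus output (hypothetical hosts).
sample_status = {
    "set": "rs0",
    "members": [
        {"name": "db1:27017", "stateStr": "PRIMARY"},
        {"name": "db2:27017", "stateStr": "SECONDARY"},
        {"name": "db3:27017", "stateStr": "RECOVERING"},
    ],
}

print(recovering_members(sample_status))
```

During planned maintenance a non-empty result may be expected, so a check like this would typically be paired with a way to silence it for the maintenance period.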

Page 14: Webinar: Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track

Host Recovering

Page 15: Webinar: Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track

Replication Lag

• No secondary should be behind
• Secondary reads affected
• All Replica Sets should have this
• Only exception is configured slaveDelay

Page 16: Webinar: Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track

Replication Lag

Absolute Limit? Yes, about 1 or 2s. To prevent false positives, alert on an absolute threshold of > 240s.

Normal: Lag is ideally 0s

Worrying: < 60s, some false positives

Critical: > 240s

False positives: Above 240s, likelihood is low.
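
To make the thresholds above concrete, here is a rough sketch of how lag could be computed and classified by hand. It assumes a `replSetGetStatus`-shaped document in which each member has `name`, `stateStr` and `optimeDate`. The function names and sample data are hypothetical, and using 60s as the warning line is an assumption of this sketch (the slide only says that values under 60s produce some false positives).

```python
from datetime import datetime, timedelta


def secondary_lag_seconds(repl_status):
    """Lag of each SECONDARY behind the PRIMARY, in seconds, keyed by member name."""
    members = repl_status["members"]
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


def classify_lag(lag_s, warning_s=60, critical_s=240):
    """Map a lag value to 'ok', 'warning' or 'critical' (thresholds are assumptions)."""
    if lag_s > critical_s:
        return "critical"
    if lag_s > warning_s:
        return "warning"
    return "ok"


# Hypothetical sample: one secondary 5s behind, one 300s behind.
now = datetime(2014, 1, 1, 12, 0, 0)
sample_status = {
    "members": [
        {"name": "db1:27017", "stateStr": "PRIMARY", "optimeDate": now},
        {"name": "db2:27017", "stateStr": "SECONDARY",
         "optimeDate": now - timedelta(seconds=5)},
        {"name": "db3:27017", "stateStr": "SECONDARY",
         "optimeDate": now - timedelta(seconds=300)},
    ]
}

for name, lag in secondary_lag_seconds(sample_status).items():
    print(name, lag, classify_lag(lag))
```

A member with a configured slaveDelay would need its delay subtracted before classification; that is left out here.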

Page 17: Webinar: Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track

Example: replication lag

150,000s of lag ~ almost 2 days of lag!

Page 18: Webinar: Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track

Example: replication lag

• Secondaries under-specified compared to primaries
• Different access patterns between primary and secondaries
• Insufficient bandwidth
• Foreground index builds on secondaries

“…when you have eliminated the impossible, whatever remains, however improbable, must be the truth…” -- Sherlock Holmes

Sir Arthur Conan Doyle, The Sign of the Four

Page 19: Webinar: Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track

Example: replication lag

Example:
• ~1500 ops per minute (opcounters)
• 0.1 MB per object (average object size, local db)

~1500 ops/min / 60 s/min * 0.1 MB/op * 8 bits/byte =~ 20 Mbps required bandwidth
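
The same arithmetic as a small helper, so it can be rerun with your own opcounter and object-size figures. The function and parameter names are illustrative only.

```python
def required_bandwidth_mbps(ops_per_minute, avg_op_size_mb):
    """Estimate replication bandwidth in megabits per second."""
    ops_per_second = ops_per_minute / 60
    megabytes_per_second = ops_per_second * avg_op_size_mb
    return megabytes_per_second * 8  # 8 bits per byte


# Figures from the slide: ~1500 ops/min at 0.1 MB per operation.
print(round(required_bandwidth_mbps(1500, 0.1), 1))
```

This is a steady-state average; bursts of writes would need headroom above the computed figure.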

Page 20: Webinar: Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track

Connections

• Each connection consumes ~ 1MB and a file descriptor
• 5000 connections => 5GB of RAM
• Stability and predictability are key

Page 21: Webinar: Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track

Pro-Tip: know thyself

You have to recognize normal to know when it isn’t.

Source: http://www.flickr.com/photos/skippy/6853920/

Page 22: Webinar: Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track

Connections

Absolute Limit? Yes, but this is too high. We need to alert before that.

Normal: TBD based on deployment, number of nodes, connection pool settings, app servers, load etc. Say, X during peak load

Worrying: 50% increase, so 1.5X

Critical: Double, so 2X
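
A minimal sketch of deriving connection alert thresholds from a measured peak baseline X, together with the RAM those connections would consume at the critical level. The ~1 MB per connection figure comes from the earlier slide; the function name, the dictionary keys and the baseline of 2500 are assumptions for illustration.

```python
def connection_alert_thresholds(baseline_peak, mb_per_connection=1.0):
    """Warning at 1.5x and critical at 2x the peak baseline connection count."""
    warning = int(baseline_peak * 1.5)
    critical = baseline_peak * 2
    return {
        "warning": warning,
        "critical": critical,
        # Uses 1000 MB per GB to match the slide's round "5000 => 5GB" figure.
        "ram_at_critical_gb": critical * mb_per_connection / 1000,
    }


# Hypothetical baseline: 2500 connections at peak load.
print(connection_alert_thresholds(2500))
```

If the computed critical value lands near the absolute connection limit of the host, the baseline itself is probably already too high.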

Page 23: Webinar: Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track

Lock %

• Lock contention degrades performance
• High lock % starves replication, reads
• Bounds need to be determined

Page 24: Webinar: Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track

Lock %

Absolute Limit? Yes. Above 80%, performance is occasionally degraded; at 90%, there is regular major impact.

Normal: TBD. Write-heavy loads see higher values. Normal is, say, X% during peak load

Worrying: Double, so approximately 2X%

Critical: TBD. For Prod > 80%
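
One possible way to turn this into numbers, as a sketch only: the warning level is twice the measured peak baseline, but never above the critical level, so a busy write-heavy deployment does not end up with a warning threshold higher than its critical one. The capping rule and all names are assumptions of this example, not something stated on the slide.

```python
def lock_alert_thresholds(baseline_peak_pct, critical_pct=80.0):
    """Warning at 2x the peak baseline lock %, capped at the critical level."""
    warning = min(2 * baseline_peak_pct, critical_pct)
    return {"warning": warning, "critical": critical_pct}


# Hypothetical baselines: a read-heavy and a write-heavy deployment.
print(lock_alert_thresholds(15.0))
print(lock_alert_thresholds(50.0))
```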

Page 25: Webinar: Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track

Replica

• Represents oplog window
• Depends on
  – Rate of operations inserted into oplog
  – Size of operations
  – Size of oplog capped collection
• Rule of thumb: normal maintenance window x 3
• Resizing the oplog is non-trivial

Page 26: Webinar: Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track

Replica

Absolute Limit? 50% below Normal

Normal: TBD. Say X hours during peak

Worrying: 25% below Normal

Critical: 50% below Normal
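
A rough sketch tying the pieces together: the oplog window is estimated from the three factors listed earlier (oplog size, operation rate, operation size) and then compared against the baseline X. Unlike the other metrics, a lower value is worse here. The function names and sample figures are hypothetical, and the estimate assumes a constant write rate.

```python
def oplog_window_hours(oplog_size_mb, ops_per_second, avg_op_size_mb):
    """Hours of operations the oplog can hold at a constant write rate."""
    mb_per_hour = ops_per_second * avg_op_size_mb * 3600
    return oplog_size_mb / mb_per_hour


def classify_oplog_window(window_h, baseline_h):
    """'critical' at 50% or more below baseline, 'warning' at 25% or more below."""
    if window_h <= 0.5 * baseline_h:
        return "critical"
    if window_h <= 0.75 * baseline_h:
        return "warning"
    return "ok"


# Hypothetical figures: 50 GB oplog, 25 ops/s, 0.01 MB per operation.
window = oplog_window_hours(51200, 25, 0.01)
print(round(window, 1), classify_oplog_window(window, baseline_h=72))
```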

Page 27: Webinar: Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track

Summary

• Use a similar approach for other metrics
• Different audiences for alerts
  – Worrying alerts go to the ops team
  – Critical alerts go out to a wider audience

• Get started with MMS Monitoring and alerts!

Page 28: Webinar: Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track

I got alerted … now what?

Page 29: Webinar: Five MMS Monitoring Alerts to Keep Your MongoDB Deployment on Track

mms.mongodb.com

[email protected]