6. 6 Infrastructure: Sizing (RAM, CPU, Disk Size, I/O Bandwidth);
Availability
7. 7 Sizing: Indexes need to be in RAM. Working set needs to be
in RAM. I/O Bandwidth: write load, index updates, working set
migration. Example document:
{ _id: ObjectId(), tour: UUID, user: UUID, name: "Doug's Dogs",
  desc: "The best hot-dog", clues: [ "Hungry for a Coney Island?",
  "Ask for Dr. Frankenfurter", "Look for the hot dog stand" ],
  "geometry": { "type": "Point", "coordinates": [125.6, 10.1] } }
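The sizing rule above (indexes and working set both in RAM) can be turned into a quick arithmetic check. This is a minimal sketch, not a sizing tool: the function name, the 20% headroom, and every figure are assumptions for illustration. In practice the index size could be read from collection statistics, and the working set has to be estimated from the access pattern.

```python
GIB = 1024 ** 3

def fits_in_ram(index_bytes, working_set_bytes, ram_bytes, headroom=0.2):
    """Return True if indexes plus working set fit in RAM.

    headroom is the fraction of RAM reserved for the OS, connections,
    and everything else (an assumed value, not a recommendation).
    """
    usable = ram_bytes * (1.0 - headroom)
    return index_bytes + working_set_bytes <= usable

# Hypothetical figures for illustration only.
print(fits_in_ram(index_bytes=6 * GIB, working_set_bytes=20 * GIB, ram_bytes=32 * GIB))
```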
8. 11 Load Testing
12. 15 Load Testing: Test it like you use it; benchmarks don't
count. Test to failure. Instrument your code!
https://github.com/breinero/Firehose
https://github.com/ParsePlatform/flashback
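"Instrument your code" can be as small as timing each database call and keeping the samples. The sketch below is one way to do that with the Python standard library; it is not how Firehose or flashback do it, and the names `timed`, `summary`, and `find_tour` are hypothetical.

```python
import time
from contextlib import contextmanager
from statistics import mean

latencies = {}  # operation name -> list of elapsed seconds

@contextmanager
def timed(op_name, clock=time.perf_counter):
    """Record how long the wrapped block took, even if it raises."""
    start = clock()
    try:
        yield
    finally:
        latencies.setdefault(op_name, []).append(clock() - start)

def summary(op_name):
    """Return count, mean, and max latency for one operation name."""
    samples = latencies[op_name]
    return {"n": len(samples), "mean": mean(samples), "max": max(samples)}

with timed("find_tour"):
    time.sleep(0.01)  # stand-in for a real database call
print(summary("find_tour"))
```

During a test-to-failure run, watching how `mean` and `max` grow as load increases shows where the system starts to degrade.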
21. 24 Logging: Save and rotate logs. Don't use --quiet. Keep
--logpath separate from --dbpath. Use component verbosity for
debugging.
22. 25 Security
23. 26 Security: Firewall. Bind IP. Encrypt networks. Enable access
control. Don't enable the REST interface. Auditing. Limit exposure
and use the Principle of Least Privilege.
24. 27 Tuning Best Practices: Disable Transparent Huge Pages. Use
NTP to synchronize time. Set ulimits. Use XFS or Ext4. Don't use
NFS. Disable NUMA. Have swap. Read the Production Notes. Tunables:
Set I/O scheduler to NOOP. Adjust readahead (MMAPv1). Avoid cgroups.
SELinux (?). RAID.
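Several of these settings can be verified from a script before a host goes into service. As one example, the sketch below checks the Transparent Huge Pages mode on Linux. It assumes the usual sysfs path, which may differ on some distributions, and the function names are hypothetical.

```python
import re
from pathlib import Path

# Common location on Linux; treat the exact path as an assumption.
THP_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")

def active_thp_mode(text):
    """Return the bracketed (active) mode from a line like 'always madvise [never]'."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None

def thp_disabled(path=THP_PATH):
    """True if the active mode is 'never'; None if the file cannot be read."""
    try:
        return active_thp_mode(path.read_text()) == "never"
    except OSError:
        return None

print(thp_disabled())
```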
32. 35 Emergency Procedures
(image: https://spinoff.nasa.gov/spinoff2002/images/070.jpg)
Backup and Recovery: File System Snapshot; MMS Cloud / Ops Manager;
mongodump
33. 36 Backups and Recovery: PERFORM DRILLS OFTEN AND ROUTINELY
34. 37 Emergency Procedures: Document your procedures. Include
ETAs. Follow the procedures in docs.mongodb.org.
35. 38 Production Ready Architecture L.B.
39. 42 Classic Failure Scenario L.B. Unindexed queries lead to
collection scans, which result in high latencies and cause memory
exhaustion.
40. 43 Production Ready Architecture L.B. Unindexed queries lead
to collection scans, which result in high latencies and cause
memory exhaustion: CASCADING FAILURE
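The first link in this chain, the unindexed query, can be caught before production by inspecting query plans. The sketch below walks a plan tree in the shape of the `queryPlanner.winningPlan` section of explain output and reports whether any stage is a `COLLSCAN`. The sample plans are hand-written for illustration, and the function name is hypothetical.

```python
def uses_collection_scan(plan):
    """Return True if any stage in an explain-style plan tree is a COLLSCAN."""
    if plan.get("stage") == "COLLSCAN":
        return True
    # A stage may have a single child (inputStage) or several (inputStages).
    children = list(plan.get("inputStages", []))
    if "inputStage" in plan:
        children.append(plan["inputStage"])
    return any(uses_collection_scan(child) for child in children)

# Hand-written plan fragments, not captured from a real server.
indexed = {"stage": "FETCH", "inputStage": {"stage": "IXSCAN", "indexName": "tour_1"}}
unindexed = {"stage": "COLLSCAN", "direction": "forward"}
print(uses_collection_scan(indexed), uses_collection_scan(unindexed))
```

A check like this fits naturally into a load-test harness, so that a new query shape without a supporting index fails the test rather than the cluster.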
42. 45 Circuit Breaker Trigger Conditions:
Latency: stats.getMean() >= max
OpsPerSecond: stats.getN() >= max
ConcurrentOperations: stats.getN()*stats.getMean() >= max
https://github.com/breinero/Firehose
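The three trigger conditions can be sketched in a few lines. This is an illustrative Python version, not the Firehose implementation: it assumes the samples cover a one-second window with latencies in seconds, so that the sample count stands in for operations per second and count times mean latency approximates the number of operations in flight. The function and parameter names are hypothetical.

```python
from statistics import mean

def tripped_conditions(latencies_s, max_latency_s, max_ops_per_s, max_concurrent):
    """Return the names of the trigger conditions that fired.

    latencies_s: latencies (seconds) of operations completed in the
    last second (assumed window).
    """
    n = len(latencies_s)                   # plays the role of stats.getN()
    avg = mean(latencies_s) if n else 0.0  # plays the role of stats.getMean()
    fired = []
    if avg >= max_latency_s:
        fired.append("Latency")
    if n >= max_ops_per_s:
        fired.append("OpsPerSecond")
    if n * avg >= max_concurrent:
        fired.append("ConcurrentOperations")
    return fired

# 40 operations at 0.5 s each: neither latency nor throughput alone is
# over its limit, but roughly 20 operations are in flight at once.
print(tripped_conditions([0.5] * 40, max_latency_s=1.0, max_ops_per_s=100, max_concurrent=10))
```

When any condition fires, the client stops sending work to the database for a while instead of adding load to a server that is already struggling.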
43. 46 Production Ready Architecture L.B.
44. 47 Client Side: Don't use ensureIndex() in the application.
Look out for connection bombs (--maxConns). DO use operation
timeouts. DON'T cause socket timeouts. Lower keepalives. Avoid
retry bombs.
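A retry bomb happens when many clients retry a failed operation immediately and in lockstep. One common mitigation, sketched here under assumed defaults, is a bounded number of attempts with capped exponential backoff and random jitter. The names `backoff_delays` and `call_with_retries` and all default values are hypothetical.

```python
import random
import time

def backoff_delays(max_attempts, base_s=0.1, cap_s=5.0, rng=random.random):
    """Yield one jittered, capped exponential delay per allowed retry."""
    for attempt in range(max_attempts - 1):
        yield rng() * min(cap_s, base_s * 2 ** attempt)

def call_with_retries(op, max_attempts=4, sleep=time.sleep, **backoff_kw):
    """Call op(), retrying on failure until the attempts are used up."""
    delays = backoff_delays(max_attempts, **backoff_kw)
    while True:
        try:
            return op()
        except Exception:  # real code should catch only transient errors
            delay = next(delays, None)
            if delay is None:
                raise  # out of attempts: surface the last error
            sleep(delay)
```

The hard limit on attempts matters as much as the backoff: without it, a long outage still ends with every client hammering the database the moment it comes back.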
45. 48 Requirements & Specs: Make a DevOps Contract. Database
Access Requirements. Database Access Fulfillment Specification.
Cluster Configuration. Monitoring and Alerting Specification.
46. 49 Monitoring: Opcounters, Memory, Page Faults, Queues,
Replication Lag, Oplog Window, Background Flush Average, Disk
space
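Two of these metrics are related: a secondary can only catch up while its replication lag stays smaller than the oplog window. The sketch below shows the arithmetic on plain timestamps. How the timestamps are collected (for example from replica set status and from the first and last oplog entries) is left out, and all names and values are illustrative assumptions.

```python
from datetime import datetime, timedelta

def replication_lag(primary_optime, secondary_optime):
    """How far the secondary's last applied operation trails the primary's."""
    return max(primary_optime - secondary_optime, timedelta(0))

def oplog_window(first_entry_time, last_entry_time):
    """Time span currently covered by the oplog."""
    return last_entry_time - first_entry_time

def can_catch_up(lag, window):
    """A secondary that lags by more than the window needs a full resync."""
    return lag < window

# Hypothetical timestamps for illustration only.
now = datetime(2015, 6, 1, 12, 0, 0)
lag = replication_lag(now, now - timedelta(minutes=5))
window = oplog_window(now - timedelta(hours=24), now)
print(lag, window, can_catch_up(lag, window))
```

An alert on the ratio of lag to window gives earlier warning than an alert on lag alone.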