Leveling Up Monitoring:
A Decade of Automating and Scaling Nagios
Katherine Daniels and Laurie Denness
@beerops - @lozzd Velocity 2016
Katherine Daniels @beerops
Senior Operations Engineer, Etsy
Co-Author of Effective DevOps
Laurie Denness @lozzd
Staff Operations Engineer, Etsy
Official Graph Enthusiast
Agenda
1. In The Beginning...
2. Automation
3. Deployinator
4. Scaling + Tooling
About Etsy
• 25M active buyers
• 1.6M active sellers
• $2.39B 2015 annual GMS
(As of March 31, 2016)
Monitoring!
bit.ly/yaynagios
https://kartar.net/2015/08/monitoring-survey-2015---tools/
In The Beginning
LESSONS LEARNED:
Templates are awesome.
define service {
    use                  generic-service
    hostgroups           Linux_hosts,!email-only-servers
    service_description  SSH
    check_command        check_ssh
}
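The hostgroups line applies the SSH check to every host in Linux_hosts except those in email-only-servers. A minimal Python sketch of how such an include/exclude list resolves to a set of hosts; the group membership is made up, and this illustrates the idea rather than reproducing Nagios's own parsing code:

```python
def resolve_hostgroups(expr, groups):
    """Resolve a Nagios-style hostgroup list such as "a,!b" to a set of hosts.

    Entries prefixed with "!" are exclusions; all others are inclusions.
    `groups` maps hostgroup name -> set of host names. In this sketch an
    unknown group name raises KeyError.
    """
    included, excluded = set(), set()
    for entry in expr.split(","):
        name = entry.strip()
        if not name:
            continue
        if name.startswith("!"):
            excluded |= groups[name[1:]]
        else:
            included |= groups[name]
    # Exclusions win, regardless of the order entries appear in.
    return included - excluded


# Hypothetical hostgroup membership, for illustration only.
GROUPS = {
    "Linux_hosts": {"web01", "web02", "mail01"},
    "email-only-servers": {"mail01"},
}

print(sorted(resolve_hostgroups("Linux_hosts,!email-only-servers", GROUPS)))
# ['web01', 'web02']
```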
define service {
    use             disk-space-service
    hostgroup_name  email-only-servers
    contact_groups  ops_nonurgent
}
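Both definitions stay short because of `use`: a service inherits every directive from its template and overrides only what it sets itself, so contact_groups ops_nonurgent replaces whatever the template specifies. A simplified sketch of that lookup, assuming one parent template per object and hypothetical template contents (Nagios also allows several comma-separated templates in `use`, which this ignores):

```python
def resolve_object(obj, templates):
    """Merge an object's directives with those inherited via `use`.

    Directives set on the object win over inherited ones. Only a single
    parent template per object is handled, and cycles are not detected.
    """
    merged = {}
    parent = obj.get("use")
    if parent is not None:
        # Resolve the parent first so multi-level template chains work.
        merged.update(resolve_object(templates[parent], templates))
    merged.update({k: v for k, v in obj.items() if k != "use"})
    return merged


# Hypothetical template contents; the real templates are not shown in the talk.
TEMPLATES = {
    "generic-service": {
        "max_check_attempts": "3",
        "check_interval": "5",
        "contact_groups": "ops",
    },
    "disk-space-service": {
        "use": "generic-service",
        "service_description": "Disk Space",
        "check_command": "check_disk",
    },
}

service = {
    "use": "disk-space-service",
    "hostgroup_name": "email-only-servers",
    "contact_groups": "ops_nonurgent",
}

for key, value in sorted(resolve_object(service, TEMPLATES).items()):
    print(f"{key:22} {value}")
```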
LESSONS LEARNED:
Start small.
Nagios and Chef
LESSONS LEARNED:
Automation is awesome!
HA HA JUST KIDDING
LESSONS LEARNED:
Trust but verify.
How Many Repos?
LESSONS LEARNED:
?!?!?!?!??!?!
LESSONS LEARNED:
Try, fail, learn, and try again.
Problems
• Four git repos, inconsistent mess, duplication
• Broken, semi-useful automation: need to regain trust
• Some shared config, some unique
• Gain confidence in changes
• Stop editing on the production box
Nagios and Chef... and Deployinator!
Solution 1: Merge everything, find and remove duplication, and use shared configs
Thanks Murphy!
Super Secret Option!!!
Solution 2:
Use Jenkins CI to test changes before production
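Nagios can check a configuration without starting, via `nagios -v /path/to/nagios.cfg`, so a CI job can run that against each proposed change and fail the build on errors. A minimal sketch of such a step, assuming the nagios binary is on the CI host's PATH and that its output contains "Total Warnings:" and "Total Errors:" summary lines; the function names are hypothetical:

```python
import re
import subprocess


def parse_verify_output(output):
    """Extract (warnings, errors) from the summary printed by `nagios -v`."""
    warnings = re.search(r"Total Warnings:\s*(\d+)", output)
    errors = re.search(r"Total Errors:\s*(\d+)", output)
    if warnings is None or errors is None:
        raise ValueError("could not find Nagios verification summary")
    return int(warnings.group(1)), int(errors.group(1))


def verify_config(main_cfg, nagios_bin="nagios"):
    """Run `nagios -v` against main_cfg; return True if the config is valid."""
    result = subprocess.run(
        [nagios_bin, "-v", main_cfg],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Show Nagios's own explanation so the CI log says what broke.
        print(result.stdout)
        print(result.stderr)
        return False
    warnings, errors = parse_verify_output(result.stdout)
    print(f"{main_cfg}: {warnings} warning(s), {errors} error(s)")
    return errors == 0
```

A Jenkins build step could call verify_config on the merged configuration and exit non-zero when it returns False, which keeps a broken change from reaching the production box.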
Solution 3:
Use Deployinator to run Chef recipe to generate automated configs
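The automated pieces are config that can be derived from inventory data Chef already has, such as host definitions. The real step is a Chef recipe (Ruby); as a sketch of the same idea in Python, the code below renders Nagios host definitions from a list of node records. The node fields and the linux-server template name are hypothetical:

```python
def render_host(node):
    """Render one Nagios host definition from a node record."""
    lines = [
        "define host {",
        "    use        linux-server",
        f"    host_name  {node['fqdn']}",
        f"    address    {node['ipaddress']}",
    ]
    if node.get("hostgroups"):
        lines.append(f"    hostgroups {','.join(node['hostgroups'])}")
    lines.append("}")
    return "\n".join(lines)


def render_hosts(nodes):
    """Render all nodes, sorted by name so output is stable between runs."""
    ordered = sorted(nodes, key=lambda n: n["fqdn"])
    return "\n\n".join(render_host(n) for n in ordered) + "\n"


# Hypothetical node records, standing in for a Chef search result.
NODES = [
    {"fqdn": "web02.example.com", "ipaddress": "10.0.0.12",
     "hostgroups": ["Linux_hosts"]},
    {"fqdn": "mail01.example.com", "ipaddress": "10.0.0.20",
     "hostgroups": ["Linux_hosts", "email-only-servers"]},
]

print(render_hosts(NODES))
```

Sorting the output is a deliberate choice: identical inventory always yields byte-identical config, so a diff or CI check only flags real changes.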
Solution 4:
Use Deployinator to rsync config to all boxes
• git pull repo on deploy host
• Run Chef recipe to add automated pieces
• Re-run the try-nagios script against that
• rsync copy from deploy box to Nagios hosts
• Create symlink for nagios.cfg
• Restart Nagios
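The deploy steps above can be sketched as an ordered list of commands that stops at the first failure. Only the git, rsync, ln, and ssh usages are standard; the Chef invocation, the try-nagios arguments, the paths, the host names, and the restart command are hypothetical placeholders, and the sketch defaults to a dry run:

```python
import subprocess

# Hypothetical locations and hosts; substitute your own.
REPO_DIR = "/srv/nagios-config"
DEPLOY_DIR = "/etc/nagios/deployed"
NAGIOS_HOSTS = ["nagios01.example.com", "nagios02.example.com"]


def build_plan(repo_dir=REPO_DIR, deploy_dir=DEPLOY_DIR, hosts=NAGIOS_HOSTS):
    """Return the deploy steps as (description, argv) pairs, in order."""
    plan = [
        ("git pull", ["git", "-C", repo_dir, "pull", "--ff-only"]),
        # Hypothetical recipe name for the step that adds automated pieces.
        ("generate automated configs",
         ["chef-solo", "-o", "recipe[nagios::generate]"]),
        # try-nagios is named in the talk; its arguments here are assumed.
        ("verify config", ["try-nagios", f"{repo_dir}/nagios.cfg"]),
    ]
    # Copy to every host before touching any symlink or restarting anything.
    for host in hosts:
        plan.append((f"rsync to {host}",
                     ["rsync", "-a", "--delete", "--exclude", ".git",
                      f"{repo_dir}/", f"{host}:{deploy_dir}/"]))
    for host in hosts:
        plan.append((f"symlink nagios.cfg on {host}",
                     ["ssh", host, "ln", "-sf", f"{deploy_dir}/nagios.cfg",
                      "/etc/nagios/nagios.cfg"]))
    for host in hosts:
        # The restart command depends on the init system; this one is assumed.
        plan.append((f"restart Nagios on {host}",
                     ["ssh", host, "sudo", "service", "nagios", "restart"]))
    return plan


def run_plan(plan, dry_run=True):
    """Print each step; unless dry_run, run it and stop at the first failure."""
    for description, argv in plan:
        print(f"==> {description}: {' '.join(argv)}")
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit.
            subprocess.run(argv, check=True)


run_plan(build_plan())
```

Calling run_plan(build_plan(), dry_run=False) would execute the steps for real; because every copy happens first, nothing is restarted unless all of them succeeded.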
LESSONS LEARNED:
Use the tools you have.
Scaling things up!
Core Workers
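Nagios Core 4 introduced worker processes ("core workers") that execute checks on behalf of the main process. The sketch below is not Nagios code; it only illustrates that division of labor in Python: a small thread pool whose threads each launch a plugin as a subprocess, with exit codes mapped to states using the standard plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The check names and commands are hypothetical:

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Standard Nagios plugin exit codes.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(name, argv, timeout=10):
    """Run one plugin command and translate its exit code into a state."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True,
                                timeout=timeout)
    except subprocess.TimeoutExpired:
        # This sketch reports timeouts as UNKNOWN.
        return name, "UNKNOWN", "check timed out"
    except OSError as exc:
        return name, "UNKNOWN", f"could not run check: {exc}"
    state = STATES.get(result.returncode, "UNKNOWN")
    return name, state, result.stdout.strip()


def run_checks(checks, workers=4):
    """Fan checks out to a pool of workers; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_check, name, argv) for name, argv in checks]
        return [f.result() for f in futures]


# Hypothetical checks: each one prints a line and exits with a fixed code.
CHECKS = [
    ("ssh", [sys.executable, "-c", "print('SSH OK'); raise SystemExit(0)"]),
    ("disk", [sys.executable, "-c",
              "print('DISK CRITICAL'); raise SystemExit(2)"]),
]

for name, state, output in run_checks(CHECKS):
    print(f"{name}: {state} ({output})")
# ssh: OK (SSH OK)
# disk: CRITICAL (DISK CRITICAL)
```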
LESSONS LEARNED:
If at first you don’t succeed, rub some webscale on it.
Iterating and Iterating
LESSONS LEARNED:
Iterate
Iterate
Iterate
To Infinity and Beyond
http://github.com/etsy/opsweekly
Final Lessons Learned
• Templates are awesome
• Start small
• Automation is awesome
• Trust but verify
• Learn from (y)our mistakes
• Iterate on the tools you have
Open Source Summary
• http://github.com/etsy/deployinator
• http://github.com/etsy/pushbot
• http://github.com/etsy/trylib
• http://github.com/etsy/opsweekly
• http://github.com/etsy/nagios-herald
• http://github.com/RJ/irccat
THANK YOU!