Software operability and run book collaboration - slides from @UNICOMSeminars DevOps Summit in Amsterdam, 14 November 2013
#unidevops
Software Operability and Run Book Collaboration
Matthew Skelton
14th November 2013
DevOps Summit
Amsterdam
www.devopssumit.com
@matthewpskelton
softwareoperability.com
Agenda
• Software Operability
• Run Book Collaboration
• Making Operability Work
• Questions
Background
• Software systems since 1998
• Build & Deployment at thetrainline.com
• London Continuous Delivery meetup group - londoncd.org.uk
• Experience DevOps workshops
Software Operability
• Operability: the properties of a system which make it work well in Production
“Non-Functional”
Operable Systems
Since 1929, Mallorca, Spain
Software Operability
• David Copeland (@davetron5000): “How your software runs in production is all that matters. The most amazing abstractions, cleanest code, or beautiful algorithms are meaningless if your code doesn’t run well on production.”
• http://www.naildrivin5.com/blog/2013/06/16/production-is-all-that-matters.html
Operational Criteria
• Deploy
• Monitor
• Diagnose
• Debug
• Query
• Control
• Inspect
• Clear
• ...
Ops Folk are Users Too!
Run Book Collaboration
Operational considerations
Run Book
Example
• 1 Table of Contents
• 2 System Overview
  – 2.1 Service Overview
  – 2.2 Contributing Applications, Daemons, and Windows Services
  – 2.3 Hours of Operation
  – 2.4 Execution Design
  – 2.5 Infrastructure and Network Design
  – 2.6 Resilience, Fault Tolerance and High-Availability
  – 2.7 Throttling and Partial Shutdown
  – 2.8 Required Resources
  – 2.9 Expected Traffic and Load
    • 2.9.1 Hot or Peak Periods
    • 2.9.2 Warm Periods
    • 2.9.3 Cool or Quiet Periods
  – 2.10 Environmental Differences
  – 2.11 Tools
• 3 Security and Access Control
• 4 System Configuration
  – 4.1 Configuration Management
• 5 System Backup and Restore
  – 5.1 Backup Requirements
    • 5.1.1 Special Files
  – 5.2 Backup Procedures
  – 5.3 Restore Procedures
• 6 Monitoring and Alerting
  – 6.1 Error Messages
  – 6.2 Events
  – 6.3 Health Checks
  – 6.4 Other Messages
• 7 Operational Tasks
  – 7.1 Deployment
  – 7.2 Batch Processing
  – 7.3 Power Procedures
  – 7.4 Routine Checks
    • 7.4.1 System Rebuilds
  – 7.5 Troubleshooting
• 8 Maintenance Tasks
  – 8.1 Maintenance Procedures
    • 8.1.1 Patching
      – 8.1.1.1 Normal Cycle
      – 8.1.1.2 Zero-Day Vulnerabilities
    • 8.1.2 GMT/BST time changes
    • 8.1.3 Cleardown Activities
      – 8.1.3.1 Log Rotation
  – 8.2 Testing
    • 8.2.1 Technical Testing
    • 8.2.2 Post-Deployment
• 9 Failure and Recovery Procedures
  – 9.1 Failover
  – 9.2 Recovery
  – 9.3 Troubleshooting Failover and Recovery
• 10 Contact Details
Example
• 1 Table of Contents
• 2 System Overview
  – 2.1 Service Overview
  – 2.2 Contributing Applications, Daemons, and Windows Services
  – 2.3 Hours of Operation
  – 2.4 Execution Design
  – 2.5 Infrastructure and Network Design
  – 2.6 Resilience, Fault Tolerance and High-Availability
  – 2.7 Throttling and Partial Shutdown
  – 2.8 Required Resources
  – 2.9 Expected Traffic and Load
• 3 Security and Access Control
• 4 System Configuration
• 5 System Backup and Restore
• 6 Monitoring and Alerting
• 7 Operational Tasks
• 8 Maintenance Tasks
• 9 Failure and Recovery Procedures
• 10 Contact Details
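One way to put the example outline to work is a small check that a draft run book at least has a heading for each of the ten top-level sections. The sketch below is illustrative only and is not from the talk: the function name missing_sections, the plain-text heading format ("<number> <title>" on its own line), and the idea of running it against a draft are all assumptions.

```python
import re

# Top-level section titles copied from the example run book outline.
REQUIRED_SECTIONS = [
    "Table of Contents",
    "System Overview",
    "Security and Access Control",
    "System Configuration",
    "System Backup and Restore",
    "Monitoring and Alerting",
    "Operational Tasks",
    "Maintenance Tasks",
    "Failure and Recovery Procedures",
    "Contact Details",
]


def missing_sections(run_book_text):
    """Return the required top-level sections that have no heading line.

    Assumption (hypothetical format): a top-level heading is a line such as
    "6 Monitoring and Alerting". Sub-sections like "6.3 Health Checks" are
    ignored because the number must be followed directly by whitespace.
    """
    headings = set()
    for line in run_book_text.splitlines():
        match = re.match(r"^\s*\d+\s+(.+?)\s*$", line)
        if match:
            headings.add(match.group(1).lower())
    return [s for s in REQUIRED_SECTIONS if s.lower() not in headings]


if __name__ == "__main__":
    draft = "1 Table of Contents\n2 System Overview\n2.1 Service Overview\n"
    for title in missing_sections(draft):
        print("Missing section:", title)
```

A check like this says nothing about the quality of each section. Its only use would be as a prompt for the conversation between development and operations about what has not been discussed yet.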
Templates
Focus on Collaboration
Feedback Loops
Gene Kim: http://itrevolution.com/the-three-ways-principles-underpinning-devops/
Run Book as Collaboration
• Focus on the collaboration
• Run book is a means, not an end
• Throw it away when complete (?)
• Aim to automate more over time
• See http://runbookcollab.info/
Making Operability Work
“Non-Functional”
Operational Features
Features
“I just want to write code”
Mysterious Coding Tricks
On-call for Responsibility
The operability of operability
• Operational Features, not “NFRs”
• Sustainable collaboration
• Sensible, fair on-call rotas
• Over-compensate in time off
• Avoid burn-out
What’s Next?
Further Reading
• Patterns for Performance and Operability
  – Ford, Gileadi, Purba, Moerman
• http://whoownsmyoperability.com/
  – Recommended reading lists
Operability Book
• Software Operability – How to make software work well in Production
  – Due early 2014
• Sign up at OperabilityBook.com
• Discount code for DevOps Summit attendees
Experience DevOps
• A hands-on workshop for DevOps culture
• Forthcoming dates:
  – Amsterdam: 15 November 2013
  – Bangalore: December 2013
  – London: February 2014 (tbc)
• http://experiencedevops.org/
Questions & Discussion
Matthew Skelton @matthewpskelton
softwareoperability.com operabilitybook.com
Acknowledgements
http://www.danatronics.com/s db_apps.html
http://www.guavaworks.com/company-blog/guava-doesnt-do-cookie-cutter.html
http://www.carpages.co.uk/ford/ford-sand-sculptures-05-09-11.asp
http://paranoidnews.org/wp-content/uploads/2010/10/Alien-Hunt-Alarm-Clock.jpg
http://particulations.blogspot.co.uk/2010/08/headingley-hole.html
http://marvel.wikia.com/Stephen_Strange_(Earth-616)
Further Slides
PIPELINE Conference
• Continuous Delivery
• Tuesday 8th April 2014
• London, UK
• http://pipelineconf.info/
• @PipelineConf