#unidevops Software Operability, Run Book Collaboration, and DevOps Matthew Skelton 27th February 2014 DevOps Summit, London, UK www.devopssummit.com @matthewpskelton softwareoperability.com

Software operability and run book collaboration London Feb 2014

Page 1: Software operability and run book collaboration London Feb 2014

Software Operability,

Run Book Collaboration,

and DevOps

Matthew Skelton

27th February 2014

DevOps Summit,

London, UK

www.devopssummit.com

@matthewpskelton

softwareoperability.com

Page 2: Software operability and run book collaboration London Feb 2014

Agenda

• Software Operability

• Run Book Collaboration

• Making Operability Work

• Questions

Page 3: Software operability and run book collaboration London Feb 2014

Background

• Software systems since 1998

• Continuous Delivery specialist, DevOps enthusiast, Operability nut

• London Continuous Delivery meetup group - londoncd.org.uk

• Experience DevOps workshops

• PIPELINE Conference

Page 4: Software operability and run book collaboration London Feb 2014

Software

Operability

Page 5: Software operability and run book collaboration London Feb 2014

Software Operability

• Definitions

• Examples

• Why focus on operability?

• How DevOps can help

Page 6: Software operability and run book collaboration London Feb 2014

Operability?

Page 7: Software operability and run book collaboration London Feb 2014

Etymology of Operability?

• Cognates:

– Opera

– Operate

– Operational

– Inter-operability

Page 8: Software operability and run book collaboration London Feb 2014

Page 9: Software operability and run book collaboration London Feb 2014

Software Operability

• Operability: the properties of a system which make it work well in Production

Page 10: Software operability and run book collaboration London Feb 2014

Operable Systems

Since 1929,

Mallorca, Spain

Page 11: Software operability and run book collaboration London Feb 2014

Software Operability

• David Copeland (@davetron5000): “How your software runs in production is all that matters. The most amazing abstractions, cleanest code, or beautiful algorithms are meaningless if your code doesn’t run well on production.”

• http://www.naildrivin5.com/blog/2013/06/16/production-is-all-that-matters.html

Page 12: Software operability and run book collaboration London Feb 2014

Operational Criteria

• Deploy

• Monitor

• Diagnose

• Debug

• Query

• Control

• Inspect

• Clear

• ...

Page 13: Software operability and run book collaboration London Feb 2014

“Non-Functional”

Page 14: Software operability and run book collaboration London Feb 2014

Shaped by Operability

• Hooks (internal APIs) for:

– Logging

– Monitoring

– Diagnostics

– Health checks

– Data clear-down

– Service / daemon / container control
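As a rough illustration of what such hooks might look like in code, here is a minimal Python sketch of an internal API for health checks and diagnostics. The names `OperabilityHooks`, `register_check`, `health`, and `diagnostics` are hypothetical and do not come from the talk; a real system would likely expose the results over HTTP or a management port.

```python
import logging
import time
from dataclasses import dataclass, field
from typing import Callable, Dict

logger = logging.getLogger("operability_sketch")


@dataclass
class OperabilityHooks:
    """Hypothetical in-process registry of operational hooks."""

    checks: Dict[str, Callable[[], bool]] = field(default_factory=dict)
    started_at: float = field(default_factory=time.time)

    def register_check(self, name: str, check: Callable[[], bool]) -> None:
        """Register a named health check that returns True when healthy."""
        self.checks[name] = check

    def health(self) -> dict:
        """Run every check; a check that raises is logged and counted as failed."""
        results = {}
        for name, check in self.checks.items():
            try:
                results[name] = bool(check())
            except Exception:
                logger.exception("health check %s raised", name)
                results[name] = False
        return {"healthy": all(results.values()), "checks": results}

    def diagnostics(self) -> dict:
        """Return basic facts an operator might want when diagnosing a problem."""
        return {
            "uptime_seconds": round(time.time() - self.started_at, 1),
            "registered_checks": sorted(self.checks),
        }
```

A component would register its own checks at start-up, for example `hooks.register_check("database", can_reach_database)`, where `can_reach_database` is an assumed application function, so that Ops can query one place for the state of the whole service.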

Page 15: Software operability and run book collaboration London Feb 2014

Ops Folk are Users Too!

Page 16: Software operability and run book collaboration London Feb 2014

Page 17: Software operability and run book collaboration London Feb 2014

Why focus on Operability?

• Deploy more rapidly, frequently

• High cost of Production outage

• Systems now more complicated

Page 18: Software operability and run book collaboration London Feb 2014

Outages are Embarrassing!

Page 19: Software operability and run book collaboration London Feb 2014

Operational considerations

Page 20: Software operability and run book collaboration London Feb 2014

Operational considerations

Page 21: Software operability and run book collaboration London Feb 2014

Operational considerations

Page 22: Software operability and run book collaboration London Feb 2014

How DevOps can help

• DevOps is one way to address poor operability

• Improved collaboration and communication between Dev teams and Ops teams

• Example: Run Book Collaboration

Page 23: Software operability and run book collaboration London Feb 2014

Run Book

Collaboration

Page 24: Software operability and run book collaboration London Feb 2014

Run Book Collaboration

• Feedback loops and learning

• What is a run book?

• How can run book collaboration help operability?

Page 25: Software operability and run book collaboration London Feb 2014

Feedback Loops

Gene Kim:

http://itrevolution.com/the-three-ways-principles-underpinning-devops/

Page 26: Software operability and run book collaboration London Feb 2014

Run Book

Page 27: Software operability and run book collaboration London Feb 2014

Templates

Page 28: Software operability and run book collaboration London Feb 2014

Example

• 1 Table of Contents
• 2 System Overview
  – 2.1 Service Overview
  – 2.2 Contributing Applications, Daemons, and Windows Services
  – 2.3 Hours of Operation
  – 2.4 Execution Design
  – 2.5 Infrastructure and Network Design
  – 2.6 Resilience, Fault Tolerance and High-Availability
  – 2.7 Throttling and Partial Shutdown
  – 2.8 Required Resources
  – 2.9 Expected Traffic and Load
    • 2.9.1 Hot or Peak Periods
    • 2.9.2 Warm Periods
    • 2.9.3 Cool or Quiet Periods
  – 2.10 Environmental Differences
  – 2.11 Tools
• 3 Security and Access Control
• 4 System Configuration
  – 4.1 Configuration Management
• 5 System Backup and Restore
  – 5.1 Backup Requirements
    • 5.1.1 Special Files
  – 5.2 Backup Procedures
  – 5.3 Restore Procedures
• 6 Monitoring and Alerting
  – 6.1 Error Messages
  – 6.2 Events
  – 6.3 Health Checks
  – 6.4 Other Messages
• 7 Operational Tasks
  – 7.1 Deployment
  – 7.2 Batch Processing
  – 7.3 Power Procedures
  – 7.4 Routine Checks
    • 7.4.1 System Rebuilds
  – 7.5 Troubleshooting
• 8 Maintenance Tasks
  – 8.1 Maintenance Procedures
    • 8.1.1 Patching
      – 8.1.1.1 Normal Cycle
      – 8.1.1.2 Zero-Day Vulnerabilities
    • 8.1.2 GMT/BST time changes
    • 8.1.3 Cleardown Activities
      – 8.1.3.1 Log Rotation
  – 8.2 Testing
    • 8.2.1 Technical Testing
    • 8.2.2 Post-Deployment
• 9 Failure and Recovery Procedures
  – 9.1 Failover
  – 9.2 Recovery
  – 9.3 Troubleshooting Failover and Recovery
• 10 Contact Details
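One way a team might use a template like this during collaboration is to track which top-level sections still have no content, so Dev and Ops know what to discuss next. The sketch below is only an illustration under that assumption: the section titles are copied from the template above, while `RUN_BOOK_SECTIONS`, `missing_sections`, and the dictionary layout of a draft run book are hypothetical.

```python
from typing import Dict, List

# Top-level section titles copied from the example template above.
RUN_BOOK_SECTIONS: List[str] = [
    "System Overview",
    "Security and Access Control",
    "System Configuration",
    "System Backup and Restore",
    "Monitoring and Alerting",
    "Operational Tasks",
    "Maintenance Tasks",
    "Failure and Recovery Procedures",
    "Contact Details",
]


def missing_sections(run_book: Dict[str, str]) -> List[str]:
    """Return template sections that are absent or contain only whitespace."""
    return [
        section
        for section in RUN_BOOK_SECTIONS
        if not run_book.get(section, "").strip()
    ]
```

Calling `missing_sections(draft)` on a partly written draft returns the sections still to be worked through, in template order.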

Page 29: Software operability and run book collaboration London Feb 2014

Example

• 1 Table of Contents

• 2 System Overview – 2.1 Service Overview

– 2.2 Contributing Applications, Daemons, and Windows Services

– 2.3 Hours of Operation

– 2.4 Execution Design

– 2.5 Infrastructure and Network Design

– 2.6 Resilience, Fault Tolerance and High-Availability

– 2.7 Throttling and Partial Shutdown

– 2.8 Required Resources

– 2.9 Expected Traffic and Load

• 3 Security and Access Control

• 4 System Configuration

• 5 System Backup and Restore

• 6 Monitoring and Alerting

• 7 Operational Tasks

• 8 Maintenance Tasks

• 9 Failure and Recovery Procedures

• 10 Contact Details

Page 30: Software operability and run book collaboration London Feb 2014

Example

2.1 Service Overview

2.2 Contributing Applications, Daemons, and Windows Services

2.3 Hours of Operation

2.4 Execution Design

2.5 Infrastructure and Network Design

2.6 Resilience, Fault Tolerance and High-Availability

2.7 Throttling and Partial Shutdown

2.8 Required Resources

2.9 Expected Traffic and Load

Page 31: Software operability and run book collaboration London Feb 2014

It’s Not Documentation

Page 32: Software operability and run book collaboration London Feb 2014

Focus on Collaboration

Page 33: Software operability and run book collaboration London Feb 2014

Outcomes

• Better understanding

• Better cross-team working

• Reduction in operational problems

• Fewer outages

• Reduced long-term cost-of-ownership

Page 34: Software operability and run book collaboration London Feb 2014

Run Book as Collaboration

• Focus on the collaboration

• Run book is a means, not an end

• Throw it away when complete (?)

• Aim to automate more over time

• See http://runbookcollab.info/

Page 35: Software operability and run book collaboration London Feb 2014

Making Operability

Work

Page 36: Software operability and run book collaboration London Feb 2014

Making Operability Work

• NFRs vs Operational Features

• Budget changes

• Organisation changes

• Responsibility changes

• Avoid on-call anti-patterns

Page 37: Software operability and run book collaboration London Feb 2014

“Non-Functional”

Page 38: Software operability and run book collaboration London Feb 2014

Operational Features

Features

Page 39: Software operability and run book collaboration London Feb 2014

Taking Operability Seriously

• Single product backlog

– End-user + Operational features

– New features + bugs

• Product Owner on call

– Accountable for operational failures

– Seriously!

Page 40: Software operability and run book collaboration London Feb 2014

Page 41: Software operability and run book collaboration London Feb 2014

Budget changes

• “What is your budget code?”

• Capex vs. Opex?

• Remove budget barriers to regular, effective communication

Page 42: Software operability and run book collaboration London Feb 2014

Niek Bartholomeus (@niekbartho) - http://niek.bartholomeus.be/

https://speakerdeck.com/niekbartho/self-organization-vs-global-optimization-a-comparison-between-traditional-and-modern-organizations

Page 43: Software operability and run book collaboration London Feb 2014

Organisation changes

• “I’ll need to ask my manager first”

• Lack of autonomy

• Remove reporting barriers to regular, effective communication

• More at http://bit.ly/DevOpsTopologies

Page 44: Software operability and run book collaboration London Feb 2014

“I just want to write code”

Page 45: Software operability and run book collaboration London Feb 2014

Mysterious Coding Tricks

Page 46: Software operability and run book collaboration London Feb 2014

On-call for Responsibility

Page 47: Software operability and run book collaboration London Feb 2014

On-call Anti-Patterns

• Too much overtime pay

• Too little overtime pay

• Rota team too small

• No training in incident response

• No team ownership of product

• No team autonomy for changes

Page 48: Software operability and run book collaboration London Feb 2014

On call - Goal

• Team members want to help make things better

• Empowered to fix problems

• Reduce the times they are woken up

Page 49: Software operability and run book collaboration London Feb 2014

The operability of operability

• Operational Features, not “NFRs”

• Sustainable collaboration

• Sensible, fair on-call rotas

• Over-compensate in time off

• Avoid burn-out

Page 50: Software operability and run book collaboration London Feb 2014

Recapitulation

Page 51: Software operability and run book collaboration London Feb 2014

Software Operability

Making software systems work well in Production

Page 52: Software operability and run book collaboration London Feb 2014

Run Book Collaboration

Shared focus on operability throughout the delivery cycle

Page 53: Software operability and run book collaboration London Feb 2014

Making Operability Operable

Use DevOps team patterns for sustainable operability

Page 54: Software operability and run book collaboration London Feb 2014

What’s Next?

Page 55: Software operability and run book collaboration London Feb 2014

Further Reading

• Patterns for Performance and Operability

– Ford, Gileadi, Purba, Moerman

• http://whoownsmyoperability.com/

– Recommended reading lists

Page 56: Software operability and run book collaboration London Feb 2014

Further Reading

• Release It! – Michael Nygard (@mnygard)

• http://www.michaelnygard.com/

Page 57: Software operability and run book collaboration London Feb 2014

Operability Book

• Software Operability – How to make software work well in Production

– Due early late 2014

• Sign up at OperabilityBook.com

• Discount code for DevOps Summit attendees

Page 58: Software operability and run book collaboration London Feb 2014

Experience DevOps

• A hands-on workshop for DevOps culture

• Forthcoming dates:

– London: 28th February 2014

• http://experiencedevops.org/

Page 59: Software operability and run book collaboration London Feb 2014

PIPELINE

• Continuous Delivery

• ‘Unconference’ format

• Tuesday 8th April 2014

• London, UK

• http://pipelineconf.info/

• @PipelineConf

Page 60: Software operability and run book collaboration London Feb 2014

Questions &

Discussion

Matthew Skelton

@matthewpskelton

softwareoperability.com

operabilitybook.com

bit.ly/DevOpsTopologies

Page 61: Software operability and run book collaboration London Feb 2014

Acknowledgements

http://pianofortekeys.files.wordpress.com/2013/04/ariadnne_wideweb__470x3300.jpg

http://www.blinkenlights.nl/images/blinkenlights-big.jpeg

http://www.danatronics.com/s db_apps.html

http://riverbankoftruth.com/wp-content/uploads/2013/07/embarrassed-chimp22.jpg

http://www.thinkgeek.com/edm/20040709.html

http://indianaohindiana.com/wp-content/uploads/2013/10/Tome.jpg

http://www.guavaworks.com/company-blog/guava-doesnt-do-cookie-cutter.html

http://www.carpages.co.uk/ford/ford-sand-sculptures-05-09-11.asp

http://www.thisismoney.co.uk/money/experts/article-2324270/Take-smaller-pension-pots-tax-free-leave-final-salary-untouched.html

http://paranoidnews.org/wp-content/uploads/2010/10/Alien-Hunt-Alarm-Clock.jpg

http://particulations.blogspot.co.uk/2010/08/headingley-hole.html

http://marvel.wikia.com/Stephen_Strange_(Earth-616)

Page 62: Software operability and run book collaboration London Feb 2014

Further Slides

Page 63: Software operability and run book collaboration London Feb 2014

The Phoenix Project

Page 64: Software operability and run book collaboration London Feb 2014

Continuous Delivery