75
Practical Operability Techniques for Teams Matthew Skelton Skelton Thatcher Consulting skeltonthatcher.com / @SkeltonThatcher IPEXPO Europe, ExCeL, London – 05 October 2017

Practical operability techniques for teams - IPEXPO 2017

Embed Size (px)

Citation preview

Page 1: Practical operability techniques for teams - IPEXPO 2017

Practical Operability Techniques for Teams

Matthew SkeltonSkelton Thatcher Consulting

skeltonthatcher.com / @SkeltonThatcher

IPEXPO Europe, ExCeL, London – 05 October 2017

Page 2: Practical operability techniques for teams - IPEXPO 2017

TodayWhat is operability?

Modern loggingRun Book dialogue sheets

Endpoint healthchecksCorrelation IDs

User Personas for dashboards

Page 3: Practical operability techniques for teams - IPEXPO 2017

Training1-day tutorial with exercises

Book via Unicom: http://www.unicom.co.uk/workshops.html

Page 4: Practical operability techniques for teams - IPEXPO 2017

YouSoftware Developer

Tester / QADevOps Engineer

Team LeaderHead of Department

Page 5: Practical operability techniques for teams - IPEXPO 2017

Operability:use modern logging, Run Book

dialogue sheets, endpoint healthchecks, correlation IDs,

and user personas as team collaboration techniques

Page 6: Practical operability techniques for teams - IPEXPO 2017
Page 7: Practical operability techniques for teams - IPEXPO 2017

About us

Co-founders at Skelton Thatcher Consulting

Matthew Skelton Rob Thatcher

Page 8: Practical operability techniques for teams - IPEXPO 2017

Team-first digital transformation30+ organisations

UK, US, EU, India, China

Page 9: Practical operability techniques for teams - IPEXPO 2017

We build modern capabilities

by mentoring your teams

Page 10: Practical operability techniques for teams - IPEXPO 2017

Team Guide to Software OperabilityMatthew Skelton & Rob Thatcher

skeltonthatcher.com/publications

Download a free sample chapter

Page 11: Practical operability techniques for teams - IPEXPO 2017

Practical Operability Techniques for Teams

Page 12: Practical operability techniques for teams - IPEXPO 2017

What is operability?

Page 13: Practical operability techniques for teams - IPEXPO 2017

Operability

making software work well in Production

Page 14: Practical operability techniques for teams - IPEXPO 2017
Page 15: Practical operability techniques for teams - IPEXPO 2017

Logging with Event IDs

Page 16: Practical operability techniques for teams - IPEXPO 2017
Page 17: Practical operability techniques for teams - IPEXPO 2017

logging with Event IDs

reduce time to detect problemsincrease team engagement

increase configurabilityenhance collaboration

#operability

Page 18: Practical operability techniques for teams - IPEXPO 2017

search by event

Event ID

{Delivered,InTransit,Arrived}

Page 19: Practical operability techniques for teams - IPEXPO 2017

transaction trace

Correlation ID

612999958…

Page 20: Practical operability techniques for teams - IPEXPO 2017

How many distinct event types (state transitions)

in your application?

Page 21: Practical operability techniques for teams - IPEXPO 2017
Page 22: Practical operability techniques for teams - IPEXPO 2017

represent distinct states

Page 23: Practical operability techniques for teams - IPEXPO 2017

enum

Human-readable sets: unique values, sparse,

immutable

C#, Java, Python, node(Ruby, PHP, …)

Page 24: Practical operability techniques for teams - IPEXPO 2017

Technical

Domain

public enum EventID

{

// Badly-initialised logging data

NotSet = 0,

// An unrecognised event has occurred

UnexpectedError = 10000,

ApplicationStarted = 20000,

ApplicationShutdownNoticeReceived = 20001,

MessageQueued = 40000,

MessagePeeked = 40001,

BasketItemAdded = 60001,

BasketItemRemoved = 60002,

CreditCardDetailsSubmitted = 70001,

// ...

}

Page 25: Practical operability techniques for teams - IPEXPO 2017

BasketItemAdded = 60001

BasketItemRemoved = 60002

Page 26: Practical operability techniques for teams - IPEXPO 2017

log using Event IDs with a modern

‘structured logging’ library

Page 27: Practical operability techniques for teams - IPEXPO 2017

Example: video processing

On-demand processing of TV advertisementsAd-agency TV broadcasterHigh throughputGlitch-free video & audio

Page 28: Practical operability techniques for teams - IPEXPO 2017

Storage I/O

Worker Job

Queue

Upload

Page 29: Practical operability techniques for teams - IPEXPO 2017
Page 30: Practical operability techniques for teams - IPEXPO 2017
Page 31: Practical operability techniques for teams - IPEXPO 2017

Example: video processing

Discover processing bottlenecksTrigger alerts via LogEntries / HostedGraphiteReport on KPIsTarget areas for improvement

Page 32: Practical operability techniques for teams - IPEXPO 2017

Run Book dialogue sheets

Page 33: Practical operability techniques for teams - IPEXPO 2017
Page 34: Practical operability techniques for teams - IPEXPO 2017

Run Book dialogue sheets

Checklists for typical operational considerations

Team-friendly exploration

Page 35: Practical operability techniques for teams - IPEXPO 2017

runbooktemplate.infoRun Book dialogue sheets

Page 36: Practical operability techniques for teams - IPEXPO 2017

System characteristics

Hours of operation

During what hours does the service or system actually need to operate? Can portions or features of the system be unavailable at times if needed?

Hours of operation - core features

(e.g. 03:00-01:00 GMT+0)

Hours of operation - secondary features

(e.g. 07:00-23:00 GMT+0)

Data and processing flows

How and where does data flow through the system? What controls or triggers data flows?(e.g. mobile requests / scheduled batch jobs / inbound IoT sensor data )

Page 37: Practical operability techniques for teams - IPEXPO 2017

http://runbooktemplate.info/

Page 38: Practical operability techniques for teams - IPEXPO 2017

runbooktemplate.infoRun Book dialogue sheets

Page 39: Practical operability techniques for teams - IPEXPO 2017

Endpoint healthchecks

Page 40: Practical operability techniques for teams - IPEXPO 2017
Page 41: Practical operability techniques for teams - IPEXPO 2017

endpoint healthchecks

Every runnable app/service/daemon exposes /status/health

An HTTP GET to the endpoint returns:200 – "I am healthy"

500 – "I am sick"

Page 42: Practical operability techniques for teams - IPEXPO 2017

endpoint healthchecks

Use JSON as a response type –parsable by both machines and

humans!

Page 43: Practical operability techniques for teams - IPEXPO 2017

endpoint healthchecks

For databases and other non-HTTP components, run a lightweight HTTP

service in front of the component200 / 500 responses

Page 44: Practical operability techniques for teams - IPEXPO 2017

https://github.com/Lugribossk/simple-dashboard

Page 45: Practical operability techniques for teams - IPEXPO 2017

Correlation IDs

Page 46: Practical operability techniques for teams - IPEXPO 2017
Page 47: Practical operability techniques for teams - IPEXPO 2017

‘Unique-ish’ identifier for each request

Passed through downstream layers

Page 48: Practical operability techniques for teams - IPEXPO 2017

Unique-ish ID

Page 49: Practical operability techniques for teams - IPEXPO 2017

Synchronous HTTP:

X-HEADER e.g. X-trace-idX-trace-id: 348e1cf8

If header is present, pass it on

(Yes, RFC6648, but this is internal only)

Page 50: Practical operability techniques for teams - IPEXPO 2017

Asynchonous (queues, etc.):

Message Attributes, name:value paire.g. "trace-id":"348e1cf8"

AWS SQS: SendMessage() / ReceiveMessage()

Log the Correlation ID if present

Page 51: Practical operability techniques for teams - IPEXPO 2017

Example: electronic trading

High speed, low latencyTrading options & derivativesConnected to stock exchangesSub-millisecond timings> £1 million per day traded

Page 52: Practical operability techniques for teams - IPEXPO 2017
Page 53: Practical operability techniques for teams - IPEXPO 2017
Page 54: Practical operability techniques for teams - IPEXPO 2017
Page 55: Practical operability techniques for teams - IPEXPO 2017

Correlations IDs for trading

Evidence for timely operationHelp identify bottlenecksTarget areas for perf tuningIdentify race conditions Increase operability

Page 56: Practical operability techniques for teams - IPEXPO 2017

Lightweight user personas

Page 57: Practical operability techniques for teams - IPEXPO 2017
Page 58: Practical operability techniques for teams - IPEXPO 2017

Lightweight user personas:

Ops EngineerTest Engineer

Build & Deployment EngineerService Owner

Page 59: Practical operability techniques for teams - IPEXPO 2017

Lightweight user personas:

Consider the User Experience (UX) of engineers and team members using

and working with the software

Page 60: Practical operability techniques for teams - IPEXPO 2017

http://www.keepitusable.com/blog/?tag=alan-cooper

Page 61: Practical operability techniques for teams - IPEXPO 2017

https://www.geckoboard.com/blog/visualisation-upgrades-progressing-towards-a-more-useful-and-beautiful-dashboard/

Page 62: Practical operability techniques for teams - IPEXPO 2017

Lightweight user personas:

What data does the User Persona need visible on a dashboard in order to make decisions rapidly & safely?

Page 63: Practical operability techniques for teams - IPEXPO 2017

Summary

Page 64: Practical operability techniques for teams - IPEXPO 2017

Operability

making software work well in Production

Page 65: Practical operability techniques for teams - IPEXPO 2017

logging with Event IDs

use enum-based Event IDs to explore runtime behaviour

and fault conditions

Page 66: Practical operability techniques for teams - IPEXPO 2017

Run Book dialogue sheets

explore and establish operational requirements as

a team, around a physical table, together

Page 67: Practical operability techniques for teams - IPEXPO 2017

endpoint healthchecks

HTTP 200 / 500 responses to /status/health call with

JSON details – good for tools and humans

Page 68: Practical operability techniques for teams - IPEXPO 2017

Correlation IDs

trace execution using correlation IDs:

synchronous (HTTP X-trace-id) async (SQS MessageAttribute)

Page 69: Practical operability techniques for teams - IPEXPO 2017

lightweight user personas

explore the UX and needs of different roles for rapid

decisions via dashboards

Page 70: Practical operability techniques for teams - IPEXPO 2017

Operabilityuse modern logging, Run Book

dialogue sheets, endpoint healthchecks, correlation IDs,

and user personas as team collaboration techniques

Page 71: Practical operability techniques for teams - IPEXPO 2017

Team Guide to Software OperabilityMatthew Skelton & Rob Thatcher

skeltonthatcher.com/publications

Download a free sample chapter

Page 72: Practical operability techniques for teams - IPEXPO 2017

Training1-day tutorial with exercises

Book via Unicom: http://www.unicom.co.uk/workshops.html

Page 73: Practical operability techniques for teams - IPEXPO 2017

Resources• Training: Practical Operability for Developers and Testers – led

by Matthew Skelton and Rob Thatcher – 1-day workshop –http://www.unicom.co.uk/practical-operability-for-developers-and-testers.html

• Team Guide to Software Operability by Matthew Skelton and Rob Thatcher (Skelton Thatcher Publications, 2016) http://operabilitybook.com/

• Run Book template & Run Book dialogue sheets http://runbooktemplate.info/

Page 74: Practical operability techniques for teams - IPEXPO 2017

Questions?Twitter: @SkeltonThatcher | #operabilityemail: [email protected]

Page 75: Practical operability techniques for teams - IPEXPO 2017

thank you

@SkeltonThatcherskeltonthatcher.com