Rollback: The Impossible Dream


DESCRIPTION

Rollback doesn’t exist. It’s not real. It’s a fantasy, a dream, a delusion. Any vendor who tells you they have a rollback capability is lying to you, and lying to you in a downright dangerous way that will come back to haunt you at 4am in a war room when someone says: “We can’t fix this. Let’s roll back the deployment.”

This talk is designed to explain and demonstrate to Operations staff:

Why rollback is a fantasy, explained with a dash of Werner Heisenberg
Why it is dangerous, and how you can recognize when you’re about to get trapped
How you can avoid the trap of treating it as an appropriate compensating control

It’ll also explain what you can actually do operationally instead of “rolling back”. This will cover alternative compensating controls that can help you get running again and resolve your outage whilst still allowing you to find the root cause.


Rollback: The Impossible Dream

by James Turnbull

jamtur01 @ github
kartar @ twitter

jamesturnbull on freenode
james @ puppetlabs.com

About Me

VP Technical Operations at Puppet Labs

Puppet guy

Ruby guy

Talks funny

A show of hands

Who thinks they know what rollback is?

Last set of hands

YMMV

Definitions

Traditional

Modern

Fact or Fiction?

Accept certain constraints

Constraint #1: Apply sufficient capital

Constraint #2: Idempotent

Constraint #3: Cascade-less failure

Constraint #4: Resources
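Constraint #2 (Idempotent) is the easiest of these to see in code. The sketch below assumes a hypothetical configuration held as a Python dict; `ensure_setting`, `append_setting` and the setting names are invented for illustration. It contrasts an idempotent “ensure” change, which can be re-applied safely, with a non-idempotent “append” change, which drifts the state every time it runs. A rollback that works by re-applying an earlier configuration presumably depends on the first kind.

```python
# Hypothetical illustration of Constraint #2 (idempotence).
# The dict-based config and both helper functions are invented for this sketch.


def ensure_setting(config: dict, key: str, value: str) -> dict:
    """Idempotent: declare the desired end state; re-running changes nothing."""
    updated = dict(config)  # shallow copy so the input is left untouched
    updated[key] = value
    return updated


def append_setting(config: dict, key: str, value: str) -> dict:
    """Not idempotent: every run adds another entry, like `echo ... >> file`."""
    updated = dict(config)
    updated[key] = list(config.get(key, [])) + [value]
    return updated


base: dict = {}

once = ensure_setting(base, "max_connections", "100")
twice = ensure_setting(once, "max_connections", "100")
print(once == twice)  # True: applying it twice is the same as applying it once

first = append_setting(base, "include", "extra.conf")
second = append_setting(first, "include", "extra.conf")
print(first == second)  # False: the second run changed the state again
print(second)  # {'include': ['extra.conf', 'extra.conf']}
```

Note that neither helper records what the state was before it ran; idempotence makes re-applying safe, but it does not by itself provide a way back.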

A Philosophical Digression

If I know where I am, I don’t know how I got there

If I know how I got there, I don’t know where I am

Very few “systems” are truly deterministic
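One way to read this Heisenberg-style digression operationally, offered as an assumption rather than the talk’s own example: model a server as a Python dict and keep a change log that captures only changes made through the deployment tool. Replaying the log no longer reproduces the real state once someone edits the box by hand, and the real state alone does not reveal which versions came before it. `replay`, `change_log` and the setting names are all hypothetical.

```python
# Hypothetical sketch of the "where I am / how I got there" trade-off.
# Assumes a toy server modelled as a dict and a change log that only records
# changes made through the deployment tool; both are invented for illustration.


def replay(log: list) -> dict:
    """Rebuild the state we *expect* by replaying the recorded changes in order."""
    state: dict = {}
    for key, value in log:
        state[key] = value
    return state


change_log = [("app_version", "1.0"), ("app_version", "1.1")]

# What is really on the box: the logged changes plus a manual, unlogged edit.
actual = replay(change_log)
actual["tls_ciphers"] = "edited-by-hand"

# Knowing how we got here (the log) does not tell us where we are...
expected = replay(change_log)
print(expected == actual)  # False

# ...and knowing where we are does not tell us how we got here:
# nothing in `actual` records that version 1.0 was ever deployed.
print(actual)  # {'app_version': '1.1', 'tls_ciphers': 'edited-by-hand'}
```

A rollback built on either view alone inherits its blind spot: replaying history misses the manual edit, and the current snapshot cannot say what to go back to.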

A Mathematical Digression

On system rollback and totalised fields: An algebraic approach to system change

Mark Burgess and Alva Couch, 20th June 2011

http://cfengine.com/markburgess/papers/totalfield.pdf
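The paper’s own treatment is more detailed; what follows is only a simplified argument in the same spirit, not its notation, for why an idempotent change (Constraint #2) cannot in general be undone by reversing the change itself. Let S be the set of possible system states, O: S → S an idempotent change, and R a hypothetical “perfect rollback” that undoes O from any starting state.

```latex
\begin{align*}
O \circ O &= O              && \text{(idempotence)} \\
R \circ O &= \mathrm{id}_S  && \text{(perfect rollback, assumed)} \\
R \circ (O \circ O) &= R \circ O && \text{(apply } R \text{ to both sides)} \\
(R \circ O) \circ O &= \mathrm{id}_S && \text{(associativity; right side is } \mathrm{id}_S\text{)} \\
O &= \mathrm{id}_S          && \text{(since } (R \circ O) \circ O = \mathrm{id}_S \circ O = O\text{)}
\end{align*}
```

So the only idempotent change with a state-independent undo is one that changes nothing. Any real convergent change maps several starting states to the same end state, so getting back to where you were needs a saved copy of the earlier state, not just the change.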

So what’s wrong with rollback?

Risk

Learning from mistakes

Complex systems are… complex

Human error

What is the problem rollback is trying to solve?

What is the problem YOU are trying to solve?

So how can we mitigate rollback’s shortcomings?

Preventative Design

Rollback is (often) an architecture problem

Increase Resilience

Operational Intelligence

A little bit of DevOps in every byte…

Small, iterative changes

Accept that failure happens

“We can’t test that? Okay, we can roll it back if it breaks…”

Assumption is the mother of all fuckups*

“But the system can’t be {run|upgraded|deployed} like that because…”

Conclusions

Rollback is possible but not probable

If you have to have “rollback”, accept constraints

You can mitigate the need for it

Thank you!

Questions/Insults?

