Why SREs can't afford to NOT do Chaos Engineering

SREcon20 Americas, December 7, 2020

Mikolaj Pawlikowski, Software Engineering Lead, @mikopawlikowski

© 2020 Bloomberg Finance L.P. All rights reserved.

Today’s Talk

• What is Chaos Engineering?

• What is Chaos Engineering NOT?

• Where do I start?

• Tools

• Chaos Engineering (for) people

Chaos Engineering

“Chaos Engineering is the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production.”

-- Principles of Chaos Engineering, https://principlesofchaos.org

Chaos Engineering

Reliability

Chaos Engineering & SRE

[Diagram labels: SRE, Chaos Engineering]

Chaos Engineering & SRE

[Diagram labels: SRE, Chaos Engineering]

Or perhaps… this?

Chaos Engineering myths

• “It’s Chaos Monkey, right?”

• “It’s testing in production”

• “It’s only for massively distributed systems”

• “It only works on <insert the technology here>”

• “It’s breaking things randomly”

Chaos Engineering in 4 steps

1. Observability: pick a variable and a reliable way of measuring it

2. Steady state: the normal range for the variable

3. Hypothesis: when X happens, the variable behaves like this

4. Run the experiment!

Fun gimmick => scientific method
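
The four steps above can be sketched as a small harness. This is a minimal illustration under stated assumptions, not a tool or demo from the talk: `Experiment`, `ToyService`, the latency figures, and the 50 to 150 ms steady-state range are hypothetical names and values invented for the example. The hypothesis being tested is "when one replica is lost, mean latency stays within the steady-state range".

```python
import statistics
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Experiment:
    """Hypothetical harness mirroring the four steps."""
    measure: Callable[[], float]       # Step 1: the variable and how to measure it
    steady_low: float                  # Step 2: lower bound of the normal range
    steady_high: float                 # Step 2: upper bound of the normal range
    inject_fault: Callable[[], None]   # Step 3: the "X" in "when X happens"
    rollback: Callable[[], None]       # Undo the fault once the observation is done
    samples: int = 5                   # How many measurements to average

    def _observe(self) -> float:
        # Average several samples so a single outlier does not decide the result.
        return statistics.mean(self.measure() for _ in range(self.samples))

    def in_steady_state(self, value: float) -> bool:
        return self.steady_low <= value <= self.steady_high

    def run(self) -> dict:
        # Step 4: run the experiment.
        baseline = self._observe()
        hypothesis_held: Optional[bool] = None
        during_fault: Optional[float] = None
        # Only inject the fault if the system is healthy to begin with.
        if self.in_steady_state(baseline):
            self.inject_fault()
            try:
                during_fault = self._observe()
            finally:
                # Always roll back, even if measuring raised an exception.
                self.rollback()
            hypothesis_held = self.in_steady_state(during_fault)
        return {
            "baseline": baseline,
            "during_fault": during_fault,
            "hypothesis_held": hypothesis_held,
        }


class ToyService:
    """Stand-in for a real system; the latency model is invented."""

    def __init__(self) -> None:
        self.replicas = 2

    def latency_ms(self) -> float:
        # Fewer replicas means higher latency in this toy model.
        return 60.0 + 80.0 / self.replicas

    def kill_replica(self) -> None:
        self.replicas -= 1

    def restore_replica(self) -> None:
        self.replicas = 2


def demo() -> dict:
    service = ToyService()
    experiment = Experiment(
        measure=service.latency_ms,
        steady_low=50.0,
        steady_high=150.0,
        inject_fault=service.kill_replica,
        rollback=service.restore_replica,
    )
    return experiment.run()


if __name__ == "__main__":
    print(demo())
```

In a real setting, `measure` would query a monitoring system and `inject_fault` would call one of the fault-injection tools; the sketch keeps both as plain callables so the experiment logic stays independent of any particular tool. The baseline check before injecting the fault is a deliberate choice: if the system is already outside its steady state, the experiment cannot tell you anything about the hypothesis.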

Demo 1

Demo 2

Where do I start?

Risk vs. reward; ROI

https://www.cdc.gov/nchs/fastats/leading-causes-of-death.htm

Chaos Engineering (for) people

Tools

• https://github.com/powerfulseal/powerfulseal

• https://chaostoolkit.org

• https://github.com/alexei-led/pumba

• https://github.com/Shopify/toxiproxy

• https://github.com/Netflix/chaosmonkey

• https://byteman.jboss.org/

• https://github.com/storax/kubedoom

More: https://github.com/dastergon/awesome-chaos-engineering

Manning: https://www.manning.com/books/chaos-engineering

Chaos Engineering: Site reliability through controlled disruption

Let’s connect!

Photo Credits

All photos found on unsplash.com:

• Austin Ban - https://unsplash.com/@austinban

• Leio McLaren - https://unsplash.com/@leio

• Mark Riechers - https://unsplash.com/@mriechers

• Sebastian Herrmann - https://unsplash.com/@herrherrmann

• Sam Loyd - https://unsplash.com/@samloyd

• Gerald Schömbs - https://unsplash.com/@geerald

• Christina @ wocintechchat.com - https://unsplash.com/@wocintechchat

Thank you!

https://www.bloomberg.com/careers