36
© 2021, Amazon Web Services, Inc. or its affiliates. AWS RESILIENCE AND CHAOS ENGINEERING DAY © 2021, Amazon Web Services, Inc. or its affiliates. AWS Resilience and Chaos Engineering Day Gunnar Grosch Sr. Developer Advocate, AWS @gunnargrosch

AWS Resilience and Chaos Engineering Day

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS Resilience and Chaos Engineering Day

Gunnar Grosch

Sr. Developer Advocate, AWS

@gunnargrosch

Page 2: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

What we’ll cover in this session

• Challenges with distributed systems

• What chaos engineering is

• Phases of chaos engineering

• Common use cases for chaos engineering

Page 3: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

© 2021, Amazon Web Services, Inc. or its affiliates.

Challenges withdistributed systems

Page 4: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

Distributed systems are complex

Message

Message

Reply

Reply

ServerNetworkClient

https://aws.amazon.com/builders-library/challenges-with-distributed-systems/

Page 5: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

Traditional testing is not enough

TESTING = VERIFYING A KNOWN CONDITION

Unit testingof components

Tested in isolation to ensure function meets expectations

Functional testingof integrations

Each execution path testedto assure expected results

Page 6: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

And it can get more complicated…

IOError: No space left on device

close failed in file object destructor:

IOError: No space left on device

close failed in file object destructor:

IOError: No space left on device

close failed in file object destructor:

IOError: No space left on device

close failed in file object destructor:

IOError: No space left on device

close failed in file object destructor:

IOError: No space left on device

close failed in file object destructor:

IOError: No space left on device

close failed in file object destructor:

IOError: No space left on device

close failed in file object destructor:

IOError: No space left on device

close failed in file object destructor:

logfile

ROTATE

logfile.0

ROTATE

logfile.1

ROTATE

logfile.2

ROTATE

logfile.3

ROTATE

logfile.n

ROTATE

Page 7: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

© 2021, Amazon Web Services, Inc. or its affiliates.

What is chaos engineering?

Page 8: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

Chaos engineering is the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production

principlesofchaos.org

Page 9: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

Chaos engineering is the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production

principlesofchaos.org

Page 10: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

Chaos engineering is the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production

principlesofchaos.org

Page 11: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

Chaos engineering is the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production

principlesofchaos.org

Page 12: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

Chaos engineering is the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production

principlesofchaos.org

Page 13: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

The process of chaos engineering

• Stressing an application by creating disruptive events

• Observing how the system responds

• Implementing improvements

Page 14: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

Fundamental goals with chaos engineering

• Improve resilience and performance

• Uncover hidden issues

• Expose blind spotsMonitoring, observability, and alarm

• And more

Page 15: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

When to do chaos experiments

• Adding new code

• Creating new features

• Adding services

• Adding or changing dependencies

• Continuously

• And more

Page 16: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

Don’t ask what happens if a system fails; ask what happens when it fails

28

Page 17: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

© 2021, Amazon Web Services, Inc. or its affiliates.

Phases of chaos engineering

Page 18: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

Phases of chaos engineering

Steady state

Hypothesis

Run experiment

Verify

Improve

Page 19: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

Phases of chaos engineering

Steady state

Hypothesis

Run experiment

Verify

Improve

Page 20: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

Phases of chaos engineering

Steady state

Hypothesis

Run experiment

Verify

Improve

Page 21: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

Phases of chaos engineering

Steady state

Hypothesis

Run experiment

Verify

Improve

Page 22: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

Phases of chaos engineering

Steady state

Hypothesis

Run experiment

Verify

Improve

Page 23: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

Phases of chaos engineering

Steady state

Hypothesis

Run experiment

Verify

Improve

Page 24: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

© 2021, Amazon Web Services, Inc. or its affiliates.

Use cases

Page 25: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

Use cases

One-offexperiments

Periodic game days

Automated experiments

Page 26: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

Use cases

One-offexperiments

Periodic game days

Automated experiments

Page 27: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

Use cases

One-offexperiments

Periodic game days

Automated experiments

Page 28: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

Use cases

One-offexperiments

Periodic game days

Automated experiments

Page 29: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

Automated experiments

Recurring scheduled

experiments

Event-triggered experiments

Continuous delivery experiments

Page 30: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

Automated experiments

Recurring scheduled

experiments

Event-triggered experiments

Continuous delivery experiments

Page 31: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

Automated experiments

Recurring scheduled

experiments

Event-triggered experiments

Continuous delivery experiments

Page 32: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

Automated experiments

Recurring scheduled

experiments

Event-triggered experiments

Continuous delivery experiments

Page 33: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

Automated experiments

Recurring scheduled

experiments

Event-triggered experiments

Continuous delivery experiments

Page 34: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

Use cases

One-offexperiments

Periodic game days

Automated experiments

Page 35: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

Recap

• Challenges with distributed systems

• What chaos engineering is

• Phases of chaos engineering

• Common use cases for chaos engineering

Page 36: AWS Resilience and Chaos Engineering Day

© 2021, Amazon Web Services, Inc. or its affiliates.

AWS RESILIENCE AND CHAOS ENGINEERING DAY

Thank you!

© 2021, Amazon Web Services, Inc. or its affiliates.

Gunnar Grosch

Sr. Developer Advocate, AWS

@gunnargrosch