80
Protect your users with Circuit Breakers Scott Triglia VanPy Day 2016 1 Jim’s

Protect your users with Circuit breakers

Embed Size (px)

Citation preview

Protect your users with Circuit Breakers

Scott Triglia VanPy Day 2016

1

Jim’s

2

Yelp’s Mission:Connecting people with great

local businesses.

Yelp StatsAs of Q1 2016

90M 3270%102M

Scott Triglia

4

@scott_triglia

Work with the $$$

Let’s talk Circuit Breakers

5

6

7

8

9

10

Our goals today: introduce a basic circuit breaker

11

Our goals today: a modular circuit breaker

12

Our goals today: test it out on several scenarios

13

14

15

1616

1

2

34

the fundamental rule of SOA your systems will fail

what’s your response?17

18

https://kateheddleston.com/blog/the-null-process

1919

1

2

34

20

Nygard’s circuit breaker

21

22

23

24

25

Circuit Breaker States: * Healthy (or “closed”) * Recovering (or “half-open”) * Unhealthy (or “open”)

26

27

Recovery:

* Wait for recovery_timeout seconds* Send a trial request, trust its results

28

Before a circuit breaker:

29

Before a circuit breaker: * Diners wait forever to get food

30

Before a circuit breaker: * Diners wait forever to get food * Kitchen has a growing backlog

31

Before a circuit breaker: * Diners wait forever to get food * Kitchen has a growing backlog * New diners making things worse

32

With a circuit breaker:

33

With a circuit breaker: * Fewer frustrated users

34

With a circuit breaker: * Fewer frustrated users * Reduced load on the backend

35

With a circuit breaker: * Fewer frustrated users * Reduced load on the backend * A non-null process

36

37

38

39

40

41

Should our waiters all agree?

Module 1:

42

43

44

New Behavior: * Clients inform each other * Processes are no longer independent

45

* Propagate failure faster

* Requires distributed datastore * Forces decisions about consistency

46

What should we do in response?

Module 2:

47

48

49

50

51

New Behavior: * Code can check in advance about healthiness of system * Automatic monitoring!

52

* Build features on top of system health status

* Requires a single source of truth?

53

Who decides we’re unhealthy?

Module 3:

54

55

def signal_overload(cb): if len(jobs) > THRESH: cb.mark_unhealthy()

56

New Behavior:

* CB gets signals from anywhere * Signal combining logic

57

* Allows many (many) new signals

* Must combine signals * Adds complexity to system

58

How do we recover?

Module 4:

59

60

Dark launched requests:

* Reject but process normally * Dangerous with side effects

61

Synthetic Requests:

* Dark launching with fake requests * Not necessarily representative

62

Live requests:

* Definitely representative * Exposes real users to pain

63

New Behavior:

* Traffic determines health * Removal of recovery timeouts

64

* Faster(?) recovery * No timeout tuning required * Dark launching not always possible * Synthetic can be unrepresentative

65

in summary

66

Your system will fail, have a plan!

67

The basic CB is better than nothing

68

Questions to ask:

* Should our waiters all agree? * How should I deal with unhealthiness? * Who decides we’re unhealthy? * How do we recover?

69

Questions to ask:

* Should our waiters all agree? * How should I deal with unhealthiness? * Who decides we’re unhealthy? * How do we recover?

70

Questions to ask:

* Should our waiters all agree? * How should I deal with unhealthiness? * Who decides we’re unhealthy? * How do we recover?

71

Questions to ask:

* Should our waiters all agree? * How should I deal with unhealthiness? * Who decides we’re unhealthy? * How do we recover?

72

Questions to ask:

* Should our waiters all agree? * How should I deal with unhealthiness? * Who decides we’re unhealthy? * How do we recover?

73

…and much more!

Much comes down to your use case

74

Questions?

75

[email protected] @scott_triglia

77

Can’t we do better than rejecting requests?

http://techblog.netflix.com/2011/12/making-netflix-api-more-resilient.html

79

How do I safely test out a new circuit breaker?

https://engineering.heroku.com/blogs/2015-06-30-improved-production-stability-with-circuit-breakers/