System Reliability and Resilience and stuff


Some things need to be cleared up first

http://en.wikipedia.org/wiki/Vedette_(cabaret)

tuple

// Initialize customer and invoice
Initialize(customer, invoice);

public void Initialize(Customer customer, Invoice invoice)
{
    customer.Name = "asdf";
    invoice.Date = DateTime.Now;
}

Initialize(customer, invoice);
// did something happen to customer
// and/or invoice?

customer.Name = InitNameFrom(customer, invoice);
invoice.Date = InitDateFrom(customer, invoice);

customer.Name = GetNameFrom(customer, invoice);
invoice.Date = GetDateFrom(customer, invoice);

var results = Initialize(customer, invoice);

customer.Name = results.Item1;
invoice.Date = results.Item2;

public Tuple<string, DateTime> Initialize(Customer customer, Invoice invoice)
{
    return new Tuple<string, DateTime>("asdf", DateTime.Now);
}

public static bool TryParse(string s, out DateTime result)

or

public static Tuple<bool, DateTime?> TryParse(string s)

tuple
• Avoid side effects
• Avoid out parameters
• Multiple values without a specific type
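
The tuple points above can be pulled together in one small sketch. The `Customer` and `Invoice` classes are placeholder types assumed here for illustration (the slides never define them), and `TryParseDate` is a hypothetical name for the tuple-returning alternative to `DateTime.TryParse`.

```csharp
using System;

// Placeholder types, assumed for illustration only.
public class Customer { public string Name { get; set; } }
public class Invoice { public DateTime Date { get; set; } }

public static class TupleSketch
{
    // No side effects: the arguments are never mutated.
    // Both computed values travel back to the caller in the return value.
    public static Tuple<string, DateTime> Initialize(Customer customer, Invoice invoice)
    {
        return Tuple.Create("asdf", DateTime.Now);
    }

    // Tuple-returning alternative to an out parameter.
    // Item1 reports success; Item2 holds the parsed value, or null on failure.
    public static Tuple<bool, DateTime?> TryParseDate(string s)
    {
        DateTime parsed;
        bool ok = DateTime.TryParse(s, out parsed);
        return Tuple.Create(ok, ok ? (DateTime?)parsed : null);
    }
}
```

The caller decides what to do with `Item1` and `Item2`, so reading the call site is enough to see what changes.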

null object

private ILogger _logger;

public MyClass(ILogger logger)
{
    _logger = logger;
}

if (_logger != null)
{
    _logger.Debug("it worked on my machine!");
}

null checks for everyone!

forget one and…

public class NullLogger : ILogger
{
    public void Debug(string text)
    {
        // do sweet nothing
    }
}

private ILogger _logger = new NullLogger();

public MyClass(ILogger logger)
{
    _logger = logger;
}

_logger.Debug("it worked on my machine!");

null object
• Can eliminate null checks
• Simple to implement
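
A compilable sketch of the null object idea follows. `ILogger` is written out as a one-method interface because the slides only show its `Debug` call, `RecordingLogger` is a hypothetical real logger added for contrast, and the `??` fallback in the constructor is an assumption that goes one step past the slides: it covers a caller who passes `null` explicitly.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of the interface; the slides only show Debug(string).
public interface ILogger
{
    void Debug(string text);
}

// The null object: a valid ILogger that deliberately does nothing.
public class NullLogger : ILogger
{
    public void Debug(string text) { }
}

// Hypothetical "real" logger that keeps messages in memory.
public class RecordingLogger : ILogger
{
    public List<string> Messages { get; } = new List<string>();
    public void Debug(string text) { Messages.Add(text); }
}

public class MyClass
{
    private readonly ILogger _logger;

    public MyClass(ILogger logger = null)
    {
        // Assumption beyond the slides: fall back to the null object
        // when no logger (or an explicit null) is supplied.
        _logger = logger ?? new NullLogger();
    }

    public void DoWork()
    {
        // No null check needed: _logger is always a usable object.
        _logger.Debug("it worked on my machine!");
    }
}
```

With this in place, `new MyClass().DoWork()` is safe, and `new MyClass(new RecordingLogger()).DoWork()` records the message.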

Circuit Breaker

Retry

Your Application → Out of Process Dependency: N times

Out of Process Dependency: N times * Y clients

= Denial of Service Attack

Limit the # of retries

N * Y becomes 5 * Y

Y is still a problem
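
The bounded retry described above can be sketched as a small helper. The `Retry.Execute` name and signature are assumptions made for this sketch, not something shown on the slides. Capping `maxAttempts` at 5 gives the 5 * Y load mentioned above, but nothing here coordinates across the Y clients, which is why Y is still a problem.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class Retry
{
    // Calls operation up to maxAttempts times, pausing between failed attempts.
    // Hypothetical helper; name and signature are assumptions for this sketch.
    public static T Execute<T>(Func<T> operation, int maxAttempts, TimeSpan pause)
    {
        if (maxAttempts < 1)
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        var failures = new List<Exception>();
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception ex)
            {
                failures.Add(ex);
                // Do not pause after the final attempt.
                if (attempt < maxAttempts)
                    Thread.Sleep(pause);
            }
        }

        // Every attempt failed: report all failures, not just the last one.
        throw new AggregateException(
            "Operation failed after " + maxAttempts + " attempts.", failures);
    }
}
```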

Circuit Breaker

State Machine

On :: Off

On → Off when not healthy

Off → On manually

Get to software before we ask you to dance

Healthy or Unhealthy

Out of Process Dependency

State is independent of requestor

Out of Process Dependency

Your Application has many independent external dependencies

Your Application can throttle itself

Your Application has a wait threshold

Your Application / Circuit Breaker / External Dependency

Threshold = 2, Pause = 10ms, Timeout = 30s, State = Closed

Request (application to circuit breaker)
Request (circuit breaker to dependency)
Failure (e.g. HTTP 500): Failure Count = 1, Pause 10ms
Request (circuit breaker to dependency)
Failure (e.g. HTTP 500): Failure Count = 2, State = Open
OperationFailedException (circuit breaker to application)

Your Application / Circuit Breaker / External Dependency

Threshold = 2, Pause = 10ms, Timeout = 30s, State = Open

Request: 30s has not passed, CircuitBreakerOpenException
Request: 30s has not passed, CircuitBreakerOpenException

System can try to become healthy for 30s

Your Application / Circuit Breaker / External Dependency

Threshold = 2, Pause = 10ms, Timeout = 30s, State = ½ Open

Request (application to circuit breaker): 30s has passed
Request (circuit breaker to dependency)
Failure (e.g. HTTP 500): Failure Count = 2, State = Open
OperationFailedException (circuit breaker to application)

Your Application / Circuit Breaker / External Dependency

Threshold = 2, Pause = 10ms, Timeout = 30s, State = ½ Open

Request (application to circuit breaker): 30s has passed
Request (circuit breaker to dependency)
Response (dependency to circuit breaker): Failure Count = 0, State = Closed
Response (circuit breaker to application)

Closed :: Open :: ½ Open

½ Open is like a manual reset

Pause :: Timeout

Pause between calls in the loop

Timeout before you can call again
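
The three states, the pause and the timeout can be sketched as one small class. This is a minimal single-threaded sketch under stated assumptions, not the speaker's implementation: the `CircuitBreaker` class, its constructor parameters and the injectable `clock` are all names invented here, and the two exception types follow the names and base classes given on the Exceptions slides.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public enum CircuitState { Closed, Open, HalfOpen }

public class OperationFailedException : AggregateException
{
    public OperationFailedException(IEnumerable<Exception> failures)
        : base("Operation failed; circuit breaker is now open.", failures) { }
}

public class CircuitBreakerOpenException : ApplicationException
{
    public CircuitBreakerOpenException()
        : base("Circuit breaker is open; call was not attempted.") { }
}

// Sketch only: not thread-safe, and all names are assumptions.
public class CircuitBreaker
{
    private readonly int _threshold;
    private readonly TimeSpan _pause;
    private readonly TimeSpan _timeout;
    private readonly Func<DateTime> _clock;
    private int _failureCount;
    private DateTime _openedAt;

    public CircuitState State { get; private set; }

    public CircuitBreaker(int threshold, TimeSpan pause, TimeSpan timeout,
                          Func<DateTime> clock = null)
    {
        _threshold = threshold;
        _pause = pause;
        _timeout = timeout;
        _clock = clock ?? (() => DateTime.UtcNow);
        State = CircuitState.Closed;
    }

    public T Execute<T>(Func<T> operation)
    {
        if (State == CircuitState.Open)
        {
            // Timeout: how long the dependency is left alone.
            if (_clock() - _openedAt < _timeout)
                throw new CircuitBreakerOpenException();
            State = CircuitState.HalfOpen; // allow one trial call
        }

        var failures = new List<Exception>();
        while (true)
        {
            try
            {
                T result = operation();
                _failureCount = 0;
                State = CircuitState.Closed;
                return result;
            }
            catch (Exception ex)
            {
                failures.Add(ex);
                // A half-open trial failure reopens at once without counting;
                // otherwise count the failure and compare with the threshold.
                bool trip = State == CircuitState.HalfOpen
                            || ++_failureCount >= _threshold;
                if (trip)
                {
                    State = CircuitState.Open;
                    _openedAt = _clock();
                    throw new OperationFailedException(failures);
                }
                Thread.Sleep(_pause); // Pause: wait between calls in the loop
            }
        }
    }
}
```

Passing a fake `clock` makes the 30s timeout testable without actually waiting.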

Exceptions

OperationFailed : AggregateException

CircuitBreakerOpen : ApplicationException

Don’t Lose Exception Info

Always use InnerException(s)
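
As a small illustration of keeping exception info, the sketch below wraps a low-level failure in a more meaningful exception while passing the original along as `InnerException`. `WeatherLookupException` and `ReadTemperature` are hypothetical names chosen for this example only.

```csharp
using System;

// Hypothetical domain exception; the (message, inner) constructor is the
// standard way to carry the original exception along.
public class WeatherLookupException : Exception
{
    public WeatherLookupException(string message, Exception inner)
        : base(message, inner) { }
}

public static class InnerExceptionSketch
{
    public static int ReadTemperature(string raw)
    {
        try
        {
            return int.Parse(raw);
        }
        catch (FormatException ex)
        {
            // Wrap, but keep the original: its message and stack trace
            // stay reachable through InnerException.
            throw new WeatherLookupException(
                "Could not read temperature from '" + raw + "'.", ex);
        }
    }
}
```

When several attempts fail, `AggregateException` plays the same role through its `InnerExceptions` collection.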

Your Application / Circuit Breaker / External Dependency

Threshold = 3, State = Closed

Request (application to circuit breaker)
Request (circuit breaker to dependency)
Failure (e.g. HTTP 500): Failure Count = 1
Request (circuit breaker to dependency)
Failure (e.g. HTTP 500): Failure Count = 2
Request?
Response (dependency to circuit breaker): Failure Count = 0, State = Closed
Response (circuit breaker to application)

Segregate Dependencies

circuitBreaker("database")

circuitBreaker("weatherservice")

Dependency type, endpoint svc, endpoint
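
One way to keep breakers segregated is a registry keyed by dependency name, so a failing database never trips the breaker for the weather service. The sketch below is an assumption about how `circuitBreaker("database")` might be backed: `CircuitBreakers.For` and the `BreakerState` placeholder (standing in for a full circuit breaker) are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;

// Placeholder for a full circuit breaker; it only tracks a failure count.
public class BreakerState
{
    public string Name { get; }
    public int FailureCount { get; set; }

    public BreakerState(string name) { Name = name; }
}

public static class CircuitBreakers
{
    // One breaker per key. The key could be a dependency type, a service,
    // or a single endpoint, depending on how finely failures should be isolated.
    private static readonly ConcurrentDictionary<string, BreakerState> Breakers =
        new ConcurrentDictionary<string, BreakerState>(StringComparer.OrdinalIgnoreCase);

    // Returns the existing breaker for the key, creating it on first use.
    public static BreakerState For(string dependencyKey)
    {
        return Breakers.GetOrAdd(dependencyKey, key => new BreakerState(key));
    }
}
```

Repeated calls with the same key return the same instance, so failure counts accumulate per dependency rather than globally.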

Where?

Diagram: Your Application, Out of Process Dependency, Circuit Breaker, Proxy

Watch for Inception

Diagram: Your Application, Web Service, Circuit Breaker, Circuit Breaker, Proxy, Database, Repository

circuit breaker
• retry looping
• slow down attempts
• good neighbour

Thank you very much!

Donald Belcham
@dbelcham

[email protected]