
Failure Detectors

CS 717, Ashish Motivala

Dec 6th 2001


Relevant Papers

• Unreliable Failure Detectors for Reliable Distributed Systems. Tushar Deepak Chandra and Sam Toueg. Journal of the ACM, 1996.

• A Gossip-Style Failure Detection Service. R. van Renesse, Y. Minsky, and M. Hayden. Middleware '98.

• Scalable Weakly-Consistent Infection-Style Process Group Membership Protocol. Ashish Motivala, Abhinandan Das, Indranil Gupta. To be submitted to DSN 2002 tomorrow. http://www.cs.cornell.edu/gupta/swim

• On the Quality of Service of Failure Detectors. Wei Chen, Sam Toueg, and Marcos Aguilera. DSN 2000.

• Fail-Aware Failure Detectors. C. Fetzer and F. Cristian. Proceedings of the 15th Symposium on Reliable Distributed Systems (SRDS).


Asynchronous vs Synchronous Model

• Asynchronous model:
  – No value to assumptions about process speed
  – Network can arbitrarily delay a message
  – But we assume that messages are sequenced and retransmitted (an arbitrary number of times), so they eventually get through
• Synchronous model:
  – Assume that every process will run within a bounded delay
  – Assume that every link has a bounded delay
  – Usually described as “synchronous rounds”


Failures in Asynchronous and Synchronous Systems

• In asynchronous systems:
  – Usually limited to process “crash” faults
  – If detectable, we call this “fail-stop” – but how to detect?
• In synchronous systems:
  – Can talk about message “omission” failures: failure to send is the usual approach
  – But the network is assumed reliable (loss is “charged” to the sender)
  – Process crash failures, as in the asynchronous setting
  – “Byzantine” failures: arbitrary misbehavior by processes


Realistic???

• The asynchronous model is too weak, since it has no clocks (real systems have clocks, and “most” timing meets expectations… but with heavy tails)
• The synchronous model is too strong (real systems lack a way to implement synchronized rounds)
• Partially Synchronous Model: an asynchronous network with a reliable channel
• Timed Asynchronous Model: time bounds on clock drift rates and message delays [Fetzer]


Impossibility Results

• Consensus: all processes need to agree on a value
• FLP impossibility of consensus
  – A single faulty process can prevent consensus
  – Realistic, because a slow process is indistinguishable from a crashed one
• Chandra/Toueg showed that the FLP impossibility applies to many problems, not just consensus
  – In particular, they show that FLP applies to group membership and reliable multicast
  – So these practical problems are impossible in asynchronous systems
• They also look at the weakest condition under which consensus can be solved


Byzantine Consensus

• Example: 3 processes, 1 is faulty (A, B, C)
• Non-faulty processes A and B start with input 0 and 1, respectively
• They exchange messages: each now has a set of inputs {0, 1, x}, where x comes from C
• C sends 0 to A and 1 to B
• A has {0, 1, 0} and wants to pick 0. B has {0, 1, 1} and wants to pick 1.
• By definition, impossibility in this model means “xxx can’t always be done”


Chandra/Toueg Idea

• Theoretical idea
• Separate the problem into
  – The consensus algorithm itself
  – A “failure detector”: a form of oracle that announces suspected failures
  – But the detector can change its decision (suspicions may later be revised)
• Question: what is the weakest oracle for which consensus is always solvable?


Sample properties

• Completeness: detection of every crash
  – Strong completeness: eventually, every process that crashes is permanently suspected by every correct process
  – Weak completeness: eventually, every process that crashes is permanently suspected by some correct process


Sample properties

• Accuracy: does it make mistakes?
  – Strong accuracy: no process is suspected before it crashes
  – Weak accuracy: some correct process is never suspected
  – Eventual {strong/weak} accuracy: there is a time after which {strong/weak} accuracy is satisfied


A sampling of failure detectors

Completeness \ Accuracy | Strong    | Weak     | Eventually Strong     | Eventually Weak
Strong                  | Perfect P | Strong S | Eventually Perfect ◊P | Eventually Strong ◊S
Weak                    | Q         | Weak W   | ◊Q                    | Eventually Weak ◊W


Perfect Detector?

• Named Perfect, written P
• Strong completeness and strong accuracy
• Immediately detects all failures
• Never makes mistakes


Example of a failure detector

• The detector they call ◊W: “eventually weak”
• More commonly written ◊W: “diamond-W”
• Defined by two properties:
  – There is a time after which every process that crashes is suspected by some correct process {weak completeness}
  – There is a time after which some correct process is never suspected by any correct process {weak accuracy}
• E.g., we can eventually agree upon a leader. If it crashes, we eventually, accurately detect the crash.


◊W: Weakest failure detector

• They show that ◊W is the weakest failure detector for which consensus is guaranteed to be achievable
• The algorithm is pretty simple
  – Rotate a token around a ring of processes
  – A decision can occur once the token makes it around once without a change in failure-suspicion status for any process
  – Subsequently, as the token is passed, each recipient learns the decision outcome


Building systems with ◊W

• Unfortunately, this failure detector is not implementable (in a purely asynchronous system)
• This is the weakest failure detector that solves consensus
• Using timeouts, we can make mistakes at arbitrary times (a timeout-based sketch follows below)
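In practice, systems approximate such detectors with heartbeats and timeouts. Below is a minimal sketch in Python of that idea (the class name, API, and timeout policy are illustrative assumptions, not code from any of the papers above): a crashed process stops sending heartbeats and is eventually suspected, which gives strong completeness, but a slow process or network causes false suspicions, so accuracy holds only while timing behaves.

import time

class TimeoutFailureDetector:
    """Heartbeat/timeout failure detector sketch.

    A crashed process is eventually suspected (completeness), but a slow
    process or network can cause false suspicions (imperfect accuracy);
    a suspicion is revoked if a heartbeat later arrives."""

    def __init__(self, timeout=5.0):
        self.timeout = timeout      # seconds of silence before suspecting
        self.last_heard = {}        # process id -> time of last heartbeat
        self.suspects = set()

    def on_heartbeat(self, pid):
        """Record a heartbeat from pid and clear any suspicion of it."""
        self.last_heard[pid] = time.time()
        self.suspects.discard(pid)

    def check(self):
        """Suspect every known process that has been silent for too long."""
        now = time.time()
        for pid, t in self.last_heard.items():
            if now - t > self.timeout:
                self.suspects.add(pid)
        return set(self.suspects)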


Group Membership Service

[Figure: a process group (pi, pj, …) communicating over an asynchronous lossy network; pj’s membership list is updated on Join, Leave, and Failure events]


Data Dissemination using Epidemic Protocols

• Want efficiency, robustness, speed and scale
• Tree distribution is efficient, but fragile and hard to configure
• Gossip is efficient and robust but has high latency: network load is almost linear, and detection time scales as O(n log n) with the number of processes


State Monotonic Property

• A gossip message contains the state of the sender of the gossip.
• The receiver uses a merge function to merge the received state with its own state.
• Need some kind of monotonicity in the state and in the gossip


Simple Epidemic

• Assume a fixed population of size n
• For simplicity, assume homogeneous spreading
  – Simple epidemic: any member can infect any other member with equal probability
• Assume that k members are already infected
• And that the infection occurs in rounds


Probability of Infection

• Probability Pinfect(k,n) that a particular uninfected member is infected in a round if k are already infected?
• Pinfect(k,n) = 1 – P(nobody infects this member) = 1 – (1 – 1/n)^k
• E(# newly infected members) = (n – k) · Pinfect(k,n)
• Basically it’s a binomial distribution (a quick numerical check follows below)
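As a quick sanity check of these formulas, here is a small Python sketch (the function names and the one-gossip-target-per-round assumption are mine, for illustration): p_infect evaluates the expression above, and simulate_rounds runs a simple push epidemic to show how fast the whole population becomes infected.

import random

def p_infect(k, n):
    """Probability that a particular uninfected member is infected in one
    round, when k members are already infected and each infected member
    gossips to one member chosen uniformly at random out of n."""
    return 1 - (1 - 1 / n) ** k

def simulate_rounds(n, seed=1):
    """Simulate a simple push epidemic: every infected member picks one
    random target per round. Returns the number of rounds until all n
    members are infected."""
    random.seed(seed)
    infected = {0}                      # start from a single infected member
    rounds = 0
    while len(infected) < n:
        infected |= {random.randrange(n) for _ in range(len(infected))}
        rounds += 1
    return rounds

if __name__ == "__main__":
    n = 1024
    print(p_infect(n // 2, n))          # roughly 0.39 at the halfway mark
    print(simulate_rounds(n))           # grows roughly like log n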


2 Phases

• Intuition: 2 phases
• First half: 1 -> n/2 (Phase 1)
• Second half: n/2 -> n (Phase 2)
• For large n, Pinfect(n/2, n) ~ 1 – (1/e)^0.5 ~ 0.4 (derivation below)
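The number in the last bullet follows from the standard limit (1 - 1/n)^n -> 1/e; a short derivation (mine, written in LaTeX):

\[
P_{\mathrm{infect}}\!\left(\frac{n}{2},\, n\right)
  = 1 - \left(1 - \frac{1}{n}\right)^{n/2}
  = 1 - \left[\left(1 - \frac{1}{n}\right)^{n}\right]^{1/2}
  \;\longrightarrow\; 1 - e^{-1/2} \approx 0.39
  \qquad (n \to \infty).
\]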


Infection and Uninfection

• Infection
  – Initial growth factor is very high, about 2
  – At the halfway mark it is about 1.4
  – Exponential growth
• Uninfection
  – Slow decline of the uninfected population to start
  – At the halfway mark it is about 0.4
  – Exponential decline


Rounds

• The number of rounds necessary to infect the entire population is O(log n)
• Robbert uses a base of 1.585 for the experiments


How the Protocol Works

• Each member maintains a list of (address, heartbeat) pairs.
• Periodically each member gossips:
  – Increments its own heartbeat
  – Sends (part of) the list to a randomly chosen member
• On receipt of a gossip, the lists are merged
• Each member maintains the last heartbeat of each list member (a minimal code sketch follows below)
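A minimal sketch in Python of the state and merge step described above (the class and method names, and the use of wall-clock time with a fixed fail_after threshold, are my assumptions for illustration, not code from the gossip paper):

import random
import time

class GossipMember:
    """Gossip-style failure detector member (sketch).

    Keeps an (address -> heartbeat) map plus the local time at which each
    heartbeat was last seen to increase; entries that have not increased
    for fail_after seconds are reported as suspected."""

    def __init__(self, address, peers, fail_after=20.0):
        self.address = address
        self.peers = list(peers)            # addresses of the other members
        self.fail_after = fail_after
        self.heartbeats = {address: 0}      # address -> heartbeat counter
        self.last_update = {address: time.time()}

    def gossip_once(self, send):
        """One gossip step: bump our own heartbeat and send our list to a
        randomly chosen member; send(addr, payload) is supplied by the
        caller (the network layer is out of scope for this sketch)."""
        self.heartbeats[self.address] += 1
        self.last_update[self.address] = time.time()
        if self.peers:
            send(random.choice(self.peers), dict(self.heartbeats))

    def on_gossip(self, received):
        """Merge a received (address -> heartbeat) list: keep the maximum
        heartbeat per address, remembering when it last increased."""
        now = time.time()
        for addr, hb in received.items():
            if hb > self.heartbeats.get(addr, -1):
                self.heartbeats[addr] = hb
                self.last_update[addr] = now

    def suspected(self):
        """Addresses whose heartbeat has not increased recently."""
        now = time.time()
        return [a for a, t in self.last_update.items()
                if a != self.address and now - t > self.fail_after]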


SWIM Group Membership Service

[Figure: the same setting as before: a process group (pi, pj, …) over an asynchronous lossy network, with pj’s membership list updated on Join, Leave, and Failure events]


System Design

• Join, Leave, Failure: broadcast to all processes
• Need to detect a process failure at some process quickly (to be able to broadcast it)
• Failure detector protocol specifications:
  – Detection time and accuracy: specified by the application designer to SWIM
  – Load: optimized by SWIM


SWIM Failure Detector Protocol

[Figure: one protocol period of T time units: pi pings a randomly selected member pj; if the direct ping and its ack are lost, pi asks K randomly chosen processes to ping pj on its behalf]
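A sketch in Python of the probing logic in one such protocol period (the ping and ping_req callbacks and the function name are illustrative assumptions; timeouts and the actual network layer are left out): the prober picks one member at random, pings it directly, and only if that fails probes it indirectly through K other members.

import random

def swim_protocol_period(members, self_addr, ping, ping_req, k=3):
    """One SWIM-style failure detection period at member self_addr.

    ping(target) sends a direct ping and returns True if an ack arrives
    within the timeout; ping_req(helper, target) asks helper to ping
    target on our behalf and returns True if an ack is relayed back.
    Returns the address suspected as failed, or None."""
    others = [m for m in members if m != self_addr]
    if not others:
        return None
    target = random.choice(others)           # member to probe this period
    if ping(target):                         # direct probe succeeded
        return None
    helpers = random.sample([m for m in others if m != target],
                            min(k, len(others) - 1))
    if any(ping_req(h, target) for h in helpers):
        return None                          # some helper reached the target
    return target                            # suspect / declare failure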


Properties

• Expected detection time: e/(e-1) protocol periods (a rough derivation follows below)
• Load: O(K) per process
  – Inaccuracy probability exponential in K
• Process failures detected
  – in O(log N) protocol periods w.h.p.
  – in O(N) protocol periods deterministically
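A rough back-of-the-envelope argument for the e/(e-1) figure (my sketch of the standard reasoning, not the exact analysis in the SWIM paper): in each protocol period every non-faulty member probes one member chosen uniformly at random, so a crashed member is probed by at least one of the other N-1 members with probability

\[
p = 1 - \left(1 - \frac{1}{N-1}\right)^{N-1} \approx 1 - e^{-1}
\quad \text{for large } N,
\]

and the number of periods until the first such probe is geometric with parameter p, giving an expected detection time of

\[
\frac{1}{p} \approx \frac{1}{1 - e^{-1}} = \frac{e}{e-1} \approx 1.58
\ \text{protocol periods.}
\]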


Why not Heartbeating?

• Centralized: single point of failure
• All-to-all: O(N) load per process
• Logical ring: unpredictable under multiple failures


LAN Scalability

[Figure: mean time to failure detection (normalized by RTT) versus number of processes, experimental and expected curves. Win2000, 100 Base-T Ethernet LAN; protocol period = 3*RTT, RTT = 10 ms, K = 1]


Deployment

• Broadcast ‘suspicion’ before ‘declaring’ process failure
• Piggyback broadcasts through ping messages
  – Epidemic-style broadcast
• WAN
  – Load on core routers
  – No representatives per subnet/domain