Failure Detectors & Consensus

Agenda

Unreliable Failure Detectors (Chandra, Toueg)

Reducibility: ◊S ≥ ◊W, ◊W ≥ ◊S

Solving Consensus using ◊S (Mostefaoui, Raynal)

Unreliable Failure Detectors

A distributed failure detector D consists of a local failure detector module D_p at each process p.

When D_p suspects that a process j has crashed, it adds j to suspects_p. If D_p later realizes it made a mistake, it can remove j from suspects_p.
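As a rough illustration of the local module described above, the following Python sketch models D_p as an object that holds suspects_p and can revise it. The class and method names (LocalFailureDetector, suspect, unsuspect) are hypothetical, and how the module arrives at its suspicions is deliberately left out.

```python
class LocalFailureDetector:
    """Hypothetical model of the local module D_p at process p."""

    def __init__(self, owner):
        self.owner = owner
        self.suspects = set()  # plays the role of suspects_p

    def suspect(self, j):
        # D_p believes that j has crashed
        self.suspects.add(j)

    def unsuspect(self, j):
        # D_p realizes the earlier suspicion was a mistake
        self.suspects.discard(j)


d_p = LocalFailureDetector("p")
d_p.suspect("j")
suspected_first = set(d_p.suspects)   # snapshot while j is suspected
d_p.unsuspect("j")
suspected_later = set(d_p.suspects)   # snapshot after the correction
```

The point of the sketch is only that suspicions are revocable: the detector is unreliable, so membership in suspects_p is not proof of a crash.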

Failure detectors are defined in terms of abstract properties: two classes of completeness and four classes of accuracy.

Completeness Classes

Strong Completeness: Eventually, every process that crashes is permanently suspected by every correct process.

Weak Completeness: Eventually, every process that crashes is permanently suspected by some correct process.
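The difference between the two completeness classes can be sketched on a recorded trace. The code below is only a finite approximation: "eventually permanently" is read as "in each of the last `tail` snapshots", and the trace format (a list of dicts mapping each correct process to its suspect set) and all function names are assumptions made for this example.

```python
def permanent_suspecters(trace, q, tail):
    """Observers that suspect q in every one of the last `tail` snapshots."""
    window = trace[-tail:]
    return {p for p in window[-1] if all(q in snap[p] for snap in window)}


def strong_completeness(trace, crashed, correct, tail=1):
    # every crashed process is suspected by every correct process
    return all(set(correct) <= permanent_suspecters(trace, q, tail)
               for q in crashed)


def weak_completeness(trace, crashed, correct, tail=1):
    # every crashed process is suspected by at least one correct process
    return all(set(correct) & permanent_suspecters(trace, q, tail)
               for q in crashed)


# Process c has crashed; only a ever notices.
trace = [
    {"a": set(), "b": set()},
    {"a": {"c"}, "b": set()},
    {"a": {"c"}, "b": set()},
]
weak_ok = weak_completeness(trace, crashed=["c"], correct=["a", "b"], tail=2)
strong_ok = strong_completeness(trace, crashed=["c"], correct=["a", "b"], tail=2)
```

In this trace weak completeness holds and strong completeness does not, because b never suspects c.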

Accuracy Classes

Strong Accuracy: No process is suspected before it crashes.

Weak Accuracy: Some correct process is never suspected.

Eventual Strong Accuracy: There is a time after which correct processes are not suspected by any correct process.

Eventual Weak Accuracy: There is a time after which some correct process is never suspected by any correct process.
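The accuracy classes can be checked on a recorded trace in a similar spirit. This is a sketch under assumptions: the trace is a list of snapshots indexed by time, each mapping a correct observer to its suspect set; `crash_time` maps crashed processes to the snapshot index of their crash; and the eventual variants are approximated by looking only at the last `tail` snapshots. All names are hypothetical.

```python
import math


def strong_accuracy(trace, crash_time):
    # no process is suspected before it crashes
    for t, snap in enumerate(trace):
        for suspects in snap.values():
            if any(crash_time.get(q, math.inf) > t for q in suspects):
                return False
    return True


def weak_accuracy(trace, correct):
    # some correct process is never suspected
    ever = set().union(*(s for snap in trace for s in snap.values()))
    return any(p not in ever for p in correct)


def eventual_strong_accuracy(trace, correct, tail):
    # in the final window, no correct process is suspected
    return all(not (s & set(correct))
               for snap in trace[-tail:] for s in snap.values())


def eventual_weak_accuracy(trace, correct, tail):
    # in the final window, some correct process is never suspected
    return weak_accuracy(trace[-tail:], correct)


# Nobody crashes; a wrongly suspects b at time 0 and then recovers.
correct = ["a", "b", "c"]
trace = [
    {"a": {"b"}, "b": set(), "c": set()},
    {"a": set(), "b": set(), "c": set()},
]
results = {
    "strong": strong_accuracy(trace, crash_time={}),
    "weak": weak_accuracy(trace, correct),
    "eventual strong": eventual_strong_accuracy(trace, correct, tail=1),
    "eventual weak": eventual_weak_accuracy(trace, correct, tail=1),
}
```

A single early mistake is enough to violate strong accuracy, while the three weaker properties still hold on this trace.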

Failure Detector Classes

Completeness   Accuracy
               Strong       Weak       Eventual Strong          Eventual Weak
Strong         Perfect P    Strong S   Eventually Perfect ◊P    Eventually Strong ◊S
Weak           Q            Weak W     ◊Q                       Eventually Weak ◊W
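The eight classes are simply the combinations of one completeness property with one accuracy property. A small lookup makes that explicit; the dictionary and function names are hypothetical, and the string keys are just one possible encoding.

```python
# (completeness, accuracy) -> failure detector class
CLASSES = {
    ("strong", "strong"): "P",
    ("strong", "weak"): "S",
    ("strong", "eventual strong"): "◊P",
    ("strong", "eventual weak"): "◊S",
    ("weak", "strong"): "Q",
    ("weak", "weak"): "W",
    ("weak", "eventual strong"): "◊Q",
    ("weak", "eventual weak"): "◊W",
}


def detector_class(completeness, accuracy):
    """Return the class symbol for a pair of properties."""
    return CLASSES[(completeness, accuracy)]


eventually_strong = detector_class("strong", "eventual weak")
eventually_weak = detector_class("weak", "eventual weak")
```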

Reducibility

A distributed algorithm T_{D→D’} transforms a failure detector D into a failure detector D’ if it maintains, at every process p, a variable output_p that emulates the output of D’ using D.

T_{D→D’} is called a reduction algorithm, and D’ is said to be reducible to D, denoted D ≥ D’ (D’ is “weaker” than D).

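A simple T_{◊S→◊W}? Strong completeness implies weak completeness, so every ◊S detector is already a ◊W detector: output_p ← ◊S_p.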

From Weak Completeness to Strong Completeness: T_{◊W→◊S}

Code for process p

output_p ← ∅

Task 1: repeat forever
    suspects_p ← ◊W_p
    send (p, suspects_p) to all

Task 2: upon receiving (q, suspects_q) for some q
    output_p ← (output_p ∪ suspects_q) − {q}
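The two tasks above can be tried out in a small lock-step simulation. This is a sketch, not the asynchronous algorithm itself: it assumes reliable delivery, processes messages in a fixed order, and represents the ◊W output as a plain dict (`weak_suspects`) supplied by the caller. The function name `transform_round` is hypothetical.

```python
def transform_round(outputs, weak_suspects, alive):
    """One pass of Task 1 and Task 2 for every live process."""
    # Task 1: every live process q broadcasts (q, suspects_q)
    messages = [(q, set(weak_suspects[q])) for q in alive]
    # Task 2: every live process p merges each received message
    for p in alive:
        for q, suspects_q in messages:
            outputs[p] = (outputs[p] | suspects_q) - {q}
    return outputs


alive = [1, 2, 3]                      # process 4 has crashed
outputs = {p: set() for p in alive}

# Round 1: only process 1 suspects 4; process 3 wrongly suspects 2.
transform_round(outputs, {1: {4}, 2: set(), 3: {2}}, alive)
after_round_1 = {p: set(s) for p, s in outputs.items()}

# Round 2: process 3 has corrected its mistake.
transform_round(outputs, {1: {4}, 2: set(), 3: set()}, alive)
after_round_2 = {p: set(s) for p, s in outputs.items()}
```

After the first round every process has inherited both the correct suspicion of 4 and the false suspicion of 2. In the second round the message from process 2 itself removes 2 from every output, while 4, which can no longer send, stays suspected by everyone.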

◊S ≥ ◊W and ◊W ≥ ◊S, so ◊W and ◊S are equivalent (◊W ≡ ◊S).

Consensus

In the Consensus problem, every process p_i proposes a value v_i, and all correct processes must decide on a single value v taken from the set of proposed values.

More formally, a distributed consensus algorithm must satisfy:

Termination: Every correct process eventually decides on some value.

Validity (non-triviality): If a process decides v, then v was proposed by some process.

Agreement: No two correct processes decide differently.
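The three properties can be stated as checks on the outcome of a finished run. The sketch below assumes the run is summarized by two dicts, `proposals` and `decisions`, keyed by process id, plus the set of correct processes; the function name and this representation are hypothetical.

```python
def check_consensus(proposals, decisions, correct):
    """Evaluate the three consensus properties on one finished run."""
    decided_by_correct = {decisions[p] for p in correct if p in decisions}
    return {
        # every correct process has decided
        "termination": all(p in decisions for p in correct),
        # every decided value was proposed by some process
        "validity": all(v in proposals.values() for v in decisions.values()),
        # correct processes did not decide on different values
        "agreement": len(decided_by_correct) <= 1,
    }


proposals = {1: "a", 2: "b", 3: "a"}
good_run = check_consensus(proposals, {1: "a", 2: "a"}, correct={1, 2})
split_run = check_consensus(proposals, {1: "a", 2: "b"}, correct={1, 2})
```

In `good_run` all three properties hold; in `split_run` termination and validity hold but agreement is violated.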

It is impossible to solve consensus deterministically in an asynchronous system, even if only one process might crash [FLP].

Solving Consensus using ◊S

Code for process p_i, 1 ≤ i ≤ n (r = round, c = coordinator, est = estimate, v = value, n = number of processes)

Task 1:
    r_i ← 0; est_i ← v_i
    while not decided do
        c ← (r_i mod n) + 1; est_from_c_i ← ⊥; r_i ← r_i + 1
        if (i = c) then est_from_c_i ← est_i
        else
            wait until <EST, r_i, v> is received from p_c or c is suspected
            if <EST, r_i, v> received then est_from_c_i ← v
        send <EST, r_i, est_from_c_i> to all
        wait until <EST, r_i, est_from_c> collected from a majority of processes
        rec_i ← {est_from_c | <EST, r_i, est_from_c> was received}
        if rec_i = {v} then decide v and send <DECIDE, v> to all
        if rec_i = {v, ⊥} then est_i ← v

Task 2:
    upon reception of <DECIDE, v>: decide v and send <DECIDE, v> to all
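To see the round structure in action, here is a lock-step Python simulation of the algorithm above. It is a simplified sketch, not the asynchronous protocol: all live processes run each round together, crashes happen before the run starts, a crashed coordinator is always suspected, each process hears from a randomly chosen majority that includes itself, and a DECIDE message is assumed to reach everyone by the end of the round. The function `simulate`, its parameters, and the `suspects(p, r)` callback standing in for ◊S are hypothetical.

```python
import random

BOTTOM = None  # stands for ⊥


def simulate(proposals, crashed, suspects, seed=0, max_rounds=50):
    """proposals: {pid: value} with pids 1..n; crashed: set of pids;
    suspects(p, r): set of pids that p suspects during round r.
    Returns {pid: decided value} for the live processes."""
    rng = random.Random(seed)
    n = len(proposals)
    majority = n // 2 + 1
    live = [p for p in proposals if p not in crashed]
    assert len(live) >= majority, "a majority of correct processes is assumed"
    est = {p: proposals[p] for p in live}
    r = 0
    while r < max_rounds:
        c = (r % n) + 1  # rotating coordinator
        r += 1
        # First phase: adopt the coordinator's estimate, or ⊥ on suspicion.
        est_from_c = {}
        for p in live:
            if p == c:
                est_from_c[p] = est[p]
            elif c in crashed or c in suspects(p, r):
                est_from_c[p] = BOTTOM
            else:
                est_from_c[p] = est[c]
        # Second phase: exchange est_from_c and inspect a majority of replies.
        deciders = {}
        for p in live:
            others = [q for q in live if q != p]
            rng.shuffle(others)
            heard = [p] + others[:majority - 1]
            rec = {est_from_c[q] for q in heard}
            values = rec - {BOTTOM}  # at most one value: the coordinator's
            if values and BOTTOM not in rec:
                deciders[p] = next(iter(values))   # rec = {v}
            elif values:
                est[p] = next(iter(values))        # rec = {v, ⊥}
        if deciders:
            # Task 2: the DECIDE message is relayed to every live process.
            v = next(iter(deciders.values()))
            return {p: v for p in live}
    return {}


proposals = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}
# p1 is crashed; p2 is falsely suspected by everyone up to round 2.
decisions = simulate(proposals, crashed={1},
                     suspects=lambda p, r: {2} if r <= 2 else set())
```

In this run round 1 fails because its coordinator p1 has crashed, and round 2 cannot decide because everyone suspects p2, although some processes may still adopt p2's estimate. Round 3 has a correct, unsuspected coordinator, so every live process decides on p3's current estimate, which is either its own proposal or the value it adopted from p2.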
