Reaching Consensus: Why it can’t be done. For Distributed Algorithms 2014. Presentation by Ziv Ronen. Based on “Impossibility of Distributed Consensus with One Faulty Process” by Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson.


Page 1: Reaching Consensus: Why it can’t be done

Reaching Consensus: Why it can’t be done

For Distributed Algorithms 2014
Presentation by Ziv Ronen

Based on “Impossibility of Distributed Consensus with One Faulty Process” by Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson

Page 2: Reaching Consensus: Why it can’t be done

Main Menu

• The problem
• Why the problem is unsolvable
• If time allows: how to solve the problem with initially faulty processors

Page 3: Reaching Consensus: Why it can’t be done

The Problem:

• Consensus in the real world
• Our mission
• Model:
  – Objectives
  – Network
  – Possible faults

Page 4: Reaching Consensus: Why it can’t be done

Consensus in the real world

• There are many cases where we want several processors to agree on an action.
• Usually, it is more important that all processors agree on the same action than which action is chosen.
• For example, in a database we want every transaction to be committed by all processors or by none of them.

Page 5: Reaching Consensus: Why it can’t be done

Consensus in the real world (cont.)

• Such agreement is trivial in a fault-free network.
  – For instance, we can choose a leader that tells all the others what to do.
• However, real-world processors are subject to failures:
  – They might stop working (good case).
  – They might go haywire (bad case).
  – They might become malevolent (worst case).

Page 6: Reaching Consensus: Why it can’t be done

Our mission

• We want to find an algorithm that, for every decision in every network, chooses a single action to perform.
• However, we require that there are at least two possible actions, and that each of them can actually be chosen.

Page 7: Reaching Consensus: Why it can’t be done

Our Model – objectives

• We will work on a simplified problem, in which the processors only need to agree on a number that is either 1 (commit) or 0 (discard).
• Initially, each processor chooses its initial number randomly (this simulates decisions based on the system condition).
  – 1 if it can commit, 0 if it can’t.
• Each processor needs to choose an action. Once the action has been chosen, it can’t be changed.
• In the end, all the processors need to agree on the action, meaning they all choose 1 or all choose 0.

Page 8: Reaching Consensus: Why it can’t be done

Our Model – objectives (cont.)

• We require that the algorithm can return both 1 and 0 (possibly in different cases).
  – So “always discard” or “always commit” is not a possible policy for our database.

Page 9: Reaching Consensus: Why it can’t be done

Our Model – Network

• We assume a fully asynchronous network.
  – If we send a message to a non-faulty processor, it will arrive after a finite but unbounded time.
• We also assume the network is fully connected. For generality, we also assume full knowledge of direction,
  – so any other topology can be simulated.

Page 10: Reaching Consensus: Why it can’t be done

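Why asynchronous? If the processors work

[Figure: timelines of P1 and P2 exchanging message M2, shown side by side for the synchronous case (with a clock tick) and the asynchronous case.]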

Page 11: Reaching Consensus: Why it can’t be done

Why asynchronous? But if one fails…

[Figure: the same synchronous and asynchronous timelines of P1 and P2 with message M2 and a clock tick, annotated “P2 is faulty!”.]

Page 12: Reaching Consensus: Why it can’t be done

Our Model – Possible faults

• We assume that a processor can fail only by stopping entirely.
• We also assume that at most a single processor can fail in any given run.
• However, we assume that:
  – Other processors can’t tell that a processor has stopped working.
  – A processor can fail at any time.

Page 13: Reaching Consensus: Why it can’t be done

Our Model – more formally

• N ≥ 2 processors.
• For each processor p:
  – Input value x_p ∈ {0,1}, part of the problem input.
  – Output value y_p ∈ {0,1,b}, initially b; it can change only once.
  – Infinite storage.
• Messages are of the form (p,m), where p is the target processor and m is the message content. Any processor can send such a message to any other processor.
• We assume that every message stays in a “messages buffer” between the time it is sent and the time it is received.
  – Initially, the buffer is empty.
• Goal: at the end, for every p1, p2: y_p1 = y_p2 ≠ b
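
The formal model above can be sketched as a small data structure. This is only an illustration: the class and method names (`Processor`, `Configuration`, `decide`, `send`, `goal_reached`) are hypothetical and do not come from the slides, and `None` stands in for the undecided value b.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

UNDECIDED = None  # plays the role of the slides' "b"

@dataclass(frozen=True)
class Processor:
    x: int                          # input value, 0 or 1
    y: Optional[int] = UNDECIDED    # output value, may be written only once
    memory: tuple = ()              # stand-in for the unbounded local storage

    def decide(self, value):
        """Return a copy with the output set; a second write is rejected."""
        if self.y is not UNDECIDED:
            raise ValueError("the output value can change only once")
        return Processor(self.x, value, self.memory)

@dataclass(frozen=True)
class Configuration:
    processors: Tuple[Processor, ...]
    buffer: Tuple[Tuple[int, str], ...] = ()   # pending (target, message) pairs

    def send(self, target, message):
        """Return a copy with (target, message) added to the messages buffer."""
        pending = tuple(sorted(self.buffer + ((target, message),)))
        return Configuration(self.processors, pending)

    def goal_reached(self):
        """True when every processor has decided and all decisions are equal."""
        outputs = {p.y for p in self.processors}
        return len(outputs) == 1 and UNDECIDED not in outputs

# The initial state of the example slides: inputs 1, 0, 1, 0 and an empty buffer.
initial = Configuration(tuple(Processor(x) for x in (1, 0, 1, 0)))
```

Making both classes immutable keeps each configuration a plain value, which is convenient when configurations are later compared or collected in sets.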

Page 14: Reaching Consensus: Why it can’t be done

Our model – example, initial state

Processor 1: X1=1, Y1=b
Processor 2: X2=0, Y2=b
Processor 3: X3=1, Y3=b
Processor 4: X4=0, Y4=b

Messages buffer: (empty)

Page 15: Reaching Consensus: Why it can’t be done

Our model – example, different state

Processor 1: X1=1, Y1=b
Processor 2: X2=0, Y2=0
Processor 3: X3=1, Y3=0
Processor 4: X4=0, Y4=b

Messages buffer: (2,m1), (4,m2), (4,m3), (2,m2), (2,m3)

Page 16: Reaching Consensus: Why it can’t be done

Our model – example, final state

Processor 1: X1=1, Y1=0
Processor 2: X2=0, Y2=0
Processor 3: X3=1, Y3=0
Processor 4: X4=0, Y4=0

Messages buffer: (2,m1), (4,m2), (2,m3)

Page 17: Reaching Consensus: Why it can’t be done

Why Consensus is impossible:

• Intuition
• Proof
  – Definitions
  – Lemma 1
  – Lemma 2
  – Lemma 3

Page 18: Reaching Consensus: Why it can’t be done

Intuition

• Let us show the intuition for why this is an impossible task.
• I will demonstrate it on the problem of database consensus.
  – All the databases should have output value 1 if all working databases have input value 1.
  – All the databases should have output value 0 if at least one working database has input value 0.
  – Here, “working” means not failing at the beginning of the algorithm.

Page 19: Reaching Consensus: Why it can’t be done

Initial state

• We choose an initial state where both results are possible.
• In our case, if processor 1 fails during the algorithm, the result might be 1.
• Otherwise, the result should be 0.

Processor 1: X1=0, Y1=b
Processor 2: X2=1, Y2=b
Processor 3: X3=1, Y3=b
Processor 4: X4=1, Y4=b

Page 20: Reaching Consensus: Why it can’t be done

Case 1:

• If processor 1 sent its first message:
• All processors know that it can’t commit.
• The algorithm should decide 0.

[Figure: processors 1 (X1=0), 2 (X2=1), 3 (X3=1) and 4 (X4=1), all with Y=b. Processor 1 sends the message “I failed to commit”; the decision is 0.]

Page 21: Reaching Consensus: Why it can’t be done

Case 2:

• If processor 1 failed before sending this message, the algorithm should decide without it.
• Since all the other processors can commit, the algorithm should decide 1.

[Figure: processors 1 (X1=0), 2 (X2=1), 3 (X3=1) and 4 (X4=1), all with Y=b; the decision is 1.]

Page 22: Reaching Consensus: Why it can’t be done

Quasi failure

• Let us say that a processor “quasi-failed” if:
  – It may be alive or dead.
  – If it is alive, it will execute its next step only after the algorithm has “finished” without it.

[Figure: processor 1 (X1=0, Y1=b), marked “ZZ”.]

Page 23: Reaching Consensus: Why it can’t be done

Quasi failure – Intuition

[Figure: processor 1 (X1=0, Y1=b) drawn twice, captioned “Schrödinger's cat Processor”.]

Page 24: Reaching Consensus: Why it can’t be done

Quasi failure – our example

• If processor 1 quasi-failed:
• The algorithm has 3 choices:

[Figure: processor 1 (X1=0, Y1=b) marked “ZZ”; processors 2, 3 and 4 with X=1, Y=b.]

Page 25: Reaching Consensus: Why it can’t be done

Quasi failure choices (1/3)

• Decide 0.
• In this case, if processor 1 actually failed:
• The result will be wrong!

[Figure: processor 1 (X1=0, Y1=b) marked “ZZ”; processors 2, 3 and 4 with X=1, Y=b; the decision shown is 0.]

Page 26: Reaching Consensus: Why it can’t be done

Quasi failure choices (2/3)

• Decide 1.
• In this case, if the processor wakes up:
• The result will be wrong!

[Figure: processor 1 (X1=0, Y1=b) marked “ZZ”; processors 2, 3 and 4 with X=1, Y=b; the decision shown is 1.]

Page 27: Reaching Consensus: Why it can’t be done

Quasi failure choices (3/3)

• Not deciding.
• In this case, if the processor actually failed:
• The algorithm will never decide.

[Figure: processor 1 (X1=0, Y1=b) marked “ZZ”; processors 2, 3 and 4 with X=1, Y=b; the decision shown is “?”.]

Page 28: Reaching Consensus: Why it can’t be done

Intuition – summary

• There is an initial state where both answers are possible (Lemma 2).
• There is an event at a specific processor (in our case, processor 1 starting to work and sending its message) whose occurrence, no matter when it happens (Lemma 1), determines the outcome.
• If a processor quasi-fails, we can’t decide (because the answer depends on whether it actually failed, and we can’t know that).
• If we do not decide, then we reach another one of those states (Lemma 3) and are stuck forever.

Page 29: Reaching Consensus: Why it can’t be done

Intuition – summary (cont.)

• Remember that in the example we forced the processors to agree according to some policy. In the real problem (and in the following proof) we only need them to agree on the same value, no matter which.

Page 30: Reaching Consensus: Why it can’t be done

Proof – definitions (1/6)

• Configuration: the combination of the internal state (input, output, memory) of each processor and the messages in the buffer.
• Step: an action of one processor. For processor p, it consists of:
  – Trying to receive a message (removing it from the messages buffer). On success, p receives (p,m); on failure, p receives (p,∅).
  – Conducting a computation, which may send any finite number of messages.

Page 31: Reaching Consensus: Why it can’t be done

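Configuration and step

[Figure: processors 1 (X1=1, Y1=b), 2 (X2=0), 3 and 4 with a messages buffer. Across Step 1 and Step 2, the message (2,m1) passes through the buffer and Y2 changes from b to 1.]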

Page 32: Reaching Consensus: Why it can’t be done

Proof – definitions (2/6)

• Event e=(p,m): the receipt of message m by p.
  – Since our processors are deterministic, the change of configuration caused by a step depends only on the received message.
  – The event e=(p,∅) is always possible, for any p.
• e(C): the configuration reached from C by the event e.
• Schedule: a finite or infinite sequence σ of events.
  – σ(C): the final configuration reached by applying a finite σ starting from configuration C.
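
Events and schedules can be made concrete with a short sketch. The two-processor rule `toy_transition` below is a hypothetical illustration and not a consensus protocol; a configuration is represented (as an assumption of this sketch) by a pair of per-processor states `(x, y, sent)` and a sorted tuple of pending `(target, message)` pairs, with processors numbered from 0.

```python
from collections import Counter

NULL = None  # the empty message, written ∅ on the slides

def toy_transition(p, state, m):
    """Hypothetical deterministic rule for two processors (illustration only)."""
    x, y, sent = state
    sends = []
    if not sent:                      # first step: announce own input to the other processor
        sends.append((1 - p, x))
        sent = True
    if m is not NULL and y is None:   # adopt the first value received
        y = m
    return (x, y, sent), sends

def apply_event(config, event, transition=toy_transition):
    """Return e(C) for the event e = (p, m); raise if e is not applicable to C."""
    states, buffer = config
    p, m = event
    pending = Counter(buffer)
    if m is not NULL:
        if pending[(p, m)] == 0:
            raise ValueError(f"message {m!r} is not pending for processor {p}")
        pending[(p, m)] -= 1          # receiving removes the message from the buffer
    new_state, sends = transition(p, states[p], m)
    for target, msg in sends:
        pending[(target, msg)] += 1
    new_states = states[:p] + (new_state,) + states[p + 1:]
    return new_states, tuple(sorted(pending.elements()))

def apply_schedule(config, schedule, transition=toy_transition):
    """Return σ(C) for a finite schedule σ, applying its events in order."""
    for event in schedule:
        config = apply_event(config, event, transition)
    return config

C = (((1, None, False), (0, None, False)), ())
sigma = [(0, NULL), (1, 1)]           # processor 0 takes an empty step, then processor 1 receives 1
result = apply_schedule(C, sigma)
```

Because the transition depends only on the processor's state and the received message, the schedule alone determines the resulting configuration, which is exactly the determinism assumption stated above.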

Page 33: Reaching Consensus: Why it can’t be done

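Event and sequences

[Figure: the same example shown as a schedule: the event (1,∅) followed by the event (2,m1), i.e. σ = ((1,∅),(2,m1)); the message (2,m1) passes through the messages buffer and Y2 changes from b to 1.]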

Page 34: Reaching Consensus: Why it can’t be done

Proof – definitions (3/6)

• Reachable: configuration C is reachable from C’ if there exists a schedule σ such that σ(C’) = C.
• Accessible configuration: configuration C is accessible if there exists an initial configuration C’ such that C is reachable from C’.
• DV(C): the set {v | v ≠ b and ∃p: v = y_p}, that is, the values that have been chosen by some processor.
• A protocol is partially correct if:
  – For every accessible configuration C, |DV(C)| ≤ 1.
  – There exist two accessible configurations C, C’ such that DV(C)={0} and DV(C’)={1}.
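
As a minimal sketch of these two definitions, assume a configuration is summarised only by the tuple of its output values, with `None` standing for b (an assumption made for brevity; the function names are hypothetical). The check below takes an explicitly listed set of accessible configurations rather than computing it.

```python
def decision_values(outputs):
    """DV(C): the set of values already chosen by some processor."""
    return {y for y in outputs if y is not None}

def partially_correct(accessible):
    """Check both conditions of partial correctness on a listed set of configurations."""
    dvs = [decision_values(c) for c in accessible]
    agreement = all(len(dv) <= 1 for dv in dvs)     # never two different decisions
    nontrivial = {0} in dvs and {1} in dvs          # both outcomes actually occur
    return agreement and nontrivial

good = [(None, None), (0, None), (1, 1)]
bad = good + [(0, 1)]                               # a configuration with DV = {0, 1}
```

Here `partially_correct(good)` holds, while adding a configuration in which two processors chose different values makes the check fail.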

Page 35: Reaching Consensus: Why it can’t be done

Partial correctness

[Figure: the example with Y1 changing from b to 0 and Y2 changing from b to 1. The configurations shown have DV(C)={}, DV(C)={0} and DV(C)={0,1}; the last one has |DV(C)| > 1.]

Page 36: Reaching Consensus: Why it can’t be done

Proof – definitions (4/6)

• Nonfaulty: a processor is nonfaulty if it takes an infinite number of steps.
• Faulty: a processor that is not nonfaulty (it stops taking steps after some time).
• Admissible: a run is admissible if it contains at most one faulty processor and the messages buffer is fair (every message sent to a nonfaulty processor is eventually received).
• Deciding: a run is deciding if eventually, for some processor p, y_p ≠ b.
• A protocol P is totally correct in spite of one fault if:
  – P is partially correct.
  – Every admissible run of P is a deciding run.

Page 37: Reaching Consensus: Why it can’t be done

Main Theorem

• No consensus protocol is totally correct in spite of one fault.
• We assume the contrary: suppose protocol P’ is totally correct in spite of one fault.

Page 38: Reaching Consensus: Why it can’t be done

Lemma 1

• For any configuration C and any two disjoint finite schedules σ1, σ2 that are both applicable to C: σ1(σ2(C)) = σ2(σ1(C)).
  – Disjoint: involving different processors.
• Proof:
  – From the system definition, since σ1 and σ2 don’t interact.
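
Lemma 1 can be checked on a small example. The sketch below uses a hypothetical three-processor rule (each processor announces its input once to its ring successor and adopts the first value it receives); this rule is not a consensus protocol and serves only to show that schedules over different processors commute. The names `step` and `run` and the tuple representation are assumptions of the sketch.

```python
from collections import Counter

def step(config, event):
    """Apply the event (p, m) to config = (states, buffer); m=None is the empty message."""
    states, pending = list(config[0]), Counter(config[1])
    p, m = event
    if m is not None:
        assert pending[(p, m)] > 0, "event not applicable"
        pending[(p, m)] -= 1
    x, y, sent = states[p]
    if not sent:                          # toy rule: announce own input once
        pending[((p + 1) % len(states), x)] += 1
    if m is not None and y is None:       # toy rule: adopt the first value received
        y = m
    states[p] = (x, y, True)
    return tuple(states), tuple(sorted(pending.elements()))

def run(config, schedule):
    for event in schedule:
        config = step(config, event)
    return config

# Three processors; message 0 is pending for processor 0 and message 1 for processor 1.
C = (((1, None, False), (0, None, False), (1, None, False)), ((0, 0), (1, 1)))
sigma1 = [(0, None), (0, 0)]   # only processor 0 takes steps
sigma2 = [(1, 1)]              # only processor 1 takes steps

left = run(run(C, sigma1), sigma2)    # sigma2(sigma1(C))
right = run(run(C, sigma2), sigma1)   # sigma1(sigma2(C))
```

Both orders end in the same configuration because each event touches only the state of its own processor and removes only the message it names from the buffer.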

Page 39: Reaching Consensus: Why it can’t be done

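Lemma 1 – visually

[Figure: animation on the four-processor example (processor 1: X1=1, Y1=b; processor 2: X2=0, Y2 changing from b to 1). The messages buffer holds (2,m1), (1,m2), (1,m3), (4,m4) and (4,m5); Sequence 1 is applied first, then Sequence 2.]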

Page 40: Reaching Consensus: Why it can’t be done

Lemma 1 – visually (opposite order)

[Figure: the same animation with the two sequences applied in the opposite order. The messages buffer again holds (2,m1), (1,m2), (1,m3), (4,m4) and (4,m5), and Y2 changes from b to 1.]

Page 41: Reaching Consensus: Why it can’t be done

Lemma 1 – visually

[Figure: the results of the normal order and of the opposite order, side by side.]

Page 42: Reaching Consensus: Why it can’t be done

Proof – definitions (5/6)

• Let FDV(C) be the union of DV(C’) over all C’ reachable from C.
  – If FDV(C) = {0,1}, C is bivalent.
  – If |FDV(C)| = 1, C is univalent.
  – If FDV(C) = {0}, C is 0-valent.
  – If FDV(C) = {1}, C is 1-valent.
  – P’ is totally correct, so FDV(C) ≠ ∅.
• Intuitively, FDV(C) is the set of possible decisions from configuration C.
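
For a protocol with finitely many reachable configurations, FDV(C) can be computed by exhaustive search. The sketch below uses a hypothetical coordinator protocol that is not from the slides: workers 1..N-1 send their inputs to processor 0, which decides the first value it receives and broadcasts the decision. The coordinator's own input is ignored, and the protocol does not tolerate a crash of the coordinator; it only illustrates valency.

```python
from collections import Counter, deque

def apply(config, event):
    """Apply the event (p, m); m=None is the empty message."""
    states, pending = list(config[0]), Counter(config[1])
    p, m = event
    if m is not None:
        pending[(p, m)] -= 1                     # receive: remove one copy from the buffer
    x, y, started = states[p]
    if p == 0:                                   # coordinator
        if m is not None and y is None:
            y = m[1]                             # decide the first value received
            for q in range(1, len(states)):
                pending[(q, ("dec", y))] += 1    # broadcast the decision
    else:                                        # worker
        if not started:
            pending[(0, ("val", x))] += 1        # report own input once
        if m is not None:
            y = m[1]                             # adopt the coordinator's decision
    states[p] = (x, y, True)
    return tuple(states), tuple(sorted(pending.elements()))

def fdv(config):
    """FDV(C): union of DV(C') over every C' reachable from C (breadth-first search)."""
    seen, queue, values = {config}, deque([config]), set()
    while queue:
        c = queue.popleft()
        values |= {y for _, y, _ in c[0] if y is not None}
        events = {(p, None) for p in range(len(c[0]))} | set(c[1])
        for e in events:
            nxt = apply(c, e)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return values

def valency(config):
    v = fdv(config)
    return "bivalent" if v == {0, 1} else f"{min(v)}-valent"

def initial(inputs):
    return tuple((x, None, False) for x in inputs), ()
```

With worker inputs 0 and 1 the outcome depends on which report reaches the coordinator first, so `valency(initial((0, 0, 1)))` is `"bivalent"`, whereas `valency(initial((0, 1, 1)))` is `"1-valent"`.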

Page 43: Reaching Consensus: Why it can’t be done

43

Lemma 2

• Lemma: There is a bivalent initial configuration.

Page 44: Reaching Consensus: Why it can’t be done

Lemma 2 – Proof (1/3)

• Assume otherwise: every initial configuration is univalent.
• By partial correctness, P’ has both 0-valent and 1-valent initial configurations.
• Let us call two initial configurations adjacent if they differ only in the input value of a single processor.
• Any two initial configurations can be joined by a chain of adjacent configurations.
• Hence, there are two adjacent initial configurations, one 0-valent and one 1-valent.
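
The chain argument can be sketched as a walk that flips one input at a time. The function name `find_adjacent_pair` is hypothetical, and `valency_of` is an assumed oracle that maps an input vector to 0 or 1 (it exists only under the assumption, made for contradiction, that every initial configuration is univalent).

```python
def find_adjacent_pair(start, end, valency_of):
    """Walk from start to end, changing one input per move, and return the first
    pair of adjacent input vectors whose valencies differ (None if there is none)."""
    current = tuple(start)
    for i, target in enumerate(end):
        if current[i] != target:
            nxt = current[:i] + (target,) + current[i + 1:]
            if valency_of(current) != valency_of(nxt):
                return current, nxt
            current = nxt
    return None

# Illustrative oracle (an assumption): the valency equals the fourth input.
pair = find_adjacent_pair((1, 1, 1, 0), (0, 0, 0, 1), lambda xs: xs[3])
```

Since the endpoints have different valencies and each move changes a single input, some move along the chain must change the valency; in this example the pair found is (0, 0, 0, 0) and (0, 0, 0, 1).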

Page 45: Reaching Consensus: Why it can’t be done

Lemma 2 – Proof (2/3)

• Reminder 1: there are two adjacent initial configurations, one 0-valent and one 1-valent.
  – Let us call them C0 and C1 respectively.
• C0 and C1 are adjacent, so there is exactly one processor, p, whose input value differs between them.
• Reminder 2: P’ is totally correct in spite of one fault.
  – So P’ must reach a decision even if one processor fails.

Page 46: Reaching Consensus: Why it can’t be done

Lemma 2 – Proof (3/3)

• Let R be an admissible run from C0 in which p fails without taking any step. By total correctness in spite of one fault, R must be a deciding run. Let σ be the finite schedule after which some processor has decided.
• If 1 ∈ DV(σ(C0)), then 1 ∈ FDV(C0), but C0 is 0-valent. So 1 ∉ DV(σ(C0)), and therefore DV(σ(C0)) = {0}.
• However, since the only difference between C0 and C1 is the input of p, and p takes no steps, σ is also legal from C1, and σ(C0) and σ(C1) are equal except for the state of p (which failed and therefore didn’t decide). So DV(σ(C1)) = DV(σ(C0)) = {0} and 0 ∈ FDV(C1), but C1 is 1-valent. Contradiction.

Page 47: Reaching Consensus: Why it can’t be done

Proof – definitions (6/6)

• For any configuration C and event e=(p,m) such that e(C) is legal, let Rne(C) be the set of all configurations reachable from C without applying e.
  – Note that e can be applied to any C’ ∈ Rne(C).
• Let eR(C) be {e(C’) | C’ ∈ Rne(C)}.
• Let two configurations C, C’ be called neighbors if one is reachable from the other in a single step.
  – Equivalently, there exists an event e’ such that C’=e’(C) or C=e’(C’).

Page 48: Reaching Consensus: Why it can’t be done

Lemma 3

• If C is bivalent, then for every event e=(p,m) applicable to C, eR(C) contains a bivalent configuration.

Page 49: Reaching Consensus: Why it can’t be done

Lemma 3 – Proof (1/7)

• Assume, for contradiction, that every D ∈ eR(C) is univalent.
• C is bivalent, and therefore for each i ∈ {0,1} there exists an i-valent configuration Ei that is reachable from C. Let σi be a schedule such that Ei = σi(C).
• Let the configuration Fi be:
  – If e ∉ σi: Fi = e(Ei).
  – If e ∈ σi: write σi as σi’’ followed by e followed by σi’, where σi’’ does not contain e; then Fi = e(σi’’(C)).
• In both cases Fi ∈ eR(C), so Fi is univalent, and in fact i-valent,
  – since either Fi is reachable from Ei or vice versa.

Page 50: Reaching Consensus: Why it can’t be done

Lemma 3 – Proof (2/7)

• So eR(C) contains both 0-valent and 1-valent configurations.
• By an easy induction on the length of the schedule leading to Fi (where e(C) is j-valent for j ≠ i), there exist two neighbors C0, C1 ∈ Rne(C) such that Di = e(Ci) is i-valent for i ∈ {0,1}.
• Without loss of generality, assume C1 = e’(C0).

Page 51: Reaching Consensus: Why it can’t be done

“Easy Induction” (in pictures) when e(C) is 0-valent: case A (base)

[Figure: C = C0, with e(C) 0-valent; its neighbor C1, with e(C1) = F1 1-valent.]

Page 52: Reaching Consensus: Why it can’t be done

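“Easy Induction” (in pictures) when e(C) is 0-valent: case B (step)

[Figure: C with e(C) 0-valent; the next configuration on the path towards F1 also has a 0-valent image under e, so the induction continues from it until neighbors C0 (image under e 0-valent) and C1 (image under e 1-valent) are found.]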

Page 53: Reaching Consensus: Why it can’t be done

“Easy Induction” (in pictures) when e(C) is 0-valent: case C (contradiction)

[Figure: C with e(C) 0-valent, and a configuration R on the path towards F1 whose image under e is bivalent.]

e(R) ∈ eR(C) and e(R) is bivalent: contradiction.

Page 54: Reaching Consensus: Why it can’t be done

Lemma 3 – Proof (3/7)

• Reminders:
  – e=(p,m).
  – C0, C1 are neighbors.
  – Di = e(Ci) is i-valent for i ∈ {0,1}.
  – C1 = e’(C0).
  – Lemma 1: if two schedules are disjoint, you can execute them in either order.

Page 55: Reaching Consensus: Why it can’t be done

Lemma 3 – Proof (4/7)

• Let e’=(p’,m’).
  – If p’≠p: the schedules σ=(e), σ’=(e’) are disjoint, so by Lemma 1: D1 = e(e’(C0)) = σ(σ’(C0)) = σ’(σ(C0)) = e’(e(C0)) = e’(D0). But then 1 ∈ FDV(D0), while D0 is 0-valent: contradiction.
  – If p’=p: consider a finite deciding run from C0 in which p takes no steps. Since this mimics a single fault (quasi-failure) of p, and P’ is totally correct in spite of one fault, such a run exists.

Page 56: Reaching Consensus: Why it can’t be done

56

If p’≠p:

From “Impossibility of Distributed Consensus with One Faulty Process” By: Michael J. Fischer, Nancy A. Lynch, Michael S. Paterson

Page 57: Reaching Consensus: Why it can’t be done

Lemma 3 – Proof (5/7)

• A deciding run in which p quasi-fails:
  – Let σ be the corresponding schedule.
  – Let A = σ(C0).
  – A is a deciding configuration, meaning |DV(A)| > 0, and therefore |FDV(A)| = 1 (from the partial correctness of P’).
  – σ’=(e’,e) and σ’’=(e) are disjoint from σ, since σ contains no event of p (p quasi-failed), while σ’ and σ’’ contain only events of p (since p=p’).

Page 58: Reaching Consensus: Why it can’t be done

Lemma 3 – Proof (6/7)

• A deciding run in which p quasi-fails:
  – Let σ be the corresponding schedule.
  – Let A = σ(C0).
  – A is a deciding configuration, meaning A is univalent (from the partial correctness of P’).
  – σ’=(e’,e) and σ’’=(e) are disjoint from σ, since σ contains no event of p (p quasi-failed), while σ’ and σ’’ contain only events of p (since p=p’).

Page 59: Reaching Consensus: Why it can’t be done

Lemma 3 – Proof (7/7)

• From Lemma 1: e(A) = σ’’(σ(C0)) = σ(σ’’(C0)) = σ(e(C0)) = σ(D0), so 0 ∈ FDV(A).
• From Lemma 1: e(e’(A)) = σ’(σ(C0)) = σ(σ’(C0)) = σ(D1), so 1 ∈ FDV(A).
• But now A is bivalent: contradiction!

Page 60: Reaching Consensus: Why it can’t be done

60

If p’=p:

From “Impossibility of Distributed Consensus with One Faulty Process” By: Michael J. Fischer, Nancy A. Lynch, Michael S. Paterson

Page 61: Reaching Consensus: Why it can’t be done

61

If p’=p:

From “Impossibility of Distributed Consensus with One Faulty Process” By: Michael J. Fischer, Nancy A. Lynch, Michael S. Paterson

From Lemma 1


Page 63: Reaching Consensus: Why it can’t be done

63

If p’=p:

From “Impossibility of Distributed Consensus with One Faulty Process” By: Michael J. Fischer, Nancy A. Lynch, Michael S. Paterson

Two configurations that are reachable from A

Page 64: Reaching Consensus: Why it can’t be done

64

If p’=p:

From “Impossibility of Distributed Consensus with One Faulty Process” By: Michael J. Fischer, Nancy A. Lynch, Michael S. Paterson

A is bivalent, but σ is deciding

Page 65: Reaching Consensus: Why it can’t be done

Proof – conclusion (1/4)

• To finish the proof, we now show an execution that never reaches a decision.
• Reminder:
  – A protocol P is totally correct in spite of one fault if:
    • P is partially correct.
    • Every admissible run of P is a deciding run.
  – A run is admissible if it contains at most one faulty processor and the messages buffer is fair.
  – A run is deciding if eventually, for some processor p, y_p ≠ b (and therefore it reaches a univalent configuration).
• We assume that P is partially correct and find an admissible run that is not deciding.

Page 66: Reaching Consensus: Why it can’t be done

Proof – conclusion (2/4)

• First, we define a way to ensure that the run is admissible. Keep a queue of the processors and define stages in the following way:
  – A stage ends when the first processor in the processor queue receives the earliest message sent to it (or no message, if none is pending).
  – At the end of the stage, that processor is removed from the head of the queue and appended to the tail.
• Since each stage ends with a step of the next processor in the queue, receiving the earliest message sent to it, infinitely many stages mean:
  – Every processor takes infinitely many steps.
  – Every message is eventually received.
• Therefore, the run is admissible.
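
The stage discipline can be sketched as a round-robin scheduler. The function `run_stages` and its `pending` argument are hypothetical; the sketch records only the event that ends each stage, whereas in the proof a stage may contain further events before that one, and the steps themselves would also send new messages.

```python
from collections import deque

def run_stages(num_processors, pending, num_stages):
    """Return the stage-ending events (p, m) produced by the processor queue.

    pending maps a processor to a deque of its messages in the order they were sent;
    m is None when the processor has no pending message."""
    queue = deque(range(num_processors))
    trace = []
    for _ in range(num_stages):
        p = queue.popleft()                       # head of the processor queue
        inbox = pending.get(p)
        m = inbox.popleft() if inbox else None    # earliest message, or the empty message
        trace.append((p, m))
        queue.append(p)                           # the processor moves to the tail
    return trace

trace = run_stages(3, {0: deque(["a", "b"]), 2: deque(["c"])}, 6)
```

In the resulting trace every processor appears once in every block of N consecutive stages, and each processor's messages are consumed in sending order, which are the two fairness properties used above.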

Page 67: Reaching Consensus: Why it can’t be done

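The run will be admissible

[Figure: four processors with processor queue 2, 3, 1, 4 and message queues for P1 to P4 holding the messages m1 to m10.]

• The processor at entry j of the processor queue will run after at most j stages (here, 3).
• The message at place j of a message queue will be received after at most N * j stages (here, 4 * 3 = 12).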

Page 68: Reaching Consensus: Why it can’t be done

The run will be admissible 1

[Figure: stage 1. Processor queue: 2, 3, 1, 4. Messages at the heads of the queues of P4, P3, P2, P1: m4, m3, m2, m1; the messages m5 to m10 wait behind them.]

Page 69: Reaching Consensus: Why it can’t be done

The run will be admissible 2

[Figure: stage 2. Processor queue: 3, 1, 4, 2. Messages at the heads of the queues of P4, P3, P2, P1: m4, m3, m10, m1; the messages m5 to m9 wait behind them.]

Page 70: Reaching Consensus: Why it can’t be done

The run will be admissible 3

[Figure: stage 3. Processor queue: 1, 4, 2, 3. Messages at the heads of the queues of P4, P3, P2, P1: m4, m5, m10, m1; the messages m6 to m9 wait behind them.]

Page 71: Reaching Consensus: Why it can’t be done

The run will be admissible 4

[Figure: stage 4. Processor queue: 4, 2, 3, 1. Messages at the heads of the queues of P4, P3, P2, P1: m4, m5, m10, m7; the messages m6, m8 and m9 wait behind them.]

Page 72: Reaching Consensus: Why it can’t be done

The run will be admissible

[Figure: after four stages. Processor queue: 2, 3, 1, 4 again. The messages m5 to m10 remain in the queues of P1 to P4.]

Page 73: Reaching Consensus: Why it can’t be done

Proof – conclusion (3/4)

• We assume that P is partially correct and find an admissible run that is not deciding.
  – Now, let us make sure that it is not deciding:
    1. Start from a bivalent initial configuration C (Lemma 2).
    2. Let e denote the event in which the first processor in the processor queue receives the earliest message in its message queue. There is a bivalent configuration C’ reachable from C by a schedule that ends with e (Lemma 3).
    3. Set C = C’ (the stage ends).
    4. Return to step 2.

Page 74: Reaching Consensus: Why it can’t be done

Proof – conclusion (4/4)

• We assume that P is partially correct and find an admissible run that is not deciding.
  – Since each stage ends in a bivalent configuration, the run is not deciding.
• Therefore, P is not totally correct!

Q.E.D.

Page 75: Reaching Consensus: Why it can’t be done

THE END!
Questions?

Page 76: Reaching Consensus: Why it can’t be done

Chain of adjacent configurations (d=4)

0-valent: X1=1, X2=1, X3=1, X4=0 (all Y=b)
1-valent: X1=0, X2=0, X3=0, X4=1 (all Y=b)

Page 77: Reaching Consensus: Why it can’t be done

Chain of adjacent configurations

0-valent: X1=1, X2=1, X3=1, X4=0 (all Y=b)
1-valent: X1=0, X2=0, X3=0, X4=1 (all Y=b)
?-valent: X1=0, X2=1, X3=1, X4=0 (all Y=b)

Page 78: Reaching Consensus: Why it can’t be done

Chain of adjacent configurations – case 1: 1-valent

0-valent: X1=1, X2=1, X3=1, X4=0 (all Y=b)
1-valent: X1=0, X2=0, X3=0, X4=1 (all Y=b)
1-valent (intermediate): X1=0, X2=1, X3=1, X4=0 (all Y=b)

Page 79: Reaching Consensus: Why it can’t be done

Chain of adjacent configurations – case 2: 0-valent

0-valent: X1=1, X2=1, X3=1, X4=0 (all Y=b)
1-valent: X1=0, X2=0, X3=0, X4=1 (all Y=b)
0-valent (intermediate): X1=0, X2=1, X3=1, X4=0 (all Y=b)

Page 80: Reaching Consensus: Why it can’t be done

Chain of adjacent configurations (d=3)

0-valent: X1=0, X2=1, X3=1, X4=0 (all Y=b)
1-valent: X1=0, X2=0, X3=0, X4=1 (all Y=b)

Page 81: Reaching Consensus: Why it can’t be done

Chain of adjacent configurations (d=3…2…1)

0-valent: X1=0, X2=0, X3=0, X4=0 (all Y=b)
1-valent: X1=0, X2=0, X3=0, X4=1 (all Y=b)

Page 82: Reaching Consensus: Why it can’t be done

Chain of adjacent configurations (d=1)

0-valent: X1=0, X2=0, X3=0, X4=0 (all Y=b)
1-valent: X1=0, X2=0, X3=0, X4=1 (all Y=b)

Page 83: Reaching Consensus: Why it can’t be done

Initially dead processors

• Assume:
  – N processors.
  – At least L processors (a majority of them) are alive.
  – The processors don’t know which of them are alive.
• We want to reach consensus.

Page 84: Reaching Consensus: Why it can’t be done

Two-stage algorithm – stage 1

• In the first stage, we build a distributed directed graph G.
• The graph is built in the following way:
  – Each processor has a corresponding node.
  – Each processor sends its id to every other processor.
  – Each processor waits for messages from L-1 other processors.
  – If a message from processor i is one of the messages received by processor j, the edge (i,j) is added to the graph.
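
Stage 1 can be simulated with a short sketch. The function `build_graph` is hypothetical, L is passed in as a parameter, and the asynchronous arrival order is modelled (as an assumption) by a seeded shuffle of the live senders.

```python
import random

def build_graph(alive, L, seed=0):
    """Return the edge set of G: (i, j) means j received the id of i in stage 1."""
    alive = sorted(alive)
    if len(alive) < L:
        raise ValueError("stage 1 never finishes with fewer than L live processors")
    rng = random.Random(seed)
    edges = set()
    for j in alive:
        senders = [i for i in alive if i != j]
        rng.shuffle(senders)              # arbitrary arrival order of the ids
        for i in senders[:L - 1]:         # j stops listening after L-1 messages
            edges.add((i, j))
    return edges

# Seven processors, of which 1..5 are alive, with L = 4.
G = build_graph([1, 2, 3, 4, 5], L=4)
```

Every live node ends up with exactly L-1 incoming edges, and dead processors never appear in G because they send nothing.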

Page 85: Reaching Consensus: Why it can’t be done

Stage 1 – Example (processor 2's point of view)

[Figure: graph on the nodes 1 to 7.]


Page 88: Reaching Consensus: Why it can’t be done

Stage 1 – Example (global view)

[Figure: graph on the nodes 1 to 7.]

Page 89: Reaching Consensus: Why it can’t be done

Two-stage algorithm – stage 2

• In the second stage, we build a graph G+, the transitive closure of G, so that every processor knows enough of the graph.
• The graph is built in the following way:
  – Each processor sends to all the others its:
    1. id.
    2. Initial value.
    3. L-1 neighbors.
  – Each processor waits until it has received such a message from all its ancestors.

Page 90: Reaching Consensus: Why it can’t be done

Stage 2 – Example (processor 2's point of view)

[Figure: graph on the nodes 1 to 7, with the message (2, x2, [3,4,5]).]

Page 91: Reaching Consensus: Why it can’t be done

Stage 2 – Example (processor 2's point of view)

[Figure: graph on the nodes 1 to 7, with the messages (3, x3, [2,4,5]), (4, x4, [2,3,5]) and (5, x5, [2,4,6]).]

Page 92: Reaching Consensus: Why it can’t be done

Stage 2 – Example (processor 2's point of view)

[Figure: graph on the nodes 1 to 7.]

Page 93: Reaching Consensus: Why it can’t be done

Stage 2 – Example: transitive closure (processor 2's point of view)

[Figure: graph on the nodes 1 to 7, with the message (6, x6, [2,3,5]).]

Page 94: Reaching Consensus: Why it can’t be done

Stage 2 – Example: transitive closure (processor 2's point of view)

[Figure: graph on the nodes 1 to 7.]


Page 97: Reaching Consensus: Why it can’t be done

Stage 2 – Example: transitive closure (processor 7's point of view)

[Figure: graph on the nodes 1 to 7.]

Page 98: Reaching Consensus: Why it can’t be done

Clique in G+ (1/2)

• Claim: G+ contains one, and only one, clique of size L or more that is not fully contained in another clique.
• Proof that it contains at least one, by the following steps:
  – For each k < N, because the in-degree of each node in G is L-1, if G contains a path of size k then:
    • G contains a cycle of size at least L,
    or
    • G contains a path of size k+1.
  – Corollary: if G contains a path of size N, it contains a cycle of size at least L (because option 2 is not possible).
  – Corollary: G contains a cycle of size at least L.
  – Since G+ is the transitive closure of G, if G contains a cycle of size k then G+ contains a clique of size k.

Page 99: Reaching Consensus: Why it can’t be done

Contains at least one clique: path of size L

[Figure: node A1 with L-1 incoming edges.]

Page 100: Reaching Consensus: Why it can’t be done

Contains at least one clique: path of size L

[Figure: the path under construction, with nodes A1 and A2; the incoming edges are annotated “at least L-2” and “at most 1”.]

Page 101: Reaching Consensus: Why it can’t be done

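Contains at least one clique: path of size L

[Figure: the path under construction, with nodes A1, A2 and A3; the incoming edges are annotated with lower bounds (“at least L-3”, “at least L-4”) and upper bounds (“at most 2”, “at most 1”).]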

Page 102: Reaching Consensus: Why it can’t be done

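Contains at least one clique: path of size L

[Figure: the path A1, …, Ai, …, A(L-1), AL. The incoming edges of Ai are annotated “at least L-i” and “at most i-1”; those of the last nodes are annotated “at least 0” and “at most L-2”, “at most L-1”. The result is a path of size L.]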

Page 103: Reaching Consensus: Why it can’t be done

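Contains at least one clique: induction for k≥L

[Figure: a path of size k-(L-1), followed by a path of size L-2, ending at node A; the edges between A and the path of size L-2 are annotated “at most L-2”.]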

Page 104: Reaching Consensus: Why it can’t be done

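Contains at least one clique: induction for k≥L, case 1:

Cycle of size at least L

[Figure: the same path (size k-(L-1), then size L-2, then node A, with “at most L-2”), with an edge between A and the path of size k-(L-1) closing a cycle.]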

Page 105: Reaching Consensus: Why it can’t be done

Contains at least one clique: induction for k≥L, case 2:

Path of size k+1

[Figure: the same path (size k-(L-1), then size L-2, then node A, with “at most L-2”), extended by a new node B.]

Page 106: Reaching Consensus: Why it can’t be done

Contains at least one clique:

[Figure: the path A1, …, Ai, …, Aj, …, AL.]

Page 107: Reaching Consensus: Why it can’t be done

Clique in G+ (2/2)

• Contains at most one such clique:
  – If it contained two, then, since L is a majority of the nodes, some node would be in both cliques.
  – By transitivity, the union of the nodes of both cliques is itself a clique.

Page 108: Reaching Consensus: Why it can’t be done

Contains at most one clique

[Figure: nodes i and j, connected by transitivity.]

Page 109: Reaching Consensus: Why it can’t be done

Two-stage algorithm – finish

• Claim: every living processor knows about the clique.
  – This is because every node in the graph is a descendant of a processor in the clique; therefore all the nodes of the clique are its ancestors, and it waits for them.
• The consensus: let f be any function of the form f: ({0,1} × 2^|V|) → {0,1}, known to all processors (part of their state). Then f(unique clique) is a binary value known to all processors.
• Consensus is reached!
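
The final computation can be sketched as follows, assuming the edge set of G is given. The function names are hypothetical; the clique test used here (k belongs to the clique exactly when k is an ancestor of every ancestor of k) is one way to identify it, and `min` is only an example of the agreed function f.

```python
def ancestors(edges, node):
    """All nodes that have a directed path to `node` in G."""
    found, stack = set(), [node]
    while stack:
        j = stack.pop()
        for i, k in edges:
            if k == j and i not in found:
                found.add(i)
                stack.append(i)
    return found

def initial_clique(edges, node):
    """The clique as seen from `node`: ancestors k that are ancestors of all their own ancestors."""
    anc = ancestors(edges, node)
    return {k for k in anc
            if all(k in ancestors(edges, j) for j in ancestors(edges, k))}

def decide(edges, inputs, node, f=min):
    """Apply the agreed rule f to the initial values of the clique members."""
    return f(inputs[k] for k in initial_clique(edges, node))

# Hypothetical run with L = 3: nodes 0, 1, 2 heard from each other; node 3 heard from 0 and 2.
G = {(1, 0), (2, 0), (0, 1), (2, 1), (0, 2), (1, 2), (0, 3), (2, 3)}
inputs = {0: 1, 1: 1, 2: 0, 3: 1}
decisions = [decide(G, inputs, node) for node in range(4)]
```

In this example every live processor computes the same clique {0, 1, 2} and therefore the same value, including node 3, which is not itself a member of the clique.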

Page 110: Reaching Consensus: Why it can’t be done

THE END!
Questions?