84
Distributed Computing Group The Consensus Problem Roger Wattenhofer a lot of kudos to Maurice Herlihy and Costas Busch for some of their slides

D istributed C omputing G roup

  • Upload
    heller

  • View
    49

  • Download
    0

Embed Size (px)

DESCRIPTION

The Consensus Problem Roger Wattenhofer a lot of kudos to Maurice Herlihy and Costas Busch for some of their slides. D istributed C omputing G roup. Sequential Computation. thread. memory. object. object. Concurrent Computation. threads. memory. object. object. Asynchrony. - PowerPoint PPT Presentation

Citation preview

Page 1: D istributed C omputing  G roup

DistributedComputing

Group

The Consensus Problem

Roger Wattenhofer

a lot of kudos to Maurice Herlihy

and Costas Busch for some of their slides

Page 2: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 2

Sequential Computation

memory

object object

thread

Page 3: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 3

Concurrent Computation

memory

object object

thre

ads

Page 4: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 4

Asynchrony

• Sudden unpredictable delays– Cache misses (short)– Page faults (long)– Scheduling quantum used up (really

long)

Page 5: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 5

Model Summary

• Multiple threads– Sometimes called processes

• Single shared memory• Objects live in memory• Unpredictable asynchronous

delays

Page 6: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 6

Road Map

• We are going to focus on principles– Start with idealized models– Look at a simplistic problem– Emphasize correctness over

pragmatism– “Correctness may be theoretical, but

incorrectness has practical impact”

Page 7: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 7

You may ask yourself …

I’m no theory weenie - why all the theorems and proofs?

Page 8: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 8

Fundamentalism

• Distributed & concurrent systems are hard– Failures– Concurrency

• Easier to go from theory to practice than vice-versa

Page 9: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 9

The Two Generals

Red army winsIf both sides

attack together

Page 10: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 10

Communications

Red armies send messengers across

valley

Page 11: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 11

Communications

Messengers don’t always make

it

Page 12: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 12

Your Mission

Design a protocol to ensure that red armies attack

simultaneously

Page 13: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 13

Real World Generals

Date: Wed, 11 Dec 2002 12:33:58 +0100From: Friedemann Mattern <[email protected]>To: Roger Wattenhofer <[email protected]>Subject: Vorlesung

Sie machen jetzt am Freitag, 08:15 die Vorlesung Verteilte Systeme, wie vereinbart. OK? (Ich bin jedenfalls am Freitag auch gar nicht da.) Ich uebernehme das dann wieder nach den Weihnachtsferien.

Page 14: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 14

Real World Generals

Date: Mi 11.12.2002 12:34From: Roger Wattenhofer <[email protected]>To: Friedemann Mattern <[email protected]>Subject: Re: Vorlesung

OK. Aber ich gehe nur, wenn sie diese Email nochmals bestaetigen... :-)

Gruesse -- Roger Wattenhofer

Page 15: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 15

Real World Generals

Date: Wed, 11 Dec 2002 12:53:37 +0100From: Friedemann Mattern <[email protected]>To: Roger Wattenhofer <[email protected]>Subject: Naechste Runde: Re: Vorlesung ...

Das dachte ich mir fast. Ich bin Praktiker und mache es schlauer: Ich gehe nicht, unabhaengig davon, ob Sie diese email bestaetigen (beziehungsweise rechtzeitig erhalten). (:-)

Page 16: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 16

Real World Generals

Date: Mi 11.12.2002 13:01From: Roger Wattenhofer <[email protected]>To: Friedemann Mattern <[email protected]>Subject: Re: Naechste Runde: Re: Vorlesung ...

Ich glaube, jetzt sind wir so weit, dass ich diese Emails in der Vorlesung auflegen werde...

Page 17: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 17

Real World Generals

Date: Wed, 11 Dec 2002 18:55:08 +0100From: Friedemann Mattern <[email protected]>To: Roger Wattenhofer <[email protected]>Subject: Re: Naechste Runde: Re: Vorlesung ...

Kein Problem. (Hauptsache es kommt raus, dass der Prakiker am Ende der schlauere ist... Und der Theoretiker entweder heute noch auf das allerletzte Ack wartet oder wissend das das ja gar nicht gehen kann alles gleich von vornherein bleiben laesst...(:-))

Page 18: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 18

Theorem

There is no non-trivial protocol that ensures the

red armies attacks simultaneously

Page 19: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 19

Proof Strategy

• Assume a protocol exists• Reason about its properties• Derive a contradiction

Page 20: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 20

Proof

1. Consider the protocol that sends fewest messages

2. It still works if last message lost3. So just don’t send it

– Messengers’ union happy

4. But now we have a shorter protocol!

5. Contradicting #1

Page 21: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 21

Fundamental Limitation

• Need an unbounded number of messages

• Or possible that no attack takes place

Page 22: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 22

You May Find Yourself …

I want a real-time YAFA compliant Two Generals

protocol using UDP datagrams running on our

enterprise-level fiber tachyion network ...

I want a real-time YAFA compliant Two Generals

protocol using UDP datagrams running on our

enterprise-level fiber tachyion network ...

Page 23: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 23

You might say

I want a real-time YAFA compliant Two Generals

protocol using UDP datagrams running on our

enterprise-level fiber tachyion network ...

Yes, Ma’am, right away!Yes, Ma’am, right away!

Page 24: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 24

You might say

I want a real-time dot-net compliant Two Generals

protocol using UDP datagrams running on our

enterprise-level fiber tachyion network ...

Yes, Ma’am, right away!

Advantage:•Buys time to find another job•No one expects software to work anyway

Advantage:•Buys time to find another job•No one expects software to work anyway

Page 25: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 25

You might say

I want a real-time dot-net compliant Two Generals

protocol using UDP datagrams running on our

enterprise-level fiber tachyion network ...

Yes, Ma’am, right away!

Advantage:•Buys time to find another job•No one expects software to work anyway

Disadvantage:•You’re doomed•Without this course, you may not even know you’re doomed

Disadvantage:•You’re doomed•Without this course, you may not even know you’re doomed

Page 26: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 26

You might say

I want a real-time YAFA compliant Two Generals

protocol using UDP datagrams running on our

enterprise-level fiber tachyion network ...

I can’t find a fault-tolerant algorithm, I guess I’m just a

pathetic loser.

I can’t find a fault-tolerant algorithm, I guess I’m just a

pathetic loser.

Page 27: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 27

You might say

I want a real-time YAFA compliant Two Generals

protocol using UDP datagrams running on our

enterprise-level fiber tachyion network ...

I can’t find a fault-tolerant algorithm, I guess I’m just a

pathetic loser.

I can’t find a fault-tolerant algorithm, I guess I’m just a

pathetic loser.

Advantage:•No need to take course

Advantage:•No need to take course

Page 28: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 28

You might say

I want a real-time YAFA compliant Two Generals

protocol using UDP datagrams running on our

enterprise-level fiber tachyion network ...

I can’t find a fault-tolerant algorithm, I guess I’m just a

pathetic loser

I can’t find a fault-tolerant algorithm, I guess I’m just a

pathetic loser

Advantage:•No need to take course

Advantage:•No need to take courseDisadvantage:

•Boss fires you, hires University St. Gallen graduate

Disadvantage:•Boss fires you, hires University St. Gallen graduate

Page 29: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 29

You might say

I want a real-time YAFA compliant Two Generals

protocol using UDP datagrams running on our

enterprise-level fiber tachyion network ...

Using skills honed in course, I can avert certain disaster!

•Rethink problem spec, or•Weaken requirements, or•Build on different platform

Using skills honed in course, I can avert certain disaster!

•Rethink problem spec, or•Weaken requirements, or•Build on different platform

Page 30: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 30

Consensus: Each Thread has a Private Input

32 1921

Page 31: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 31

They Communicate

Page 32: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 32

They Agree on Some Thread’s Input

1919 19

Page 33: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 33

Consensus is important

• With consensus, you can implement anything you can imagine…

• Examples: with consensus you can decide on a leader, implement mutual exclusion, or solve the two generals problem

Page 34: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 34

You gonna learn

• In some models, consensus is possible

• In some other models, it is not

• Goal of this and next lecture: to learn whether for a given model consensus is possible or not … and prove it!

Page 35: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 35

Consensus #1shared memory

• n processors, with n > 1• Processors can atomically read or

write (not both) a shared memory cell

Page 36: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 36

Protocol (Algorithm?)

• There is a designated memory cell c.• Initially c is in a special state “?”

• Processor 1 writes its value v1 into c, then decides on v1.

• A processor j (j not 1) reads c until j reads something else than “?”, and then decides on that.

Page 37: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 37

Unexpected Delay

Swapped outback at

??? ???

Page 38: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 38

Heterogeneous Architectures

??? ???

Pentium

Pentium

286

yawn

(1)

Page 39: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 39

Fault-Tolerance

??? ???

Page 40: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 40

Consensus #2wait-free shared memory

• n processors, with n > 1• Processors can atomically read or

write (not both) a shared memory cell

• Processors might crash (halt)• Wait-free implementation… huh?

Page 41: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 41

Wait-Free Implementation

• Every process (method call) completes in a finite number of steps

• Implies no mutual exclusion• We assume that we have wait-free

atomic registers (that is, reads and writes to same register do not overlap)

Page 42: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 42

A wait-free algorithm…

• There is a cell c, initially c=“?”• Every processor i does the

followingr = Read(c);

if (r == “?”) then Write(c, vi); decide vi;

else decide r;

Page 43: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 43

Is the algorithm correct?

time

cell c32

17

?

??

3217

32!

17!

Page 44: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 44

Theorem:No wait-free consensus

??? ???

Page 45: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 45

Proof Strategy

• Make it simple– n = 2, binary input

• Assume that there is a protocol• Reason about the properties of any

such protocol• Derive a contradiction

Page 46: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 46

Wait-Free Computation

• Either A or B “moves”• Moving means

– Register read– Register write

A moves B moves

Page 47: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 47

The Two-Move TreeInitial state

Final states

Page 48: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 48

Decision Values

1 0 0 1 1 1

Page 49: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 49

Bivalent: Both Possible

1 1 1

bivalent

1 0 0

Page 50: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 50

Univalent: Single Value Possible

1 1 1

univalent

1 0 0

Page 51: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 51

1-valent: Only 1 Possible

0 1 1 1

1-valent

01

Page 52: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 52

0-valent: Only 0 possible

1 1 1

0-valent

1 0 0

Page 53: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 53

Summary

• Wait-free computation is a tree• Bivalent system states

– Outcome not fixed

• Univalent states– Outcome is fixed– Maybe not “known” yet– 1-Valent and 0-Valent states

Page 54: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 54

Claim

Some initial system state is bivalent

(The outcome is not always fixed from the start.)

Page 55: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 55

A 0-Valent Initial State

• All executions lead to decision of 0

0 0

Page 56: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 56

A 0-Valent Initial State

• Solo execution by A also decides 0

0

Page 57: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 57

A 1-Valent Initial State

• All executions lead to decision of 1

1 1

Page 58: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 58

A 1-Valent Initial State

• Solo execution by B also decides 1

1

Page 59: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 59

A Univalent Initial State?

• Can all executions lead to the same decision?

0 1

Page 60: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 60

State is Bivalent

• Solo execution by A must decide 0

• Solo execution by B must decide 1

0 1

Page 61: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 61

0-valent

Critical States

1-valent

critical

Page 62: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 62

Critical States

• Starting from a bivalent initial state

• The protocol can reach a critical state– Otherwise we could stay bivalent

forever– And the protocol is not wait-free

Page 63: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 63

From a Critical State

c

If A goes first, protocol decides 0

If B goes first, protocol decides 1

0-valent 1-valent

Page 64: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 64

Model Dependency

• So far, memory-independent!• True for

– Registers– Message-passing– Carrier pigeons– Any kind of asynchronous

computation

Page 65: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 65

What are the Threads Doing?

• Reads and/or writes• To same/different registers

Page 66: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 66

Possible Interactions

x.read() y.read() x.write() y.write()

x.read() ? ? ? ?

y.read() ? ? ? ?

x.write() ? ? ? ?

y.write() ? ? ? ?

Page 67: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 67

Reading Registers

A runs solo,

decides 0

B reads x

1

0

A runs solo,

decides 1

c

States look the same to

A

Page 68: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 68

Possible Interactions

x.read() y.read() x.write() y.write()

x.read() no no no no

y.read() no no no no

x.write() no no ? ?

y.write() no no ? ?

Page 69: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 69

Writing Distinct Registers

A writes y B writes x

10

c

The song remains the same

A writes yB writes x

Page 70: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 70

Possible Interactions

x.read() y.read() x.write() y.write()

x.read() no no no no

y.read() no no no no

x.write() no no ? no

y.write() no no no ?

Page 71: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 71

Writing Same Registers

States look the same to

A

A writes x B writes x

1A runs solo, decides 1

c

0

A runs solo, decides 0 A writes x

Page 72: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 72

That’s All, Folks!

x.read() y.read() x.write() y.write()

x.read() no no no no

y.read() no no no no

x.write() no no no no

y.write() no no no no

Page 73: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 73

Theorem

• It is impossible to solve consensus using read/write atomic registers– Assume protocol exists– It has a bivalent initial state– Must be able to reach a critical state– Case analysis of interactions

• Reads vs others• Writes vs writes

Page 74: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 74

What Does Consensus have to do with Distributed

Systems?

Page 75: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 75

We want to build a Concurrent FIFO Queue

Page 76: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 76

With Multiple Dequeuers!

Page 77: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 77

A Consensus Protocol

2-element array

FIFO Queue with red and black balls

8

Coveted red ball Dreaded black ball

Page 78: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 78

Protocol: Write Value to Array

0 10

Page 79: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 79

0

Protocol: Take Next Item from Queue

0 18

Page 80: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 80

0 1

Protocol: Take Next Item from Queue

I got the coveted red ball, so I will decide my

value

I got the dreaded black ball, so I will decide the

other’s value from the array8

Page 81: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 81

Why does this Work?

• If one thread gets the red ball• Then the other gets the black ball• Winner can take her own value• Loser can find winner’s value in

array– Because threads write array

before dequeuing from queue

Page 82: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 82

Implication

• We can solve 2-thread consensus using only– A two-dequeuer queue– Atomic registers

Page 83: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 83

Implications

• Assume there exists– A queue implementation from atomic

registers

• Given– A consensus protocol from queue and

registers

• Substitution yields– A wait-free consensus protocol from atomic

registers contradiction

Page 84: D istributed C omputing  G roup

Distributed Computing Group Roger Wattenhofer 84

Corollary

• It is impossible to implement a two-dequeuer wait-free FIFO queue with read/write shared memory.

• This was a proof by reduction; important beyond NP-completeness…