
A Closer Look at Fault Tolerance

Gadi Taubenfeld, SRDC 2013

Example: Perfect Renaming

[Figure: five processes with identifiers 17, 39, 11, 99, 27 acquire the new names 5, 3, 1, 2, 4, respectively. Subsequent runs with a faulty process are labelled "1-resilient" when every correct process still acquires a new name, and "Not 1-resilient" when some correct process (for example, 27) never acquires one.]

A General Definition (see the paper for details)

For a given function f: ℕ → ℕ, an algorithm is (t,f)-resilient if, in the presence of t′ faults, at most f(t′) participating correct processes may fail to terminate their operations, for 0 ≤ t′ ≤ t.

Not covered in this talk
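The definition can be read as a condition checked run by run. The Python sketch below is only an illustration under simplifying assumptions: each run is summarized as a pair (t′, number of correct participating processes that did not terminate), and the names `is_tf_resilient`, `wait_free` and `partially_wait_free` are hypothetical. Checking a finite list of runs can refute (t,f)-resilience but cannot prove it, since the definition quantifies over all runs. Under this reading, wait-freedom is f(t′) = 0 with t = n-1, the "1-resilient" of the renaming example is t = 1 with f(t′) = 0, and partial wait-freedom is f(t′) = t′.

```python
from typing import Callable, Iterable, Tuple

# A run summarized as (t_prime, stuck): t_prime = number of faulty processes,
# stuck = number of correct participating processes that did not terminate.
Run = Tuple[int, int]


def is_tf_resilient(runs: Iterable[Run], t: int, f: Callable[[int], int]) -> bool:
    """Return False if some run violates (t,f)-resilience, True otherwise.

    Runs with more than t faults impose no constraint.
    """
    for t_prime, stuck in runs:
        if 0 <= t_prime <= t and stuck > f(t_prime):
            return False
    return True


def wait_free(t_prime: int) -> int:
    return 0  # no correct participating process may be left behind


def partially_wait_free(t_prime: int) -> int:
    return t_prime  # with t' faults, up to t' correct processes may be left behind


runs = [(0, 0), (1, 1), (2, 2)]
print(is_tf_resilient(runs, t=5, f=partially_wait_free))  # True
print(is_tf_resilient(runs, t=5, f=wait_free))            # False
```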


Notation


Correct active process

Correct process that has terminated

Faulty process

Wait-freedom [Herlihy 1991]

In the presence of any number of faults, all the correct participating processes must terminate.

[Figure: processes P1 to P6 in runs with 0, 1, 2, 3, 4 and 5 faults.]

Almost-wait-freedom

In the presence of any number of faults, all the correct participating processes, except maybe one, must terminate.

Partially-wait-freedom

In the presence of any number t ≤ n-1 of faults, all the correct participating processes, except maybe t of them, must terminate.

Weakly-wait-freedom

In the presence of any number of faults, if there are two or more correct participating processes, then at least one correct participating process must terminate.
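The four conditions differ only in how many correct participating processes may fail to terminate. A minimal Python sketch, assuming a run is summarized by which of the processes P1, …, Pn are faulty and which correct processes terminated, and that all n processes participate; the `Outcome` type and the predicate names are hypothetical. Each predicate checks a single run, whereas the definitions require the condition to hold in every run.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Outcome:
    """Hypothetical summary of one run of processes P1..Pn (all participating)."""
    n: int
    faulty: FrozenSet[int]      # indices of processes that crashed
    terminated: FrozenSet[int]  # indices of correct processes that terminated

    @property
    def correct(self) -> FrozenSet[int]:
        return frozenset(range(1, self.n + 1)) - self.faulty

    @property
    def blocked(self) -> FrozenSet[int]:
        # Correct participating processes that did not terminate.
        return self.correct - self.terminated


def is_wait_free(o: Outcome) -> bool:
    return not o.blocked                    # every correct process terminates


def is_almost_wait_free(o: Outcome) -> bool:
    return len(o.blocked) <= 1              # except maybe one


def is_partially_wait_free(o: Outcome) -> bool:
    return len(o.blocked) <= len(o.faulty)  # except maybe t, with t faults


def is_weakly_wait_free(o: Outcome) -> bool:
    # With two or more correct processes, at least one of them terminates.
    return len(o.correct) < 2 or bool(o.correct & o.terminated)


run = Outcome(n=6, faulty=frozenset({1, 2}), terminated=frozenset({3, 4}))
print([check(run) for check in (is_wait_free, is_almost_wait_free,
                                is_partially_wait_free, is_weakly_wait_free)])
# [False, False, True, True]
```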

Technical Results

Problem                      Model   Weakly WF   Partially WF   Almost WF   Complexity
Election                     SM/MP
Test&set                     SM
Perfect renaming             SM/MP
Stack                        SM
Swap                         SM
Fetch&add                    SM
Consensus / Set-consensus    SM/MP   ✗           ✗              ✗

SM: shared memory using atomic registers
MP: message passing (send/receive)

Thm: There is no 1-resilient implementation using atomic registers or messages.

Technical Results

Problem                         Model   Weakly WF   Partially WF   Almost WF   Complexity (upper / lower)
Election                        SM                                             log n + 2 / log n + 1
Election                        MP                                             O(n^2)
Test&set                        SM                                             n + 1 / n
Perfect renaming, one-shot      SM                                             O(n log n)
Perfect renaming, one-shot      MP                                             O(n^3)
Perfect renaming, long-lived    SM                                             O(n^2)

Complexity is measured as the number of atomic registers (SM) or the number of messages (MP).

An almost-wait-free symmetric election

Shared registers, all initially 0: turn, done, V[1..log n]

process p program:
  turn := p
  for level = 1 to log n do
    repeat
      if done = 1 then return(0) fi
      if turn ≠ p then
        for j = 1 to level - 1 do if V[j] = p then V[j] := 0 fi od
        return(0)
      fi
    until V[level] = 0
    V[level] := p
    if turn ≠ p then
      for j = 1 to level do if V[j] = p then V[j] := 0 fi od
      return(0)
    fi
  od
  done := 1; return(1)

Inspired by Styer & Peterson PODC 1989
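A direct Python transliteration of the election pseudocode, as a sketch: the shared registers are modelled as fields of one object, the number of levels is taken as ⌈log₂ n⌉ (an assumption for n that is not a power of two), and the class and method names are hypothetical. It is run sequentially here, which only exercises the contention-free path; it does not by itself demonstrate correctness under concurrency or faults. The register count, levels + 2, is where the log n + 2 upper bound in the results table comes from.

```python
import math


class SymmetricElection:
    """Sketch of the almost-wait-free symmetric election.

    Shared registers (all initially 0): turn, done, V[1..levels].
    Process identifiers must be nonzero; they are only compared for equality.
    """

    def __init__(self, n: int) -> None:
        # Assumption: "log n" levels taken as ceil(log2 n), at least 1.
        self.levels = max(1, math.ceil(math.log2(n)))
        self.turn = 0
        self.done = 0
        self.V = [0] * (self.levels + 1)  # V[0] unused, so indices match V[1..levels]

    def register_count(self) -> int:
        return self.levels + 2  # V[1..levels] plus turn and done

    def _release(self, p: int, upto: int) -> None:
        # Clear every V[j], 1 <= j <= upto, that still holds p.
        for j in range(1, upto + 1):
            if self.V[j] == p:
                self.V[j] = 0

    def elect(self, p: int) -> int:
        """Return 1 if p is elected and 0 otherwise."""
        self.turn = p
        for level in range(1, self.levels + 1):
            while True:  # repeat ... until V[level] = 0
                if self.done == 1:
                    return 0
                if self.turn != p:
                    self._release(p, level - 1)
                    return 0
                if self.V[level] == 0:
                    break
            self.V[level] = p
            if self.turn != p:
                self._release(p, level)
                return 0
        self.done = 1
        return 1


election = SymmetricElection(8)
print(election.register_count())                          # 5
print([election.elect(p) for p in (17, 39, 11, 99, 27)])  # [1, 0, 0, 0, 0]
```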

An almost-wait-free symmetric test&set bit

Shared registers, all initially 0: turn, winner, lock[1..n-1]; locked is a local variable.

process p program for test&set:
  if turn ≠ 0 then return(0) fi
  turn := p
  repeat
    for j = 1 to n-1 do if lock[j] = 0 then lock[j] := p fi od
    locked := 1
    for j = 1 to n-1 do if lock[j] ≠ p then locked := 0 fi od
  until turn ≠ p or locked = 1 or winner = 1
  if turn ≠ p or winner = 1 then
    for j = 1 to n-1 do if lock[j] = p then lock[j] := 0 fi od
    return(0)
  fi
  winner := 1; return(1)

process p program for reset:
  winner := 0; turn := 0
  for j = 1 to n-1 do if lock[j] = p then lock[j] := 0 fi od
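The test&set pseudocode can likewise be transliterated into a Python sketch. The class and method names are hypothetical, shared registers are plain fields, and the example runs the operations one after another, so it shows the sequential behaviour (first caller wins, later callers lose until the winner resets) but not the fault-tolerance argument. Counting turn, winner and lock[1..n-1] gives the n + 1 registers listed in the results table.

```python
class SymmetricTestAndSet:
    """Sketch of the almost-wait-free symmetric test&set bit.

    Shared registers (all initially 0): turn, winner, lock[1..n-1].
    `locked` is local to each call. Identifiers must be nonzero.
    """

    def __init__(self, n: int) -> None:
        self.n = n
        self.turn = 0
        self.winner = 0
        self.lock = [0] * n  # lock[0] unused, so indices match lock[1..n-1]

    def register_count(self) -> int:
        return 2 + (self.n - 1)  # turn + winner + lock[1..n-1] = n + 1

    def _release(self, p: int) -> None:
        for j in range(1, self.n):
            if self.lock[j] == p:
                self.lock[j] = 0

    def test_and_set(self, p: int) -> int:
        """Return 1 if p wins the bit, 0 otherwise."""
        if self.turn != 0:
            return 0
        self.turn = p
        while True:  # repeat ... until
            for j in range(1, self.n):
                if self.lock[j] == 0:
                    self.lock[j] = p
            locked = 1
            for j in range(1, self.n):
                if self.lock[j] != p:
                    locked = 0
            if self.turn != p or locked == 1 or self.winner == 1:
                break
        if self.turn != p or self.winner == 1:
            self._release(p)
            return 0
        self.winner = 1
        return 1

    def reset(self, p: int) -> None:
        """Called by the current winner to set the bit back to 0."""
        self.winner = 0
        self.turn = 0
        self._release(p)


bit = SymmetricTestAndSet(5)
print(bit.register_count())   # 6
print(bit.test_and_set(17))   # 1
print(bit.test_and_set(39))   # 0
bit.reset(17)
print(bit.test_and_set(39))   # 1
```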

A trivial almost-wait-free symmetric election

Program for a process with identifier my.id:
  counter := 0
  Send my.id to all the other processes;
  Each time a message is received do
    if my.id < message.val then return(0) else counter := counter + 1 fi
    if counter = n-1 then return(1) fi
  od

Is there a better algorithm?
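The message-passing election can be simulated in a few lines. This Python sketch makes simplifying assumptions that are not on the slide: a faulty process crashes before sending anything, channels are reliable FIFO queues, and all messages are sent before any are processed; the function name `run_election` is hypothetical. It shows why the algorithm is almost-wait-free: if one process crashes, only the process with the largest identifier can wait forever for its (n-1)-th message, while every other correct process receives that larger identifier and returns 0. Each process sends n-1 messages, giving the O(n^2) message count in the results table.

```python
from collections import deque


def run_election(ids, crashed=frozenset()):
    """Simulate one run; return (outcome, messages_sent).

    outcome maps each correct id to 1 (elected), 0 (not elected),
    or None (still waiting, i.e. never terminated).
    """
    n = len(ids)
    inbox = {i: deque() for i in ids}
    messages_sent = 0
    for sender in ids:  # "Send my.id to all the other processes"
        if sender in crashed:
            continue  # assumption: crashed before sending anything
        for receiver in ids:
            if receiver != sender:
                inbox[receiver].append(sender)
                messages_sent += 1
    outcome = {}
    for my_id in ids:
        if my_id in crashed:
            continue
        counter, outcome[my_id] = 0, None
        while inbox[my_id]:  # "Each time a message is received do"
            val = inbox[my_id].popleft()
            if my_id < val:
                outcome[my_id] = 0
                break
            counter += 1
            if counter == n - 1:
                outcome[my_id] = 1
                break
    return outcome, messages_sent


out, sent = run_election([17, 39, 11, 99, 27])
print(out, sent)  # {17: 0, 39: 0, 11: 0, 99: 1, 27: 0} 20
out, _ = run_election([17, 39, 11, 99, 27], crashed={39})
print(out[99])    # None: 99 waits forever for a fourth message
```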

Perfect Renaming: Partially-wait-free, Long-lived

[Figure: a row of five almost-wait-free test&set bits, all initially 0, labelled with the new names 1 2 3 4 5.]

What about almost-wait-free renaming?
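One natural reading of the figure is that a process scans the test&set bits in order and takes as its new name the index of the first bit it wins, and gives the name back by resetting that bit. The Python sketch below follows that reading. Its `TestAndSetBit` is a hypothetical stand-in built from `threading.Lock` and is not fault-tolerant; the slide instead assumes almost-wait-free test&set bits, and the partially-wait-free property claimed on the slide is not reproduced here.

```python
import threading


class TestAndSetBit:
    """Stand-in test&set bit built from a mutex (NOT fault-tolerant)."""

    def __init__(self) -> None:
        self._mutex = threading.Lock()
        self._value = 0

    def test_and_set(self) -> int:
        """Return 1 if the caller won the bit (it was 0), else 0."""
        with self._mutex:
            won = self._value == 0
            self._value = 1
            return 1 if won else 0

    def reset(self) -> None:
        with self._mutex:
            self._value = 0


class LongLivedRenaming:
    """Names 1..n, one test&set bit per name, all initially 0."""

    def __init__(self, n: int) -> None:
        self.bits = [TestAndSetBit() for _ in range(n)]

    def get_name(self) -> int:
        # Scan the bits in order; the first bit won becomes the new name.
        for name, bit in enumerate(self.bits, start=1):
            if bit.test_and_set() == 1:
                return name
        raise RuntimeError("more than n processes hold names at once")

    def release_name(self, name: int) -> None:
        self.bits[name - 1].reset()


renaming = LongLivedRenaming(5)
print([renaming.get_name() for _ in range(5)])  # [1, 2, 3, 4, 5]
renaming.release_name(3)
print(renaming.get_name())                      # 3
```

Because the names are released by resetting bits, the same object can be reused indefinitely, which is what "long-lived" refers to.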

Fetch&add, Swap, Stack: Partially-wait-free

Implemented from test&set + atomic registers:
WF test&set → WF fetch&add, WF swap, WF stack [Afek, Weisberger, Weisman PODC 93]
Almost-WF test&set → Partially-WF fetch&add, Partially-WF swap, Partially-WF stack

What about almost-wait-free?

Open Problems

Improve our results:
  – Computability: Is there an almost-wait-free perfect renaming, stack, swap, f&a, …?
  – Complexity: improve the space/message/time …
Type of faults: crash, omission, Byzantine, …
Time: asynchronous, synchronous, …
Other objects: queue, …
Failure models: uniform, non-uniform
Other models: unbounded concurrency, failure detectors, …

Fault tolerance: What can go wrong?

Processes
Communication links
Messages
Shared memory
Timing failures
Memory reordering


Example: Using Flags

Gadi Taubenfeld 22 ICDCN 2013

x and y: atomic bits, initially 0

Q: Is it possible that both processes read the value 0?

Process A        Process B
write.x(1)       write.y(1)
read.y           read.x


Fact: Many hardware architectures do not support sequential consistency, because their designers consider it too strong.
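The flags question can be answered by brute force. Under sequential consistency every execution is some interleaving of the two programs that preserves each program's order, so the Python sketch below enumerates all of them and collects the pair of values read. The helper names are hypothetical, and the reordering is modelled simply by swapping A's read before its write. Under sequential consistency (0, 0) never appears; once A's read overtakes its write, it does.

```python
def interleavings(a, b):
    """Yield every merge of programs a and b that keeps each program's order."""
    if not a or not b:
        yield a + b
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def outcomes(prog_a, prog_b):
    """Set of (value read by A, value read by B) over all interleavings."""
    results = set()
    for schedule in interleavings(prog_a, prog_b):
        memory = {"x": 0, "y": 0}  # both bits initially 0
        read = {}
        for proc, op, var in schedule:
            if op == "write":
                memory[var] = 1  # every write in this example writes 1
            else:
                read[proc] = memory[var]
        results.add((read["A"], read["B"]))
    return results


A = [("A", "write", "x"), ("A", "read", "y")]
B = [("B", "write", "y"), ("B", "read", "x")]
# Hypothetical reordering: A's read.y takes effect before its write.x(1).
A_reordered = [("A", "read", "y"), ("A", "write", "x")]

print(sorted(outcomes(A, B)))            # [(0, 1), (1, 0), (1, 1)]
print(sorted(outcomes(A_reordered, B)))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```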


Solution: Memory barriers


Example: Using Flags

Assumption: At most one memory reordering is possible.

Process A        Process B
write.x(1)       write.y(1)
write.x(1)       write.y(1)
read.y           read.x
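One way to see why each write is doubled: under the assumption of at most one reordering, a read can overtake only one of the two copies, so at least one write still precedes it. The Python sketch below checks this by enumeration. It assumes a reordering means swapping a single adjacent (write, read) pair on different variables in one process's program, with at most one such swap in the whole system; this model and all helper names are hypothetical.

```python
def interleavings(a, b):
    """Yield every merge of programs a and b that keeps each program's order."""
    if not a or not b:
        yield a + b
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def outcomes(prog_a, prog_b):
    """Set of (value read by A, value read by B) over all interleavings."""
    results = set()
    for schedule in interleavings(prog_a, prog_b):
        memory = {"x": 0, "y": 0}
        read = {}
        for proc, op, var in schedule:
            if op == "write":
                memory[var] = 1
            else:
                read[proc] = memory[var]
        results.add((read["A"], read["B"]))
    return results


def one_reordering_variants(prog):
    """Programs in which one read overtakes the write just before it."""
    variants = []
    for i in range(len(prog) - 1):
        first, second = prog[i], prog[i + 1]
        if first[1] == "write" and second[1] == "read" and first[2] != second[2]:
            variants.append(prog[:i] + [second, first] + prog[i + 2:])
    return variants


def outcomes_with_at_most_one_reordering(prog_a, prog_b):
    """Union of outcomes when at most one reordering happens in the system."""
    results = outcomes(prog_a, prog_b)
    for variant in one_reordering_variants(prog_a):
        results |= outcomes(variant, prog_b)
    for variant in one_reordering_variants(prog_b):
        results |= outcomes(prog_a, variant)
    return results


def program(proc, written, read, copies):
    """`copies` writes of 1 to `written`, followed by one read of `read`."""
    return [(proc, "write", written)] * copies + [(proc, "read", read)]


for copies in (1, 2):
    a, b = program("A", "x", "y", copies), program("B", "y", "x", copies)
    print(copies, (0, 0) in outcomes_with_at_most_one_reordering(a, b))
# 1 True
# 2 False
```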

Question

How can we provide some level of resiliency against memory reordering and reduce the number of memory barriers required?

Memory reordering resiliency: design strategy

X: atomic register
write.x(1)
write.x(2)
write.x(3)
read.x
No reordering: x ∈ {3}
One reordering: x ∈ {2, 3}
Two reorderings: x ∈ {1, 2, 3}

X: 2-atomic register
write.x(1)
write.x(2)
write.x(3)
read.x
No reordering: x ∈ {2, 3}
One reordering: x ∈ {1, 2, 3}

1. Design your algorithm to be correct assuming weak objects (2-atomic registers).
2. Replace the weak objects with strong objects (1-atomic registers).

You get “some” resiliency against memory reordering (i.e., fewer barriers are needed).
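The value sets above follow a simple pattern: a read may return any of the last k + r values in the register's history, where k is the register's atomicity and r the number of reorderings. The Python sketch below encodes that pattern; the function name and the treatment of the initial value (it becomes possible once the read can precede every write) are assumptions, not definitions from the slides. The last assertion is the design strategy in miniature: an atomic register with one reordering allows no more values than a 2-atomic register with none, so an algorithm proven correct for 2-atomic registers tolerates one reordering when run on atomic registers.

```python
def possible_reads(writes, k=1, reorderings=0, initial=0):
    """Values a read issued after `writes` may return.

    Assumed model: a k-atomic register read returns one of the last k values
    written, and each reordering lets the read move before one more write.
    """
    history = [initial] + list(writes)
    window = min(len(history), k + reorderings)
    return set(history[-window:])


w = [1, 2, 3]
print(possible_reads(w))                       # {3}
print(possible_reads(w, reorderings=1))        # {2, 3}
print(possible_reads(w, k=2))                  # {2, 3}
print(possible_reads(w, k=2, reorderings=1))   # {1, 2, 3}
assert possible_reads(w, k=1, reorderings=1) <= possible_reads(w, k=2)
```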


What weak objects?

How much?


The End
