CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS Fall 2011 Prof. Jennifer Welch CSCE 668 Self Stabilization 1


Reference


Self-Stabilization, Shlomi Dolev, MIT Press, 2000. Chapter 2

Slides prepared for the book by Shlomi Dolev are available at

http://www.cs.bgu.ac.il/~dolev/book/slides.html

Self-Stabilization



A powerful form of fault tolerance: starting from an arbitrary system configuration, the algorithm is able to start working properly all on its own.

An arbitrary system configuration can be caused by some transient failure: message loss, corrupted memory, processor failure, loss of synchrony,…

As long as the system is well-behaved for sufficiently long, the algorithm can correct itself.

The paradigm has been applied to both shared-memory and message-passing models.

Definitions



An execution is no longer defined to start with an initial configuration; instead, it can start with an arbitrary configuration.

Depending on the problem to be solved, certain executions are considered legal, forming the set LE.

A configuration C is safe if every admissible execution starting with C is in LE.

An algorithm is self-stabilizing if every admissible execution reaches a safe configuration.

Self-Stabilization Definition


[Figure: an execution starts from an arbitrary configuration, eventually reaches a safe configuration, and from there on is a legal execution.]

Communication Model



A "hybrid" of message passing and shared memory

The communication topology is represented as an undirected graph, not necessarily fully connected.

Processors correspond to vertices. Corresponding to each edge (pi, pj) are two shared read/write registers:

Rij: written by pi and read by pj

Rji: written by pj and read by pi

Communication Model


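[Figure: processors p0, p1, p2, p3 with edges (p0, p1), (p1, p2), (p2, p3), (p1, p3); each edge carries two registers: R01 and R10, R12 and R21, R23 and R32, R13 and R31.]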
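A minimal Python sketch of this register layout, assuming the four-processor example graph from the figure. The dictionary key (i, j) standing for Rij, and the helper names `make_registers` and `in_neighbors`, are illustrative choices, not part of the model itself.

```python
# Example topology from the figure (assumed undirected): p0-p1, p1-p2, p2-p3, p1-p3.
EDGES = [(0, 1), (1, 2), (2, 3), (1, 3)]

def make_registers(edges, initial=0):
    """Return one register per direction of every edge.

    The key (i, j) stands for Rij: written by pi and read by pj.
    """
    regs = {}
    for i, j in edges:
        regs[(i, j)] = initial  # Rij
        regs[(j, i)] = initial  # Rji
    return regs

def in_neighbors(regs, i):
    """Neighbors pj whose registers Rji processor pi reads."""
    return sorted(j for (j, k) in regs if k == i)

regs = make_registers(EDGES)
print(len(regs))              # 8 registers for 4 edges
print(in_neighbors(regs, 1))  # [0, 2, 3]
```

Each undirected edge yields exactly two single-writer registers, which is what makes the model a hybrid: reads and writes look like shared memory, but only neighbors share registers, as in message passing.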

Self-Stabilizing Spanning Tree Definition



Every processor has a variable parent in its local state.

There is a distinguished root processor.

LE consists of all admissible executions in which the parent variables form a spanning tree rooted at root.
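To make the LE condition concrete, here is a hypothetical Python checker (`is_spanning_tree` is an illustrative name) that tests whether one snapshot of the parent variables forms a spanning tree rooted at the root. It assumes the graph is given as an edge list and that the root's parent is recorded as None.

```python
def is_spanning_tree(parent, root, edges):
    """Return True if the parent variables form a spanning tree rooted at root.

    parent maps every processor id to its parent's id (None for the root);
    edges lists the undirected communication edges.
    """
    nodes = {v for e in edges for v in e}
    if set(parent) != nodes or parent.get(root) is not None:
        return False
    links = {frozenset(e) for e in edges}
    for p, q in parent.items():
        if p != root and frozenset((p, q)) not in links:
            return False  # a non-root's parent must be a neighbor
    for start in parent:
        p, seen = start, set()
        while p != root:          # follow parent pointers toward the root
            if p in seen:
                return False      # parent pointers contain a cycle
            seen.add(p)
            p = parent[p]
    return True

EDGES = [(0, 1), (1, 2), (2, 3), (1, 3)]
print(is_spanning_tree({0: None, 1: 0, 2: 1, 3: 1}, 0, EDGES))  # True
print(is_spanning_tree({0: None, 1: 0, 2: 3, 3: 2}, 0, EDGES))  # False (cycle)
```

An execution is in LE only if this check holds for the configurations it passes through; a self-stabilizing algorithm may start from a configuration where it fails.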

SS Spanning Tree Algorithm



Each processor has local variables:

parent, the id of the neighbor that is its parent

dist, its estimated distance to the root

The root sets dist to 0 and copies its state to all its "outgoing" registers.

A non-root processor reads its neighbors' states from its "incoming" registers, adopts as its parent the neighbor with the smallest distance, and sets its own distance to one more than that.

Processors perform these actions repeatedly.




Code for root p0:

    while true do
        parent := ⊥                // the root has no parent
        dist := 0
        for each neighbor pj do
            R0j := 0               // write shared variable
        endfor
    endwhile




Code for non-root pi:

    while true do
        for each neighbor pj do
            neigh-dist[j] := Rji           // read shared variable
        endfor
        dist := 1 + min{neigh-dist[j] : pj is a neighbor}
        foundParent := false
        for each neighbor pj do
            if !foundParent and neigh-dist[j] = dist - 1 then
                parent := j
                foundParent := true
            endif
            Rij := dist                    // write shared variable
        endfor
    endwhile

Note: storage of negative values is not allowed.
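The following Python simulation is a sketch under simplifying assumptions: each processor executes a whole iteration of its while loop as one atomic step (the model above interleaves individual register reads and writes), and a round is modeled as every processor stepping once in random order. It reuses the four-processor example graph with p0 as root; `iterate`, `run`, and `bfs_distances` are hypothetical helper names. Starting from random nonnegative registers and local state, the dist values should settle to the BFS distances from the root.

```python
import random
from collections import deque

EDGES = [(0, 1), (1, 2), (2, 3), (1, 3)]   # example topology (assumed)
ROOT = 0

def adjacency(edges):
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, []).append(b)
        nbrs.setdefault(b, []).append(a)
    return {p: sorted(ns) for p, ns in nbrs.items()}

def iterate(p, nbrs, reg, state):
    """One full iteration of p's while loop, treated as a single atomic step."""
    if p == ROOT:
        state[p] = (None, 0)                        # root: no parent, dist 0
        for j in nbrs[p]:
            reg[(p, j)] = 0                         # write R0j
        return
    neigh_dist = {j: reg[(j, p)] for j in nbrs[p]}  # read Rji
    dist = 1 + min(neigh_dist.values())
    # first neighbor (in id order) whose register holds dist - 1
    parent = next(j for j in nbrs[p] if neigh_dist[j] == dist - 1)
    state[p] = (parent, dist)
    for j in nbrs[p]:
        reg[(p, j)] = dist                          # write Rij

def run(edges, rounds, seed):
    rng = random.Random(seed)
    nbrs = adjacency(edges)
    # Arbitrary starting configuration: random nonnegative registers and state.
    reg = {(a, b): rng.randint(0, 9) for a in nbrs for b in nbrs[a]}
    state = {p: (rng.choice(nbrs[p]), rng.randint(0, 9)) for p in nbrs}
    for _ in range(rounds):
        order = list(nbrs)
        rng.shuffle(order)          # every processor steps once per round
        for p in order:
            iterate(p, nbrs, reg, state)
    return state                    # maps p -> (parent, dist)

def bfs_distances(edges, root):
    nbrs = adjacency(edges)
    dist, queue = {root: 0}, deque([root])
    while queue:
        u = queue.popleft()
        for v in nbrs[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

print(run(EDGES, rounds=10, seed=1))  # {0: (None, 0), 1: (0, 1), 2: (1, 2), 3: (1, 2)}
```

Under the coarse atomicity assumed here the tree settles within a few rounds on this small graph; the round counts in the correctness proof are for the finer model in which each register access is a separate step.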

Output of Spanning Tree Algorithm


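[Figure: example output of the algorithm on an eight-processor graph with a marked root. Numbers are distances (the root has distance 0), red arrows indicate parents, and black edges are non-tree edges.]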

Correctness Proof of SS ST Alg



Definition: Executions are partitioned into asynchronous rounds, which are the shortest segments containing at least one step by each processor.

Definition: Δ is the degree (maximum number of neighbors) of the communication graph.

Definition: D is the diameter of the communication graph.

Lemma: Consider any admissible execution. There exist T1 < T2 < … < TD such that after asynchronous round Tk:

(a) every processor at distance ≤ k from the root has dist equal to its shortest-path distance to the root, and their parent variables form a BFS tree;

(b) every processor at distance > k from the root has dist ≥ k.

Proof: By induction on k.

Basis (k = 1): Let T1 = 5Δ.

Initially all distances are nonnegative. Processors might start with the program counter in the middle of an iteration of the outer while loop; after at most 2Δ rounds, these partial iterations are done.

After the next Δ rounds, all non-root processors have completed the read for-loop at least once and computed dist: all these values are > 0.

After the next Δ rounds, all non-root processors have completed the write for-loop at least once, so every register written by a non-root processor holds a value > 0.

After the next Δ rounds, all non-root processors have completed the read for-loop at least once more and recomputed dist: every neighbor of the root reads 0 from the root and a value > 0 from every other neighbor, so it sets dist to 1 and parent to the root. Every other non-root processor computes dist ≥ 1, which gives (b).

Induction (k > 1): Assume the claim for k - 1 and show it for k. Let Tk = Tk-1 + 2Δ.

Consider the execution just after the end of asynchronous round Tk-1. After the next Δ rounds, all non-root processors have executed the write for-loop at least once (and written their dist values). After the next Δ rounds, all non-root processors have executed the read for-loop at least once.

Suppose pi is at distance d ≤ k from the root. pi has at least one neighbor pj at distance d - 1 ≤ k - 1 from the root, and no neighbor that is closer to the root. By the inductive hypothesis, pj's register holds the correct value d - 1, and all other neighbors of pi have registers with values ≥ d - 1. Thus pi correctly computes dist and parent.

Suppose pi is at distance > k from the root. Every neighbor of pi is at distance ≥ k from the root. By the inductive hypothesis, all their registers have values ≥ k - 1. Thus pi computes dist to be ≥ k.

Since every processor is at most distance D from the root, the previous lemma implies that a correct breadth-first spanning tree has been constructed after TD = 5Δ + 2Δ(D - 1) = (2D + 3)Δ = O(DΔ) asynchronous rounds, no matter what the starting configuration.

Another Classic SS Algorithm



Proposed by Dijkstra; suggested for mutual exclusion.

We will view it as a "token circulation" algorithm.

It uses a stronger model of computation: in one atomic step, a processor can read all its "incoming" registers and write all its "outgoing" registers.

Ring Communication Topology



Procs are arranged in a unidirectional ring.

Only one register is needed for each processor.

[Figure: processors p0, p1, p2, p3 arranged in a ring, each pi with its register Ri.]

p0 writes into R0, p1 reads from R0, etc.

Processor States



Each processor's state consists solely of an integer, ranging from 0 to K - 1 (for a suitable value of K).

In fact, the processor just stores this integer in its register.

Definition of Holding the Token



Proc p0 holds the token if R0 = Rn-1.

Proc pi (other than p0) holds the token if Ri ≠ Ri-1.

Self-Stabilizing Token Circulation Definition



LE consists of all admissible executions in which, in every configuration, exactly one processor holds the token, and every processor holds the token infinitely often.

(Note the resemblance to the mutual exclusion problem.)

Dijkstra's Algorithm



Code for p0:

    while true do
        if R0 = Rn-1 then
            R0 := (R0 + 1) mod K
        endif
    endwhile

Code for pi, i ≠ 0:

    while true do
        if Ri ≠ Ri-1 then
            Ri := Ri-1
        endif
    endwhile

(Each if statement executes atomically.)
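A Python sketch of these rules, assuming the configuration is a list R with R[i] standing for register Ri, and that each call to `step` executes one processor's whole if statement atomically, as the strong model allows. `token_holders`, `step`, and `round_robin` are hypothetical helper names; the round-robin schedule is just one admissible schedule. The example starts from four registers all holding 3 with K = 5.

```python
def token_holders(R):
    """Processors holding the token in configuration R (R[i] is register Ri)."""
    n = len(R)
    holders = [0] if R[0] == R[n - 1] else []                # p0: R0 = Rn-1
    holders += [i for i in range(1, n) if R[i] != R[i - 1]]  # pi: Ri != Ri-1
    return holders

def step(R, i, K):
    """Body of pi's while loop; the whole if statement is one atomic step."""
    if i == 0:
        if R[0] == R[-1]:
            R[0] = (R[0] + 1) % K
    elif R[i] != R[i - 1]:
        R[i] = R[i - 1]

def round_robin(R, K, steps):
    """Run p0, p1, ..., pn-1, p0, ... and record (configuration, holders) after each step."""
    R = list(R)
    history = []
    for t in range(steps):
        step(R, t % len(R), K)
        history.append((tuple(R), token_holders(R)))
    return history

for config, holders in round_robin([3, 3, 3, 3], K=5, steps=8):
    print(config, holders)
```

In this run every recorded configuration has exactly one token holder, and the token visits p1, p2, p3, p0 in turn while the registers go from all 4s to all 0s. From a corrupted configuration such as [3, 0, 4, 4, 1], `token_holders` reports several holders ([1, 2, 4]).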

Analysis of Dijkstra's Algorithm



Lemma: If all registers are equal in a configuration, then the configuration is safe.

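Proof: Suppose, for example, that n = 4, K = 5, and all registers hold 3 (3 3 3 3). Only p0 holds the token, so the only step that changes anything sets R0 to 4; now only p1 holds the token. The value 4 is copied around the ring, with exactly one token holder at each step, until all registers hold 4 (4 4 4 4) and p0 holds the token again. p0 then sets R0 to 0, the 0 circulates in the same way (0 0 0 0), then 1, and so on.

In general, from a configuration in which all registers are equal, exactly one processor holds the token at every point and the token visits every processor in turn forever, so every admissible execution starting there is in LE.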




If execution begins with arbitrary values between 0 and K-1 in the registers, how can we show that eventually all the values will be the same (i.e., reach a safe state)?

It depends on K being large enough. Suppose K = n + 1 (so there are n + 1 different values).

Lemma 1: In every configuration, there is at least one integer in {0,…,K-1} that does not appear in any register. (There are only n registers but n + 1 possible values.)




Lemma 2: In every admissible execution (starting from any configuration), p0 holds the token, and thus changes R0, at least once during every n rounds.

Proof: Suppose in contradiction there is a segment of n rounds in which p0 does not change R0.

Once p1 takes a step in the first round, R1 = R0, and this equality remains true.

Once p2 takes a step in the second round, R2 = R1 = R0, and this equality remains true.

…

Once pn-1 takes a step in the (n-1)-st round, Rn-1 = Rn-2 = … = R0.

So when p0 takes a step in the n-th round, it will change R0.




Theorem: In any admissible execution starting at any configuration C, a safe configuration is reached within O(n²) rounds.

Proof: Let j be a value not in any register in C (one exists by Lemma 1). By Lemma 2, p0 changes R0 (by incrementing it modulo K) at least once every n rounds. Since at most K - 1 = n increments are needed, R0 holds j, in some configuration D, after at most n · n = O(n²) rounds. Since the other processors only copy values, no register holds j between C and D.

After at most n more rounds, the value j propagates around the ring to pn-1. p0 does not change R0 again until Rn-1 = j, so at that point all registers equal j and, by the earlier lemma, the configuration is safe.
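An empirical sanity check of the theorem, not a proof: the Python sketch below starts from random registers with K = n + 1, runs rounds in which every processor takes one atomic step in a random order (one particular fair scheduler; the theorem covers all admissible schedules), and counts rounds until all registers are equal. The bound n(n + 1) used in the check follows from the proof's argument: at most K - 1 = n increments of R0, each within n rounds, plus fewer than n rounds of propagation. The function name `rounds_to_all_equal` is hypothetical.

```python
import random

def step(R, i, K):
    """Dijkstra's rule for processor i, executed atomically."""
    if i == 0:
        if R[0] == R[-1]:
            R[0] = (R[0] + 1) % K
    elif R[i] != R[i - 1]:
        R[i] = R[i - 1]

def rounds_to_all_equal(n, seed):
    """Random start with K = n + 1; each round steps every processor once, in random order.

    Returns the number of (possibly partial) rounds until all registers are equal.
    """
    K = n + 1
    rng = random.Random(seed)
    R = [rng.randrange(K) for _ in range(n)]
    rounds = 0
    while len(set(R)) > 1:
        order = list(range(n))
        rng.shuffle(order)
        for i in order:
            step(R, i, K)
            if len(set(R)) == 1:
                break
        rounds += 1
    return rounds

for n in range(2, 8):
    worst = max(rounds_to_all_equal(n, seed) for seed in range(200))
    print(n, worst, n * (n + 1))   # n, worst observed rounds, bound from the proof
```

Because each simulated round contains a step by every processor, the number of simulated rounds is at most the number of asynchronous rounds, so the theorem's bound applies to the counts printed here.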

What about Reducing K?



It is easy to see that K = n (n different values) suffices: in every configuration, either some value is missing or p0's value is unique.

One can also show that K = n - 1 (n - 1 different values) suffices.

But if K < n - 1 (fewer than n - 1 different values), then there is a counterexample.

If the strong atomicity model is weakened to our familiar read/write atomicity, then K > 2n - 2 suffices.
