DISTRIBUTED OPERATING
SYSTEMS
Sandeep Kumar Poonia
CANONICAL PROBLEMS IN DISTRIBUTED SYSTEMS
Time ordering and clock synchronization
Leader election
Mutual exclusion
Distributed transactions
Deadlock detection
THE IMPORTANCE OF SYNCHRONIZATION
Because the various components of a distributed system must cooperate and exchange information, synchronization is a necessity.
The components must agree on the timing and ordering of events. Imagine a banking system that did not track the timing and ordering of financial transactions; similar chaos would ensue if distributed systems were not synchronized.
Constraints, both implicit and explicit, are therefore enforced to ensure synchronization of components.
CLOCK SYNCHRONIZATION
As in non-distributed systems, the knowledge of “when events occur” is necessary.
However, clock synchronization is often more difficult in distributed systems because there is no ideal time source, and because distributed algorithms must sometimes be used.
Distributed algorithms must overcome:
Scattering of information
Local, rather than global, decision-making
CLOCK SYNCHRONIZATION
Time is unambiguous in centralized systems: the system clock keeps time, and all entities use it.
In distributed systems, each node has its own system clock.
Crystal-based clocks are less accurate (drift of about one part in a million).
Problem: an event that occurred after another may be assigned an earlier time.
LACK OF GLOBAL TIME IN DS
It is impossible to guarantee that physical clocks run at the same frequency.
This lack of global time can cause problems.
Example: UNIX make
Edit output.c at a client; output.o resides at a server (compilation happens at the server).
The client machine's clock may lag behind the server machine's clock, so the freshly edited output.c can appear older than output.o and make skips the recompilation.
LACK OF GLOBAL TIME – EXAMPLE
When each machine has its own clock, an event that occurred after another event may nevertheless be assigned an earlier time.
LOGICAL CLOCKS
For many problems the internal consistency of clocks is what matters; absolute time is less important. Use logical clocks.
Key idea:
Clock synchronization need not be absolute.
If two machines do not interact, there is no need to synchronize them.
More importantly, processes need to agree on the order in which events occur rather than on the exact time at which they occurred.
EVENT ORDERING
Problem: define a total ordering of all events that occur in a system.
Events on a single-processor machine are totally ordered.
In a distributed system:
There is no global clock, and local clocks may be unsynchronized.
Events on different machines cannot be ordered using local times.
Key idea [Lamport]:
Processes exchange messages.
A message must be sent before it is received.
Send/receive events are used to order events (and synchronize clocks).
LOGICAL CLOCKS
Often a computer need not know the exact time, only relative time. This is known as "logical time".
Logical time is based not on absolute timing but on the ordering of events.
Logical clocks can only advance, never move backward.
Non-interacting processes need not share a logical clock.
Computers generally maintain logical time using interrupts to update a software clock. The more interrupts (the more frequently time is updated), the higher the overhead.
LAMPORT’S LOGICAL CLOCK SYNCHRONIZATION ALGORITHM
The most common logical clock synchronization algorithm for distributed systems is Lamport's algorithm. It is used where ordering is important but global time is not required.
It is based on the "happens-before" relation:
Event A "happens-before" Event B (A → B) when all processes involved in a distributed system agree that event A occurred first and B subsequently occurred.
This does NOT mean that Event A actually occurred before Event B in absolute clock time.
LAMPORT’S LOGICAL CLOCK SYNCHRONIZATION
ALGORITHM
A distributed system can use the "happens-before" relation when:
Events A and B are observed by the same process, or by multiple processes sharing a global clock, or
Event A is the sending of a message and Event B is its receipt, since a message cannot be received before it is sent.
If two events do not communicate via messages they are considered concurrent: their order cannot be determined, and it does not matter. Concurrent events can be left unordered.
LAMPORT’S LOGICAL CLOCK SYNCHRONIZATION
ALGORITHM (CONT.)
For the happens-before cases above, clock C(a) < C(b); if a and b are concurrent, C(a) = C(b) is possible.
Two events with equal clock values can only occur on the same system, because every message transfer between two systems takes at least one clock tick.
In Lamport's algorithm, logical clock values for events may be changed, but only by moving the clock forward; time values are never decreased.
An additional refinement is often used: if Events A and B are concurrent with C(a) = C(b), some unique property of the processes associated with the events (typically the process ID) is used as a tiebreaker. This establishes a total ordering of all events.
LAMPORT’S LOGICAL CLOCK
SYNCHRONIZATION ALGORITHM (CONT.)
Lamport's algorithm can thus be used in distributed systems to ensure synchronization:
A logical clock is implemented in each node of the system.
Each node can determine the order in which events have occurred from its own point of view.
The logical clock of one node need not have any relation to real time or to any other node's clock.
EVENT ORDERING USING HB
Goal: define a notion of the time of an event such that:
If A → B then C(A) < C(B)
If A and B are concurrent, then C(A) may be <, =, or > C(B)
Solution:
Each processor i maintains a logical clock LCi.
Whenever an event occurs locally at i: LCi = LCi + 1
When i sends a message to j, it piggybacks LCi.
When j receives a message from i: if LCj < LCi then LCj = LCi + 1, else do nothing.
Claim: this algorithm meets the above goals.
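The clock rules above can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own code; it uses the common variant in which the receive event itself also ticks the clock (matching the detailed t ≥ LCj rule given later in the deck), and all names are illustrative.

```python
class LamportClock:
    """Minimal sketch of a Lamport logical clock."""
    def __init__(self):
        self.time = 0

    def local_event(self):
        # every local event advances the clock by one
        self.time += 1
        return self.time

    def send(self):
        # a send is itself an event; the new value is piggybacked on the message
        self.time += 1
        return self.time

    def receive(self, msg_time):
        # fast-forward past the sender's timestamp, then tick for the receive event
        self.time = max(self.time, msg_time) + 1
        return self.time

# Example: P0 sends at time 2; P1 (clock still at 0) receives and jumps to 3
p0, p1 = LamportClock(), LamportClock()
p0.local_event()
ts = p0.send()        # ts == 2
rcv = p1.receive(ts)  # rcv == 3
```

Note how the receive rule guarantees that a message's receive event is always timestamped later than its send event, which is exactly the property the goal above requires.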
PROCESS EACH WITH ITS OWN CLOCK
• At time 6, Process 0 sends message A to Process 1.
• It arrives at Process 1 at time 16 (the journey took 10 ticks).
• Message B from 1 to 2 takes 16 ticks.
• Message C from 2 to 1 leaves at 60 and arrives at 56: not possible.
• Message D from 1 to 0 leaves at 64 and arrives at 54: not possible.
LAMPORT’S ALGORITHM CORRECTS THE CLOCKS
Uses the "happened-before" relation.
Each message carries its sending time (per the sender's clock).
On arrival, the receiver fast-forwards its clock to one more than the sending time (between every two events, the clock must tick at least once).
PHYSICAL CLOCKS
The instantaneous difference between two computer clocks is known as skew; the tendency of a clock to gain or lose time because it ticks at a slightly wrong rate is known as drift. Clock manufacturers specify a maximum drift rate for their products.
Computer clocks are among the least accurate modern timepieces. Inside every computer is a chip containing a quartz crystal oscillator that keeps time.
Average loss of accuracy: roughly 0.86 seconds per day.
Such skew is unacceptable for distributed systems. Several methods are in use to synchronize physical clocks in distributed systems:
PHYSICAL CLOCKS
Since the 17th century, time has been measured astronomically.
Solar day: the interval between two consecutive transits of the sun.
Solar second: 1/86,400th of a solar day.
PHYSICAL CLOCKS
1948: atomic clocks are invented.
The most accurate clocks are atomic oscillators (one part in 10^13).
The BIH defines TAI (International Atomic Time).
86,400 TAI seconds is now about 3 msec less than a mean solar day.
The BIH solves this by introducing leap seconds whenever the discrepancy between TAI and solar time grows to 800 msec.
The corrected time scale is called Coordinated Universal Time (UTC).
When the BIH announces a leap second, power companies raise their frequency to 61 Hz or 51 Hz for 60 or 50 seconds, to advance all the clocks in their distribution area.
PHYSICAL CLOCKS - UTC
Coordinated Universal Time (UTC) is the international time standard.
UTC is the current term for what was commonly referred to as Greenwich Mean Time (GMT).
Zero hours UTC is midnight in Greenwich, England, which lies on the zero longitudinal meridian.
UTC is based on a 24-hour clock.
PHYSICAL CLOCKS
Most clocks are less accurate (e.g., mechanical watches).
Computers use crystal-based clocks (accurate to about one part in a million), which results in clock drift.
How do you tell time?
Use astronomical metrics (the solar day).
Coordinated Universal Time (UTC): the international standard based on atomic time, with leap seconds added to stay consistent with astronomical time.
UTC is broadcast by radio (satellite and terrestrial); receivers are accurate to 0.1–10 ms.
Machines then need to synchronize with a master or with one another.
CLOCK SYNCHRONIZATION
Each clock has a maximum drift rate ρ: 1 − ρ ≤ dC/dt ≤ 1 + ρ
Two clocks may drift apart by as much as 2ρt in time t.
To limit the mutual drift to δ, resynchronize every δ/(2ρ) seconds.
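The resynchronization interval is simple arithmetic; a tiny Python helper makes the units concrete (the function name and example values are illustrative only):

```python
def resync_interval(delta, rho):
    """Seconds between resynchronizations needed to keep the mutual skew
    of two clocks under delta, given a maximum drift rate rho.
    delta is in seconds; rho is dimensionless (seconds drifted per second)."""
    return delta / (2 * rho)

# e.g. rho = 1e-6 (one part per million, as for crystal clocks above)
# and a skew budget delta of 1 millisecond:
interval = resync_interval(1e-3, 1e-6)  # 500 seconds between resyncs
```

The factor of 2 covers the worst case where the two clocks drift in opposite directions at the maximum rate.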
CRISTIAN'S ALGORITHM
Assuming there is one time server with UTC:
Each node in the distributed system periodically polls the time server.
The current time is estimated as t + (Treq + Treply)/2, where t is the time reported by the server and Treq and Treply are the request and reply transmission delays.
This process is repeated several times and the results are averaged.
The polling machine then adjusts its time accordingly.
Disadvantages:
The delay between the client and the time server must be estimated, and it varies with network load.
The time server is a single point of failure.
CRISTIAN’S ALGORITHM
Synchronize machines to a time server with a UTC receiver.
Machine P requests the time from the server every δ seconds.
On receiving time t from the server, P sets its clock to t + treply, where treply is the time taken to send the reply to P.
(treq + treply)/2 is used as an estimate of treply.
Accuracy can be improved by making a series of measurements.
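Cristian's estimate can be written out in a few lines. This is a sketch with illustrative names: the client only needs its own send/receive timestamps and the server's reported time, and it assumes the reply took half the round trip.

```python
def cristian_estimate(t_request_sent, t_reply_received, server_time):
    """Client-side clock estimate under Cristian's algorithm.
    Assumes the one-way reply delay is half the measured round trip."""
    rtt = t_reply_received - t_request_sent
    return server_time + rtt / 2

# Client asks at local time 100.0 s, gets server_time 505.0 s back at
# local time 100.4 s: the round trip is 0.4 s, so the reply is assumed
# to have taken 0.2 s, and the client's best estimate is 505.2 s.
est = cristian_estimate(100.0, 100.4, 505.0)
```

Repeating the measurement and keeping the fastest round trip (as the "minor problem" slide below suggests) tightens the estimate, since a short RTT bounds the error.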
PROBLEM WITH CRISTIAN’S ALGORITHM
Major problem: time must never run backward. If the sender's clock is fast, CUTC will be smaller than the sender's current value of C.
Minor problem: it takes nonzero time for the time server's reply to arrive, and this delay may be large and vary with network load.
SOLUTION
Major problem: control the clock rate. Suppose the timer is set to generate 100 interrupts/sec, so each interrupt normally adds 10 msec to the time. To slow the clock down, add only 9 msec per interrupt; to advance it, add 11 msec.
Minor problem: measure the delay. Make a series of measurements for accuracy and discard any that exceed a threshold value. The message that came back fastest can be taken to be the most accurate.
BERKELEY ALGORITHM
Used in systems without a UTC receiver.
Keeps clocks synchronized with one another.
One computer is the master; the others are slaves.
The master periodically polls the slaves for their times.
It averages the times and returns the differences (adjustments) to the slaves.
Communication delays are compensated as in Cristian's algorithm.
Failure of the master leads to election of a new master.
BERKELEY ALGORITHM
a) The time daemon asks all the other machines for their clock values
b) The machines answer
c) The time daemon tells everyone how to adjust their clock
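The averaging step above can be sketched as a pure function: given the master's reading and the slaves' replies, it returns the offset each machine (master first) should apply. A sketch with illustrative names; real implementations would also compensate for polling delays.

```python
def berkeley_adjustments(master_time, slave_times):
    """Offsets that move every clock (master first, then each slave)
    to the average of all reported values, as the Berkeley time daemon does."""
    all_times = [master_time] + slave_times
    avg = sum(all_times) / len(all_times)
    # each machine is told the difference it must apply, not the new time,
    # so in-flight propagation delay matters less
    return [avg - t for t in all_times]

# Master reads 180 s; slaves report 170 s and 190 s. The average is 180,
# so the master stays put, one slave advances 10 s, one slows by 10 s.
adj = berkeley_adjustments(180, [170, 190])  # [0.0, 10.0, -10.0]
```

Sending differences rather than absolute times is the detail shown in step (c) of the figure: a slave that is ahead is slowed gradually rather than set backward.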
DECENTRALIZED AVERAGING ALGORITHM
Each machine in the distributed system has a daemon, without UTC.
Periodically, at an agreed-upon fixed time, each machine broadcasts its local time.
Each machine then calculates the correct time by averaging all the results.
NETWORK TIME PROTOCOL (NTP)
Enables clients across the Internet to be synchronized accurately to UTC, overcoming large and variable message delays.
Employs statistical techniques for filtering, based on the past quality of servers and several other measures.
Can survive lengthy losses of connectivity: redundant servers and redundant paths to servers.
Provides protection against malicious interference through authentication techniques.
NETWORK TIME PROTOCOL (NTP) (CONT.)
Uses a hierarchy of servers located across the Internet. Primary servers are directly connected to a UTC time source.
NETWORK TIME PROTOCOL (NTP) (CONT.)
NTP has three modes:
Multicast mode: suitable for user workstations on a LAN; one or more servers periodically multicasts the time to the other machines on the network.
Procedure-call mode: similar to Cristian's algorithm; provides higher accuracy than multicast mode because delays are compensated.
Symmetric mode: pairs of servers exchange pairs of timing messages containing timestamps of recent message events; the most accurate, but also the most expensive, mode.
Although NTP is quite advanced, a drift of 20–35 milliseconds can still remain!
MORE PROBLEMS
Causality
Vector timestamps
Global state and termination detection
Election algorithms
LOGICAL CLOCKS
For many DS algorithms, associating an event with an absolute real time is not essential; we only need to know an unambiguous order of events.
Lamport's timestamps
Vector timestamps
LOGICAL CLOCKS (CONT.)
Synchronization based on "relative time".
"Relative time" may not relate to "real time". Example: UNIX make (is output.c updated after the generation of output.o?)
What is important is that the processes in the distributed system agree on the ordering in which certain events occur.
Such "clocks" are referred to as logical clocks.
EXAMPLE: WHY ORDER MATTERS
Replicated accounts in Jaipur (JP) and Bikaner (BN); current balance: $1,000.
Two updates occur at the same time:
Update 1: add $100 at BN; Update 2: add interest of 1% at JP.
If the replicas apply the updates in different orders, BN ends at $1,111 while JP ends at $1,110. Whoops, inconsistent states!
LAMPORT ALGORITHM
Clock synchronization does not have to be exact.
Synchronization is not needed if there is no interaction between machines.
Synchronization is only needed when machines communicate, i.e., they must only agree on the ordering of interacting events.
LAMPORT’S “HAPPENS-BEFORE” PARTIAL
ORDER
Given two events e and e', e < e' if:
1. Same process: e <i e' for some process Pi
2. Same message: e = send(m) and e' = receive(m) for some message m
3. Transitivity: there is an event e* such that e < e* and e* < e'
CONCURRENT EVENTS
Given two events e and e':
If neither e < e' nor e' < e, then e || e' (e and e' are concurrent).
[Figure: three processes P1, P2, P3 on a real-time axis, with events a–f and messages m1, m2.]
LAMPORT LOGICAL CLOCKS
Substitute synchronized clocks with a global ordering of events: ei < ej ⇒ LC(ei) < LC(ej)
LCi is a local clock containing increasing values; each process i has its own LCi.
LCi is incremented on each event occurrence.
Within the same process i, if ej occurs before ek, then LCi(ej) < LCi(ek).
If es is a send event and er receives that send, then LCi(es) < LCj(er).
LAMPORT ALGORITHM
Each process increments its local clock between any two successive events.
Every message contains a timestamp.
Upon receiving a message, if the received timestamp is ahead, the receiver fast-forwards its clock to one more than the sending time.
LAMPORT ALGORITHM (CONT.)
Timestamps:
Each event is given a timestamp t.
If es is the sending of message m from pi, then t = LCi(es).
When pj receives m, it sets LCj as follows:
If t < LCj, increment LCj by one (the message is regarded as the next event on j).
If t ≥ LCj, set LCj to t + 1.
LAMPORT’S ALGORITHM ANALYSIS (1)
Claim: ei < ej ⇒ LC(ei) < LC(ej)
Proof: by induction on the length of the sequence of events relating ei and ej.
[Figure: processes P1–P3 with events a–g and messages m1, m2, each event labeled with its Lamport clock value.]
LAMPORT'S ALGORITHM ANALYSIS (2)
Does LC(ei) < LC(ej) imply ei < ej?
Claim: if LC(ei) < LC(ej), it is not necessarily true that ei < ej.
[Figure: the same processes and events; an event with a smaller clock value on P3 is concurrent with events of larger clock value on P1.]
TOTAL ORDERING OF EVENTS
Happens-before is only a partial order.
Make the timestamp of an event e of process Pi the pair (LC(e), i).
(a, b) < (c, d) iff a < c, or a = c and b < d.
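This lexicographic comparison falls out for free from tuple ordering; a two-line Python sketch (names illustrative) shows the process ID breaking ties between concurrent events with equal Lamport time:

```python
def total_order_key(lc, pid):
    """Extended timestamp (Lamport clock value, process ID).
    Python compares tuples lexicographically, which is exactly the
    (a, b) < (c, d) rule above."""
    return (lc, pid)

# Two concurrent events both stamped 5, on processes 1 and 2:
assert total_order_key(5, 1) < total_order_key(5, 2)  # pid breaks the tie
# A smaller clock value always wins regardless of pid:
assert total_order_key(4, 9) < total_order_key(5, 1)
```

Because process IDs are unique, no two events ever compare equal, which is what makes the ordering total.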
APPLICATION: TOTALLY-ORDERED MULTICASTING
Each message is timestamped with the sender's logical time and multicast (including to the sender itself).
When a message is received:
It is put into the local queue, ordered according to timestamp.
An acknowledgement is multicast.
A message is delivered to applications only when:
It is at the head of the queue, and
It has been acknowledged by all involved processes.
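The queue-plus-acknowledgement rule above can be sketched as a small class; this is an illustrative model of one process's local queue (class and method names are my own, and real systems would also handle duplicate and lost messages):

```python
import heapq

class TOMQueue:
    """One process's queue in totally-ordered multicast (sketch)."""
    def __init__(self, n):
        self.n = n          # number of participating processes
        self.queue = []     # min-heap ordered by (timestamp, sender id)
        self.acks = {}      # (timestamp, sender id) -> set of acknowledging pids

    def on_multicast(self, ts, sender):
        # every multicast message goes into the local queue, ordered by timestamp
        heapq.heappush(self.queue, (ts, sender))
        self.acks.setdefault((ts, sender), set())

    def on_ack(self, ts, sender, acker):
        self.acks.setdefault((ts, sender), set()).add(acker)

    def deliverable(self):
        # deliver from the head of the queue only while fully acknowledged
        out = []
        while self.queue and len(self.acks[self.queue[0]]) == self.n:
            out.append(heapq.heappop(self.queue))
        return out

# Two processes: message (2,1) is fully acked, but (1,0) still blocks the head
q = TOMQueue(2)
q.on_multicast(2, 1); q.on_multicast(1, 0)
q.on_ack(2, 1, 0); q.on_ack(2, 1, 1)
assert q.deliverable() == []            # head (1,0) not yet acknowledged
q.on_ack(1, 0, 0); q.on_ack(1, 0, 1)
assert q.deliverable() == [(1, 0), (2, 1)]
```

Because every process sorts by the same (timestamp, sender) keys and waits for the same acknowledgements, all queues deliver in the same order, which is the point of the example that follows.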
APPLICATION: TOTALLY-ORDERED MULTICASTING
Update 1 is time-stamped and multicast. Added to local queues.
Update 2 is time-stamped and multicast. Added to local queues.
Acknowledgements for Update 2 sent/received. Update 2 can now be processed.
Acknowledgements for Update 1 sent/received. Update 1 can now be processed.
(Note: all queues are the same, as the timestamps have been used to ensure the “happens-before” relation holds.)
LIMITATION OF LAMPORT’S ALGORITHM
ei < ej ⇒ LC(ei) < LC(ej)
However, LC(ei) < LC(ej) does not imply ei < ej.
For instance, (1,1) < (1,3), but the corresponding events a and e are concurrent.
[Figure: processes P1–P3 with events a–g and messages m1, m2, labeled with (clock, process) pairs: (1,1), (2,1), (3,2), (5,3), (4,2), (1,3), (2,3).]
VECTOR TIMESTAMPS
Pi's clock is a vector VTi[].
VTi[i] = number of events Pi has stamped.
VTi[j] = what Pi thinks the number of events Pj has stamped is (i ≠ j).
VECTOR TIMESTAMPS (CONT.)
Initialization: the vector timestamp of each process is initialized to (0, 0, …, 0).
Local event: when an event occurs on process Pi, VTi[i] ← VTi[i] + 1.
E.g., on process 3: (1,2,1,3) → (1,2,2,3).
Message passing: when Pi sends a message to Pj, the message carries timestamp t[] = VTi[].
When Pj receives the message, it sets VTj[k] to max(VTj[k], t[k]) for k = 1, 2, …, N.
E.g., P2 receives a message with timestamp (3,2,4) while P2's timestamp is (3,4,3); P2 adjusts its timestamp to (3,4,4).
VECTOR TIMESTAMPS (CONT.)
COMPARING VECTORS
VT1 = VT2 iff VT1[i] = VT2[i] for all i
VT1 ≤ VT2 iff VT1[i] ≤ VT2[i] for all i
VT1 < VT2 iff VT1 ≤ VT2 and VT1 ≠ VT2
For instance, (1, 2, 2) < (1, 3, 2).
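These comparison rules translate directly into Python (function names are illustrative); note how two vectors that are ordered in neither direction identify concurrent events:

```python
def vt_leq(a, b):
    """Componentwise a[i] <= b[i] for all i."""
    return all(x <= y for x, y in zip(a, b))

def vt_lt(a, b):
    """Strictly less: componentwise <= and not equal."""
    return vt_leq(a, b) and a != b

def concurrent(a, b):
    """Neither vector dominates the other."""
    return not vt_lt(a, b) and not vt_lt(b, a)

assert vt_lt([1, 2, 2], [1, 3, 2])       # the slide's example
assert not vt_lt([1, 3, 2], [1, 2, 2])
assert concurrent([1, 0, 0], [0, 0, 1])  # incomparable: concurrent events
```

Unlike Lamport clocks, this test is exact: e < e' holds iff e's vector is strictly less, so concurrency is detectable, which is the claim of the next slide.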
VECTOR TIMESTAMP ANALYSIS
Claim: e < e' iff e.VT < e'.VT
[Figure: processes P1–P3 with events a–g and messages m1, m2, labeled with vector timestamps [1,0,0], [2,0,0], [2,1,0], [2,2,0], [2,2,3], [0,0,1], [0,0,2].]
APPLICATION: CAUSALLY-ORDERED MULTICASTING
For causally ordered delivery we also need:
Multicast messages (reliable, but possibly out of order).
Vi[i] is incremented only when sending.
When k receives a message from j with timestamp ts, the message is buffered until:
1: ts[j] = Vk[j] + 1 (this is the next timestamp that k is expecting from j), and
2: ts[i] ≤ Vk[i] for all i ≠ j (k has seen all messages that j had seen when j sent this one).
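The two buffering conditions are a simple predicate over vectors. A sketch with illustrative names, using 0-based process indices:

```python
def causally_deliverable(ts, V, j):
    """May a receiver with vector clock V deliver a message with
    timestamp ts sent by process j (0-based index)?"""
    if ts[j] != V[j] + 1:
        return False  # condition 1: not the next message expected from j
    # condition 2: the receiver has seen everything j had seen at send time
    return all(ts[i] <= V[i] for i in range(len(V)) if i != j)

# Receiver already has V = [1,0,0] (it saw P1's post a);
# the reply r from P3 (index 2) carries ts = [1,0,1]: deliverable.
assert causally_deliverable([1, 0, 1], [1, 0, 0], 2)
# Same reply at a receiver with V = [0,0,0]: P1's post is missing, so buffer.
assert not causally_deliverable([1, 0, 1], [0, 0, 0], 2)
```

The two assertions correspond to the two figures below: in the first scenario the reply arrives after the post and is delivered at once; in the second it arrives first and must wait.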
CAUSALLY-ORDERED MULTICASTING
[Figure: P1 multicasts post a with timestamp [1,0,0]; P3 receives it and multicasts reply r with timestamp [1,0,1]. Message a arrives at P2 before the reply r from P3 does, so both are delivered in causal order.]
CAUSALLY-ORDERED MULTICASTING (CONT.)
[Figure: the same scenario, but message a arrives at P2 after the reply r from P3. The reply (timestamp [1,0,1]) is buffered and not delivered until a (timestamp [1,0,0]) has arrived.]
ORDERED COMMUNICATION
Totally ordered multicast
Use Lamport timestamps
Causally ordered multicast
Use vector timestamps
VECTOR CLOCKS
Each process i maintains a vector Vi:
Vi[i]: number of events that have occurred at i.
Vi[j]: number of events i knows have occurred at process j.
Vector clocks are updated as follows:
Local event: increment Vi[i].
Send a message: piggyback the entire vector V.
Receipt of a message at j from i: Vj[k] = max(Vj[k], Vi[k]); the receiver learns how many events the sender knows occurred at every other process k. Also Vj[i] = Vj[i] + 1.
GLOBAL STATE
Global state of a distributed system
Local state of each process
Messages sent but not received (state of the queues)
Many applications need to know the state of the
system
Failure recovery, distributed deadlock detection
Problem: how can you figure out the state of a
distributed system?
Each process is independent
No global clock or synchronization
Distributed snapshot: a consistent global state
GLOBAL STATE (1)
a) A consistent cut
b) An inconsistent cut
DISTRIBUTED SNAPSHOT ALGORITHM
Assume each process communicates with other processes over unidirectional point-to-point channels (e.g., TCP connections).
Any process can initiate the algorithm:
Checkpoint its local state.
Send a marker on every outgoing channel.
On receiving a marker:
If this is the first marker: checkpoint the local state, send markers on all outgoing channels, and start recording messages arriving on all other incoming channels.
On a subsequent marker on a channel: stop recording messages for that channel.
DISTRIBUTED SNAPSHOT
A process finishes when it has received a marker on each incoming channel and processed them all.
Recorded state: the local state plus the state of all channels; the state is sent to the initiator.
Any process can initiate a snapshot, and multiple snapshots may be in progress at once; each is separate, distinguished by tagging the markers with the initiator's ID (and a sequence number).
SNAPSHOT ALGORITHM EXAMPLE
a) Organization of a process and channels for a distributed
snapshot
SNAPSHOT ALGORITHM EXAMPLE
b) Process Q receives a marker for the first time and records its local state
c) Q records all incoming messages
d) Q receives a marker for its incoming channel and finishes recording the state of the incoming channel
TERMINATION DETECTION
Detecting the end of a distributed computation.
Notation: let the sender of a marker be the predecessor and the receiver the successor.
Two types of markers: Done and Continue.
After finishing its part of the snapshot, process Q sends a Done or a Continue to its predecessor.
Q sends a Done only when:
All of Q's successors have sent a Done, and
Q has not received any message since it checkpointed its local state and received a marker on every incoming channel.
Otherwise it sends a Continue.
The computation has terminated when the initiator receives Done messages from everyone.
DISTRIBUTED SYNCHRONIZATION
Multiple processes in a distributed system may need to share data or access shared data structures: use critical sections with mutual exclusion.
Within a single process with multiple threads, semaphores, locks, and monitors do the job.
How do you do this for multiple processes in a distributed system, where the processes may be running on different machines?
Solution: a lock mechanism for a distributed environment, which can be centralized or distributed.
CENTRALIZED MUTUAL EXCLUSION
Assume processes are numbered.
One process is elected coordinator (e.g., the highest-ID process).
Every process checks with the coordinator before entering the critical section:
To obtain exclusive access: send a request and await the reply.
To release: send a release message.
The coordinator:
On receiving a request: if the lock is available and the queue is empty, send a grant; otherwise, queue the request.
On receiving a release: remove the next request from the queue and send it a grant.
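The coordinator's bookkeeping is just a holder plus a FIFO queue. A minimal sketch (names illustrative; real message passing is elided):

```python
from collections import deque

class Coordinator:
    """Sketch of the centralized mutual-exclusion coordinator."""
    def __init__(self):
        self.holder = None      # pid currently in the critical section
        self.waiting = deque()  # FIFO queue of pending requests

    def request(self, pid):
        """Grant immediately (True) if the lock is free, else queue the pid
        and send no reply, leaving the requester blocked (False)."""
        if self.holder is None:
            self.holder = pid
            return True
        self.waiting.append(pid)
        return False

    def release(self, pid):
        """Pass the lock to the next queued process; returns its pid or None."""
        assert pid == self.holder
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder

c = Coordinator()
assert c.request(1)        # process 1 granted
assert not c.request(2)    # process 2 queued, no reply yet
assert c.release(1) == 2   # on release, the grant passes to process 2
```

The FIFO queue is what makes the scheme fair: grants go out in arrival order, as the properties slide below notes.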
MUTUAL EXCLUSION:
A CENTRALIZED ALGORITHM
a) Process 1 asks the coordinator for permission to enter a critical region. Permission is granted
b) Process 2 then asks permission to enter the same critical region. The coordinator does not reply.
c) When process 1 exits the critical region, it tells the coordinator, which then replies to 2
PROPERTIES
Simulates a centralized lock using blocking calls.
Fair: requests are granted the lock in the order they were received.
Simple: three messages per use of a critical section (request, grant, release).
Shortcomings:
Single point of failure. How do you detect a dead coordinator? A process cannot distinguish "lock in use" from a dead coordinator, since there is no response from the coordinator in either case.
The coordinator can become a performance bottleneck in large distributed systems.
DISTRIBUTED ALGORITHM
[Ricart and Agrawala]: needs 2(n-1) messages.
Based on event ordering and timestamps.
Process k enters the critical section as follows:
Generate a new timestamp TSk = TSk + 1.
Send request(k, TSk) to all other n-1 processes.
Wait until reply(j) is received from every other process.
Enter the critical section.
Upon receiving a request message, process j:
Sends a reply if there is no contention.
If already in the critical section, does not reply; queues the request.
If it also wants to enter, compares TSj with TSk and sends a reply if TSk < TSj, else queues the request.
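Process j's decision on receiving a request boils down to one function. A sketch: the state names ('RELEASED', 'WANTED', 'HELD') are my labels, not the slides', and I add the usual process-ID tiebreak for equal timestamps:

```python
def should_reply(state, my_ts, my_id, req_ts, req_id):
    """Does process (my_id), in the given state, reply immediately to a
    request stamped (req_ts, req_id)? If not, the request is queued."""
    if state == 'RELEASED':
        return True   # no contention: reply at once
    if state == 'HELD':
        return False  # in the critical section: queue until we exit
    # state == 'WANTED': both want the CS; the lower (timestamp, id) wins
    return (req_ts, req_id) < (my_ts, my_id)

assert should_reply('RELEASED', None, 2, 8, 1)       # idle: always reply
assert not should_reply('HELD', 5, 2, 8, 1)          # holding: queue it
assert should_reply('WANTED', 9, 2, 8, 1)            # requester's ts is lower
assert should_reply('WANTED', 8, 2, 8, 1)            # equal ts: lower id wins
```

Deferring the reply (rather than sending a refusal) is what makes the requester block until it is safe to enter, giving the 2(n-1)-message cost per entry.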
A DISTRIBUTED ALGORITHM
a) Two processes want to enter the same critical region at the same moment.
b) Process 0 has the lowest timestamp, so it wins.
c) When process 0 is done, it sends an OK also, so 2 can now enter the critical region.
PROPERTIES
Fully decentralized, but with n points of failure!
All processes are involved in all decisions.
Any overloaded process can become a bottleneck.
ELECTION ALGORITHMS
Many distributed algorithms need one process to act as coordinator; it doesn't matter which process does the job, we just need to pick one.
Election algorithms: techniques to pick a unique coordinator (aka leader election).
Examples: taking over the role of a failed process, picking a master in the Berkeley clock synchronization algorithm.
Two classic election algorithms: the Bully and Ring algorithms.
BULLY ALGORITHM
Each process has a unique numerical ID.
Processes know the IDs and addresses of every other process.
Communication is assumed reliable.
Key idea: elect the process with the highest ID.
A process initiates an election if it has just recovered from failure or if the coordinator has failed.
Three message types: Election, OK, I won.
Several processes can initiate an election simultaneously; the result must be consistent.
O(n²) messages may be required with n processes.
BULLY ALGORITHM DETAILS
Any process P can initiate an election.
P sends Election messages to all processes with higher IDs and awaits OK messages.
If no OK messages arrive, P becomes coordinator and sends I won messages to all processes with lower IDs.
If it receives an OK, it drops out and waits for an I won.
If a process receives an Election message, it returns an OK and starts an election of its own.
If a process receives an I won, it treats the sender as coordinator.
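Under the reliable-communication assumption, the net effect of these rules is deterministic: the highest live ID always ends up coordinator. A sketch of that outcome (the function collapses the message exchange rather than simulating it):

```python
def bully_election(initiator, alive_ids):
    """Outcome of a bully election started by `initiator` among the
    live processes `alive_ids` (which must include the initiator).
    With reliable messages, the highest live ID always wins."""
    higher = [p for p in alive_ids if p > initiator]
    if not higher:
        return initiator  # no OK replies: the initiator wins
    # every higher live process answers OK and starts its own election;
    # the cascade bottoms out at the highest live ID
    return max(higher)

# Processes 1..7, where 7 has crashed; process 4 notices and holds an election
assert bully_election(4, [1, 2, 3, 4, 5, 6]) == 6
# The winner is the same no matter who starts it (the required consistency)
assert bully_election(1, [1, 2, 3, 4, 5, 6]) == 6
```

The example mirrors the figures that follow: 4 starts, 5 and 6 bully it into silence, and 6, hearing no OK from the crashed 7, announces I won.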
BULLY ALGORITHM EXAMPLE
The bully election algorithm
Process 4 holds an election
Processes 5 and 6 respond, telling 4 to stop
Now 5 and 6 each hold an election
BULLY ALGORITHM EXAMPLE
d) Process 6 tells 5 to stop
e) Process 6 wins and tells everyone
LAST CLASS
Vector timestamps
Global state
Distributed Snapshot
Election algorithms
TODAY: STILL MORE CANONICAL
PROBLEMS
Election algorithms
Bully algorithm
Ring algorithm
Distributed synchronization and mutual
exclusion
Distributed transactions
RING-BASED ELECTION
Processes have unique IDs and are arranged in a logical ring; each process knows its neighbors.
The process with the highest ID is elected.
A process begins an election if it has just recovered or the coordinator has failed:
It sends an Election message to the closest downstream node that is alive (successors are polled sequentially until a live node is found).
Each process tags its own ID onto the message.
When the message returns, the initiator picks the node with the highest ID and sends a Coordinator message around the ring.
Multiple elections can be in progress; this wastes network bandwidth but does no harm.
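One circulation of the Election message can be sketched directly: the message visits each live process once, collecting IDs, and the initiator picks the maximum. An illustrative model (dead successors are simply skipped, as the polling rule above describes):

```python
def ring_election(ring, alive, start):
    """Outcome of a ring election.
    ring  : process IDs in ring order
    alive : set of IDs still up (must include `start`)
    start : the initiator's ID
    The Election message goes once around from `start`, skipping dead
    nodes; each live process appends its ID, and the highest ID wins."""
    collected = []
    n = len(ring)
    i = ring.index(start)
    for step in range(n):
        p = ring[(i + step) % n]
        if p in alive:
            collected.append(p)  # live node appends its ID and forwards
    return max(collected)

ring = [3, 6, 0, 5, 2, 7]
assert ring_election(ring, {3, 6, 0, 5, 2}, 5) == 6      # 7 has crashed
assert ring_election(ring, {3, 6, 0, 5, 2, 7}, 2) == 7   # all alive
```

Since every live node is visited exactly once regardless of who initiates, concurrent elections all pick the same winner, which is why the duplicates are harmless.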
A RING ALGORITHM
Election algorithm using a ring.
COMPARISON
Assume n processes and one election in progress.
Bully algorithm:
Worst case: the initiator is the node with the lowest ID, triggering n-2 elections at higher-ranked nodes: O(n²) messages.
Best case: immediate election: n-2 messages.
Ring algorithm:
2(n-1) messages, always.
A TOKEN RING ALGORITHM
a) An unordered group of processes on a network.
b) A logical ring constructed in software.
• Use a token to arbitrate access to the critical section.
• A process must hold the token before entering the CS.
• It passes the token to its neighbor once done, or if not interested.
• Detecting token loss is non-trivial.
COMPARISON
A comparison of three mutual exclusion algorithms:

Algorithm    | Messages per entry/exit | Delay before entry (in message times) | Problems
Centralized  | 3                       | 2                                     | Coordinator crash
Distributed  | 2(n-1)                  | 2(n-1)                                | Crash of any process
Token ring   | 1 to infinity           | 0 to n-1                              | Lost token, process crash
TRANSACTIONS
Transactions provide a higher-level mechanism for atomicity of processing in distributed systems; they have their origins in databases.
Banking example: three accounts A: $100, B: $200, C: $300.
Client 1: transfer $4 from A to B. Client 2: transfer $3 from C to B.
The result can be inconsistent unless certain properties are imposed on the accesses:

Client 1            Client 2
Read A: $100
Write A: $96
                    Read C: $300
                    Write C: $297
Read B: $200
                    Read B: $200
                    Write B: $203
Write B: $204

B ends at $204 instead of $207: Client 2's $3 deposit is lost because both clients read B before either wrote it.
ACID PROPERTIES
Atomic: all or nothing.
Consistent: a transaction takes the system from one consistent state to another.
Isolated: intermediate effects are not visible to other transactions (serializable).
Durable: changes are permanent once the transaction completes (commits).

A serializable schedule for the same example:

Client 1            Client 2
Read A: $100
Write A: $96
Read B: $200
Write B: $204
                    Read C: $300
                    Write C: $297
                    Read B: $204
                    Write B: $207

Now B ends at $207, as if the two transactions had run one after the other.