Election Algorithms

Topics

• Issues
• Detecting Failures
• Bully algorithm
• Ring algorithm

Readings

• Van Steen and Tanenbaum: 5.4
• Coulouris: 11.3

Election Algorithms

Remember using Lamport clocks for total order?

Can you think of another way to do this? It turns out that you can use a sequencer.

• All operations go to a sequencer.
• The sequencer assigns a number to each message before the message goes to each replica.

What if the sequencer goes down?
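The sequencer idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the class names `Sequencer` and `Replica` are hypothetical, everything runs in one process, and message delivery is modeled as a plain method call.

```python
from dataclasses import dataclass, field
from itertools import count


class Sequencer:
    """Single process that stamps every operation with the next number."""

    def __init__(self):
        self._counter = count()

    def stamp(self, op):
        return next(self._counter), op


@dataclass
class Replica:
    """Applies operations strictly in sequence-number order."""
    log: list = field(default_factory=list)
    pending: dict = field(default_factory=dict)
    next_seq: int = 0

    def deliver(self, seq, op):
        # Buffer out-of-order arrivals until the gap before them is filled.
        self.pending[seq] = op
        while self.next_seq in self.pending:
            self.log.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


sequencer = Sequencer()
stamped = [sequencer.stamp(op) for op in ["write x", "write y", "read x"]]

a, b = Replica(), Replica()
for seq, op in stamped:            # replica a receives messages in order
    a.deliver(seq, op)
for seq, op in reversed(stamped):  # replica b receives them in reverse
    b.deliver(seq, op)
```

Both replicas end up with the same log even though they received the messages in different orders. The sketch also makes the weakness visible: if the one `Sequencer` object disappears, nothing can be ordered any more.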

Election Algorithms

Many distributed algorithms require a process to act as a coordinator.

The coordinator can be any process that organizes actions of other processes.

A coordinator may fail. How is a new coordinator chosen or elected?

Election Algorithms

Assumptions

• Each process has a unique number to distinguish it from the others.
• One process per machine (which suggests that an IP address can be the unique identifier).
• Processes know each other’s process numbers.
• Processes do not know which ones are currently up and which ones are down.

General Approach

• Locate the process with the highest process number and designate it as the coordinator.
• Election algorithms differ in how they do this.

Issues in Dealing with Coordinator Failure

Detecting Failure
• Any node might detect the failure first.
• Multiple processes might detect the failure at once.

Election
• Must run without coordination.
• Must deal with arbitrary process failures.
• All nodes must agree on when the election is over and who the new coordinator is.

Detecting Failures

Timeouts are used to detect failures:

T = 2 T_trans + T_process

• where T_trans is the maximum transmission delay and T_process is the maximum delay for processing a message.

If a process fails to respond to a message request within T seconds, then an election is initiated.
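As a small worked example of the timeout rule, the following Python sketch computes T and decides whether the coordinator should be suspected. The function names and the millisecond figures are assumptions chosen for illustration, not values from the lecture.

```python
def election_timeout(t_trans, t_process):
    """T = 2 * T_trans + T_process: request out, processing, reply back."""
    return 2 * t_trans + t_process


def coordinator_suspected(sent_at, now, replied, t_trans, t_process):
    """True if no reply has arrived and more than T has elapsed."""
    if replied:
        return False
    return (now - sent_at) > election_timeout(t_trans, t_process)


# Hypothetical bounds, in milliseconds.
T_TRANS_MS = 50
T_PROCESS_MS = 20
timeout_ms = election_timeout(T_TRANS_MS, T_PROCESS_MS)
```

With these assumed bounds, a process that sent a request at time 0 and has heard nothing by 121 ms would start an election, while at 100 ms it would keep waiting.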

Bully Algorithm

When a process, P, notices that the coordinator is no longer responding to requests, it initiates an election:

• P sends an ELECTION message to all processes with higher numbers.
• If no one responds, P wins the election and becomes the coordinator.
• If one of the higher-ups answers, it takes over. P’s job is done.

Bully Algorithm

When a process gets an ELECTION message from one of its lower-numbered colleagues:

• The receiver sends an OK message back to the sender to indicate that it is alive and will take over.
• The receiver holds an election, unless it is already holding one.

Eventually, all processes give up but one, and that one is the new coordinator.

The new coordinator announces its victory by sending all processes a message telling them that starting immediately it is the new coordinator.
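The steps above can be traced with a short single-threaded simulation. This is a sketch, not a networked implementation: the function name `bully_election`, the message tuples, and the setup with processes 0 to 7 where process 7 is the crashed coordinator are all assumptions made for the example, and a missing reply stands in for a timeout.

```python
from collections import deque


def bully_election(initiator, process_ids, alive):
    """Simulate one bully election.

    process_ids: every known process number; alive: the ones currently up.
    Returns (coordinator, log) where log holds (sender, type, receiver).
    """
    log = []
    holding = set()            # processes that have already held an election
    queue = deque([initiator])
    coordinator = None
    while queue:
        p = queue.popleft()
        if p in holding:
            continue           # "unless it is already holding one"
        holding.add(p)
        higher = [q for q in sorted(process_ids) if q > p]
        for q in higher:
            log.append((p, "ELECTION", q))
        responders = [q for q in higher if q in alive]
        for q in responders:
            log.append((q, "OK", p))
        if responders:
            queue.extend(responders)   # each responder takes over
        else:
            coordinator = p            # nobody higher answered: p wins
    for q in sorted(process_ids):
        if q != coordinator:
            log.append((coordinator, "COORDINATOR", q))
    return coordinator, log


processes = list(range(8))      # assumed process numbers 0..7
alive = set(range(7))           # process 7, the old coordinator, is down
winner, log = bully_election(4, processes, alive)
```

Starting from process 4, the log shows 4 contacting 5, 6 and 7, then 5 and 6 each holding their own election, and finally the highest live process announcing itself to everyone.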

Bully Algorithm

If a process that was previously down comes back:

• It holds an election.
• If it happens to be the highest process currently running, it will win the election and take over the coordinator’s job.

The “biggest guy” always wins, hence the name “bully” algorithm.

The Bully Algorithm (Example)

The bully election algorithm:
a) Process 4 holds an election.
b) Processes 5 and 6 respond, telling 4 to stop.
c) Now 5 and 6 each hold an election.

The Bully Algorithm (Example)

d) Process 6 tells 5 to stop.
e) Process 6 wins and tells everyone.

Bully Algorithm Analysis

Best case
• The node with the second-highest identifier detects the failure.
• Total messages = N-2
• One message for each of the other processes, indicating that the process with the second-highest identifier is the new coordinator.

Worst case
• The node with the lowest identifier detects the failure.
• This causes N-1 processes to initiate the election algorithm, each sending messages to processes with higher identifiers.
• Total messages = O(N^2)
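The two cases can be checked with a little arithmetic in Python. The counting model here is an assumption: processes are numbered 1..N, process N is the crashed coordinator, every live higher-numbered process answers with OK, and the winner sends COORDINATOR to the N-2 other live processes. Unlike the N-2 figure above, the best-case count also includes the one unanswered ELECTION message sent to the crashed coordinator.

```python
def best_case_messages(n):
    """Process n-1 detects the failure of process n."""
    election = 1          # n-1 probes the crashed coordinator, no reply
    ok = 0
    coordinator = n - 2   # announcement to the other live processes
    return election, ok, coordinator


def worst_case_messages(n):
    """Process 1 detects the failure; processes 1..n-1 all hold elections."""
    # Process p sends ELECTION to p+1..n, i.e. n - p messages.
    election = sum(n - p for p in range(1, n))
    # Process p gets OK from the live ones among them, p+1..n-1.
    ok = sum(n - 1 - p for p in range(1, n))
    coordinator = n - 2
    return election, ok, coordinator


N = 8
best = best_case_messages(N)
worst = worst_case_messages(N)
```

For N = 8 the worst case gives 28 ELECTION messages, which is N(N-1)/2, plus 21 OK messages and 6 COORDINATOR messages. The quadratic term comes from the ELECTION and OK messages, which is where the O(N^2) bound originates.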

Bully Algorithm Discussion

How many processes are used to detect a coordinator failure?

• As many as you want. You could have all other processes check out the coordinator.

It is impossible for two processes to be elected at the same time.

Ring Algorithm

Use a ring (processes are physically or logically ordered, so that each process knows who its successor is).

Algorithm

When a process notices that the coordinator is not functioning, it:

• Builds an ELECTION message (containing its own process number).
• Sends the message to its successor (if the successor is down, the sender skips over it and goes to the next member along the ring, or the one after that, until a running process is located).
• At each step, the sender adds its own process number to the list in the message.

Ring Algorithm

Algorithm (continued)

When the message gets back to the process that started it all, that process:

• Recognizes the message, because it contains its own process number.
• Changes the message type to COORDINATOR.
• Circulates the message once again to inform everyone else:
  • who the new coordinator is (the list member with the highest number);
  • who the members of the new ring are.
• When the message has circulated once, it is removed.

Even if two ELECTIONs started at once, everyone will pick the same leader, since the node with the highest identifier is picked.
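The ring procedure can be sketched as a single-threaded simulation. The function name `ring_election` and the setup (a ring of processes 0 to 7 with process 7 down) are assumptions for illustration; skipping a dead successor is modeled by simply looking up the next live process.

```python
def ring_election(initiator, ring, alive):
    """Simulate one ring election started by `initiator`.

    ring: process numbers in ring order; alive: the ones currently up.
    Returns (coordinator, members, messages_sent).
    """
    def next_alive(p):
        # Skip over dead successors until a running process is found.
        i = ring.index(p)
        for step in range(1, len(ring) + 1):
            q = ring[(i + step) % len(ring)]
            if q in alive:
                return q
        raise RuntimeError("no running process in the ring")

    members = [initiator]          # the list carried in the ELECTION message
    messages = 1                   # initiator -> its first live successor
    current = next_alive(initiator)
    while current != initiator:
        members.append(current)    # each process adds its own number
        current = next_alive(current)
        messages += 1
    coordinator = max(members)     # list member with the highest number
    messages += len(members)       # COORDINATOR message makes one more lap
    return coordinator, members, messages


ring = list(range(8))              # assumed ring order 0 -> 1 -> ... -> 7 -> 0
alive = set(range(7))              # process 7, the old coordinator, is down
coordinator, members, messages = ring_election(4, ring, alive)
```

Because the winner is simply the maximum of the collected list, an election started by a different process at the same time would collect the same members and choose the same coordinator.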

Ring Algorithm

Initiation:
1. Process 4 sends an ELECTION message to its successor (or next alive process) with its ID.

Ring Algorithm

Initiation:
2. Each process adds its own ID and forwards the ELECTION message.

Ring Algorithm contd…

Leader Election:
3. The message comes back to the initiator; here the initiator is 4.
4. The initiator announces the winner by sending another message around the ring.

Ring Algorithm Analysis

• At best, 2(N-1) messages are passed:
  • one round for the ELECTION message
  • one round for the COORDINATOR message
• Assumes that only a single process starts an election.
• Multiple elections cause an increase in messages, but no real harm is done.

Summary

Synchronization between processes often requires that one process acts as a coordinator.

The coordinator is not fixed.

Election algorithms determine the coordinator.