Chapter 4 Reliable, Atomic and Causal Broadcast Presented By Kiran Simon

Broadcasting: 3 properties are of interest: reliability, consistent ordering and causality preservation.

Reliability property: Requires that a broadcast message be received by all operational nodes.

Consistent ordering property: Requires that different messages sent by different nodes be delivered to all the nodes in the same order.

Causality preservation: Requires that the order in which the messages are delivered is consistent with the causality between the send events of these messages.

These 3 properties bring in 3 different broadcast primitives: 1. Reliable Broadcast, 2. Atomic Broadcast and 3. Causal Broadcast.

Reliable Broadcast supports reliability only.

Atomic Broadcast, in addition to reliability, supports ordering.

Causal Broadcast ensures that the order in which messages are delivered is consistent with the causal ordering of these messages.

Reliable Broadcast using Message Forwarding: Should ensure that all the nodes get the message even if the sender node fails after sending the message to only some of the nodes.

A tree (logical, not physical) is used to make sure that the message reaches all nodes, with the sender as the root.

Assumptions: If a node fails, all other nodes find out about it within a finite time. Each node also has a copy of FAILED (the set of all the nodes known to have failed).

Overview:

Works on the basis of the concept of successors.

Each node gets the message from its predecessor and sends it on to its own successors.

A node i, on receiving a message, sends an acknowledgment to the sender. If node i does not send an acknowledgment and i is in FAILED, then i is considered failed and the sender takes over the responsibility of node i.

If the root fails after sending the message to only some nodes, then some other node which has the message has to finish the task.

All the nodes except the root execute the same protocol.

Each node maintains three sets: sendto, ackfrom and ackto. These hold, respectively, the nodes to which the message must be sent, the nodes from which acknowledgments are expected, and the nodes to which an acknowledgment has to be sent.
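The take-over rule can be sketched roughly as follows. This is only an illustration: the binary-tree layout over node ids 0..n-1, and the helper names children, send_targets and broadcast, are assumptions and are not part of the protocol description. Root failure and the ackfrom/ackto bookkeeping are left out.

```python
def children(i, n):
    """Successors of node i in an assumed logical binary tree (root is 0)."""
    return [c for c in (2 * i + 1, 2 * i + 2) if c < n]


def send_targets(i, n, failed):
    """Nodes that node i must send to: its live successors, plus the
    successors of any failed successor, whose job node i takes over."""
    targets = []
    pending = children(i, n)
    while pending:
        c = pending.pop(0)
        if c in failed:
            pending.extend(children(c, n))  # take over the failed node's role
        else:
            targets.append(c)
    return targets


def broadcast(n, failed, root=0):
    """Return the set of nodes that end up with the message."""
    received = {root}
    frontier = [root]
    while frontier:
        i = frontier.pop(0)
        for t in send_targets(i, n, failed):
            received.add(t)
            frontier.append(t)
    return received
```

With 7 nodes and node 1 failed, the root sends to node 2 and also directly to nodes 3 and 4, so every operational node still gets the message.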

An Approach by Piggybacking Acknowledgments: Uses the Trans protocol for reliable broadcasting. It uses positive and negative acknowledgments on messages which are being broadcast by nodes. The basic idea is to piggyback acknowledgments and negative acknowledgments on a broadcast message.

To support the protocol, each node maintains an ack-list, a nack-list, a received list and a pending retransmission (PR) list.

* Ack-list: message identifiers of the messages for which the node has to send acknowledgments.

* Nack-list: message identifiers of the messages for which the node has to send negative acknowledgments.

* Received list: messages that this node has received or sent recently and which may have to be retransmitted.

* PR-list: message identifiers of messages whose retransmission has been requested by some node.

Whenever a node has to send a new message m: 1. Append ack-list to m. 2. Append nack-list to m. 3. Broadcast m.

If a node doesn’t get a positive acknowledgment of a message for a long time, it adds the message to the PR-list.

On receiving a message, a node saves it in the received list and adds its id to the ack-list. If the id is in the nack-list or in the PR-list, it is deleted from there.
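A minimal sketch of the four lists and the send and receive steps is given below. The class name TransNode and the message layout are hypothetical. Three rules in it are assumptions of this sketch and are not stated on the slide: the ack-list is emptied after each broadcast, an acknowledgment for an unseen message puts that message on the nack-list, and a negative acknowledgment for a held message puts it on the PR-list.

```python
class TransNode:
    """One node's bookkeeping for piggybacked acks (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.ack_list = []    # ids to acknowledge on the next broadcast
        self.nack_list = []   # ids this node knows it has missed
        self.received = {}    # id -> message, kept for retransmission
        self.pr_list = []     # ids whose retransmission was requested

    def make_message(self, msg_id, payload):
        # Append ack-list and nack-list to m; the caller then broadcasts m.
        m = {"id": msg_id, "sender": self.name, "payload": payload,
             "acks": list(self.ack_list), "nacks": list(self.nack_list)}
        self.ack_list.clear()  # assumption: each ack is piggybacked once
        self.received[msg_id] = m
        return m

    def on_receive(self, m):
        self.received[m["id"]] = m
        if m["id"] not in self.ack_list:
            self.ack_list.append(m["id"])
        for lst in (self.nack_list, self.pr_list):
            if m["id"] in lst:
                lst.remove(m["id"])
        # Assumption: an ack for a message we never saw means we missed it.
        for a in m["acks"]:
            if a not in self.received and a not in self.nack_list:
                self.nack_list.append(a)
        # Assumption: a nack for a message we hold requests retransmission.
        for x in m["nacks"]:
            if x in self.received and x not in self.pr_list:
                self.pr_list.append(x)
```

For example, if node R misses message A but receives B (which acknowledges A), R learns about A from the piggybacked acknowledgment and carries a negative acknowledgment for A on its next broadcast.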

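Example: A Ba Cb Dc Ecd Cb Fec. Each uppercase letter is a broadcast message and the lowercase letters after it are the acknowledgments it carries. In this case only c is a negative acknowledgement.

The next example is a message transmission in which missed messages are detected transitively.

Example: A Ba Cb Ecd Cb Fbec Ba Gfb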

Atomic Broadcast: Requires that, in addition to reliability, different messages be delivered at all the nodes in the same order.

In reliable broadcast it was assumed that a message is delivered to the higher layers as soon as it is received. In atomic broadcast it also has to be ensured that the messages are delivered in the correct order.

Extension of the Trans protocol (to satisfy the ordering property): Here the negative and positive acknowledgements are appended to the messages themselves. We define an observable predicate for delivery, OPD(P,A,C), where P is a node and A and C are messages. We denote the sender of a message A by P_A. If OPD(P,A,C) is true, node P is certain that P_C had received and acknowledged the message A at the time it broadcast C.

The predicate is true if and only if, from the sequence of all the messages it has received, P can form a sequence Sm of messages by deleting some of those messages.

Example: a sequence of messages transmitted by 4 different processors.

B1 D1 A1d1 C1d1b1a1 D2a1c1 D1 C2d2d1 B2a1c2

The acknowledgements and negative acknowledgements of these messages can be represented as a graph.

[Figure: acknowledgement graph over the messages A1, B1, C1, D1, D2, C2 and B2]

In the graph, dashed lines are negative acknowledgements and solid lines are positive acknowledgements.

D2 implicitly acknowledges D1 as both come from the same node. Eventually all nodes will have the same global graph as the retransmission involves the same exact original message.

OPD(P,A,C) holds when there is a path from C to A in the graph formed by the messages received by P, and there is no negative acknowledgement edge from C to any node in the path from C to A.
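The graph form of the predicate can be sketched as a path search. The dictionary layout and the function name opd are assumptions made for illustration. The four-message graph in the usage note is a made-up example, not the one from the slide. Implicit acknowledgments between messages of the same sender would have to be added to the "acks" lists by the caller.

```python
def opd(graph, a, c):
    """Check the graph form of OPD for messages a and c.

    graph maps each message id that P has received to a dict with the
    ids it acknowledges ("acks") and negatively acknowledges ("nacks").
    Returns True if some ack path leads from c to a without passing
    through a message that c negatively acknowledges.
    """
    if a not in graph or c not in graph:
        return False
    blocked = set(graph[c]["nacks"])
    if a in blocked:
        return False
    stack, seen = [c], {c}
    while stack:
        m = stack.pop()
        if m == a:
            return True
        for nxt in graph[m]["acks"]:
            if nxt in graph and nxt not in seen and nxt not in blocked:
                seen.add(nxt)
                stack.append(nxt)
    return False
```

In a hypothetical graph where C1 acknowledges B1, B1 acknowledges A1, and D1 acknowledges C1 but negatively acknowledges B1, opd(graph, "A1", "C1") is true while opd(graph, "A1", "D1") is false, because the only path from D1 to A1 passes through B1.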

In the partial order, if C follows A, then C acknowledges the message A and also all the messages that A acknowledges. The partial order for the example sequence can again be drawn as a graph.

[Figure: partial order graph over the messages A1, B1, D1, D2, C1, C2 and B2]

A Centralized Method: In this method, consistent ordering of messages is guaranteed by conceptually funneling each message through a centralized message exchange.

If multiple nodes broadcast, there is no guarantee that the messages reach their destinations in a specific order, so the messages are conceptually sent through a centralized message exchange. To guard against failure of the exchange, the role is rotated between different nodes.

The senders do not actually send the messages through the message exchange. Instead, a sender node transmits the message directly, and on receiving it, nodes save it in a buffer queue. A global sequence number is generated by the token site (one of the nodes in the token list) and transmitted to all nodes as an acknowledgement. The token site is rotated among a set of nodes called the token list.

The protocol has 2 phases: a normal phase and a reformation phase.

- Normal phase: the activities which take place when no failure occurs.

- Reformation phase: entered when some nodes fail.

Normal Phase:

Each node i has the following information.

Mi[j]: the sequence number of the next broadcast message it expects from node j. A missing message is detected when a message arrives with a sequence number higher than expected.

gseqi: the next global sequence number it expects. Missing messages are detected in the same way.

3 activities take place in the normal phase: transmitting, assigning a global sequence number, and committing.

Transmitting: The sender node keeps on transmitting the message until it gets an ACK from the token site.

Assigning a global sequence number: The token site acknowledges messages broadcast by nodes, and each ACK carries a global sequence number seq. A node can process the ACK only if seq = gseq and the corresponding message is in its buffer queue Qb. When the ACK is processed, gseq is incremented.

If seq < gseq, the ACK is a duplicate; if seq > gseq, the node has missed some messages.
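The three cases for an incoming ACK can be written out as a small function. The function name, the returned labels and the representation of Qb as a set of message ids are assumptions made for this sketch.

```python
def process_ack(seq, msg_id, gseq, qb):
    """Classify an ACK that assigns global sequence number seq to msg_id.

    gseq is the next global sequence number this node expects and qb is
    the set of message ids currently in its buffer queue.
    Returns (action, new_gseq).
    """
    if seq < gseq:
        return ("duplicate", gseq)         # already processed earlier
    if seq > gseq:
        return ("missed", gseq)            # some earlier ACKs were missed
    if msg_id not in qb:
        return ("message_missing", gseq)   # ACK is in order, message is not here
    return ("accept", gseq + 1)            # processed, so gseq is incremented
```

A node expecting gseq = 5 accepts an ACK with seq = 5 only if the acknowledged message is in Qb, and then moves on to expecting 6.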

Committing: A message is said to be committed when at least L+1 token sites have successfully received the broadcast message and the token has been successfully transferred L times. The committed messages are delivered by nodes in the order of their global sequence numbers.

Reformation Phase: Entered when a failure is detected. The reformation process redefines the token list. Any site that detects the failure initiates the reformation and is called the originator.

There will be different token lists at different times, so each token list is given a version number. A new token list always has a higher version number than the older one.

There will be only one valid token list.

A newly formed list becomes a valid token list only when it satisfies the majority test and the sequence test.

The majority test requires that a valid list contain a majority of the nodes. This ensures that there is only one valid list at a time.

The sequence test ensures that a site only joins a list with a higher version number than any list it belonged to before.

The protocol also ensures that none of the messages committed under the old list are lost. This is done by the resiliency test.
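The majority and sequence tests can be sketched as two small checks. The function names and the last_version_of mapping (site to the version of the last list it belonged to, with 0 assumed for a site that was never in a list) are hypothetical. The resiliency test is not sketched, since its details are not given here.

```python
def majority_test(new_list, all_nodes):
    """A valid list must contain more than half of all the nodes."""
    return len(set(new_list)) > len(set(all_nodes)) / 2


def sequence_test(new_version, last_version_of):
    """Every joining site must move to a strictly higher version number."""
    return all(new_version > v for v in last_version_of.values())


def is_valid_token_list(new_list, all_nodes, new_version, last_version_of):
    # Assumption: a site that never belonged to a list counts as version 0.
    members = {n: last_version_of.get(n, 0) for n in new_list}
    return (majority_test(new_list, all_nodes)
            and sequence_test(new_version, members))
```

Because two disjoint lists cannot both hold more than half of the nodes, the majority test alone rules out two valid lists at the same time.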

The reformation protocol is a 3 phase protocol.

Phase 1: The originator initiates the formation of a new list.

Phase 2: The new list is formed, consisting of all the nodes which have responded. The majority and resiliency tests are applied to the new list.

Phase 3: The originator generates a new token and passes it to the new token site, which accepts it and starts acknowledging messages, ending the reformation process.

The Three Phase Protocol: Assigns priorities to all the messages; the message with the lowest priority is delivered first.

It must be ensured that no message with a lower priority reaches a node after a message with a higher priority has been delivered. For this, the nodes explicitly agree on the priority of each message.

* When broadcasting, the node assigns a priority to the message.

* Each message also carries a tag, either “deliverable” or “undeliverable”.

Working:

* The sender broadcasts the message.

* Each receiver puts the message in its queue, tags it as undeliverable and assigns it a priority one greater than the highest priority of all the messages in the queue.

* All the receivers send their priority to the sender. The sender sets the highest of these as the global priority and sends that to all the receiving nodes.

* Each receiver changes the priority of the message to the global priority and tags it deliverable. The queue is sorted, and messages are delivered from the lowest priority upward until a message tagged undeliverable is encountered.
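The receiver side of these steps might look like the sketch below. The names Receiver, propose, finalize and choose_global_priority are hypothetical. Two details are assumptions of this sketch and are not taken from the slide: the receiver tracks the highest priority it has ever seen (which is at least the highest priority currently in the queue), and ties between equal priorities are broken by message id.

```python
class Receiver:
    def __init__(self):
        self.queue = {}      # msg_id -> [priority, deliverable?]
        self.highest = 0     # highest priority seen so far (assumption)
        self.delivered = []

    def propose(self, msg_id):
        """Queue the message as undeliverable and return a proposed priority."""
        self.highest += 1
        self.queue[msg_id] = [self.highest, False]
        return self.highest

    def finalize(self, msg_id, global_priority):
        """Adopt the global priority, tag deliverable, deliver what we can."""
        self.queue[msg_id] = [global_priority, True]
        self.highest = max(self.highest, global_priority)
        # Lowest priority first; ties broken by message id (assumption).
        for mid in sorted(self.queue, key=lambda m: (self.queue[m][0], m)):
            if not self.queue[mid][1]:
                break  # stop at the first undeliverable message
            self.delivered.append(mid)
            del self.queue[mid]


def choose_global_priority(proposals):
    """Sender side: the highest proposed priority becomes the global one."""
    return max(proposals)
```

If two receivers get messages a and b in opposite orders, their proposals differ, but after both messages are finalized the two receivers deliver them in the same order.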

Failures: The failure of a receiving node does not cause any problem. If the sender node fails before the message is delivered, then a node that holds the message tagged undeliverable takes over as the sender and the coordinator. A separate garbage collection scheme is also required.

Using Synchronized Clocks: Uses clocks to implement the ordering.

Here only fail-stop failures are taken into account. Other node failures and link failures are not handled (they are assumed not to happen). The network delay is also assumed to be bounded: a message m takes at most delta time to get from node a to node b.

The worst case message delay is D (which depends on delta).

The clocks of 2 nodes differ by at most beta. So a message sent at time t on the sender's clock arrives when the receiver's clock reads at most t + D + beta. With X = D + beta, the time at the receiving node is at most t + X.

Working:

* The sender puts a timestamp t and its node id on the message and sends it to its neighbours. An intermediate node that gets the message sends it on all outgoing links.

* The schedule for the message ends by time t + X.

* The messages are also kept in the history of each node.

* In the forwarding part, if the clock time of the intermediate node is greater than t + X, the schedule is ended (the message is too late and is not forwarded).

* Also, if a message received by a node is already part of its history, the message is discarded.

* The messages are delivered in the order of their timestamps.
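A rough sketch of the receive and deliver rules follows. It reads the rules above as: a message arriving after local time t + X is dropped as late, a message already in the history is dropped as a duplicate, and a stored message becomes deliverable once the local clock reaches t + X. That reading, the function names, and breaking timestamp ties by sender id are all assumptions of this sketch.

```python
def on_receive(msg, local_clock, history, x):
    """msg is (timestamp, sender_id, payload); history maps (t, sender) -> msg.

    Returns "late", "duplicate" or "accept". On "accept" the caller is
    expected to forward the message on all outgoing links.
    """
    t, sender, _payload = msg
    if local_clock > t + x:
        return "late"        # past t + X: drop, do not forward
    if (t, sender) in history:
        return "duplicate"   # already in the history: discard
    history[(t, sender)] = msg
    return "accept"


def deliverable(history, local_clock, x):
    """Messages whose time t + X has been reached, in timestamp order."""
    ready = sorted(k for k in history if k[0] + x <= local_clock)
    return [history[k] for k in ready]
```

Since every node waits until t + X on its own clock before delivering, and X bounds delay plus clock skew, a message with a smaller timestamp cannot turn up after a later one has been delivered, provided the stated assumptions hold.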

A Protocol for CSMA/CD Networks:

For Ethernet and similar networks. The network interface (NI) is responsible for all the MAC layer protocol activities of the CSMA/CD network. While broadcasting, there is a chance that some nodes miss some of the messages; this protocol is used to support reliable broadcast in that situation.

Assumptions: The number of nodes that may miss a broadcast message is less than the total number of nodes. The NI can cause a collision at any time (even while receiving a message) by sending a jamming signal.

Working:

* Each node has a counter. Every message carries a sequence number, which is the current value of the sender's counter. If no collision occurs, the counter is incremented.

* The whole protocol works on proper usage of the sequence number.

* When a node's counter value is the same as the sequence number, the node has no missed messages. If the counter value is less than the sequence number, the node has missed some messages.

* The alive nodes are thus partitioned into 2 groups: nodes whose counter equals the global sequence number, and nodes whose counter is less than the global sequence number.

* A node with an incorrect counter value must stop the current broadcast so that the missed messages can be retransmitted. Its NI does this by sending a jamming signal so that a collision occurs. (This happens while the message is being received, not after it is received.)

* The node then requests a retransmission of all the messages in the range Counter+1 to global sequence number - 1.

* While the retransmission takes place, the counters are not incremented.
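The counter comparison made by the NI can be sketched as below. The class name CsmaNode and the returned labels are hypothetical. Treating a sequence number below the counter as a duplicate is an assumption, since the slide only covers the equal and less-than cases. The retransmission request itself is not modelled.

```python
class CsmaNode:
    def __init__(self, counter=0):
        self.counter = counter

    def on_frame(self, seq):
        """Decide what the NI does while a frame with this seq is arriving."""
        if seq == self.counter:
            self.counter += 1    # no collision, so the counter is incremented
            return "accept"
        if seq > self.counter:
            return "jam"         # missed messages: force a collision
        return "duplicate"       # assumption: an older frame is ignored
```

A node whose counter is 1 jams a frame carrying sequence number 2, keeps its counter unchanged, and accepts the frame with sequence number 1 when it is retransmitted.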

Causal Broadcast:

* Required when causal ordering of the messages is needed (i.e., the delivery order of messages depends on the causality of their send events).

* Required for operations in distributed databases, etc.

* There are 2 possible requirements. The weaker one states that causality should be preserved. The stronger one states that the messages should be delivered in the same order at all the nodes and that causality should also be preserved.

Causal Broadcast without Total Ordering:

Here there is no guarantee that the messages will be delivered in the same order at all nodes, but causality will be preserved.

To achieve this, care must be taken that the messages in the delivery queue of each node are in an order that preserves causality.

Working: When a node performs a causal broadcast, the message is added to its buffer. If the node itself is one of the destinations, the message is also added to its delivery queue.

When a message m in the buffer is transmitted to another node, the series of messages which precede m is also sent with it, in the form of a transfer packet.

When the destination node receives the packet, it processes the messages in the order in which they appear in the transfer packet. If the node is one of the destination nodes of a message, the message is put into the delivery queue. Otherwise it is kept in the buffer.
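The buffer, delivery queue and transfer packet can be sketched as follows. The class CausalNode and its method names are hypothetical. As a simplification, this sketch keeps every known message in the buffer (including those it delivers) and sends the whole buffer prefix up to m as the transfer packet, which is a conservative stand-in for "the messages which precede m".

```python
class CausalNode:
    def __init__(self, name):
        self.name = name
        self.buffer = []          # messages known here, in causal order
        self.delivery_queue = []  # ids of messages addressed to this node
        self.seen = set()

    def _store(self, msg):
        if msg["id"] in self.seen:
            return                # already processed from an earlier packet
        self.seen.add(msg["id"])
        self.buffer.append(msg)
        if self.name in msg["dests"]:
            self.delivery_queue.append(msg["id"])

    def broadcast(self, msg_id, dests):
        self._store({"id": msg_id, "dests": set(dests)})

    def transfer_packet(self, msg_id):
        """msg_id together with everything before it in the buffer."""
        idx = next(i for i, m in enumerate(self.buffer) if m["id"] == msg_id)
        return self.buffer[: idx + 1]

    def on_packet(self, packet):
        for msg in packet:        # process in the order given in the packet
            self._store(msg)
```

If node b broadcasts m2 after receiving m1, then b's transfer packet for m2 also carries m1, so a node that gets this packet first still queues m1 ahead of m2.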

Assumption: The only failures that cause problems are network failures which partition the network, so the network is assumed to be free of such failures.

Causal Broadcast with Total Ordering:

Broadcasts the messages to all the nodes in the same order and also preserves the causality of the messages.

The nodes in the system are divided into 1 primary node, n backups and the rest, which are simple nodes.

Uses counters and sequence numbers to disseminate the dependencies between the nodes. The nodes send their messages to the primary (PS), which broadcasts them.

Each node has a counter which is used to assign sequence numbers.

Each node also has an array seq[ ], which holds the sequence number of the last message sent by each node.

Each node also has a variable las-msg, which is used to identify duplicate messages.

The PS has an array expected[ ], which stores the sequence number of the next message it expects from each node.

When a node sends a message to the PS, it also sends its array seq[ ] (so that all the nodes share common sequence numbers for all the messages).

The PS also assigns a sequence number, called the gseq, to the message. This is for globally ordering the messages.

Working: A node sends the message to the PS together with seq[ ].

The PS receives the message and checks its sequence number against expected[ ] for that node. If they are the same, the message is accepted.

The PS assigns a gseq to the message and broadcasts it. If the gseq of a broadcast message is less than or equal to that of the last message a node has received, the node treats it as a duplicate.
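The sequencing done by the PS and the duplicate check at a receiving node can be sketched as below. The class names, starting sequence numbers at 1, and the field name last_msg are assumptions. The seq[ ] dependency array and the handling of gaps in gseq are left out of this sketch.

```python
class Primary:
    def __init__(self, nodes):
        self.ctr = 0                             # last gseq handed out
        self.expected = {n: 1 for n in nodes}    # next seq expected per node

    def on_message(self, sender, seq, payload):
        """Return (gseq, sender, payload) to broadcast, or None if rejected."""
        if seq != self.expected[sender]:
            return None                          # not the expected message
        self.expected[sender] += 1
        self.ctr += 1
        return (self.ctr, sender, payload)


class SimpleNode:
    def __init__(self):
        self.last_msg = 0                        # gseq of the last message received
        self.delivered = []

    def on_broadcast(self, gseq, sender, payload):
        if gseq <= self.last_msg:
            return False                         # duplicate
        self.last_msg = gseq
        self.delivered.append(payload)
        return True
```

Since every message passes through the PS and gets its gseq there, all nodes that deliver in gseq order see the same order.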

Backup node: When the PS fails, a backup node becomes the primary.

It sets its Ctr value to the gseq value of the last message it received. Then it requests all the nodes to resend their messages with a sequence number greater than or equal to expected[j].
