karin-norton
Message Passing
Inter Process Communication
• Original sharing (shared-data approach)
  [Figure: processes P1 and P2 communicating through shared memory]
• Copy sharing (message passing approach)
  [Figure: processes P1 and P2 exchanging messages]
  – Basic IPC mechanism in distributed systems
Desirable Features of a Good MPS
• Simple
  – Clean & simple semantics, so that users need not worry about system or network aspects
• Uniform semantics
  – Local communication
  – Remote communication
• Efficiency
  – Aim to reduce the number of messages exchanged
• Reliability
  – Cope with failures & guarantee delivery of messages; also handle duplicate messages
• Correctness
  – Handle group communication
    • Atomicity
    • Ordered delivery
    • Survivability
• Flexibility
  – Users can choose & specify the type & level of reliability & correctness they require
• Security
  – Secure end-to-end communication
• Portability
  – The message passing system & the applications using it should be portable
Message Structure
A message is a block of information formatted by a sending process so that it is meaningful to the receiving process.
Various issues have to be dealt with: who the sender/receiver is, what happens if a node crashes, what happens if the receiver is not ready, etc.
Fixed length header
– Addresses
  • Sending process address
  • Receiving process address
– Sequence number or message ID
– Structural information
  • Type (actual data or pointer to data)
  • Number of bytes/elements
Variable size data
– Actual data or pointer to the data
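A fixed-length header followed by variable-size data could be sketched as below. The field widths, the field order and the names `HEADER`, `pack_message` and `unpack_message` are illustrative assumptions, not something the slides specify.

```python
import struct

# Assumed layout (network byte order): sender address, receiver address,
# sequence number, type code, number of data bytes.
HEADER = struct.Struct("!IIIHI")

def pack_message(sender, receiver, seq_no, type_code, data):
    """Prefix the variable-size data with the fixed-length header."""
    return HEADER.pack(sender, receiver, seq_no, type_code, len(data)) + data

def unpack_message(raw):
    """Split a raw message into its header fields and its data."""
    sender, receiver, seq_no, type_code, length = HEADER.unpack_from(raw)
    data = raw[HEADER.size:HEADER.size + length]
    return sender, receiver, seq_no, type_code, data
```

Because the header has a fixed size, the receiver can always read it first and learn from it how many data bytes follow.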
Synchronization
Synchronization is achieved by communication primitives
– Blocking
– Nonblocking
The two types of semantics are used on both send & receive primitives.
Complexities in synchronization
– How does the receiver know when a message has arrived in the message buffer in a nonblocking receive?
  • Polling
  • Interrupt
– A blocking send/receive could get blocked forever if the receiver/sender crashes or the message is lost.
  • Timeout
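As a rough illustration of polling and of a timeout on a blocking receive, the sketch below uses Python's standard `queue.Queue` as a stand-in for the message buffer. The function names `blocking_receive` and `poll` are hypothetical.

```python
import queue

def blocking_receive(buffer, timeout):
    """Block until a message arrives, but give up after `timeout` seconds."""
    try:
        return buffer.get(timeout=timeout)
    except queue.Empty:
        # Timeout expired: the sender may have crashed or the message was lost.
        return None

def poll(buffer):
    """Nonblocking receive: test the buffer once and return immediately."""
    try:
        return buffer.get_nowait()
    except queue.Empty:
        return None
```

A receiver that uses `poll` must call it repeatedly; the interrupt alternative would instead have the system notify the receiver when a message is placed in the buffer.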
Synchronous Communication
Both send and receive primitives use blocking semantics.
[Figure: timelines of the sender's and receiver's execution. The receiver executes Receive(msg) and is suspended until Msg arrives; it then resumes and executes Send(ack). The sender executes Send(msg) and is suspended until Ack arrives, then resumes. Legend: blocked state, execution state.]
Synchronous vs. Asynchronous Communication
Synchronous Communication
– Advantages
  • Simple & easy to implement
  • Reliable
– Disadvantages
  • Limits concurrency
  • Can lead to communication deadlock
  • Less flexible than asynchronous communication
  • Hardware is more expensive
Asynchronous Communication
– Advantages
  • Does not require synchronization of both communicating sides
  • Cheap: timing is not as critical as for synchronous transmission, so hardware can be made cheaper
  • Set-up is very fast, so it is well suited to applications where messages are generated at irregular intervals
  • Allows more parallelism
– Disadvantages
  • Large relative overhead: a high proportion of the transmitted bits are solely for control purposes and thus carry no useful information
  • Not very reliable
Buffering
• Null Buffer (No Buffering)
• Single Message Buffer
• Unbounded Capacity Buffer
• Finite Bound (Multiple Message) Buffer
Null Buffer
Involves a single copy.
Can be implemented in the following ways:
– The sender sends only when it receives an acknowledgement from the receiver, i.e. when the receiver executes 'receive'. It remains blocked otherwise.
– After executing 'send', the sender waits for an acknowledgement. If none is received within the timeout period, it assumes the message was discarded & resends.
Not suitable for asynchronous transmission. The receiver is blocked until the entire message is transferred over the network.
[Figure: the message is copied directly from the sending process to the receiving process]
Single Message Buffer
Used in Synchronous Communication
Single message buffer on receiver’s side.
Message buffer may be in kernel’s or receiver’s address space
Transfer involves two copy operations
[Figure: the sending process sends the message across the node boundary into a single message buffer, from which the receiving process takes it]
Unbounded Capacity Buffer
Used in asynchronous communication.
As sender does not wait for receiver to be ready, all unreceived messages can be stored for later delivery.
Practically impossible to implement, since buffer space is always finite.
Finite Bound Buffer
Used in asynchronous communication.
[Figure: the sending process places each message in a multiple-message buffer (mailbox / port) holding Msg 1, Msg 2, Msg 3, ..., Msg n, from which the receiving process takes it]
Buffer overflow is possible. It can be dealt with in two ways:
– Unsuccessful communication
  • Message transfer fails when there is no more buffer space. Less reliable.
– Flow-controlled communication
  • The sender is blocked until the receiver accepts some messages, creating space in the buffer. This requires some synchronization, so it is not truly asynchronous.
Message buffer may be in kernel’s or receiver’s address space
Extra overhead for buffer management.
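The two ways of handling overflow in a finite bound buffer can be sketched with a bounded `queue.Queue` standing in for the mailbox. The function names `send_unsuccessful` and `send_flow_controlled` are hypothetical, chosen only for this illustration.

```python
import queue

def send_unsuccessful(mailbox, msg):
    """Unsuccessful communication: the send simply fails if the buffer is full."""
    try:
        mailbox.put_nowait(msg)
        return True
    except queue.Full:
        return False

def send_flow_controlled(mailbox, msg, timeout=None):
    """Flow-controlled communication: block until the receiver frees a slot.

    Raises queue.Full if `timeout` seconds pass without space becoming free.
    """
    mailbox.put(msg, timeout=timeout)
```

With `mailbox = queue.Queue(maxsize=2)`, a third `send_unsuccessful` call returns `False`, whereas `send_flow_controlled` would wait until the receiver removes a message.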
Multidatagram Messages
Maximum transfer unit (MTU): the maximum amount of data that can be transmitted at a time.
Packet (datagram): message data + control information.
Single datagram message: a message smaller than the MTU of the network can be sent in a single packet (datagram).
Multidatagram message: a message larger than the MTU has to be fragmented and sent in multiple packets.
Fragmenting a multidatagram message into packets on the sender side, and reassembling those packets in sequence on the receiver side, is the responsibility of the message passing system.
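A minimal sketch of fragmentation and in-sequence reassembly follows. It assumes, for simplicity, that `mtu` is the payload size per packet and that each packet carries its position and the total packet count; `fragment` and `reassemble` are hypothetical names.

```python
def fragment(message, mtu):
    """Split `message` (bytes) into (position, total, chunk) packets."""
    chunks = [message[i:i + mtu] for i in range(0, len(message), mtu)] or [b""]
    return [(pos, len(chunks), chunk) for pos, chunk in enumerate(chunks)]

def reassemble(packets):
    """Rebuild the message from packets that may arrive out of sequence."""
    total = packets[0][1]
    by_position = {pos: chunk for pos, _, chunk in packets}
    if len(by_position) != total:
        raise ValueError("some packets are missing")
    return b"".join(by_position[pos] for pos in range(total))
```

Because every packet records its own position, the receiver can restore the original order regardless of the order of arrival.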
Encoding and Decoding of message data
The structure of program objects should be preserved when they are transmitted from the sender's address space to the receiver's address space. This is difficult because:
• An absolute pointer value loses its meaning when transferred from one address space to another (e.g. a tree). It is necessary to send object-type information as well.
• There must be some way for the receiver to identify which program object is stored where in the message buffer & how much space each program object occupies.
Encoding: program objects are converted into a stream by the sender.
Decoding: program objects are reconstructed from the message data by the receiver.
Representations used for encoding & decoding:
Tagged representation
– The type of each program object is encoded in the message along with its value
– More data is transferred
– More time is taken to encode/decode the data
Untagged representation
– Message data contains only program objects. The receiving process must have prior knowledge of how to decode the data, as it is not self-describing.
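The difference between the two representations can be sketched as follows. The one-byte tags (`b"i"`, `b"s"`), the supported types and all names here are assumptions made only for this illustration.

```python
import struct

def encode_tagged(values):
    """Tagged: every value is preceded by a one-byte type tag."""
    out = b""
    for v in values:
        if isinstance(v, int):
            out += b"i" + struct.pack("!i", v)
        elif isinstance(v, str):
            raw = v.encode("utf-8")
            out += b"s" + struct.pack("!H", len(raw)) + raw
        else:
            raise TypeError("unsupported type")
    return out

def decode_tagged(data):
    """Rebuild the values using only the tags found in the message."""
    values, pos = [], 0
    while pos < len(data):
        tag = data[pos:pos + 1]
        pos += 1
        if tag == b"i":
            values.append(struct.unpack_from("!i", data, pos)[0])
            pos += 4
        elif tag == b"s":
            (length,) = struct.unpack_from("!H", data, pos)
            pos += 2
            values.append(data[pos:pos + length].decode("utf-8"))
            pos += length
        else:
            raise ValueError("unknown tag")
    return values

# Untagged: no tags are sent, so both sides must agree on this format in advance.
UNTAGGED_FORMAT = struct.Struct("!ii")
```

Two integers take 10 bytes in this tagged form but only 8 bytes untagged, which reflects the extra data and processing that tagging costs.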
Process Addressing
Explicit addressing
• Send (process_id, msg)
• Receive (process_id, msg)
Implicit addressing
• Send_any (service_id, msg) // functional addressing
• Receive_any (process_id, msg)
Methods for Process Addressing
machine_id@local_id
– Machine address @ receiving process identifier
– Local ids need to be unique on only one machine
– Does not support process migration
machine_id@local_id@machine_id
– Machine on which the process was created @ its local process identifier @ last known location of the process
– Link-based addressing: link information is left on the previous node
– The kernel maintains a mapping table for all processes created on another node but running on this node.
– The current location of the receiving process is sent to the sender, which caches it.
– Drawbacks
  • The overhead of locating a process is large if the process has migrated many times.
  • It is not possible to locate a process if an intermediate node is down.
Both methods are location non-transparent.
Location Transparent Process Addressing
Centralized process identifier allocator
– Counter
– Neither reliable nor scalable
Two-level naming scheme
– High-level machine-independent name, low-level machine-dependent name
– Name server maintains the mapping table
– Kernel of the sending machine obtains the low-level name of the receiving process from the name server and also caches it
– When a process migrates, only its low-level name changes
– Used in functional addressing
– Neither scalable nor reliable
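The two-level naming scheme with kernel-side caching might look roughly like this. The class names, the `invalidate` step and the in-memory dictionary are assumptions; the slides do not say how a stale cache entry is detected.

```python
class NameServer:
    """Maps high-level (machine-independent) names to low-level names."""

    def __init__(self):
        self.table = {}
        self.lookups = 0  # counts how often the server is consulted

    def register(self, high_level, machine_id, local_id):
        self.table[high_level] = (machine_id, local_id)

    def resolve(self, high_level):
        self.lookups += 1
        return self.table[high_level]


class SendingKernel:
    """Caches low-level names so that the name server is not asked every time."""

    def __init__(self, name_server):
        self.name_server = name_server
        self.cache = {}

    def locate(self, high_level):
        if high_level not in self.cache:
            self.cache[high_level] = self.name_server.resolve(high_level)
        return self.cache[high_level]

    def invalidate(self, high_level):
        # Assumed hook: called when a send to the cached location fails.
        self.cache.pop(high_level, None)
```

After a migration only the entry in the name server changes; the high-level name used by applications stays the same.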
Failure Handling
Loss of request message
[Figure: the sender sends a request message, which is lost before it reaches the receiver]
Loss of response message
[Figure: the sender sends a request message; the receiver executes the request successfully and sends a response message, which is lost]
Unsuccessful execution of the request
[Figure: the sender sends a request message; the receiver crashes while executing the request (unsuccessful request execution) and is later restarted]
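Four message reliable IPC protocol
[Figure: client and server timelines with four messages: Request, Acknowledgment, Reply, Acknowledgment. Legend: blocked state, execution state.]
Three message reliable IPC protocol
[Figure: client and server timelines with three messages: Request, Reply, Acknowledgment. Legend: blocked state, execution state.]
Two message reliable IPC protocol
[Figure: client and server timelines with two messages: Request, Reply. Legend: blocked state, execution state.]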
Fault Tolerant Communication
[Figure: client and server timelines. The client sends a request message, which is lost. After a timeout it retransmits the request; the server crashes during execution (unsuccessful request execution). After another timeout the client retransmits; the server executes the request successfully, but the response message is lost. After a third timeout the client retransmits; the server again executes the request successfully and the response message reaches the client.]
At-least-once semantics
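Timeout-based retransmission can be sketched with a simulated server that loses the first few requests. `LossyServer` and `send_with_retransmission` are hypothetical names, and a `None` return value stands in for an expired timeout.

```python
class LossyServer:
    """Simulated server that loses the first `drop` requests sent to it."""

    def __init__(self, drop):
        self.drop = drop
        self.executions = 0

    def handle(self, request):
        if self.drop > 0:
            self.drop -= 1
            return None  # models a lost message: no reply ever arrives
        self.executions += 1
        return ("reply", request)


def send_with_retransmission(server, request, max_attempts=5):
    """Retransmit after every (simulated) timeout until a reply arrives."""
    for attempt in range(1, max_attempts + 1):
        reply = server.handle(request)
        if reply is not None:
            return reply, attempt
    raise TimeoutError("no reply after %d attempts" % max_attempts)
```

In this simulation the loss always happens before execution. With real response losses the server may execute the same request several times, which is why this scheme only gives at-least-once semantics.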
Idempotency
Repeatability
An idempotent operation produces the same result, without any side effects, no matter how many times it is performed with the same arguments.
Example of a non-idempotent routine:
debit(amount)
    if (balance ≥ amount) {
        balance = balance - amount;
        return ("Success", balance);
    } else
        return ("Failure", balance);
end;
[Figure: client and server (balance = 1000). The client sends the request Debit(100); the server processes the debit routine, balance = 1000 - 100 = 900, and returns (success, 900), but the response is lost. The client times out and retransmits the request; the server processes the debit routine again, balance = 900 - 100 = 800, and returns (success, 800).]
Handling Duplicate Request
Using the timeout-based retransmission of request , the server may execute the same request message more than once.
If the execution is non-idempotent, its repeated execution will destroy the consistency of information.
Exactly–once semantics is used, which ensures that only one execution of server’s operation is performed.
Use a unique identifier for every request that the client makes, and set up a reply cache in the kernel's address space on the server machine to cache replies.
[Figure: client and server (balance = 1000), with a reply cache of (Req-id, Reply) entries on the server. The client sends request-1, Debit(100). The server checks the reply cache for request-1, finds no match, processes request-1, saves the reply (Req-1, (success, 900)) and returns (success, 900), but the response is lost. The client times out and retransmits request-1. The server checks the reply cache for request-1, finds a match, extracts the reply and returns (success, 900). The client receives balance = 900.]
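A reply cache keyed by request identifier can be sketched as follows. The `Server` class and its method names are hypothetical; a real implementation would keep the cache in the kernel and would eventually have to discard old entries.

```python
class Server:
    def __init__(self, balance):
        self.balance = balance
        self.reply_cache = {}  # request id -> saved reply

    def debit(self, amount):
        """Non-idempotent operation: every execution changes the balance."""
        if self.balance >= amount:
            self.balance -= amount
            return ("Success", self.balance)
        return ("Failure", self.balance)

    def handle(self, request_id, amount):
        """Execute each request id at most once; replay the saved reply otherwise."""
        if request_id in self.reply_cache:
            return self.reply_cache[request_id]  # duplicate: do not re-execute
        reply = self.debit(amount)
        self.reply_cache[request_id] = reply
        return reply
```

Starting from a balance of 1000, sending request 1 twice with an amount of 100 yields `("Success", 900)` both times and leaves the balance at 900.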
Ques. Which of the following operations are idempotent?
i. Read_next_record(filename)
ii. Read_record(filename, record_no)
iii. Append_record(filename, record)
iv. Write_record(filename, after_record_n,record)
v. Seek(filename, position)
vi. Add(integer1,integer2)
vii. Increment(variable_name)
Handling lost and out-of-sequence packets in multidatagram messages
Stop-and-wait protocol
– Acknowledge each packet separately
– Communication overhead
Blast protocol
– Single acknowledgement for all packets. What if?
  • Packets are lost in communication
  • Packets are received out of sequence
– Use bitmap to identify the packets of message.
– Header has two extra fields: total no. of packets, position of this packet in complete message.
– Selective repeat send is implemented for unreceived packets.
– If receiver sends (5,01001), sender sends back the 1st & 4th packet again.
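The (5,01001) example can be reproduced with the sketch below. It assumes that a 1 bit marks a packet the receiver is still missing and that positions are counted from the right-hand end of the bitmap; this is the reading under which 01001 selects the 1st and 4th packets, but the slides do not state the convention explicitly. `packets_to_resend` is a hypothetical name.

```python
def packets_to_resend(total, bitmap):
    """Return the 1-based positions of packets the receiver reports missing.

    Assumption: a '1' bit marks a missing packet, and positions are counted
    from the right-hand end of the bitmap string.
    """
    if len(bitmap) != total:
        raise ValueError("bitmap length must equal the packet count")
    return [pos for pos, bit in enumerate(reversed(bitmap), start=1) if bit == "1"]
```

The sender would then retransmit only the listed packets rather than the whole message.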
Group Communication
• One to many
• Many to one
• Many to many
One to Many
• Multicast Communication
• Broadcast Communication
Open Group
– Any process can send a message to the group as a whole. Example: group of replicated servers.
Closed Group
– Only members of a group can send a message to the group. Example: collection of processors doing parallel processing.
Group Management
Centralized group server
– Create & delete groups dynamically & allow processes to join or leave a group
– Poor reliability & scalability
Distributed approach
– Open group: an outsider can send a message to all group members announcing its presence
– Closed groups also have to be open with respect to joining
Group Addressing
Two-level naming scheme
– High-level group name
  • ASCII name independent of location of processes in group
  • Used by user applications
– Low-level group name
  • Multicast address / Broadcast address
  • One-to-one communication (unicast) to implement group communication
    – Low-level name: list of machine identifiers of all machines belonging to a group
    – Packets sent = no. of machines in group
Centralized group server
Multicast
Multicast is asynchronous communication
– Sending process can't wait for response of all receivers
– Sending process not aware of all receivers
Unbuffered Multicast / Buffered Multicast
Send-to-all semantics
– Message sent to each process of multicast group
Bulletin board semantics
– Message addressed to channel that acts like bulletin board
– Receiving process copies message from channel
– Relevance of message to receiver depends on its state
– Messages not accepted within a certain time after transmission may no longer be useful
Flexible Reliability in Multicast
• 0-reliable
• 1-reliable
• m-out-of-n reliable
• All-reliable
Atomic Multicast
– All-or-nothing property
– Required for all-reliable semantics
– Involves repeated retransmissions by sender
– What if sender/receiver crashes or goes down?
– Include message identifier & field to indicate atomic multicast
– Receiver also performs atomic multicast of message
Group Communication Primitives
send
send_group
– Simplifies design & implementation of group communication
– Indicates whether to use name server or group server
– Can include extra parameter to specify degree of reliability or atomicity
Many to one Communication
Multiple senders, one receiver.
Selective receiver
– Accepts messages from a unique (specific) sender
Nonselective receiver
– Accepts messages from any sender from a specified group
Many-to-many Communication
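Ordered message delivery
– All messages are delivered to all receivers in an order acceptable to the application
– Requires message sequencing
[Figure: no ordering constraint for message delivery. Time diagram with senders S1, S2 and receivers R1, R2; messages m1 and m2 reach the receivers with no constraint on their order.]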
Absolute Ordering
Messages delivered to all receivers in the exact order in which they were sent
Use global timestamps as message identifiers & sliding window protocol with it
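[Figure: time diagram with senders S1, S2 and receivers R1, R2. m1 is sent at t1 and m2 at t2, with t1 < t2; the messages are delivered to both receivers in that order.]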
Consistent Ordering
All messages are delivered to all receiver process in the same order.
This order may be different from the order in which messages were sent.
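[Figure: time diagram with senders S1, S2 and receivers R1, R2. m1 is sent at t1 and m2 at t2, with t1 < t2; both receivers get the two messages in the same order.]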
Centralized algorithm
– Kernels of sending machines send messages to a single receiver (sequencer) that assigns a sequence no. to each message and then multicasts it.
Distributed algorithm
– Sender assigns a temporary sequence no. larger than previous sequence nos. & sends the message to the group.
– Each member returns a proposed sequence no. Member (i) calculates it as
  max(Fmax, Pmax) + 1 + i/N
  Fmax: largest seq. no. of any message this member has received so far
  Pmax: largest seq. no. proposed by this member so far
  N: total number of members in the group
– Sender selects the largest proposed sequence no. & sends it to all members in a commit message
– Committed messages are delivered to application programs in order of their final sequence nos.
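The proposal step of the distributed algorithm can be sketched as follows. Exact fractions are used so that the i/N term, which makes proposals from different members distinct, is not blurred by floating-point rounding. The `Member` class, the 0-based member index and the `commit` bookkeeping are assumptions made for this illustration.

```python
from fractions import Fraction

class Member:
    def __init__(self, index, group_size):
        self.i = index            # assumed 0-based member number
        self.n = group_size       # N: number of members in the group
        self.f_max = Fraction(0)  # largest sequence no. received so far
        self.p_max = Fraction(0)  # largest sequence no. proposed so far

    def propose(self):
        """Propose max(Fmax, Pmax) + 1 + i/N for a newly received message."""
        proposal = max(self.f_max, self.p_max) + 1 + Fraction(self.i, self.n)
        self.p_max = proposal
        return proposal

    def commit(self, final_seq):
        """Record the final sequence no. chosen by the sender."""
        self.f_max = max(self.f_max, final_seq)
```

The sender would collect `propose()` from every member, take the maximum as the final sequence number and pass it to each member's `commit`.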
Causal Ordering
If two message-sending events are causally related (there is any possibility that the second message was influenced by the first one), then the messages are delivered in that order to all receivers.
Two message-sending events are said to be causally related if they are correlated by the happened-before relation.
[Figure: time diagram with senders S1, S2 and receivers R1, R2, R3 exchanging messages m1, m2 and m3]
The happened-before relation (→) satisfies the following conditions:
– If a & b are events in the same process & a occurs before b, then a → b.
– If a is the event of sending a message by one process & b is the event of receipt of the same message by another process, then a → b.
– If a → b & b → c, then a → c.
CBCAST Protocol
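[Figure: status of the vectors at the moment Process A sends a new message]
Vector of Process A: 3 2 5 1
Vector of Process B: 3 2 5 1
Vector of Process C: 2 2 5 1
Vector of Process D: 3 2 4 1
Process A sends new msg carrying vector 4 2 5 1
– Process B: Deliver
– Process C: Delay, as A[1] = C[1] + 1 is not satisfied
– Process D: Delay, as A[3] <= D[3] is not satisfied
Delivery condition for a message with vector S sent by process i, at a receiver with vector R:
S[i] = R[i] + 1 and S[j] <= R[j] for all j <> i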
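The delivery condition S[i] = R[i] + 1 and S[j] <= R[j] for all j <> i can be written as a small predicate. The function name `can_deliver` and the 0-based `sender` index are assumptions for this sketch.

```python
def can_deliver(msg_vector, receiver_vector, sender):
    """CBCAST delivery test; `sender` is the 0-based index of the sending process.

    Deliver only if this is the next message expected from the sender and the
    receiver has already seen everything the sender had seen from the others.
    """
    if msg_vector[sender] != receiver_vector[sender] + 1:
        return False
    return all(
        s <= r
        for j, (s, r) in enumerate(zip(msg_vector, receiver_vector))
        if j != sender
    )
```

For instance, a message carrying `[4, 2, 5, 1]` from process 0 is deliverable at a receiver holding `[3, 2, 5, 1]`, but must be delayed at receivers holding `[2, 2, 5, 1]` or `[3, 2, 4, 1]`.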
4.3BSD Unix IPC Mechanism
Network independent
Uses sockets for end point communication.
Two level naming scheme for naming communication end points. Socket has high level string name, low level communication domain dependent name.
Flexible. Provides sockets with different communication semantics.
Supports broadcast facility if underlying network supports it.
IPC Primitives
socket() creates a new socket of a certain socket type, identified by an integer number, and allocates system resources to it.
bind() is typically used on the server side, and associates a socket with a socket address structure, i.e. a specified local port number and IP address.
connect() is used in connection-based communication by a client process to request the establishment of a connection between its socket & the socket of the server process.
listen() is used on the server side in connection-based communication to listen on its socket for client requests.
accept() is used on the server side. It accepts an incoming attempt by a remote client to create a new TCP connection.
Read/Write Primitives
read() / write(): connection-based communication
recvfrom() / sendto(): connectionless communication
TCP/IP Socket Calls for Connection
Server
– socket(): create socket
– bind(): bind local IP address of socket to port
– listen(): place socket in passive mode, ready to accept requests
– accept(): take the next request from the queue (or wait), then fork and create a new socket for the client connection; blocks until a connection from a client arrives
– recv() / send(): process request
– close(): close socket
Client
– socket(): create socket
– connect(): issue connection request to server
– send() / recv(): transfer message strings with send/recv or read/write
– close(): close socket
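The TCP call sequence can be tried out on the loopback interface with Python's `socket` module, which wraps the same BSD calls. This is only a sketch: the server handles a single client in a thread instead of forking, and the upper-casing "processing" and the function names are made up for the example.

```python
import socket
import threading

def serve_once(server_sock):
    """Accept one client, reply with its request in upper case, then close."""
    conn, _addr = server_sock.accept()  # blocks until a client connects
    with conn:
        request = conn.recv(1024)
        conn.sendall(request.upper())   # stands in for "process request"

def run_tcp_exchange(message):
    # Server side: socket() -> bind() -> listen() -> accept() in a thread.
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.settimeout(5)
    server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server_sock,))
    worker.start()

    # Client side: socket() -> connect() -> send() -> recv() -> close().
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(5)
    client.connect(("127.0.0.1", port))
    client.sendall(message)
    reply = client.recv(1024)
    client.close()

    worker.join()
    server_sock.close()
    return reply
```

Calling `run_tcp_exchange(b"hello")` performs one request/reply round trip over a connection.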
UDP/IP Socket Calls for Connectionless Communication
Server
– socket(): create socket
– bind(): bind local IP address of socket to port
– recvfrom(): receive sender's address and sender's datagram (the request); blocks until a datagram is received from a client
– Process request
– sendto(): specify sender's address and send datagram (the reply)
– close(): close socket
Client
– socket(): create socket
– sendto(): send the request
– recvfrom(): receive the reply
– close(): close socket
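The datagram exchange can likewise be sketched on the loopback interface. No connection is set up, and the server learns the client's address from `recvfrom()`. The function name and the upper-casing step are assumptions for the example; both sockets live in one process only to keep it self-contained.

```python
import socket

def run_udp_exchange(message):
    # Server: socket() -> bind()
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.settimeout(5)
    # Client: socket() only; no connect() is needed for datagrams.
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(5)
    try:
        client.sendto(message, server.getsockname())      # request
        request, client_addr = server.recvfrom(1024)      # sender's address + datagram
        server.sendto(request.upper(), client_addr)       # reply to that address
        reply, _server_addr = client.recvfrom(1024)
        return reply
    finally:
        client.close()
        server.close()
```

Unlike the TCP case, nothing here guarantees delivery; on a real network the client would need its own timeout and retransmission logic.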