Interprocess Communication Chapter 4 Distributed Systems, Concepts and Design

Page 1: Interprocess communication

Interprocess Communication

Chapter 4 Distributed Systems, Concepts and Design

Page 2: Interprocess communication

Distributed Systems

• “A distributed system is a collection of independent computers that appears to its users as a single coherent system.” (Tanenbaum)

• Distributed systems are therefore built around communication. Actually, it could be argued that computers are used more as communication devices than computational devices.

Page 3: Interprocess communication

Communications

• Because communications are critical to distributed systems, communications protocols tend to be well defined. A key form of communications is interprocess communications, based on low-level message passing over the network.

• Protocols are sets of rules that must be followed to enable standardized communications.

Page 4: Interprocess communication

Overhead

• Overhead is a financial term that refers to indirect costs in a business. For example, a merchant cannot sell you a product for the price that he pays because he has additional costs beyond buying the merchandise, such as rent and staff wages. Overhead always puts pressure on profits, so it must be kept to a minimum. Because corporations treat information technology as overhead, overhead is a major concern in this course. Activities that support work rather than doing work in IT are also costs and are referred to as overhead.

Page 5: Interprocess communication

Communications Overhead

• In most communication systems, overhead is a key concern. Overhead activities are background operations that do not directly involve sending and receiving message content. Headers and trailers carry extra information, so they are overhead. In a phone system, overhead includes time spent setting up and tearing down the circuit path over which a phone call can take place. TCP is like a phone call, since it has to set up, tear down, and manage the connection in addition to “talk time.”

Page 6: Interprocess communication

Headers and trailers

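• Each protocol level treats the unit passed down from the level above as data and attaches its own header (and, at some levels, a trailer).

[Figure: a message wrapped by a stack of headers in front and a trailer behind.]

• Note that short messages are mostly overhead, while long messages involve a much higher proportion of actual work.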
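The point about short and long messages can be checked with a little arithmetic. The following is a minimal sketch, assuming the usual minimum sizes for an Ethernet header plus CRC trailer (18 bytes), an IPv4 header (20 bytes) and a TCP header (20 bytes), with no options and no padding of short frames. The function name `overhead_fraction` is hypothetical.

```python
# Assumed minimum per-level overhead in bytes (no options, no VLAN tag,
# and ignoring the padding Ethernet adds to very short frames).
ETHERNET_OVERHEAD = 18  # 14-byte header + 4-byte CRC trailer
IP_HEADER = 20
TCP_HEADER = 20


def overhead_fraction(payload_bytes):
    """Return the fraction of transmitted bytes that are headers or trailers."""
    overhead = ETHERNET_OVERHEAD + IP_HEADER + TCP_HEADER
    return overhead / (overhead + payload_bytes)


if __name__ == "__main__":
    for payload in (10, 100, 1460):
        share = overhead_fraction(payload)
        print(f"{payload:5d}-byte payload: {share:.1%} of the frame is overhead")
```

Under these assumptions a 10-byte message is mostly overhead, while a full 1460-byte segment spends only a few percent of the frame on headers and trailers.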

Page 7: Interprocess communication

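Normal Operation of TCP

Figure 2-4a in Tanenbaum et al.

1) Client to server: SYN
2) Server to client: SYN, ACK(SYN)
3) Client to server: ACK(SYN)
4) Client to server: request
5) Client to server: FIN
6) Server to client: ACK(req+FIN)
7) Server to client: answer
8) Server to client: FIN
9) Client to server: ACK(FIN)

Steps 4 and 7 do the communication. All of the rest of the TCP messages are overhead operations.

KEY: SYN = synchronize, ACK = acknowledge, FIN = finished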

Page 8: Interprocess communication

Transactional TCP

Figure 2-4b in Tanenbaum et al.

1) Client to server: SYN, request, FIN
2) Server to client: SYN, ACK(FIN), answer, FIN
3) Client to server: ACK(FIN)

By sending the message and response with the overhead signals, transactional TCP can speed up throughput and reduce overhead time delays.

Page 9: Interprocess communication

Classroom Exercise

• Calculate the percentage improvement in throughput of Transactional TCP (sending 3 messages instead of 9) under the following assumptions:

• 1) Short packets, dominated by latency of 10 ms.

• 2) Ethernet LAN, 10 ms latency, 10 Mbps bandwidth, maximum Ethernet packet size of 1500 bytes.

• 3) TCP/IP WAN, 20 ms latency, 500 Mbps bandwidth, maximum TCP packet size of 64 KB. (Latency assumes multiple hops between routers.)

• Thought exercise: When is Transactional TCP worthwhile?
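One way to approach the exercise is with a simple timing model. The sketch below is only one possible model, not the intended answer: it assumes the messages are strictly sequential, that each message costs one latency plus its time on the wire, that control-only messages are 40 bytes (an assumed TCP plus IP header), and that the request and the answer are each one maximum-size packet. The function names are hypothetical.

```python
def exchange_time(n_messages, latency_s, total_bytes, bandwidth_bps):
    """Time for a sequential exchange: one latency per message plus wire time."""
    return n_messages * latency_s + total_bytes * 8 / bandwidth_bps


def improvement_percent(latency_s, bandwidth_bps, data_bytes, control_bytes=40):
    """Percentage gain in exchanges per second from transactional TCP."""
    # Normal TCP: 7 control-only messages plus the request and the answer.
    normal = exchange_time(
        9, latency_s, 7 * control_bytes + 2 * data_bytes, bandwidth_bps
    )
    # Transactional TCP: request and answer carry the flags; one final ACK.
    transactional = exchange_time(
        3, latency_s, control_bytes + 2 * data_bytes, bandwidth_bps
    )
    # Throughput is inversely proportional to the time per exchange.
    return (normal / transactional - 1) * 100


if __name__ == "__main__":
    # Case 1: latency only, so packet sizes are set to zero.
    print(improvement_percent(0.010, 1e9, 0, control_bytes=0))
    # Case 2: Ethernet LAN, 1500-byte packets at 10 Mbps.
    print(improvement_percent(0.010, 10e6, 1500))
    # Case 3: WAN, 64 KB packets at 500 Mbps.
    print(improvement_percent(0.020, 500e6, 64 * 1024))
```

In this model the gain approaches 200% when latency dominates and shrinks as transmission time becomes a larger share of each exchange, which bears on the thought exercise.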

Page 10: Interprocess communication

Ethernet Jumbo Frames

• Ethernet Jumbo Frames of 9KB are possible if supported end to end. A 9KB Ethernet frame can hold an 8 KB application datagram (the size used by NFS) plus packet header overhead. Ethernet cannot use 64KB packets because it relies on a 32-bit CRC for error detection, and that CRC loses its effectiveness above roughly 12KB, a limit that is hard to change. [P. Dykstra]

Page 11: Interprocess communication

Upper Bound of TCP

• Dykstra’s article (see References) is a good discussion of frame (packet, datagram) size.

• Dykstra quotes an article by Matt Mathis et al. which sets this limit on TCP WAN performance:

• Throughput <= ~0.7 * MSS / (rtt * sqrt(packet_loss))

• MSS = maximum segment size = packet size minus TCP/IP headers

• rtt = round trip time (about 40 ms NYC to LA)

• packet_loss = fraction of packets lost, expressed as a probability (it varies widely; 0.1%, that is 0.001, is a typical value)

Page 12: Interprocess communication

Importance of Mathis Formula

• If you examine the formula:

• Throughput <= ~0.7 * MSS / (rtt * sqrt(packet_loss))

• You will see that throughput is directly proportional to the maximum segment size, while the loss rate enters only through an inverse square root: the loss rate must fall by a factor of four to double throughput. In general, doubling the MSS doubles performance.

• Remember that maximum segment size, packet size, datagram size and frame size all mean approximately the same thing.
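The bound is easy to evaluate numerically. This is a minimal sketch of the formula as quoted on the slide, assuming MSS is given in bytes, rtt in seconds, and packet loss as a fraction (0.001 for 0.1%). The function name `mathis_bound_bps` and the sample MSS of 1460 bytes are assumptions made for illustration.

```python
import math


def mathis_bound_bps(mss_bytes, rtt_s, loss_rate):
    """Approximate upper bound on TCP throughput in bits per second."""
    return 0.7 * mss_bytes * 8 / (rtt_s * math.sqrt(loss_rate))


if __name__ == "__main__":
    base = mathis_bound_bps(1460, 0.040, 0.001)
    doubled_mss = mathis_bound_bps(2920, 0.040, 0.001)
    quarter_loss = mathis_bound_bps(1460, 0.040, 0.00025)
    print(f"baseline:        {base / 1e6:.2f} Mbit/s")
    print(f"double the MSS:  {doubled_mss / 1e6:.2f} Mbit/s")
    print(f"quarter the loss: {quarter_loss / 1e6:.2f} Mbit/s")
```

With a 40 ms round trip and 0.1% loss the bound is roughly 6.5 Mbit/s for a 1460-byte MSS. Doubling the MSS and cutting the loss rate to a quarter each double the bound.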

Page 13: Interprocess communication

Storing Data

• Data stored in digital format is composed of binary sequences that have a combination of logical and arbitrary meanings attached to them. Most binary formats for numbers are logical, although there are a lot of differences in storage sizes and in the handling of negative numbers and exponents. While it is somewhat logical that 0101 represents 5 as a short integer, it is somewhat less logical that 01000001 represents A and 01100001 represents a in the ASCII code, or that 11000001 represents A and 10000001 represents a in the EBCDIC code.
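The arbitrary nature of character codes can be seen by encoding the same letters under both schemes. This is a small sketch that uses Python's `cp037` codec as a stand-in for EBCDIC (it is one of several EBCDIC code pages, chosen here as an assumption). The helper name `bits` is hypothetical.

```python
def bits(character, codec):
    """Return the 8-bit pattern that a codec uses for a single character."""
    encoded = character.encode(codec)
    return format(encoded[0], "08b")


if __name__ == "__main__":
    for character in ("A", "a"):
        print(character, "ASCII:", bits(character, "ascii"),
              "EBCDIC (cp037):", bits(character, "cp037"))
```

In ASCII, A is 01000001 and a is 01100001. In this EBCDIC code page, A is 11000001 and a is 10000001, so the same text needs conversion when it moves between the two kinds of system.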

Page 14: Interprocess communication

Numeric formats

• Computers store data in memory in different ways. The bytes of a multi-byte value such as 11110000 00001111 might be stored with the 11110000 byte in the lowest memory location on one computer (big-endian) and with the 00001111 byte there on another (little-endian). The same bit pattern also has different meanings under different interpretations: 11110000 is 240 as an unsigned integer but -16 as a signed integer in two’s complement notation. There are different formats for storing floating point numbers. Computers have different register sizes, making default word sizes of 8, 16, 32, 36 or 64 bits most practical in different CPUs.
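Both effects, byte order and signed versus unsigned interpretation, can be demonstrated with the standard library. This is an illustrative sketch; the function names and the sample value 0x12345678 are assumptions chosen for the demonstration.

```python
import struct


def byte_layouts(value):
    """Return the hex layout of a 32-bit unsigned value in both byte orders."""
    big_endian = struct.pack(">I", value)     # most significant byte first
    little_endian = struct.pack("<I", value)  # least significant byte first
    return big_endian.hex(), little_endian.hex()


def interpret_byte(pattern):
    """Return one byte read as (unsigned, two's complement signed)."""
    raw = bytes([pattern])
    unsigned = int.from_bytes(raw, "big", signed=False)
    signed = int.from_bytes(raw, "big", signed=True)
    return unsigned, signed


if __name__ == "__main__":
    print(byte_layouts(0x12345678))
    print(interpret_byte(0b11110000))
```

The same four bytes appear in opposite order in the two layouts, and the single byte 11110000 reads as 240 or as -16 depending on the interpretation, which is why a transfer format has to fix both choices.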

Page 15: Interprocess communication

Transferring Data

• With different coding schemes, memory storage orders, word sizes and numeric formats, generic attempts to transfer information between systems must carefully define formats for the transferred data and provide ways to convert data into the transfer format and back into the local format at the other end. Each end only needs to understand its own format and the agreed transfer format. The intermediate format is called an External Data Representation (XDR), and a language used to describe the interfaces and data types being transferred is called an Interface Definition Language (IDL).

Page 16: Interprocess communication

External Data Representation

• There are three different common approaches to XDR:

• CORBA’s common data representation, which can be used by a variety of languages.

• Java’s object serialization, which can even pass complex objects across a network, but is limited to Java only.

• Extensible Markup Language (XML), which can represent even structured data as ASCII text.

Page 17: Interprocess communication

Marshalling and Unmarshalling

• Converting information to a network transportable form (XDR) following the specifications of an IDL is called marshalling. Converting it back to an application readable format is called unmarshalling.
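As an illustration of marshalling, the sketch below flattens a person record (name, place, year) into bytes and rebuilds it. The layout is an assumption made for this example, loosely in the style of CORBA's common data representation but not an implementation of it: big-endian 32-bit integers, with each string sent as a length followed by its characters padded to a multiple of four bytes. All function names and sample values are hypothetical.

```python
import struct


def marshal_string(text):
    """Encode a string as a 4-byte length plus characters padded to 4 bytes."""
    data = text.encode("ascii")
    padding = (-len(data)) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding


def unmarshal_string(buffer, offset):
    """Decode a string at offset; return it with the offset that follows it."""
    (length,) = struct.unpack_from(">I", buffer, offset)
    start = offset + 4
    text = buffer[start:start + length].decode("ascii")
    padded_length = length + (-length) % 4
    return text, start + padded_length


def marshal_person(name, place, year):
    """Flatten a person record into a byte sequence for transmission."""
    return marshal_string(name) + marshal_string(place) + struct.pack(">I", year)


def unmarshal_person(buffer):
    """Rebuild the (name, place, year) record from the byte sequence."""
    name, offset = unmarshal_string(buffer, 0)
    place, offset = unmarshal_string(buffer, offset)
    (year,) = struct.unpack_from(">I", buffer, offset)
    return name, place, year


if __name__ == "__main__":
    message = marshal_person("Smith", "London", 1984)
    print(len(message), message.hex())
    print(unmarshal_person(message))
```

Because both ends agree on the byte order, field order and padding, the receiver can rebuild the record without knowing how the sender stores it in memory.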

Page 18: Interprocess communication

Java Object Serialization

• Serialization transforms an object into a sequence of bytes. This allows objects to be saved to files or transferred across a network, and is a key feature of Java. Since objects can have attributes that are also objects, and those objects can have object attributes, serialization allows a very complex structure to be transferred across a network or stored in a file.

• Classes that need to be stored in files or transferred over a network should implement the java.io.Serializable interface.

Page 19: Interprocess communication

Reflection

• Java supports reflection: the ability to enquire about the properties of a class, including the names and types of its instance variables. Classes can be loaded from their names, and a constructor with specified arguments can be used to create an instance of a class. Reflection makes generic serialization and deserialization possible and allows an object to be reconstructed by a Java Virtual Machine after transfer across a network.

Page 20: Interprocess communication

The Document is the Object

• XML (eXtensible Markup Language): describes the structure of a document, defines new tags, and specifies metadata that lets programs discover document structure.

• DOM (Document Object Model): allows programmatic access to the structure and content of XML documents.

• XSL (eXtensible Stylesheet Language): the XML version of style sheets.

Page 21: Interprocess communication

What is XML?

• XML stands for eXtensible Markup Language.

• The XML specification defines a syntax and document organization for data, represented by tag/value pairs.

• XML elements have data surrounded by matching start and end tags.

• XML attributes are optional name="value" pairs placed inside a start tag.

• There is a well defined syntax that can be parsed.
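Because the syntax is well defined, a standard parser can recover elements and attributes. This is a minimal sketch using Python's `xml.etree.ElementTree`; the sample person document, its values, and the function name `parse_person` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: one element with an attribute and children.
PERSON_XML = """<person id="123456789">
  <name>Smith</name>
  <place>London</place>
  <year>1984</year>
</person>"""


def parse_person(text):
    """Parse a person document into a plain dictionary."""
    root = ET.fromstring(text)
    return {
        "id": root.get("id"),                 # attribute in the start tag
        "name": root.findtext("name"),        # data between start and end tags
        "place": root.findtext("place"),
        "year": int(root.findtext("year")),   # element text is always a string
    }


if __name__ == "__main__":
    print(parse_person(PERSON_XML))
```

Note that everything arrives as text, so the receiving program still has to convert values such as the year into its own numeric type.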

Page 22: Interprocess communication

XML Namespaces

• An XML namespace is a collection of names, identified by a URI reference, which are used in XML documents as element types and attribute names. XML namespaces have internal structure and are not, mathematically speaking, sets.

• The URI that identifies the namespace is specified in an attribute whose name begins with xmlns, like this: xmlns:pers = "http://www.cdk4.net/person". The URI serves as a unique name and does not have to point to a file.

• See http://www.w3.org/XML/ for specifications.
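A parser resolves each prefix to its namespace URI, so programs see names qualified by the URI rather than by the prefix. The sketch below assumes a small sample document that uses the pers prefix from the slide; the document contents and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

PERSON_NS = "http://www.cdk4.net/person"

# Hypothetical sample document using the pers prefix for its names.
DOCUMENT = """<person pers:id="123456789" xmlns:pers="http://www.cdk4.net/person">
  <pers:name>Smith</pers:name>
</person>"""


def qualified_child_tags(text):
    """Return child element names as ElementTree reports them ({uri}local)."""
    root = ET.fromstring(text)
    return [child.tag for child in root]


def find_name(text):
    """Look up the name element through a prefix-to-URI mapping."""
    root = ET.fromstring(text)
    return root.findtext("pers:name", namespaces={"pers": PERSON_NS})


if __name__ == "__main__":
    print(qualified_child_tags(DOCUMENT))
    print(find_name(DOCUMENT))
```

The prefix itself is only shorthand: two documents can use different prefixes for the same URI and still refer to the same names.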

Page 23: Interprocess communication

XML Schemas

• An XML Schema defines the elements and attributes that can be used in a document, how they can be nested, the order and number of the elements, and whether an element is empty or can include text. Default values and types are defined. An example is Coulouris figure 4.12 shown on the next slide.

Page 24: Interprocess communication

Figure 4.12 An XML schema for the Person structure

<xsd:schema xmlns:xsd = URL of XML schema definitions >
  <xsd:element name="person" type="personType"/>
  <xsd:complexType name="personType">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="place" type="xsd:string"/>
      <xsd:element name="year" type="xsd:positiveInteger"/>
    </xsd:sequence>
    <xsd:attribute name="id" type="xsd:positiveInteger"/>
  </xsd:complexType>
</xsd:schema>

Page 25: Interprocess communication

XML: Structured Data in a Text File

• Spreadsheets, address books, configuration parameters, financial transactions, product catalogs…

• XML defines a set of rules and conventions for designing text formats for such data

• Easy to generate and read by computer

• Extensible

Page 26: Interprocess communication

Role of XML

• Applications built on different technologies can communicate via XML.

• New integration tools and integration servers capitalize on the emergence of XML as an integration technology.

• Many .NET and J2EE technologies, such as SOAP, XML Web Services, JXTA, XML-RPC, and EJB use or are based on XML.

Page 27: Interprocess communication

Client/Server Communication

• Communication in Client/Server systems uses a variety of well specified request/reply mechanisms with send and receive protocols defined by TCP, RPC, Java RMI, CORBA and other formats.

Page 28: Interprocess communication

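Figure 4.14 Request-reply communication

[Figure: the client calls doOperation, which sends a request message to the server and waits. The server calls getRequest, selects the object, executes the method, and calls sendReply. The reply message returns to the client, which then continues.]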
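The three operations in the figure can be sketched over UDP datagrams on the local machine. This is an illustrative sketch only: the function names mirror doOperation, getRequest and sendReply from the figure, while the use of UDP on 127.0.0.1, the single-request server thread, and the upper-casing "method" are assumptions made to keep the example self-contained.

```python
import socket
import threading


def get_request(server_socket):
    """Server side: block until a request message arrives."""
    request, client_address = server_socket.recvfrom(1024)
    return request, client_address


def send_reply(server_socket, reply, client_address):
    """Server side: send the reply message back to the requesting client."""
    server_socket.sendto(reply, client_address)


def serve_one_request(server_socket):
    """Server side: get a request, execute the method, send the reply."""
    request, client_address = get_request(server_socket)
    reply = request.upper()  # stand-in for "select object, execute method"
    send_reply(server_socket, reply, client_address)


def do_operation(server_address, request, timeout_s=2.0):
    """Client side: send the request, wait for the reply, then continue."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client_socket:
        client_socket.settimeout(timeout_s)
        client_socket.sendto(request, server_address)
        reply, _ = client_socket.recvfrom(1024)
        return reply


def demo():
    """Run one request-reply exchange between a client and a server thread."""
    server_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server_socket.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server = threading.Thread(target=serve_one_request, args=(server_socket,))
    server.start()
    reply = do_operation(server_socket.getsockname(), b"hello")
    server.join()
    server_socket.close()
    return reply


if __name__ == "__main__":
    print(demo())
```

The client blocks inside `do_operation` for the whole round trip, which corresponds to the (wait) and (continuation) steps in the figure. A real protocol would also need request identifiers and retransmission, which this sketch leaves out.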

Page 29: Interprocess communication

Message Oriented Communication

• Remote procedure calls and remote object invocation are not always sufficient or appropriate for all communications in distributed systems. They tend to be optimized for immediate connections between two systems, and may be inadequate for operations that persist over time or involve multiple connections requiring synchronization. For this, message oriented protocols such as mail protocols have been developed.

Page 30: Interprocess communication

Persistent Communication

• In persistent communication, a message may be stored until it can be passed on to a recipient. Compare this to the distinction between a simple telephone and an answering machine. Without the answering machine, you must be present when the phone rings to get a message.

Page 31: Interprocess communication

Message Oriented Middleware

• In MOM, applications communicate by inserting messages in specific queues. As the queues are processed, messages are forwarded to other computers. There may be several intermediates. At the destination queue, individual messages may be accepted and acted upon, and responses sent back through the system. Only passing to the receiver’s queue is guaranteed by the system. Accepting, reading or acting upon the message is up to the receiver.

Page 32: Interprocess communication

MOM

• Messages can contain any data, but must be properly addressed. Usually, there is a systemwide unique name for the receiving queue, which allows a very simple interface. Queues are managed by queue managers, which may also act as relays to forward messages to other queues. Applications that use different message formats can be interconnected by specialized applications called message brokers, which apply a set of rules to convert a message from one format to another.
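The queue-based interface can be sketched in a few lines. This is a toy, in-memory model written for illustration and is not the API of any real product: the class name `QueueManager`, its methods, and the queue name used in the example are all hypothetical.

```python
from collections import deque


class QueueManager:
    """Toy queue manager holding named first-in, first-out message queues."""

    def __init__(self):
        self._queues = {}

    def put(self, queue_name, message):
        """Append a message to the named queue, creating the queue if needed."""
        self._queues.setdefault(queue_name, deque()).append(message)

    def get(self, queue_name):
        """Remove and return the oldest message, or None if the queue is empty."""
        queue = self._queues.get(queue_name)
        if not queue:
            return None
        return queue.popleft()

    def depth(self, queue_name):
        """Return the number of messages waiting in the named queue."""
        return len(self._queues.get(queue_name, ()))


if __name__ == "__main__":
    manager = QueueManager()
    manager.put("orders.inbox", b"order 1")
    manager.put("orders.inbox", b"order 2")
    print(manager.depth("orders.inbox"))
    print(manager.get("orders.inbox"))
```

The sender returns as soon as `put` completes, and the messages stay queued until the receiver chooses to call `get`. This mirrors the guarantee described above: delivery to the queue is the system's job, while reading and acting on the message is up to the receiver.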

Page 33: Interprocess communication

IBM’s MQ Series

• IBM’s MQ Series is a popular mainframe message oriented middleware system that has also been integrated into IBM’s WebSphere Web Server.

• Details can be found at the IBM Web Site.

• The text gives a brief summary of the functionality and operation of MQ Series.

Page 34: Interprocess communication

Data Streams

• There are a variety of approaches to stream oriented communications, which consist of ways to pass timing dependent information over persistent connections that are established for the purpose. The sockets exercise gives a good practical understanding of TCP streams. Other mechanisms include pipes and compiler based stream libraries.

Page 35: Interprocess communication

References

• George Coulouris, Jean Dollimore and Tim Kindberg, Distributed Systems, Concepts and Design, Addison Wesley, Fourth Edition, 2005

• Figures from the Coulouris text are from the instructor’s guide and are copyrighted by Pearson Education 2005

• Andrew Tanenbaum and Maarten van Steen, Distributed Systems, Principles and Paradigms, Prentice Hall, 2002

• Phil Dykstra, Gigabit Ethernet Jumbo Frames http://sd.wareonearth.com/~phil/jumbo.html