
Interprocess communications in the AN/BSY-2 distributed computer system: a case study


David Andrews a,*, Paul Austin b, Peter Costello c, David LeVan c

a Department of Electrical Engineering and Computer Science, University of Kansas, 415 Snow Hall, Lawrence, KS 66045, USA
b Xerox Corporation, Rochester, NY, USA

c Lockheed Martin Corporation, EP-7 RM 121, Electronics Park, Liverpool, NY 13221, USA

Received 1 March 2001; received in revised form 1 July 2001; accepted 1 September 2001

Abstract

This paper presents a case study of the design and implementation of the interprocess communications facility developed for the AN/BSY-2 distributed computer system, the computer system for the Seawolf submarine. The interprocess communications facility was identified as a critical design challenge for the AN/BSY-2 system, as the system incorporated new component and network technology along with new run time system services as well as new application programs. The requirements specified for the interprocess communications included aggressive performance as well as functional capabilities that had not been previously fielded. The AN/BSY-2 computer system comprises over 100 processors interconnected in multiple fault tolerant fiber optic rings. First, a description of the AN/BSY-2 distributed architecture is presented. The message passing semantics are then presented. A key feature of the IPC facility is its support for both synchronous and asynchronous communications based on logical addressing. Logical addressing within the AN/BSY-2 system supports point-to-point as well as group communications, and also supports the fault tolerant requirements of the system. The hardware developed to support fast real time messaging and fault tolerance is discussed. Finally, the low level semantics of a message transfer through the system are outlined. © 2002 Elsevier Science Inc. All rights reserved.

1. Introduction

This paper presents a case study of the design and implementation of the interprocess communications facility developed for the AN/BSY-2 system, the data and signal processing computer system for the Seawolf submarine. 1,2 The AN/BSY-2 system, deployed in 1997, represented a major advance in embedded systems design, incorporating functional capabilities typically not included in a closed embedded system. A block diagram of the system is shown in Fig. 1. The AN/BSY-2 system provides nearly 100 processing nodes interconnected on a fiber optic redundant ring network, high speed parallel and serial communications channels from external interfaces, communication channels to display consoles, stand alone processors, and SCSI interfaces. In addition to meeting timeliness requirements, the system includes additional capabilities to support autonomous operation, dynamic system resource management, and fault detection and reconfiguration. The AN/BSY-2 system represents one of the first large distributed real time multiprocessing systems using the message passing paradigm, and is the largest real time system ever successfully fielded. 3 Due to its size, complexity, and mission criticality, the AN/BSY-2 system has also been used to study the effect of Ada coding styles on execution performance. 4

While the AN/BSY-2 system was being developed in the late 1980s, standards for several open source message passing interfaces for non-real time systems were being defined (MPI, 1994; Saphir, 1993).


* Corresponding author. Tel.: +1-785-864-7743. E-mail address: [email protected] (D. Andrews).

1 http://www.dote.osd.mil/reports/FY97/navy/97ssn21.html
2 http://www.naval-technology.com/projects/seawolf/index.html
3 http://www.lockheedmartin.com/files3/lmtoday/9708/seawolf.html
4 http://www.sei.cmu.edu/publications/documents/92.reports/92.tr.032.html



These interfaces sought to make use of the most attractive features of previously existing message passing systems. As an example, the MPI standard was strongly influenced by the work at IBM T.J. Watson Research Center (Bala et al., 1992; Bangalore et al., 1994), Intel's NX/2 (Pierce, 1988), Express (Parasoft Corporation, 1992), nCUBE's Vertex (nCUBE Corporation, 1990), p4 (Butler and Lusk, 1992), and PARMACS (Bomans and Hemple, 1990). Standard libraries based on the message passing paradigm have also been developed for specific applications (Gupta and Banerjee, 1992). The largest portion of the early work in developing open source message passing interfaces and standards was not specifically targeted at real time systems. More recently, a working group has been developing MPI/RT (MPI/RT, 2000), a real time version of MPI. As implementations of message passing facilities became more efficient, the popularity of message passing as a scalable communications model continued to grow. The message passing model has now been adopted in a wide spectrum of application domains, including the newly emerging domain of micro-electro-mechanical systems (MEMS) based next generation networked sensor systems (Hill et al., 2000).

The message passing facility was identified by the Navy at the beginning of the program as a critical component within the system, due to aggressive performance and functional requirements that could not be met by any commercially available messaging software system. Developing the message passing facility and verifying that the system met all requirements were made more difficult because the AN/BSY-2 system was based on a custom hardware platform necessary to meet the overall system requirements. Further, the message passing facility would also be required by application programmers to support development of the 4 million source lines of Ada that would eventually run in the AN/BSY-2 system. Due to this need for simultaneous development of the hardware along with the run time system software, a scaled prototype system composed of commercially available components that functionally emulated the AN/BSY-2 system was specified to support functional development. The prototype system, along with the run time software, could then be used for prototyping and functional integration of application programs. Although this approach mitigated risk by providing a convenient development platform, final integration and verification of requirements would be performed on the actual hardware once the system was fielded.

2. Message passing interface overview

One of the first system challenges was to define the requirements for an augmentation to the commercially adopted Ada multitasking run time executive to support operation in a distributed, real time environment. The augmentation was defined to support asynchronous operation and co-ordination of distributed program tasks, and control of common resources distributed throughout the system. The augmentation also included the fault detection and recovery requirements of the system. The augmentation, combined with additional system resource management software, formed a system level middleware layer.

Fig. 1. BSY-2 system network topology.


The interprocess communication (IPC) facility was defined within the augmentation to support fault detection, reconfiguration, and routing of messages across multiple communications channel media (serial links, parallel links, fiber optic links), within the context of real time timeliness constraints. These requirements, typical of an embedded real time system, provided challenges for defining the operational semantics of the IPC facility. Functionally, the IPC facility would support the dynamic bandwidth requirements of the system, promote platform independence and component based software development, and support system reconfiguration in the presence of faults. The functionality of the message passing facility also included support for development of the multiple applications running asynchronously throughout the system. With over 500 software engineers and 4 million Ada source lines, it was critical that the operating system provide a semantically consistent API for developers running on the prototype and, ultimately, the fielded system. The asynchronous nature of the system required additional builtin functionality for non-invasive run time monitoring and post-mortem analysis. The semantics of the API also provided access to additional system state information useful for system integration.

The API provided a virtual channel abstraction, which hid network specific protocols, physical message routing, and the physical locations of disks, workstations, and processor nodes from the user. Fig. 2 shows application programs forming virtual connections using the message passing interface. A program running on a particular processor as shown in Fig. 1 may communicate with a program running on any other processor, requiring transmission of the message across multiple communications channels, each with its own specific device driver protocols. The message passing interface contains no information on these locations, and IPC guarantees delivery of the message. This separation of network specific information was necessary in order to support the requirements of the system outlined above. The additional functionality required to support a virtual programming model is discussed below.
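To make the abstraction concrete, the Ada package spec below sketches the shape such an API could take. Only the SEND function (reproduced in Section 2.1) comes from the paper; OPEN_CHANNEL, OPEN_PORT, SUSPEND, and all of the type definitions are illustrative assumptions, not the fielded interface.

   with System;
   package IPC is
      type LOGICAL_ID_TYPE  is new Natural;
      type CHANNEL_PTR_TYPE is private;
      type PORT_PTR_TYPE    is private;
      type MSG_SIZE_TYPE    is new Natural;
      subtype BUFFER_PTR_TYPE is System.Address;
      type TX_PTR_TYPE      is private;

      --  Open a logical channel to send, or a logical port to receive
      --  (Section 2.1); group opens are omitted from this sketch.
      function OPEN_CHANNEL (ID : LOGICAL_ID_TYPE) return CHANNEL_PTR_TYPE;
      function OPEN_PORT    (ID : LOGICAL_ID_TYPE) return PORT_PTR_TYPE;

      --  Asynchronous send, as specified in Section 2.1; returns a
      --  handle used to poll completion status.
      function SEND (Channel_ID : in CHANNEL_PTR_TYPE;
                     MSG_SIZE   : in MSG_SIZE_TYPE;
                     Buffer_Ptr : in BUFFER_PTR_TYPE) return TX_PTR_TYPE;

      --  Block the calling Ada task until the transfer completes,
      --  synthesizing a synchronous transfer (Section 2.3).
      procedure SUSPEND (Tx : in TX_PTR_TYPE);
   private
      type CHANNEL_PTR_TYPE is new Natural;
      type PORT_PTR_TYPE    is new Natural;
      type TX_PTR_TYPE      is new Natural;
   end IPC;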

2.1. Logical addressing

Programs open logical channels to send messages, and logical ports to receive them. The logical channels and ports are opened by calls from the user program to the IPC facility. A program may open and close individual channels and ports at any time during program execution. Group opens are also provided to minimize the overhead of opening multiple channels and ports sequentially. The IPC facility registers each open request on the program's behalf with a global system resource manager. The global resource manager maintains a routing table that maps logical IDs to physical nodes within the system. Appropriate information is passed back from the global system resource manager to the IPC facility on all nodes requiring updates based on the addition or deletion of a node for a logical ID. This information is not required by the application program, and is maintained by the IPC facility transparently to the user.

Basing the channel and port addresses on logical identifiers provides several advantages to the system. First, group operations are easily enabled. Programs may be dynamically included in or excluded from a logical group by opening or closing a port on the appropriate logical channel. Second, and critically important, the logical channels support dynamic reconfiguration in the presence of faults.

Fig. 2. Abstract programming model.


The system resource manager will relocate programs unreachable due to faults or failures to a healthy node. Healthy programs may continue sending and receiving on the logical channel while the relocation is being performed. The IPC facility, network, and system resource managers provide this service transparently to the application programs. Logical IDs also support application debugging and integration, requiring no change to application code for operation on either the test bed system or the real system. The base send API is shown below; variants of the base call are also provided.

   SEND (Channel_ID : in CHANNEL_PTR_TYPE;
         MSG_SIZE   : in MSG_SIZE_TYPE;
         Buffer_Ptr : in BUFFER_PTR_TYPE) return TX_PTR_TYPE;

The channel ID passed into the send is the logical ID. In addition to supporting the fault relocation requirements of the system, transfers based on logical IDs also provide support for platform independent point-to-point and group transfers. This versatility is shown in Fig. 3, where the mapping of logical ID L1 as a port to receive messages in program P2 is transparent to program P1. The physical routing between any two programs, such as P1 to P2 in the example, may occur over multiple communications media. If program P2 fails or terminates, the IPC facility will update all programs sending and receiving on the affected channels.
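A minimal usage sketch follows, assuming the hypothetical IPC package spec given in Section 2 above; the procedure name, buffer, and logical ID value are illustrative only.

   with IPC; use IPC;
   procedure Send_Demo is
      Data    : String (1 .. 1_024) := (others => 'x');
      Channel : CHANNEL_PTR_TYPE;
      Tx      : TX_PTR_TYPE;
   begin
      Channel := OPEN_CHANNEL (ID => 100);   --  assumed logical ID; whether one
                                             --  program or a group receives on it
                                             --  is transparent to the sender
      Tx := SEND (Channel_ID => Channel,
                  MSG_SIZE   => Data'Length,
                  Buffer_Ptr => Data'Address);
      --  processing continues here while the transfer proceeds;
      --  calling SUSPEND (Tx) at this point would make the send synchronous
   end Send_Demo;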

2.2. Collective/group operations

Logical channels define the collective and group operations throughout the system. A logical ID will support multiple senders and a single receiver, multiple receivers and a single sender, or multiple senders and multiple receivers. This collective and group operation capability supports user level fault tolerance, and reduces the processing time required by an application program to send the same message to multiple receivers. This capability also provides the system resource manager greater flexibility in routing without changes to actual application programs.

2.3. Synchronous and asynchronous service

All message sends and receives are, by default, asynchronous operations in the system. This default allows user programs to initiate message transfers and continue processing while the operating system and network transfer the message. The asynchronous transfer format minimizes the overhead penalty of sending and receiving messages, allowing the user program to continue work on user functions while periodically checking status. For applications that perform large numbers of asynchronous sends, the user defines a semaphore and passes it through the send call. As the status of each send is returned and updated, the IPC facility updates the semaphore. In this fashion, the application program need only check the single semaphore, and may elect to check the individual status of any number of completed events. This simple protocol reduces the status "polling" time for these types of applications.

Synchronous transmissions are synthesized by executing a separate suspend function after executing the send or receive. When suspend is executed, the operating system suspends further execution of the Ada task until a transfer complete status has been returned. When the task is suspended, the operating system initiates execution of the next program in the ready to run queue, as shown in Fig. 4.
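The semaphore idea can be rendered as follows; this is a modern-Ada stand-in (the fielded system predates Ada 95 protected types), with all names assumed. IPC signals the counter once per completed send, and the application polls a single object instead of one status per transfer.

   protected type Completion_Semaphore is
      procedure Signal;                   --  IPC calls this per completed send
      function Completed return Natural;  --  application polls one counter
   private
      Count : Natural := 0;
   end Completion_Semaphore;

   protected body Completion_Semaphore is
      procedure Signal is
      begin
         Count := Count + 1;
      end Signal;
      function Completed return Natural is
      begin
         return Count;
      end Completed;
   end Completion_Semaphore;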

2.4. Virtual channel/datagram protocols

Both datagram and virtual channel protocols are supported by the IPC facility. Message transfers using the datagram protocol in the BSY-2 system are initiated by the IPC facility on the sending node. The IPC facility initiates the transfer on behalf of the sending application by sending the message to the network processor node. At this point no allocation of buffers or intermediate hardware resources has occurred. While the application program continues execution, the message is transferred through the network, with buffers and resources allocated dynamically for each next intermediate destination. Multiple packets from the same message may exist simultaneously throughout the network, each dynamically allocated resources as it moves.

Virtual channel transfers initiate identically to datagram transfers: the IPC facility initiates the transfer on behalf of the sending application by initiating transfer of the message to the network processor node. However, before the message is sent from the sending network processor, all intermediate buffers and resources are pre-allocated. Virtual channels are used only in limited applications, as they have the potential to starve intermediate network resources for large messages.

Fig. 3. Logical IDs and programs.

Fig. 4. Asynchronous and synchronous message transmissions.

2.5. Message segmentation

IPC message sizes can range from a few bytes to the size of the largest buffer in the system (approximately 2 Mbytes, determined prior to system implementation). The network resources that limit the size of a single transfer are maintained transparently to the user applications. The IPC facility accepts messages of all sizes, and segments each message based on the resources available for the specific communication hardware and protocols. The IPC facility maintains status and guarantees the transfer of all segments of each message.
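The segment count itself is simple ceiling arithmetic. The sketch below assumes a hypothetical per-channel limit of 4,096 bytes, since the actual limits were specific to each communication medium.

   package Segmentation is
      MAX_SEGMENT_BYTES : constant := 4_096;  --  assumed channel limit
      function Segment_Count (Message_Bytes : Positive) return Positive;
   end Segmentation;

   package body Segmentation is
      function Segment_Count (Message_Bytes : Positive) return Positive is
      begin
         --  ceiling division: a 10_000 byte message yields 3 segments
         return (Message_Bytes + MAX_SEGMENT_BYTES - 1) / MAX_SEGMENT_BYTES;
      end Segment_Count;
   end Segmentation;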

2.6. Fault tolerant message delivery

The IPC facility must guarantee the integrity of the message during transfer. A hierarchical fault detection approach was implemented to provide overlapping detection of transfer errors. The hierarchy uses both builtin hardware support and additional software techniques to provide the error detection coverage. First, a polynomial based cyclic redundancy check (CRC) (MPI/RT, 2000) is computed on each segment and segment executive header transferred between the application buffer and the network processor. Before each executive header is transferred, a simple checksum is included in the header. The final computed CRC is placed in a separate control message sent to the network processor. The network processor transfers the control message containing the CRC from the sending node to the receiving node within the message that initiates the transfer. During the subsequent processing of the received message, the IPC facility in the receiving node first checks the integrity of the header by verifying the checksum, and then initiates the transfer of the message body into a user buffer. A CRC is again computed on the segment and executive header during the transfer from the processor node into the application memory, and the IPC facility compares the CRCs to determine whether any errors occurred. The ring network provides further builtin coverage at the packet level. Any errors detected are reported back to the sending IPC facility for subsequent error processing.

Message failures can also occur due to other error conditions throughout the system, such as hardware faults or application program faults. The IPC facility accounts for these types of errors by setting a watchdog timer for each transfer. If a message is sent and the watchdog timer expires before a response is received, the IPC facility initiates error processing.

In the event a failure is detected, the IPC facility typically initiates a resend of the segment. The IPC facility provides a default maximum of three retries per segment, per destination. In the event of a group transfer, individual segments may be retransmitted to specific destinations. For a subset of faults, the IPC facility can resend and complete the transfer transparently to the application program. However, if the IPC facility is unable to successfully complete the transfer, an error status is returned to the application program, at which time the application program may elect to resend the message. If elected, the message is resent only to those destinations that reported failures. Flow control is also implemented in order to balance aperiodic send and receive rates. If all resources (receive buffers) are currently being used by a receiving program on a given logical channel, the message transfer is suspended until resources are freed.
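A sketch of the per-segment retry bookkeeping implied by this policy follows; all names are assumptions, and the real facility drives this logic from the RCB status fields described in Section 4.1.

   package Retry_Policy is
      MAX_RETRIES : constant := 3;   --  default retry limit from the text
      type Segment_Status is (Pending, Succeeded, Failed);
      procedure Note_Error (Retries : in out Natural;
                            Status  : in out Segment_Status);
   end Retry_Policy;

   package body Retry_Policy is
      procedure Note_Error (Retries : in out Natural;
                            Status  : in out Segment_Status) is
      begin
         if Retries < MAX_RETRIES then
            Retries := Retries + 1;  --  resend this segment transparently
            Status  := Pending;
         else
            Status  := Failed;       --  give up; error status goes back to
         end if;                     --  the application for an optional resend
      end Note_Error;
   end Retry_Policy;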

3. IPC hardware support

The AN/BSY-2 computer system included builtin hardware support for message passing, including parallel DMA channels, mailbox interrupts between the network processor and the application processor, dual ported memory, and fast exception handling. Low level hardware support was also included for fault location and detection, as well as fast, efficient processing of the variable message sizes input from real time signal processing hardware.

Fig. 5 shows the builtin low level hardware support for IPC. As shown in Fig. 5, each communication channel (fiber optic, serial, and parallel) contains a dedicated DMA device to provide fast block transfers of data between user buffers and communication devices. The node architecture also includes a private bus between the CPU and dual port memory. This dual bus organization allows IPC to set up the DMA transfer across the global bus and return control back to the application program. The application program can continue instruction execution out of the CPU cache, and may continue to access data from the dual port memory using the local bus. The dual bus organization and multiport memory minimize busy waiting due to bus arbitration during the DMA message transfer. Mailbox interrupts are provided for communications channels to invoke IPC. Fig. 6 shows an expanded view of the interface between the network processor's multiport memory and the global bus. The network processor node contains send and receive work queues, and dual port memory. The dual port memory is separated into two buffer pools, a send buffer pool and a receive buffer pool.

3.1. Send/receive queues

IPC and the network communicate via communications control messages (CCMs) placed into the send and receive work queues. A CCM is shown in Fig. 7. For the send queue, IPC maintains the next free entry pointer, and the network processor maintains the next to service pointer. IPC places the next CCM entry into the queue, and sets the valid bit. Setting the valid bit causes an interrupt to the network processor. The network processor can then service all sequential requests with a valid bit set. When the processor reaches an entry with the valid bit cleared, the top of the work queue has been reached, and the network processor terminates service. For the receive queue, the same protocol exists, but the roles of the network processor and IPC are reversed. The network processor places incoming requests into the queue, and writes into the IPC mailbox, causing an interrupt to IPC. IPC then checks the receive queue for the next valid entry. Each work queue is maintained as a circular buffer.

Fig. 5. Processor node architecture.

Fig. 6. IPC-network interface.

Fig. 7. CCM and executive header layout.


This protocol allows both the network processor and IPC to operate simultaneously out of the same queue. At this level, both the IPC facility and the network processor pull work requests from the queue on a first come, first served basis, with no sorting or arranging of work requests based on priorities. Both the IPC facility and the network processor may then choose to insert requests into local scheduling queues based on priorities. This approach simplified the implementation of the interface work queues.
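The valid-bit handshake can be sketched as follows. Queue depth, field layout, and subprogram names are assumptions; the real queues live in multiport memory, and the mailbox interrupt raised by setting the valid bit is reduced here to a comment.

   procedure Queue_Demo is
      QUEUE_SIZE : constant := 32;            --  assumed queue depth
      type CCM is record
         Valid   : Boolean := False;
         Op_Code : Natural := 0;              --  remaining CCM fields omitted
      end record;
      Queue     : array (0 .. QUEUE_SIZE - 1) of CCM;
      Next_Free : Natural := 0;               --  producer pointer (IPC)
      Next_Serv : Natural := 0;               --  consumer pointer (network processor)

      procedure Post (Item : CCM) is          --  IPC side
      begin
         Queue (Next_Free)       := Item;
         Queue (Next_Free).Valid := True;     --  publish; raises the mailbox interrupt
         Next_Free := (Next_Free + 1) mod QUEUE_SIZE;
      end Post;

      procedure Service is                    --  network processor side
      begin
         while Queue (Next_Serv).Valid loop
            --  ... act on Queue (Next_Serv).Op_Code here ...
            Queue (Next_Serv).Valid := False; --  consume; slot becomes reusable
            Next_Serv := (Next_Serv + 1) mod QUEUE_SIZE;
         end loop;                            --  cleared valid bit = top of queue
      end Service;
   begin
      Post ((Valid => False, Op_Code => 1));  --  Post sets the valid bit itself
      Service;
   end Queue_Demo;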

4. IPC implementation

The critical design features of the IPC facility included the organization of run time data structures to support the required functionality, efficient run time allocation/deallocation of these data structures, implementation of fast device drivers, and utilization of low level hardware resources supporting transfer of the data. Designing and implementing the data structures was a critical step in the overall design of the IPC facility. The real time requirements of the system placed hard timing constraints on the allowable overhead processing time for allocation/deallocation of the data structures, and for searching and updating the status in the data structures. Where possible, data structures were pre-allocated in pools to minimize run time overhead and organized to minimize search and update times, and quick association fields were included for accessing particular fields within more complicated, hierarchical structures. The organization of the data structures is discussed below.

4.1. Data structure design

Applications first execute an open_channel command that registers the program with the system resource manager. Information is passed back to IPC from the system resource manager during the execution of the open_channel command, causing the creation of a communications control block (CCB) and request control blocks (RCBs), shown in Figs. 8 and 9, respectively.

The AN/BSY-2 system requirements included multi-destination sends, and multiple messages in flight on the same logical ID. The segments of a particular message are routed independently through the network based on a dynamic routing protocol, and arrive at the destination in no particular order. This asynchrony requires the IPC facility to properly reconstruct the original message on the receiving node.

For each send operation performed by the application program, a free RCB is popped from the stack and associated with the single invocation. In the CCB shown above, 31 simultaneous messages can be sent asynchronously on the same logical channel. After a particular send has been completed, the RCB associated with the send is freed by placing its pointer back on the free stack. The organization of an RCB is shown in Fig. 9.

The RCB data structure keeps the updated status of the transfer. For each message sent over a particular communication medium, the RCB also keeps track of the number of segments sent in the message, and the status of each segment. The protocol defined for the system allows an individual segment to fail only a certain number of times, and the RCB keeps a failure tally for each segment. When the maximum number of retries for a failed segment is reached, the transmission to that particular destination is flagged as failed. In the case of a single destination, the complete transfer is flagged as failed. However, for multiple destination messages, the message may have completed successfully to other destinations. The IPC facility maintains the status of each individual destination in the case of a multidestination send, and makes this information available to the application programs through a return status. Application programs can invoke a resend of the message if the status indicates a failed destination, causing the IPC facility to resend only to those destinations that reported failure. The message is not resent to destinations that reported success. This information can also be used by the network resource manager in evaluating the system status.
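A structural sketch of an RCB and its free stack follows. The bounds and field names are assumptions (Fig. 9 shows the actual layout); the 31-entry pool mirrors the CCB limit described above.

   procedure RCB_Sketch is
      MAX_SEGMENTS     : constant := 64;      --  assumed bound
      MAX_DESTINATIONS : constant := 16;      --  assumed bound
      type Xfer_Status is (Pending, Succeeded, Failed);
      type Tally_Array is
        array (1 .. MAX_DESTINATIONS, 1 .. MAX_SEGMENTS) of Natural;
      type Status_Array is array (1 .. MAX_DESTINATIONS) of Xfer_Status;

      --  One RCB tracks one in-flight send: segment count, per-segment
      --  retry tallies, and per-destination completion status.
      type RCB is record
         Segment_Count : Natural      := 0;
         Retries       : Tally_Array  := (others => (others => 0));
         Destinations  : Status_Array := (others => Pending);
      end record;
      type RCB_Ptr is access RCB;

      --  The CCB permits 31 messages in flight per logical channel, so a
      --  pool of 31 pre-allocated RCBs backs each channel.
      Free : array (1 .. 31) of RCB_Ptr := (others => new RCB);
      Top  : Natural := 31;

      function Pop return RCB_Ptr is
      begin
         Top := Top - 1;
         return Free (Top + 1);
      end Pop;

      procedure Push (R : RCB_Ptr) is
      begin
         Top := Top + 1;
         Free (Top) := R;
      end Push;

      Current : RCB_Ptr;
   begin
      Current := Pop;    --  associate an RCB with this send invocation
      --  ... transfer proceeds; IPC updates Current.all as statuses return ...
      Push (Current);    --  send complete: RCB pointer returns to the pool
   end RCB_Sketch;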

Fig. 8. CCB data structure.

Fig. 9. RCB data structure.


5. IPC low level programming

IPC transfers can be broken into three separate stages. The first stage is initiated by the IPC call from the application program to the operating system. In this first stage, the message is broken into segments, and the individual segments are sent over the appropriate communications channel. Fig. 10 shows the transfer of a message to multiple destinations over the fiber optic ring. As shown in Fig. 10, multiple segments of the same message can exist in the network processor's multiport memory concurrently. Once an individual segment has been transferred from the user buffer into the network multiport memory, it may be transferred through the network independently of all other segments. In the case of multiple destination messages, the individual segments are transferred once from the application's send buffer to the multiport memory. The network performs multiple transfers of the same segment to each destination specified by IPC.

The second stage is the actual transfer of the message from the sending network processor's multiport memory into the receiving node's multiport memory. The third stage is the transfer of the message segments from the receiver's network processor into the receive buffer. As the segments arrive, the receiving IPC must reconstruct the message using the segment number information contained in an executive header included at the top of each segment.

Each of these three stages can operate asynchronously and concurrently for the multiple segments going to multiple destinations. IPC processes the transfer using a program initiated, interrupt driven exception routine. Execution of IPC in an interrupt routine allows each transfer stage shown in Fig. 10 to occur asynchronously, avoiding time-consuming busy waiting or constant status polling of all segment transfers and status updates. A simplified version of the state machine organization of the exception routine is shown in Fig. 11.

If the IPC work queue is empty, or all requests have been serviced and IPC is idle waiting on return status, placing an entry into the work queue initiates the exception processing routine. If the work queue contains requests in progress, the request is queued. Requests are generally handled in FIFO order; however, certain higher priority requests can be inserted into the queue and processed relative to their priority. In the absence of any priority ordering, queued receive requests take precedence over queued send requests. Control returns back to the application program after the request is placed in the queue. This implementation provides a fast return to the user program, minimizing the amount of the time line taken by IPC for sending or receiving messages.
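The queueing policy just described amounts to a simple ordering rule, sketched below with assumed type names: explicit priority wins, a queued receive precedes a queued send at equal priority, and requests that tie on both criteria keep arrival (FIFO) order.

   procedure Ordering_Sketch is
      type Request_Kind is (Receive_Req, Send_Req);
      type Request is record
         Kind     : Request_Kind;
         Priority : Natural := 0;            --  0 = ordinary FIFO traffic
      end record;

      function Outranks (A, B : Request) return Boolean is
      begin
         if A.Priority /= B.Priority then
            return A.Priority > B.Priority;  --  priority insertion, when used
         end if;
         return A.Kind = Receive_Req and then B.Kind = Send_Req;
      end Outranks;

      Rx : constant Request := (Kind => Receive_Req, Priority => 0);
      Tx : constant Request := (Kind => Send_Req,    Priority => 0);
   begin
      pragma Assert (Outranks (Rx, Tx));     --  receive precedes send
   end Ordering_Sketch;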

Fig. 10. IPC transfer stages.


5.1. Message transfer sequence

The protocol for transferring messages between IPC and the network processor is outlined in Fig. 12. Initially, a segment from the send message buffer is transferred into the network multiport memory (1 in Fig. 12). This transfer is accomplished using the network processor DMA unit shown in Fig. 5. IPC sets up the transfer and returns control back to the user. The application program continues to execute while the DMA transfer takes place. Once the DMA transfer has been completed, the DMA causes an interrupt that again kicks off the IPC exception routine. Next, an executive header is DMA'ed into the top seven long word entries in the multiport memory buffer (2 in Fig. 12). The executive header layout is shown in Fig. 7. The executive header contains information required by the receiving IPC to reconstruct the message. The executive header does not contain any information required by the network, and is passed through the network to the receiving IPC facility as part of the message body. The executive header forms a virtual communications link at the session/transport layer shown in Fig. 2.

The CRC check is performed on the message and executive header. The IPC facility places the CRC into a CCM for that transfer, and writes the CCM into the send queue (3 in Fig. 12). The IPC facility then updates the RCB status, indicating that a segment was sent to a single destination, and checks whether more segments are available for transfer, or more destinations are specified for the segment, before leaving the exception routine. For each destination specified in the logical ID, a CCM is transferred into the send work queue. If more segments are available for transfer, the IPC facility sets up the DMA and exits the routine.

The exception routine is next entered when the network processor writes into the IPC mailbox, causing an interrupt. The IPC facility then transfers a CCM from the receive queue into the processor (6 in Fig. 12) and checks the op_code. If the op_code specifies an incoming message, a DMA transfer of the executive header from multiport memory (7 in Fig. 12) is initiated. The executive header must be transferred first, in order to determine the destination logical ID, and which segment of the message is pending in the multiport memory. The receive data structures are updated, and a DMA transfer is set up to transfer the message from multiport memory into the receive buffer (8 in Fig. 12). When the segment has been transferred, IPC compares the computed CRC with the CRC sent in the CCM. If a discrepancy exists between the CRCs, the transfer of the segment is marked failed, and the status is returned to the sender for retry. If the CRCs match, a success status is returned. In either event, the status of the transfer is returned back to the sender in a CCM sent from the receiver (9 in Fig. 12). The network transparently passes the status back to the sending IPC facility in a CCM (10 in Fig. 12). The sending network processor writes into the IPC mailbox, causing the IPC facility to enter its exception routine again. The CCM is transferred back into the processor (11 in Fig. 12), where the op_code specifies a returned status. If the status returned is a success, the IPC facility checks the work queue for more segments or messages to send. If no new messages or segments are pending, control is returned back to the application program currently executing.
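The structure of this mailbox-driven dispatch is sketched below; the op-code names and the handler shape are assumptions, with the numbered steps from Fig. 12 noted as comments.

   procedure Handle_CCM_Sketch is
      type Op_Code_Type is (Incoming_Message, Returned_Status);
      Code : Op_Code_Type := Incoming_Message;  --  read from the CCM (assumed)
   begin
      case Code is
         when Incoming_Message =>
            --  (7)  DMA the executive header first: it carries the logical
            --       ID and segment number needed to place the body
            --  (8)  DMA the body into the receive buffer, then compare the
            --       computed CRC against the CRC carried in the CCM
            --  (9)  return success or failure to the sender in a CCM
            null;
         when Returned_Status =>
            --  (11) on success, check for further segments or messages;
            --       if none are pending, return to the application
            null;
      end case;
   end Handle_CCM_Sketch;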

6. Conclusion

This paper presented a case study of the design and implementation of the real time interprocess communications (IPC) facility developed for the AN/BSY-2 system. The IPC facility includes a platform independent message passing interface that supports the unique requirements of a real time distributed system.

Fig. 11. IPC exception routine operation.

Fig. 12. Message transfer protocol.


In addition to timeliness issues, the system requirements also included support for fault tolerance. The IPC facility allows programs to form virtual channels, separating the network specific hardware and device driver protocols from the application programmer. Group operations are also conveniently defined on logical IDs, allowing programs to dynamically register with and depart from any group. The low level hardware support designed for fast message transfers and fault detection was also presented. The hardware support included DMA devices, demand driven interrupts and mailboxes, and dual ported memory. The protocols defined for processing incoming and outgoing messages minimized the overhead associated with processing the requests. The first system was successfully deployed in 1997.

References

Bala, V., Kipnis, S., Rudolph, L., Snir, M., 1992. Designing efficient, scalable, and portable collective communication libraries. Technical report, IBM T.J. Watson Research Center.

Bangalore, P.V., Doss, N.E., Skjellum, A., 1994. MPI++: issues and features. In: OON-SKI'94.

Bomans, L., Hemple, R., 1990. The Argonne/GMD macros in FORTRAN for portable parallel programming and their implementation in the Intel iPSC/2. Parallel Computing 15, 119–132.

Butler, R., Lusk, E., 1992. User's Guide to the p4 programming system. Technical Report TM-ANL-92/17, Argonne National Laboratory.

Document for the Real-Time Message Passing Interface (MPI/RT-1.0), March 6, 2000.

Gupta, M., Banerjee, P., 1992. A methodology for high-level synthesis of communication on multicomputers. In: Proceedings of the ACM International Conference on Supercomputing, Washington, DC.

Hill, J., Szewczyk, R., Woo, A., Hollar, S., Culler, D., Pister, K., 2000. System architecture directions for networked sensors. In: Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS IX, vol. 35, pp. 93–104.

MPI Users Guide, Draft Proposal, 1994.

nCUBE Corporation, 1990. nCUBE 2 Programmer's Guide r2.0, December.

Parasoft Corporation, 1992. Express User's Guide, version 3.2.5. Pasadena, CA.

Pierce, P., 1988. The NX/2 operating system. In: Proceedings of the Third Conference on Hypercube Concurrent Computers and Applications. ACM Press, New York, pp. 384–390.

Saphir, W., 1993. Comparison of Communication Libraries: NX, CMMD, MPI, PVM. Computer Sciences Corporation, NAS User Seminar, November 30.

Dr. Andrews joined the faculty at the University of Kansas in 2000 and is currently an associate professor. Prior to joining the faculty at the University of Kansas, Dr. Andrews worked for General Electric Company, and was on the faculty at the University of Arkansas. Since becoming a faculty member at the University of Kansas, Dr. Andrews has been focusing his research on embedded systems and real time architectures. He received his BSEE and MSEE degrees from the University of Missouri-Columbia, and the Ph.D. degree from Syracuse University. He is a senior member of the IEEE.
