
Design Adaptability

Adaptable Software for Supercomputers

Thomas Schwederski and Howard Jay Siegel, Purdue University

Software used to control and program reconfigurable supersystems must efficiently exploit the hardware flexibility available. If not, the system does not fulfill its potential.

Steady increases in computer hardware performance show no sign of slowing. Today's supercomputers such as the Cray-1, 1 Cyber 205, 2 and MPP (Massively Parallel Processor) 3,4 are capable of a few hundred million floating-point operations per second. However, even these computers do not always satisfy the speed requirements of the scientific and defense communities. Consequently, researchers are now exploring new supersystem architectures, such as data flow computers and reconfigurable computers that restructure the organization of their hardware to adapt to computational needs. If a supersystem can be reconfigured, it is more likely to execute efficiently tasks that previously required a set of dedicated systems. Reconfiguration is especially important if the programs executed by the supersystem have widely differing computational requirements. Computations needed to control some sophisticated weapon systems are of this variety. 5

The utilization and performance of any supersystem depend strongly on the software available for it. Necessary software includes system software, such as compilers and operating systems, and application software. Designing the system software to make efficient use of complex supercomputers is difficult. It is further complicated if the supercomputer is reconfigurable. The application programs for such systems are typically very large and complex, such as those for mission-critical military tasks. Thus, two problems facing system designers are

(1) the development of system routines to efficiently control system reconfiguration, and

(2) the writing of complex application programs.

Software used to control and program reconfigurable supersystems must efficiently exploit the hardware flexibility available. If not, then the system does not fulfill its potential. Frequently in the past, customized application software packages were developed for every new computer, often resulting in programs that could be executed on only one type of machine. This machine dependence leads to duplication of effort since the same algorithms have to be coded repeatedly, thus increasing software cost. It is therefore important to look for solutions to these two problems in order to make the efficient use of reconfigurable supersystems practical. We consider possible solutions to these problems.

Adaptable software

Adaptable software offers such a solution. It encompasses all software that manages reconfigurable computer systems and all software that is itself adaptable to various computers. As noted in the Guest Editors' Introduction, compiler and operating system software used to manage a reconfigurable computer will be termed reconfiguration software. Machine-independent software is termed retargetable software.

Reconfiguration software is essential for supersystems with reconfigurable architecture.


The actual reconfiguration of the hardware must be handled by the operating system of a reconfigurable system to shield the user from details of the hardware implementation and to avoid erroneous hardware settings. The operating system can be directed to perform reconfiguration either explicitly by the user or implicitly by software tools that analyze a given problem automatically, determine the optimum hardware configuration for the problem, and supervise program execution. The explicit approach gives the programmer flexibility in choosing the system configuration needed for a particular algorithm, while the implicit approach reduces programming effort and speeds up program development. Kartashev and Kartashev describe a system that implements the implicit approach for Fortran programs. 6 However, currently no automatic approach exists for all levels of algorithm design and execution. In fact, an exclusively automatic approach may not be desirable for all computer architectures and problems. In some cases explicit reconfiguration enhances efficiency. For example, many highly efficient algorithms for specific computational tasks and specific computer architectures have been developed in the past. The efficiency and speed of such hand-honed algorithms cannot be rivaled by automatically generated programs. For such cases, it is desirable for an adaptable system to assume the architecture for which such an algorithm was designed. To do this, it must provide the user with the tools to easily execute this algorithm on the reconfigurable system.

The life cycle of software can be prolonged if it is made retargetable. Only programs that do not use machine-specific instructions will be retargetable, since any machine-specific instruction needs to be changed when the program is ported to a different computer. Thus, only programs written in high-level languages will be retargetable.

Similar problems arise when porting programs that use explicit operating system calls. Such calls are inherently machine dependent, so a program that uses them is not retargetable. Furthermore, if the operating system is changed or exchanged, the program will have to be rewritten, or at least modified, to still execute on the same machine.

Machine-independent software can survive hardware or operating system changes more easily. The elimination of changes due to hardware or operating system modifications automatically decreases the overall cost of a software system. If a program is portable, it can be used on a wide variety of computers, which eliminates duplicate programming efforts and further reduces cost. Since a machine-independent program is expected to remain useful longer, it is feasible to put more development effort into a program. This results in better, more efficient programs.

We have introduced two different types of adaptable software: retargetable and reconfiguration software. Most reconfiguration software will not be retargetable, since it has to be tailored towards machine-specific capabilities. In some cases, components of a reconfiguration software system could be retargetable, such as a reconfiguration control strategy coded in a high-level language. Thus, reconfiguration and retargetable software may overlap.

Reconfiguration software

The reconfigurability of a supercomputer can take many different forms. Such a computer could consist of a number of processors with a certain numeric precision. It might be possible to combine several processors to achieve higher precision, as in the reconfigurable varistructure array processor, 7 DCA, 8,9 and TRAC. 10 A supersystem could be switched between the single instruction stream-multiple data stream, or SIMD, and multiple instruction stream-multiple data stream, or MIMD, modes of operation. 11

It could also be partitionable and able to form subgroups of processors. The subgroups could operate in SIMD or MIMD mode, as in PASM. 12 A supercomputer might have even more states, such as array processor, pipeline computer, and multiprocessor, as in DCA 13 and Reddi and Feustel's machine. 14 The reconfiguration capability might be static, so that the system has to be stopped in order to perform reconfiguration. Alternatively, it could be dynamic, so that architectural states can be switched during program execution.

Clearly, static reconfiguration has limited use in mission-critical supersystems, since these must perform their tasks at high speed and without interruptions.

The processors of a supersystem must communicate with each other to share data and results. This communication can be handled through shared memory or through processor-to-processor interconnection hardware. 15 In shared-memory systems the processors may be connected to the shared memories through an interconnection network. Examples include C.mmp (which used a crossbar network), 16 CHoPP (which proposed a hypercube structure), 17 and the NYU Ultracomputer (which will use a multistage network). 18 For processor-to-processor communication a variety of types of networks may also be employed. Hardware interconnections can be fixed, as are the non-edge connections in the MPP, 3,4 where processors communicate only with their four nearest neighbors. If a processor needs to talk to a processor that is not its nearest neighbor, multiple data transfers are necessary. Reconfigurable processor-to-processor interconnection networks can allow communication among a large subset or all of the processors, as in PASM, 12 where processors use a multistage cube network. Software running these systems must make efficient use of the interconnection capabilities.

Thus, to take optimal advantage of the system, reconfiguration software needs to know about the hardware's capabilities. It must therefore be tailored towards the hardware on which it will run. If the reconfiguration software is retargetable, the machine-specific hardware description would have to be entered at compile or run time, since it could not be part of the retargetable software.

Reconfiguration of a supersystem can be handled by the programmer through explicit use of reconfiguring instructions, or implicitly by system software that automatically generates reconfiguring instructions without the need for programmer intervention. Hybrids of these methods are also possible.

Explicit reconfiguration. If instructions for reconfiguring the system are available to the user, the programmer can take explicit advantage of various architectural states. With careful program analysis and good coding, this results in programs gaining the most advantage from the reconfiguration capability.


Figure 1. Flowchart for a parallel program using Fork and Join statements.

Programs for each architectural state, like array, multiprocessor, pipeline, etc., must be written in a language suited for this state, and the language must include constructs for reconfiguration. High-level languages for MIMD and SIMD machines give good examples of the specific requirements. Consider an MIMD computer. If the number of processors for the given task is fixed, a standard assembly or high-level language such as Pascal, C, or Fortran can be used to program the task. The programmer has to decide which processor handles what particular task, and write the appropriate programs. The programs are loaded into the processors needed for the task. The operating system of the machine starts the task and coordinates activities. Many applications, especially in image processing, require a set of processors that all execute the same program. An example is an image processing algorithm in which each processor finds features in a part of the image. In this case, each processor has its own set of data, but all processors execute the same program. Only one program has to be coded, independent of the number of processors.

Depending on the task, this may involve just MIMD operations, just SIMD operations, or both. 19

If the number of processors varies during task execution, the programming language has to be expanded and must include special statements to allocate and deallocate processors. Conway 20 proposes the use of Fork and Join instructions. Whenever a Fork is encountered, a new process is created that proceeds independently from the original process. The new process can run on another processor. If no processor is available, the new process can be queued until a processor becomes available. If multiprogramming capability is available, as in CHoPP, 17 two or more processes can run on the same processor in a timesliced fashion. Conway proposes three kinds of Fork: Fork A, J, N; Fork A, J; and Fork A. In this implementation, a counter keeps track of the number of concurrent processes. Fork A generates a new process; the old process continues to execute the instruction following the Fork, and the new process starts at instruction A.

The Fork A, J, N instruction sets a counter in memory location J to the value N, then executes a Fork A. The programmer must match the number of Forks in his program with the count specified in the Fork A, J, N, since the Fork A alone does not affect the counter. Fork A, J increments counter J by one and then executes a Fork A. This instruction is needed if the decision to fork is made at run time. The Join statement merges concurrent instruction streams. When a Join J is encountered, the counter at location J is decremented. The processor that decrements the counter to zero executes the statements following the Join. All other processors are released and become available for other computations.

Figure 1 illustrates the use of the three different Fork statements and the Join statement. Process 1 executes a Fork A, 100, 3, which sets the counter in memory location 100 to 3 and starts a concurrent process (Process 2). Then Process 1 executes a Fork B, 100, which increments the counter at location 100 by one and starts another process, Process 3. Meanwhile, Process 2 executes a Fork C, generating a fourth process, Process 4. Thus, four concurrent processes are now active. When a process encounters a Join 100, it decrements the counter. The process continues if the counter reaches zero; otherwise, it ceases. Fork and Join are examples of two explicit reconfiguration commands.
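To make the counter semantics concrete, the following sketch models the Join in C, with POSIX threads standing in for Conway's independent processors. The names fork_set and join_point, and the use of a mutex, are our own illustration rather than part of Conway's proposal.

#include <pthread.h>
#include <stdio.h>

/* Illustrative model of Conway's Join counter (memory location J). */
static int join_counter;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Fork A, J, N: set the counter to N before the concurrent streams start. */
static void fork_set(int n)
{
    join_counter = n;
}

/* Join J: decrement the counter; only the stream that reaches zero
 * continues past the Join, all others are released. */
static int join_point(void)
{
    int last;
    pthread_mutex_lock(&counter_lock);
    last = (--join_counter == 0);
    pthread_mutex_unlock(&counter_lock);
    return last;
}

static void *stream(void *arg)
{
    (void)arg;
    /* ... work of one concurrent instruction stream ... */
    if (join_point())
        printf("last stream continues past the Join\n");
    return NULL;
}

int main(void)
{
    pthread_t t[3];
    fork_set(3);                       /* Fork A, J, 3: three streams will Join */
    for (int i = 0; i < 3; i++)
        pthread_create(&t[i], NULL, stream, NULL);
    for (int i = 0; i < 3; i++)
        pthread_join(&t[i], NULL);     /* wait for the threads themselves */
    return 0;
}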

Dijkstra 21 proposed a more structured way to express reconfiguration for multiprocessor programs: parbegin S1; S2; ...; Sn; defines S1 through Sn as statements which can be executed in parallel.

The parallel execution ends when a parend is found. Only after all processors executing the parallel statements have found their parend are statements following the parend executed. A program that needs to run two processes in parallel might have the following structure:

parbegin
    process1: begin
        statements for process 1
    end
    process2: begin
        statements for process 2
    end
parend

Both parbegin-parend and Fork-Join blocks could be conditional. In other words, whether or not the block would be entered could depend on the result of an if statement.


This results in a dynamic reconfiguration capability.

Parallelism of an SIMD computer cannot be expressed explicitly in any conventional programming language. One type of SIMD computer consists of an array of processor/memory pairs called processing elements, or PEs, which do the actual computations, and a control unit. The control unit executes all program flow instructions and broadcasts data processing instructions to the PEs. All PEs execute the same instruction at the same time, but on different data, e.g., on different sections of an image. The control unit also determines which of the PEs execute the instruction and thereby varies the extent of parallelism of the computer. An interconnection network provides communication between the PEs. The network can be fixed, as in the MPP or the Illiac IV, 22 or it can be reconfigurable, as in PASM. 12 If the network is reconfigurable, instructions that reconfigure the network in the desired fashion must be provided in the programming language. Constructs to enable and disable PEs need to be available as well. Extensions of existing high-level languages have been proposed. 23-30 Such extensions can be machine dependent or machine independent.

An example of a machine-dependent language is Glypnir, 24 specifically developed for the Illiac IV computer and based on Algol 60. Illiac IV was an SIMD machine with 64 PEs, designed in the 1960's. Its fixed interconnection network connects PE i to PEs i + 1, i - 1, i + 8, and i - 8, all modulo 64, providing a mesh-type interconnection. The system cannot be partitioned into smaller machines; the only reconfiguration capability involves enabling and disabling PEs. In Glypnir, variables for PEs and for the control unit can be defined. A PE variable has an element in every PE and is called a super word, or sword. Therefore, a sword always has 64 elements. For example, the statement

PE REAL A,B

declares the variables A and B to be swords. The statement

PE REAL C(20)

declares C to be an array of 20 elements in each of the PEs. By using Boolean expressions, PEs can be selectively enabled or disabled.

Consider, for example, the statement

if <Boolean expression> then
    <statement1>
else
    <statement2>

The Boolean expression will be evaluated in all processors. The processors finding a true value execute statement1. The others do not participate in the instruction stream broadcast for statement1, but execute only statement2.
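The effect of such selective enabling can be pictured in serial C as a mask over the PEs: every PE evaluates the condition, statement1 is then broadcast to the enabled PEs, and statement2 to the disabled ones. The sketch below is only an illustration of these enable/disable semantics, not Glypnir code; the array bounds and data values are arbitrary.

#include <stdio.h>

#define NPE 8   /* number of processing elements in this illustration */

int main(void)
{
    int data[NPE] = {3, -1, 4, -1, 5, -9, 2, -6};
    int enabled[NPE];

    /* Every PE evaluates the Boolean expression on its own data. */
    for (int pe = 0; pe < NPE; pe++)
        enabled[pe] = (data[pe] > 0);

    /* Broadcast of statement1: only the enabled PEs participate. */
    for (int pe = 0; pe < NPE; pe++)
        if (enabled[pe])
            data[pe] = 2 * data[pe];

    /* Broadcast of statement2: only the previously disabled PEs participate. */
    for (int pe = 0; pe < NPE; pe++)
        if (!enabled[pe])
            data[pe] = 0;

    for (int pe = 0; pe < NPE; pe++)
        printf("%d ", data[pe]);
    printf("\n");
    return 0;
}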

Glypnir's architectural dependence on the Illiac IV severely limits its use. An attempt at machine independence was made by the developers of Actus, 26 a language based on Pascal. Actus was designed for SIMD machines of arbitrary size. A variable can be declared parallel. The extent of parallelism must be specified for each parallel variable. In the range specification of a parallel variable declaration, a colon denotes that a dimension is to be accessed in parallel. For example,

var array1: array [1:m, 1..n] of integer

declares an array of m by n elements; m elements can be accessed at a time. The underlying operating system takes care of reconfiguring the computer into a machine that can handle the specified parallel variable. The if, case, while, and for statements are expanded for use with parallel variables. Alignment operators are also provided. For example, a shift operator moves data within the declared range of parallelism, and a rotate shifts data circularly with respect to the extent of parallelism. shift and rotate will be implemented by using the processor interconnection network.

Parallel-C is a proposed programming language based on C that handles both the MIMD and SIMD modes of operation. 30 The SIMD features include declarations of parallel variables and functions; an indexing scheme to access and manipulate parallel variables; expressions involving parallel variables; extended control structures using parallel variables as control variables; and functions for PE allocation, data alignment, and I/O. The language supports simple parallel data types like scalars and arrays as well as structured data. For example,

parallel [N] int a;
parallel [N] char line[MAXLINE];
struct node {
    char *word;
    struct node *next;
} parallel [N] nodespace[100], *head;

declares an integer "a," an array ofMAX-LINE characters, an array of 100 nodes,and a node pointer for each of the N pro-cessors. Indexing along the parallel dimen-sion can be done by using selectors; theyenable or disable subsets of the N pro-cessors. Functions to send data betweenprocessors are provided. They are not partof the language, but library routines, thusavoiding machine dependence. If the com-piler is retargeted, these communicationroutines must be adapted. No change ofthe compiler itself is required. To facilitateMIMD mode, a few new features and key-words are added. A preprocessor recog-nizes these constructs and translates theparallel algorithm into a standard serial Cprogram, which can then be compiled by astandard C compiler and loaded into theprogram memories of the processors ofthe MIMD machine.

Implicit reconfiguration. On a supercomputer with a fixed architecture, it is desirable to have the ability to automatically analyze a serial program and restructure it so that it can be executed efficiently. As reconfigurable supercomputers become more and more complex, expecting users to construct efficient algorithms for the system with explicit reconfiguration commands may no longer be feasible. On a machine with a reconfigurable architecture, it is desirable to automate the process of finding an optimal architectural state and then structuring the task for that architecture.

A task may be most efficiently performed using different architectural states for different subtasks.


Figure 2. Simplified block diagram of PASM.

This may involve more than one architectural state, because a task may be most efficiently performed using different architectural states for different subtasks.

Much effort has gone into developing algorithms capable of detecting parallelism in serial programs. 31,32 Consider a program to add two vectors A and B:

FOR i = 1 TO N DO C[i] := A[i] + B[i];

Obviously, the addition statements do not depend on each other; therefore, all additions can be executed in parallel on N processors. This example typifies the operations performed by a parallelizing compiler. The compiler analyzes the program and schedules statements that do not depend on each other to execute in parallel. It must also determine the number of processors needed to execute the program and have the operating system allocate them. Since the order in which two statements, I and J, execute in a sequential program is not preserved if they are executed in parallel, three conditions must be met if I and J are to be parallelized. Statement I must not write to locations that statement J reads; statement J must not write to locations that statement I reads; and statement I must not write to the same location as statement J if that location is used by a later statement.
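A sketch of such a dependence test is given below in C. The read and write sets of each statement are represented as small arrays of location numbers; the names are illustrative, and the third condition is applied here in its conservative form (the two statements may not write any common location).

#include <stddef.h>
#include <stdio.h>

static int member(int x, const int *set, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (set[i] == x)
            return 1;
    return 0;
}

static int disjoint(const int *a, size_t na, const int *b, size_t nb)
{
    for (size_t i = 0; i < na; i++)
        if (member(a[i], b, nb))
            return 0;
    return 1;
}

/* Returns 1 if statements I and J may be scheduled in parallel. */
static int can_parallelize(const int *readI,  size_t nrI,
                           const int *writeI, size_t nwI,
                           const int *readJ,  size_t nrJ,
                           const int *writeJ, size_t nwJ)
{
    return disjoint(writeI, nwI, readJ, nrJ)    /* I writes nothing J reads */
        && disjoint(writeJ, nwJ, readI, nrI)    /* J writes nothing I reads */
        && disjoint(writeI, nwI, writeJ, nwJ);  /* no common write location */
}

int main(void)
{
    /* Two iterations of C[i] := A[i] + B[i]: disjoint reads and writes. */
    int readI[]  = {1, 2}, writeI[] = {10};
    int readJ[]  = {3, 4}, writeJ[] = {11};
    printf("parallelizable: %d\n",
           can_parallelize(readI, 2, writeI, 1, readJ, 2, writeJ, 1));
    return 0;
}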

One approach to handling a system with a reconfigurable architecture, developed by Kartashev and Kartashev, 6 is called the reconfiguration flowchart. The algorithm assumes a dynamic architecture that can partition its processors into groups of computers of different sizes and can combine processors to form higher-precision computers. It analyzes a Fortran program and its program flow structure and constructs a program graph. Each node in the graph includes one or more program statements. The arrows between nodes represent the program flow. For each node, the necessary computational precision is determined, and from that the minimum number of processors required for the computation of the node is found. Since a minimum number of processors is found for each node, a maximum number of nodes can be processed simultaneously.
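As an illustration of the precision-to-processors step, one node of such a program graph might be represented as below; the 16-bit processor width and the ceiling division are assumptions made for the sketch, not figures from the reconfiguration flowchart itself.

#include <stdio.h>

#define PROC_WIDTH 16   /* assumed width of one processor, in bits */

struct graph_node {
    int precision_bits;  /* computational precision the node requires    */
    int min_processors;  /* processors that must be combined to reach it */
};

/* Minimum number of PROC_WIDTH-bit processors needed for one node. */
static int processors_needed(int precision_bits)
{
    return (precision_bits + PROC_WIDTH - 1) / PROC_WIDTH;  /* ceiling */
}

int main(void)
{
    struct graph_node n = { 48, 0 };
    n.min_processors = processors_needed(n.precision_bits);
    printf("a %d-bit node needs %d processors\n",
           n.precision_bits, n.min_processors);
    return 0;
}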

Several programs can be run concurrently on the system. Whenever a program finishes, a new program should be started, necessitating a switch to a new architectural state. If the task causing the switch runs for a longer time than other tasks, the processors executing the short tasks will wait for the long task to finish. Therefore, the execution time for each node is determined to minimize processor idle time. Then a resource diagram is constructed, where nodes are assigned specific processors, and execution can begin.

Ongoing research at Purdue involves the design of expert-system-based intelligent operating systems for controlling reconfigurable supersystems. 33

To optimize system performance, the intelligent operating system uses information about which subtasks need to be executed to perform an overall task, the performance/system-requirement characteristics of the algorithms for the subtasks, and the current state of the system.

Reconfiguration methodology. An important consideration in reconfigurable systems is the reconfiguration methodology. The switch from one architectural state to another takes time. To maximize performance, the sum of execution time and reconfiguration time must be minimized. Clearly, a very short reconfiguration time is desirable, since otherwise advantages gained by the reconfiguration are lost due to the reconfiguration overhead. Of course, the time to reconfigure has to be less than (hopefully, much less than) the reduction in execution time gained through the reconfiguration.
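The criterion can be stated as a simple comparison: a switch is worthwhile only when the reconfiguration overhead plus the execution time in the new state is smaller than the execution time in the current state. A small C sketch of that comparison follows; the names and time units are illustrative.

#include <stdio.h>

/* Returns 1 if switching architectural states pays off. */
static int worth_reconfiguring(double t_reconfig, double t_new, double t_current)
{
    return t_reconfig + t_new < t_current;
}

int main(void)
{
    /* Example: 1 ms of overhead to cut execution time from 3 s to 1 s. */
    printf("%d\n", worth_reconfiguring(0.001, 1.0, 3.0));
    return 0;
}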

As an example of a reconfiguration methodology, consider the PASM system. PASM is a partitionable SIMD/MIMD system. A prototype with 16 Motorola MC68000-based PEs in the computational engine and four control units is being constructed at Purdue University. 34 Figure 2 shows a simplified block diagram of the system. In SIMD mode, the PEs receive their instructions from a control unit and process data from their private memories. In MIMD mode, the PEs have both data and program in their private memories. One type of reconfiguration is the switch from SIMD to MIMD mode and vice versa. Whenever a PE addresses a certain address range, AR, that PE is in SIMD mode. As soon as an access to AR is detected, an instruction request is issued to the control unit, which then broadcasts an instruction to the requesting processors. The control unit accomplishes a switch from SIMD to MIMD mode by broadcasting to the PEs an unconditional jump to the beginning of the MIMD program, where the address of this MIMD program is outside the range AR. The PEs can return to SIMD mode by jumping into the address space AR. The reconfiguration overhead for PASM is therefore just one instruction. For all practical applications, this overhead is negligible, and program sections can be executed in SIMD or MIMD mode, whichever is best suited.
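The mechanism can be sketched as a single test on a PE's program counter: addresses inside AR are served by the control unit broadcast, all others come from the PE's private memory. The constants and function names below are invented for the illustration and are not taken from the PASM prototype.

#include <stdint.h>
#include <stdio.h>

#define AR_LOW   0xF000u   /* assumed start of the SIMD address range AR */
#define AR_HIGH  0xFFFFu   /* assumed end of the SIMD address range AR   */

/* Stub instruction sources for the sketch. */
static uint16_t cu_broadcast(void)     { return 0x1111; }              /* from the control unit */
static uint16_t local_mem(uint32_t pc) { (void)pc; return 0x2222; }    /* from private memory   */

/* One PE instruction fetch: a fetch inside AR issues an instruction request
 * to the control unit (SIMD mode); a fetch outside AR is satisfied from the
 * PE's own memory (MIMD mode). Switching modes is therefore just a jump. */
static uint16_t pe_fetch(uint32_t pc)
{
    if (pc >= AR_LOW && pc <= AR_HIGH)
        return cu_broadcast();
    return local_mem(pc);
}

int main(void)
{
    printf("fetch inside AR  (SIMD): %04x\n", (unsigned)pe_fetch(0xF010));
    printf("fetch outside AR (MIMD): %04x\n", (unsigned)pe_fetch(0x1000));
    return 0;
}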

Apart from reconfiguration for optimum performance, reconfiguration due to faults or other undesired conditions such as unbalanced load must be considered in a supersystem.


If a single processing element of a supersystem fails, the system as a whole should continue to function, possibly with decreased performance. The necessary reconfiguration has to be automatic and must be handled by the operating system, since a programmer cannot predict a fault. Ma 35 proposes reconfiguration control algorithms for reconfigurable computers with and without centralized control. As an example of the methodology, consider the straightforward reconfiguration procedure with centralized control. The control unit monitors the state of the processing nodes and detects undesirable conditions. It then broadcasts reconfiguration commands to the appropriate nodes, which perform the necessary reconfiguration and signal completion to the control unit. After all nodes have completed their respective reconfiguration tasks, the control unit signals the nodes and computation can resume.
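In outline, the centralized procedure is a monitor-command-wait-resume cycle. The following C sketch shows only that control flow; the node states, message primitives, and their names are placeholders and do not reproduce Ma's algorithms.

#include <stdio.h>

#define NUM_NODES 16

enum node_state { NODE_OK, NODE_FAULTY };

/* Placeholder primitives for the sketch. */
static enum node_state probe(int node)   { (void)node; return NODE_OK; }
static void broadcast_reconfig(int node) { printf("reconfigure node %d\n", node); }
static int  reconfig_done(int node)      { (void)node; return 1; }
static void signal_resume(void)          { printf("all nodes done, resume\n"); }

static void control_cycle(void)
{
    int pending[NUM_NODES] = {0};
    int remaining = 0;

    /* 1. Monitor the processing nodes and detect undesirable conditions. */
    for (int n = 0; n < NUM_NODES; n++)
        if (probe(n) == NODE_FAULTY) {
            broadcast_reconfig(n);        /* 2. command the affected node */
            pending[n] = 1;
            remaining++;
        }

    /* 3. Wait until every commanded node signals completion. */
    while (remaining > 0)
        for (int n = 0; n < NUM_NODES; n++)
            if (pending[n] && reconfig_done(n)) {
                pending[n] = 0;
                remaining--;
            }

    /* 4. Signal the nodes so that computation can resume. */
    signal_resume();
}

int main(void)
{
    control_cycle();
    return 0;
}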

An additional consideration for reconfiguration software is the ability to concentrate computing power. 33 For example, assume that numerous subtasks of some overall task are being executed concurrently. If, as a result of some derived intermediate result, one subtask becomes more time-critical than the others, the system should dynamically reconfigure to provide additional resources to the subtask. Reconfiguration software capable of such operation would have to have some form of intelligence and knowledge of a system model. This is an area for future research.

Retargetable software

As mentioned previously, software that can be used on a wide variety of computers is very desirable. An early attempt towards that goal is the standardization of the programming language Fortran by ANSI. Nevertheless, many Fortran dialects exist, and compilers are not verified. Therefore, it cannot be guaranteed that the same program will exhibit identical performance on different machines. A standard Fortran program using only standard I/O routines like read and write might very well be portable. As soon as explicit operating system calls such as seek for a disk are included, however, the program is restricted to machines with compatible operating system calls.

Other important tasks that depend on the operating system are task creation and interprocess communication. They are necessary for expressing concurrent operations. Concurrent operations are often used in real-time systems and are essential for parallel processing. Low-level operations like interrupt control are not possible in Fortran. All this seriously limits the portability of Fortran programs. The same problems arise with other languages, such as Pascal and Algol. Current efforts on the design of Ada 36,37 are attempting to circumvent these problems.

There are basically three ways to make a program portable. 38

Transportable language approach. This approach, designed for serial computers, is based on widely used languages like Fortran and Pascal. The dialects of such a language are analyzed, and a hypothetical parent language with a grammar common to all dialects is defined. The syntax and semantics of the parent language are rigorously specified. For each computer which is to run portable code, parameters which completely specify the computer architecture are determined. These parameters are byte size, word size, byte or word orientation of the CPU, main memory size, and maximum program module size. Then a compiler with two modes of operation is constructed: the transportability testing mode and the compilation mode. In the transportability testing mode, a program coded in a dialect of the language is converted to the hypothetical parent language, using the architectural parameters. In the compilation mode, hypothetical parent-language code is translated into a dialect of the language. Thus, existing programs can be converted to the parent language, making them portable. Also, programs can be written directly in the parent language and thus be guaranteed easily portable.

Certain instructions must be modified for the parent language. For example, all variable declarations need to include a precision declaration so that the parent language compiler can determine the appropriate data type in the language dialect. Desirable language features not available in a given dialect (such as data structures or while loops) can be made available in the parent language, thus making programming in the parent language more flexible.


A major advantage of the transportable language approach is that existing programs are not rendered obsolete.

A major advantage of the transportable language approach is that existing programs are not rendered obsolete but can be converted and made portable.

The necessity to write bifunctional compilers for every machine, and the continued existence of multiple dialects of the language, are disadvantages.

Emulation approach. Another approach to transportable software is to transform the machine rather than the program. Making machines capable of emulating some class of computers allows those machines to execute code written for any of the computers in that class. 39,40 Programs can be coded in any high-level or assembly language supported by the computers in that class. The programs are compiled, and the resulting machine code is transferred to one of the emulation machines. This machine emulates the computer for which the code was written and thus is capable of executing the machine code. One problem with the approach is decreased performance. Another is that all machines in a set among which code is to be transportable must be capable of emulating all the others.

New language approach. In this approach, all programs to be ported are written in a standard new language known by all machines, thus making programs portable. The definition of this new language incorporates all desired features, and the compilers for this language are rigorously verified to ensure portability. The Department of Defense chose this method when developing the programming language Ada. 37,41,42


Figure 3. (a) User's view of task communication in Ada. (b) User's view of task communication in a standard programming language.

Ada is a structured programming language. In addition to being designed for portability, it supports many features necessary for executing tasks on reconfigurable systems.

Functions necessary for process creation, deletion, and interprocess communication are part of the language. The compiler actually generates operating system calls from those functions, but this is hidden from the user. If the program is transferred to a different machine and recompiled, the appropriate new operating system calls are generated.

Depending on the machine architecture, concurrent processes could either be run in a time-sliced mode, or they could be allocated on different processors. The compiler for a particular machine, in cooperation with the operating system, has to handle the allocation procedure.

Concurrent processes are called tasks in Ada. A task is declared by defining a task header and a task body. If one or more tasks are declared inside a procedure, they execute concurrently with the procedure. The task header specifies the task name and the entries that handle interprocess communication. The task body contains all statements executed in the task. To send a message to task T, other tasks call an entry of T like a procedure. The destination task T can accept the message.

As an example, consider a process that accepts single-character messages from other tasks. The header has the form

task ACCEPT_CHAR is
    entry MESSAGE (C : in CHARACTER);
end;

The task name is defined as ACCEPT_CHAR. Other processes can send messages to ACCEPT_CHAR by calling the entry:

MESSAGE (CHAR);

The task ACCEPT_CHAR reads the message when it encounters an accept statement, which must be part of the task body:

accept MESSAGE (C : in CHARACTER) do
    MESS := C;
end;

This statement accepts an incoming message and assigns its value to the variable MESS. Only between the do-end part of the accept is the passed message accessible. Data is actually exchanged only when the sending task calls an entry and the receiving task executes an accept on the same entry. If a process encounters an accept without another process executing the appropriate entry call, the process will wait until an entry call is executed. An entry call also waits for the appropriate accept. Thus, sending and accepting of messages is synchronized between tasks.

The execution of an entry call and its associated accept statement is called a task rendezvous. The task rendezvous facilitates interprocess communication by the ability to exchange data. It facilitates process synchronization by performing the exchange only when both the sending and receiving tasks have reached their respective rendezvous statements. It facilitates mutual exclusion since only one of all the tasks requesting communication with another process will be served at any given time. The difference between an Ada task rendezvous and task communication in other programming languages is illustrated in Figure 3.

Since no explicit operating system calls are needed, the task handling is not machine specific and can easily be ported.

Another important portability problem with programming languages arises from machine-dependent data types. A Cray-1 supercomputer, 2 for example, has an integer precision of 24 bits, whereas the Fujitsu VP-200 43 uses 32-bit integers. Programs utilizing 32-bit integer precision could not run on a machine of 16-bit integer precision. In Ada, a programmer has access to such machine-dependent data-type attributes as

INTEGER'SIZE

Using such attributes, a program can check at run time whether a specific machine will be able to execute the program as desired.
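An analogous check can be written in other languages as well; the C sketch below tests the host's integer width at run time. It is only an analogue of the idea, not the Ada attribute mechanism described above.

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* Refuse to run if the host's integers are narrower than the program needs. */
    if (sizeof(int) * CHAR_BIT < 32) {
        fprintf(stderr, "host integer size is too small for this program\n");
        return 1;
    }
    printf("integer size: %zu bits\n", sizeof(int) * CHAR_BIT);
    return 0;
}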

The task concept of Ada can be used to explicitly handle parallel processing in MIMD multiprocessors. This is insufficient to express the parallelism of an SIMD machine. Extensions to Ada facilitating SIMD programming have been proposed 29; these extensions provide functions similar to those for SIMD programming as described above. Due to the rigid Ada policy allowing no subsets or supersets of the language in a verified compiler, this approach may not be widely used. To code SIMD programs explicitly nevertheless, subprograms coded in a language suitable for SIMD programming, such as Actus or extended Ada, could replace a standard Ada subprogram if the program is executed on an SIMD machine. This procedure is analogous to replacing a high-level language subroutine by a faster executing assembly program. The main body of the program would then still be portable. Only the special subroutine has to be changed when the program is retargeted.


As more and more adaptable supersystems are designed and built, the need for adaptable software to run these systems efficiently will continue to grow.

The adaptable software required includes retargetable software, which makes application programs transportable, and reconfiguration software, which controls the system organization. Thus, adaptable software is an essential element in the development of supersystems for mission-critical objectives.

Acknowledgments

The authors would like to thank Steven and Svetlana Kartashev and Andre van Tilborg for their careful reading of the manuscript and many helpful suggestions.

This work was supported by the Rome Air Development Center under contract number F30602-83-K-0119.

References

1. R. M. Russell, "The Cray-1 Computer System," Comm. ACM, Jan. 1978, pp. 63-72.

2. E. W. Kozdrowicki and D. J. Theis, "Second Generation of Vector Supercomputers," Computer, Nov. 1980, pp. 71-83.

3. K. E. Batcher, "Design of a Massively Parallel Processor," IEEE Trans. Computers, Vol. C-29, Sept. 1980, pp. 836-844.

4. K. E. Batcher, "Bit Serial Parallel Processing Systems," IEEE Trans. Computers, Vol. C-31, May 1982, pp. 377-384.

5. E. W. Martin, "Strategy for a DoD Software Initiative," Computer, Vol. 16, Mar. 1983, pp. 52-59.

6. S. P. Kartashev and S. I. Kartashev, "Distribution of Programs for a System with Dynamic Architecture," IEEE Trans. Computers, Vol. C-31, June 1982, pp. 488-514.

7. G. J. Lipovski and A. R. Tripathi, "A Reconfigurable Varistructure Array Processor," 1977 Int'l Conf. Parallel Processing, Aug. 1977, pp. 165-174.

8. S. I. Kartashev and S. P. Kartashev, "Dynamic Architectures: Problems and Solutions," Computer, July 1978, pp. 26-42.

9. S. I. Kartashev and S. P. Kartashev, "A Multicomputer System with Dynamic Architecture," IEEE Trans. Computers, Vol. C-28, Oct. 1979, pp. 704-720.

10. G. J. Lipovski, "The Banyan Switch in TRAC: the Texas Reconfigurable Array Computer," Distributed Processing Technical Committee Newsletter (IEEE Computer Society), Jan. 1984, pp. 13-26.

11. J. Keng and K.-S. Fu, "A Special Computer Architecture for Image Processing," 1978 IEEE Computer Society Conf. Pattern Recognition and Image Processing, June 1978, pp. 287-290.

12. H. J. Siegel, L. J. Siegel, F. C. Kemmerer, P. T. Mueller, Jr., H. E. Smalley, Jr., and S. D. Smith, "PASM: A Partitionable SIMD/MIMD System for Image Processing and Pattern Recognition," IEEE Trans. Computers, Vol. C-30, Dec. 1981, pp. 934-947.

13. S. I. Kartashev and S. P. Kartashev, "Problems of Designing Supersystems with Dynamic Architecture," IEEE Trans. Computers, Vol. C-29, Dec. 1980, pp. 1114-1132.

14. S. S. Reddi and E. A. Feustel, "A Restructurable Computer System," IEEE Trans. Computers, Jan. 1978, pp. 1-20.

15. H. J. Siegel, Interconnection Networks for Large-Scale Parallel Processing: Theory and Case Studies, Lexington Books, D. C. Heath and Co., Lexington, MA, 1985.

16. W. Wulf and C. Bell, "C.mmp - A Multi-Miniprocessor," AFIPS 1972 Fall Joint Computer Conf., Dec. 1972, pp. 765-777.

17. H. Sullivan, T. R. Bashkow, and K. Klappholz, "A Large-Scale, Homogeneous, Fully Distributed Parallel Machine," Fourth Ann. Symp. Computer Architecture, Mar. 1977, pp. 105-124.

18. A. Gottlieb, R. Grishman, C. P. Kruskal, K. P. McAuliffe, L. Rudolph, and M. Snir, "The NYU Ultracomputer - Designing an MIMD Shared-Memory Parallel Computer," IEEE Trans. Computers, Vol. C-32, Feb. 1983, pp. 175-189.

19. D. L. Tuomenoksa, G. B. Adams III, H. J. Siegel, and O. R. Mitchell, "A Parallel Algorithm for Contour Extraction: Advantages and Architectural Implications," 1983 IEEE Computer Society Symp. Computer Vision and Pattern Recognition, June 1983, pp. 336-344.

20. M. E. Conway, "A Multiprocessor System Design," AFIPS 1963 Fall Joint Computer Conf., 1963, pp. 139-146.

21. E. W. Dijkstra, "Cooperating Sequential Processes," Programming Languages, ed. F. Genuys, Academic Press, New York, NY, 1968, pp. 43-112.

22. W. J. Bouknight, S. A. Denenberg, D. E. McIntyre, J. M. Randall, A. H. Sameh, and D. L. Slotnick, "The Illiac IV System," Proc. IEEE, Vol. 60, Apr. 1972, pp. 369-388.

23. K. G. Stevens, "CFD - A Fortran-like Language for the Illiac IV," ACM Conf. Programming Languages and Compilers for Parallel and Vector Machines, Mar. 1975, pp. 72-76.

24. D. H. Lawrie, T. Layman, D. Baer, and J. M. Randall, "Glypnir - A Programming Language for Illiac IV," Comm. ACM, Vol. 18, Mar. 1975, pp. 157-164.

25. S. F. Reddaway, "DAP - A Distributed Array Processor," 1st Ann. Symp. Computer Architecture, Dec. 1973, pp. 61-65.

26. R. H. Perrott, "A Language for Array and Vector Processors," ACM Trans. Programming Languages and Systems, Vol. 1, Oct. 1979, pp. 177-195.

27. L. Uhr, "A Language for Parallel Processing of Arrays, Embedded in Pascal," Languages and Architectures for Image Processing, eds. M. J. B. Duff and S. Levialdi, Academic Press, London, England, 1981, pp. 53-88.

28. A. P. Reeves and J. D. Bruner, "The Programming Language Parallel Pascal," 1980 Int'l Conf. Parallel Processing, Aug. 1980, pp. 5-7.

29. C. Cline and H. J. Siegel, "Augmenting Ada for SIMD Parallel Processing," IEEE Trans. Software Engineering, Vol. SE-11, No. 9, Sept. 1985, pp. 970-977.

30. J. T. Kuehn and H. J. Siegel, "Extensions to the C Programming Language for SIMD/MIMD Parallelism," 1985 Int'l Conf. Parallel Processing, Aug. 1985, pp. 232-235.


31. D. J. Kuck and D. A. Padua, "High-Speed Multiprocessors and Their Compilers," 1979 Int'l Conf. Parallel Processing, Aug. 1979, pp. 5-16.

32. Arvind, "Decomposing a Program for Multiple Processor Systems," 1979 Int'l Conf. Parallel Processing, Aug. 1979, pp. 7-14.

33. E. J. Delp, H. J. Siegel, A. Whinston, and L. H. Jamieson, "An Intelligent Operating System for Executing Image Understanding Tasks on a Reconfigurable Parallel Architecture," IEEE Computer Society Workshop on Computer Architecture for Pattern Analysis and Image Database Management, Nov. 1985, pp. 217-224.

34. H. J. Siegel, T. Schwederski, N. J. Davis IV, and J. T. Kuehn, "PASM: A Reconfigurable Parallel System for Image Processing," Workshop on Algorithm-guided Parallel Architectures for Automatic Target Recognition, July 1984, pp. 263-291. (Also appears in the ACM SIGARCH newsletter, Computer Architecture News, Vol. 12, No. 4, Sept. 1984, pp. 7-19.)

35. Y. W. Ma, "Reconfiguration Control Algorithms for Reconfigurable Computer Systems," Int'l Computer Software and Applications Conf., Nov. 1982, pp. 70-77.

36. J. G. P. Barnes, Programming in Ada, Addison-Wesley, London, England, 1982.

37. U.S. Department of Defense, Reference Manual for the Ada Programming Language, Washington, D.C., 1982.

38. P. A. D. de Maine and S. Leong, "Transportation of Programs," 15th Southeast Symp. System Theory, Mar. 1983, pp. 158-164.

39. T. A. Marsland and J. C. Demco, "A Case Study of Computer Emulation," Canadian J. Operational Research and Information Processing, Vol. 16, June 1978, pp. 112-131.

40. J. Hayes, Computer Architecture and Organization, McGraw-Hill, New York, NY, 1978.

41. H. Ledgard, Ada, An Introduction, Springer-Verlag, New York, NY, 1983.

42. R. J. A. Buhr, System Design with Ada, Prentice-Hall, Englewood Cliffs, NJ, 1984.

43. K. Hwang and F. A. Briggs, Computer Architecture and Parallel Processing, McGraw-Hill, New York, NY, 1984.

Thomas Schwederski is currently working toward a PhD degree at Purdue University. He received the Diplom-Ingenieur degree in electrical engineering from Ruhr-Universitaet Bochum in 1983, and the MS degree in electrical engineering from Purdue University in 1985. His research interests include computer architecture, parallel processing, multimicroprocessing systems, and VLSI design.

Howard Jay Siegel is a professor of electrical engineering and Director of the PASM Parallel Processing Laboratory at Purdue University, where he has been involved in research and teaching since 1976. His research interests include parallel/distributed processing, computer architecture, and image and speech understanding.

Siegel received BS degrees in electrical engineering and management from the Massachusetts Institute of Technology in 1972. He received the MA and MSE degrees in 1974, and the PhD degree in 1977, all three in electrical engineering and computer science, from Princeton University.

Siegel has served as an IEEE Computer Society distinguished visitor and is currently an associate editor of the Journal of Parallel and Distributed Computing.

Questions regarding this article can be directed to either author at the PASM Parallel Processing Laboratory, School of Electrical Engineering, Purdue University, West Lafayette, IN 47907.
