
Orgel: A Parallel Programming Language with Declarative Communication Streams

Kazuhiko OHNO¹, Shigehiro YAMAMOTO¹, Takanori OKANO², and Hiroshi NAKASHIMA¹

¹ Toyohashi University of Technology, Toyohashi, 441-8580 JAPAN
² Current affiliation: Hitachi Systems & Services, Ltd.


Abstract. Because of irregular and dynamic data structures, parallel programming in non-numerical fields often requires asynchronous messages whose number is not known in advance. Such programs are hard to write using MPI/Pthreads, and many new parallel languages, designed to hide messages under the runtime system, suffer from execution overhead. We therefore propose a parallel programming language, Orgel, that enables concise and efficient programming. An Orgel program is a set of agents connected by abstract channels called streams. The stream connections and messages are specified declaratively, which prevents bugs caused by parallelization and also enables effective optimization. The computation in each agent is described in an ordinary sequential language, so efficient execution is possible. Our evaluation shows that the overheads of concurrent switching and communication in Orgel are only 1.2 and 4.3 times those of Pthreads, respectively. In parallel execution, we obtained speedups of 6.5–10 with 11–13 processors.

1 Introduction

To obtain high performance from parallel machines, a means of concise and efficient programming is necessary. Especially in non-numerical processing, existing programming methods require low-level specifications or suffer from large runtime overhead.

We therefore propose a new parallel programming language called Orgel, which offers both abstract description of parallelism and runtime efficiency. The programming paradigm of Orgel is a set of agents connected by abstract communication channels called streams. The agents run in parallel, passing messages via streams.

The computation of each agent is described in an ordinary sequential language, so each agent executes efficiently. The connections among agents and streams are described declaratively. This feature statically determines the parallel model of the program, which prevents communication bugs and enables strong optimization through static analysis.

This paper is organized as follows: Section 2 describes the background. Sections 3 and 4 present the language design of Orgel and its current implementation, respectively. Section 5 shows the evaluation results, and Section 6 concludes.

This paper is the authors' private version of the paper published in: 3rd Intl. Symp. High Performance Computing, pp. 344-354, October 2000.

2 Background

For non-numerical processing, automatic parallelization is extremely difficult. Dynamic and irregular structures such as lists and trees cannot be divided statically, and the program structure is also irregular because of recursive calls. Therefore many parallel programming methods have been proposed for this field.

One major method is to use message passing libraries (PVM [1], MPI [2]) or thread libraries (Pthreads [3]) with a sequential language such as C or Fortran. This approach is easy to learn for users accustomed to sequential programming languages, and efficient programs are possible through low-level tuning. However, the order and number of communications are nondeterministic in many non-numerical programs. In such cases, mismatches between corresponding sends and receives easily occur if low-level communications are specified explicitly.

Another method is to design new programming languages with parallel execution semantics (KL1 [4]). Parallelism can be described naturally in such languages, and owing to the abstraction of communication and synchronization, the user can avoid timing bugs. However, such abstraction causes large runtime overhead, which leads to inefficiency.

To reduce this overhead, optimization schemes using static analysis have been proposed. For example, our optimization scheme for KL1 achieved remarkable speedup in typical cases [5, 6]. However, precise static analysis of dynamic behavior is difficult, so the optimization is ineffective in some cases.

A desirable parallel programming language should therefore have the following features: 1) efficient execution in a style similar to ordinary sequential programming, and 2) abstract specification of parallelism and communication to reduce the burden on users. This specification should also help static optimization.

3 Language Design

3.1 Language Overview

Orgel is designed for non-numerical programming. Because automatic parallelization is difficult in this field, Orgel leaves the specification of parallelism and communication to the user, and supplies frameworks for such specification.

The execution unit of Orgel is called an agent. For inter-agent communication, we also introduce an abstract message channel called a stream. Thus, an Orgel program is represented as a set of agents connected by streams. The agents run in parallel, passing messages via streams.

The syntax of Orgel is based on C. We added 1) declarations of streams, agents, messages, and network connections; 2) statements for message creation, transmission, and dereference, and for agent termination; and 3) agent member functions. We also eliminated global variables and added agent member variables. Each function can thus be coded efficiently in the ordinary sequential style.

As described in Sections 3.2 and 3.3, the structure and behavior of streams and agents are defined as stream types and agent types. Instances of agents and streams are created automatically at runtime, according to the variable definitions of stream types or agent types. They are connected automatically according to the connection declarations (see Section 3.4).

    stream StreamType [inherits StreamType1 [, ...]] {
        MessageType [(Type Arg: Mode [, ...])];
        ...
    };

Fig. 1. Stream type declaration

When an Orgel program is executed, a main agent is created and starts execution. If the main agent type contains variable definitions of agent or stream types, their instances are also created automatically¹, the streams are connected to the agents, and the agents start execution in parallel. Thus, the network of agents and streams is built without explicit creation or connection operations.

Compared with many other multi-agent/object-oriented languages [7, 8], this declarative specification of the network makes the parallel execution model of the program explicit. It prevents the creation of unexpected network structures, and also enables precise static analysis, which leads to effective optimization.

3.2 Stream

A stream is an abstract message channel based on KL1's stream communication model [9]. Our stream has a direction of message flow, and one or more agents can be connected to each end.

A stream type is declared in the form of Fig. 1. This declaration enumerates the message types that the stream type StreamType accepts. An Orgel message type takes the form of a function: MessageType serves as the message identifier, followed by a list of arguments, each with a type and an input/output mode (in/out).

3.3 Agent

An agent is an active execution unit; agents send messages to each other while performing their computations.

To create an agent, an agent type declaration in the form of Fig. 2 is needed. The form of an agent type declaration is similar to a C function declaration. The arguments of an agent type are its input/output streams, with types and modes.

Member functions are defined in the same way as ordinary C functions, except that the function name is specified in the form AgentType::FunctionName.

For member variables, separate memory areas are allocated to each agent instance. The scope of member variables covers the agent type declaration and all member functions of the agent type.

Agents and streams are logically created by defining member variables of the declared types. In this paper, we call these variables agent variables and stream variables. As explained in Section 3.1, physical creation is automatic and requires no explicit creation operation.

¹ For efficient execution, the creation is delayed until a message is sent to them.

    agent AgentType([StreamType StreamName: Mode [, ...]]) {
        member function prototype declarations
        member variable declarations
        connection declarations
        initial Initializer;
        final Finalizer;
        task TaskHandler;
        dispatch (StreamName) {
            MessageType: MessageHandler;
            ...
        };
    };

Fig. 2. Agent type declaration

Connection declarations specify how to connect the agents and streams defined as member variables. We describe this declaration in detail in Section 3.4.

The last four elements, initial, final, task, and dispatch, define event handlers. The handlers are sequential C code, with extensions for message handling. Initializer and Finalizer are executed on the creation and destruction of the agent, respectively. TaskHandler defines the agent's own computation and is executed when the agent is not handling messages. The dispatch declaration specifies a message handler for each message type of the input stream StreamName, by enumerating the acceptable message types MessageType and the handler code MessageHandler. This declaration serves as a framework for asynchronous message reception.

3.4 Connection Declaration

The connection among agents and streams can be specified by a connection declaration of the following form:

connect [ Agent0.S0 Dir0 ] Stream Dir1 Agent1.S1 ;

The declaration takes a stream variable Stream and the input/output streams S0, S1 of agent variables Agent0, Agent1. The specifier self can be used in place of an agent variable to denote the agent that contains the connection declaration. Dir0 and Dir1 are direction specifiers (==> or <==) that indicate the message flow.

If an agent/stream variable is an array, an array specifier of the form [SubscriptExpression] is needed. If the subscript expression is a constant expression, the argument of the declaration denotes one element of the array. If the expression is omitted, the argument denotes all elements of the array.

The subscript expression may contain one identifier called a pseudo-variable. It works as a variable whose scope is the connection declaration, and it represents every integer value for which each subscript expression stays within the size of its array.

    agent main() {
        worker w[16];
        comm left[15], right[15];
        broadcast b;

        connect w[i].ro ==> right[i] ==> w[i+1].li;
        connect w[i].lo ==> left[i-1] ==> w[i-1].ri;
        connect self ==> b ==> w[].ctrl;
    }

(a) Connection declaration

(b) Network of agents [diagram: main, stream b, and workers w[0] ... w[15] linked by right[0], ... and left[0], ...]

Fig. 3. Example of connection declarations

Using array specifiers, a set of one-to-one connections, or one-to-many/many-to-one connections, can be declared. A message sent to a stream with multiple receivers is multicast to all of them, and messages sent to a stream from multiple senders are serialized in a nondeterministic order.

An example is shown in Fig. 3(a). Here worker is an agent type, and comm and broadcast are stream types. In the first connection declaration, each subscript expression is restricted by the sizes of arrays w and right. Thus the pseudo-variable i takes the values 0, ..., 14, and as shown in Fig. 3(b), the output stream ro of each agent is connected to the input stream li of its right neighbor via the corresponding right stream. By the third declaration, the sender side of stream b is connected to the agent main, and the receiver side is connected to every worker agent. Thus this stream works as a broadcast network from main to all worker agents.
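The pseudo-variable rule amounts to intersecting the index bounds of every subscript in a declaration. The following C sketch is only an illustration, not part of Orc: it assumes every subscript has the form i + offset, and the type `Subscript` and the function `pseudo_range` are hypothetical names. It computes the range of i from the size of each array.

```c
#include <limits.h>

/* One subscript of the form i + offset into an array of length size. */
typedef struct {
    int offset;   /* 0 for w[i], +1 for w[i+1], -1 for left[i-1] */
    int size;     /* declared length of the array */
} Subscript;

/* Hypothetical helper: computes the inclusive range [*lo, *hi] of the
   pseudo-variable i such that 0 <= i + offset < size holds for every
   subscript, and returns how many values i takes (0 if none). */
int pseudo_range(const Subscript *subs, int n, int *lo, int *hi)
{
    int l = INT_MIN, h = INT_MAX;
    if (n <= 0) return 0;                              /* nothing to constrain */
    for (int k = 0; k < n; k++) {
        int sub_lo = -subs[k].offset;                   /* from i + offset >= 0        */
        int sub_hi = subs[k].size - 1 - subs[k].offset; /* from i + offset <= size - 1 */
        if (sub_lo > l) l = sub_lo;
        if (sub_hi < h) h = sub_hi;
    }
    *lo = l;
    *hi = h;
    return h >= l ? h - l + 1 : 0;
}
```

For the first declaration of Fig. 3(a), the subscripts w[i], right[i], and w[i+1] give i = 0, ..., 14. For the second, w[i], left[i-1], and w[i-1] give i = 1, ..., 15, so left[i-1] links each w[i].lo to w[i-1].ri. The omitted subscript in w[].ctrl needs no pseudo-variable, since it simply denotes all elements.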

Because the connect declaration defines the network model statically, the compiler can perform precise static analysis and optimize scheduling and communication. Such optimization is much more difficult with the operational stream connection of A'UM [10] or AYA [11]. Candidate Type Architecture [12] also offers declarative network configuration, but our stream model is more flexible.

3.5 Message

Message Variables. Using the message types declared in a stream type declaration, variables for messages can be defined. We call them message variables.

A message variable acts like a logical variable in logic programming languages. Its initial state is unbound, and it becomes bound when assigned a message object or another message variable. The bound state has two cases: if the variable is assigned a message object, its state is instantiated; if it is assigned another variable that is not instantiated, its state is uninstantiated. In the latter case, a later assignment of a message object can change the state of all the related variables to instantiated.
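The three states can be modeled with a chain of variable-to-variable bindings, much as logic programming implementations do. The C sketch below is a hypothetical illustration (the names `MsgVar`, `mv_assign_obj`, `mv_assign_var`, and the `type_id` field are assumptions; the runtime's actual representation is not given here): a bound variable is instantiated exactly when the end of its binding chain holds a message object. It also rejects a second assignment, in the spirit of the once-assignment rule of Section 4.3.

```c
#include <stddef.h>

typedef struct Message { int type_id; } Message;   /* hypothetical message object */

typedef enum { UNBOUND, BOUND } BindState;

typedef struct MsgVar {
    BindState state;
    struct MsgVar *ref;   /* set when bound to another variable */
    Message *obj;         /* set when bound directly to a message object */
} MsgVar;

void mv_init(MsgVar *v) { v->state = UNBOUND; v->ref = NULL; v->obj = NULL; }

/* Follows variable-to-variable bindings to the end of the chain. */
static MsgVar *mv_chain_end(MsgVar *v)
{
    while (v->ref != NULL) v = v->ref;
    return v;
}

/* Instantiated: the chain ends in a message object. */
int mv_instantiated(MsgVar *v) { return mv_chain_end(v)->obj != NULL; }

/* Bound, but the chain does not reach a message object yet. */
int mv_uninstantiated(MsgVar *v) { return v->state == BOUND && !mv_instantiated(v); }

/* v = message object. Returns 0 if v was already bound (once-assignment). */
int mv_assign_obj(MsgVar *v, Message *m)
{
    if (v->state != UNBOUND) return 0;
    v->state = BOUND;
    v->obj = m;
    return 1;
}

/* v = w. Also rejects a binding that would close a cycle (a = b; b = a;). */
int mv_assign_var(MsgVar *v, MsgVar *w)
{
    if (v->state != UNBOUND || mv_chain_end(w) == v) return 0;
    v->state = BOUND;
    v->ref = w;
    return 1;
}
```

After `a = b` with b unbound, a is bound but uninstantiated; assigning a message object to b then makes both a and b instantiated, without touching a again.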


    stream command {
        process(int n: in, map m: in, result r: out);
        map(char dat[32]: in);
        result(int answer: in);
    }

    process p; result r; int i; char d[32];
    ...
    p = process(i, map(d), r);   /* (1) create message objects; (2) copy data */
    s <== p;                     /* (3) send a message */

[Diagram: message objects process (ID, n, m, r) and map (ID, dat[32])]

Fig. 4. Sending Messages

    char d[32];
    dispatch (s) {
        process(i, m, r): {     /* (1) receive a message; (2) copy data on dispatch */
            m ?== map(d);       /* (3) copy data on dereference */
            ...
            r = result(j);      /* (4) instantiate an out-moded argument; (5) send it back */
        }
    }

[Diagram: received message objects process (ID, n, m, r) and map (ID, dat[32]); result (answer) sent back to the sender]

Fig. 5. Receiving Messages

Creating and Sending Messages. To create a message object, the message type and the actual arguments are written in functional form. For example, if a stream type is declared as shown in Fig. 4, a message variable can be defined and assigned a message object as shown in Fig. 4(1).

The type of a message argument must be either a C data type other than a pointer type, or a message type.

In the former case, the actual argument value is stored in the message object. If the argument type is an array, the actual argument should be a pointer to the head of array data of the declared size. In the example of Fig. 4(2), an int argument and a char array argument are stored in the message.

In the latter case, the mode can be either in or out. If the mode is in, the argument is instantiated by the message sender. The actual argument must be a message object of the declared argument type; it may be an uninstantiated variable when the message is created, but the sender must eventually instantiate it. If the mode is out, the argument is instantiated by the message receiver, and the actual argument must be an uninstantiated message variable.

The created message object is sent to a stream by a send statement of the following form:

Stream <== Message;

Receiving Messages. An agent connected to the receiver side of a stream receives and handles messages according to the dispatch declaration of its agent type.

The dispatch declaration specifies one of the agent's input streams and enumerates the acceptable message types. When a message of one of these types is received, the argument values of the message object are stored in the corresponding variables specified in dispatch. As in message creation, a variable corresponding to an array argument is regarded as a pointer, and an area of the array size is copied.


If the message has out-moded arguments, the corresponding variables are uninstantiated variables. These variables are bound to the sender's corresponding variables, so instantiating the receiver's variables with message objects sends those objects back to the sender.

Messages that appear as arguments of other messages can be obtained by a dereference expression of the following form:

Variable ?== MessageType[(Arg1 [, ...])];

If the message variable Variable is uninstantiated, the agent executing this expression is suspended, and it is resumed when another agent instantiates the variable. If the message's type is MessageType, the expression returns non-zero and stores the message's arguments in Arg1 and the following variables. If the type differs, it returns zero.

Fig. 5 shows an example of receiving the messages sent in the example of Fig. 4. When a process message arrives (1), the receiver agent assigns the value of argument n to the variable i (2). Next, the message-type argument m is obtained by a dereference expression, and the value of argument dat is copied to d (3). Finally, by assigning a message object to the out-moded argument r (4), the object is sent back to the sender of the process message (5).

4 Implementation

Using Pthreads, the current implementation supports concurrent execution on a single processor and parallel execution on shared-memory multiprocessors.

The implementation consists of an Orgel compiler called Orc and the Orgel runtime libraries, which support agent management and stream communication. Orc is implemented as an Orgel-to-C translator. The generated C program is compiled by a C compiler and linked with the Orgel runtime libraries to produce the executable file.

4.1 Implementation of Agents

For each agent type declaration of the form shown in Fig. 2, Orc generates an agent main function. When an agent is created, a thread is created and starts executing the corresponding agent main function.

In an agent main function, the initial handler is called first. Then, in the main loop, the message handlers are called according to the types of the received messages, and the task handler is called if no message has been received. A terminate statement is compiled into code that breaks out of the main loop. When the main loop ends, the final handler is called before the thread terminates.

The agent member variables are translated into a C struct type that has each variable as a member. The agent main function defines a variable of this struct type, and a pointer to it is added to the arguments of the member functions. Each access to a member variable is replaced with an access to the corresponding member of the struct. Thus, an instance of the member variables is allocated for each agent instance, and it can be accessed from any member function.
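To illustrate the shape of such generated code, the following C sketch hand-translates a hypothetical agent type with one member variable `total` and a member function `add`. It is not Orc output: the message IDs, the `Inbox` array standing in for an input stream, and the idle-poll limit in the task handler are assumptions made so that the example runs without the runtime library.

```c
#include <stddef.h>

enum { MSG_ADD = 1, MSG_STOP = 2 };          /* hypothetical message type IDs */

typedef struct { int id; int value; } Msg;

/* Member variables of the hypothetical agent type, gathered into one struct. */
typedef struct {
    int total;
    int idle_polls;
} CounterVars;

/* Member function counter::add; the struct pointer is the added argument. */
static void counter_add(CounterVars *self, int v) { self->total += v; }

/* Stand-in for the input stream: a fixed array consumed in order. */
typedef struct { const Msg *msgs; size_t len, next; } Inbox;

static const Msg *inbox_poll(Inbox *in)
{
    return in->next < in->len ? &in->msgs[in->next++] : NULL;
}

/* Shape of an agent main function: initial handler, main loop, final handler.
   Returns how often the task handler ran; stores the final total in *total_out. */
int counter_agent_main(Inbox *in, int *total_out)
{
    CounterVars vars;                 /* one instance per agent (per call here) */
    CounterVars *self = &vars;
    int running = 1;

    /* initial handler */
    self->total = 0;
    self->idle_polls = 0;

    while (running) {
        const Msg *m = inbox_poll(in);
        if (m == NULL) {
            /* task handler: runs when no message is pending; this sketch
               terminates after a few idle polls instead of spinning forever */
            if (++self->idle_polls >= 3) running = 0;
            continue;
        }
        switch (m->id) {              /* dispatch on the message type ID */
        case MSG_ADD:
            counter_add(self, m->value);
            break;
        case MSG_STOP:                /* terminate statement: leave the main loop */
            running = 0;
            break;
        default:
            break;
        }
    }

    /* final handler */
    *total_out = self->total;
    return self->idle_polls;
}
```

Because `vars` lives in the agent main function's frame, each agent instance (each thread, in the real implementation) gets its own copy of the member variables, and member functions reach them only through the `self` pointer.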


To limit the number of threads for efficiency, and to enable lazy creation of agents, an agent instance is represented by an agent record. On the logical creation of an agent, only its agent record is created. The Orgel runtime schedules agents using these records and creates threads as needed.
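Lazy thread creation can be sketched with a record that exists from the agent's logical creation but starts its thread only when the first message arrives. The names below (`AgentRecord`, `agent_wake`, `agent_join`, `demo_agent_main`) are hypothetical, and a real scheduler would also reuse threads and queue runnable agents; only the create-on-first-use step is shown.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical agent record: created at logical creation; the thread comes later. */
typedef struct {
    pthread_mutex_t lock;
    int started;                 /* 1 once a thread has been created */
    int threads_created;         /* bookkeeping for the example */
    pthread_t thread;
    void *(*main_fn)(void *);    /* the agent main function to run */
    void *arg;
} AgentRecord;

void agent_record_init(AgentRecord *a, void *(*main_fn)(void *), void *arg)
{
    pthread_mutex_init(&a->lock, NULL);
    a->started = 0;
    a->threads_created = 0;
    a->main_fn = main_fn;
    a->arg = arg;
}

/* Called when a message is delivered; only the first call creates a thread.
   Returns 0 on success or the pthread_create error code. */
int agent_wake(AgentRecord *a)
{
    int rc = 0;
    pthread_mutex_lock(&a->lock);
    if (!a->started) {
        rc = pthread_create(&a->thread, NULL, a->main_fn, a->arg);
        if (rc == 0) {
            a->started = 1;
            a->threads_created++;
        }
    }
    pthread_mutex_unlock(&a->lock);
    return rc;
}

/* Waits for the agent's thread, if one was ever created. */
void agent_join(AgentRecord *a)
{
    pthread_mutex_lock(&a->lock);
    int started = a->started;
    pthread_mutex_unlock(&a->lock);
    if (started) pthread_join(a->thread, NULL);
}

/* Demo agent main function: counts how many times it has run. */
static void *demo_agent_main(void *arg)
{
    int *runs = arg;
    (*runs)++;
    return NULL;
}
```

An agent that never receives a message therefore costs only its record, which matches the delayed creation mentioned in footnote 1.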

4.2 Implementation of Streams and Messages

A stream instance is represented as a stream record, which keeps connection information and a message queue.

Because the structure of an Orgel message is statically declared, it can be compiled into a C struct type. Every message struct type has an ID field to distinguish message types and a logical pointer to a message struct that links messages into a stream's message queue. The C-data-type arguments of the message are compiled as members of the struct, and the message-type arguments are represented as logical pointers to the corresponding struct types.
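As a hedged illustration, the message types of Fig. 4 might compile to structs like those below. The field names, the numeric IDs, and the use of an `int` index as the "logical pointer" (an index into the heap entry table of the two-level heap) are assumptions; the connection information and locking of the stream record are omitted.

```c
typedef int MsgRef;              /* "logical pointer": index into the heap entry table */
#define MSG_NONE (-1)

enum { ID_PROCESS = 1, ID_MAP = 2, ID_RESULT = 3 };   /* hypothetical type IDs */

/* Common header placed first in every message struct. */
typedef struct {
    int id;        /* distinguishes the message types */
    MsgRef next;   /* links messages into a stream's queue */
} MsgHeader;

/* process(int n: in, map m: in, result r: out) */
typedef struct { MsgHeader h; int n; MsgRef m; MsgRef r; } ProcessMsg;
/* map(char dat[32]: in): the array contents are stored in the message */
typedef struct { MsgHeader h; char dat[32]; } MapMsg;
/* result(int answer: in) */
typedef struct { MsgHeader h; int answer; } ResultMsg;

/* Stream record: only the FIFO message queue is shown. */
typedef struct {
    MsgHeader **entries;   /* heap entry table: logical pointer -> current address */
    MsgRef head, tail;
} StreamRecord;

void stream_init(StreamRecord *s, MsgHeader **entries)
{
    s->entries = entries;
    s->head = s->tail = MSG_NONE;
}

/* Appends a message to the tail of the queue. */
void stream_enqueue(StreamRecord *s, MsgRef ref)
{
    s->entries[ref]->next = MSG_NONE;
    if (s->tail == MSG_NONE)
        s->head = ref;
    else
        s->entries[s->tail]->next = ref;
    s->tail = ref;
}

/* Removes and returns the head of the queue, or MSG_NONE if it is empty. */
MsgRef stream_dequeue(StreamRecord *s)
{
    MsgRef ref = s->head;
    if (ref != MSG_NONE) {
        s->head = s->entries[ref]->next;
        if (s->head == MSG_NONE) s->tail = MSG_NONE;
    }
    return ref;
}
```

Because every struct begins with `MsgHeader`, the queue code can handle all message types uniformly and inspect `id` to decide which handler in the dispatch declaration applies.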

Because of message-type arguments, messages are not always freed in the order of creation. They are therefore allocated on a global heap and managed by garbage collection (GC). This heap has a two-level structure: a message variable contains an index into the heap entry table, and each entry holds the address of the corresponding message in the heap. Thus, during GC the runtime system can compact messages without changing the values of message variables.
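The benefit of the two-level heap is that compaction only rewrites the entry table. The C sketch below is a hypothetical single-threaded model: fixed-size arrays, bump allocation, and a `heap_release` call standing in for the marking phase of the real collector, which is not detailed here. Live messages slide toward the start of the heap in address order, and the entry indices held by message variables stay valid.

```c
#include <stddef.h>
#include <string.h>

#define HEAP_BYTES 256
#define MAX_ENTRIES 16

/* Hypothetical two-level heap: variables hold entry indices, never raw addresses. */
typedef struct {
    unsigned char heap[HEAP_BYTES];
    size_t top;                    /* bump-allocation pointer */
    size_t offset[MAX_ENTRIES];    /* entry table: index -> position of the message */
    size_t size[MAX_ENTRIES];
    int live[MAX_ENTRIES];         /* 1 while the entry holds a reachable message */
} MsgHeap;

void heap_init(MsgHeap *h) { memset(h, 0, sizeof *h); }

/* Copies a message into the heap; returns its entry index, or -1 if full. */
int heap_alloc(MsgHeap *h, const void *data, size_t size)
{
    if (size == 0 || size > HEAP_BYTES - h->top) return -1;
    for (int e = 0; e < MAX_ENTRIES; e++) {
        if (!h->live[e]) {
            memcpy(&h->heap[h->top], data, size);
            h->offset[e] = h->top;
            h->size[e] = size;
            h->live[e] = 1;
            h->top += size;
            return e;
        }
    }
    return -1;
}

/* Current address of a message: always looked up through the entry table. */
void *heap_addr(MsgHeap *h, int e) { return &h->heap[h->offset[e]]; }

/* Stand-in for "GC found this message unreachable". */
void heap_release(MsgHeap *h, int e) { h->live[e] = 0; }

/* Slides live messages down in address order; only the table entries change. */
void heap_compact(MsgHeap *h)
{
    size_t dst = 0;
    for (;;) {
        int next = -1;                 /* lowest-addressed live message not yet moved */
        for (int e = 0; e < MAX_ENTRIES; e++)
            if (h->live[e] && h->offset[e] >= dst &&
                (next < 0 || h->offset[e] < h->offset[next]))
                next = e;
        if (next < 0) break;
        memmove(&h->heap[dst], &h->heap[h->offset[next]], h->size[next]);
        h->offset[next] = dst;
        dst += h->size[next];
    }
    h->top = dst;
}
```

A message variable holding entry `c` before compaction can keep using `c` afterwards; only `offset[c]` has changed.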

4.3 Implementation of Sending/Receiving Messages

Message operations in agent type declarations and member functions are replaced by calls to the corresponding runtime library functions. The once-assignment rule for message variables is enforced by compile-time and runtime checks; for the latter, Orc inserts inline code that checks the restriction.

The suspension and resumption of agents are implemented as follows. The dereference function checks whether the message variable is instantiated. If it is uninstantiated, the function creates a hook record with a condition variable [3] and suspends the thread. When a message object is assigned to the variable, the inserted code finds the hook and signals the condition variable to resume the suspended thread.
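The suspend/resume protocol can be sketched with a mutex and a condition variable. For simplicity, this hypothetical version embeds the condition variable in each message variable instead of creating a hook record on demand, and it uses `pthread_cond_broadcast` so that several suspended readers can resume; the names `MsgVar`, `msgvar_deref`, and `msgvar_instantiate` are assumptions. The dereference follows the `?==` semantics of Section 3.5: non-zero with the argument copied on a type match, zero otherwise.

```c
#include <pthread.h>
#include <stddef.h>

typedef struct { int type_id; int arg; } Message;   /* hypothetical one-argument message */

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t hook;       /* readers wait here until the variable is instantiated */
    Message *obj;              /* NULL while uninstantiated */
} MsgVar;

void msgvar_init(MsgVar *v)
{
    pthread_mutex_init(&v->lock, NULL);
    pthread_cond_init(&v->hook, NULL);
    v->obj = NULL;
}

/* Instantiates the variable and wakes every suspended reader. */
void msgvar_instantiate(MsgVar *v, Message *m)
{
    pthread_mutex_lock(&v->lock);
    v->obj = m;
    pthread_cond_broadcast(&v->hook);
    pthread_mutex_unlock(&v->lock);
}

/* Sketch of "v ?== Type(arg)": suspends until instantiated, then checks the type. */
int msgvar_deref(MsgVar *v, int type_id, int *arg)
{
    pthread_mutex_lock(&v->lock);
    while (v->obj == NULL)                    /* loop guards against spurious wakeups */
        pthread_cond_wait(&v->hook, &v->lock);
    Message *m = v->obj;
    pthread_mutex_unlock(&v->lock);
    if (m->type_id != type_id) return 0;      /* type differs: zero, nothing copied */
    *arg = m->arg;
    return 1;
}

/* Demo producer thread: instantiates a variable from another thread. */
typedef struct { MsgVar *var; Message *msg; } DemoArgs;

static void *demo_producer(void *p)
{
    DemoArgs *d = p;
    msgvar_instantiate(d->var, d->msg);
    return NULL;
}
```

Whether the producer runs before or after the reader calls `msgvar_deref`, the reader observes the same message: either the loop condition is already false, or it waits until the broadcast.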

5 Evaluation

We evaluated the prototype implementation using two programs: nqueen and pia. The latter is a multiple protein sequence alignment program based on a parallel iterative improvement method.

5.1 Sequential Performance

We evaluated the efficiency on a single processor by comparison with sequential C programs and with concurrent programs using the Pthreads library.


(a) Execution time

                C          Orgel      ratio
    nqueen      66.54 s    73.88 s    1.11
    pia         12.18 s    12.78 s    1.05

(b) Runtime overhead

                    C+Pthreads   Orgel       ratio
    switching       18.50 µs     22.58 µs    1.22
    communication    2.96 µs     12.76 µs    4.31

Table 1. Sequential Performance (On SS10 + Solaris 2.5)

                sequential   parallel   speedup
    nqueen      213.58 s     21.24 s    10.06
    pia          50.24 s      7.72 s     6.50

Table 2. Parallel Performance (On SPARCcenter + Solaris 2.6)

Table 1(a) shows the execution times of nqueen and pia. The C versions perform the computation of the corresponding Orgel agents sequentially in loops. The results show that the overhead of Orgel is only 5–11% compared with C.

Table 1(b) shows the overhead of the Orgel runtime compared with using Pthreads directly in C. We used a benchmark program that repeatedly transfers an integer value between two threads. The C+Pthreads version uses a shared variable for the transfer and semaphores for synchronization.
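A benchmark of the kind described might look like the following C sketch; it is not the authors' program. It assumes POSIX unnamed semaphores (available on Linux, but not on macOS) and passes one integer back and forth between two threads through a shared variable, with one semaphore per direction. Handling of interrupted `sem_wait` calls is omitted.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

typedef struct {
    int value;          /* the shared variable that carries the integer */
    int rounds;
    sem_t to_b, to_a;   /* "your turn" signals for each side */
} PingPong;

static void *side_b(void *p)
{
    PingPong *pp = p;
    for (int i = 0; i < pp->rounds; i++) {
        sem_wait(&pp->to_b);   /* wait for A's write */
        pp->value += 1;        /* read the integer and write a reply */
        sem_post(&pp->to_a);
    }
    return NULL;
}

/* Runs `rounds` round trips between the calling thread (A) and one helper
   thread (B). Returns the final shared value, or -1 on setup failure. */
int ping_pong(int rounds)
{
    PingPong pp;
    pthread_t b;
    pp.value = 0;
    pp.rounds = rounds;
    if (sem_init(&pp.to_b, 0, 0) != 0) return -1;
    if (sem_init(&pp.to_a, 0, 0) != 0) { sem_destroy(&pp.to_b); return -1; }
    if (pthread_create(&b, NULL, side_b, &pp) != 0) {
        sem_destroy(&pp.to_b);
        sem_destroy(&pp.to_a);
        return -1;
    }
    for (int i = 0; i < rounds; i++) {
        pp.value += 1;         /* A writes the integer */
        sem_post(&pp.to_b);
        sem_wait(&pp.to_a);    /* wait for B's reply */
    }
    pthread_join(b, NULL);
    sem_destroy(&pp.to_b);
    sem_destroy(&pp.to_a);
    return pp.value;
}
```

Each round performs two transfers, so the result is twice the number of rounds. Timing the loop with a monotonic clock and dividing by the number of transfers would give a per-transfer cost in the spirit of Table 1(b); the absolute numbers depend entirely on the machine.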

The thread switching overhead of Orgel is only 22% larger than that of C+Pthreads. To deal with many-to-many dependencies among threads, Orgel uses condition variables for synchronization, but this overhead is small enough.

Even to transfer a single integer, Orgel must create a message and send it via a stream, yet the overhead is only about 4.3 times that of using a shared variable. We consider this small enough, because the share of transmission overhead is smaller in practical programs, and the overhead of using Pthreads grows when buffering data or transferring dynamic data structures. Moreover, the current communication overhead of Orgel includes locking and unlocking stream records and the heap, which can be reduced using static analysis.

5.2 Parallel Performance

We executed the Orgel versions of nqueen and pia on a SPARCcenter shared-memory multiprocessor (Solaris 2.6). The results are shown in Table 2.

Nqueen obtained a speedup of 10.06 using 11 threads, which is almost linear. Pia obtained a speedup of 6.50 using 13 threads. The communications in pia are one-to-many or many-to-one, and optimizing the mutual exclusion in message transmission should improve its performance.
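The speedups in Table 2 are ratios of execution times. Dividing by the number of threads p gives the parallel efficiency E, a standard metric not listed in Table 2 that makes "almost linear" quantitative:

```latex
S = \frac{T_{\mathrm{seq}}}{T_{\mathrm{par}}}, \qquad E = \frac{S}{p}

\text{nqueen } (p = 11):\quad S = \frac{213.58}{21.24} \approx 10.06, \qquad E \approx \frac{10.06}{11} \approx 0.91

\text{pia } (p = 13):\quad S = \frac{50.24}{7.72} \approx 6.5, \qquad E \approx \frac{6.50}{13} = 0.50
```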

6 Conclusion

In this paper we proposed a new parallel programming language, Orgel, and presented its design, implementation, and evaluation.


Orgel is based on an execution model in which multiple agents, connected by abstract message channels called streams, run in parallel. The distinctive feature of Orgel is the declarative description of agent networks, which prevents communication/synchronization bugs and also enables precise static analysis for effective optimization. On the other hand, the computation of each agent is described sequentially, which allows efficient programs to be written.

The evaluation of the prototype implementation shows that the overhead on a single processor is small compared with C or the Pthreads library, and that promising speedups are obtained on a multiprocessor machine.

We are currently working on an optimizer using static analysis. Supporting distributed-memory multiprocessors is also future work.

Acknowledgment

This research is being pursued as part of the research project entitled "Software for Parallel and Distributed Supercomputing" of Intelligence Information and Advanced Information Processing, supported as part of the Research for the Future Program of the Japan Society for the Promotion of Science (JSPS).

References

[1] V. S. Sunderam. PVM: A framework for parallel distributed computing. Concurrency: Practice and Experience, Vol. 2, No. 4, pp. 315–339, December 1990.

[2] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard, June 1995.

[3] B. Nichols, D. Buttlar, and J. P. Farrell. Pthreads Programming. O'Reilly, 1998.

[4] K. Ueda and T. Chikayama. Design of the kernel language for the parallel inference machine. The Computer Journal, Vol. 33, No. 6, pp. 494–500, 1990.

[5] K. Ohno, M. Ikawa, M. Goshima, S. Mori, H. Nakashima, and S. Tomita. Improvement of message communication in concurrent logic language. In Proceedings of the Second International Symposium on Parallel Symbolic Computation (PASCO'97), pp. 156–164, 1997.

[6] K. Ohno, M. Ikawa, M. Goshima, S. Mori, H. Nakashima, and S. Tomita. Efficient goal scheduling in concurrent logic language using type-based dependency analysis. In LNCS 1345, Advances in Computing Science – ASIAN'97, pp. 268–282. Springer-Verlag, 1997.

[7] Y. Shoham. Agent-oriented programming. Artificial Intelligence, Vol. 60, pp. 51–92, 1993.

[8] G. Agha. Concurrent object-oriented programming. Communications of the ACM, Vol. 33, No. 9, pp. 125–141, September 1990.

[9] T. Chikayama, T. Fujise, and D. Sekita. KLIC User's Manual. ICOT, March 1995.

[10] K. Yoshida and T. Chikayama. A'UM: a stream-based concurrent object-oriented language. New Generation Computing, Vol. 7, No. 2, pp. 127–157, 1990.

[11] ICOT TM-1206. Introduction of AYA (version 1.0) (in Japanese).

[12] G. A. Alverson, W. G. Griswold, C. Lin, D. Notkin, and L. Snyder. Abstractions for portable, scalable parallel programming. IEEE Transactions on Parallel and Distributed Systems, Vol. 9, No. 1, pp. 71–86, January 1998.