

The D-Box coordinational language and the run-time system

Support for distributed development in the Clean functional language

Hernyák Zoltán
http://aries.ektf.hu/~hz

[email protected]

Main results of the PhD dissertation

2009

Advisor: Dr. Horváth Zoltán, professor
Eötvös Loránd University, Faculty of Informatics
H-1117 Budapest, Pázmány Péter sétány 1/C.

PhD School of Eötvös Loránd University, Faculty of Informatics
PhD Programme: The foundations of and methodologies in informatics
Leader of the PhD School and of the PhD programme:
Dr. Demetrovics János (Full Member of the Hungarian Academy of Sciences)


Introduction

This thesis deals with a coordination language that supports the distributed evaluation of Clean programs, together with its run-time system. The primitives of D-Clean, a higher-abstraction coordination language designed for the Clean functional language, are translated into lower-level D-Box coordination language definitions, in which the concepts of channel, box and protocol are defined. The primary coordination element is the box, which contains a computation task, either a function or an expression, written in Clean. The parameters required by this expression are carried from other boxes by typed channels. The channels are capable of minimal buffering and support the add and remove operations known from the queue data structure.

The channels are read by a protocol expression that hides the details of channel handling from the Clean expression. The protocol expression can perform basic transformations on the data arriving through the channels, for example constructing a list from them. The expression processes the data received from the protocol and computes the result. The results are passed to an output protocol, which transmits them to the outgoing channels. The input and output channels of the boxes are connected to each other. The nodes of the computation graph are the boxes and its edges are the channels. The topic of the dissertation is the syntax and semantics of the coordination language that describes this graph.
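The relationship between channels, protocols and the expression of a box can be illustrated with a small sketch. It is written in Python rather than Clean, and every name in it (`Channel`, `Box`, `drain_to_list`, `send_each`) is a hypothetical stand-in chosen for this illustration, not part of D-Box itself.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Callable, Deque, List


@dataclass
class Channel:
    """A typed channel with a small FIFO buffer (add at the tail, remove at the head)."""
    name: str
    elem_type: type
    buffer: Deque[Any] = field(default_factory=deque)

    def add(self, value: Any) -> None:
        if not isinstance(value, self.elem_type):
            raise TypeError(f"channel {self.name} carries {self.elem_type.__name__}")
        self.buffer.append(value)

    def remove(self) -> Any:
        return self.buffer.popleft()


@dataclass
class Box:
    """A graph node: input protocol -> expression -> output protocol."""
    inputs: List[Channel]
    outputs: List[Channel]
    input_protocol: Callable[[List[Channel]], list]
    expression: Callable[..., Any]
    output_protocol: Callable[[Any, List[Channel]], None]

    def run(self) -> None:
        args = self.input_protocol(self.inputs)      # hides channel handling
        result = self.expression(*args)              # plain user function
        self.output_protocol(result, self.outputs)   # forwards the result


def drain_to_list(channels: List[Channel]) -> list:
    """Hypothetical input protocol: each channel becomes one list argument."""
    args = []
    for ch in channels:
        items = []
        while ch.buffer:
            items.append(ch.remove())
        args.append(items)
    return args


def send_each(result: list, channels: List[Channel]) -> None:
    """Hypothetical output protocol: write every element to the first channel."""
    for item in result:
        channels[0].add(item)


a = Channel("a", int)
b = Channel("b", int)
for n in (1, 2, 3):
    a.add(n)

doubler = Box([a], [b], drain_to_list, lambda xs: [2 * x for x in xs], send_each)
doubler.run()
print(list(b.buffer))  # [2, 4, 6]
```

The user function (`lambda xs: ...`) never touches a channel, which mirrors the separation described above.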

Aims

1. To create a coordination language that supports distributed functional programming. It should support the higher-abstraction D-Clean coordination language and be suitable for developing distributed programs in a functional programming language. To work out the syntactic and static semantic rules of the coordination language, on the basis of which D-Box language descriptions can be processed and checked.

2. To give the specification of the operation of the coordination language, which can serve as the basis of a code-generating tool. To apply external pattern files during code generation, so that patterns can also be worked out for other platforms and middleware. The code-generating tool should parameterize the patterns with types and produce Clean source code. To work out the macros that the code-generating tool applies and with which the patterns can be created. To create concrete pattern files with which code can be generated for a particular operating system and middleware.

3. To create the run-time system that complements the services of the middleware, and to give an informal syntactic and semantic description of its API functions. These functions can be referred to in the pattern files. The system should be suitable for preparing the execution of projects generated from D-Box language descriptions and for supporting them while they run.

4. To test the environment created in this way: to run concrete distributed computational tasks and to measure their efficiency and performance.

Antecedents of topic choice

In the first phase of my research I dealt with the MPI and PVM libraries under Linux, using the message-passing communication method. However, the complexity and weak typing of the C and C++ programming languages do not make a favorable environment for writing distributed programs, not to mention their cumbersome debugging and tracing methods.

Later, based on the possibilities offered by the .NET Framework, I prepared an implementation built on similar guidelines for an object-oriented platform, in C#. The implementation was written in pure C#, and communication was provided by low-level socket handling; in essence it was a class collection furnished with communication-supporting methods. I prepared a simple run-time system for it, which later became the basis of the D-Box run-time system discussed in this thesis.

The Clean functional programming language is an attractive programming platform because of its many characteristics. Unfortunately, Clean does not support distributed execution at the moment. It seemed necessary to develop a higher-level process-description and coordination system. To serve this aim, Zsók Viktória, a researcher at ELTE, designed the D-Clean language, with which communication can be described by means similar to functional language expressions. I joined the research at that point. The design of the D-Box language was a cooperation. In this research I prepared the D-Clean translator, the syntax and semantics of the D-Box language, and the D-Box code generator and run-time system; I also implemented the vast majority of the D-Box pattern files. I also prepared an earlier implementation of the D-Box protocol templates, which had to be reconsidered because of the SplitF protocol introduced later. The present protocol implementations in Clean were prepared by Diviánszky Péter, a researcher at ELTE.

In our solution, the functions written by the user do not have to contain any information related to communication or to the run-time system. The generated code hides this completely from the Clean functions. In this way the functions can be developed and tested in a traditional environment and then transferred to the distributed environment without any alteration.

Antecedents of the research topic

Among the functional programming languages, Concurrent Haskell, which has a parallel evaluation system, was studied. The JoCaml language extends the Objective Caml language with the join calculus and has dedicated support for parallel and distributed development. It knows the concepts of channel and abstract location. Communication among the processes takes place through channels. Erlang makes it possible to create concurrent, real-time, distributed and fault-tolerant systems. The language has built-in means for distribution and message transmission; it achieves distribution without a shared memory area, by sending messages.

Hume is a strongly typed, functionally based programming language whose primary priority is to check the boundedness of running time and resource use. It applies an asynchronous communication model whose basic element is the box. The boxes have unique identifiers, and their incoming and outgoing data are typed. The boxes can be connected; the connections are described separately from the box definitions, and during this the correctness of the connection types is checked. The boxes can also be connected through devices (stream, port, etc.). A box can even be connected to itself, forming a one-element loop. The connections among the boxes are called wires. Initial values can be placed on the wires, which is useful when creating loops.

For the Eden functional programming language, Jost Berthold created an implementation language called EdI, in which he defined channel-creating and data-sending primitives. Because of the lazy evaluation of Eden, he defined evaluation strategies that can force the computation (and sending) of a value on the sending side.

Supplemented with the ObjectIO library, the Clean functional programming language can be used to develop interactive programs that contain menus and dialog windows. The uniqueness type makes it possible to handle resources in a purely functional way: a value of a unique type cannot be duplicated. The port of ObjectIO to Linux is not complete, so it cannot be built upon in heterogeneous systems. Concurrent Clean was a purely functional, strongly typed language that supported parallel and distributed evaluation as well. Unfortunately, this extension of the language could not keep pace with the language versions, and its development has stopped. In the early phase of Clean there was also a language version with transputer support. With the help of annotations it was possible to mark the parts of an expression that can be evaluated in parallel. The parallelization strategies were based on these annotations, and with their help an evaluation order could be set.

Applied methods

The syntax of the D-Box coordination language is given both by an EBNF description and by lex + yacc definitions. The rules essential to the analysis of static correctness were described formally. The operation of the D-Box protocols was specified. The operation of the run-time system was given in the form of a natural operational semantics. The D-Box lexical and semantic analyzer that was created is a program written in C; it was not generated from the BNF. The C++, C and Clean code is read from external pattern files and parameterized on the basis of the information contained in the D-Box definitions. This is done with the help of macros built into the pattern files. The Start expression of the boxes is not created from patterns: this code depends strongly on the applied protocol and channels, so it is generated directly by the translator.

The syntactic correctness of the structure built from the lexical elements is checked by a syntactic analyzer that is not purely LALR(1). The attached static semantic analyzer operates in a separate pass, as does the code generator. We do not perform code-optimization steps: optimization is not important in the channel code, and the code of the boxes is generated in Clean, where optimization is done by the Clean compiler and the lazy evaluation system. The middleware services of the run-time system are extended by special services essential to running the project. ICE functions are called from the Clean code through interfaces. The source of the interfaces is also in the pattern files.
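The idea of parameterizing a pattern file through macros can be sketched as plain text substitution. The sketch below is in Python and uses `string.Template`; the pattern text and the macro names `channel_id` and `elem_type` are assumptions made for this illustration, since the actual pattern files and macro syntax are not reproduced in this summary.

```python
from string import Template

# Hypothetical pattern text; the real pattern files and macro names are not
# reproduced in this summary.
PATTERN = Template(
    "// channel $channel_id carrying $elem_type\n"
    "void send_$channel_id(const $elem_type& value);\n"
)


def expand(pattern: Template, definitions: dict) -> str:
    """Substitute every macro; raise KeyError if a definition is missing."""
    return pattern.substitute(definitions)


generated = expand(PATTERN, {"channel_id": "ch1", "elem_type": "int"})
print(generated)
```

Using `substitute` rather than `safe_substitute` makes a missing definition fail loudly, which is usually preferable in a code generator.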

Proposition 1.

The syntax and static semantics of D-Box language

I have created a coordination language with which a computation graph can be defined, in which applications written in the Clean functional programming language run on the computing nodes. The coordination language also supports special language elements, such as values of the *World type that can be regenerated. Values of such types cannot be carried through channels; instead, a replacement value can be generated on the host side.

The syntax was given in EBNF form and in a form that can be processed by lex and yacc. During the static semantic check, the input channels of a box were checked against the applied input protocol, and the output protocol against the output channels. It was analyzed whether the protocol is able to create the input parameters of the applied expression and whether it is able to handle the output values. Furthermore, it was analyzed whether each input and output channel is used only once and whether the dynamic sub-graph start-ups can be fulfilled at run time. In addition, it was analyzed whether the communication between the boxes is closed with respect to the sub-graph.
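Two of these checks, that every channel is used exactly once on each side and that communication is closed within the sub-graph, can be sketched as follows. This is a simplified Python illustration, not the analyzer described in the thesis; `BoxDef`, `check_channels` and the channel and type names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, NamedTuple


class BoxDef(NamedTuple):
    name: str
    inputs: List[str]    # channel ids read by the input protocol
    outputs: List[str]   # channel ids written by the output protocol


def check_channels(boxes: List[BoxDef], channel_types: Dict[str, str]) -> List[str]:
    """Return a list of error messages; an empty list means the check passed."""
    errors = []
    readers = Counter(ch for box in boxes for ch in box.inputs)
    writers = Counter(ch for box in boxes for ch in box.outputs)
    for ch in sorted(set(readers) | set(writers)):
        if ch not in channel_types:
            errors.append(f"{ch}: undeclared channel")
        # One reader and one writer inside the sub-graph means the channel is
        # used once on each side and the communication is closed.
        if readers[ch] != 1:
            errors.append(f"{ch}: read {readers[ch]} times, expected once")
        if writers[ch] != 1:
            errors.append(f"{ch}: written {writers[ch]} times, expected once")
    return errors


types = {"c1": "Int", "c2": "Int"}
good = [BoxDef("gen", [], ["c1"]), BoxDef("sink", ["c1"], [])]
bad = [BoxDef("gen", [], ["c1"]), BoxDef("a", ["c1"], []), BoxDef("b", ["c1"], ["c2"])]

print(check_channels(good, types))  # []
print(check_channels(bad, types))
```

In the second graph, `c1` has two readers and `c2` has no reader, so both are reported.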

Proposition 2.

Specification of D-Box language primitives

I have introduced the syntactic and static semantic rules of the coordination language, on the basis of which I created a working syntax-checking program. A template collection based on the Windows operating system and the ICE middleware was also created and attached to the thesis as a DVD supplement. The names of the template files and the directory structure were put into an XML configuration file, which makes the code-generation phase easier to parameterize.

The specification of the coordination language discusses the operation of the protocols, the serialization of more complex structures (lists, lists of lists), and the steps of writing to a channel. We have also specified the reading of channels, the deserialization of a symbol sequence arriving on a channel, and the assembly of the protocol results. In this way the operation of the input and output protocols was defined precisely.
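As a rough illustration of serializing lists and lists of lists into a flat symbol sequence and reading them back, consider the Python sketch below. The begin and end markers and the function names are assumptions for this example; the actual wire format is defined in the thesis, not here.

```python
from typing import Iterable, Iterator, List, Union

Nested = Union[int, List["Nested"]]
BEGIN, END = "[", "]"   # hypothetical list delimiters on the channel


def serialize(value: Nested) -> Iterator[Union[int, str]]:
    """Flatten a nested list into a sequence of symbols."""
    if isinstance(value, list):
        yield BEGIN
        for item in value:
            yield from serialize(item)
        yield END
    else:
        yield value


def deserialize(tokens: Iterable[Union[int, str]]) -> Nested:
    """Rebuild the nested structure from the symbol sequence."""
    it = iter(tokens)

    def parse(tok):
        if tok == BEGIN:
            items = []
            for nxt in it:
                if nxt == END:
                    return items
                items.append(parse(nxt))
            raise ValueError("symbol sequence ended inside a list")
        if tok == END:
            raise ValueError("unexpected end marker")
        return tok

    return parse(next(it))


data = [[1, 2], [3]]
symbols = list(serialize(data))
print(symbols)               # ['[', '[', 1, 2, ']', '[', 3, ']', ']']
print(deserialize(symbols))  # [[1, 2], [3]]
```

Because `serialize` is a generator, symbols can be written to a channel one at a time without building the whole sequence first.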

The operation of the protocols is based on the lazy evaluation of Clean rather than on parallel evaluation. Processing starts with the output protocol, which triggers the evaluation of the expression. The expression takes the incoming data from the input protocol lazily, which implicitly causes the input channels to be read. The generated data are put on the channels by the output protocol. The D-Box translator takes the definitions describing the computation graph either from a file or directly from the D-Clean translator; the latter is the result of integrating the two translators. After the D-Box definitions were processed and their static semantics checked, source code was generated on the basis of the pattern files. To support code generation, macros were developed, whose description is included in the attachment of the thesis.

The Clean Start function of each generated computation node (box) is created by the translator without patterns, as practically every line of it depends on the type and number of the applied channels and on the protocol. We use the C++ linker, as we could not apply the Clean linker to the task. C++ object files are also linked to the code of the box.
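The demand-driven order described above, in which the output protocol pulls results and thereby triggers channel reads, can be imitated with Python generators. This is only an analogy to lazy evaluation in Clean; the function names and the `log` list are hypothetical and exist to make the evaluation order visible.

```python
from typing import Iterable, Iterator, List

log: List[str] = []


def input_protocol(channel: Iterable[int]) -> Iterator[int]:
    """Reads the channel only when the expression asks for the next value."""
    for value in channel:
        log.append(f"read {value}")
        yield value


def expression(values: Iterator[int]) -> Iterator[int]:
    """Stand-in for the user's function; it knows nothing about channels."""
    for value in values:
        yield value * value


def output_protocol(results: Iterator[int], out_channel: List[int]) -> None:
    """Drives the whole box: each demand for a result triggers one read."""
    for result in results:
        log.append(f"write {result}")
        out_channel.append(result)


out: List[int] = []
output_protocol(expression(input_protocol(iter([1, 2, 3]))), out)
print(log)
# ['read 1', 'write 1', 'read 2', 'write 4', 'read 3', 'write 9']
```

Reads and writes interleave because nothing is read until the output side demands the next result.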

Proposition 3.

Formal semantics and implementation of the run-time system

The thesis describes the state of the run-time system and its operational semantics. During execution, the boxes forming the spine of the project are launched from a starting state, and the channel start commands are processed. Meanwhile, new boxes can be started dynamically and new channel start commands can enter the system. In the finished state of the project, every box has been started and every channel start command has been fulfilled. The run-time system places the generated binary code into a code library service, from which the project can be started. In doing so, the system first instantiates the boxes belonging to the initial sub-graph. Once started, the boxes can request the starting of channels and the instantiation of boxes belonging to further sub-graphs. A name service ensures that the running components find each other. The run-time system contains a scheduler that determines which component is placed on which computer.
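The progression from the starting state to the finished state can be sketched as a loop over two queues of pending start requests. The Python sketch below is a strong simplification: the `PROJECT` table, the round-robin placement and the dictionary used as a name service are all assumptions for this illustration and do not reproduce the formal semantics given in the thesis.

```python
from collections import deque
from itertools import cycle
from typing import Dict, List, Tuple

# Hypothetical project description: each box lists the channels it asks to
# start and the boxes of further sub-graphs it asks to instantiate.
PROJECT: Dict[str, Tuple[List[str], List[str]]] = {
    "source": (["c1"], ["worker"]),
    "worker": (["c2"], ["sink"]),
    "sink": ([], []),
}


def run_project(initial_boxes: List[str], hosts: List[str]) -> Dict[str, str]:
    scheduler = cycle(hosts)              # assumed round-robin placement
    name_service: Dict[str, str] = {}     # component name -> host
    pending_boxes = deque(initial_boxes)
    pending_channels: deque = deque()
    # Finished state: no box start and no channel start command is pending.
    while pending_boxes or pending_channels:
        if pending_channels:
            channel, owner = pending_channels.popleft()
            name_service[channel] = name_service[owner]
            continue
        box = pending_boxes.popleft()
        name_service[box] = next(scheduler)
        channels, children = PROJECT[box]
        pending_channels.extend((ch, box) for ch in channels)
        pending_boxes.extend(children)    # dynamically started boxes
    return name_service


placement = run_project(["source"], ["hostA", "hostB"])
print(placement)
```

Each channel is registered on the host of the box that requested it, which is one possible policy among many.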

Proposition 4.

Verification of the applicability of the D-Box system on a problem class

Translation, code generation and execution were checked by solving a real-life, computation-intensive problem. Measurements and charts of the distributed execution of the application show that the generated code is efficient in this problem class. With 4, 8 and 16 nodes, and with varying cut points and operation complexity, the running time of the distributed case approached the value given by the expected maximal speed-up.


Bibliography

[1] William Gropp, Ewing Lusk, Anthony Skjellum: Using MPI - Portable Parallel Programming with the Message-Passing Interface, MIT Press, 1999

[2] Al Geist, Adam Beguelin, Jack Dongarra, Weicheng Jiang, Robert Manchek, Vaidy Sunderam: PVM: Parallel Virtual Machine – A Users’ Guide and Tutorial for Networked Parallel Computing, MIT Press, 1994, http://www.netlib.org/pvm3/book/pvm-book.html

[3] Kevin Hammond, Greg Michaelson, Robert Pointon: The Hume Report, version 1.1, http://www-fp.cs.st-andrews.ac.uk/hume/report/

[4] Jost Berthold: Explicit and Implicit Parallel Functional Programming: Concepts and Implementation, PhD dissertation, 2008, Marburg.

[5] Jones, S. P., Gordon, A., Finne, S.: Concurrent Haskell, Conference Record of POPL ’96: The 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Glasgow, 1996, 11 pp.

[6] Finne, S. and Jones, S. P. J.: Concurrent Haskell, In Principles Of Programming Languages, St. Petersburg Beach, Florida, 1996, pp. 295-308

[7] Fournet, C., Le Fessant, F., Maranget, L., Schmitt, A.: The JoCaml language beta release, Documentation and user’s manual, INRIA, 2001.

[8] Leroy X. et al. The Objective Caml Language (version 3.10). Software and documentation, available at http://caml.inria.fr, 2007.

[9] J. Barklund and R. Virding. Erlang Reference Manual, 1999. Available from http://www.erlang.org/download/erl_spec47.ps.gz. 2007.06.01

[10] Kesseler, M.H.G.: The Implementation of Functional Languages on Parallel Machines with Distributed Memory, PhD Thesis, Catholic University of Nijmegen, 1996.

[11] Serrarens, P.R.: Communication Issues in Distributed Functional Computing, Ph.D. Thesis, University of Nijmegen, January 2001.


[12] Horváth Z., Zsók V., Serrarens, P., Plasmeijer, R.: Parallel Elementwise Processable Functions in Concurrent Clean, Mathematical and Computer Modelling 38, pp. 865-875, Pergamon, 2003.

[13] Horváth Zoltán, Hernyák Zoltán, Zsók Viktória: Coordination Language for Distributed Clean, Acta Cybernetica (ISSN: 0324-721 X), Vol. 17 (2), Institute of Informatics, University of Szeged, Szeged, Hungary, 2005, pp. 247-271. Selected publication of CSCS PhD Conference in Computer Science.

[14] Achten, P., Wierich, M.: A Tutorial to the Clean Object I/O Library, University of Nijmegen, 2000. http://www.cs.kun.nl/~clean

[15] Plasmeijer, R., van Eekelen, M.: Functional Programming and Parallel Graph Rewriting, Addison-Wesley, 1993.

[16] van Eekelen, M. et al.: Concurrent Clean, Technical Report no 90-20, November 1990, University of Nijmegen.

[17] Plasmeijer, R., van Eekelen, M.: Concurrent Clean Language Report, University of Nijmegen, 2001.

[18] Hernyák Zoltán: PEDPI as a Message Passing Interface with OO support, in: Striegnitz, Jörg; Davis, Kei (Eds.) (2003) Proceedings of the Workshop on Parallel/High-Performance Object-Oriented Scientific Computing (POOSC’03), Interner Bericht FZJ-ZAM-IB-2003-09, Juli 2003, pp. 93-100.


List of publications

Refereed publications

1. Horváth Zoltán, Hernyák Zoltán, Zsók Viktória: Coordination Language for Distributed Clean, Acta Cybernetica (ISSN: 0324-721 X), Vol. 17 (2), Institute of Informatics, University of Szeged, Szeged, Hungary, 2005, pp. 247-271. Selected publication of CSCS PhD Conference in Computer Science.

2. Horváth Zoltán, Hernyák Zoltán, Kozsik Tamás, Tejfel Máté, Ulbert Attila: A Data Intensive Application on a Cluster - Parallel Elementwise Processing, in: P. Kacsuk, D. Kranzlmüller, Zs. Nemeth, J. Volkert (Eds.): Distributed and Parallel Systems - Cluster and Grid Computing, Proc. 4th Austrian-Hungarian Workshop on Distributed and Parallel Systems, Kluwer Academic Publishers, The Kluwer International Series in Engineering and Computer Science, Vol. 706, pp. 46-53, Linz, Austria, September 29-October 2, 2002.

3. Zsók Viktória, Hernyák Zoltán, Horváth Zoltán: Designing Distributed Computational Skeletons in D-Clean and D-Box, in: Lecture Notes in Computer Science, Horváth Zoltán (ed.): Central European Functional Programming School (The First Central European Summer School, CEFP 2005, Budapest, Hungary, July 4-15, 2005), Revised Selected Lectures. ISSN 0302-9743, vol. 4164, 2006, pp. 229-265.

4. Zsók Viktória, Hernyák Zoltán, Horváth Zoltán: Distributed Pattern Design in D-Clean, Central European Functional Programming School, CEFP 2005, ELTE, Budapest, Hungary, July 4-15, 2005, Lecture Notes, 33 pages

5. Zsók Viktória, Hernyák Zoltán, Horváth Zoltán: Improving the Distributed Elementwise Processing Implementation in D-Clean, In: Horváth Z., Kozma L., Zsók V. (eds): Proceedings of the 10th Symposium on Programming Languages and Software Tools (ISBN: 978-963-463-925-1), SPLST 2007, Dobogókő, Hungary, June 14-16, 2007, Eötvös University Press, 2007, pp. 256-264.

6. Zsók Viktória, Hernyák Zoltán, Horváth Zoltán: Distributed Pattern Design in D-Clean, in: Vene V., Meriste M. (eds.): Proceedings of the Ninth Symposium on Programming Languages and Software Tools, ISBN: 9949-11-113-7, SPLST 2005, Tartu, Estonia, 13-14 August, 2005, Tartu University Press, 2005, pp. 220-234.

Publications in refereed proceedings

7. Zsók Viktória, Hernyák Zoltán, Horváth Zoltán: Distributed Computation on Cluster using D-Clean and D-Box. Extended abstract. In: Davis, K., Quintino, T., Striegnitz, J. (eds): 5th Workshop on Parallel/High Performance Object-Oriented Scientific Computing, POOSC’06 at 20th European Conference on Object-Oriented Programming, ECOOP 2006, Nantes, France, 3rd July, 2006, 3 pages. Summary: Object-Oriented Technology, ECOOP 2006 Workshop Reader, ECOOP 2006 Workshops, Nantes, France, July 3-7, 2006, Final Reports, LNCS 4379, Springer Verlag, 2007, pp. 141-145.

8. Horváth Zoltán, Hernyák Zoltán, Zsók Viktória: Implementing Distributed Skeletons using D-Clean and D-Box, In: Butterfield, A. (ed): Proceedings of the 17th International Workshop on Implementation and Application of Functional Languages, IFL 2005, Dublin, Ireland, September 19-21, 2005, pp. 1-16.

9. Hernyák Zoltán, Horváth Zoltán, Zsók Viktória: Clean-CORBA Interface Supporting Pipeline Skeleton, in: Csőke Lajos (ed.): Proceedings of 6th International Conference on Applied Informatics, Eger, Hungary, January 27-31, 2004. Eger, Hungary, B.V.B. Press, Vol. I. pp. 191-200.

Publications in international conference proceedings

10. Zsók Viktória, Horváth Zoltán, Hernyák Zoltán: Distributed Elementwise Processing in D-Clean, In: Nilsson, H. (ed): Proceedings of the Seventh Symposium on Trends in Functional Programming, TFP 2006, Nottingham, UK, 19-21 April, 2006, The University of Nottingham, pp. 378-386.

11. Hernyák Zoltán, Horváth Zoltán, Zsók Viktória: Design of Language Elements for Dynamic Distributed Computation of Clean Expressions on Clusters, in: Loidl, H-W. (ed): Proceedings of Fifth Symposium on Trends in Functional Programming, TFP 2004, Munich, Germany, November 25-26, 2004, Ludwig-Maximilians University, pp. 257-270.

12. Hernyák Zoltán: PEDPI as a Message Passing Interface with OO support, in: Striegnitz, Jörg; Davis, Kei (Eds.) (2003) Proceedings of the Workshop on Parallel/High-Performance Object-Oriented Scientific Computing (POOSC’03), Interner Bericht FZJ-ZAM-IB-2003-09, Juli 2003, pp. 93-100.
