Distributed Object-Oriented Parallel Computing on Heterogeneous Workstation
Clusters Using Java
Meijuan Shan
A thesis submitted to the Faculty of Graduate Studies
in partial fulfillment of the requirements
for the degree of
Master of Science
Graduate Programme in Computer Science
York University
Toronto, Ontario
July 1999
The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.
The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
Distributed, Object-Oriented, Parallel Computing on Heterogeneous Workstation Clusters using Java

by Meijuan Shan

a thesis submitted to the Faculty of Graduate Studies of York University in partial fulfillment of the requirements for the degree of
Master of Science
Permission has been granted to the LIBRARY OF YORK UNIVERSITY to lend or sell copies of this thesis, to the NATIONAL LIBRARY OF CANADA to microfilm this thesis and to lend or sell copies of the film, and to UNIVERSITY MICROFILMS to publish an abstract of this thesis. The author reserves other publication rights, and neither the thesis nor extensive extracts from it may be printed or otherwise reproduced without the author's written permission.
Abstract
Unlike stand-alone workstations, computing on distributed networks of workstation clusters requires dealing with different types of heterogeneity, such as architecture, computational speed and network load. Java, introduced by Sun, is designed specifically for secure, distributed, network computing. Besides the platform-independent bytecode, Java also provides some basic mechanisms for concurrency at the language level, such as multithreading and synchronization. In this thesis, we explore the use of the mechanisms available in the Java system to build a software infrastructure for distributed shared memory parallel computing among distributed, heterogeneous networks of workstations.
Our exploration begins with Java's multithreading feature. Java supports multiple threads of control on a single workstation. In order to achieve parallelism, threads in a Java application should be created either on different machines or created on one machine and then dispatched to different machines. In our exploration, we have successfully executed different threads of the user's Java program on different machines. After experimenting with Java's multithreading feature, we explored several alternatives for data consistency, including update detection and update propagation. We also explored some alternatives for creating distributed shared objects.
The results of our exploration show that there are several alternatives for achieving parallelism through extensions to the Java system. Some of the alternatives require extensions to the existing Java compiler and interpreter; some of them require a new API. In the last part of this thesis, we describe the design and partial implementation of a parallel computing system based on Java that we call Paralleljava. The Paralleljava system makes extensions to both the Java API and the Java runtime system. It uses a data consistency model similar to entry consistency and implements an update-based coherence protocol.
To my parents,
who have dedicated more than 40 years to education and who have been my mentors since I was born.
Acknowledgment
I am most indebted to my parents, who have been my mentors since I was born, who have been nourishing me with their encouragement and wisdom, and who require so little and give so much. Words alone are not enough to express my feelings for them.
Thanks to Professor Harjinder Singh Sandhu for supervising this thesis and for his wise guidance. Thanks to the other professors on my thesis committee, Eshrat Arjomandi, Rich Paige and Rene Fournier, for taking the time to read the thesis and provide feedback.
I am very grateful to my husband Yang. Thanks to him for his moral support and for the huge amount of work he did to help me finish this thesis. No words can express my gratitude to him.
I am indebted to the rest of my family: grandpa, uncles, aunts, Uiiyan, Zhichao, Meicha, Lili, Baoli, Xiaohua, Xining, Xinggang, Feifei, Xiaoxiao, Taoshan. Thanks to all of them for giving me constant support for so many years and for making my life colorful.
Thanks to other people in our computer science department: Professor Jenkin, Patricia, Professor Amanatides, Ulya, Lisa. Thanks to them for their help. I also want to thank some of my friends, Ben, Arhie and Jason, for their help during the first few years after I came here.
Table of Contents
Chapter 1 Introduction
1.1 Motivation
1.2 Parallel Computing in Java
1.3 Paralleljava Overview
1.4 Thesis Contribution

Chapter 2 Background
2.1 Distributed Shared Memory
2.1.1 Distributed Shared Memory Systems
2.1.2 Issues in Designing a DSM System
2.2 Distributed Object-Oriented Paradigm
2.2.1 Some Key Issues in Designing Distributed Object-Oriented Systems
2.2.2 Other Issues in Designing Distributed Object-Oriented Systems
2.3 Java
2.3.1 The Java System Architecture and the Java Programming Architecture
2.3.2 Multithreading and Synchronization in Java
2.3.3 Memory Management in Java
2.4 Summary

Chapter 3 Parallel Computing in Java
3.1 Issues in Developing DSM Systems within Java
3.2 Using Multithreading for Parallelism
3.2.1 The Creation of the Thread Objects at the Java Language Level
3.2.2 Support for Threads at the Java Runtime System Level
3.2.3 Dispatching Threads to Different Machines in the Runtime System
3.3 Creating Shared Objects
3.3.1 The New Operator in Java
3.3.2 Extending the Functionality of the New Operator
3.4 Exploring Memory Consistency and Coherence
3.4.1 Update Detection
3.4.2 Update Propagation
3.5 Summary

Chapter 4 The Paralleljava System
4.1 The Overview of the Paralleljava System
4.2 The System Architecture of the Paralleljava System
4.2.1 The System Architecture of the Paralleljava System
4.2.2 The Functionality of Each Layer in the Paralleljava System
4.3 Dynamic Network Class File Loading and Security
4.3.1 Dynamic Class File Loading in Java
4.3.2 Dynamic Network Class File Loading in Paralleljava
4.3.3 Security
4.4 Implementation of Data Consistency in Paralleljava
4.4.1 Update Detection and Collection
4.4.2 Update Propagation
4.4.3 The Shared Object in Paralleljava

Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Research

Appendix A The Queue Class
Appendix B The Lock Class
Figure 4.8: The Data Consistency Model inside the Shared Object
Figure 4.9: Sample Client Code
Figure 4.10: Sample Server Code
Figure 4.11: The Queue Class
Figure 4.12: The Lock Class
Figure 4.13: SharedObject vs. RMI
Chapter 1
Introduction
1.1 Motivation
With increasing frequency, networks of workstations are being used as parallel computers. High-speed general-purpose networks and very powerful workstation processors have narrowed the performance gap between workstation clusters and supercomputers. Furthermore, the workstation approach provides a relatively low-cost, low-risk entry into the parallel computing arena. In terms of performance, improvements in processor speed, network bandwidth and latency allow networks of workstations to provide performance approaching or exceeding supercomputer performance for an increasing class of applications. In terms of cost, many organizations already have installed workstation bases, and no special hardware is required to use this facility as a parallel computer. The resulting system can easily be maintained, extended and upgraded.
On the other hand, computing in a network is not like computing in an MPP (Massively Parallel Processor), in which all processors have exactly the same capability, resources, software, and communication speed. The computers available on a network may be from different vendors or have different operating systems or compilers. Therefore, the software supporting network computing must cope with different types of heterogeneity, such as architecture, data format, computational speed, machine load and network load.
Java is designed specifically for secure, distributed, Web-based applications. A Java program is compiled into an intermediate code, called bytecode, which is independent of the hardware architecture and the operating system. Any platform with the Java Virtual Machine ported to it can run Java bytecodes without modification. Thus, developing applications using Java produces software that is portable across multiple machine architectures, operating systems, and graphical user interfaces.
Java also provides some basic mechanisms for concurrency, such as multithreading and synchronization, at the language level. Using Java, users may program with multiple threads of control in a single program, but all of those threads are executed on a single machine. Like any other interpreted language, Java bytecode must first be translated to executable machine code by the Java Virtual Machine and then executed. This makes the execution of Java programs slower compared with equivalent programs written in C or C++. However, distributing the threads within a Java program across a network of computers can potentially allow the program to execute faster.
In this thesis, we explore the mechanisms available in the Java system to build a software infrastructure for parallel computing among distributed, heterogeneous networks of workstations. Our exploration ranges from extending the functionality of the Java API (Application Programming Interface) to extending the Java Virtual Machine. We have also designed and partially implemented a new parallel computing system called Paralleljava. Paralleljava provides extensions to the Java API and the Java Virtual Machine. It allows Java applications that use the Java multithreading facilities to be executed in parallel on distributed, heterogeneous networks of workstations.
1.2 Parallel Computing in Java
The primary objective of this thesis is to explore the mechanisms existing in the Java system to implement parallelism among distributed, heterogeneous networks of workstations. The Java system, consisting of the Java programming language and the Java runtime system, may be extended in a variety of ways to achieve parallelism. The obvious approach is to extend both the Java API and the Java runtime system to provide a complete parallel programming environment. However, it is possible to achieve parallelism in Java by extending the Java API (with a set of parallel classes) without modifying the Java runtime system. Alternatively, since Java already provides some basic mechanisms for concurrency, such as multithreading and synchronization, it may also be possible to modify the Java runtime system to achieve parallelism without extending or modifying the Java programming environment.
The suitability of each of these approaches to parallelism within the Java framework depends on the nature of the target computing environment. In this thesis, we consider both of the above approaches as well as one that is a mix of both alternatives. The exploration starts from multithreading, the basic mechanism for concurrency in Java. Java supports multiple threads of execution, but threads are created and executed on a single machine. A thread in Java is created by instantiating the Thread class and started by invoking the start method of the Thread class. In order to implement parallelism, different threads should either be created on different machines at the beginning or be created on one machine and dispatched to different machines when the start method is invoked. In the first case, we need to extend the Java API, whereas, in the latter case, we have to modify the Java runtime system. These will be discussed further in Chapter 3.
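The standard mechanism just described can be shown in a short example; the Worker class and its printed message are illustrative only and are not taken from the thesis.

```java
// A thread is created by instantiating the Thread class (here via a
// subclass) and started by invoking its start method.
class Worker extends Thread {
    public void run() {
        System.out.println("running in " + getName());
    }
}

class ThreadDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Worker();   // created by instantiating the Thread class
        Thread t2 = new Worker();
        t1.start();                 // started by invoking the start method;
        t2.start();                 // in standard Java, both run on one machine
        t1.join();
        t2.join();
    }
}
```

In unmodified Java, both calls to start execute on the local machine; the two alternatives above change either where the Thread object is created or what start does.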
Along with creating different threads on different machines, the key issues in developing parallel computing in Java are update detection (determining when some shared data has been modified) and update propagation (transmitting the updates to that shared data to other machines). In Chapter 3, we consider alternatives for each of these issues.
1.3 Paralleljava Overview
After looking into the above alternatives, we settled on one set of the alternatives and implemented it in a system we call Paralleljava. Paralleljava is designed to utilize the mechanisms for parallel computation existing within the Java system. It extends both the Java API and the Java runtime system to achieve parallelism in the Java system. The Java API is extended to support a shared-object model in Paralleljava. The shared object is based on the object model in the Java system. The instantiation of the shared object class in a user program provides a unified object model across various platforms. The extension of the Java runtime system presents users with the illusion of distributed shared memory.
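The actual shared-object API is presented in Chapter 4; purely as a hypothetical sketch of what instantiating a shared object class might look like, assume a SharedObject base class (stubbed below, since the real one would be supplied by the extended API and runtime).

```java
// Hypothetical sketch only: SharedObject stands in for the base class
// that Paralleljava's extended API would provide. The stub does nothing;
// in the real system, the extended runtime would keep instances of its
// subclasses consistent across machines.
class SharedObject { /* stand-in for the Paralleljava base class */ }

class SharedCounter extends SharedObject {
    private int value;

    public synchronized void increment() { value++; }
    public synchronized int  get()       { return value; }
}

class SharedCounterDemo {
    public static void main(String[] args) {
        // Instantiated like any ordinary Java object; in Paralleljava the
        // runtime, not this code, would propagate updates to other hosts.
        SharedCounter c = new SharedCounter();
        c.increment();
        System.out.println(c.get()); // prints 1
    }
}
```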
1.4 Thesis Contribution
The introduction of Java, with its multithreading and platform-independent features, has sparked considerable interest among the distributed and parallel programming communities. Many research projects have been proposed or are currently ongoing. A key part of the work presented here was to look into the Java system and to examine how parallelism could be made to fit into it. Some issues we considered for parallel computing in Java included creating parallelism by using Java's multithreading features, sharing data among different processors, and keeping shared data on different machines consistent. In the process of addressing these issues, we designed and partially implemented an experimental system we call Paralleljava.
Up to now, various software systems have been proposed and built to support parallel computing on workstation networks, using either Distributed Shared Memory (DSM) or the Message Passing Interface (MPI). Paralleljava works much like conventional distributed shared memory systems, which present virtual shared memory to a group of workstations even though the workstations do not physically share memory. What distinguishes Paralleljava from previous distributed shared memory systems is that Paralleljava is a Java-based, Web-optimized parallel computing system. By taking advantage of features such as platform independence and multithreading existing in the Java system, Paralleljava allows clients to download and thereafter execute in parallel a single Java application on networks of workstations. The clients can also automatically upload and execute programs on remote computing servers. The computing servers can be any computers within the network with the Java system ported to them. The program is automatically uploaded and executed on a computing server, and results are returned to the client. In the case of a parallel application, the client may upload code to many heterogeneous computing servers throughout the Internet.
This thesis explores mechanisms for parallel computing in Java and presents the design and the partial development of the Paralleljava system. Chapter 2 gives an overview of distributed, object-oriented computing and a brief introduction to the Java system. Chapter 3 presents alternatives for parallel computing that utilize the mechanisms existing in Java. Chapter 4 describes the design and the partial implementation of the Paralleljava system. Chapter 5 presents the conclusions of this thesis and discusses future work in developing Java parallel computing.
Chapter 2
Background
The objective of this thesis is to explore the use of the mechanisms in the existing Java system to build a software infrastructure for executing serial or multithreaded Java programs on faster computing servers or on a collection of possibly heterogeneous hosts. The mechanisms we use apply concepts from existing object-oriented, distributed shared memory systems and the Java system. In this chapter, we review some of these concepts. Section 2.1 gives a brief introduction to distributed shared memory systems. Section 2.2 describes the primary issues in the design of distributed object systems. Section 2.3 introduces the Java system together with its features. Section 2.4 summarizes this chapter.
2.1 Distributed Shared Memory
Parallel computing on networks of workstations generally falls into two categories: Message Passing and Distributed Shared Memory (DSM). The Message Passing model uses primitives such as send and receive for interprocess communication. The DSM model provides processes in a system with a shared address space in which data is accessed with read and write operations. One of the advantages of DSM over Message Passing is that it presents users with a unified memory model. Using a DSM system, users need not worry about the differences between remote and local memory access. In the next section, we briefly review some basic concepts of DSM systems as well as the key design issues.
2.1.1 Distributed Shared Memory Systems
A Distributed Shared Memory (DSM) system (Figure 2.1) is a software system built to support parallel computation on networks of workstations. Paralleljava, the system described in this thesis, takes existing DSM concepts and builds them into the Java framework. Consequently, we spend some time in this section describing some of the basic DSM concepts and issues, and describing some of the DSM systems that have been built in the past. The idea behind DSM is to emulate the cache of a multiprocessor using operating system software or runtime library routines.
Figure 2.1: Distributed Shared Memory (a Software Implementation Layer presents the separate memories as a single shared memory)
As shown in Figure 2.1, the workstations do not physically share memory, but the Software Implementation Layer between the processors and the memories presents the illusion of shared memory. All memory accesses are controlled by the software implementation layer. In a DSM system, all remote memory accesses behave like local memory accesses. This relieves the programmer from worrying about remote memory access when developing parallel applications [17][14].
Besides the ease of programming, DSM systems provide the same programming environment as that on shared-memory multiprocessors. Applications developed for a DSM system can be easily ported to a shared-memory multiprocessor, although porting an application developed for a shared-memory multiprocessor to a DSM system may require some modifications to the program due to the higher latencies in a DSM system [17].
2.1.2 Issues in Designing a DSM System
The major issues in designing a DSM system are granularity, memory consistency and coherence. In this section, we give a brief overview of these issues.
Granularity
Granularity refers to the size of the memory unit at which data is shared between the processors. According to granularity, DSM systems can be categorized as page-based DSM systems and region-based DSM systems. Page-based DSM systems (such as IVY [17][9][21], TreadMarks [14], Brazos [30], CVM [30] and Quarks [30]) take a normal linear address space and allow the pages to migrate dynamically over the network on demand. Page-based DSM systems exploit the existing virtual memory hardware and operating system available on most common architectures. In a page-based DSM system, each node in the system keeps a copy of each shared memory page. When any of the nodes modifies its own copy of the shared memory page, all the other nodes will set their copies to be invalid. Any accesses to the invalid pages will cause a virtual memory page fault.
Since page-based DSM systems use operating system pages as the sharing unit, compilers do not need to be changed, and the DSM systems themselves are transparent to the user's program. One of the disadvantages of page-based DSM systems is false sharing. Since the page size in a DSM system is fixed, different data items accessed by different processors might end up being allocated in the same page. In such circumstances, the system will generate coherence traffic between these processors (perhaps by repeatedly transmitting the page back and forth between them) even though the processors are actually accessing different portions of the page and are not sharing any data. For example, suppose two different data items d1 and d2 are on the same page, and that processor P1 references only d1, while processor P2 references only d2. When P1 updates d1, the memory page containing d1 and d2 has to be sent to P2 even though d2 is not updated by P1. Besides false sharing, implementations of page-based DSM systems are also architecture dependent due to their use of the virtual memory page protection mechanism.
In region-based DSM systems (such as Clouds [27], Midway [9][21], Munin [21] and ABC++ [17][12]), only certain variables and data structures needed by more than one processor are shared. These shared variables or data structures are put into critical sections guarded by synchronization objects. Each shared data item or variable is referenced as a region. Unlike page-based DSM systems, which use operating system pages as the sharing unit, the sharing unit in region-based DSM systems is a region. Regions are chosen by the application and can be of arbitrary size. In the implementation of a region-based DSM system, a synchronization object is bound to a specific shared data item; when a processor acquires the synchronization object, the data that is bound to that object becomes consistent.
While page-based DSM systems use virtual memory page-fault handling mechanisms to detect updates and accesses to a page, in region-based DSM systems the application itself has to supply information about when regions are accessed and modified. For example, in the shared region implementation in Hurricane [12], a programmer first identifies the set of shared regions in the program, and then encapsulates each series of accesses to those regions with a set of annotations that indicate when those regions are referenced and whether they are referenced for read only or for write. The annotations include: readaccess (acquires a region for read only); writeaccess (acquires a region for read and write); readdone (releases a read-accessed region); writedone (releases a write-accessed region).
Region-based DSM systems eliminate most false sharing by using a variable-size sharing unit. However, in region-based DSM systems, application programmers have to choose the shared regions, choose the type of synchronization objects, and bind them together. This increases the complexity of application programs.
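The annotation style described above can be transliterated into a Java sketch. The Region class and its method bodies below are assumptions modelled on the Hurricane annotation names; the stubs only mark the access boundaries, where a real region-based DSM system would perform its coherence actions.

```java
// Sketch of Hurricane-style region annotations in Java. The empty method
// bodies are placeholders: a real system would acquire/release the
// synchronization object bound to the region and make its data consistent.
class Region {
    private final double[] data;

    Region(int size) { data = new double[size]; }

    void readaccess()  { /* acquire the region for read only           */ }
    void writeaccess() { /* acquire the region for read and write      */ }
    void readdone()    { /* release a read-accessed region             */ }
    void writedone()   { /* release; updates would become visible here */ }

    double get(int i)           { return data[i]; }
    void   set(int i, double v) { data[i] = v; }
}

class RegionDemo {
    public static void main(String[] args) {
        Region r = new Region(8);

        r.writeaccess();       // annotate the start of a series of writes
        r.set(0, 3.14);
        r.writedone();         // annotate its end

        r.readaccess();        // annotate a read-only series of accesses
        double x = r.get(0);
        r.readdone();

        System.out.println(x); // prints 3.14
    }
}
```

Note how every access series is bracketed by a matching acquire/release pair; this is exactly the extra bookkeeping that increases application complexity.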
Memory Consistency
Memory consistency refers to the way in which updates to shared memory are reflected to the processors in the system. In a DSM system, shared data is duplicated on all the processors. In order to improve performance, this data can be accessed concurrently. However, if the concurrent accesses are not carefully controlled, accesses to the shared data may be executed in an order different from what the programmer expected. If a read to shared data returns the result of the most recent write to that shared data, the memory is said to be coherent. In order to maintain the coherence of shared data, a framework (i.e., a memory consistency model) that describes how to control or synchronize the accesses is necessary. In this section, we review some of the common memory consistency models.
Sequential Consistency (SC) [20][22] requires that modifications to shared memory are made visible immediately to all processors sharing the same data. In other words, if a memory is sequentially consistent, any read to a shared memory location must reflect the most recent write to that same memory location anywhere in the system. For example, in a simple sequential consistency implementation, a processor may acquire exclusive access to a memory page before modifying it, and then transmit its modification to other processors before giving up exclusive control of that page. The earliest page-based DSM system (e.g., IVY [17][9][21]) implemented sequential consistency because it was the most obvious model. The disadvantage of using the sequential consistency model is that it can result in large amounts of communication [Figure 2.2].
Figure 2.2: An Example of Sequential Consistency among Three Processors
In Figure 2.2, we assume that there are three processors P1, P2, P3 in a DSM system. Processor P1 performs three write operations: w(x), w(y), w(z). This assumption also applies to Figure 2.3, Figure 2.4, Figure 2.5 and Figure 2.6.
In Figure 2.2, after each write, P1 makes its write visible to P2 and P3. Sequential consistency is not often used in page-based DSM systems now due to its high communication overhead.
Weak Consistency (WC) [20] does not make modifications to shared memory visible to all the other processors sharing the same data until a synchronization point is reached [Figure 2.3]. In other words, any write to shared memory performed by one processor is initially made only locally. When that processor reaches a synchronization point, the writes are then transmitted globally.
Figure 2.3: An Example of Weak Consistency among Three Processors
In Figure 2.3, processor P1 does not make its writes visible to processors P2 and P3 until it reaches a synchronization point. The synchronization can be implemented by means of explicit synchronization operations such as locks and barriers.
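Ordinary Java itself provides an analogue of this behaviour: under the Java memory model, writes made by one thread are guaranteed visible to another once both synchronize on the same lock, i.e., at a synchronization point. The sketch below is standard single-machine Java, not a DSM system, and is included only to illustrate the idea.

```java
// Writes performed inside the writer's synchronized block become visible
// to a reader that later synchronizes on the same lock object.
class SyncPointDemo {
    static final Object lock = new Object();
    static int x, y, z;

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(new Runnable() {
            public void run() {
                synchronized (lock) {   // the writer's synchronization point
                    x = 1; y = 2; z = 3;
                }
            }
        });
        writer.start();
        writer.join();
        synchronized (lock) {           // the reader's synchronization point
            System.out.println(x + y + z);   // prints 6
        }
    }
}
```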
Comparing Figure 2.2 and Figure 2.3, we can see that weak consistency needs less communication than sequential consistency when performing the same amount of write operations. However, there are some constraints on programmers when using weak consistency. Programmers must make sure that accesses to synchronization variables are sequentially consistent, i.e., no data races are allowed, and the programmers can only use synchronization operations recognized by the system. From the application programmer's perspective, the programmer may have more work to do when using weak consistency than when using sequential consistency.
Overall, weak consistency is likely to have better performance than sequential consistency because it has lower communication overhead. However, weak consistency still has a limitation in that it recognizes only one type of synchronization variable. When a processor accesses the synchronization variable, the memory system has no way to know whether the processor is about to leave or enter the critical section. Therefore, the processor has to make its modifications to the shared data visible to all the other processors sharing the same data on both entering and leaving the critical section.
Release consistency (RC) [20] [8] improves weak consistency by making the modifications
visible to all the other processors sharing the same data only when the updating
processor exits from the critical section. Release consistency implements this by defining
two types of synchronization access: acquire access and release access. Acquire accesses
are used to tell the memory system that a critical section is about to be entered. Release
accesses are used to say that a critical section has just been exited. A processor makes its
modifications to the shared data visible to all the other processors sharing the same data
only when the processor performs a release operation [Figure 2.4].
Figure 2.4: An Example of Release Consistency among Three Processors
In Figure 2.4, processor P1 acquires synchronization variable S and performs three writes:
w(x), w(y) and w(z). P1 does not make its modifications visible to P2 and P3 until it
releases the synchronization variable S.
In general, release consistency has lower communication overhead than weak consistency
because it distinguishes between the types of synchronization access used to enter
and exit the critical section.
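The acquire/release pair can be made explicit in code with a lock object. The sketch below uses java.util.concurrent.locks.ReentrantLock (a Java API added well after the systems discussed here) purely to label the two accesses; the point is that a DSM system need only propagate the writes at the unlock:

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch: release consistency distinguishes acquire accesses (entering
// a critical section) from release accesses (leaving it). The names
// mirror Figure 2.4; no actual DSM propagation happens here.
public class ReleaseConsistencySketch {
    private final ReentrantLock s = new ReentrantLock(); // sync variable S
    private int x, y, z;

    void update() {
        s.lock();                    // acquire access: entering the section
        try {
            x = 1; y = 2; z = 3;     // w(x), w(y), w(z) as in Figure 2.4
        } finally {
            s.unlock();              // release access: writes made visible here
        }
    }

    int sum() {
        s.lock();
        try { return x + y + z; } finally { s.unlock(); }
    }
}
```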
Release consistency is the most popular memory consistency model used in DSM systems to
date. Some of the DSM systems using release consistency are TreadMarks [17],
Munin [21], Brazos [30], CVM [30] and Quarks [30].
Compared to sequential consistency and weak consistency, release consistency
improves a DSM system's performance. However, in release consistency, when it
comes to a release, the updating processor makes the whole content of the critical section
visible to all the other processors. Unfortunately, not all the processors require all
the data inside the critical section.
Entry consistency (EC) [9] improves release consistency by explicitly binding shared
data to synchronization variables, so that when a processor leaves a critical section,
only the shared data bound to that synchronization variable needs to be made consistent
among all processors [Figure 2.5].
Figure 2.5: An Example of Entry Consistency among Three Processors
In Figure 2.5, acq(Si) stands for the operation of acquiring synchronization object Si; rel(Si)
stands for the operation of releasing synchronization object Si. From the figure, we
can see that each shared variable x, y, z is guarded by its own synchronization object. Processor
P1 does not make its modifications visible to processors P2 and P3 at the same
time. It makes its modification to a shared variable visible only to a processor which
acquires the synchronization object guarding that shared data. In this way, P1 transfers
less data than it does in release consistency.
From this example, we can see that entry consistency can reduce the amount of communication
below that required in release consistency. It may therefore
improve the performance of a DSM system. However, in some cases, a DSM system
using entry consistency may perform worse than one using release consistency. For example,
in Figure 2.5, when both processors P2 and P3 acquire synchronization objects S1,
S2 and S3, processor P1 has to communicate with P2 and P3 six times (see Figure 2.6),
while in release consistency (see Figure 2.4), P1 needs to communicate with P2 and
P3 only two times if x, y and z all happen to be on the same page.
Figure 2.6: An Example of the Worst Case in Using Entry Consistency
As in most region-based DSM systems, entry consistency also potentially increases
the complexity of writing a parallel program. Application programmers have to choose
shared variables and synchronization objects and bind each shared variable to a
synchronization object.
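The binding of each shared variable to its own synchronization object can be sketched in Java with per-variable lock objects (the names Sx, Sy, Sz are hypothetical bindings, not part of any real entry-consistency API):

```java
// Sketch: under entry consistency, each shared variable is guarded by
// its own synchronization object, so acquiring one object only forces
// the data bound to it to be made consistent.
public class EntryConsistencySketch {
    private final Object Sx = new Object();  // guards x
    private final Object Sy = new Object();  // guards y
    private final Object Sz = new Object();  // guards z
    private int x, y, z;

    void writeX(int v) { synchronized (Sx) { x = v; } } // only x's binding acquired
    void writeY(int v) { synchronized (Sy) { y = v; } }
    void writeZ(int v) { synchronized (Sz) { z = v; } }

    int readX() { synchronized (Sx) { return x; } }
}
```

The programming burden described above is visible even in this toy version: every variable needs its own lock object, and the programmer must remember which lock guards which datum.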
Entry consistency is often used in region-based DSM systems. Such systems include
Midway [9] [21] and ABC++ [16] [12].
Coherence
The coherence protocol indicates how memory consistency is enforced in a DSM system.
Coherence protocols are often categorized as invalidate-based or update-based and eager
or lazy, according to the way updates to shared data are propagated. When using invalidate-based
protocols with sequential consistency, for example, a multicast message must
be sent before a write to shared data takes place in order to invalidate all copies of that
data. This prevents other processors from reading stale data. The update is propagated
only when the shared data are read. Using update-based protocols, updates to the shared
data are made locally. The updated shared data are then multicast to other processors which
possess a copy of the shared data. Processors read the local copies of the shared data, thus
reducing the communication cost. Eager and lazy coherence protocols are often used with
the release consistency model. In the implementation of release consistency using the
eager coherence protocol, a processor postpones propagating its modifications to shared
data until it comes to a release (i.e., the time when it exits from the critical section). At
that time, it propagates the modifications to all other processors that cached the modified
pages. Using the lazy coherence protocol, the notification of the modification is postponed
until the time of the acquire (i.e., the time when another processor acquires the synchronization
object).
Different coherence protocols can be used in the implementation of different memory consistency
models, and different coherence protocols can also be used in the implementation
of a single memory consistency model. In what follows, we give an example of implementing
release consistency by using different coherence protocols. Figure 2.7 is an illustration
of an invalidate-based coherence protocol used to implement the release
consistency model.
Figure 2.7: Implementing Release Consistency using an Invalidate-based Coherence Protocol
In Figure 2.7, we assume that there are two processors P1 and P2 in a DSM system; P1
performs two write operations, w(x) and w(y); P2 performs two read operations, r(z) and
r(y). This assumption also applies to Figure 2.8.
In Figure 2.7, after processor P1 writes memory location y, P1 does not make y visible to
P2 until P2 acquires the synchronization variable S. Since the cache entry pointing to memory
location y is invalid, P2 cannot read y even if it has acquired S. A cache miss will
occur when P2 reads y. This will make P1 send its modification to P2. The
implementation of release consistency using a lazy coherence protocol is similar, except that the memory
page is not invalidated. Figure 2.8 illustrates the implementation of release consistency
using an update-based coherence protocol.
Figure 2.8: Implementing Release Consistency using an Update-based Coherence Protocol
In Figure 2.8, processor P1 makes its modification to memory location y visible to P2
when it releases the synchronization variable S. This is the same as implementing release
consistency using an eager coherence protocol.
Invalidate-based coherence protocols are often implemented as multiple-reader-single-writer
sharing (many processors read while only one processor writes). Invalidation is potentially
expensive, but when the read/write ratio is sufficiently high, it can achieve good performance.
Update-based coherence protocols are often implemented as multiple-reader-multiple-writer
sharing. Reads are cheap in this option, but multicasting the writes is relatively
expensive to implement in software. Eager and lazy coherence protocols are often used
with release consistency.
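The invalidate-based scheme described above can be sketched for a single shared datum. All names below are illustrative only; this is a toy model of one datum with a home copy and per-processor caches, not a protocol from any particular DSM system:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of an invalidate-based protocol: before a write takes effect,
// every cached copy on other "processors" is invalidated, so a later
// read there misses and fetches the fresh value from the home copy.
public class InvalidateSketch {
    static class Cache { Integer copy = null; }   // null = invalid entry

    private int home = 0;                         // home copy of the datum
    private final List<Cache> caches = new ArrayList<>();

    Cache attach() { Cache c = new Cache(); caches.add(c); return c; }

    void write(Cache writer, int v) {
        for (Cache c : caches)                    // "multicast" invalidation
            if (c != writer) c.copy = null;
        writer.copy = v;
        home = v;
    }

    int read(Cache reader) {
        if (reader.copy == null)                  // cache miss: fetch update
            reader.copy = home;
        return reader.copy;                       // subsequent reads are local
    }
}
```

Note how the update is propagated only on a read miss, matching the text: the write itself only invalidates.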
2.2 Distributed Object-Oriented Paradigm
A distributed object is an object created on one machine that can be accessed on other
machines in a distributed network computing environment. A distributed object can be
used like a regular object, but from anywhere on the network. An important characteristic
that distinguishes objects from ordinary procedures or functions is that objects can still
exist even after the object which created them has stopped. An object is considered
to encapsulate its data and behavior (i.e., encapsulation). Encapsulation means that
an object's internal state is hidden from public view; the object communicates with the outside
world through its public interface. In a distributed network computing environment, distributed
objects are packaged as independent pieces of code that can be accessed by
remote clients via method invocations. The language and compiler used to create distributed
server objects are totally transparent to their clients. Clients do not need to know
where the distributed object resides or what system architecture it executes on. The distributed
object can be on the same local machine as the client or on a machine that is
within the same network [15]. Section 2.2.1 introduces some key issues in designing distributed
object-oriented systems as well as efforts made in dealing with these issues. Section
2.2.2 briefly describes some other issues in designing distributed object-oriented
systems.
2.2.1 Some Key Issues in Designing Distributed Object-Oriented Systems
As with a DSM system, which presents programmers with a unified memory model, a distributed
object-oriented system attempts to present programmers with a unified object
model across different machines in a network. Besides a unified object model, a distributed
object-oriented system also has to offer some communication mechanisms so that
objects on different machines in a network can communicate with each other. Therefore, a
unified object model and communication mechanisms are two important issues in designing
a distributed object-oriented system. In what follows, we give a brief description of
these issues.
Unified Object Models
In a regular object-oriented programming environment, programmers only deal with
objects on the same machine (i.e., local objects). In a distributed object-oriented programming
environment, on the other hand, programmers have to deal with objects existing on
different machines (i.e., distributed objects). Offering a unified object model is a very
important issue in designing a distributed object-oriented computing system. Efforts have
been made to present programmers with unified object models, such as CORBA (Common
Object Request Broker Architecture) [15], DCOM (Distributed Component Object
Model) [29] and Mentat [16].
CORBA and DCOM are standards supporting distributed object-oriented computing
systems, and are defined in a very similar way. They both use IDL (Interface Definition
Language) to define distributed objects. The major difference between CORBA and
DCOM lies in their error handling mechanisms: CORBA uses exceptions while DCOM uses
returned values to report errors. Using CORBA and DCOM, programmers do not need to
worry about the lower-level details and complexities of software on various systems.
However, programmers have to deal with two different object models when writing distributed
programs: the local object model of the language and the distributed object model
mapped from IDL.
While both CORBA and DCOM are independent of any programming language, Mentat
is designed by extending the C++ programming language. It extends C++ by using Mentat
classes to separate general C++ classes from classes used for parallel computing. Mentat's
object model includes two types of objects: contained objects and independent objects.
Contained objects are objects contained in another object's address space. Instances of
C++ classes, integers, structures, and so on are contained objects. Independent objects
possess a distinct address space, a system-wide unique name, and a thread of control. Communication
between independent objects is accomplished via member function invocation
and return values. Independent objects are analogous to Unix processes.
Some Communication Mechanisms
Unlike regular object-oriented programming, where an object can be accessed only by the
objects on the same machine, in distributed object-oriented programming the object created
on one machine might be accessed from other machines. Thus, developing communication
mechanisms for the objects which are uniform within a single application domain
or across multiple applications is a key issue in designing a distributed object-oriented
programming system. Communication mechanisms commonly used in distributed computing
systems include: sockets, RPC (Remote Procedure Call) and RMI (Remote Method
Invocation). In what follows, we give a brief introduction to these communication mechanisms
as well as some efforts in using them.
A socket is a mechanism which creates an end point for communication. It provides
applications with point-to-point byte stream services. In a system that uses sockets,
programmers have to create sockets on both the client and server sides. The sockets are
bound to local ports, and messages are packaged up and exchanged between both sides.
This mechanism requires the client and the server using sockets to engage in an application-level
protocol to encode and decode messages for exchange. In general, this
mechanism is optimized for performance, rather than ease of programming. Communication
among objects in this way can be cumbersome and error-prone. Sockets are widely used
in distributed computing systems.
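The pattern just described, sockets on both sides plus a small application-level protocol, can be shown with Java's own java.net classes. The sketch below runs a one-shot echo server in a second thread; the "echo: " prefix is our own trivial protocol:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class EchoPair {
    // Run a one-shot echo server and send it one message.
    static String roundTrip(String msg) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {   // any free port
            Thread t = new Thread(() -> {
                try (Socket s = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(s.getInputStream()));
                     PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                    // application-level protocol: reply with "echo: <line>"
                    out.println("echo: " + in.readLine());
                } catch (IOException e) { e.printStackTrace(); }
            });
            t.start();
            try (Socket c = new Socket("localhost", server.getLocalPort());
                 PrintWriter out = new PrintWriter(c.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(c.getInputStream()))) {
                out.println(msg);                 // client side of the protocol
                String reply = in.readLine();
                t.join();
                return reply;
            }
        }
    }
}
```

Even this tiny example shows the points made above: both endpoints must be created and bound explicitly, and the message format is entirely the programmer's responsibility.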
RPC (Remote Procedure Call) is another communication mechanism used in distributed
computing systems. The communication interface in this mechanism acts as a procedure
call. The arguments of the call are packaged up and shipped off to the remote
target of the call by the underlying system, which gives the application programmer
the illusion of calling a local procedure. Compared with sockets, RPC is more programming
friendly. As with sockets, RPC is commonly used in distributed procedure-oriented
computing environments. However, some distributed object-oriented systems
also use RPC, such as Extended C++ [16].
RMI (Remote Method Invocation) provides the same method invocation mechanism
in a distributed object-oriented computing environment as it does in a regular object-oriented
computing environment. The RMI mechanism can be regarded as an extension of
RPC systems to the object-oriented paradigm. In systems using RMI, objects, whether
local or remote, are defined in terms of interfaces which are declared in a kind of interface
definition language (IDL), as in CORBA and DCOM. The implementation of
the objects is independent of the interfaces and is also hidden from other objects. In
RMI, the underlying mechanisms used to make method calls may be different depending
on the location of the object. However, these mechanisms are hidden from the programmer.
RMI is the most popular communication mechanism used in distributed
object-oriented systems. Most distributed object-oriented computing systems support
RMI.
Among the above communication mechanisms, Java supports both sockets and RMI.
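In Java's RMI, the interface-versus-implementation split described above is expressed directly in the language: a remote object is defined by an interface extending java.rmi.Remote, and every remote method declares java.rmi.RemoteException. The names Adder and AdderImpl below are our own illustration:

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// The interface is all a client ever sees of the remote object.
interface Adder extends Remote {
    int add(int a, int b) throws RemoteException; // remote methods declare RemoteException
}

// The implementation is hidden from clients; it executes in the
// server's address space.
class AdderImpl implements Adder {
    public int add(int a, int b) { return a + b; }
}
```

On a real deployment the server would export AdderImpl (for example via java.rmi.server.UnicastRemoteObject) and register it by name; a client would then obtain a stub typed as Adder and invoke add exactly as if the object were local, which is the transparency property the text describes.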
2.2.2 Other Issues in Designing Distributed Object-Oriented Systems
Besides the above issues, some other concerns in developing distributed object-oriented
systems include: object migration, object storage, object integrity and data security.
Object migration techniques are often used to allow objects to be migrated across the network
while preserving data integrity, locality of reference, and sharing properties. A distributed
object can be stored either in local memory or on the network; hence, the design
of object storage mechanisms should provide transparency to users and realistic
performance for object accesses. The last concern in developing distributed object-oriented
systems is that object integrity and data security must be preserved irrespective of an object's
location and usage [14] [18].
2.3 Java
The work in this thesis is based on the Java system and some of Java's features, such as
platform independence, interpreted bytecode, multithreading and transparent memory
management. In this section, we give a brief description of the Java system and those
features which are relevant to the work we describe in this thesis. Section 2.3.1 describes
the Java system architecture as well as the Java programming architecture. Section 2.3.2
introduces the multithreading and synchronization mechanisms in Java. Section 2.3.3
gives a brief introduction to the memory management strategies in Java.
2.3.1 The Java System Architecture and the Java Programming Architecture
Java is an object-oriented language. A Java program is compiled to intermediate code
(bytecode) which is independent of the machine architecture and the operating system.
This bytecode in turn runs on top of the Java Virtual Machine. As shown in Figure 2.9,
the life cycle of a Java program includes both the compile-time and runtime phases. In the
compile-time phase, the developer writes Java source code (contained in a .java file) and
compiles it to bytecodes (contained in .class files). In the runtime phase, the Java bytecode
loader loads the corresponding .class files from the local disk and also resolves
unresolved class names from the Java class libraries. The Java Virtual Machine
consists of a Java interpreter (which interprets the Java bytecode into corresponding
machine code), a run-time system (which contains Java's runtime class libraries) and a
code generator (which generates platform-specific instructions after bytecodes have been
loaded into the Java Virtual Machine). Any platform with a Java Virtual Machine available
can run Java applications without any special porting work for that application.
[Figure: the compile-time environment (Java source compiled to bytecode, which moves through the network or file system) and the run-time environment (Java class loader and class libraries, bytecode verifier, and the Java Virtual Machine with its interpreter, run-time system and code generator, running on the host hardware).]
Figure 2.9: The Life Cycle of a Java Program
A Java application's portability is a result of the interpreted nature and architecture neutrality
of the bytecode. Furthermore, Java specifies the sizes of all its primitive data types and
defines the standard behavior of arithmetic that applies to those data types across all platforms.
In this way, Java avoids other languages' practice of leaving many fundamental
data types implementation dependent.
The Java environment itself is portable. The Java run-time system is written in ANSI C
with a clean portability boundary which is essentially POSIX-compliant. Figure 2.10 shows
the Java system on a host operating system.
[Figure: layered view of the Java system — Java applications (bytecode) run on the Java API, which sits on the Java runtime (platform-independent part and platform-dependent porting interface), on the host operating system.]
Figure 2.10: Java System on a Host Operating System
Figure 2.10 shows the Java system from the programming perspective. The Java API is
implemented by several classes written in Java, which include the language and utility
classes, the Abstract Window Toolkit, and the network and I/O classes. The Java API is
independent of the underlying hardware and stands on top of the Java runtime system. The
Java runtime system consists of two parts: a platform-independent part and a platform-dependent
porting interface. These parts are written in a combination of C and assembly
language. The Java runtime implements the interpreter and garbage collector.
2.3.2 Multithreading and Synchronization in Java
One of the key Java mechanisms that is explored in this thesis is multithreading. In this
section, we give a brief description of the multithreading feature in Java.
A thread is a lightweight process. The difference between a thread and a process is that
each process has its own resources (such as memory address spaces, etc.), whereas several
threads can share the same resources. A single process can have several threads. Multithreading
is the way to obtain fast, lightweight concurrency within a single address
space.
In Java, each thread has its own working memory and all the threads in the same Java program
share one main memory. The main memory contains the master copy of each variable.
Each thread's own working memory contains the working copies of the variables the
thread must use. Each thread can operate on its own working copies of the variables. However,
there are rules (synchronization mechanisms) to follow when a thread wants to operate
on the main memory. Java supports its multithreading mechanism with a set of
synchronization primitives based on the widely used monitor and condition variable paradigm
which was introduced by Hoare in the early 1970s [10]. Monitors provide a structured
way to control access to shared resources. In Java, the keyword synchronized placed
in front of a method definition implies that any thread executing that method must gain
exclusive access rights prior to executing the method. A synchronized method automatically
performs a lock operation when it is invoked. The method cannot be executed before
the lock operation has successfully completed. When execution of the method is completed,
an unlock operation is automatically performed on that same lock. Within a synchronized
method, a thread may call wait to temporarily halt its own execution and
allow another thread to execute a synchronized method in that class. The original
thread resumes execution only when another thread calls notify or notifyAll.
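The monitor mechanism just described can be illustrated with a minimal single-slot buffer. The class Slot is our own example, not part of the Java API; it uses exactly the primitives named above: synchronized, wait, and notifyAll:

```java
// A single-slot buffer guarded by Java's monitor primitives.
public class Slot {
    private Integer value = null;   // null means the slot is empty

    public synchronized void put(int v) throws InterruptedException {
        while (value != null)
            wait();                 // slot full: release the lock and suspend
        value = v;
        notifyAll();                // wake any thread waiting in get()
    }

    public synchronized int get() throws InterruptedException {
        while (value == null)
            wait();                 // slot empty: release the lock and suspend
        int v = value;
        value = null;
        notifyAll();                // wake any thread waiting in put()
        return v;
    }
}
```

Note the standard idiom of calling wait inside a while loop: a woken thread must re-check the condition, because notifyAll may wake threads whose condition does not yet hold.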
2.3.3 Memory Management in Java
Unlike other high-level programming languages, such as C and C++, the Java system partially
relieves the programmer from concerns about memory management. In this section,
we give a brief introduction to the memory management strategies in Java.
In Java, each variable is a typed storage location. A variable in Java can contain a value of
primitive type or a reference to an object. No matter what the variable contains, it is bound
to two main attributes: its type and its storage class. The storage class is used to determine
the lifetime of a variable.
Local variables are declared and allocated within a block and are discarded on exit from
the block. Method parameters are also considered local variables. Static variables are
local to a class; they are allocated when the class is loaded and discarded
when the class is unloaded. Dynamic objects are instances of classes and arrays. They are
allocated by the new operator and their storage can be reclaimed by an automatic storage
management technique such as garbage collection. In some circumstances, resources
(e.g., an operating system graphics context) cannot be freed automatically by an
automatic storage manager; in such cases the finalize method in class Object should be overridden. After
an object has been finalized, the storage occupied by the object may be reclaimed immediately
and recycled for other uses.
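The finalize hook can be sketched as follows. The class and its "handle" field are hypothetical stand-ins for a real operating-system resource such as the graphics context mentioned above:

```java
// Sketch: a class wrapping a non-memory resource overrides finalize as
// a safety net, so the resource is released before the object's storage
// is reclaimed by the garbage collector.
public class GraphicsContext {
    private long handle = 42;          // stand-in for a native resource
    private boolean released = false;

    public void release() {            // explicit release is still preferred
        if (!released) { handle = 0; released = true; }
    }

    public boolean isReleased() { return released; }

    @Override
    protected void finalize() throws Throwable {
        try { release(); }             // runs before storage is reclaimed
        finally { super.finalize(); }
    }
}
```

Because the collector gives no guarantee about when (or whether) finalize runs, well-behaved code releases such resources explicitly and treats finalize only as a fallback.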
In Java, all references to allocated storage and all references to objects are
through symbolic handles. The Java memory manager keeps track of references to
objects. Java compiled code references memory via symbolic handles that are
resolved to real memory addresses at run time by the Java interpreter.
2.4 Summary
In this chapter, we have given a brief review of the basic concepts of distributed parallel
computing methodology and some features of Java. Distributed computing systems generally
fall into two categories: DSM systems and distributed object-oriented systems.
According to granularity, DSM systems can further be categorized as page-based DSM
systems and region-based DSM systems. We also gave a brief introduction to some communication
mechanisms used in existing distributed computing systems. These communication
mechanisms include: sockets, RPC and RMI. Among them, RMI is the most
popular one used in distributed object-oriented systems. Section 2.3 gave a brief description
of some of Java's features which are relevant to this thesis. These features are: platform
independence, multithreading, interpreted bytecode and memory management.
Chapter 3
Parallel Computing in Java
In the last chapter, we gave a brief overview of some basic concepts in distributed, object-oriented
parallel computing on networks of workstations as well as the Java system.
We can see that the mechanisms existing in the Java system provide some of the infrastructure
required for an object-oriented DSM system. For instance, from the application
programming perspective, Java contains some features that are suitable for building
shared-address-space parallel programs through its existing facilities for concurrency (particularly
multithreading and locks). However, in order to extend the Java model to
achieve parallelism across networks of distributed machines, extensions have to be made
to the Java runtime framework, and possibly to the Java API, to provide the illusion of a
shared address space across machines that do not share memory.
In this chapter we limit ourselves to discussing only the alternatives for extending the Java
system to support parallelism. The next chapter will present an actual implementation
of a parallel computing system that puts some of the alternatives described here into
practice. The goal of the alternatives presented in this chapter is that both existing Java
programs and new programs written to exploit parallelism can run on the same extended
Java system. The difference is that existing Java programs will run as stand-alone workstation
applications, whereas new programs using the extended API will run as parallel
applications.
Section 3.1 discusses the key obstacles and issues in developing a DSM system within the
Java framework. Section 3.2 describes alternatives for using Java's multithreading feature
to build object-oriented parallel programs, and ways in which the runtime system can
exploit this multithreading feature to create parallelism across networks of distributed
computers. Section 3.3 talks about alternatives for creating shared objects. Section 3.4
explores methods for maintaining memory consistency when developing a DSM system
within the Java framework. Section 3.5 summarizes this chapter.
3.1 Issues in Developing DSM Systems within Java
As discussed in the previous section, the existing Java system provides some of the mechanisms
needed for developing a parallel computing system. However, the Java system currently
does not support parallelism within its framework. Since Java is a purely object-oriented
language, a DSM framework within the Java system should also be object-oriented. However,
there are a number of issues in developing object-oriented DSM systems. In Chapter 2, we
gave an overview of some of these issues. With respect to the object-oriented paradigm,
the following issues exist:
Finding an object-based model for parallel computing that fits into the existing object-oriented
framework of Java.
Developing an appropriate object communication scheme that accommodates the issue
of accessing different address spaces in a distributed object-oriented programming
environment.
From the DSM system's perspective, the main concerns are:
Choosing the granularity at which data is shared (i.e., the size of the shared memory
unit).
Finding mechanisms for maintaining memory consistency and coherence. Recall that
memory consistency refers to how updates to shared data are reflected to the processors
sharing the same data in a DSM system. A coherence protocol indicates the way in
which memory consistency is enforced in a DSM system. The latter includes the strategies
for update detection and change propagation.
The goal of this thesis is to explore mechanisms for achieving parallelism by utilizing
existing mechanisms within the Java framework. Hence, our study, with respect to the
above issues, begins with Java's multithreading feature. As discussed earlier, Java already
supports multiple threads for concurrency. However, all threads within one Java program
are executed on the same machine. Dispatching the threads of the same Java program to
different available machines is the main concern in our study of parallelism within the
Java framework. As a result of dispatching these threads across different machines, we
must also consider the issues of finding an appropriate sharing unit and maintaining memory
consistency and coherence. In what follows, we present our studies of developing an
object-oriented DSM system within the Java framework.
3.2 Using Multithreading for Parallelism
The multithreading feature in Java allows users to write programs with multiple threads
of control. However, threads within a Java program are actually executed serially on a single
machine. In this section, we present our studies on running these Java threads on
different machines to create parallelism across networks of distributed computers.
3.2.1 The Creation of Thread Objects at the Java Language Level
In Java, threads are created and managed by classes called Thread and ThreadGroup. The
only way to create a new thread in Java is by creating a Thread object. There are two
approaches to creating a Java thread object. One of them is to implement the run method
of the class Thread when the new thread class extends the class Thread [Figure 3.1].
// Implementing a class DemoThread by extending class Thread
class DemoThread extends Thread {
    public void run() {
        // work of the thread
    }
}

// Creating and starting an instance of the class DemoThread
DemoThread T = new DemoThread();
T.start();
Figure 3.1: Creation of Thread Object by Extending Class Thread
In Figure 3.1, the user thread class DemoThread implements the run method of class
Thread. The instance of the DemoThread class, T, is started by invoking the start method
in class Thread (i.e., T.start()).
The other approach for creating a Thread object is to implement the Runnable interface.
Using the Runnable interface, a new thread is constructed from an instance of the user class that
implements the Runnable interface [Figure 3.2].
// Implementing a class Demo by implementing interface Runnable
class Demo implements Runnable {
    public void run() {
        // work of the thread
    }
}

// Creating and starting an instance of class Demo
Demo T = new Demo();
new Thread(T).start();
Figure 3.2: Creation of Thread Object by Implementing the Runnable Interface
In Figure 3.2, the class Demo implements the Runnable interface. The thread object is
constructed from an instance of the Demo class, and it is started by invoking the start method
in the Thread class (i.e., new Thread(T).start()).
In both of the above cases, the thread objects are started by invoking the start method in
class Thread, which in turn initiates the execution of the run method in the thread class.
3.2.2 Support for Threads at the Java Runtime System Level
At the program level, a thread is created by the new operator and started by invoking the
start() method in the Thread class. In the Java runtime system, the support for Java threads
is as follows. The invocation of the start() method in the Java program causes the following
method (written in C) in the Java runtime system to be executed:

    Hobject thread(TID, unsigned int, size_t, void *(*)())
In this case, TID is the handle to the instance of the Thread class. The value of the second
parameter indicates the type of the thread to be created: a value of "0" indicates that a system
thread is to be created; a value of "2" stands for a user thread. The third parameter
gives the actual size of the thread's C stack. The last parameter is a function with a return
type of void. In the case of a thread start, the last parameter should be the following function
(written in C):
    static void ThreadRT0(register Hjava_lang_Thread *p)

This function does the real work of thread creation.
The function ThreadRT0 sets up the execution environment and initializes the thread by calling
the method threadInit. After the initialization of the thread execution environment, the
thread is started by calling the following method (written in C):

    execute_java_dynamic_method(&ee, (void *)p, "run", "()V")
The first parameter is the execution environment variable; the second is the thread object
within which the run method is implemented; the third is the dynamic Java method run
which is going to be executed; and the last parameter is the signature of the run
method. A signature here describes the return type in Java, which can be a primitive type or a reference
type. In this case, it shows the return type of the run method. In Java, the return type of the
run method is void, which is indicated by "()V" in the above method.
3.2.3 Dispatching Threads to Different Machines in the Runtime System
In order to begin experimenting with parallel computing within Java, we start with a simple
execution model in which the only communication between parallel threads is the
parameters provided to each thread on start-up, and the collection of results from each
thread at the end of execution. The two issues, the creation of parallelism and the collection
of results, are dealt with separately below.
An intuitive approach utilizing Java's multithreading feature for creating parallelisrn in a
networked environment is to create ditferent threads in a single machine, Le., a host
machine, and then dispatch these threads with different parameters to different target
machines. M e r the computation is completed, the host machine collects the results fiom
different target machines. An alternative approach is to create different threads on differ-
ent target machines. In the latter case, the host machine packs the necessary information
for creating the threads and ships it to different target machines. The target machines then
use the information fiom the host machine to create the threads locally. Compared with
this approach, the first approach which ships the created threads is more complicated
because the base operating system on a host machine running a Java mdtithreaded appli-
cation knows about only one thtead, i.e., the Java Virtual Machine thread. Ail of the other
threads in a Java application have to be mapped onto host operating system threads for
parallelism- Therefore, in order to dispatch a thread created on one machine to other
machine, we have to have information about the thread on a particular host operating sys-
tem, such as, the name of the thread, the status of the threads, the handler of the thread, the
current execution context, etc-
To test the feasibility of the latter approach, we conducted an implementation as follows.
In this implementation, the host machine dispatches the name of the class which is to be
instantiated and the corresponding parameters to the different target machines. On each target
machine, there is a daemon process running. A daemon process is a background process
which provides services to other threads in the system. Once the daemon receives the
related information, it will create and execute the thread on the target machine.
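The daemon's central step, turning a received class name into a locally created, running thread, can be sketched with Java reflection. The class and method names below are our own illustrative assumptions, not the actual daemon code:

```java
// Sketch of the target-machine daemon's core step: given a class name
// received from the host machine, instantiate the class by reflection
// and run it as a thread. All names here are illustrative assumptions.
public class ThreadDaemonSketch {

    // What the daemon would do after reading a class name off the network.
    public static Thread launch(String className) throws Exception {
        Class<?> c = Class.forName(className);
        Thread t = (Thread) c.newInstance();  // the class must extend Thread
        t.start();
        return t;
    }

    // A stand-in for a thread class the host machine might name in a request.
    public static class Work extends Thread {
        public static volatile boolean ran = false;
        public void run() { ran = true; }
    }

    public static void main(String[] args) throws Exception {
        Thread t = launch("ThreadDaemonSketch$Work");
        t.join();
        System.out.println("ran = " + Work.ran);
    }
}
```

The network transport (sockets, parameter marshalling) is omitted; only the instantiate-and-run step is shown.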
Getting different thread objects to begin execution on different machines is the first step in
developing a parallel computing framework for Java. The next step is to collect the results.
In Java, the return type of the run method is void, which means that it does not return any
information after its execution. Therefore, we cannot use the original run method for this
purpose. Here, we conducted another implementation. In this implementation, we extended
the function of the original run method by introducing another method called run1, which
not only fulfills the function of the original run method but also can return result
types other than void. From the Java program's perspective, there is not much difference
between the methods run and run1. The only difference between these two methods is their
return types. The return type of the run method can only be void; however, the return type
of the run1 method can be any type.
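As a rough sketch of the run1 idea (the class below is our own illustration, not the thesis implementation), a thread can expose a run1 method that returns a value, while the ordinary run method, whose signature must remain ()V, simply records that value for later collection:

```java
// Sketch of the run1 idea: run1() does the work and returns a result,
// while run() keeps the mandatory ()V signature and just records the
// result for later collection. The class is our own illustration.
public class Run1Sketch extends Thread {
    private final int n;
    private long result;

    public Run1Sketch(int n) { this.n = n; }

    // Like run(), but with a return value: sums the integers 1..n.
    public long run1() {
        long sum = 0;
        for (int i = 1; i <= n; i++) sum += i;
        return sum;
    }

    public void run() {          // signature is still ()V
        result = run1();
    }

    public long getResult() { return result; }

    public static void main(String[] args) throws InterruptedException {
        Run1Sketch t = new Run1Sketch(100);
        t.start();
        t.join();
        System.out.println("result = " + t.getResult());   // 5050
    }
}
```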
The introduction of the run1 method solves the problem of collecting execution results
from different threads running on different machines. Therefore, it increases the possibil-
ity of using Java's multithreading feature to achieve parallelism. However, in a parallel
computing system, we need to not only collect the final results but also propagate the inter-
mediate updates to shared data. Unfortunately, the run1 method cannot propagate the
intermediate updates to shared data. In the next section, we introduce a class we call
SharedObject to collect the final results and to propagate the intermediate updates to
shared data.
3.3 Creating shared objects
Although DSM systems typically use pages as the unit of sharing, many object-oriented
DSM systems built specifically for C++ or other object-oriented languages use an
object as the sharing unit. Since Java is an object-oriented system, choosing the object as
the sharing unit seems appropriate. Furthermore, as discussed in chapter 2, object-based DSM
systems avoid most of the false sharing caused by the fixed-size granularity of page-based DSM
systems. In this section, we describe alternatives for creating distributed shared objects.
All the objects in Java, whether regular objects or thread objects, are created using the new
operator. Consequently, we explore extensions to this operator for creating distributed
shared objects.
3.3.1 The New Operator in Java
At the Java language level, the new operator creates either a new instance of a class or a
new array object. From the user's perspective, there is no difference between the creation of
these two kinds of objects. However, at the Java Virtual Machine level, class objects and
array objects are created and manipulated by different sets of instructions.
For instance, at the Java Virtual Machine level, the instruction for creating a new class
instance is new, whereas the instructions for creating new arrays are newarray, anewarray
and multianewarray.
newarray is used to create a one-dimensional array of primitive types. anewarray is used
to create an array of object references, as well as the first dimension of a multi-dimensional
array. For example, the statement
new Thread[7]
creates an array of references to thread objects, whereas
new int[6]
creates a one-dimensional array of integers.
At the Java language level, both of the above arrays are created using the same new
operator. However, at the Java Virtual Machine level, the first one is created by the
anewarray instruction and the second one by the newarray instruction.
The multianewarray instruction is used to create a multi-dimensional array of references
to array objects.
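This difference can be observed by compiling a small class and disassembling it with "javap -c"; the instructions noted in the comments below are what javac typically emits for each creation expression:

```java
// Two kinds of array creation that are identical at the language level
// but compile to different JVM instructions (visible with "javap -c").
public class ArrayDemo {

    public static Thread[] makeThreads() {
        return new Thread[7];    // anewarray java/lang/Thread
    }

    public static int[] makeInts() {
        return new int[6];       // newarray int
    }

    public static int[][] makeMatrix() {
        return new int[3][4];    // multianewarray [[I
    }

    public static void main(String[] args) {
        System.out.println(makeThreads().length + " " + makeInts().length
                + " " + makeMatrix().length);
    }
}
```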
3.3.2 Extending the Functionality of the new Operator
Our objective in extending the new operator is to create shared objects at the objects' cre-
ation time. Here, we start by studying two alternatives for extending the functionality of
the new operator. The first alternative is to extend the existing new operator in the Java
programming language. The second alternative is to introduce a shared_New oper-
ator to create shared objects.
Using the first alternative, the extended new operator appears the same to users.
However, the users have to add the keyword Shared before the operator new when they want
a created object to be shared [Figure 3.3]. Once the Java compiler comes across the key-
word Shared, it will mark the object to be created as shared. During execution on a
host machine, the interpreter in the Java Virtual Machine will create different copies of the
shared object on different target machines.
//Creating a shared object by using the class DemoThread in Figure 3.1
DemoThread T = Shared new DemoThread();
Figure 3.3: A Demo of Creating a Shared Object with Keyword Shared
The second alternative introduces another operator we call shared_New [Figure 3.4] to
create only shared objects, whereas the new operator is still used to create regular objects.
As with the first alternative, when the Java interpreter comes across the shared_New oper-
ator, it will create different copies of the shared object on different target machines.
//Creating a shared object by using the class DemoThread in Figure 3.1
DemoThread T = shared_New DemoThread();
Figure 3.4: A Demo of Creating a Shared Object with the shared_New Operator
As we can see, both of the above alternatives can be used to create shared objects. Both of
them require changes to the Java compiler due to the introduction of new keywords
(i.e., Shared in the first alternative and shared_New in the second alternative), and to the Java
interpreter due to the introduction of a new operator (i.e., shared_New in the second alterna-
tive) or the extension of the existing operator (i.e., Shared in the first alternative). However,
they have some differences. For instance, using the first alternative, we have to re-write
the existing implementation of the new operator so that it creates both regular objects and shared
objects. The new operator would then do two tasks, making it harder to understand and
maintain (this is related to the principle of uniqueness in programming language design).
Using the second alternative, however, we can add a single module to the Java interpreter
to implement the shared_New operator while keeping the existing implementation of the
new operator intact. In this way, if the Java system is updated, we will only need to update
the corresponding module instead of re-writing the implementation of the new operator.
Overall, both of the above alternatives require changes to the Java compiler, the Java
interpreter and the Java programming interface. This reduces the portability of the parallel
applications. Furthermore, if any component of the Java system is updated, our system
will have to be re-written accordingly. It would be better to have an approach which can
create shared objects without affecting the integrity of the Java system. Therefore, we
came up with a third alternative.
In the third alternative, we introduce a class we call SharedObject [Figure 3.5]. It is imple-
mented as an add-on to the existing Java runtime class library. Java programs can create
shared objects by extending the SharedObject class.
//Creating a shared object by using the class SharedObject
SharedObject sharedData = new SharedObject(...);
Figure 3.5: A Demo of Creating a Shared Object with the Class SharedObject
This approach extends both the Java runtime class library and the Java programming inter-
face. However, since this approach does not change the Java compiler or the Java inter-
preter, it will not affect the execution of a regular Java program on stand-alone
workstations. In chapter 4, we will give a detailed description of this approach as part of our
discussion on the implementation of Paralleljava.
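As a rough local-only sketch of this idea (the class bodies below are our own assumptions; the actual interface is described in chapter 4), a program subclasses SharedObject, and the update method marks the point at which changes would be propagated to the other machines:

```java
// A local-only sketch of the SharedObject approach: shared state lives in
// a subclass, and update() is the hook at which changes would be sent to
// the other machines (here it only runs locally). Names are assumptions.
public class SharedObjectSketch {

    public static class SharedObject {
        // In the real system this would propagate the update to every
        // machine holding a copy; locally it is a no-op hook.
        protected synchronized void update() { }
    }

    public static class SharedCounter extends SharedObject {
        private int value;

        public synchronized void add(int d) {
            value += d;
            update();    // propagation point
        }

        public synchronized int get() { return value; }
    }

    public static void main(String[] args) {
        SharedCounter c = new SharedCounter();
        c.add(5);
        c.add(7);
        System.out.println(c.get());   // 12
    }
}
```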
Overall, each of the above alternatives has its own strengths and limitations. Figure 3.6
is a comparison among them.

  Creating Shared Objects   Strength                             Limitation
  Keyword Shared            Uses the same new operator to        Changes to the Java compiler &
                            create both regular and shared       interpreter; reduced readability.
                            objects.
  Operator shared_New       Uses a separate operator to          Changes to the Java compiler &
                            create only shared objects;          interpreter; added complexity.
                            keeps the modularity of the
                            interpreter.
  Class SharedObject        Does not change the Java             Programmers have to learn a
                            compiler & interpreter.              new API.

Figure 3.6: Comparison of the Above Three Alternatives

3.4 Exploring Memory Consistency and Coherence
As discussed in chapter 2, besides granularity, the other two important issues in designing
a DSM system are memory consistency and coherence. Memory consistency refers to how
updates to shared memory are reflected to the processors in the system. A coherence pro-
tocol indicates how memory consistency is enforced in a DSM system. The most com-
monly used memory consistency models are sequential consistency, weak consistency,
release consistency and entry consistency. The most commonly used alternatives for
building coherence protocols include write-invalidate, write-update, eager and lazy. Dif-
ferent coherence protocols can be used in the implementation of different memory consis-
tency models, and different coherence protocols can also be used in the implementation of
a single memory consistency model.
Maintaining memory consistency and coherence requires mechanisms for update detec-
tion (determining when some shared data has been modified) and update propagation
(transmitting the modification of a shared data item to the other processors sharing the same data).
Update detection and propagation can be achieved through program-level annotations by
the programmer (such as those used in region-based DSM systems), through compiler
analysis, or through virtual memory page protection (such as that used in page-based
DSM systems) [12]. In what follows, we talk about some alternatives for update detection
and propagation in designing a DSM system within the Java framework.
3.4.1 Update Detection
We consider three different alternatives for update detection in this section. The first
approach is to extend the functionality of the store instructions in the Java Virtual
Machine. The second approach is to use virtual memory page protection mechanisms. The
third approach is to use information provided by the application.
Extending the Store Instructions
In this section, we talk about detecting updates to shared data through extensions to the
store instructions at the Java virtual machine level.
The store instructions in the Java virtual machine store values from the operand stack to
local variables. As in some other programming languages, such as C and C++, the stack in
Java holds local variables and intermediate results. In the Java virtual machine, most of
the arithmetic operations are performed on the operand stack. The results are then trans-
ferred to local variables by store instructions. There is one exception: the increment
operation (such as i++) is performed on the local variables directly through the JVM
instruction iinc.
At the Java virtual machine level, there are different store instructions for different data types. For
instance, the store instructions for an integer variable and a double variable are istore
[Figure 3.7] and dstore respectively.

//Java source code using the integer data type
void whileInt() {
    int i = 0;
    while (i < 100) {
        i++;
    }
}

//The corresponding JVM assembly code
Method void whileInt()
   0 iconst_0
   1 istore_1     //Store constant 0 to local variable 1 (i)
   2 goto 8
   5 iinc 1 1     //Local variable 1 plus 1
   8 iload_1      //Load local variable 1 onto operand stack
   9 bipush 100
  11 if_icmplt 5  //If local variable 1 is less than 100, goto 5
  14 return

Figure 3.7: An Example of the istore Instruction
As shown in Figure 3.7, the instruction istore is used to store the integer value "0" to local
variable "i".
From Figure 3.7, we can also see that the JVM names the variables within a Java program as
local variables 1 to n according to the sequence in which the variables appear in the pro-
gram. During the execution of the Java program, whenever a store happens, the interpreter
in the JVM will store the value on the operand stack to the local variable indicated by the
store instruction [Figure 3.8]. Therefore, in order to detect an update to a variable, the
interpreter has to be extended. In this way, when a store instruction is executed, the inter-
preter can tell the parallel computing system that an update to a variable has occurred.
//Implementation of the store operation in the interpreter
switch (opcode) {
    case opc_istore_##num:           //Opcode is istore
#ifdef Parallel                      //Extending the interpreter for parallelism
        //Signal that local variable ##num is being changed
#endif
        vars[num] = S_INFO(-1);      //Put the value on the stack into vars[num]
        SIZE_AND_STACK(1, -1);       //Change the value of the stack pointer
        ...
    case opc_dstore_##num:           //Opcode is dstore
        vars[num] = S_INFO(-2);      //Put the first word into vars[num]
        vars[num + 1] = S_INFO(-1);  //Put the second word into vars[num+1]
        SIZE_AND_STACK(1, -2);       //Change the value of the stack pointer
        ...
}
...

Figure 3.8: Storing an Integer and a Double Type to Local Variables in the Interpreter
As we can see in Figure 3.8, one can use extensions to the interpreter (e.g., the code between
#ifdef Parallel and #endif in the above figure) to indicate that an update to a variable is
happening. Through these extensions, one can also determine the sequence number of the
variable which is being updated. However, at the interpreter level, one cannot determine
the name of the variable that is being updated.
In a parallel program, variables may be shared or local. Ideally, the interpreter signals only
when updates to shared variables are being performed. However, the interpreter itself can-
not tell the difference between a shared variable and a variable that is not shared.
Therefore, it is likely that the Java compiler would need to be extended. During compile
time, the Java compiler would mark the variables which are shared. At execution
time, the interpreter would signal the parallel system if a marked variable (i.e., a shared
variable) is being written.
Virtual Memory Page Protection
Page-fault detection is often used in a page-based DSM system to detect a write to a mem-
ory page, in which a certain memory page shared among all the processors is protected.
When one processor modifies a shared memory page, all its copies on the other processors
are invalidated. Any write to an invalidated copy will cause a page fault to occur. In the
implementation of the fault handler, the system propagates the modification of that shared
page to all the other processors sharing the same page.
The virtual memory page protection mechanism can be used to detect updates to shared data
in a page-based DSM system based on Java. This is because, at the Java Virtual Machine
level, multithreading is supported by allowing multiple threads to independently exe-
cute Java code which operates on Java data and objects residing in a shared main memory.
Besides the shared main memory, each of the threads has its own working memory in which the
thread keeps its own working copies of the shared variables it must use. In the execution of
a Java program, a thread operates on its working copies of the shared variables. In order to
maintain the integrity of the shared variables inside the main memory, Java makes use of
monitors to allow only one thread at a time to execute a region of code protected by the
monitor.
To access a shared variable, a thread should obtain a lock first and flush its own working
memory, which guarantees that the shared values will be loaded from the shared main
memory into the thread's working memory. The unlocking of a lock by a thread guarantees that the
values held by the thread in its working memory are written back to the main memory.
From the above description, we can see that it is possible to use the virtual memory page pro-
tection mechanism to detect updates to shared data within a parallel computing system
based on Java. However, using virtual memory page protection raises the following
issues. First, it may reduce the portability of the parallel computing system, because the
virtual memory page size is decided by the operating system and different host machines
in a network can have different operating systems. Second, as we discussed before, page-
based DSM systems may cause false sharing; therefore, such a system would be less effi-
cient. Furthermore, Java is an object-oriented programming language, so it seems more
appropriate to use an object as the sharing unit. As discussed in chapter 2, object-based
DSM systems usually use entry consistency, which uses information provided by the appli-
cation to decide when write operations are performed on shared data. This leads to the
third approach.
Update Detection through Program Annotations
The third approach is very similar to entry consistency in that it first lets the programmer
identify the shared objects in the program, then choose the synchronization object and
bind the shared object to the synchronization object. The updates to shared data are
propagated only when the synchronization objects bound to that shared data are acquired.
The differences between this approach and entry consistency are as follows. First, while
entry consistency binds a lock to some shared data, our approach implements this by
instantiating the class SharedObject (discussed in the previous sections). Second, in our
approach, the acquire of the lock happens by calling the update method of the class SharedOb-
ject. Last, the release of the lock in our approach is implemented by the successful execu-
tion of the update method. In chapter 4, we will give a detailed description of this approach.
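A minimal sketch of this annotation style (all names below are illustrative, not the Paralleljava API): the call to update acts as the acquire, and its successful completion, which here simulates propagation to the sharers, acts as the release:

```java
// Sketch of annotation-based update detection: the call to update() is
// the acquire, and its successful completion (which propagates the new
// value) is the release. All names are illustrative, not the real API.
public class AnnotatedUpdateSketch {

    public static class SharedData {
        private int local;        // this machine's working copy
        private int published;    // stands in for the copies on other machines

        // Acquire happens on entry (the monitor lock), and the release is
        // the successful completion of update(), by which time the new
        // value has been propagated to the sharers (simulated here).
        public synchronized void update(int newValue) {
            local = newValue;
            published = local;    // propagation point
        }

        public synchronized int read() { return published; }
    }

    public static void main(String[] args) {
        SharedData d = new SharedData();
        d.update(99);
        System.out.println(d.read());   // 99
    }
}
```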
From the above discussion, we can see that the three approaches introduced in this sec-
tion have different strengths and limitations. We compare these in Figure 3.9.
  Update Detection            Strength                      Limitation
  Extending the store         No new API.                   Java compiler and interpreter
  instructions                                              extended.
  Virtual memory page         No new API and no             False sharing; operating
  protection                  compiler extension.           system dependent.
  Application-level           No compiler and               New API; more complicated for
  annotation                  interpreter extension.        the application programmer.

Figure 3.9: A Comparison of the Three Approaches for Update Detection
3.4.2 Update Propagation
As we described earlier, the task of update propagation is to transmit the modifications of
shared data to all the other machines sharing the same data. In this section, we talk about
some alternatives we considered for propagating updates to shared data. The first alterna-
tive is to extend the functionality of the return operator in Java. The second alternative is
to extend the existing Java synchronization mechanisms. The third alternative is
to use information provided by the application programs.
Extending the return Operator
Propagating the updates to shared data when a method returns is a relatively simple
approach. Some existing DSM systems, Mentat for instance, add a function such as RTF
to implement their data consistency. RTF stands for return-to-future. It is an analog of the
return function in C++. Unlike the return function, the value returned from RTF is for-
warded to all member function invocations that are data dependent on it, and to the caller
only if necessary.
In the Java system, the return operator causes an immediate exit from a method in a Java
program. The expression following the return operator, if any, is the result of the method.
In a normal completion of a method invocation, a value may be returned to the invoking
method.
At the Java Virtual Machine level, when a method is invoked, a new frame is created cor-
respondingly for it. A frame at the Java Virtual Machine level is used to store data and interme-
diate results, to perform dynamic linking, to return values for methods and to dispatch
exceptions. The new frame becomes current when its method takes control of the exe-
cution, and is discarded when the method returns to its caller [Figure 3.10]. There are
also different return instructions in the JVM corresponding to different data types. Figure
3.10 shows the part of the code from the interpreter which deals with method returns with
integer-type and double-type return values.
//The parts of the code dealing with the return operator in the JVM
switch (opcode) {
    case opc_ireturn:
        frame->prev->optop[0] = S_INFO(-1);
        frame->prev->optop++;
        ...
    case opc_dreturn:
        frame->prev->optop[0] = S_INFO(-2);
        frame->prev->optop[1] = S_INFO(-1);
        frame->prev->optop += 2;
        ...
}
...

Figure 3.10: The Interpreter Dealing with Method Return with Integer & Double Return Values
From Figure 3.10, we can see that when a method returns, the interpreter puts the return
values directly onto the operand stack of the previous frame (i.e., the caller frame). If we
want to propagate updates to shared data during the method return, we first must know whether
the return value is shared. However, the interpreter cannot know this just from the return
value itself. Therefore, the Java compiler would have to be extended to provide it with such infor-
mation. The limitation of this approach is that it cannot be used to propagate the interme-
diate results of the shared data.
Extending the Existing Synchronization Mechanisms in Java
In this section, we talk about how to propagate updates to shared data in a parallel comput-
ing environment through extension of the existing synchronization mechanisms in Java.
The discussion in this section is based on the assumption that updates to shared data are
detected by the extended store instructions described in the previous section. In the first part of this
section, we give a brief description of the existing synchronization mechanisms in Java. In
the second part of this section, we discuss how to propagate updates to shared data by
extending the existing Java synchronization mechanisms in the Java runtime system.
Multithreading in Java
Multithreading is one of the key features in Java. As discussed in chapter 2, each thread in
a multithreaded Java application owns its working memory, and all the threads share one
main memory. The working memory of a thread contains the working copy of a variable,
while the main memory contains the master copy of the variable. A thread can perform
any operation on its working copy of the variable. However, there are synchroniza-
tion mechanisms to follow when operating on the main memory.
Java uses monitors to synchronize the operation of multiple threads on main memory.
Monitors are a high-level synchronization mechanism. A monitor is like a critical section.
In Java, the keyword synchronized placed before the definition of a method indicates that
a thread must gain exclusive access to execute that method [Figure 3.11]. When a syn-
chronized method is invoked, it automatically performs a lock operation. When the syn-
chronized method finishes execution, an unlock operation is performed automatically on
that same lock.
//A sample Java program using synchronization
class SynchSample {
    int a = 1, b = 2, c;

    synchronized void synchW() {
        a = b;
    }

    synchronized void synchR() {
        c = a;
    }
}

Figure 3.11: A Sample Java Program using Synchronization
In Figure 3.11, the class SynchSample contains two synchronized methods: synchW
and synchR. The method synchW writes the variable a and the method synchR reads
the variable a.
Suppose there are two threads: the thread tSampleW and the thread tSampleR. The thread tSam-
pleW calls the method synchW and the thread tSampleR calls the method synchR. We also sup-
pose that the call from the thread tSampleW happens a bit earlier than the call from the thread
tSampleR. The execution flow of the two threads can then be as follows [Figure 3.12].
  tSampleW                      Main Memory                  tSampleR

  lock class SynchSample
  read b
  use b
  assign a
  write a
  unlock class SynchSample
                                                             lock class SynchSample
                                                             read a
                                                             use a
                                                             assign c
                                                             write c
                                                             unlock class SynchSample

Figure 3.12: The Possible Execution Flow using the Methods in Figure 3.11
From Figure 3.12, we can see that before the threads can execute the synchronized meth-
ods, they must gain exclusive access to the class SynchSample which implements the syn-
chronized methods. After finishing execution, they perform an unlock automatically so
that other threads can access the same methods. When the unlock operation is performed,
the copies of the variables in the thread's working memory are flushed into main memory.
The main memory therefore contains the final versions of the shared variables.
The implementation of the lock and the unlock operations in the Java runtime is through
two functions: monitorEnter and monitorExit.
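A runnable version of the scenario in Figures 3.11 and 3.12 (the thread bodies and the forced ordering are our own additions for illustration) might look like:

```java
// A runnable version of the SynchSample scenario: tSampleW writes a,
// tSampleR then reads it. The synchronized methods lock the SynchSample
// object, and monitor exit flushes the working copies to main memory.
public class SynchDemo {
    static class SynchSample {
        int a = 1, b = 2, c;
        synchronized void synchW() { a = b; }   // write a
        synchronized void synchR() { c = a; }   // read a
    }

    public static void main(String[] args) throws InterruptedException {
        final SynchSample s = new SynchSample();
        Thread tSampleW = new Thread() { public void run() { s.synchW(); } };
        Thread tSampleR = new Thread() { public void run() { s.synchR(); } };
        tSampleW.start();
        tSampleW.join();   // force the ordering assumed in Figure 3.12
        tSampleR.start();
        tSampleR.join();
        System.out.println("c = " + s.c);   // 2 under this ordering
    }
}
```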
Extending the Existing Java Synchronization Mechanism for Update Propagation
As discussed in the previous section, when a multithreaded program runs on a stand-alone
workstation, any updates to shared data are made consistent through the use of a monitor
(using the keyword synchronized). By entering a monitor, a thread gets exclusive access
to the shared data within that monitor. By exiting the monitor, the thread writes the updated
version of the shared data back to the main memory. In this way, shared data is guaranteed to
be consistent among the threads.
The existing synchronization mechanism in Java can be extended to a parallel computing
system under the assumption that the system can distinguish shared data from data that
is not shared. This is because a thread on a stand-alone workstation simply flushes the
contents of its working memory to main memory when exiting the monitor; in a parallel
computing system, only shared data needs to be propagated to other machines. The existing
synchronization mechanism in Java, which makes the data consistent only when exiting the
monitor, is very similar to release consistency in a parallel computing system. Recall
that release consistency makes updates to shared data visible to the other machines sharing the
same data when a critical section is exited. Release consistency is often implemented
using either the eager coherence protocol or the lazy coherence protocol. Using the eager
coherence protocol, a processor makes its updates to shared data visible to the other machines
sharing the same data when it exits from the critical section (i.e., performs the release
operation). Using the lazy coherence protocol, updates to shared data are made visible to
other machines only when another machine acquires the synchronization object (i.e.,
another machine performs the acquire operation). Therefore, the existing Java synchroniza-
tion mechanism (which makes shared data consistent among multiple threads only when a
thread exits the monitor) can be extended to release consistency in a parallel computing
system [Figure 3.13].
//Extending monitorExit for update propagation
void monitorExit(unsigned int key) {
    monitor_t *mid;
    int ret;
    ...
#ifdef Parallel
    //Propagate the updates to the other processors which
    //have a copy of the data associated with monitor key
#endif
}

Figure 3.13: Extending Function monitorExit for Update Propagation
In Figure 3.13, the part between #ifdef and #endif is the extension of the existing monitor-
Exit function for parallelism. The function monitorExit in Java is used to exit a monitor
and copies the shared data in the working memory of a thread to main memory. As we can see
from Figure 3.13, the extended monitorExit function broadcasts updates to shared data to the
other machines sharing the same data when a thread exits a monitor. This is an implemen-
tation of release consistency using an eager coherence protocol.
This approach does not need to change the Java compiler. However, it may need to extend
any monitor-related parts of the Java runtime system as well as the interpreter. It also rests on
the assumption that the store instructions in the JVM detect writes to shared data and that the
system can distinguish shared data from data that is not shared.
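The eager protocol sketched in Figure 3.13 can also be illustrated at the language level. The sketch below (all names are our own; real propagation would go over the network) buffers writes made inside a critical section and pushes them to every sharer's copy on release:

```java
import java.util.*;

// Sketch of eager release consistency: writes inside a critical section
// are buffered, and the release (monitor exit) pushes them to every
// machine sharing the data. All names here are illustrative assumptions.
public class EagerReleaseSketch {

    // Each "machine" holds its own copy of the shared data.
    public static class Machine {
        public Map<String, Integer> copy = new HashMap<String, Integer>();
    }

    public static List<Machine> sharers = new ArrayList<Machine>();
    static Map<String, Integer> pending = new HashMap<String, Integer>();

    // A write inside the critical section is buffered locally.
    public static void write(String key, int value) {
        pending.put(key, value);
    }

    // Eager protocol: the release pushes all buffered updates to every
    // machine sharing the data, then clears the buffer.
    public static void release() {
        for (Machine m : sharers) {
            m.copy.putAll(pending);
        }
        pending.clear();
    }

    public static void main(String[] args) {
        sharers.add(new Machine());
        sharers.add(new Machine());
        write("x", 42);
        release();
        System.out.println(sharers.get(1).copy.get("x"));   // 42
    }
}
```

Under a lazy protocol, release() would instead only record the updates, and each sharer would pull them at its next acquire.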
Update Propagation through Information Provided by the Application
As with update detection, we came up with another alternative for update propagation by
using the class SharedObject. Using this alternative, each processor first updates its own
copy of the shared data, then calls the update method of the class SharedObject to propa-
gate its update to the other processors sharing the same data.
In this alternative, the application itself has to provide the information about when
the update is going to be propagated (i.e., by calling the update method). It changes the
Java programming interface and extends the Java runtime class library. However, it does
not change the Java interpreter or the Java compiler. Furthermore, the extension will not
affect the execution of a regular Java program. In chapter 4, we will talk about this
approach in detail. Figure 3.14 is a comparison of the above approaches.
  Update Propagation          Strength                           Limitation
  Extending the return        No new API.                        Extending the Java compiler &
  operator                                                       interpreter; cannot propagate
                                                                 intermediate results.
  Extending the existing      No extension to the compiler       Extending the Java runtime &
  Java synchronization        & no new API; propagates any       interpreter; extending the store
  mechanisms                  update to shared data.             instructions for update detection.
  Through application         No extension to the compiler       New API; more complicated for
  information                 & interpreter; propagates any      the application programs.
                              update to shared data.

Figure 3.14: A Comparison of the Alternatives for Update Propagation
3.5 Summary
In this chapter, we analyzed and explored the possibilities of developing a parallel com-
puting model within the Java framework. Our exploration ranged from the Java applica-
tion level and the Java interpreter level to the Java runtime level.
From the above explorations, we realized that it is impossible for us to run unmodified
Java programs without any new class libraries being added. It is also hard for a small group
of people to develop a new programming environment by purely extending the Java Vir-
tual Machine while keeping its integrity, capabilities and performance competitive with those
of a standard system. On the other hand, purely using the class library approach can impose
unreasonable limitations on the user program and a heavy burden in dealing with
memory consistency.
Chapter 4
The Paralleljava System
In the last chapter, we gave an overview of our exploration of parallel computing within
the Java framework. We showed that there were various possibilities in developing a DSM
system within Java. Based on the exploration described in the previous chapter, in this
chapter we give an overview of the design and the partial implementation of the parallel
computing system within Java that we call Paralleljava.
Paralleljava is an object-oriented DSM system that is designed for parallel computing
within the Java system. It supports a coherence framework that is similar to entry consis-
tency and an update-based coherence protocol. Unlike other existing DSM systems, the
implementation of memory consistency in Paralleljava depends on neither the compiler
(as in DSM systems which use the compiler and runtime to detect and collect writes to shared
data) nor operating system page faults (as in typical virtual-memory-based DSM systems
which use operating system virtual memory page protection to detect and collect writes to
shared data). Memory consistency in Paralleljava is implemented by simply broadcasting
all data associated with a synchronization object during interprocessor synchronization.
In section 4.1, we give an overview of the Paralleljava system. Section 4.2 shows the sys-
tem architecture of the Paralleljava system. Section 4.3 describes our design and imple-
mentation of dynamic network class file loading within the Paralleljava system.
Section 4.4 describes the design and partial implementation of data consistency in the Par-
alleljava system. In section 4.5, we summarize this chapter.
4.1 The Overview of the Paralleljava System
There are two kinds of Java programs. The first, known as applets, run within a Java-com-
patible browser. The second are stand-alone Java programs known as Java applications. In
Paralleljava, we modify the Java Virtual Machine to allow user Java applications to run in
parallel on networks of workstations. When an execution request is sent to the Java Virtual
Machine, the Paralleljava system will search for locally available idle machines (we call
them servers) and dispatch the tasks to different servers. After finishing their work, the
servers will send the results back to the local machine. This is a kind of client/server
model, with the local machine acting as a client and the remote machines acting as com-
puting servers. The remote server is implemented as a daemon (i.e., a background process
which provides service for other threads in the system), waiting for new tasks to execute.
Once the server obtains a Java method from the client, it will execute this method on a
local Paralleljava Virtual Machine and then send the result back to the client. During the
execution, the client can communicate with any server to keep the data consistent. Figure
4.1 is an overview of the Paralleljava system.
[Diagram: the client machine, running the idlemachine daemon, connected to Server #1 daemon, Server #2 daemon, ..., Server #N daemon]

Figure 4.1: An Overview of the Paralleljava System
Within the client machine, there is a daemon called the idlemachine daemon, which is
used to search for idle machines in the local network and write the information about the
idle machines to a text file (e.g., server.inf in our implementation). When the user starts a
multi-threaded Java application on the local machine, the client looks up the available
machines within the file server.inf, dispatches the threads to different remote machines,
and then waits for the information from each of the servers. The client uses a round-robin
model to allocate the idle machines to the tasks.
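The round-robin allocation just described can be sketched in Java as follows. This is an illustrative sketch only, not the thesis's implementation; the ServerPool class name and its interface are our own.

```java
import java.util.List;

// Illustrative sketch of round-robin allocation of idle machines
// (as read from server.inf) to tasks: hosts are handed out in
// rotation, wrapping around at the end of the list.
class ServerPool {
    private final List<String> servers;  // idle machines from server.inf
    private int next = 0;                // next host to hand out

    ServerPool(List<String> servers) {
        this.servers = servers;
    }

    // Return the next idle machine, cycling through the list
    synchronized String allocate() {
        String host = servers.get(next);
        next = (next + 1) % servers.size();
        return host;
    }
}
```

With three idle machines, successive calls to allocate cycle through them in order, so tasks spread evenly across the servers.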
Like Java, Paralleljava is independent of both the operating system and the system archi-
tecture. It will work on any machine which runs the Paralleljava server daemon (written in
the C language). The Paralleljava server daemons are designed with portable interfaces so
that various subsystems can connect to them and receive services that are independent of
the hardware and software architecture.
In Paralleljava, the client sends the same version of the shared data to each remote server
machine. The remote servers then upload necessary class files and instantiate new Java
objects. The server daemon calls the run method of the uploaded class and starts a new
process. Each process on a different machine executes independently until it meets a syn-
chronization point. At the synchronization point, the processes will try to communicate
with the client machine and inform the client of their updates to the shared data. Server
machines communicate with the client in order to update the shared data. The updates
among the processors are taken care of by the shared objects on each machine. The shared
object is implemented by a group of Java class libraries. After the synchronization point,
each machine will have an up-to-date version of the shared data.
4.2 The System Architecture of the Paralleljava System
In Paralleljava, there are two kinds of communication between machines. One is at the
time of dynamic class file uploading/downloading, and the other is during data consis-
tency and synchronization actions (synchronization and data consistency happen simulta-
neously). In order to fully utilize Java's features and reduce communication overhead,
Paralleljava deals with these two kinds of communication in different ways. The commu-
nication during dynamic class file loading is implemented in the Java interpreter using the
C language. The communication during the data consistency process is implemented by
shared objects in the Java runtime system. In what follows, we talk about the system
architecture of the Paralleljava system.
4.2.1 The System Architecture of the Paralleljava System
In the Paralleljava system, both the client and the server consist of three layers: the Appli-
cation Layer, the Paralleljava Virtual Machine Layer and the Transportation Layer. Each
layer on the same machine is independent of the others and can be replaced by an alterna-
tive implementation without affecting the other layers. For example, the transportation
layer in the current version is implemented based on the TCP protocol (using Unix sock-
ets), but a transportation layer based on UDP or ATM can be used interchangeably. The
system architecture of the Paralleljava system is illustrated in Figure 4.2.
As shown in Figure 4.2, the processes in a Java program are dispatched from the Parallel-
java Virtual Machine down through the Transportation Layer on the client side, then up
through the server side Transportation Layer to the server Paralleljava Virtual Machine
Layer. The update propagation is done by the shared objects between the Application
Layers on both sides.
Once the Paralleljava Virtual Machine on the client side receives a request from a user
application, the Virtual Machine will analyze the request, wrap up the request and then
forward the wrapped request to the client side Transportation Layer. The Transportation
Layer then dispatches the requests to the server side Transportation Layer. The communi-
cation protocol between the two transportation layers is based on TCP/IP in the current
implementation. The Transportation Layer on the server side forwards the received
request to its Paralleljava Virtual Machine Layer. The Virtual Machine Layer unwraps the
requests, executes the application, and uploads class files from the client when necessary.
4.2.2 The Functionality of Each Layer in the Paralleljava System
Application Layer
The Application Layer consists of a group of class libraries which implement the synchroni-
zation mechanism in Paralleljava. It includes mechanisms for update collection and propa-
gation.
The synchronization mechanism in Paralleljava is implemented by shared objects. The
Application Layer on the client side maintains the shared object by allowing only one
server access to it at any time. It also collects updates from and propagates updates to serv-
ers. The Application Layer on the server side updates its own copy of the shared object
first and then propagates the update to the client.
The update propagation is implemented by a group of class libraries. The propagation
messages go only between the application layers on both sides.
The Application Layer on the client side is responsible for:
implementing and managing shared objects,
setting up and maintaining the connections with the Application Layer on the server
side,
collecting, marshaling (i.e., arranging the updated data for transmission) and propa-
gating the updates to servers.
The Application Layer on the server side is responsible for:
setting up and maintaining the connection with the Application Layer on the client
side,
initiating the update propagation,
marshaling the update.
Paralleljava Virtual Machine Layer
The Paralleljava Virtual Machine Layer extends the Java Virtual Machine by implement-
ing dynamic network class file loading as well as extending the run method in class
Thread. In Java, the Virtual Machine takes the bytecodes as input, translates them to
machine code and executes the machine code on the local machine. During the execution,
the Java Virtual Machine will dynamically load the necessary class files from the local
machine.
In Paralleljava, the Java system class files exist on any machine with the Java system
ported. However, the user class files may not be on some server machines when they are
invoked. In the implementation of the Paralleljava system, the Paralleljava Virtual
Machine on the server side sends those class file names to the client Virtual Machine. The
Virtual Machine on the client side then uploads the class files to the server.
The Paralleljava Virtual Machine Layer on the client side is responsible for:
parsing the user requests from the command line,
setting up and managing a list of available machines,
allocating different threads with different parameters to different servers,
dynamic network class file uploading.
The Paralleljava Virtual Machine Layer on the server side is responsible for:
receiving the initial class files and parameters from the client,
creating an object for the class file received,
setting the object to run,
sending requests to the client and waiting for incoming information.
Transportation Layer
In general, the Transportation Layer in Paralleljava is responsible for:
setting up the connections between the client and the different remote servers,
listening for incoming messages,
setting up a connection for an incoming message,
transferring the necessary information.
Besides these functions, the Transportation Layer on the server side also creates the dae-
mon and maintains it.
4.3 Dynamic Network Class File Loading and Security
4.3.1 Dynamic Class File Loading in Java
The existing Java system loads and runs class files from a local file system. For Parallel-
java, this is insufficient because, in Paralleljava, the class files used might not exist on the
other machines. Therefore, we had to extend the existing Java framework for dynamic
class file loading in order to accommodate distributed computation in Paralleljava. In this
section, we begin by explaining the dynamic class file loading mechanism within the
existing Java system and then describe how the Paralleljava extensions have been imple-
mented.
Unlike the edit-compile-link-run development pattern of other programming languages,
Java programming needs only edit, compile and run. The Java user program does not need
to be linked into a static executable file before running. Instead, during the execution of a
Java program, the Java Virtual Machine dynamically loads the compiled Java byte code, i.e., the class
files, from the local machine's disk. An abstract class ClassLoader in Java is
offered to define the policy for loading Java classes into the runtime environment. By
default, the runtime system loads classes originating as files by reading them from the
directory defined by the CLASSPATH environment variable.
A classloader is itself an object which is responsible for loading classes. Given the name
of a class, the classloader will try to locate or generate data that constitutes a definition for
the class. Java uses the strategy of transforming the name into a file name and then reading
a class file with that file name (i.e., name.class) from a file system. When executing
Java code needs to use a class that has not yet been loaded, the loadClass method (as shown in the
following figure) of the class ClassLoader is invoked to load the class containing
the desired data.
protected abstract Class loadClass(String name, boolean resolve)
    throws ClassNotFoundException

Figure 4.3: The loadClass Method in Java
In the above figure, name is the name of the class to be loaded; resolve is a flag which
indicates whether the symbolic references in this class should be resolved or not.
This class loading mechanism in Java loads class files only from a local file system. In
Paralleljava, the class files can be anywhere in the network. Therefore, a mechanism for
dynamic network class file loading is necessary in Paralleljava.
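The same idea can be expressed with Java's own ClassLoader API. The following sketch is illustrative only (the thesis's actual extension, described next, is written in C inside the interpreter); the ByteSource interface and the NetworkClassLoader name are our own, standing in for whatever mechanism fetches class bytes from the client.

```java
// Illustrative sketch: a ClassLoader that obtains class bytes
// from a remote source and defines the class locally.
interface ByteSource {
    byte[] fetch(String className);  // e.g., read the bytes over a socket
}

class NetworkClassLoader extends ClassLoader {
    private final ByteSource source;

    NetworkClassLoader(ByteSource source) {
        this.source = source;
    }

    // findClass is consulted only after the parent loader fails,
    // so system classes (java.*) are still loaded locally.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] b = source.fetch(name);
        if (b == null) {
            throw new ClassNotFoundException(name);
        }
        return defineClass(name, b, 0, b.length);
    }
}
```

Because loadClass delegates to the parent loader first, only classes the local system cannot supply ever reach the remote fetch, which matches the division between system and user class files described below.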
4.3.2 Dynamic Network Class File Loading in Paralleljava
There exist two kinds of class files in Java, i.e., the Java system class files and the user
defined class files. The Java system class files exist on any platform with the Java system
ported. However, the user class files may exist only on some of the machines within a dis-
tributed network computing environment. Thus, in Paralleljava, different class file loading
mechanisms should be developed to cater to the above two situations. In our implementa-
tion, we use the default class file loading mechanism provided by the Java system for sys-
tem class file loading. For the user defined class files in a Java program, we have
developed a network class file loading mechanism.
Although Paralleljava offers a homogeneous distributed network computing environment
with the Paralleljava system ported on all the platforms, the user class files still need to be
loaded from the network. The reason is that when the client dispatches different tasks to
different servers, it dispatches only the related information about the tasks to the servers
instead of the class files themselves. When a Paralleljava user program begins to execute,
there are no user defined class files on the server side. When the server runtime system
invokes a user class file which is not available and the runtime system cannot find a cor-
responding Java source file on the local disk from which the class file originates, the
runtime system will send requests to the client for the class file. The client then uploads
the corresponding class file to the server. The class file can be used by the server immedi-
ately without any changes.
The implementation of the dynamic network class file loading in Paralleljava [see Figure
4.4] is an extension of Java's machine dependent class importing code. It is written in C. It
uses sockets for communication between the client and the server.
/* Objective: to implement the dynamic network class file
 * loading (server side).
 * import_md is used by the Java interpreter to load a Java
 * source file from which a required Java class originates.
 * Before calling import_md, the source file is made sure to
 * exist on the local disk. In Paralleljava, only system
 * *.java files are on the server side. Therefore, in this
 * function, we use the file name to tell the difference
 * between a Java system class file and a user defined
 * class file. */
int import_md(char *name, char *hint)
{
    char **cpa;
    char filename[MAXLINE];

    if (name[0] == DIR_SEPARATOR) {
        return (int) LoadFile(name, ".", hint);
    }
#ifdef PARALLELJAVA                   /* Changes for Paralleljava begin */
    if ((strncmp(name, "java", 4)) != 0) {
        strcpy(filename, name);
        strcat(filename, ".class");
        NetLoadFile(newsockfd, filename);   /* Network file loading */
    }
#endif
    for (cpa = CLASSPATH(); *cpa; cpa++) {
        char path[200];
        sprintf(path, "%s%c%s." JAVAOBJEXT, *cpa, DIR_SEPARATOR, name);
        if (LoadFile(path, *cpa, hint)) {
            return 1;
        }
    }
    return (0);
}

Figure 4.4: The Implementation of Dynamic Network Class File Loading in Paralleljava
The program code in Figure 4.4 shows the implementation of the dynamic network class
file loading on the Paralleljava server side. On the client side, Paralleljava uses Java's
class file loading mechanism because all class files exist on the client side.
In Figure 4.4, the section of code between #ifdef PARALLELJAVA and #endif is the exten-
sion made to the Java class loader by Paralleljava. In Java, the module import_md [Figure
4.4] is used to load any Java source file which creates required class files. In Paralleljava,
only Java system source files exist on server machines. Therefore, by check-
ing the file name, Paralleljava can decide which kind of source file it needs to load and
whether it needs to load it from the network or from the local disk. In Figure 4.4, if it is a
user defined Java source file (the name cannot begin with java in this case), Paralleljava
calls the subroutine NetLoadFile(newsockfd, filename) [Figure 4.5] to load the corre-
sponding class file from the network. In the subroutine, the parameter newsockfd is the
socket file identifier, and filename is the class file name.
/* Module name: NetLoadFile
 * Objective: used on the server side by import_md to load
 * the user class file from the client. */
#ifdef PARALLELJAVA
NetLoadFile(int sockfde, char *fne)
/* The sockfde is an existing socket number.
 * The fne is the class file name to be loaded. */
{
    int n, nl, fd, sizes;
    char line[MAXLINE], fnl[MAXLINE];

    /* Send the class file name to the client */
    syscall(SYS_write, sockfde, fne, strlen(fne));
    /* Open a new file to be written */
    fd = open(fne, O_WRONLY | O_CREAT | O_TRUNC, 0744);
    /* Read the file size from the client */
    syscall(SYS_read, sockfde, &sizes, sizeof(sizes));
    /* Get the file contents from the client */
    nl = 0;
    while (nl < sizes) {
        n = syscall(SYS_read, sockfde, line, MAXLINE);
        if ((syscall(SYS_write, fd, line, n)) != n)
            break;
        nl = nl + n;
    }
    close(fd);
    return;
}
#endif

Figure 4.5: The NetLoadFile Subroutine
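The NetLoadFile exchange above amounts to a simple length-prefixed wire format: the receiver reads the file size, then exactly that many bytes. A hedged Java sketch of that format follows; the FileTransfer name and its methods are illustrative, not part of the Paralleljava code, and the 4-byte big-endian length is our own choice of encoding.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Illustrative sketch of the size-then-contents exchange used
// when a class file is shipped from the client to the server.
class FileTransfer {
    // Sender side: write a 4-byte length, then the contents
    static void send(OutputStream raw, byte[] contents) throws IOException {
        DataOutputStream out = new DataOutputStream(raw);
        out.writeInt(contents.length);
        out.write(contents);
        out.flush();
    }

    // Receiver side: read the length, then exactly that many bytes
    static byte[] receive(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        int size = in.readInt();
        byte[] buf = new byte[size];
        in.readFully(buf);  // loops internally until all bytes arrive
        return buf;
    }
}
```

Reading the length first lets the receiver loop until the whole file has arrived, which is what the while loop over nl and sizes does in the C version.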
4.3.3 Security
Security is one of the most important issues in developing a distributed network comput-
ing system. In Paralleljava, security issues can arise when the client ini-
tiates uploads of files to the server or when a server initiates a download of files from the
client. Some of the security issues in Paralleljava are addressed within the existing Java
framework. For example, when a server downloads files from the client, the security man-
ager in Java regulates access to sensitive functions (e.g., functions for updating memory)
and the class loader makes sure that loaded classes are subject to the security manager's
checking and adhere to the standard Java safety guarantees. Furthermore, if used properly,
there should be no resource abuses in the Paralleljava system, since the network
class file loader in Paralleljava is designed to load only necessary files from the client.
However, there are still some security issues in Paralleljava. For example, when the client
uploads files to the server, some of the following might happen:
an unauthorized user application makes use of the server,
a user application hogs resources on the server.
We have not implemented any solutions for these issues. Addressing them would
involve building additional security mechanisms, which may, however, affect other users try-
ing to execute code on the servers.
4.4 Implementation of Data Consistency in Paralleljava
Most existing software-based DSM systems use either the operating system's virtual mem-
ory page protection or compiler extensions to detect and collect writes to shared data.
The first method can lead to two problems. First, writes might have high overhead, since
page faults occur on every write to a protected page. The page probably needs to be writ-
ten many times to amortize the cost of the page fault. Second, using the fixed virtual memory
page size as the unit of coherency causes false sharing. Mechanisms for handling false
sharing might increase run-time overhead and might cause unnecessary data communica-
tion among workstations. The data consistency strategies used in DSM systems extending
the compiler might have some advantages over those used in page-based DSM systems, but
they require modifications to the compiler and they also induce runtime consistency mod-
ule overhead. In the implementation of Paralleljava, because we did not want to do page
granularity data sharing, and because modifying the compiler was beyond the scope of this
work, we adopted a solution [Figure 4.6] similar to entry consistency, which relies on pro-
gram level annotation to convey coherence related information.
In the entry consistency model, the shared data and code are put into critical sections
which are protected by specific synchronization objects. A processor's accesses to the
code and data in the critical sections are controlled by the synchronization objects. Shared
data become consistent at a processor only when the processor acquires a synchronization
object that protects the data. In Paralleljava, the shared data is implemented as the shared
object. Any update to the shared object is made by calling the update method within the class
SharedObject. The shared data is updated after a successful execution of the update
method.
[Diagram: Acquire_Lock, followed by the calling of the update method]

Figure 4.6: The Relationship between EC and Data Consistency in Paralleljava
4.4.1 Update Detection and Collection
Unlike the strategies typically used in existing page-based DSM systems and DSM sys-
tems extending compilers, Paralleljava requires neither compiler extensions nor virtual
memory page faults to ensure memory consistency. In Paralleljava, memory consistency is
implemented by simply broadcasting all data associated with a synchronization object
during interprocessor communication. Write detection is not necessary since Paralleljava
implements its memory consistency model through an update protocol. This approach is
simple and has no immediate write overhead. However, it will transfer unnecessary data
when synchronization objects guard large data objects that are sparsely written.
In order to reduce the amount of data being transferred, we used a twinning & diffing
algorithm which is similar to the existing implementation of page-based DSM systems [8].
Using this approach, for each shared object (an instance of class SharedObject in our imple-
mentation) on a processor, the processor keeps a second copy of the SharedObject. At
each synchronization point, the SharedObject bound to the synchronization object is com-
pared with its copy to determine which part has been modified. This approach avoids the
cost of write detection, but increases the storage requirements (every SharedObject must
be twinned on any processor which writes it) and the synchronization overhead of the
consistency mechanism (to diff unmodified data and maintain the twin). Moreover, this
approach still requires management of the update incarnations to ensure that a chain of
processor updates is correctly propagated. In the next section, we talk about the update
propagation among the processors.
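The twinning & diffing idea can be illustrated with a small Java sketch. This is our own simplified illustration over an int array (the thesis's SharedObject holds string data); the TwinnedArray name and its methods are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of twinning & diffing: keep a pristine twin
// of the shared data, and at a synchronization point compare the
// working copy against the twin so that only modified elements
// need to be propagated.
class TwinnedArray {
    private final int[] data;  // working copy the processor writes to
    private int[] twin;        // pristine copy kept for diffing

    TwinnedArray(int[] initial) {
        data = initial.clone();
        twin = initial.clone();
    }

    int[] data() { return data; }

    // At a synchronization point: collect {index, newValue} pairs
    // for the modified elements, then refresh the twin.
    List<int[]> diffAndRetwin() {
        List<int[]> diff = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            if (data[i] != twin[i]) {
                diff.add(new int[] { i, data[i] });
            }
        }
        twin = data.clone();
        return diff;
    }
}
```

The diff contains only the elements actually written since the last synchronization point, which is exactly the data that needs to cross the network; the costs are the extra copy and the comparison pass over unmodified data, as noted above.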
4.4.2 Update Propagation
An advantage of the update-based protocol is that interprocessor communication is only
necessary during the acquisition of synchronization objects. By updating only at synchro-
nization points and only between the synchronizing processors, updates to the shared data
guarded by a synchronization object may be coalesced and transmitted to a processor all at
once. Furthermore, by ensuring that updates are performed only when a processor enters a
critical section, unexpected delays in a critical section caused by cache misses cannot
occur.
In Paralleljava, any operation on the shared object on the server happens only in its own
memory. The client maintains its copy of a shared object as a shared memory for all the
memory on the servers. At any synchronization point, updates to a shared object on any
server are reflected to the client. This is implemented by acquiring the lock guarding the
shared object on the client (see the figures in section 4.4.3). The client keeps a queue of serv-
ers acquiring the lock. At any time, there is only one server holding the lock. The server
with the lock submits its updated information to the client, and the client then updates its
own shared object and broadcasts the update to all the servers.
4.4.3 The Shared Object in Paralleljava
In Java, there exist two kinds of object models. One is the instance of a class; the other is
the array object. The shared object model in Paralleljava is based on the concept of Java
objects. It is implemented as a class with each of its elements implemented as a one-
dimensional string array. Figure 4.7 highlights some of the implementation of the shared
object class (class SharedObject). The shared object class in our implementation is actu-
ally a multi-line string buffer which holds the shared data. Any instance of such a class
creates a shared object. In the Paralleljava distributed network computing environment,
both the client and the server have copies of the shared object. In the implementation of
the shared object, each server reports any changes it made to the client by calling the
update method in the SharedObject class. The client then broadcasts the changes to all the
other servers.
// Module name: SharedObject
// Objective: to create a shared object in Paralleljava
import java.io.*;
import net.*;

class SharedObject {
    private String valueR;    // Value for the string return
    private int count;        // Value for the string storage
    private int column;       // Column of the original array
    private boolean shared;   // Sharing flag

    // If it is a client process, then create a Lock
    void SharedObject() {
        if (IsClient(me)) {
            Lock instLock = new Lock();
            ...
        }
    }

    // Copy the buffer when the buffer is shared
    protected void talkWhenShared(int i) {
        if (shared) {
            ...
        }
    }

    // Update method for the server to update shared data
    // and broadcast its update
    public synchronized void upDate(int arr[][]) {
        getLock();    // Get the lock from the client
        ...
        strParse(str);
        talkWhenShared(row);
    }

    // Needed by server and client when talking with each other
    void setShared() { shared = true; }

    ...
}    // End of the SharedObject class

Figure 4.7: The SharedObject Class
Any user program that instantiates the SharedObject class and calls the setShared method
creates a shared object. In Paralleljava, the client dispatches the user class which imple-
ments the shared object with different arguments to different servers. The shared object,
therefore, is created on both the client and the servers. In our current implementation of
Paralleljava, the data consistency between client and servers is maintained by calling the
update method in class SharedObject. Any update to the shared object must be reflected to
the client. The client then broadcasts the changes to all the servers [see Figure 4.8].
[Diagram: client P0 holding the shared object, connected to servers P1 to Pn, each with its own copy of the shared object.
(1) P1 makes an update to the shared object and then calls the update method to inform the client P0 of its update to the shared object.
(2) Client P0 broadcasts the update to the other servers (P2 to Pn).]

Figure 4.8: The Data Consistency Model inside the Shared Object
The advantage of this implementation is that any updates to the shared object occur only
in the server's local address space at the beginning, which reduces the overhead caused by mul-
tiple accesses to the shared object residing on the client. Figure 4.9 and Figure 4.10 are
examples of the client and server which use SharedObject.
// Module name: test2client
// Objective: Sample client code using a shared object to
// implement array multiplication.
// First, create a connection with the server.
// Second, create an instance of the class SharedObject
// to implement the distributed array multiplication.
import net.*;
import java.io.*;

class test2client extends NetworkClient {
    public static void main(String args[]) {
        // Create a client, here on host "tiger", port 8000
        NetworkClient client1 = new NetworkClient("tiger", 8000);
        DataOutputStream out =
            new DataOutputStream(client1.serverOutput);
        DataInputStream in =
            new DataInputStream(client1.serverInput);
        int[][] A = new int[2][2];
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 2; j++)
                A[i][j] = i + j;
        SharedObject sharedArray = new SharedObject(A, 2, 2);
        sharedArray.print();
        A[0][0] = 4;
        A[0][1] = 5;
        sharedArray.upDate(A, 0, 2);
        sharedArray.print();
    }
}

Figure 4.9: Sample Client Code
// Module name: test2server
// Objective: Sample server code using SharedObject to
// implement array multiplication.
import net.*;
import java.io.*;

class test2server extends NetworkServer {
    public static void main(String args[]) {
        NetworkServer server1 = new NetworkServer();
        server1.startServer(8000);
        int[][] A = new int[2][2];
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 2; j++)
                A[i][j] = i + j;
        SharedObject sharedArray = new SharedObject(A, 2, 2);
        sharedArray.print();
        A[0][0] = 4;
        A[0][1] = 5;
        sharedArray.upDate(A, 0, 2);
        sharedArray.print();
    }

    final public void run() {
        if (isServer) {
            // Try to connect with the client. If it is not
            // successful, then create a new socket and rebind
            // with the client.
            ...
        }
    }
}

Figure 4.10: Sample Server Code
In the above examples, we assume only one server and one client, so we did not use any
synchronization mechanism besides the synchronization mechanism from Java. However,
in a multi-server environment, each server calls the update method explicitly to inform the
client of its changes to the shared data, so it needs more synchronization mechanisms than
the synchronization mechanism from Java. We have designed and partly implemented a
synchronization mechanism for the SharedObject class. We have implemented a Queue
class [see Figure 4.11] on the client to line up the servers that called the update method,
and implemented a Lock class [see Figure 4.12] to guard access to the shared data on
the client. More detailed implementations of the Queue class and the Lock class are shown
in Appendix A and Appendix B respectively.
// Module name: Queue
// Objective: Class Queue is used to create and maintain a
// queue on the client, which lines up all the requests of
// servers who want to update the shared object on the client.
// The queue is basically a string array with each element
// as a string. Each string in the queue consists of 3
// categories: serverPid, portNumber, string (real infor.)
class Queue {
    private int length;      // The length of the queue
    private int index;       // Index to the Queue element
    private String[] queue;  // Elements of the Queue

    // Construct a queue with a given length
    public Queue(int len) {
        ...
    }

    // Append to the queue
    public void Append(String str) {
        ...
    }

    // Remove an element from the queue
    public String Remove() {
        ...
    }

    // Parse the element in the queue; each category is
    // separated by a space
    public String[] Parse(String str) {
        ...
    }
}

Figure 4.11: The Queue Class
// Module name: Lock
// Objective: Class Lock is used to guard the shared object
// on the client. In the design of the lock, a queue is
// linked to the lock. Servers who want to update the shared
// data on the client form a queue.
class Lock {
    private int value;    // The value for this lock
    private Queue queue;  // A waiting queue for this lock
    boolean held;         // Bool value to check if the lock is held

    // Construct a lock from a given value
    public Lock(int val) {
        ...
    }

    // Destructor for the lock; we don't need to worry about the
    // release of the string, Java will take care of that itself
    public void unLock(int val) {
        ...
    }

    // Acquire a lock
    void Acquire(int n) {
        ...
    }

    // Release a lock
    void Release() {
        ...
    }
}

Figure 4.12: The Lock Class
4.5 Summary
In this chapter, we have given an overview of the design and partial implementation of the
Paralleljava system. The current implementation of the Paralleljava system both extends
the Java runtime system and adds new class libraries. The dynamic network
class file loading is implemented in the runtime system, whereas the shared object is
implemented by a group of class libraries. Data consistency in Paralleljava is implemented
by the class SharedObject. The SharedObject is different from RMI (Remote
Method Invocation) in the following aspects:
Figure 4.13: SharedObject vs. RMI

SharedObject                               RMI
No stub                                    A stub is needed
Shared and can even be used within RMI     No sharing
Chapter 5
Conclusions and Future Work
5.1 Conclusions
The goal of this thesis was to explore the potential for implementing parallelism by using
existing features in Java.
Our exploration began by looking at Java's multithreading feature. Java supports multiple
threads of control, but threads are created and executed on a single machine. In order to
implement parallelism, different threads in a Java program should be created either on dif-
ferent machines or be created on one machine and then dispatched to different machines.
In our exploration, we have successfully executed different threads in the user's Java
program on different machines. The results of our exploration show that it is easy to create
threads on different machines and then execute them. However, it is more difficult to
create threads on one machine and then dispatch them to different machines. The reason is
that dispatching threads to different machines requires thread-related information, such as
the thread name, the thread execution context, etc.
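The single-machine baseline we started from is easy to state in code. The following sketch (the class and task names are our own illustration, not part of Paralleljava) creates two threads and runs them inside one JVM; the point of Paralleljava is to run such threads on different machines instead:

```java
//A minimal illustration of Java's local multithreading: both
//threads are created and executed inside a single JVM on a
//single machine. Names and workloads are illustrative only.
public class LocalThreads {
    static int[] results = new int[2];

    public static void main(String[] args) throws InterruptedException {
        Thread t0 = new Thread(new Runnable() {
            public void run() { results[0] = 21; } //work for thread 0
        });
        Thread t1 = new Thread(new Runnable() {
            public void run() { results[1] = 21; } //work for thread 1
        });
        t0.start();
        t1.start();
        t0.join(); //wait for both threads to finish
        t1.join();
        System.out.println(results[0] + results[1]); //prints 42
    }
}
```

Dispatching such a thread to another machine would additionally require shipping its name, execution context, and similar state, which is what makes that route harder.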
After experimenting with Java's multithreading feature, we explored several alternatives
for data consistency, which include update detection and update propagation. The
alternatives for update detection include: extending the functionality of the store instruction
in the Java Virtual Machine; using virtual memory page protection mechanisms; and
using information provided by the application programs. The alternatives for update
propagation include: extending the functionality of the return operator in Java; extending the
existing synchronization mechanisms in Java; and utilizing the information provided by
the application programs.
We have also studied some alternatives for creating distributed shared objects. Since all
the objects in Java are created using the new operator, our studies are based on extending
this operator. Our studies include introducing a keyword shared, an operator shared-New,
or a class SharedObject. The results of our studies show that the third alternative,
introducing a class SharedObject, is more suitable for developing a parallel computing
system. The reason is that by extending the class library, the resulting parallel computing
system will not have to be updated when the Java system itself is changed. Both of the first two
alternatives require changes to the Java compiler and the Java interpreter.
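To illustrate why the class-library alternative needs no compiler support, a SharedObject can be written as an ordinary class compiled by the unmodified Java compiler. The fields and update logic below are a simplified sketch of the idea, not the actual Paralleljava implementation:

```java
//A simplified sketch of the class-library approach: SharedObject
//is a plain Java class, so neither the compiler nor the
//interpreter needs changes. The fields and the body of update
//are illustrative only; in Paralleljava, update is where the
//copy on the client would be refreshed over the network.
public class SharedObject {
    private String data; //the locally cached copy of the shared data

    public SharedObject(String initial) {
        data = initial;
    }

    //Propagate a new value to the shared object
    public synchronized void update(String newValue) {
        data = newValue;
    }

    //Read the current value of the shared object
    public synchronized String read() {
        return data;
    }

    public static void main(String[] args) {
        SharedObject obj = new SharedObject("old");
        obj.update("new");
        System.out.println(obj.read()); //prints new
    }
}
```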
We have also successfully implemented dynamic network class file loading within Java.
Java supports dynamic class file loading, but all the class files are on the local disk. In a
distributed, parallel computing environment, a user class file can be anywhere in the
network. Therefore, developing a dynamic network class file loading mechanism is very
important.
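The core of such a mechanism is a subclass of java.lang.ClassLoader whose findClass obtains class bytes from the network instead of the local disk. The sketch below shows only the shape of the idea: the names are illustrative, and the fetch step is a placeholder rather than a real socket or URL read.

```java
//A stripped-down sketch of network class file loading. A real
//implementation would replace fetchClassBytes with code that
//reads the class file over the network; here it is a placeholder
//that always fails, so only parent delegation succeeds.
public class NetworkClassLoader extends ClassLoader {
    protected Class findClass(String name) throws ClassNotFoundException {
        byte[] bytes = fetchClassBytes(name);
        if (bytes == null)
            throw new ClassNotFoundException(name);
        //turn the raw class file bytes into a Class object
        return defineClass(name, bytes, 0, bytes.length);
    }

    //Placeholder for the network fetch step (illustrative only)
    private byte[] fetchClassBytes(String name) {
        return null;
    }

    public static void main(String[] args) throws Exception {
        NetworkClassLoader loader = new NetworkClassLoader();
        //system classes are still resolved through the parent loader
        Class c = loader.loadClass("java.lang.String");
        System.out.println(c.getName()); //prints java.lang.String
    }
}
```

Note that loadClass first delegates to the parent loader, so system classes are never fetched over the network; only classes the parent cannot find reach findClass.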
Based on our exploration, we conclude that it is possible to implement parallelism within
the Java framework. We have also described in this thesis the design and partial
implementation of an experimental parallel computing system that we call Paralleljava. The
Paralleljava system makes extensions to both the Java API and the Java runtime system. It uses a
data consistency model similar to entry consistency and implements an update-based
protocol.
Using Paralleljava, users can create distributed Java programs. Shared data in Paralleljava
are managed by the class SharedObject. Any instantiation of the class SharedObject
creates a shared object. Data consistency between shared objects is maintained by calling the
update method of the class SharedObject.
5.2 Future Research
In this thesis, we have explored various possibilities for developing distributed, parallel
computing systems using features existing in the Java system. We have implemented
dynamic network class file loading, and we have designed and partially implemented a
DSM system which we call Paralleljava.
However, a number of issues still remain to be addressed. They include:
Security: we have highlighted some of the potential security issues with Paralleljava in
this thesis, such as an unauthorized user application making use of a server and a user
application hogging resources on the server. For the first issue, one possible solution is
to run a daemon on the server to identify the client machines so that unauthorized
machines cannot connect to that server. A possible solution for the second issue is to
set a limit on the resources (such as memory and CPU time) each machine can use,
so that each client application can use only a certain amount of CPU time and memory.
Building another version of Paralleljava with data consistency implemented through
extending the existing Java synchronization mechanisms and the Java compiler. In the
current version of Paralleljava, data consistency is enforced through information
provided by application programs (e.g., the update method in class SharedObject).
However, in our studies in chapter 3, we have shown that it is possible to implement data
consistency without application program annotation. For instance, the store instructions
in the Java virtual machine can be extended for update detection, and the existing Java
synchronization mechanisms can be extended for update propagation.
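The first of the security measures listed above, a daemon that identifies client machines against a fixed allowlist, can be sketched as follows (the host names and the lookup policy are our own illustration, not part of Paralleljava):

```java
import java.util.HashSet;
import java.util.Set;

//Sketch of the core check of an identification daemon: only
//clients whose host names appear in an allowlist may connect.
//The host names and the policy here are illustrative only.
public class ClientCheck {
    private Set allowed = new HashSet();

    public ClientCheck(String[] hosts) {
        for (int i = 0; i < hosts.length; i++)
            allowed.add(hosts[i]); //register each authorized host
    }

    //Called by the server daemon before accepting a connection
    public boolean isAuthorized(String clientHost) {
        return allowed.contains(clientHost);
    }

    public static void main(String[] args) {
        ClientCheck check = new ClientCheck(
            new String[] { "client1.example.edu", "client2.example.edu" });
        System.out.println(check.isAuthorized("client1.example.edu")); //prints true
        System.out.println(check.isAuthorized("intruder.example.com")); //prints false
    }
}
```

A resource limit for the second issue could be enforced at the same point, by recording per-client CPU and memory budgets alongside the allowlist.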
Appendix A The Queue Class
//Module name: Queue
//Objective: Class Queue is used to create and maintain a
//queue on the client, which lines up all the requests of servers
//who want to update the shared object on the client.
//The queue is basically a string array with each
//element as a string. Each string in the queue consists
//of 3 categories:
//serverPid, portNumber, string (real infor.)
class Queue {
    private int length;      //The number of elements in the queue
    private int index;       //Index of the next element to remove
    private String[] queue;  //Elements of the Queue

    //Construct a queue with a given capacity
    public Queue(int len) {
        queue = new String[len];
        for (int i = 0; i < len; i++)
            queue[i] = new String();
        length = 0;
        index = 0;
    }

    //Append to the queue
    public void Append(String str) {
        queue[length] = str;
        length += 1;
    }

    //Remove an element from the queue
    public String Remove() {
        String str = queue[index];
        index += 1;
        return str;
    }

    //Parse an element of the queue; each category is separated by
    //a space
    public String[] Parse(String str) {
        java.util.StringTokenizer st = new java.util.StringTokenizer(str, " ");
        String[] parts = new String[st.countTokens()];
        for (int i = 0; i < parts.length; i++)
            parts[i] = st.nextToken();
        return parts;
    }
}
Appendix B The Lock Class
//Module name: Lock
//Objective: Class Lock is used to guard the shared object
//on the client.
//In the design of the lock, a queue is linked
//to the lock. Servers who want to update the shared
//data on the client form a queue.
class Lock {
    private int value;   //The value for this lock
    private Queue queue; //A waiting queue for this lock
    boolean held;        //Bool value to check if the lock is held

    //Construct a lock from a given value
    public Lock(int val) {
        value = val;
        held = false;
        queue = new Queue(100);
    }

    //Destructor for the lock; we don't need to worry about the
    //release of the string, Java will take care of that itself
    public void unLock(int val) {
        held = false;
    }

    //Acquire a lock
    void Acquire(int n) {
        //insert this request into the waiting queue
        queue.Append(Integer.toString(n));
        while (held) {
            Thread.yield(); //busy-wait until the lock is released
        }
        held = true;
    }

    //Release a lock
    void Release() {
        //remove a request from the queue
        queue.Remove();
        held = false;
    }
}
Bibliography
[1] W. Richard Stevens. Unix Network Programming. PTR Prentice Hall, Englewood
Cliffs, New Jersey 07632.

[2] Tim Lindholm, Frank Yellin. The Java Virtual Machine Specification. Addison-Wesley,
1996.

[3] James Gosling, Bill Joy, and Guy Steele. The Java Language Specification. Addison-Wesley,
1996.

[4] Ann Wollrath, Roger Riggs, and Jim Waldo. A Distributed Object Model for the Java
System. Computing Systems 9(1), pages 265-290, 1996.

[5] Roger Riggs, Jim Waldo, Ann Wollrath. Pickling State in the Java System. The 2nd
USENIX Conference on Object-Oriented Technologies, 1996.

[6] TreadMarks Documentation. http://www.cs.rice.edu/~willy/TreadMarks/overview.html.

[7] Ken Arnold, James Gosling. The Java Programming Language. Addison-Wesley,
1996.

[8] Peter Keleher, et al. TreadMarks: Distributed Shared Memory on Standard Workstations
and Operating Systems. In Proceedings of the Winter 94 Usenix Conference,
pages 115-131, January 1994.

[9] Brian N. Bershad, et al. The Midway Distributed Shared Memory System. In
Proceedings of the '93 CompCon Conference, pages 528-537, February 1993.

[10] A. Silberschatz, J. Peterson, et al. Operating System Concepts. Addison Wesley
Publishing Company, 1991.

[11] Gary Cornell, Cay S. Horstmann. Core Java. Prentice Hall PTR,
December 1998.

[12] Harjinder Sandhu. Shared Regions: A Strategy for Efficient Cache Management in
Shared Memory Multiprocessors. Ph.D. thesis, University of Toronto, July 1995.

[13] PVM: Parallel Virtual Machine. http://www.netlib.org/pvm3/book/node1.html.

[13] Paul S. Wang. C++ with Object-Oriented Programming. PWS Publishing Company,
1994.

[14] Jim Waldo, Geoff Wyant, Ann Wollrath, and Sam Kendall. A Note on Distributed
Computing. Sun Microsystems Laboratories Technical Report, SMLI TR-94-29,
November 1994.

[15] Robert Orfali, Dan Harkey, Jeri Edwards. The Essential Distributed Objects Survival
Guide. John Wiley & Sons, Inc., September 1995.

[16] Gregory V. Wilson and Paul Lu. Parallel Programming Using C++. MIT Press,
1996.

[17] Cristiana Amza, et al. TreadMarks: Shared Memory Computing on Networks of
Workstations. IEEE Computer, Vol. 29, No. 2, pages 18-28, February 1996.

[18] Eshrat Arjomandi, William O'Farrell, et al. ABC++: Concurrency by inheritance in
C++. IBM Systems Journal, Vol. 34, No. 1, pages 120-137, January 1995.

[19] John K. Bennett, et al. Munin: Distributed Shared Memory Based on Type-Specific
Memory Coherence. In Proceedings of the 1990 Conference on the Principles and
Practice of Parallel Programming, March 1990.

[20] Peter Keleher, Alan L. Cox, et al. Lazy Release Consistency for Software Distributed
Shared Memory. In Proceedings of the 19th Annual International Symposium on
Computer Architecture, pages 13-21, May 1992.

[21] John B. Carter, John K. Bennett, et al. Implementation and Performance of Munin. In
Proceedings of the 13th ACM Symposium on Operating Systems Principles, pages
152-164, October 1991.

[22] Alan L. Cox, Sandhya Dwarkadas, et al. An Integrated Approach to Distributed
Shared Memory. First International Workshop on Parallel Processing, December
1994.

[23] Sandhya Dwarkadas, Peter Keleher, et al. Evaluation of Release Consistent Software
Distributed Shared Memory on Emerging Network Technology. ISCA '93, pages 144-155,
May 1993.

[24] M. J. Zekauskas, W. A. Sawdon, B. N. Bershad. Software Write Detection for a
Distributed Shared Memory. OSDI, pages 87-100, Nov. 1994.

[25] R. K. Karne. Object-oriented Computer Architectures for New Generation of
Applications. Computer Architecture News, Vol. 23, No. 5, Dec. 1995.

[27] Introduction to Distributed Shared Memory. http://cne.gmu.edu/modules/dsm/index.html

[28] G. Hilderink, J. Broenink, et al. Communicating Java Threads. In Proceedings of the
20th World Occam and Transputer User Group Technical Meeting, pages 48-76,
1997.

[29] D. Thompson, D. Watkins. Comparisons between CORBA and DCOM: Architectures
for Distributed Computing. http://www.sd.monash.edu.au/research/publications/1997/ABSTRACTS.html#P97-1

[30] Distributed Shared Memory Home Pages. http://www.cs.umd.edu/~keleher/dsm.html