Distributed Object-Oriented Parallel Computing on Heterogeneous Workstation
Clusters Using Java
Meijuan Shan
A thesis submitted to the Faculty of Graduate Studies
in partial fulfillment of the requirements
for the degree of
Master of Science
Graduate Programme in Computer Science
York University
Toronto, Ontario
July 1999
The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.
The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
Distributed, Object-Oriented, Parallel Computing on Heterogeneous Workstation Clusters using Java

by Meijuan Shan

a thesis submitted to the Faculty of Graduate Studies of York University in partial fulfillment of the requirements for the degree of
Master of Science
Permission has been granted to the LIBRARY OF YORK UNIVERSITY to lend or sell copies of this thesis, to the NATIONAL LIBRARY OF CANADA to microfilm this thesis and to lend or sell copies of the film, and to UNIVERSITY MICROFILMS to publish an abstract of this thesis. The author reserves other publication rights, and neither the thesis nor extensive extracts from it may be printed or otherwise reproduced without the author's written permission.
Abstract
Unlike stand-alone workstations, computing on distributed networks of workstation clusters requires dealing with different types of heterogeneity, such as architecture, computational speed and network load. Java, introduced by Sun, is designed specifically for secure, distributed, network computing. Besides the platform-independent bytecode, Java also provides some basic mechanisms for concurrency at the language level, such as multithreading and synchronization. In this thesis, we explore the use of the mechanisms available in the Java system to build a software infrastructure for distributed shared memory parallel computing among distributed, heterogeneous networks of workstations.
Our exploration begins with Java's multithreading feature. Java supports multiple threads of control on a single workstation. In order to achieve parallelism, threads in a Java application should be created either on different machines or created on one machine and then dispatched to different machines. In our exploration, we have successfully executed different threads of the user's Java program on different machines. After experimenting with Java's multithreading feature, we explored several alternatives for data consistency, including update detection and update propagation. We also explored some alternatives for creating distributed shared objects.
The results of our exploration show that there are several alternatives for achieving parallelism through extensions to the Java system. Some of the alternatives require extensions to the existing Java compiler and interpreter; some of them require a new API. In the last part of this thesis, we describe the design and partial implementation of a parallel computing system based on Java that we call Paralleljava. The Paralleljava system makes extensions to both the Java API and the Java runtime system. It uses a data consistency model similar to entry consistency and implements an update-based coherence protocol.
To my parents,
who have dedicated more than 40 years to education and who have been my mentors since I was born.
Acknowledgment
I am most indebted to my parents, who have been my mentors since I was born, who have been nourishing me with their encouragement and wisdom, and who require so little and give so much. Words alone are not enough to express my feelings for them.
Thanks to Professor Harjinder Singh Sandhu for supervising this thesis and for his wise guidance. Thanks to the other professors on my thesis committee, Eshrat Arjomandi, Rich Paige and Rene Fournier, for taking the time to read the thesis and provide feedback.
I am very grateful to my husband Yang. Thanks to him for his moral support and for the huge amount of work he did to help me finish this thesis. No words can express my gratitude to him.
I am indebted to the rest of my family: grandpa, uncles, aunts, Uiiyan, Zhichao, Meicha, Lili, Baoli, Xiaohua, Xining, Xinggang, Feifei, Xiaoxiao, Taoshan. Thanks to all of them for giving me constant support for so many years and for making my life colorful.
Thanks to other people in our computer science department: Professor Jenkin, Patricia, Professor Amanatides, Ulya, Lisa. Thanks to them for their help. I also want to thank some of my friends, Ben, Arhie and Jason, for their help during the first few years after I came here.
Table of Contents
Chapter 1 Introduction
1.1 Motivation
1.2 Parallel Computing in Java
1.3 Paralleljava Overview
1.4 Thesis Contribution

Chapter 2 Background
2.1 Distributed Shared Memory
2.1.1 Distributed Shared Memory Systems
2.1.2 Issues in Designing a DSM System
2.2 Distributed Object-Oriented Paradigm
2.2.1 Some Key Issues in Designing Distributed Object-Oriented Systems
2.2.2 Other Issues in Designing Distributed Object-Oriented Systems
2.3 Java
2.3.1 The Java System Architecture and the Java Programming Architecture
2.3.2 Multithreading and Synchronization in Java
2.3.3 Memory Management in Java
2.4 Summary

Chapter 3 Parallel Computing in Java
3.1 Issues in Developing DSM Systems within Java
3.2 Using Multithreading for Parallelism
3.2.1 The Creation of the Thread Objects at the Java Language Level
3.2.2 Support for Threads at the Java Runtime System Level
3.2.3 Dispatching Threads to Different Machines in the Runtime System
3.3 Creating Shared Objects
3.3.1 The New Operator in Java
3.3.2 Extending the Functionality of the New Operator
3.4 Exploring Memory Consistency and Coherence
3.4.1 Update Detection
3.4.2 Update Propagation
3.5 Summary

Chapter 4 The Paralleljava System
4.1 The Overview of the Paralleljava System
4.2 The System Architecture of the Paralleljava System
4.2.1 The System Architecture of the Paralleljava System
4.2.2 The Functionality of Each Layer in the Paralleljava System
4.3 Dynamic Network Class File Loading and Security
4.3.1 Dynamic Class File Loading in Java
4.3.2 Dynamic Network Class File Loading in Paralleljava
4.3.3 Security
4.4 Implementation of Data Consistency in Paralleljava
4.4.1 Update Detection and Collection
4.4.2 Update Propagation
4.4.3 The Shared Object in Paralleljava

Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Research

Appendix A The Queue Class
Appendix B The Lock Class
Figure 4.8: The Data Consistency Model inside the Shared Object
Figure 4.9: Sample Client Code
Figure 4.10: Sample Server Code
Figure 4.11: The Queue Class
Figure 4.12: The Lock Class
Figure 4.13: SharedObject vs. RMI
Chapter 1
Introduction
1.1 Motivation
With increasing frequency, networks of workstations are being used as parallel computers. High-speed general-purpose networks and very powerful workstation processors have narrowed the performance gap between workstation clusters and supercomputers. Furthermore, the workstation approach provides a relatively low-cost, low-risk entry into the parallel computing arena. In terms of performance, improvements in processor speed, network bandwidth and latency allow networks of workstations to provide performance approaching or exceeding supercomputer performance for an increasing class of applications. In terms of cost, many organizations already have installed workstation bases, and no special hardware is required to use this facility as a parallel computer. The resulting system can easily be maintained, extended and upgraded.
On the other hand, computing in a network is not like computing in an MPP (Massively Parallel Processor), in which all processors have exactly the same capability, resources, software, and communication speed. The computers available on a network may be from different vendors or have different operating systems or compilers. Therefore, the software supporting network computing must cope with different types of heterogeneity, such as architecture, data format, computational speed, machine load and network load.
Java is designed specifically for secure, distributed, Web-based applications. A Java program is compiled into an intermediate code, called bytecode, which is independent of the hardware architecture and the operating system. Any platform with the Java Virtual Machine ported to it can run Java bytecodes without modification. Thus, developing applications using Java produces software that is portable across multiple machine architectures, operating systems, and graphical user interfaces.
Java also provides some basic mechanisms for concurrency, such as multithreading and synchronization, at the language level. Using Java, users may program with multiple threads of control in a single program, but all of those threads are executed on a single machine. Like any other interpreted language, Java bytecode must first be translated to executable machine code by the Java Virtual Machine and then executed. This makes the execution of Java programs slower compared with equivalent programs written in C or C++. However, distributing the threads within a Java program across a network of computers can potentially allow the program to execute faster.
In this thesis, we explore the mechanisms available in the Java system to build a software infrastructure for parallel computing among distributed, heterogeneous networks of workstations. Our exploration ranges from extending the functionality of the Java API (Application Programming Interface) to extending the Java Virtual Machine. We have also designed and partially implemented a new parallel computing system called Paralleljava. Paralleljava provides extensions to the Java API and the Java Virtual Machine. It allows Java applications that use the Java multithreading facilities to be executed in parallel on distributed, heterogeneous networks of workstations.
1.2 Parallel Computing in Java
The primary objective of this thesis is to explore the mechanisms existing in the Java system to implement parallelism among distributed, heterogeneous networks of workstations. The Java system, consisting of the Java programming language and the Java runtime system, may be extended in a variety of ways to achieve parallelism. The obvious approach is to extend both the Java API and the Java runtime system to provide a complete parallel programming environment. However, it is possible to achieve parallelism in Java by extending the Java API (with a set of parallel classes) without modifying the Java runtime system. Alternatively, since Java already provides some basic mechanisms for concurrency, such as multithreading and synchronization, it may also be possible to modify the Java runtime system to achieve parallelism without extending or modifying the Java programming environment.
The suitability of each of these approaches to parallelism within the Java framework depends on the nature of the target computing environment. In this thesis, we consider both of the above approaches as well as one that is a mix of both alternatives. The exploration starts from multithreading, the basic mechanism for concurrency in Java. Java supports multiple threads of execution, but threads are created and executed on a single machine. A thread in Java is created by instantiating the Thread class and started by invoking the start method of the Thread class. In order to implement parallelism, different threads should either be created on different machines at the beginning or be created on one machine and dispatched to different machines when the start method is invoked. In the first case, we need to extend the Java API, whereas, in the latter case, we have to modify the Java runtime system. These will be discussed further in Chapter 3.
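The standard mechanism just described can be shown in a short example; the Worker class and its printed message are illustrative only and are not taken from the thesis.

```java
// A thread is created by instantiating the Thread class (here via a
// subclass) and started by invoking its start method.
class Worker extends Thread {
    public void run() {
        System.out.println("running in " + getName());
    }
}

class ThreadDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Worker();   // created by instantiating the Thread class
        Thread t2 = new Worker();
        t1.start();                 // started by invoking the start method;
        t2.start();                 // in standard Java, both run on one machine
        t1.join();
        t2.join();
    }
}
```

In unmodified Java, both calls to start execute on the local machine; the two alternatives above change either where the Thread object is created or what start does.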
Along with creating different threads on different machines, the key issues in developing parallel computing in Java are update detection (determining when some shared data has been modified) and update propagation (transmitting the updates to that shared data to other machines). In Chapter 3, we consider alternatives for each of these issues.
1.3 Paralleljava Overview
After looking into the above alternatives, we settled on one set of the alternatives and implemented it in a system we call Paralleljava. Paralleljava is designed to utilize the mechanisms for parallel computation existing within the Java system. It extends both the Java API and the Java runtime system to achieve parallelism in the Java system. The Java API is extended to support a shared-object model in Paralleljava. The shared object is based on the object model in the Java system. The instantiation of the shared object class in a user program provides a unified object model across various platforms. The extension of the Java runtime system presents users with the illusion of distributed shared memory.
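The actual shared-object API is presented in Chapter 4; purely as a hypothetical sketch of what instantiating a shared object class might look like, assume a SharedObject base class (stubbed below, since the real one would be supplied by the extended API and runtime).

```java
// Hypothetical sketch only: SharedObject stands in for the base class
// that Paralleljava's extended API would provide. The stub does nothing;
// in the real system, the extended runtime would keep instances of its
// subclasses consistent across machines.
class SharedObject { /* stand-in for the Paralleljava base class */ }

class SharedCounter extends SharedObject {
    private int value;

    public synchronized void increment() { value++; }
    public synchronized int  get()       { return value; }
}

class SharedCounterDemo {
    public static void main(String[] args) {
        // Instantiated like any ordinary Java object; in Paralleljava the
        // runtime, not this code, would propagate updates to other hosts.
        SharedCounter c = new SharedCounter();
        c.increment();
        System.out.println(c.get()); // prints 1
    }
}
```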
1.4 Thesis Contribution
The introduction of Java, with its multithreading and platform-independent features, has sparked considerable interest among the distributed and parallel programming communities. Many research projects have been proposed or are currently ongoing. A key part of the work presented here was to look into the Java system and to examine how parallelism could be made to fit into it. Some issues we considered for parallel computing in Java included creating parallelism by using Java's multithreading features, sharing data among different processors, and keeping shared data on different machines consistent. In the process of addressing these issues, we designed and partially implemented an experimental system we call Paralleljava.
Up to now, various software systems have been proposed and built to support parallel computing on workstation networks, using either Distributed Shared Memory (DSM) or the Message Passing Interface (MPI). Paralleljava works much like conventional distributed shared memory systems, which present virtual shared memory to a group of workstations even though the workstations do not physically share memory. What distinguishes Paralleljava from previous distributed shared memory systems is that Paralleljava is a Java-based, Web-optimized parallel computing system. By taking advantage of features such as platform independence and multithreading existing in the Java system, Paralleljava allows clients to download and thereafter execute in parallel a single Java application on networks of workstations. The clients can also automatically upload and execute programs on remote computing servers. The computing servers can be any computers within the network with the Java system ported to them. The program is automatically uploaded and executed on a computing server, and results are returned to the client. In the case of a parallel application, the client may upload code to many heterogeneous computing servers throughout the Internet.
This thesis explores mechanisms for parallel computing in Java and presents the design and the partial development of the Paralleljava system. Chapter 2 gives an overview of distributed, object-oriented computing and a brief introduction to the Java system. Chapter 3 presents alternatives for parallel computing that utilize the mechanisms existing in Java. Chapter 4 describes the design and the partial implementation of the Paralleljava system. Chapter 5 presents the conclusions of this thesis and discusses future work in developing Java parallel computing.
Chapter 2
Background
The objective of this thesis is to explore the use of the mechanisms in the existing Java system to build a software infrastructure for executing serial or multithreaded Java programs on faster computing servers or on a collection of possibly heterogeneous hosts. The mechanisms we use apply concepts from existing object-oriented, distributed shared memory systems and the Java system. In this chapter, we review some of these concepts. Section 2.1 gives a brief introduction to distributed shared memory systems. Section 2.2 describes the primary issues in the design of distributed object systems. Section 2.3 introduces the Java system together with its features. Section 2.4 summarizes this chapter.
2.1 Distributed Shared Memory
Parallel computing on networks of workstations generally falls into two categories: Message Passing and Distributed Shared Memory (DSM). The Message Passing model uses primitives such as send and receive for interprocess communication. The DSM model provides processes in a system with a shared address space in which data is accessed with read and write operations. One of the advantages of DSM over Message Passing is that it presents users with a unified memory model. Using a DSM system, users need not worry about the differences between remote and local memory access. In the next section, we briefly review some basic concepts of DSM systems as well as the key design issues.
2.1.1 Distributed Shared Memory Systems
A Distributed Shared Memory (DSM) system (Figure 2.1) is a software system built to support parallel computation on networks of workstations. Paralleljava, the system described in this thesis, takes existing DSM concepts and builds them into the Java framework. Consequently, we spend some time in this section describing some of the basic DSM concepts and issues, and describing some of the DSM systems that have been built in the past. The idea behind DSM is to emulate the cache of a multiprocessor using operating system software or runtime library routines.
Figure 2.1: Distributed Shared Memory (a Software Implementation Layer presents the separate memories as a single shared memory)
As shown in Figure 2.1, the workstations do not physically share memory, but the Software Implementation Layer between the processors and the memories presents the illusion of shared memory. All memory accesses are controlled by the software implementation layer. In a DSM system, all remote memory accesses behave like local memory accesses. This relieves the programmer from worrying about remote memory access when developing parallel applications [17][14].
Besides the ease of programming, DSM systems provide the same programming environment as that on shared-memory multiprocessors. Applications developed for a DSM system can be easily ported to a shared-memory multiprocessor, although porting an application developed for a shared-memory multiprocessor to a DSM system may require some modifications to the program due to the higher latencies in a DSM system [17].
2.1.2 Issues in Designing a DSM System
The major issues in designing a DSM system are granularity, memory consistency and coherence. In this section, we give a brief overview of these issues.
Granularity
Granularity refers to the size of the memory unit at which data is shared between the processors. According to granularity, DSM systems can be categorized as page-based DSM systems and region-based DSM systems. Page-based DSM systems (such as IVY [17][9][21], TreadMarks [14], Brazos [30], CVM [30] and Quarks [30]) take a normal linear address space and allow the pages to migrate dynamically over the network on demand. Page-based DSM systems exploit the existing virtual memory hardware and operating system available on most common architectures. In a page-based DSM system, each node in the system keeps a copy of each shared memory page. When any of the nodes modifies its own copy of the shared memory page, all the other nodes will set their copies to be invalid. Any accesses to the invalid pages will cause a virtual memory page fault.
Since page-based DSM systems use operating system pages as the sharing unit, compilers do not need to be changed, and the DSM systems themselves are transparent to the user's program. One of the disadvantages of page-based DSM systems is false sharing. Since the page size in a DSM system is fixed, different data items accessed by different processors might end up being allocated in the same page. In such circumstances, the system will generate coherence traffic between these processors (perhaps by repeatedly transmitting the page back and forth between them) even though the processors are actually accessing different portions of the page and are not sharing any data. For example, suppose two different data items d1 and d2 are on the same page, and that processor P1 references only d1, while processor P2 references only d2. When P1 updates d1, the memory page containing d1 and d2 has to be sent to P2 even though d2 is not updated by P1. Besides false sharing, implementations of page-based DSM systems are also architecture dependent due to their use of the virtual memory page protection mechanism.
In region-based DSM systems (such as Clouds [27], Midway [9][21], Munin [21] and ABC++ [17][12]), only certain variables and data structures needed by more than one processor are shared. These shared variables or data structures are put into critical sections guarded by synchronization objects. Each shared data item or variable is referenced as a region. Unlike page-based DSM systems, which use operating system pages as the sharing unit, the sharing unit in region-based DSM systems is a region. Regions are chosen by the application and can be of arbitrary size. In the implementation of a region-based DSM system, a synchronization object is bound to a specific shared data item; when a processor acquires the synchronization object, the data that is bound to that object becomes consistent.
While page-based DSM systems use virtual memory page-fault handling mechanisms to detect updates and accesses to a page, in region-based DSM systems the application itself has to supply information about when regions are accessed and modified. For example, in the shared region implementation in Hurricane [12], a programmer first identifies the set of shared regions in the program, and then encapsulates each series of accesses to those regions with a set of annotations that indicate when those regions are referenced and whether they are referenced for read only or for write. The annotations include: readaccess (acquires a region for read only); writeaccess (acquires a region for read and write); readdone (releases a read-accessed region); writedone (releases a write-accessed region).
Region-based DSM systems eliminate most false sharing by using a variable-size sharing unit. However, in region-based DSM systems, application programmers have to choose the shared regions, choose the type of synchronization objects, and bind them together. This increases the complexity of application programs.
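The annotation style described above can be transliterated into a Java sketch. The Region class and its method bodies below are assumptions modelled on the Hurricane annotation names; the stubs only mark the access boundaries, where a real region-based DSM system would perform its coherence actions.

```java
// Sketch of Hurricane-style region annotations in Java. The empty method
// bodies are placeholders: a real system would acquire/release the
// synchronization object bound to the region and make its data consistent.
class Region {
    private final double[] data;

    Region(int size) { data = new double[size]; }

    void readaccess()  { /* acquire the region for read only           */ }
    void writeaccess() { /* acquire the region for read and write      */ }
    void readdone()    { /* release a read-accessed region             */ }
    void writedone()   { /* release; updates would become visible here */ }

    double get(int i)           { return data[i]; }
    void   set(int i, double v) { data[i] = v; }
}

class RegionDemo {
    public static void main(String[] args) {
        Region r = new Region(8);

        r.writeaccess();       // annotate the start of a series of writes
        r.set(0, 3.14);
        r.writedone();         // annotate its end

        r.readaccess();        // annotate a read-only series of accesses
        double x = r.get(0);
        r.readdone();

        System.out.println(x); // prints 3.14
    }
}
```

Note how every access series is bracketed by a matching acquire/release pair; this is exactly the extra bookkeeping that increases application complexity.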
Memory Consistency
Memory consistency refers to the way in which updates to shared memory are reflected to the processors in the system. In a DSM system, shared data is duplicated on all the processors. In order to improve performance, this data can be accessed concurrently. However, if the concurrent accesses are not carefully controlled, accesses to the shared data may be executed in an order different from what the programmer expected. If a read to shared data returns the result of the most recent write to that shared data, the memory is said to be coherent. In order to maintain the coherence of shared data, a framework (i.e., a memory consistency model) that describes how to control or synchronize the accesses is necessary. In this section, we review some of the common memory consistency models.
Sequential Consistency (SC) [20][22] requires that modifications to shared memory are made visible immediately to all processors sharing the same data. In other words, if a memory is sequentially consistent, any read to a shared memory location must reflect the most recent write to that same memory location anywhere in the system. For example, in a simple sequential consistency implementation, a processor may acquire exclusive access to a memory page before modifying it, and then transmit its modification to other processors before giving up exclusive control of that page. The earliest page-based DSM system (e.g., IVY [17][9][21]) implemented sequential consistency because it was the most obvious model. The disadvantage of using the sequential consistency model is that it can result in large amounts of communication [Figure 2.2].
Figure 2.2: An Example of Sequential Consistency among Three Processors
In Figure 2.2, we assume that there are three processors P1, P2, P3 in a DSM system. Processor P1 performs three write operations: w(x), w(y), w(z). This assumption also applies to Figure 2.3, Figure 2.4, Figure 2.5 and Figure 2.6.
In Figure 2.2, after each write, P1 makes its write visible to P2 and P3. Sequential consistency is not often used in page-based DSM systems now due to its high communication overhead.
Weak Consistency (WC) [20] does not make modifications to shared memory visible to all the other processors sharing the same data until a synchronization point is reached [Figure 2.3]. In other words, any write to shared memory performed by one processor is initially made only locally. When that processor reaches a synchronization point, the writes are then transmitted globally.
Figure 2.3: An Example of Weak Consistency among Three Processors
In Figure 2.3, processor P1 does not make its writes visible to processors P2 and P3 until it reaches a synchronization point. The synchronization can be implemented by means of explicit synchronization operations such as locks and barriers.
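Ordinary Java itself provides an analogue of this behaviour: under the Java memory model, writes made by one thread are guaranteed visible to another once both synchronize on the same lock, i.e., at a synchronization point. The sketch below is standard single-machine Java, not a DSM system, and is included only to illustrate the idea.

```java
// Writes performed inside the writer's synchronized block become visible
// to a reader that later synchronizes on the same lock object.
class SyncPointDemo {
    static final Object lock = new Object();
    static int x, y, z;

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(new Runnable() {
            public void run() {
                synchronized (lock) {   // the writer's synchronization point
                    x = 1; y = 2; z = 3;
                }
            }
        });
        writer.start();
        writer.join();
        synchronized (lock) {           // the reader's synchronization point
            System.out.println(x + y + z);   // prints 6
        }
    }
}
```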
Comparing Figure 2.2 and Figure 2.3, we can see that weak consistency needs less communication than sequential consistency when performing the same amount of write operations. However, there are some constraints on programmers when using weak consistency. Programmers must make sure that accesses to synchronization variables are sequentially consistent, i.e., no data races are allowed, and the programmers can only use synchronization operations recognized by the system. From the application programmer's perspective, the programmer may have more work to do when using weak consistency than when using sequential consistency.
Overall, weak consistency is likely to have better performance than sequential consistency because it has lower communication overhead. However, weak consistency still has a limitation in that it recognizes only one type of synchronization variable. When a processor accesses the synchronization variable, the memory system has no way to know whether the processor is about to leave or enter the critical section. Therefore, the processor has to make its modifications to the shared data visible to all the other processors sharing the same data on both entering and leaving the critical section.
Release consistency (RC) [20] [8] improves weak consistency by making the modifications
visible to all the other processors sharing the same data only when the updating
processor exits from the critical section. Release consistency implements this by defining
two types of synchronization access: acquire access and release access. Acquire accesses
are used to tell the memory system that a critical section is about to be entered. Release
accesses are used to say that a critical section has just been exited. A processor makes its
modifications to the shared data visible to all the other processors sharing the same data
only when the processor performs a release operation [Figure 2.4].
Figure 2.4: An Example of Release Consistency among Three Processors
In Figure 2.4, processor P1 acquires synchronization variable S and performs three writes:
w(x), w(y) and w(z). P1 does not make its modifications visible to P2 and P3 until it
releases the synchronization variable S.
In general, release consistency has lower communication overhead than weak consistency
because it distinguishes between the types of synchronization access used to enter
and exit the critical section.
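The acquire/release pair can be made explicit in code with a lock object. The sketch below uses java.util.concurrent.locks.ReentrantLock (a Java API added well after the systems discussed here) purely to label the two accesses; the point is that a DSM system need only propagate the writes at the unlock:

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch: release consistency distinguishes acquire accesses (entering
// a critical section) from release accesses (leaving it). The names
// mirror Figure 2.4; no actual DSM propagation happens here.
public class ReleaseConsistencySketch {
    private final ReentrantLock s = new ReentrantLock(); // sync variable S
    private int x, y, z;

    void update() {
        s.lock();                    // acquire access: entering the section
        try {
            x = 1; y = 2; z = 3;     // w(x), w(y), w(z) as in Figure 2.4
        } finally {
            s.unlock();              // release access: writes made visible here
        }
    }

    int sum() {
        s.lock();
        try { return x + y + z; } finally { s.unlock(); }
    }
}
```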
Release consistency is the most popular memory consistency model used in DSM systems to
date. Some of the DSM systems using release consistency are TreadMarks [17],
Munin [21], Brazos [30], CVM [30] and Quarks [30].
Compared to sequential consistency and weak consistency, release consistency
improves a DSM system's performance. However, in release consistency, when it
comes to a release, the updating processor makes the whole content of the critical section
visible to all the other processors. Unfortunately, not all the processors require all
the data inside the critical section.
Entry consistency (EC) [9] improves release consistency by explicitly binding shared
data to synchronization variables, so that when a processor leaves a critical section,
only the shared data bound to that synchronization variable needs to be made consistent
among all processors [Figure 2.5].
Figure 2.5: An Example of Entry Consistency among Three Processors
In Figure 2.5, acq(Si) stands for the operation of acquiring synchronization object Si; rel(Si)
stands for the operation of releasing synchronization object Si. From the figure, we
can see that each shared variable x, y, z is guarded by its own synchronization object. Processor
P1 does not make its modifications visible to processors P2 and P3 at the same
time. It makes its modification to a shared variable visible only to a processor which
acquires the synchronization object guarding that shared data. In this way, P1 transfers
less data than it does in release consistency.
From this example, we can see that entry consistency can reduce the amount of communication
below that required in release consistency. It may therefore
improve the performance of a DSM system. However, in some cases, a DSM system
using entry consistency may perform worse than one using release consistency. For example,
in Figure 2.5, when both processors P2 and P3 acquire synchronization objects S1,
S2 and S3, processor P1 has to communicate with P2 and P3 six times (see Figure 2.6),
while in release consistency (see Figure 2.4), P1 needs to communicate with P2 and
P3 only two times if x, y and z all happen to be on the same page.
Figure 2.6: An Example of the Worst Case in Using Entry Consistency
As in most region-based DSM systems, entry consistency also potentially increases
the complexity of writing a parallel program. Application programmers have to choose
shared variables and synchronization objects and bind each shared variable to a
synchronization object.
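The binding of each shared variable to its own synchronization object can be sketched in Java with per-variable lock objects (the names Sx, Sy, Sz are hypothetical bindings, not part of any real entry-consistency API):

```java
// Sketch: under entry consistency, each shared variable is guarded by
// its own synchronization object, so acquiring one object only forces
// the data bound to it to be made consistent.
public class EntryConsistencySketch {
    private final Object Sx = new Object();  // guards x
    private final Object Sy = new Object();  // guards y
    private final Object Sz = new Object();  // guards z
    private int x, y, z;

    void writeX(int v) { synchronized (Sx) { x = v; } } // only x's binding acquired
    void writeY(int v) { synchronized (Sy) { y = v; } }
    void writeZ(int v) { synchronized (Sz) { z = v; } }

    int readX() { synchronized (Sx) { return x; } }
}
```

The programming burden described above is visible even in this toy version: every variable needs its own lock object, and the programmer must remember which lock guards which datum.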
Entry consistency is often used in region-based DSM systems. Such systems include
Midway [9] [21] and ABC++ [16] [12].
Coherence
The coherence protocol indicates how memory consistency is enforced in a DSM system.
Coherence protocols are often categorized as invalidate-based or update-based and eager
or lazy, according to the way updates to shared data are propagated. When using invalidate-based
protocols with sequential consistency, for example, a multicast message must
be sent before a write to shared data takes place in order to invalidate all copies of that
data. This prevents other processors from reading stale data. The update is propagated
only when the shared data are read. Using update-based protocols, updates to the shared
data are made locally. The updated shared data are then multicast to other processors which
possess a copy of the shared data. Processors read the local copies of the shared data, thus
reducing the communication cost. Eager and lazy coherence protocols are often used with
the release consistency model. In the implementation of release consistency using the
eager coherence protocol, a processor postpones propagating its modifications to shared
data until it comes to a release (i.e., the time when it exits from the critical section). At
that time, it propagates the modifications to all other processors that cached the modified
pages. Using the lazy coherence protocol, the notification of the modification is postponed
until the time of the acquire (i.e., the time when another processor acquires the synchronization
object).
Different coherence protocols can be used in the implementation of different memory consistency
models, and different coherence protocols can also be used in the implementation
of a single memory consistency model. In what follows, we give an example of implementing
release consistency by using different coherence protocols. Figure 2.7 is an illustration
of an invalidate-based coherence protocol used to implement the release
consistency model.
Figure 2.7: Implementing Release Consistency using an Invalidate-based Coherence Protocol
In Figure 2.7, we assume that there are two processors P1 and P2 in a DSM system; P1
performs two write operations, w(x) and w(y); P2 performs two read operations, r(z) and
r(y). This assumption also applies to Figure 2.8.
In Figure 2.7, after processor P1 writes memory location y, P1 does not make y visible to
P2 until P2 acquires the synchronization variable S. Since the cache entry pointing to memory
location y is invalid, P2 cannot read y even if it has acquired S. A cache miss will
occur when P2 reads y. This will make P1 send its modification to P2. The
implementation of release consistency using a lazy coherence protocol is similar, except that the memory
page is not invalidated. Figure 2.8 illustrates the implementation of release consistency
using an update-based coherence protocol.
Figure 2.8: Implementing Release Consistency using an Update-based Coherence Protocol
In Figure 2.8, processor P1 makes its modification to memory location y visible to P2
when it releases the synchronization variable S. This is the same as implementing release
consistency using an eager coherence protocol.
Invalidate-based coherence protocols are often implemented as multiple-reader-single-writer
sharing (many processors read while only one processor writes). Invalidation is potentially
expensive, but when the read/write ratio is sufficiently high, it can achieve good performance.
Update-based coherence protocols are often implemented as multiple-reader-multiple-writer
sharing. Reads are cheap in this option, but multicasting the writes is relatively
expensive to implement in software. Eager and lazy coherence protocols are often used
with release consistency.
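The invalidate-based scheme described above can be sketched for a single shared datum. All names below are illustrative only; this is a toy model of one datum with a home copy and per-processor caches, not a protocol from any particular DSM system:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of an invalidate-based protocol: before a write takes effect,
// every cached copy on other "processors" is invalidated, so a later
// read there misses and fetches the fresh value from the home copy.
public class InvalidateSketch {
    static class Cache { Integer copy = null; }   // null = invalid entry

    private int home = 0;                         // home copy of the datum
    private final List<Cache> caches = new ArrayList<>();

    Cache attach() { Cache c = new Cache(); caches.add(c); return c; }

    void write(Cache writer, int v) {
        for (Cache c : caches)                    // "multicast" invalidation
            if (c != writer) c.copy = null;
        writer.copy = v;
        home = v;
    }

    int read(Cache reader) {
        if (reader.copy == null)                  // cache miss: fetch update
            reader.copy = home;
        return reader.copy;                       // subsequent reads are local
    }
}
```

Note how the update is propagated only on a read miss, matching the text: the write itself only invalidates.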
2.2 Distributed Object-Oriented Paradigm
A distributed object is an object created on one machine that can be accessed on other
machines in a distributed network computing environment. A distributed object can be
used like a regular object, but from anywhere on the network. An important characteristic
that distinguishes objects from ordinary procedures or functions is that objects can still
exist even after the object which created them has stopped. An object is considered
to encapsulate its data and behavior (i.e., encapsulation). Encapsulation means that
an object's internal state is hidden from public view; the object communicates with the outside
world through its public interface. In a distributed network computing environment, distributed
objects are packaged as independent pieces of code that can be accessed by
remote clients via method invocations. The language and compiler used to create distributed
server objects are totally transparent to their clients. Clients do not need to know
where the distributed object resides or what system architecture it executes on. The distributed
object can be on the same local machine as the client or on a machine that is
within the same network [15]. Section 2.2.1 introduces some key issues in designing distributed
object-oriented systems as well as efforts made in dealing with these issues. Section
2.2.2 briefly describes some other issues in designing distributed object-oriented
systems.
2.2.1 Some Key Issues in Designing Distributed Object-Oriented Systems
As with a DSM system, which presents programmers with a unified memory model, a distributed
object-oriented system attempts to present programmers with a unified object
model across different machines in a network. Besides a unified object model, a distributed
object-oriented system also has to offer some communication mechanisms so that
objects on different machines in a network can communicate with each other. Therefore, a
unified object model and communication mechanisms are two important issues in designing
a distributed object-oriented system. In what follows, we give a brief description of
these issues.
Unified Object Models
In a regular object-oriented programming environment, programmers only deal with
objects on the same machine (i.e., local objects). In a distributed object-oriented programming
environment, on the other hand, programmers have to deal with objects existing on
different machines (i.e., distributed objects). Offering a unified object model is a very
important issue in designing a distributed object-oriented computing system. Efforts have
been made to present programmers with unified object models, such as CORBA (Common
Object Request Broker Architecture) [15], DCOM (Distributed Component Object
Model) [29] and Mentat [16].
CORBA and DCOM are standards supporting distributed object-oriented computing
systems, and are defined in a very similar way. They both use IDL (Interface Definition
Language) to define distributed objects. The major difference between CORBA and
DCOM lies in their error handling mechanisms: CORBA uses exceptions while DCOM uses
returned values to report errors. Using CORBA and DCOM, programmers do not need to
worry about the lower-level details and complexities of software on various systems.
However, programmers have to deal with two different object models when writing distributed
programs: the local object model of the language and the distributed object model
mapped from IDL.
While both CORBA and DCOM are independent of any programming language, Mentat
is designed by extending the C++ programming language. It extends C++ by using Mentat
classes to separate general C++ classes from classes used for parallel computing. Mentat's
object model includes two types of objects: contained objects and independent objects.
Contained objects are objects contained in another object's address space. Instances of
C++ classes, integers, structures, and so on are contained objects. Independent objects
possess a distinct address space, a system-wide unique name, and a thread of control. Communication
between independent objects is accomplished via member function invocation
and return values. Independent objects are analogous to Unix processes.
Some Communication Mechanisms
Unlike regular object-oriented programming, where an object can be accessed only by the
objects on the same machine, in distributed object-oriented programming the object created
on one machine might be accessed from other machines. Thus, developing communication
mechanisms for the objects which are uniform within a single application domain
or across multiple applications is a key issue in designing a distributed object-oriented
programming system. Communication mechanisms commonly used in distributed computing
systems include: sockets, RPC (Remote Procedure Call) and RMI (Remote Method
Invocation). In what follows, we give a brief introduction to these communication mechanisms
as well as some efforts in using them.
A socket is a mechanism which creates an end point for communication. It provides
applications with point-to-point byte stream services. In a system that uses sockets,
programmers have to create sockets on both the client and server sides. The sockets are
bound to local ports, and messages are packaged up and exchanged between both sides.
This mechanism requires the client and the server using sockets to engage in an application-level
protocol to encode and decode messages for exchange. In general, this
mechanism is optimized for performance, rather than ease of programming. Communication
among objects in this way can be cumbersome and error-prone. Sockets are widely used
in distributed computing systems.
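The pattern just described, sockets on both sides plus a small application-level protocol, can be shown with Java's own java.net classes. The sketch below runs a one-shot echo server in a second thread; the "echo: " prefix is our own trivial protocol:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class EchoPair {
    // Run a one-shot echo server and send it one message.
    static String roundTrip(String msg) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {   // any free port
            Thread t = new Thread(() -> {
                try (Socket s = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(s.getInputStream()));
                     PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                    // application-level protocol: reply with "echo: <line>"
                    out.println("echo: " + in.readLine());
                } catch (IOException e) { e.printStackTrace(); }
            });
            t.start();
            try (Socket c = new Socket("localhost", server.getLocalPort());
                 PrintWriter out = new PrintWriter(c.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(c.getInputStream()))) {
                out.println(msg);                 // client side of the protocol
                String reply = in.readLine();
                t.join();
                return reply;
            }
        }
    }
}
```

Even this tiny example shows the points made above: both endpoints must be created and bound explicitly, and the message format is entirely the programmer's responsibility.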
RPC (Remote Procedure Call) is another communication mechanism used in distributed
computing systems. The communication interface in this mechanism acts as a procedure
call. The arguments of the call are packaged up and shipped off to the remote
target of the call by the underlying system, which gives the application programmer
the illusion of calling a local procedure. Compared with sockets, RPC is more programming
friendly. As with sockets, RPC is commonly used in distributed procedure-oriented
computing environments. However, some distributed object-oriented systems
also use RPC, such as Extended C++ [16].
RMI (Remote Method Invocation) provides the same method invocation mechanism
in a distributed object-oriented computing environment as it does in a regular object-oriented
computing environment. The RMI mechanism can be regarded as an extension of
RPC systems to the object-oriented paradigm. In systems using RMI, objects, whether
local or remote, are defined in terms of interfaces which are declared in a kind of interface
definition language (IDL), as in CORBA and DCOM. The implementation of
the objects is independent of the interfaces and is also hidden from other objects. In
RMI, the underlying mechanisms used to make method calls may be different depending
on the location of the object. However, these mechanisms are hidden from the programmer.
RMI is the most popular communication mechanism used in distributed
object-oriented systems. Most distributed object-oriented computing systems support
RMI.
Among the above communication mechanisms, Java supports both sockets and RMI.
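In Java's RMI, the interface-versus-implementation split described above is expressed directly in the language: a remote object is defined by an interface extending java.rmi.Remote, and every remote method declares java.rmi.RemoteException. The names Adder and AdderImpl below are our own illustration:

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// The interface is all a client ever sees of the remote object.
interface Adder extends Remote {
    int add(int a, int b) throws RemoteException; // remote methods declare RemoteException
}

// The implementation is hidden from clients; it executes in the
// server's address space.
class AdderImpl implements Adder {
    public int add(int a, int b) { return a + b; }
}
```

On a real deployment the server would export AdderImpl (for example via java.rmi.server.UnicastRemoteObject) and register it by name; a client would then obtain a stub typed as Adder and invoke add exactly as if the object were local, which is the transparency property the text describes.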
2.2.2 Other Issues in Designing Distributed Object-Oriented Systems
Besides the above issues, some other concerns in developing distributed object-oriented
systems include: object migration, object storage, object integrity and data security.
Object migration techniques are often used to allow objects to be migrated across the network
while preserving data integrity, locality of reference, and sharing properties. A distributed
object can be stored either in local memory or on the network; hence, the design
of object storage mechanisms should provide transparency to users and realistic
performance for object accesses. The last concern in developing distributed object-oriented
systems is that object integrity and data security must be preserved irrespective of an object's
location and usage [14] [18].
2.3 Java
The work in this thesis is based on the Java system and some of Java's features, such as
platform independence, interpreted bytecode, multithreading and transparent memory
management. In this section, we give a brief description of the Java system and those
features which are relevant to the work we describe in this thesis. Section 2.3.1 describes
the Java system architecture as well as the Java programming architecture. Section 2.3.2
introduces the multithreading and synchronization mechanisms in Java. Section 2.3.3
gives a brief introduction to the memory management strategies in Java.
2.3.1 The Java System Architecture and the Java Programming Architecture
Java is an object-oriented language. A Java program is compiled to intermediate code
(bytecode) which is independent of the machine architecture and the operating system.
This bytecode in turn runs on top of the Java Virtual Machine. As shown in Figure 2.9,
the life cycle of a Java program includes both the compile-time and runtime phases. In the
compile-time phase, the developer writes Java source code (contained in a .java file) and
compiles it to bytecodes (contained in .class files). In the runtime phase, the Java bytecode
loader loads the corresponding .class files from the local disk and also resolves
unresolved class names from the Java class libraries. The Java Virtual Machine
consists of a Java interpreter (which interprets the Java bytecode into corresponding
machine code), a run-time system (which contains Java's runtime class libraries) and a
code generator (which generates platform-specific instructions after bytecodes have been
loaded into the Java Virtual Machine). Any platform with a Java Virtual Machine available
can run Java applications without any special porting work for that application.
[Figure: the compile-time environment (Java source compiled to bytecode, which moves through the network or file system) and the run-time environment (Java class loader and class libraries, bytecode verifier, and the Java Virtual Machine with its interpreter, run-time system and code generator, running on the host hardware).]
Figure 2.9: The Life Cycle of a Java Program
A Java application's portability is a result of the interpreted nature and architecture neutrality
of the bytecode. Furthermore, Java specifies the sizes of all its primitive data types and
defines the standard behavior of arithmetic that applies to those data types across all platforms.
In this way, Java avoids other languages' practice of leaving many fundamental
data types implementation dependent.
The Java environment itself is portable. The Java run-time system is written in ANSI C
with a clean portability boundary which is essentially POSIX-compliant. Figure 2.10 shows
the Java system on a host operating system.
[Figure: layered view of the Java system — Java applications (bytecode) run on the Java API, which sits on the Java runtime (platform-independent part and platform-dependent porting interface), on the host operating system.]
Figure 2.10: Java System on a Host Operating System
Figure 2.10 shows the Java system from the programming perspective. The Java API is
implemented by several classes written in Java, which include the language and utility
classes, the Abstract Window Toolkit, and the network and I/O classes. The Java API is
independent of the underlying hardware and stands on top of the Java runtime system. The
Java runtime system consists of two parts: a platform-independent part and a platform-dependent
porting interface. These parts are written in a combination of C and assembly
language. The Java runtime implements the interpreter and garbage collector.
2.3.2 Multithreading and Synchronization in Java
One of the key Java mechanisms that is explored in this thesis is multithreading. In this
section, we give a brief description of the multithreading feature in Java.
A thread is a lightweight process. The difference between a thread and a process is that
each process has its own resources (such as memory address spaces, etc.), whereas several
threads can share the same resources. A single process can have several threads. Multithreading
is the way to obtain fast, lightweight concurrency within a single address
space.
In Java, each thread has its own working memory and all the threads in the same Java program
share one main memory. The main memory contains the master copy of each variable.
Each thread's own working memory contains the working copies of the variables the
thread must use. Each thread can operate on its own working copies of the variables. However,
there are rules (synchronization mechanisms) to follow when a thread wants to operate
on the main memory. Java supports its multithreading mechanism with a set of
synchronization primitives based on the widely used monitor and condition variable paradigm
which was introduced by Hoare in the early 1970s [10]. Monitors provide a structured
way to control access to shared resources. In Java, the keyword synchronized placed
in front of a method definition implies that any thread executing that method must gain
exclusive access rights prior to executing the method. A synchronized method automatically
performs a lock operation when it is invoked. The method cannot be executed before
the lock operation has successfully completed. When execution of the method is completed,
an unlock operation is automatically performed on that same lock. Within a synchronized
method, a thread may call wait to temporarily halt its own execution and
allow another thread to execute a synchronized method in that class. The original
thread resumes execution only when another thread calls notify or notifyAll.
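The monitor mechanism just described can be illustrated with a minimal single-slot buffer. The class Slot is our own example, not part of the Java API; it uses exactly the primitives named above: synchronized, wait, and notifyAll:

```java
// A single-slot buffer guarded by Java's monitor primitives.
public class Slot {
    private Integer value = null;   // null means the slot is empty

    public synchronized void put(int v) throws InterruptedException {
        while (value != null)
            wait();                 // slot full: release the lock and suspend
        value = v;
        notifyAll();                // wake any thread waiting in get()
    }

    public synchronized int get() throws InterruptedException {
        while (value == null)
            wait();                 // slot empty: release the lock and suspend
        int v = value;
        value = null;
        notifyAll();                // wake any thread waiting in put()
        return v;
    }
}
```

Note the standard idiom of calling wait inside a while loop: a woken thread must re-check the condition, because notifyAll may wake threads whose condition does not yet hold.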
2.3.3 Memory Management in Java
Unlike other high-level programming languages, such as C and C++, the Java system partially
relieves the programmer from concerns about memory management. In this section,
we give a brief introduction to the memory management strategies in Java.
In Java, each variable is a typed storage location. A variable in Java can contain a value of
primitive type or a reference to an object. No matter what the variable contains, it is bound
to two main attributes: its type and its storage class. The storage class is used to determine
the lifetime of a variable.
Local variables are declared and allocated within a block and are discarded on exit from
the block. Method parameters are also considered local variables. Static variables are
local to a class; they are allocated when the class is loaded and discarded
when the class is unloaded. Dynamic objects are instances of classes and arrays. They are
allocated by the new operator and their storage can be reclaimed by an automatic storage
management technique such as garbage collection. In some circumstances, resources
(e.g., an operating system graphics context) cannot be freed automatically by an
automatic storage manager; in such cases the finalize method in class Object should be overridden. After
an object has been finalized, the storage occupied by the object may be reclaimed immediately
and recycled for other uses.
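The finalize hook can be sketched as follows. The class and its "handle" field are hypothetical stand-ins for a real operating-system resource such as the graphics context mentioned above:

```java
// Sketch: a class wrapping a non-memory resource overrides finalize as
// a safety net, so the resource is released before the object's storage
// is reclaimed by the garbage collector.
public class GraphicsContext {
    private long handle = 42;          // stand-in for a native resource
    private boolean released = false;

    public void release() {            // explicit release is still preferred
        if (!released) { handle = 0; released = true; }
    }

    public boolean isReleased() { return released; }

    @Override
    protected void finalize() throws Throwable {
        try { release(); }             // runs before storage is reclaimed
        finally { super.finalize(); }
    }
}
```

Because the collector gives no guarantee about when (or whether) finalize runs, well-behaved code releases such resources explicitly and treats finalize only as a fallback.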
In Java, all references to allocated storage and all references to objects are
through symbolic handles. The Java memory manager keeps track of references to
objects. Java compiled code references memory via symbolic handles that are
resolved to real memory addresses at run time by the Java interpreter.
2.4 Summary
In this chapter, we have given a brief review of the basic concepts of distributed parallel
computing methodology and some features of Java. Distributed computing systems generally
fall into two categories: DSM systems and distributed object-oriented systems.
According to granularity, DSM systems can further be categorized as page-based DSM
systems and region-based DSM systems. We also gave a brief introduction to some communication
mechanisms used in existing distributed computing systems. These communication
mechanisms include: sockets, RPC and RMI. Among them, RMI is the most
popular one used in distributed object-oriented systems. Section 2.3 gave a brief description
of some of Java's features which are relevant to this thesis. These features are: platform
independence, multithreading, interpreted bytecode and memory management.
Chapter 3
Parallel Computing in Java
In the last chapter, we gave a brief overview of some basic concepts in distributed, object-oriented
parallel computing on networks of workstations as well as the Java system.
We can see that the mechanisms existing in the Java system provide some of the infrastructure
required for an object-oriented DSM system. For instance, from the application
programming perspective, Java contains some features that are suitable for building
shared-address-space parallel programs through its existing facilities for concurrency (particularly
multithreading and locks). However, in order to extend the Java model to
achieve parallelism across networks of distributed machines, extensions have to be made
to the Java runtime framework, and possibly to the Java API, to provide the illusion of a
shared address space across machines that do not share memory.
In this chapter we limit ourselves to discussing only the alternatives for extending the Java
system to support parallelism. The next chapter will present an actual implementation
of a parallel computing system that puts some of the alternatives described here into
practice. The goal of the alternatives presented in this chapter is that both existing Java
programs and new programs written to exploit parallelism can run on the same extended
Java system. The difference is that existing Java programs will run as stand-alone workstation
applications, whereas new programs using the extended API will run as parallel
applications.
Section 3.1 discusses the key obstacles and issues in developing a DSM system within the
Java framework. Section 3.2 describes alternatives for using Java's multithreading feature
to build object-oriented parallel programs, and ways in which the runtime system can
exploit this multithreading feature to create parallelism across networks of distributed
computers. Section 3.3 talks about alternatives for creating shared objects. Section 3.4
explores methods for maintaining memory consistency when developing a DSM system
within the Java framework. Section 3.5 summarizes this chapter.
3.1 Issues in Developing DSM Systems within Java
As discussed in the previous section, the existing Java system provides some of the mechanisms
needed for developing a parallel computing system. However, the Java system currently
does not support parallelism within its framework. Since Java is a purely object-oriented
language, a DSM framework within the Java system should also be object-oriented. However,
there are a number of issues in developing object-oriented DSM systems. In Chapter 2, we
gave an overview of some of these issues. With respect to the object-oriented paradigm,
the following issues exist:
Finding an object-based model for parallel computing that fits into the existing object-oriented
framework of Java.
Developing an appropriate object communication scheme that accommodates the issue
of accessing different address spaces in a distributed object-oriented programming
environment.
From the DSM system's perspective, the main concerns are:
Choosing the granularity at which data is shared (i.e., the size of the shared memory
unit).
Finding mechanisms for maintaining memory consistency and coherence. Recall that
memory consistency refers to how updates to shared data are reflected to the processors
sharing the same data in a DSM system. A coherence protocol indicates the way in
which memory consistency is enforced in a DSM system. The latter includes the strategies
for update detection and change propagation.
The goal of this thesis is to explore mechanisms for achieving parallelism by utilizing
existing mechanisms within the Java framework. Hence, our study, with respect to the
above issues, begins with Java's multithreading feature. As discussed earlier, Java already
supports multiple threads for concurrency. However, all threads within one Java program
are executed on the same machine. Dispatching the threads of the same Java program to
different available machines is the main concern in our study of parallelism within the
Java framework. As a result of dispatching these threads across different machines, we
must also consider the issues of finding an appropriate sharing unit and maintaining memory
consistency and coherence. In what follows, we present our studies of developing an
object-oriented DSM system within the Java framework.
3.2 Using Multithreading for Parallelism
The multithreading feature in Java allows users to write programs with multiple threads
of control. However, threads within a Java program are actually executed serially on a single
machine. In this section, we present our studies on running these Java threads on
different machines to create parallelism across networks of distributed computers.
3.2.1 The Creation of Thread Objects at the Java Language Level
In Java, threads are created and managed by classes called Thread and ThreadGroup. The
only way to create a new thread in Java is by creating a Thread object. There are two
approaches to creating a Java thread object. One of them is to implement the run method
of the class Thread when the new thread class extends the class Thread [Figure 3.1].
// Implementing a class DemoThread by extending class Thread
class DemoThread extends Thread {
    public void run() {
        // work of the thread
    }
}

// Creating and starting an instance of the class DemoThread
DemoThread T = new DemoThread();
T.start();
Figure 3.1: Creation of Thread Object by Extending Class Thread
In Figure 3.1, the user thread class DemoThread implements the run method of class
Thread. The instance of the DemoThread class, T, is started by invoking the start method
in class Thread (i.e., T.start()).
The other approach for creating a Thread object is to implement the Runnable interface.
Using the Runnable interface, a new thread is constructed from an instance of the user class that
implements the Runnable interface [Figure 3.2].
// Implementing a class Demo by implementing interface Runnable
class Demo implements Runnable {
    public void run() {
        // work of the thread
    }
}

// Creating and starting an instance of class Demo
Demo T = new Demo();
new Thread(T).start();
Figure 3.2: Creation of Thread Object by Implementing the Runnable Interface
In Figure 3.2, the class Demo implements the Runnable interface. The thread object is
constructed from an instance of the Demo class, and it is started by invoking the start method
in the Thread class (i.e., new Thread(T).start()).
In both of the above cases, the thread objects are started by invoking the start method in
class Thread, which in turn initiates the execution of the run method in the thread class.
3.2.2 Support for Threads at the Java Runtime System Level
At the program level, a thread is created by the new operator and started by invoking the
start() method in the Thread class. In the Java runtime system, the support for Java threads
is as follows. The invocation of the start() method in the Java program causes the following
method (written in C) in the Java runtime system to be executed:

    Hobject thread(TID, unsigned int, size_t, void *(*)())
In this case, TID is the handle to the instance of the Thread class. The value of the second
parameter indicates the type of the thread to be created: a value of "0" indicates that a system
thread is to be created; a value of "2" stands for a user thread. The third parameter
gives the actual size of the thread's C stack. The last parameter is a function with a return
type of void. In the case of a thread start, the last parameter should be the following function
(written in C):
    static void ThreadRT0(register Hjava_lang_Thread *p)

This function does the real work of thread creation.
The function ThreadRT0 sets up the execution environment and initializes the thread by calling
the method threadInit. After the initialization of the thread execution environment, the
thread is started by calling the following method (written in C):

    execute_java_dynamic_method(&ee, (void *)p, "run", "()V")
The first parameter is the execution environment variable; the second is the thread object
within which the run method is implemented; the third is the dynamic Java method run
which is going to be executed; and the last parameter is the signature of the run
method. A signature here describes the return type in Java, which can be a primitive type or a reference
type. In this case, it shows the return type of the run method. In Java, the return type of the
run method is void, which is indicated by "()V" in the above method.
3.2.3 Dispatching Threads to Different Machines in the Runtime System
In order to begin experimenting with parallel computing within Java, we start with a simple
execution model in which the only communication between parallel threads is the
parameters provided to each thread on start-up, and the collection of results from each
thread at the end of execution. The two issues, the creation of parallelism and the collection
of results, are dealt with separately below.
An intuitive approach utilizing Java's multithreading feature for creating parallelisrn in a
networked environment is to create ditferent threads in a single machine, Le., a host
machine, and then dispatch these threads with different parameters to different target
machines. M e r the computation is completed, the host machine collects the results fiom
different target machines. An alternative approach is to create different threads on differ-
ent target machines. In the latter case, the host machine packs the necessary information
for creating the threads and ships it to different target machines. The target machines then
use the information fiom the host machine to create the threads locally. Compared with
this approach, the first approach which ships the created threads is more complicated
because the base operating system on a host machine running a Java mdtithreaded appli-
cation knows about only one thtead, i.e., the Java Virtual Machine thread. Ail of the other
threads in a Java application have to be mapped onto host operating system threads for
parallelism- Therefore, in order to dispatch a thread created on one machine to other
machine, we have to have information about the thread on a particular host operating sys-
tem, such as, the name of the thread, the status of the threads, the handler of the thread, the
current execution context, etc-
To test the feasibility of the latter approach, we conducted an implementation as follows.
In this implementation, the host machine dispatches the name of the class which is to be
instantiated and the corresponding parameters to the different target machines. On each target
machine, there is a daemon process running. A daemon process is a background process
which provides services to other threads in the system. Once the daemon receives the
related information, it will create and execute the thread on the target machine.
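The daemon's central step, turning a received class name into a locally created, running thread, can be sketched with Java reflection. The class and method names below are our own illustrative assumptions, not the actual daemon code:

```java
// Sketch of the target-machine daemon's core step: given a class name
// received from the host machine, instantiate the class by reflection
// and run it as a thread. All names here are illustrative assumptions.
public class ThreadDaemonSketch {

    // What the daemon would do after reading a class name off the network.
    public static Thread launch(String className) throws Exception {
        Class<?> c = Class.forName(className);
        Thread t = (Thread) c.newInstance();  // the class must extend Thread
        t.start();
        return t;
    }

    // A stand-in for a thread class the host machine might name in a request.
    public static class Work extends Thread {
        public static volatile boolean ran = false;
        public void run() { ran = true; }
    }

    public static void main(String[] args) throws Exception {
        Thread t = launch("ThreadDaemonSketch$Work");
        t.join();
        System.out.println("ran = " + Work.ran);
    }
}
```

The network transport (sockets, parameter marshalling) is omitted; only the instantiate-and-run step is shown.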
Getting different thread objects to begin execution on different machines is the first step in
developing a parallel computing framework for Java. The next step is to collect the results.
In Java, the return type of the run method is void, which means that it does not return any
information after its execution. Therefore, we cannot use the original run method for this
purpose. Here, we conducted another implementation. In this implementation, we extended
the function of the original run method by introducing another method called run1, which
not only fulfills the function of the original run method but also can return result
types other than void. From the Java program's perspective, there is not much difference
between the methods run and run1. The only difference between these two methods is their
return types. The return type of the run method can only be void; however, the return type
of the run1 method can be any type.
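As a rough sketch of the run1 idea (the class below is our own illustration, not the thesis implementation), a thread can expose a run1 method that returns a value, while the ordinary run method, whose signature must remain ()V, simply records that value for later collection:

```java
// Sketch of the run1 idea: run1() does the work and returns a result,
// while run() keeps the mandatory ()V signature and just records the
// result for later collection. The class is our own illustration.
public class Run1Sketch extends Thread {
    private final int n;
    private long result;

    public Run1Sketch(int n) { this.n = n; }

    // Like run(), but with a return value: sums the integers 1..n.
    public long run1() {
        long sum = 0;
        for (int i = 1; i <= n; i++) sum += i;
        return sum;
    }

    public void run() {          // signature is still ()V
        result = run1();
    }

    public long getResult() { return result; }

    public static void main(String[] args) throws InterruptedException {
        Run1Sketch t = new Run1Sketch(100);
        t.start();
        t.join();
        System.out.println("result = " + t.getResult());   // 5050
    }
}
```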
The introduction of the run1 method solves the problem of collecting execution results
from different threads running on different machines. Therefore, it increases the possibil-
ity of using Java's multithreading feature to achieve parallelism. However, in a parallel
computing system, we need to not only collect the final results but also propagate the inter-
mediate updates to shared data. Unfortunately, the run1 method cannot propagate the
intermediate updates to shared data. In the next section, we introduce a class we call
SharedObject to collect the final results and to propagate the intermediate updates to
shared data.
3.3 Creating shared objects
Although DSM systems typically use pages as the unit of sharing, many object-oriented
DSM systems built specifically for C++ or other object-oriented languages use an
object as the sharing unit. Since Java is an object-oriented system, choosing the object as
the sharing unit seems appropriate. Furthermore, as discussed in chapter 2, object-based DSM
systems avoid most of the false sharing caused by the fixed-size granularity of page-based DSM
systems. In this section, we describe alternatives for creating distributed shared objects.
All the objects in Java, whether regular objects or thread objects, are created using the new
operator. Consequently, we explore extensions to this operator for creating distributed
shared objects.
3.3.1 The New Operator in Java
At the Java language level, the new operator creates either a new instance of a class or a
new array object. From the user's perspective, there is no difference between the creation of
these two kinds of objects. However, at the Java Virtual Machine level, class objects and
array objects are created and manipulated by different sets of instructions.
For instance, at the Java Virtual Machine level, the instruction for creating a new class
instance is new, whereas the instructions for creating new arrays are newarray, anewarray
and multianewarray.
newarray is used to create a one-dimensional array of primitive types. anewarray is used
to create an array of object references, as well as the first dimension of a multi-dimensional
array. For example, the statement
new Thread[7]
creates an array of references to thread objects, whereas
new int[6]
creates a one-dimensional array of integers.
At the Java language level, both of the above arrays are created using the same new
operator. However, at the Java Virtual Machine level, the first one is created by the
anewarray instruction and the second one by the newarray instruction.
The multianewarray instruction is used to create a multi-dimensional array of references
to array objects.
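This difference can be observed by compiling a small class and disassembling it with "javap -c"; the instructions noted in the comments below are what javac typically emits for each creation expression:

```java
// Two kinds of array creation that are identical at the language level
// but compile to different JVM instructions (visible with "javap -c").
public class ArrayDemo {

    public static Thread[] makeThreads() {
        return new Thread[7];    // anewarray java/lang/Thread
    }

    public static int[] makeInts() {
        return new int[6];       // newarray int
    }

    public static int[][] makeMatrix() {
        return new int[3][4];    // multianewarray [[I
    }

    public static void main(String[] args) {
        System.out.println(makeThreads().length + " " + makeInts().length
                + " " + makeMatrix().length);
    }
}
```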
3.3.2 Extending the Functionality of the new Operator
Our objective in extending the new operator is to create shared objects at the objects' cre-
ation time. Here, we start by studying two alternatives for extending the functionality of
the new operator. The first alternative is to extend the existing new operator in the Java
programming language. The second alternative is to introduce a shared_New oper-
ator to create shared objects.
Using the first alternative, the extended new operator appears the same to users.
However, the users have to add the keyword Shared before the operator new when they want
a created object to be shared [Figure 3.3]. Once the Java compiler comes across the key-
word Shared, it will mark the object to be created as shared. During execution on a
host machine, the interpreter in the Java Virtual Machine will create different copies of the
shared object on different target machines.
//Creating a shared object by using the class DemoThread in Figure 3.1
DemoThread T = Shared new DemoThread();
Figure 3.3: A Demo of Creating a Shared Object with Keyword Shared
The second alternative introduces another operator we call shared_New [Figure 3.4] to
create only shared objects, whereas the new operator is still used to create regular objects.
As with the first alternative, when the Java interpreter comes across the shared_New oper-
ator, it will create different copies of the shared object on different target machines.
//Creating a shared object by using the class DemoThread in Figure 3.1
DemoThread T = shared_New DemoThread();
Figure 3.4: A Demo of Creating a Shared Object with the shared_New Operator
As we can see, both of the above alternatives can be used to create shared objects. Both of
them require changes to the Java compiler due to the introduction of new keywords
(i.e., Shared in the first alternative and shared_New in the second alternative), and to the Java
interpreter due to the introduction of a new operator (i.e., shared_New in the second alterna-
tive) or the extension of the existing operator (i.e., Shared in the first alternative). However,
they have some differences. For instance, using the first alternative, we have to re-write
the existing implementation of the new operator so that it creates both regular objects and shared
objects. The new operator would then do two tasks, making it harder to understand and
maintain (this is related to the principle of uniqueness in programming language design).
Using the second alternative, however, we can add a single module to the Java interpreter
to implement the shared_New operator while keeping the existing implementation of the
new operator intact. In this way, if the Java system is updated, we will only need to update
the corresponding module instead of re-writing the implementation of the new operator.
Overall, both of the above alternatives require changes to the Java compiler, the Java
interpreter and the Java programming interface. This reduces the portability of the parallel
applications. Furthermore, if any component of the Java system is updated, our system
will have to be re-written accordingly. It would be better to have an approach which can
create shared objects without affecting the integrity of the Java system. Therefore, we
came up with a third alternative.
In the third alternative, we introduce a class we call SharedObject [Figure 3.5]. It is imple-
mented as an add-on to the existing Java runtime class library. Java programs can create
shared objects by extending the SharedObject class.
//Creating a shared object by using the class SharedObject
SharedObject sharedData = new SharedObject(...);
Figure 3.5: A Demo of Creating a Shared Object with the Class SharedObject
This approach extends both the Java runtime class library and the Java programming inter-
face. However, since this approach does not change the Java compiler or the Java inter-
preter, it will not affect the execution of a regular Java program on stand-alone
workstations. In chapter 4, we will give a detailed description of this approach as part of our
discussion on the implementation of Paralleljava.
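As a rough local-only sketch of this idea (the class bodies below are our own assumptions; the actual interface is described in chapter 4), a program subclasses SharedObject, and the update method marks the point at which changes would be propagated to the other machines:

```java
// A local-only sketch of the SharedObject approach: shared state lives in
// a subclass, and update() is the hook at which changes would be sent to
// the other machines (here it only runs locally). Names are assumptions.
public class SharedObjectSketch {

    public static class SharedObject {
        // In the real system this would propagate the update to every
        // machine holding a copy; locally it is a no-op hook.
        protected synchronized void update() { }
    }

    public static class SharedCounter extends SharedObject {
        private int value;

        public synchronized void add(int d) {
            value += d;
            update();    // propagation point
        }

        public synchronized int get() { return value; }
    }

    public static void main(String[] args) {
        SharedCounter c = new SharedCounter();
        c.add(5);
        c.add(7);
        System.out.println(c.get());   // 12
    }
}
```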
Overall, each of the above alternatives has its own strengths and limitations. Figure 3.6
is a comparison among them.

  Creating Shared Objects   Strength                             Limitation
  Keyword Shared            Uses the same new operator to        Changes to the Java compiler &
                            create both regular and shared       interpreter; reduced readability.
                            objects.
  Operator shared_New       Uses a separate operator to          Changes to the Java compiler &
                            create only shared objects;          interpreter; added complexity.
                            keeps the modularity of the
                            interpreter.
  Class SharedObject        Does not change the Java             Programmers have to learn a
                            compiler & interpreter.              new API.

Figure 3.6: Comparison of the Above Three Alternatives

3.4 Exploring Memory Consistency and Coherence
As discussed in chapter 2, besides granularity, the other two important issues in designing
a DSM system are memory consistency and coherence. Memory consistency refers to how
updates to shared memory are reflected to the processors in the system. A coherence pro-
tocol indicates how memory consistency is enforced in a DSM system. The most com-
monly used memory consistency models are sequential consistency, weak consistency,
release consistency and entry consistency. The most commonly used alternatives for
building coherence protocols include write-invalidate, write-update, eager and lazy. Dif-
ferent coherence protocols can be used in the implementation of different memory consis-
tency models, and different coherence protocols can also be used in the implementation of
a single memory consistency model.
Maintaining memory consistency and coherence requires mechanisms for update detec-
tion (determining when some shared data has been modified) and update propagation
(transmitting the modification of a shared data item to the other processors sharing the same data).
Update detection and propagation can be achieved through program-level annotations by
the programmer (such as those used in region-based DSM systems), through compiler
analysis, or through virtual memory page protection (such as that used in page-based
DSM systems) [12]. In what follows, we talk about some alternatives for update detection
and propagation in designing a DSM system within the Java framework.
3.4.1 Update Detection
We consider three different alternatives for update detection in this section. The first
approach is to extend the functionality of the store instructions in the Java Virtual
Machine. The second approach is to use virtual memory page protection mechanisms. The
third approach is to use information provided by the application.
Extending the Store Instructions
In this section, we talk about detecting updates to shared data through extensions to the
store instructions at the Java virtual machine level.
The store instructions in the Java virtual machine store values from the operand stack to
local variables. As in some other programming languages, such as C and C++, the stack in
Java holds local variables and intermediate results. In the Java virtual machine, most of
the arithmetic operations are performed on the operand stack. The results are then trans-
ferred to local variables by store instructions. There is one exception: the increment
operation (such as i++) is performed on the local variables directly through the JVM
instruction iinc.
At the Java virtual machine level, there are different store instructions for different data types. For
instance, the store instructions for an integer variable and a double variable are istore
[Figure 3.7] and dstore respectively.

//Java source code using the integer data type
void whileInt() {
    int i = 0;
    while (i < 100) {
        i++;
    }
}

//The corresponding JVM assembly code
Method void whileInt()
   0 iconst_0
   1 istore_1     //Store constant 0 to local variable 1 (i)
   2 goto 8
   5 iinc 1 1     //Local variable 1 plus 1
   8 iload_1      //Load local variable 1 onto operand stack
   9 bipush 100
  11 if_icmplt 5  //If local variable 1 is less than 100, goto 5
  14 return

Figure 3.7: An Example of the istore Instruction
As shown in Figure 3.7, the instruction istore is used to store the integer value "0" to local
variable "i".
From Figure 3.7, we can also see that the JVM names the variables within a Java program as
local variables 1 to n according to the sequence in which the variables appear in the pro-
gram. During the execution of the Java program, whenever a store happens, the interpreter
in the JVM will store the value on the operand stack to the local variable indicated by the
store instruction [Figure 3.8]. Therefore, in order to detect an update to a variable, the
interpreter has to be extended. In this way, when a store instruction is executed, the inter-
preter can tell the parallel computing system that an update to a variable has occurred.
//Implementation of the store operation in the interpreter
switch (opcode) {
    case opc_istore_##num:           //Opcode is istore
#ifdef Parallel                      //Extending the interpreter for parallelism
        //Signal that local variable ##num is being changed
#endif
        vars[num] = S_INFO(-1);      //Put the value on the stack into vars[num]
        SIZE_AND_STACK(1, -1);       //Change the value of the stack pointer
        ...
    case opc_dstore_##num:           //Opcode is dstore
        vars[num] = S_INFO(-2);      //Put the first word into vars[num]
        vars[num + 1] = S_INFO(-1);  //Put the second word into vars[num+1]
        SIZE_AND_STACK(1, -2);       //Change the value of the stack pointer
        ...
}
...

Figure 3.8: Storing an Integer and a Double Type to Local Variables in the Interpreter
As we can see in Figure 3.8, one can use extensions to the interpreter (e.g., the code between
#ifdef Parallel and #endif in the above figure) to indicate that an update to a variable is
happening. Through these extensions, one can also determine the sequence number of the
variable which is being updated. However, at the interpreter level, one cannot determine
the name of the variable that is being updated.
In a parallel program, variables may be shared or local. Ideally, the interpreter signals only
when updates to shared variables are being performed. However, the interpreter itself can-
not tell the difference between a shared variable and a variable that is not shared.
Therefore, it is likely that the Java compiler would need to be extended. During compile
time, the Java compiler would mark the variables which are shared. At execution
time, the interpreter would signal the parallel system if a marked variable (i.e., a shared
variable) is being written.
Virtual Memory Page Protection
Page-fault detection is often used in a page-based DSM system to detect a write to a mem-
ory page, in which a certain memory page shared among all the processors is protected.
When one processor modifies a shared memory page, all its copies on the other processors
are invalidated. Any write to an invalidated copy will cause a page fault to occur. In the
implementation of the fault handler, the system propagates the modification of that shared
page to all the other processors sharing the same page.
The virtual memory page protection mechanism can be used to detect updates to shared data
in a page-based DSM system based on Java. This is because, at the Java Virtual Machine
level, multithreading is supported by allowing multiple threads to independently exe-
cute Java code which operates on Java data and objects residing in a shared main memory.
Besides the shared main memory, each of the threads has its own working memory in which the
thread keeps its own working copies of the shared variables it must use. In the execution of
a Java program, a thread operates on its working copies of the shared variables. In order to
maintain the integrity of the shared variables inside the main memory, Java makes use of
monitors to allow only one thread at a time to execute a region of code protected by the
monitor.
To access a shared variable, a thread should obtain a lock first and flush its own working
memory, which guarantees that the shared values will be loaded from the shared main
memory into the thread's working memory. The unlocking of a lock by a thread guarantees that the
values held by the thread in its working memory are written back to the main memory.
From the above description, we can see that it is possible to use the virtual memory page pro-
tection mechanism to detect updates to shared data within a parallel computing system
based on Java. However, using virtual memory page protection raises the following
issues. First, it may reduce the portability of the parallel computing system, because the
virtual memory page size is decided by the operating system and different host machines
in a network can have different operating systems. Second, as we discussed before, page-
based DSM systems may cause false sharing; therefore, such a system would be less effi-
cient. Furthermore, Java is an object-oriented programming language, so it seems more
appropriate to use an object as the sharing unit. As discussed in chapter 2, object-based
DSM systems usually use entry consistency, which uses information provided by the appli-
cation to decide when write operations are performed on shared data. This leads to the
third approach.
Update Detection through Program Annotations
The third approach is very similar to entry consistency in that it first lets the programmer
identify the shared objects in the program, then choose the synchronization object and
bind the shared object to the synchronization object. The updates to shared data are
propagated only when the synchronization objects bound to that shared data are acquired.
The differences between this approach and entry consistency are as follows. First, while
entry consistency binds a lock to some shared data, our approach implements this by
instantiating the class SharedObject (discussed in the previous sections). Second, in our
approach, the acquire of the lock happens by calling the update method of the class SharedOb-
ject. Last, the release of the lock in our approach is implemented by the successful execu-
tion of the update method. In chapter 4, we will give a detailed description of this approach.
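A minimal sketch of this annotation style (all names below are illustrative, not the Paralleljava API): the call to update acts as the acquire, and its successful completion, which here simulates propagation to the sharers, acts as the release:

```java
// Sketch of annotation-based update detection: the call to update() is
// the acquire, and its successful completion (which propagates the new
// value) is the release. All names are illustrative, not the real API.
public class AnnotatedUpdateSketch {

    public static class SharedData {
        private int local;        // this machine's working copy
        private int published;    // stands in for the copies on other machines

        // Acquire happens on entry (the monitor lock), and the release is
        // the successful completion of update(), by which time the new
        // value has been propagated to the sharers (simulated here).
        public synchronized void update(int newValue) {
            local = newValue;
            published = local;    // propagation point
        }

        public synchronized int read() { return published; }
    }

    public static void main(String[] args) {
        SharedData d = new SharedData();
        d.update(99);
        System.out.println(d.read());   // 99
    }
}
```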
From the above discussion, we can see that the three approaches introduced in this sec-
tion have different strengths and limitations. We compare these in Figure 3.9.
  Update Detection            Strength                      Limitation
  Extending the store         No new API.                   Java compiler and interpreter
  instructions                                              extended.
  Virtual memory page         No new API and no             False sharing; operating
  protection                  compiler extension.           system dependent.
  Application-level           No compiler and               New API; more complicated for
  annotation                  interpreter extension.        the application programmer.

Figure 3.9: A Comparison of the Three Approaches for Update Detection
3.4.2 Update Propagation
As we described earlier, the task of update propagation is to transmit the modifications of
shared data to all the other machines sharing the same data. In this section, we talk about
some alternatives we considered for propagating updates to shared data. The first alterna-
tive is to extend the functionality of the return operator in Java. The second alternative is
to extend the existing Java synchronization mechanisms. The third alternative is
to use information provided by the application programs.
Extending the return Operator
Propagating the updates to shared data when a method returns is a relatively simple
approach. Some existing DSM systems, Mentat for instance, add a function such as RTF
to implement their data consistency. RTF stands for return-to-future. It is an analog of the
return function in C++. Unlike the return function, the value returned from RTF is for-
warded to all member function invocations that are data dependent on it, and to the caller
only if necessary.
In the Java system, the return operator causes an immediate exit from a method in a Java
program. The expression following the return operator, if any, is the result of the method.
In a normal completion of a method invocation, a value may be returned to the invoking
method.
At the Java Virtual Machine level, when a method is invoked, a new frame is created cor-
respondingly for it. A frame at the Java Virtual Machine level is used to store data and interme-
diate results, to perform dynamic linking, to return values for methods and to dispatch
exceptions. The new frame becomes current when its method takes control of the exe-
cution, and is discarded when the method returns to its caller [Figure 3.10]. There are
also different return instructions in the JVM corresponding to different data types. Figure
3.10 shows the part of the code from the interpreter which deals with method returns with
integer-type and double-type return values.
//The parts of the code dealing with the return operator in the JVM
switch (opcode) {
    case opc_ireturn:
        frame->prev->optop[0] = S_INFO(-1);
        frame->prev->optop++;
        ...
    case opc_dreturn:
        frame->prev->optop[0] = S_INFO(-2);
        frame->prev->optop[1] = S_INFO(-1);
        frame->prev->optop += 2;
        ...
}
...

Figure 3.10: The Interpreter Dealing with Method Return with Integer & Double Return Values
From Figure 3.10, we can see that when a method returns, the interpreter puts the return
values directly onto the operand stack of the previous frame (i.e., the caller frame). If we
want to propagate updates to shared data during the method return, we first must know whether
the return value is shared. However, the interpreter cannot know this just from the return
value itself. Therefore, the Java compiler would have to be extended to provide it with such infor-
mation. The limitation of this approach is that it cannot be used to propagate the interme-
diate results of the shared data.
Extending the Existing Synchronization Mechanisms in Java
In this section, we talk about how to propagate updates to shared data in a parallel comput-
ing environment through extension of the existing synchronization mechanisms in Java.
The discussion in this section is based on the assumption that updates to shared data are
detected by the extended store instructions described in the previous section. In the first part of this
section, we give a brief description of the existing synchronization mechanisms in Java. In
the second part of this section, we discuss how to propagate updates to shared data by
extending the existing Java synchronization mechanisms in the Java runtime system.
Multithreading in Java
Multithreading is one of the key features in Java. As discussed in chapter 2, each thread in
a multithreaded Java application owns its working memory, and all the threads share one
main memory. The working memory of a thread contains the working copy of a variable,
while the main memory contains the master copy of the variable. A thread can perform
any operation on its working copy of the variable. However, there are synchroniza-
tion mechanisms to follow when operating on the main memory.
Java uses monitors to synchronize the operation of multiple threads on main memory.
Monitors are a high-level synchronization mechanism. A monitor is like a critical section.
In Java, the keyword synchronized placed before the definition of a method indicates that
a thread must gain exclusive access to execute that method [Figure 3.11]. When a syn-
chronized method is invoked, it automatically performs a lock operation. When the syn-
chronized method finishes execution, an unlock operation is performed automatically on
that same lock.
//A sample Java program using synchronization
class SynchSample {
    int a = 1, b = 2, c;

    synchronized void synchW() {
        a = b;
    }

    synchronized void synchR() {
        c = a;
    }
}

Figure 3.11: A Sample Java Program using Synchronization
In Figure 3.11, the class SynchSample contains two synchronized methods: synchW
and synchR. The method synchW writes the variable a and the method synchR reads
the variable a.
Suppose there are two threads: the thread tSampleW and the thread tSampleR. The thread tSam-
pleW calls the method synchW and the thread tSampleR calls the method synchR. We also sup-
pose that the call from the thread tSampleW happens a bit earlier than the call from the thread
tSampleR. The execution flow of the two threads can then be as follows [Figure 3.12].
  tSampleW                      Main Memory                  tSampleR

  lock class SynchSample
  read b
  use b
  assign a
  write a
  unlock class SynchSample
                                                             lock class SynchSample
                                                             read a
                                                             use a
                                                             assign c
                                                             write c
                                                             unlock class SynchSample

Figure 3.12: The Possible Execution Flow using the Methods in Figure 3.11
From Figure 3.12, we can see that before the threads can execute the synchronized meth-
ods, they must gain exclusive access to the class SynchSample which implements the syn-
chronized methods. After finishing execution, they perform an unlock automatically so
that other threads can access the same methods. When the unlock operation is performed,
the copies of the variables in the thread's working memory are flushed into main memory.
The main memory therefore contains the final versions of the shared variables.
The implementation of the lock and the unlock operations in the Java runtime is through
two functions: monitorEnter and monitorExit.
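A runnable version of the scenario in Figures 3.11 and 3.12 (the thread bodies and the forced ordering are our own additions for illustration) might look like:

```java
// A runnable version of the SynchSample scenario: tSampleW writes a,
// tSampleR then reads it. The synchronized methods lock the SynchSample
// object, and monitor exit flushes the working copies to main memory.
public class SynchDemo {
    static class SynchSample {
        int a = 1, b = 2, c;
        synchronized void synchW() { a = b; }   // write a
        synchronized void synchR() { c = a; }   // read a
    }

    public static void main(String[] args) throws InterruptedException {
        final SynchSample s = new SynchSample();
        Thread tSampleW = new Thread() { public void run() { s.synchW(); } };
        Thread tSampleR = new Thread() { public void run() { s.synchR(); } };
        tSampleW.start();
        tSampleW.join();   // force the ordering assumed in Figure 3.12
        tSampleR.start();
        tSampleR.join();
        System.out.println("c = " + s.c);   // 2 under this ordering
    }
}
```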
Extending the Existing Java Synchronization Mechanism for Update Propagation
As discussed in the previous section, when a multithreaded program runs on a stand-alone
workstation, any updates to shared data are made consistent through the use of a monitor
(using the keyword synchronized). By entering a monitor, a thread gets exclusive access
to the shared data within that monitor. By exiting the monitor, the thread writes the updated
version of the shared data back to the main memory. In this way, shared data is guaranteed to
be consistent among the threads.
The existing synchronization mechanism in Java can be extended to a parallel computing
system under the assumption that the system can distinguish shared data from data that
is not shared. This is because a thread on a stand-alone workstation simply flushes the
contents of its working memory to main memory when exiting the monitor; in a parallel
computing system, only shared data needs to be propagated to other machines. The existing
synchronization mechanism in Java, which makes the data consistent only when exiting the
monitor, is very similar to release consistency in a parallel computing system. Recall
that release consistency makes updates to shared data visible to the other machines sharing the
same data when a critical section is exited. Release consistency is often implemented
using either the eager coherence protocol or the lazy coherence protocol. Using the eager
coherence protocol, a processor makes its updates to shared data visible to the other machines
sharing the same data when it exits from the critical section (i.e., performs the release
operation). Using the lazy coherence protocol, updates to shared data are made visible to
other machines only when another machine acquires the synchronization object (i.e.,
another machine performs the acquire operation). Therefore, the existing Java synchroniza-
tion mechanism (which makes shared data consistent among multiple threads only when a
thread exits the monitor) can be extended to release consistency in a parallel computing
system [Figure 3.13].
//Extending monitorExit for update propagation
void monitorExit(unsigned int key) {
    monitor_t *mid;
    int ret;
    ...
#ifdef Parallel
    //Propagate the updates to the other processors which
    //have a copy of the data associated with monitor key
#endif
}

Figure 3.13: Extending Function monitorExit for Update Propagation
In Figure 3.13, the part between #ifdef and #endif is the extension of the existing monitor-
Exit function for parallelism. The function monitorExit in Java is used to exit a monitor
and copies the shared data in the working memory of a thread to main memory. As we can see
from Figure 3.13, the extended monitorExit function broadcasts updates to shared data to the
other machines sharing the same data when a thread exits a monitor. This is an implemen-
tation of release consistency using an eager coherence protocol.
This approach does not need to change the Java compiler. However, it may need to extend
any monitor-related parts of the Java runtime system as well as the interpreter. It also rests on
the assumption that the store instructions in the JVM detect writes to shared data and that the
system can distinguish shared data from data that is not shared.
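The eager protocol sketched in Figure 3.13 can also be illustrated at the language level. The sketch below (all names are our own; real propagation would go over the network) buffers writes made inside a critical section and pushes them to every sharer's copy on release:

```java
import java.util.*;

// Sketch of eager release consistency: writes inside a critical section
// are buffered, and the release (monitor exit) pushes them to every
// machine sharing the data. All names here are illustrative assumptions.
public class EagerReleaseSketch {

    // Each "machine" holds its own copy of the shared data.
    public static class Machine {
        public Map<String, Integer> copy = new HashMap<String, Integer>();
    }

    public static List<Machine> sharers = new ArrayList<Machine>();
    static Map<String, Integer> pending = new HashMap<String, Integer>();

    // A write inside the critical section is buffered locally.
    public static void write(String key, int value) {
        pending.put(key, value);
    }

    // Eager protocol: the release pushes all buffered updates to every
    // machine sharing the data, then clears the buffer.
    public static void release() {
        for (Machine m : sharers) {
            m.copy.putAll(pending);
        }
        pending.clear();
    }

    public static void main(String[] args) {
        sharers.add(new Machine());
        sharers.add(new Machine());
        write("x", 42);
        release();
        System.out.println(sharers.get(1).copy.get("x"));   // 42
    }
}
```

Under a lazy protocol, release() would instead only record the updates, and each sharer would pull them at its next acquire.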
Update Propagation through Information Provided by the Application
As with update detection, we came up with another alternative for update propagation by
using the class SharedObject. Using this alternative, each processor first updates its own
copy of the shared data, then calls the update method of the class SharedObject to propa-
gate its update to the other processors sharing the same data.
In this alternative, the application itself has to provide the information about when
the update is going to be propagated (i.e., by calling the update method). It changes the
Java programming interface and extends the Java runtime class library. However, it does
not change the Java interpreter or the Java compiler. Furthermore, the extension will not
affect the execution of a regular Java program. In chapter 4, we will talk about this
approach in detail. Figure 3.14 is a comparison of the above approaches.
  Update Propagation          Strength                           Limitation
  Extending the return        No new API.                        Extending the Java compiler &
  operator                                                       interpreter; cannot propagate
                                                                 intermediate results.
  Extending the existing      No extension to the compiler       Extending the Java runtime &
  Java synchronization        & no new API; propagates any       interpreter; extending the store
  mechanisms                  update to shared data.             instructions for update detection.
  Through application         No extension to the compiler       New API; more complicated for
  information                 & interpreter; propagates any      the application programs.
                              update to shared data.

Figure 3.14: A Comparison of the Alternatives for Update Propagation
3.5 Summary
In this chapter, we analyzed and explored the possibilities of developing a parallel com-
puting model within the Java framework. Our exploration ranged from the Java applica-
tion level and the Java interpreter level to the Java runtime level.
From the above explorations, we realized that it is impossible for us to run unmodified
Java programs without any new class libraries being added. It is also hard for a small group
of people to develop a new programming environment by purely extending the Java Vir-
tual Machine while keeping its integrity, capabilities and performance competitive with those
of a standard system. On the other hand, purely using the class library approach can impose
unreasonable limitations on the user program and a heavy burden in dealing with
memory consistency.
Chapter 4
The Paralleljava System
In the last chapter, we gave an overview of our exploration of parallel computing within
the Java framework. We showed that there were various possibilities in developing a DSM
system within Java. Based on the exploration described in the previous chapter, in this
chapter we give an overview of the design and the partial implementation of the parallel
computing system within Java that we call Paralleljava.
Paralleljava is an object-oriented DSM system that is designed for parallel computing
within the Java system. It supports a coherence framework that is similar to entry consis-
tency and an update-based coherence protocol. Unlike other existing DSM systems, the
implementation of memory consistency in Paralleljava depends on neither the compiler
(as in DSM systems which use the compiler and runtime to detect and collect writes to shared
data) nor operating system page faults (as in typical virtual-memory-based DSM systems
which use operating system virtual memory page protection to detect and collect writes to
shared data). Memory consistency in Paralleljava is implemented by simply broadcasting
all data associated with a synchronization object during interprocessor synchronization.
In section 4.1, we give an overview of the Paralleljava system. Section 4.2 shows the sys-
tem architecture of the Paralleljava system. Section 4.3 describes our design and imple-
mentation of dynamic network class file loading within the Paralleljava system.
Section 4.4 describes the design and partial implementation of data consistency in the Par-
alleljava system. In section 4.5, we summarize this chapter.
4.1 The Overview of the Paralleljava System
There are two kinds of Java programs. The first, known as applets, run within a Java-com-
patible browser. The second are stand-alone Java programs known as Java applications. In
Paralleljava, we modify the Java Virtual Machine to allow user Java applications to run in
parallel on networks of workstations. When an execution request is sent to the Java Virtual
Machine, the Paralleljava system will search for locally available idle machines (we call
them servers) and dispatch the tasks to different servers. After finishing their work, the
servers will send the results back to the local machine. This is a kind of client/server
model, with the local machine acting as a client and the remote machines acting as com-
puting servers. The remote server is implemented as a daemon (i.e., a background process
which provides service for other threads in the system), waiting for new tasks to execute.
Once the server obtains a Java method from the client, it will execute this method on a
local Paralleljava Virtual Machine and then send the result back to the client. During the
execution, the client can communicate with any server to keep the data consistent. Figure
4.1 is an overview of the Paralleljava system.
[Diagram: the client machine, running the idlemachine daemon, connected to Server #1 daemon, Server #2 daemon, ..., Server #N daemon]

Figure 4.1: An Overview of the Paralleljava System
Within the client machine, there is a daemon called the idlemachine daemon, which is
used to search for idle machines in the local network and write the information about the
idle machines to a text file (e.g., server.inf in our implementation). When the user starts a
multi-threaded Java application on the local machine, the client looks up the available
machines within the file server.inf, dispatches the threads to different remote machines,
and then waits for the information from each of the servers. The client uses a round-robin
model to allocate the idle machines to the tasks.
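The round-robin allocation just described can be sketched in Java as follows. This is an illustrative sketch only, not the thesis's implementation; the ServerPool class name and its interface are our own.

```java
import java.util.List;

// Illustrative sketch of round-robin allocation of idle machines
// (as read from server.inf) to tasks: hosts are handed out in
// rotation, wrapping around at the end of the list.
class ServerPool {
    private final List<String> servers;  // idle machines from server.inf
    private int next = 0;                // next host to hand out

    ServerPool(List<String> servers) {
        this.servers = servers;
    }

    // Return the next idle machine, cycling through the list
    synchronized String allocate() {
        String host = servers.get(next);
        next = (next + 1) % servers.size();
        return host;
    }
}
```

With three idle machines, successive calls to allocate cycle through them in order, so tasks spread evenly across the servers.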
Like Java, Paralleljava is independent of both the operating system and the system archi-
tecture. It will work on any machine which runs the Paralleljava server daemon (written in
the C language). The Paralleljava server daemons are designed with portable interfaces so
that various subsystems can connect to them and receive services that are independent of
the hardware and software architecture.
In Paralleljava, the client sends the same version of the shared data to each remote server
machine. The remote servers then upload necessary class files and instantiate new Java
objects. The server daemon calls the run method of the uploaded class and starts a new
process. Each process on a different machine executes independently until it meets a syn-
chronization point. At the synchronization point, the processes will try to communicate
with the client machine and inform the client of their updates to the shared data. Server
machines communicate with the client in order to update the shared data. The updates
among the processors are taken care of by the shared objects on each machine. The shared
object is implemented by a group of Java class libraries. After the synchronization point,
each machine will have an up-to-date version of the shared data.
4.2 The System Architecture of the Paralleljava System
In Paralleljava, there are two kinds of communication between machines. One is at the
time of dynamic class file uploading/downloading, and the other is during data consis-
tency and synchronization actions (synchronization and data consistency happen simulta-
neously). In order to fully utilize Java's features and reduce communication overhead,
Paralleljava deals with these two kinds of communication in different ways. The commu-
nication during dynamic class file loading is implemented in the Java interpreter using the
C language. The communication during the data consistency process is implemented by
shared objects in the Java runtime system. In what follows, we talk about the system
architecture of the Paralleljava system.
4.2.1 The System Architecture of the Paralleljava System
In the Paralleljava system, both the client and the server consist of three layers: the Appli-
cation Layer, the Paralleljava Virtual Machine Layer and the Transportation Layer. Each
layer on the same machine is independent of the others and can be replaced by an alterna-
tive implementation without affecting the other layers. For example, the transportation
layer in the current version is implemented based on the TCP protocol (using Unix sock-
ets), but a transportation layer based on UDP or ATM can be used interchangeably. The
system architecture of the Paralleljava system is illustrated in Figure 4.2.
As shown in Figure 4.2, the processes in a Java program are dispatched from the Parallel-
java Virtual Machine down through the Transportation Layer on the client side, then up
through the server side Transportation Layer to the server Paralleljava Virtual Machine
Layer. The update propagation is done by the shared objects between the Application
Layers on both sides.
Once the Paralleljava Virtual Machine on the client side receives a request from a user
application, the Virtual Machine will analyze the request, wrap up the request and then
forward the wrapped request to the client side Transportation Layer. The Transportation
Layer then dispatches the requests to the server side Transportation Layer. The communi-
cation protocol between the two transportation layers is based on TCP/IP in the current
implementation. The Transportation Layer on the server side forwards the received
request to its Paralleljava Virtual Machine Layer. The Virtual Machine Layer unwraps the
requests, executes the application, and uploads class files from the client when necessary.
4.2.2 The Functionality of Each Layer in the Paralleljava System
Application Layer
The Application Layer consists of a group of class libraries which implement the synchroni-
zation mechanism in Paralleljava. It includes mechanisms for update collection and propa-
gation.
The synchronization mechanism in Paralleljava is implemented by shared objects. The
Application Layer on the client side maintains the shared object by allowing only one
server access to it at any time. It also collects updates from and propagates updates to serv-
ers. The Application Layer on the server side updates its own copy of the shared object
first and then propagates the update to the client.
The update propagation is implemented by a group of class libraries. The propagation
messages go only between the application layers on both sides.
The Application Layer on the client side is responsible for:
implementing and managing shared objects,
setting up and maintaining the connections with the Application Layer on the server
side,
collecting, marshaling (i.e., arranging the updated data for transmission) and propa-
gating the updates to servers.
The Application Layer on the server side is responsible for:
setting up and maintaining the connection with the Application Layer on the client
side,
initiating the update propagation,
marshaling the update.
Paralleljava Virtual Machine Layer
The Paralleljava Virtual Machine Layer extends the Java Virtual Machine by implement-
ing dynamic network class file loading as well as extending the run method in class
Thread. In Java, the Virtual Machine takes the bytecodes as input, translates them to
machine code and executes the machine code on the local machine. During the execution,
the Java Virtual Machine will dynamically load the necessary class files from the local
machine.
In Paralleljava, the Java system class files exist on any machine with the Java system
ported. However, the user class files may not be on some server machines when they are
invoked. In the implementation of the Paralleljava system, the Paralleljava Virtual
Machine on the server side sends those class file names to the client Virtual Machine. The
Virtual Machine on the client side then uploads the class files to the server.
The Paralleljava Virtual Machine Layer on the client side is responsible for:
parsing the user requests from the command line,
setting up and managing a list of available machines,
allocating different threads with different parameters to different servers,
dynamic network class file uploading.
The Paralleljava Virtual Machine Layer on the server side is responsible for:
receiving the initial class files and parameters from the client,
creating an object for the class file received,
setting the object to run,
sending requests to the client and waiting for incoming information.
Transportation Layer
In general, the Transportation Layer in Paralleljava is responsible for:
setting up the connections between the client and the different remote servers,
listening for incoming messages,
setting up a connection for an incoming message,
transferring the necessary information.
Besides these functions, the Transportation Layer on the server side also creates the dae-
mon and maintains it.
4.3 Dynamic Network Class File Loading and Security
4.3.1 Dynamic Class File Loading in Java
The existing Java system loads and runs class files from a local file system. For Parallel-
java, this is insufficient because, in Paralleljava, the class files used might not exist on the
other machines. Therefore, we had to extend the existing Java framework for dynamic
class file loading in order to accommodate distributed computation in Paralleljava. In this
section, we begin by explaining the dynamic class file loading mechanism within the
existing Java system and then describe how the Paralleljava extensions have been imple-
mented.
Unlike the edit-compile-link-run development pattern of other programming languages,
Java programming needs only edit, compile and run. The Java user program does not need
to be linked into a static executable file before running. Instead, during the execution of a
Java program, the Java Virtual Machine dynamically loads the compiled Java byte code, i.e., the class
files, from the local machine's disk. An abstract class ClassLoader in Java is
offered to define the policy for loading Java classes into the runtime environment. By
default, the runtime system loads classes originating as files by reading them from the
directory defined by the CLASSPATH environment variable.
A classloader is itself an object which is responsible for loading classes. Given the name
of a class, the classloader will try to locate or generate data that constitutes a definition for
the class. Java uses the strategy of transforming the name into a file name and then reading
a class file with that file name (i.e., name.class) from a file system. When executing
Java code needs to use a class that has not yet been loaded, the loadClass method (as shown in the
following figure) of the class ClassLoader is invoked to load the class containing
the desired data.
protected abstract Class loadClass(String name, boolean resolve)
    throws ClassNotFoundException

Figure 4.3: The loadClass Method in Java
In the above figure, name is the name of the class to be loaded; resolve is a flag which
indicates whether the symbolic references in this class should be resolved or not.
This class loading mechanism in Java loads class files only from a local file system. In
Paralleljava, the class files can be anywhere in the network. Therefore, a mechanism for
dynamic network class file loading is necessary in Paralleljava.
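The same idea can be expressed with Java's own ClassLoader API. The following sketch is illustrative only (the thesis's actual extension, described next, is written in C inside the interpreter); the ByteSource interface and the NetworkClassLoader name are our own, standing in for whatever mechanism fetches class bytes from the client.

```java
// Illustrative sketch: a ClassLoader that obtains class bytes
// from a remote source and defines the class locally.
interface ByteSource {
    byte[] fetch(String className);  // e.g., read the bytes over a socket
}

class NetworkClassLoader extends ClassLoader {
    private final ByteSource source;

    NetworkClassLoader(ByteSource source) {
        this.source = source;
    }

    // findClass is consulted only after the parent loader fails,
    // so system classes (java.*) are still loaded locally.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] b = source.fetch(name);
        if (b == null) {
            throw new ClassNotFoundException(name);
        }
        return defineClass(name, b, 0, b.length);
    }
}
```

Because loadClass delegates to the parent loader first, only classes the local system cannot supply ever reach the remote fetch, which matches the division between system and user class files described below.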
4.3.2 Dynamic Network Class File Loading in Paralleljava
There exist two kinds of class files in Java, i.e., the Java system class files and the user
defined class files. The Java system class files exist on any platform with the Java system
ported. However, the user class files may exist only on some of the machines within a dis-
tributed network computing environment. Thus, in Paralleljava, different class file loading
mechanisms should be developed to cater to the above two situations. In our implementa-
tion, we use the default class file loading mechanism provided by the Java system for sys-
tem class file loading. For the user defined class files in a Java program, we have
developed a network class file loading mechanism.
Although Paralleljava offers a homogeneous distributed network computing environment
with the Paralleljava system ported on all the platforms, the user class files still need to be
loaded from the network. The reason is that when the client dispatches different tasks to
different servers, it dispatches only the related information about the tasks to the servers
instead of the class files themselves. When a Paralleljava user program begins to execute,
there are no user defined class files on the server side. When the server runtime system
invokes a user class file which is not available and the runtime system cannot find a cor-
responding Java source file on the local disk from which the class file originates, the
runtime system will send requests to the client for the class file. The client then uploads
the corresponding class file to the server. The class file can be used by the server immedi-
ately without any changes.
The implementation of the dynamic network class file loading in Paralleljava [see Figure
4.4] is an extension of Java's machine dependent class importing code. It is written in C. It
uses sockets for communication between the client and the server.
/* Objective: to implement the dynamic network class file
 * loading (server side).
 * import_md is used by the Java interpreter to load a Java
 * source file from which a required Java class originates.
 * Before calling import_md, the source file is made sure to
 * exist on the local disk. In Paralleljava, only system
 * *.java files are on the server side. Therefore, in this
 * function, we use the file name to tell the difference
 * between a Java system class file and a user defined
 * class file. */
int import_md(char *name, char *hint)
{
    char **cpa;
    char filename[MAXLINE];

    if (name[0] == DIR_SEPARATOR) {
        return (int) LoadFile(name, ".", hint);
    }
#ifdef PARALLELJAVA                   /* Changes for Paralleljava begin */
    if ((strncmp(name, "java", 4)) != 0) {
        strcpy(filename, name);
        strcat(filename, ".class");
        NetLoadFile(newsockfd, filename);   /* Network file loading */
    }
#endif
    for (cpa = CLASSPATH(); *cpa; cpa++) {
        char path[200];
        sprintf(path, "%s%c%s." JAVAOBJEXT, *cpa, DIR_SEPARATOR, name);
        if (LoadFile(path, *cpa, hint)) {
            return 1;
        }
    }
    return (0);
}

Figure 4.4: The Implementation of Dynamic Network Class File Loading in Paralleljava
The program code in Figure 4.4 shows the implementation of the dynamic network class
file loading on the Paralleljava server side. On the client side, Paralleljava uses Java's
class file loading mechanism because all class files exist on the client side.
In Figure 4.4, the section of code between #ifdef PARALLELJAVA and #endif is the exten-
sion made to the Java class loader by Paralleljava. In Java, the module import_md [Figure
4.4] is used to load any Java source file which creates required class files. In Paralleljava,
only Java system source files exist on server machines. Therefore, by check-
ing the file name, Paralleljava can decide which kind of source file it needs to load and
whether it needs to load it from the network or from the local disk. In Figure 4.4, if it is a
user defined Java source file (the name cannot begin with java in this case), Paralleljava
calls the subroutine NetLoadFile(newsockfd, filename) [Figure 4.5] to load the corre-
sponding class file from the network. In the subroutine, the parameter newsockfd is the
socket file identifier, and filename is the class file name.
/* Module name: NetLoadFile
 * Objective: used on the server side by import_md to load
 * the user class file from the client. */
#ifdef PARALLELJAVA
NetLoadFile(int sockfde, char *fne)
/* The sockfde is an existing socket number.
 * The fne is the class file name to be loaded. */
{
    int n, nl, fd, sizes;
    char line[MAXLINE], fnl[MAXLINE];

    /* Send the class file name to the client */
    syscall(SYS_write, sockfde, fne, strlen(fne));
    /* Open a new file to be written */
    fd = open(fne, O_WRONLY | O_CREAT | O_TRUNC, 0744);
    /* Read the file size from the client */
    syscall(SYS_read, sockfde, &sizes, sizeof(sizes));
    /* Get the file contents from the client */
    nl = 0;
    while (nl < sizes) {
        n = syscall(SYS_read, sockfde, line, MAXLINE);
        if ((syscall(SYS_write, fd, line, n)) != n)
            break;
        nl = nl + n;
    }
    close(fd);
    return;
}
#endif

Figure 4.5: The NetLoadFile Subroutine
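The NetLoadFile exchange above amounts to a simple length-prefixed wire format: the receiver reads the file size, then exactly that many bytes. A hedged Java sketch of that format follows; the FileTransfer name and its methods are illustrative, not part of the Paralleljava code, and the 4-byte big-endian length is our own choice of encoding.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Illustrative sketch of the size-then-contents exchange used
// when a class file is shipped from the client to the server.
class FileTransfer {
    // Sender side: write a 4-byte length, then the contents
    static void send(OutputStream raw, byte[] contents) throws IOException {
        DataOutputStream out = new DataOutputStream(raw);
        out.writeInt(contents.length);
        out.write(contents);
        out.flush();
    }

    // Receiver side: read the length, then exactly that many bytes
    static byte[] receive(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        int size = in.readInt();
        byte[] buf = new byte[size];
        in.readFully(buf);  // loops internally until all bytes arrive
        return buf;
    }
}
```

Reading the length first lets the receiver loop until the whole file has arrived, which is what the while loop over nl and sizes does in the C version.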
4.3.3 Security
Security is one of the most important issues in developing a distributed network comput-
ing system. In Paralleljava, security issues can arise when the client ini-
tiates uploads of files to the server or when a server initiates a download of files from the
client. Some of the security issues in Paralleljava are addressed within the existing Java
framework. For example, when a server downloads files from the client, the security man-
ager in Java regulates access to sensitive functions (e.g., functions for updating memory)
and the class loader makes sure that loaded classes are subject to the security manager's
checking and adhere to the standard Java safety guarantees. Furthermore, if used properly,
there should be no resource abuses in the Paralleljava system, since the network
class file loader in Paralleljava is designed to load only necessary files from the client.
However, there are still some security issues in Paralleljava. For example, when the client
uploads files to the server, some of the following might happen:
an unauthorized user application makes use of the server,
a user application hogs resources on the server.
We have not implemented any solutions for these issues. Addressing them would
involve building additional security mechanisms, which may, however, affect other users try-
ing to execute code on the servers.
4.4 Implementation of Data Consistency in Paralleljava
Most existing software-based DSM systems use either the operating system's virtual mem-
ory page protection or compiler extensions to detect and collect writes to shared data.
The first method can lead to two problems. First, writes might have high overhead, since
page faults occur on every write to a protected page. The page probably needs to be writ-
ten many times to amortize the cost of the page fault. Second, using the fixed virtual memory
page size as the unit of coherency causes false sharing. Mechanisms for handling false
sharing might increase run-time overhead and might cause unnecessary data communica-
tion among workstations. The data consistency strategies used in DSM systems extending
the compiler might have some advantages over those used in page-based DSM systems, but
they require modifications to the compiler and they also induce runtime consistency mod-
ule overhead. In the implementation of Paralleljava, because we did not want to do page
granularity data sharing, and because modifying the compiler was beyond the scope of this
work, we adopted a solution [Figure 4.6] similar to entry consistency, which relies on pro-
gram level annotation to convey coherence related information.
In the entry consistency model, the shared data and code are put into critical sections
which are protected by specific synchronization objects. A processor's accesses to the
code and data in the critical sections are controlled by the synchronization objects. Shared
data become consistent at a processor only when the processor acquires a synchronization
object that protects the data. In Paralleljava, the shared data is implemented as the shared
object. Any update to the shared object is made by calling the update method within the class
SharedObject. The shared data is updated after a successful execution of the update
method.
[Diagram: Acquire_Lock, followed by the calling of the update method]

Figure 4.6: The Relationship between EC and Data Consistency in Paralleljava
4.4.1 Update Detection and Collection
Unlike the strategies typically used in existing page-based DSM systems and DSM sys-
tems extending compilers, Paralleljava requires neither compiler extensions nor virtual
memory page faults to ensure memory consistency. In Paralleljava, memory consistency is
implemented by simply broadcasting all data associated with a synchronization object
during interprocessor communication. Write detection is not necessary since Paralleljava
implements its memory consistency model through an update protocol. This approach is
simple and has no immediate write overhead. However, it will transfer unnecessary data
when synchronization objects guard large data objects that are sparsely written.
In order to reduce the amount of data being transferred, we used a twinning & diffing
algorithm which is similar to the existing implementation of page-based DSM systems [8].
Using this approach, for each shared object (an instance of class SharedObject in our imple-
mentation) on a processor, the processor keeps a second copy of the SharedObject. At
each synchronization point, the SharedObject bound to the synchronization object is com-
pared with its copy to determine which part has been modified. This approach avoids the
cost of write detection, but increases the storage requirements (every SharedObject must
be twinned on any processor which writes it) and the synchronization overhead of the
consistency mechanism (to diff unmodified data and maintain the twin). Moreover, this
approach still requires management of the update incarnations to ensure that a chain of
processor updates is correctly propagated. In the next section, we talk about the update
propagation among the processors.
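The twinning & diffing idea can be illustrated with a small Java sketch. This is our own simplified illustration over an int array (the thesis's SharedObject holds string data); the TwinnedArray name and its methods are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of twinning & diffing: keep a pristine twin
// of the shared data, and at a synchronization point compare the
// working copy against the twin so that only modified elements
// need to be propagated.
class TwinnedArray {
    private final int[] data;  // working copy the processor writes to
    private int[] twin;        // pristine copy kept for diffing

    TwinnedArray(int[] initial) {
        data = initial.clone();
        twin = initial.clone();
    }

    int[] data() { return data; }

    // At a synchronization point: collect {index, newValue} pairs
    // for the modified elements, then refresh the twin.
    List<int[]> diffAndRetwin() {
        List<int[]> diff = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            if (data[i] != twin[i]) {
                diff.add(new int[] { i, data[i] });
            }
        }
        twin = data.clone();
        return diff;
    }
}
```

The diff contains only the elements actually written since the last synchronization point, which is exactly the data that needs to cross the network; the costs are the extra copy and the comparison pass over unmodified data, as noted above.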
4.4.2 Update Propagation
An advantage of the update-based protocol is that interprocessor communication is only
necessary during the acquisition of synchronization objects. By updating only at synchro-
nization points and only between the synchronizing processors, updates to the shared data
guarded by a synchronization object may be coalesced and transmitted to a processor all at
once. Furthermore, by ensuring that updates are performed only when a processor enters a
critical section, unexpected delays in a critical section caused by cache misses cannot
occur.
In Paralleljava, any operation on the shared object on the server happens only in its own
memory. The client maintains its copy of a shared object as a shared memory for all the
memory on the servers. At any synchronization point, updates to a shared object on any
server are reflected to the client. This is implemented by acquiring the lock guarding the
shared object on the client (see the figures in section 4.4.3). The client keeps a queue of serv-
ers acquiring the lock. At any time, there is only one server holding the lock. The server
with the lock submits its updated information to the client, and the client then updates its
own shared object and broadcasts the update to all the servers.
4.4.3 The Shared Object in Paralleljava
In Java, there exist two kinds of object models. One is the instance of a class; the other is
the array object. The shared object model in Paralleljava is based on the concept of Java
objects. It is implemented as a class with each of its elements implemented as a one-
dimensional string array. Figure 4.7 highlights some of the implementation of the shared
object class (class SharedObject). The shared object class in our implementation is actu-
ally a multi-line string buffer which holds the shared data. Any instance of such a class
creates a shared object. In the Paralleljava distributed network computing environment,
both the client and the server have copies of the shared object. In the implementation of
the shared object, each server reports any changes it made to the client by calling the
update method in the SharedObject class. The client then broadcasts the changes to all the
other servers.
// Module name: SharedObject
// Objective: to create a shared object in Paralleljava
import java.io.*;
import net.*;

class SharedObject {
    private String valueR;    // Value for the string return
    private int count;        // Value for the string storage
    private int column;       // Column of the original array
    private boolean shared;   // Sharing flag

    // If it is a client process, then create a Lock
    void SharedObject() {
        if (IsClient(me)) {
            Lock instLock = new Lock();
            ...
        }
    }

    // Copy the buffer when the buffer is shared
    protected void talkWhenShared(int i) {
        if (shared) {
            ...
        }
    }

    // Update method for the server to update shared data
    // and broadcast its update
    public synchronized void upDate(int arr[][]) {
        getLock();    // Get the lock from the client
        ...
        strParse(str);
        talkWhenShared(row);
    }

    // Needed by server and client when talking with each other
    void setShared() { shared = true; }

    ...
}    // End of the SharedObject class

Figure 4.7: The SharedObject Class
Any user program that instantiates the SharedObject class and calls the setShared method
creates a shared object. In Paralleljava, the client dispatches the user class which imple-
ments the shared object with different arguments to different servers. The shared object,
therefore, is created on both the client and the servers. In our current implementation of
Paralleljava, the data consistency between client and servers is maintained by calling the
update method in class SharedObject. Any update to the shared object must be reflected to
the client. The client then broadcasts the changes to all the servers [see Figure 4.8].
[Diagram: client P0 holding the shared object, connected to servers P1 to Pn, each with its own copy of the shared object.
(1) P1 makes an update to the shared object and then calls the update method to inform the client P0 of its update to the shared object.
(2) Client P0 broadcasts the update to the other servers (P2 to Pn).]

Figure 4.8: The Data Consistency Model inside the Shared Object
The advantage of this implementation is that any updates to the shared object occur only
in the server's local address space at the beginning, which reduces the overhead caused by mul-
tiple accesses to the shared object residing on the client. Figure 4.9 and Figure 4.10 are
examples of the client and server which use SharedObject.
// Module name: test2client
// Objective: Sample client code using a shared object to
// implement array multiplication.
// First, create a connection with the server.
// Second, create an instance of the class SharedObject
// to implement the distributed array multiplication.
import net.*;
import java.io.*;

class test2client extends NetworkClient {
    public static void main(String args[]) {
        // Create a client, here on host "tiger", port 8000
        NetworkClient client1 = new NetworkClient("tiger", 8000);
        DataOutputStream out =
            new DataOutputStream(client1.serverOutput);
        DataInputStream in =
            new DataInputStream(client1.serverInput);
        int[][] A = new int[2][2];
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 2; j++)
                A[i][j] = i + j;
        SharedObject sharedArray = new SharedObject(A, 2, 2);
        sharedArray.print();
        A[0][0] = 4;
        A[0][1] = 5;
        sharedArray.upDate(A, 0, 2);
        sharedArray.print();
    }
}

Figure 4.9: Sample Client Code
// Module name: test2server
// Objective: Sample server code using SharedObject to
// implement array multiplication.
import net.*;
import java.io.*;

class test2server extends NetworkServer {
    public static void main(String args[]) {
        NetworkServer server1 = new NetworkServer();
        server1.startServer(8000);
        int[][] A = new int[2][2];
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 2; j++)
                A[i][j] = i + j;
        SharedObject sharedArray = new SharedObject(A, 2, 2);
        sharedArray.print();
        A[0][0] = 4;
        A[0][1] = 5;
        sharedArray.upDate(A, 0, 2);
        sharedArray.print();
    }

    final public void run() {
        if (isServer) {
            // Try to connect with the client. If it is not
            // successful, then create a new socket and rebind
            // with the client.
            ...
        }
    }
}

Figure 4.10: Sample Server Code
In the above examples, we assume only one server and one client, so we did not use any
synchronization mechanism besides the synchronization mechanism from Java. However,
in a multi-server environment, each server calls the update method explicitly to inform the
client of its changes to the shared data, so it needs more synchronization mechanisms than
the synchronization mechanism from Java. We have designed and partly implemented a
synchronization mechanism for the SharedObject class. We have implemented a Queue
class [see Figure 4.11] on the client to line up the servers that called the update method,
and implemented a Lock class [see Figure 4.12] to guard access to the shared data on
the client. More detailed implementations of the Queue class and the Lock class are shown
in Appendix A and Appendix B respectively.
// Module name: Queue
// Objective: Class Queue is used to create and maintain a
// queue on the client, which lines up all the requests of
// servers who want to update the shared object on the client.
// The queue is basically a string array with each element
// as a string. Each string in the queue consists of 3
// categories: serverPid, portNumber, string (real infor.)
class Queue {
    private int length;      // The length of the queue
    private int index;       // Index to the Queue element
    private String[] queue;  // Elements of the Queue

    // Construct a queue with a given length
    public Queue(int len) {
        ...
    }

    // Append to the queue
    public void Append(String str) {
        ...
    }

    // Remove an element from the queue
    public String Remove() {
        ...
    }

    // Parse the element in the queue; each category is
    // separated by a space
    public String[] Parse(String str) {
        ...
    }
}

Figure 4.11: The Queue Class
// Module name: Lock
// Objective: Class Lock is used to guard the shared object
// on the client. In the design of the lock, a queue is
// linked to the lock. Servers who want to update the shared
// data on the client form a queue.
class Lock {
    private int value;    // The value for this lock
    private Queue queue;  // A waiting queue for this lock
    boolean held;         // Bool value to check if the lock is held

    // Construct a lock from a given value
    public Lock(int val) {
        ...
    }

    // Destructor for the lock; we don't need to worry about the
    // release of the string, Java will take care of that itself
    public void unLock(int val) {
        ...
    }

    // Acquire a lock
    void Acquire(int n) {
        ...
    }

    // Release a lock
    void Release() {
        ...
    }
}

Figure 4.12: The Lock Class
4.5 Summary
In this chapter, we have given an overview of the design and partial implementation of the
Paralleljava system. The current implementation of the Paralleljava system both extends
the Java runtime system and adds new class libraries. The dynamic network
class file loading is implemented in the runtime system, whereas the shared object is
implemented by a group of class libraries. Data consistency in Paralleljava is implemented
by the class SharedObject. The SharedObject is different from RMI (Remote
Method Invocation) in the following aspects:
Figure 4.13: SharedObject vs. RMI

SharedObject                               RMI
No stub                                    A stub is needed
Shared and can even be used within RMI     No sharing
Chapter 5
Conclusions and Future Work
5.1 Conclusions
The goal of this thesis was to explore the potential for implementing parallelism by using
existing features in Java.
Our exploration began by looking at Java's multithreading feature. Java supports multiple
threads of control, but threads are created and executed on a single machine. In order to
implement parallelism, different threads in a Java program should be created either on dif-
ferent machines or be created on one machine and then dispatched to different machines.
In our exploration, we have successfully executed different threads in the user's Java
program on different machines. The results of our exploration show that it is easy to create
threads on different machines and then execute them. However, it is more difficult to
create threads on one machine and then dispatch them to different machines. The reason is
that dispatching threads to different machines requires thread-related information, such as
the thread name, the thread execution context, etc.
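The single-machine baseline we started from is easy to state in code. The following sketch (the class and task names are our own illustration, not part of Paralleljava) creates two threads and runs them inside one JVM; the point of Paralleljava is to run such threads on different machines instead:

```java
//A minimal illustration of Java's local multithreading: both
//threads are created and executed inside a single JVM on a
//single machine. Names and workloads are illustrative only.
public class LocalThreads {
    static int[] results = new int[2];

    public static void main(String[] args) throws InterruptedException {
        Thread t0 = new Thread(new Runnable() {
            public void run() { results[0] = 21; } //work for thread 0
        });
        Thread t1 = new Thread(new Runnable() {
            public void run() { results[1] = 21; } //work for thread 1
        });
        t0.start();
        t1.start();
        t0.join(); //wait for both threads to finish
        t1.join();
        System.out.println(results[0] + results[1]); //prints 42
    }
}
```

Dispatching such a thread to another machine would additionally require shipping its name, execution context, and similar state, which is what makes that route harder.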
After experimenting with Java's multithreading feature, we explored several alternatives
for data consistency, which include update detection and update propagation. The
alternatives for update detection include: extending the functionality of the store instruction
in the Java Virtual Machine; using virtual memory page protection mechanisms; and
using information provided by the application programs. The alternatives for update
propagation include: extending the functionality of the return operator in Java; extending the
existing synchronization mechanisms in Java; and utilizing the information provided by
the application programs.
We have also studied some alternatives for creating distributed shared objects. Since all
the objects in Java are created using the new operator, our studies are based on extending
this operator. Our studies include introducing a keyword shared, an operator shared-New,
or a class SharedObject. The results of our studies show that the third alternative,
introducing a class SharedObject, is more suitable for developing a parallel computing
system. The reason is that by extending the class library, the resulting parallel computing
system will not have to be updated when the Java system itself is changed. Both of the first two
alternatives require changes to the Java compiler and the Java interpreter.
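To illustrate why the class-library alternative needs no compiler support, a SharedObject can be written as an ordinary class compiled by the unmodified Java compiler. The fields and update logic below are a simplified sketch of the idea, not the actual Paralleljava implementation:

```java
//A simplified sketch of the class-library approach: SharedObject
//is a plain Java class, so neither the compiler nor the
//interpreter needs changes. The fields and the body of update
//are illustrative only; in Paralleljava, update is where the
//copy on the client would be refreshed over the network.
public class SharedObject {
    private String data; //the locally cached copy of the shared data

    public SharedObject(String initial) {
        data = initial;
    }

    //Propagate a new value to the shared object
    public synchronized void update(String newValue) {
        data = newValue;
    }

    //Read the current value of the shared object
    public synchronized String read() {
        return data;
    }

    public static void main(String[] args) {
        SharedObject obj = new SharedObject("old");
        obj.update("new");
        System.out.println(obj.read()); //prints new
    }
}
```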
We have also successfully implemented dynamic network class file loading within Java.
Java supports dynamic class file loading, but all the class files are on the local disk. In a
distributed, parallel computing environment, a user class file can be anywhere in the
network. Therefore, developing a dynamic network class file loading mechanism is very
important.
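The core of such a mechanism is a subclass of java.lang.ClassLoader whose findClass obtains class bytes from the network instead of the local disk. The sketch below shows only the shape of the idea: the names are illustrative, and the fetch step is a placeholder rather than a real socket or URL read.

```java
//A stripped-down sketch of network class file loading. A real
//implementation would replace fetchClassBytes with code that
//reads the class file over the network; here it is a placeholder
//that always fails, so only parent delegation succeeds.
public class NetworkClassLoader extends ClassLoader {
    protected Class findClass(String name) throws ClassNotFoundException {
        byte[] bytes = fetchClassBytes(name);
        if (bytes == null)
            throw new ClassNotFoundException(name);
        //turn the raw class file bytes into a Class object
        return defineClass(name, bytes, 0, bytes.length);
    }

    //Placeholder for the network fetch step (illustrative only)
    private byte[] fetchClassBytes(String name) {
        return null;
    }

    public static void main(String[] args) throws Exception {
        NetworkClassLoader loader = new NetworkClassLoader();
        //system classes are still resolved through the parent loader
        Class c = loader.loadClass("java.lang.String");
        System.out.println(c.getName()); //prints java.lang.String
    }
}
```

Note that loadClass first delegates to the parent loader, so system classes are never fetched over the network; only classes the parent cannot find reach findClass.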
Based on our exploration, we conclude that it is possible to implement parallelism within
the Java framework. We have also described in this thesis the design and partial
implementation of an experimental parallel computing system that we call Paralleljava. The
Paralleljava system makes extensions to both the Java API and the Java runtime system. It uses a
data consistency model similar to entry consistency and implements an update-based
protocol.
Using Paralleljava, users can create distributed Java programs. Shared data in Paralleljava
are managed by the class SharedObject. Any instantiation of the class SharedObject
creates a shared object. Data consistency between shared objects is maintained by calling the
update method of the class SharedObject.
5.2 Future Research
In this thesis, we have explored various possibilities for developing distributed, parallel
computing systems using features existing in the Java system. We have implemented
dynamic network class file loading, and we have designed and partially implemented a
DSM system which we call Paralleljava.
However, a number of issues still remain to be addressed. They include:
Security: we have highlighted some of the potential security issues with Paralleljava in
this thesis, such as an unauthorized user application making use of a server and a user
application hogging resources on the server. For the first issue, one possible solution is
to run a daemon on the server to identify the client machines so that unauthorized
machines cannot connect to that server. A possible solution for the second issue is to
set a limit on the resources (such as memory and CPU time) each machine can use,
so that each client application can use only a certain amount of CPU time and memory.
Building another version of Paralleljava with data consistency implemented through
extending the existing Java synchronization mechanisms and the Java compiler. In the
current version of Paralleljava, data consistency is enforced through information
provided by application programs (e.g., the update method in class SharedObject).
However, in our studies in chapter 3, we have shown that it is possible to implement data
consistency without application program annotation. For instance, the store instructions
in the Java virtual machine can be extended for update detection, and the existing Java
synchronization mechanisms can be extended for update propagation.
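The first of the security measures listed above, a daemon that identifies client machines against a fixed allowlist, can be sketched as follows (the host names and the lookup policy are our own illustration, not part of Paralleljava):

```java
import java.util.HashSet;
import java.util.Set;

//Sketch of the core check of an identification daemon: only
//clients whose host names appear in an allowlist may connect.
//The host names and the policy here are illustrative only.
public class ClientCheck {
    private Set allowed = new HashSet();

    public ClientCheck(String[] hosts) {
        for (int i = 0; i < hosts.length; i++)
            allowed.add(hosts[i]); //register each authorized host
    }

    //Called by the server daemon before accepting a connection
    public boolean isAuthorized(String clientHost) {
        return allowed.contains(clientHost);
    }

    public static void main(String[] args) {
        ClientCheck check = new ClientCheck(
            new String[] { "client1.example.edu", "client2.example.edu" });
        System.out.println(check.isAuthorized("client1.example.edu")); //prints true
        System.out.println(check.isAuthorized("intruder.example.com")); //prints false
    }
}
```

A resource limit for the second issue could be enforced at the same point, by recording per-client CPU and memory budgets alongside the allowlist.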
Appendix A The Queue Class
//Module name: Queue
//Objective: Class Queue is used to create and maintain a
//queue on the client, which lines up all the requests of servers
//who want to update the shared object on the client.
//The queue is basically a string array with each
//element as a string. Each string in the queue consists
//of 3 categories:
//serverPid, portNumber, string (real infor.)
class Queue {
    private int length;      //The number of elements in the queue
    private int index;       //Index of the next element to remove
    private String[] queue;  //Elements of the Queue

    //Construct a queue with a given capacity
    public Queue(int len) {
        queue = new String[len];
        for (int i = 0; i < len; i++)
            queue[i] = new String();
        length = 0;
        index = 0;
    }

    //Append to the queue
    public void Append(String str) {
        queue[length] = str;
        length += 1;
    }

    //Remove an element from the queue
    public String Remove() {
        String str = queue[index];
        index += 1;
        return str;
    }

    //Parse an element of the queue; each category is separated by
    //a space
    public String[] Parse(String str) {
        java.util.StringTokenizer st = new java.util.StringTokenizer(str, " ");
        String[] parts = new String[st.countTokens()];
        for (int i = 0; i < parts.length; i++)
            parts[i] = st.nextToken();
        return parts;
    }
}
Appendix B The Lock Class
//Module name: Lock
//Objective: Class Lock is used to guard the shared object
//on the client.
//In the design of the lock, a queue is linked
//to the lock. Servers who want to update the shared
//data on the client form a queue.
class Lock {
    private int value;   //The value for this lock
    private Queue queue; //A waiting queue for this lock
    boolean held;        //Bool value to check if the lock is held

    //Construct a lock from a given value
    public Lock(int val) {
        value = val;
        held = false;
        queue = new Queue(100);
    }

    //Destructor for the lock; we don't need to worry about the
    //release of the string, Java will take care of that itself
    public void unLock(int val) {
        held = false;
    }

    //Acquire a lock
    void Acquire(int n) {
        //insert this request into the waiting queue
        queue.Append(Integer.toString(n));
        while (held) {
            Thread.yield(); //busy-wait until the lock is released
        }
        held = true;
    }

    //Release a lock
    void Release() {
        //remove a request from the queue
        queue.Remove();
        held = false;
    }
}
Bibliography
[1] W. Richard Stevens. Unix Network Programming. PTR Prentice Hall, Englewood
Cliffs, New Jersey 07632.

[2] Tim Lindholm, Frank Yellin. The Java Virtual Machine Specification. Addison-Wesley,
1996.

[3] James Gosling, Bill Joy, and Guy Steele. The Java Language Specification. Addison-Wesley,
1996.

[4] Ann Wollrath, Roger Riggs, and Jim Waldo. A Distributed Object Model for the Java
System. Computing Systems 9(1), pages 265-290, 1996.

[5] Roger Riggs, Jim Waldo, Ann Wollrath. Pickling State in the Java System. The 2nd
USENIX Conference on Object-Oriented Technologies, 1996.

[6] TreadMarks Documentation. http://www.cs.rice.edu/~willy/TreadMarks/overview.html.

[7] Ken Arnold, James Gosling. The Java Programming Language. Addison-Wesley,
1996.

[8] Peter Keleher, et al. TreadMarks: Distributed Shared Memory on Standard Workstations
and Operating Systems. In Proceedings of the Winter 94 Usenix Conference,
pages 115-131, January 1994.

[9] Brian N. Bershad, et al. The Midway Distributed Shared Memory System. In
Proceedings of the '93 CompCon Conference, pages 528-537, February 1993.

[10] A. Silberschatz, J. Peterson, et al. Operating System Concepts. Addison Wesley
Publishing Company, 1991.

[11] Gary Cornell, Cay S. Horstmann. Core Java. Prentice Hall PTR,
December 1998.

[12] Harjinder Sandhu. Shared Regions: A Strategy for Efficient Cache Management in
Shared Memory Multiprocessors. Ph.D. thesis, University of Toronto, July 1995.

[13] PVM: Parallel Virtual Machine. http://www.netlib.org/pvm3/book/node1.html.

[13] Paul S. Wang. C++ with Object-Oriented Programming. PWS Publishing Company,
1994.

[14] Jim Waldo, Geoff Wyant, Ann Wollrath, and Sam Kendall. A Note on Distributed
Computing. Sun Microsystems Laboratories Technical Report, SMLI TR-94-29,
November 1994.

[15] Robert Orfali, Dan Harkey, Jeri Edwards. The Essential Distributed Objects Survival
Guide. John Wiley & Sons, Inc., September 1995.

[16] Gregory V. Wilson and Paul Lu. Parallel Programming Using C++. MIT Press,
1996.

[17] Cristiana Amza, et al. TreadMarks: Shared Memory Computing on Networks of
Workstations. IEEE Computer, Vol. 29, No. 2, pages 18-28, February 1996.

[18] Eshrat Arjomandi, William O'Farrell, et al. ABC++: Concurrency by inheritance in
C++. IBM Systems Journal, Vol. 34, No. 1, pages 120-137, January 1995.

[19] John K. Bennett, et al. Munin: Distributed Shared Memory Based on Type-Specific
Memory Coherence. In Proceedings of the 1990 Conference on the Principles and
Practice of Parallel Programming, March 1990.

[20] Peter Keleher, Alan L. Cox, et al. Lazy Release Consistency for Software Distributed
Shared Memory. In Proceedings of the 19th Annual International Symposium on
Computer Architecture, pages 13-21, May 1992.

[21] John B. Carter, John K. Bennett, et al. Implementation and Performance of Munin. In
Proceedings of the 13th ACM Symposium on Operating Systems Principles, pages
152-164, October 1991.

[22] Alan L. Cox, Sandhya Dwarkadas, et al. An Integrated Approach to Distributed
Shared Memory. First International Workshop on Parallel Processing, December
1994.

[23] Sandhya Dwarkadas, Peter Keleher, et al. Evaluation of Release Consistent Software
Distributed Shared Memory on Emerging Network Technology. ISCA '93, pages 144-155,
May 1993.

[24] M. J. Zekauskas, W. A. Sawdon, B. N. Bershad. Software Write Detection for a
Distributed Shared Memory. OSDI, pages 87-100, Nov. 1994.

[25] R. K. Karne. Object-oriented Computer Architectures for New Generation of
Applications. Computer Architecture News, Vol. 23, No. 5, Dec. 1995.

[27] Introduction to Distributed Shared Memory. http://cne.gmu.edu/modules/dsm/index.html

[28] G. Hilderink, J. Broenink, et al. Communicating Java Threads. In Proceedings of the
20th World Occam and Transputer User Group Technical Meeting, pages 48-76,
1997.

[29] D. Thompson, D. Watkins. Comparisons between CORBA and DCOM: Architectures
for Distributed Computing. http://www.sd.monash.edu.au/research/publications/1997/ABSTRACTS.html#P97-1

[30] Distributed Shared Memory Home Pages. http://www.cs.umd.edu/~keleher/dsm.html