Elimination of Java Array Bounds Checks in the Presence of

CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCEConcurrency Computat.: Pract. Exper. 2004; 16:1 Prepared using cpeauth.cls [Version: 2002/09/19 v2.02]

Elimination of Java ArrayBounds Checks in thePresence of Indirection

Mikel Lujan1,∗ John R. Gurd1, T. L. Freeman1 and Jose Miguel2

1 Centre for Novel Computing, University of ManchesterOxford Road, Manchester M13 9PL, United Kingdom.{mlujan, jgurd, lfreeman}@cs.man.ac.uk2 Dep. de Arquitectura y Tecnologia de Computadores,UPV-EHUP.O. Box 649, 20080 Donostia-San Sebastian, [email protected]

SUMMARY

The Java language specification states that every access to an array needs to be within thebounds of that array; i.e. between 0 and array length −1. Different techniques for differentprogramming languages have been proposed to eliminate explicit bounds checks. Someof these techniques are implemented in off-the-shelf Java Virtual Machines (JVMs). Theunderlying principle of these techniques is that bounds checks can be removed when aJVM/compiler has enough information to guarantee that a sequence of accesses (e.g.inside a for-loop) is safe (within the bounds).

Most of the techniques for the elimination of array bounds checks have been developedfor programming languages that do not support multi-threading and/or enable dynamicclass loading. These two characteristics make most of these techniques unsuitable forJava. Techniques developed specifically for Java have not addressed the elimination ofarray bounds checks in the presence of indirection, that is, when the index is stored inanother array (indirection array).

With the objective of optimising applications with array indirection, this paperproposes and evaluates three implementation strategies, each implemented as a Javaclass. The classes provide the functionality of Java arrays of type int so that objects ofthe classes can be used instead of indirection arrays. Each strategy enables JVMs, whenexamining only one of these classes at a time, to obtain enough information to removearray bounds checks.

Note: This is a preprint of an article accepted for publication in Concurrency andComputation: Practice & Experience Copyright c©2003 John Wiley & Sons, Ltd. Thevolume, issue and pages are not known at the moment.

∗Correspondence to: M. Lujan, Centre for Novel Computing, University of Manchester, Oxford Road,Manchester M13 9PL, United Kingdom.Contract/grant sponsor: ML acknowledges the support of a postdoctoral fellowship from the Department ofEducation, Universities and Research of the Basque Government.JM acknowledges the support of the Spanish MCYT, contract TIC2001-0591-C02-02.

Copyright c© 2004 John Wiley & Sons, Ltd.

2 LUJAN ET AL.

key words: Array bounds check, Java, array indirection

1. Introduction

The Java language was designed so as to avoid the most common errors made bysoftware developers. Arguably, these are memory leaks, array indices out-of-bounds and typeinconsistencies. These language features are the basic pillars that make Java an attractivelanguage for software development. On these basic pillars Java builds a rich API for distributed(network) and graphical programming, and has built-in threads. Underneath these, Javaprovides a virtual machine which, together with the language specification, realises theportability of programs – “write once, run anywhere”.

The cost of these enhanced language features is performance. The first generation of JavaVirtual Machines (JVMs) were bytecode interpreters that focused on functionality, not onperformance. Much of the reputation of Java as a slow language comes from early performancebenchmarks with these immature JVMs.

Nowadays, JVMs are in their third generation and are characterised by (1) mixed executionof interpreted bytecodes and machine code generated just-in-time; (2) profiling of applicationexecution; (3) selective application of compiler transformations to time-consuming parts ofthe application – “hot spots”; and (4) deoptimisation of parts of the application when theanalysis that allowed their optimisation is not longer valid. The alternative approach of staticcompilers (i.e. compilers of Java or bytecodes which generate machine code ahead of execution)until recently was incompatible with the language definition, specifically the dynamic loadingof classes at run-time. TowerJ† was the first Java static compiler which passed the Javacompatibility test. It generates machine code for the parts of a given program which areaccessible at compilation. It also includes in the generated machine code a JVM which candynamically load other classes available only at run-time for the given program.

The performance of modern JVMs is increasing steadily and getting closer to that of C andC++ compilers especially on commodity processors and operating systems. See, for example,the Scimark benchmark‡ and the Java Grande Forum (JGF) Benchmark Suite§ [9]. Despitethe remarkable improvement of JVMs, some standard compiler transformations (e.g. loopreordering transformations) are still not included, and some overheads intrinsic to the languagespecification remain.

The run time overhead of array bounds checks is intrinsic to the language since Java’sspecification requires that:

(1) all array accesses are checked at run time; and

†TowerJ web site http://www.towerj.com‡Web page http://math.nist.gov/scimark2.§Web page http://www.epcc.ed.ac.uk/javagrande.

Copyright c© 2004 John Wiley & Sons, Ltd. Concurrency Computat.: Pract. Exper. 2004; 16:1–1Prepared using cpeauth.cls

ELIMINATION OF ARRAY BOUNDS CHECKS IN THE PRESENCE OF INDIRECTION 3

(2) any access outside the bounds (index less than zero or greater than or equal to thelength) of an array throws an ArrayIndexOutOfBoundsException.

The specific case under consideration is the elimination of array bounds checks in thepresence of indirection. For example consider the array accesses foo[ indices[6] ], whereindices and foo are both one-dimensional arrays of type int. The array foo is accessedthrough the indirection array indices. Two checks are performed at run time. The firstone tests the access to indices and is out of the scope of this paper. Readers interested intechniques implemented for Java in JVMs/compilers for eliminating checks of this kind ofarray accesses should consult [33, 42, 13, 12, 2, 5, 39]. The second check tests the access tofoo and this is the kind of array bounds check that is the subject of this paper.

When an array bound check is eliminated from a Java program, two things are accomplishedfrom a performance point of view. The first, direct, reward is the elimination of the check itself(at least two integer comparisons). The second, indirect, reward is the possibility for othercompiler transformations. Due to the strict exception model specified by Java, instructionscapable of causing exceptions are inhibitors of compiler transformations. When an exceptionarises in a Java program the user-visible state of the program has to look as if everypreceding instruction has been executed and no succeeding instructions have been executed.This exception model prevents, in general, instruction reordering.

The following section motivates the work and defines the problem to be solved. Section3 describes related work. Section 4 presents the three implementation strategies that enableJVMs to examine only one class at a time to decide whether the array bounds checks canbe removed. The solutions are described as classes to be included in a multi-dimensionalarray package – the Multiarray package proposed in Java Specification Request (JSR)-083¶.The strategies were generated in the process of understanding the performance delivered byour Java library OoLaLa [29, 30, 28], independently of the JSR. Section 5 evaluates theperformance gained by eliminating array bounds checks in a scientific computing kernel witharray indirection; it also determines the overhead generated by introducing classes instead ofindirection arrays. Section 6 summarises the advantages and disadvantages of the strategies.Conclusions and future work are presented in Section 7.

The JGF, a group of researchers in High-Performance Computing, was constituted to explorethe potential benefits that Java can bring to their research areas. From this perspective theforum has observed Java’s limitations and noted solutions [22, 23, 7, 44]. This paper extendsthe work of the JGF, specifically, the work of IBM’s Ninja group [33, 34, 35, 36, 37] in multi-dimensional arrays and array bounds checks. To the best of the authors’ knowledge, this isthe first work in the elimination of array bounds checks in the presence of indirection forprogramming languages with dynamic code loading and built-in threads.

¶JSR-083 Multiarray package web site http://jcp.org/jsr/detail/083.jsp


4 LUJAN ET AL.

public class ExampleSparseBLAS {// y = A*x + ypublic static void mvmCOO ( int indx[], int jndx[],

double value[], double y[], double x[] ) {for (int k = 0; k < value.length; k++)

y[ indx[k] ] += value[k] * x[ jndx[k] ];}

}

Figure 1. Sparse matrix-vector multiplication using coordinate storage format.

2. Definition of the Problem

Array indirection is ubiquitous in implementations of sparse matrix operations. A matrix isconsidered sparse when it contains an order of magnitude fewer nonzero elements than zeroelements. This kind of matrix arises frequently in Computational Science and Engineering(CS&E) applications where physical phenomena are modelled by differential equations. Thecombination of differential equations and state-of-the-art solution techniques produces sparsematrices.

Efficient storage formats (data structures) for sparse matrices rely on storing only thenonzero elements in arrays. For example, the coordinate (COO) storage format is a datastructure for a sparse matrix that consists of three arrays of the same size; two of type int(indx and jndx) and the other (value) of any floating point data type. Given an index k,the elements in the k-th position in indx and jndx represent the row and column indices,respectively. The element at the k-th position in the array value is the corresponding matrixelement. Those elements that are not stored have value zero. Figure 1 presents the kernel of asparse matrix-vector multiplication where the matrix is stored in COO format (mvmCOO). Thisfigure illustrates an example of array indirection and the kernel described is commonly usedin the iterative solution of systems of linear equations. “It has been estimated that about 75%of all CS&E applications require the solution of a system of linear equations at one stage oranother” [24].

Array indirection can also occur in the implementation of irregular sections of multi-dimensional arrays [34] and in the solution of non-sparse systems of linear equations withpivoting [16, 21].

Consider the set of statements in Figure 2. The statement with comment Access A can beexecuted without difficulty. On the other hand the statement with comment Access B wouldthrow an ArrayIndexOutOfBoundsException exception because it tries to access position -4in the array foo.

The difficulty of Java, compared with other main stream programming languages, is thatseveral threads can be running in parallel and more than one can access the array indx. Thus,



int indx[] = {1, -4};double foo[] = {3.14159, 2.71828, 9.8};... foo[ indx[0] ] ... // Access A... foo[ indx[1] ] ... // Access B

Figure 2. Example of array indirection.

it is possible for the elements of indx to be modified before the statements with the commentsare executed. Even if a JVM could check all the classes loaded to make sure that no otherthread could access indx, new classes could be loaded and invalidate such analysis.

3. Related Work

Related work can be divided into (a) techniques to eliminate array bounds checks, (b) escapeanalysis, (c) field analysis and related field analysis; and (d) IBM’s Ninja compiler and multi-dimensional array package (the precursor of the present work). A survey of other techniquesto improve the performance of Java applications can be found in [25].

Elimination of Array Bounds Checks – To the best of the authors’ knowledge, none ofthe published techniques, per se, and existing compilers/JVMs can:

• optimise array bounds checks in the presence of indirection;• are suitable for adaptive just-in-time compilation;• support multi-threading; and• support dynamic class loading.

Techniques based on theorem provers [43, 38, 47] are too heavy-weight. Algorithms based ona data-flow-style have been published extensively [31, 19, 20, 26, 3, 10], but only for languageswithout multi-threading.

Bodik, Gupta and Sarkar [5] developed the ABCD (Array Bounds Check on Demand)algorithm for Java. It is designed to fit the time constraints of analysing the program andapplying the transformation at run-time. ABCD targets business-style applications and usesa sparse representation (known as inequality graph) of the program which does not includeinformation about multi-threading. Thus, the algorithm, although the most interesting for thepurposes of this paper, cannot handle indirection since the origin of the problem is multi-threading. The strategies proposed in Section 4 can provide cheaply that extra information toenable ABCD to eliminate checks in the presence of indirection.

Escape Analysis – In 1999, four papers were published in the same conference describingescape analysis algorithms and the possibilities of optimising Java programs by the optimalallocation of objects (heap vs. stack) and the removal of synchronization [11, 6, 4, 45]. Escapeanalysis tries to determine whether an object that has been created by a method, a thread,


6 LUJAN ET AL.

or another object escapes the scope of its creator. Escape means that another object can geta reference to the object and, thus, make it live (not garbage collected) beyond the executionof the method, the thread, or the object that created it. The three strategies presented inSection 4 require a simple escape analysis. The strategies can only provide information (classinvariants) if every instance variable does not escape the object. Otherwise, a different threadcan get access to instance variables and update them, possibly breaking the desired classinvariant.

Field Analysis and Related Field Analysis – Both field analysis [15] and relatedfield analysis [1] are techniques that look at one class at a time and try to extract as muchinformation as possible about the fields (or instance variables) relying on the visibility rulesand statements of their class. This information can be used, for example, for the resolution ofmethod calls or object inlining. Related field analysis looks for relations among the instancevariables of objects. Aggarwal and Randall [1] demonstrate how related field analysis can beused to eliminate array bounds checks for a class following the iterator pattern [14]. This workuses field analysis in order to generate class invariants which, then, can be used to eliminatearray bounds checks in the presence of indirection. The actual algorithms to test the relationsand generate the class invariants have not been used in previous work on field analysis. Thedemonstration of eliminating array bounds checks given in [1] cannot be applied in the presenceof indirection.

IBM’s Ninja project and multi-dimensional array package – IBM’s Ninja group[33, 34, 35, 36, 37] has focused on optimising Java numerical intensive applications basedon arrays. Midkiff, Moreira and Snir [33] developed a loop versioning algorithm so thatiteration spaces of nested loops can be partitioned into safe regions for which it can be provedthat no exceptions (null checks and array bound checks) and no synchronisations can occur.Having found these exception-free and syncronisation-free regions, traditional loop reorderingtransformations can be applied without violating the strict exception model.

The Ninja group designed and implemented a multi-dimensional array package [34] (toreplace Java arrays) for which the discovery of safe regions becomes easier. To eliminatethe overhead introduced by using classes, they have developed semantic expansion‖ [46].Semantic expansion is a compilation strategy by which selected classes are treated as languageprimitives by the compiler. In the prototype Ninja static compiler, they successfully implementthe elimination of array bounds checks, together with semantic expansion for their multi-dimensional array package and other loop reordering transformations.

The Ninja compiler is not compatible with the specification of Java since it does not supportdynamic class loading. The semantic expansion technique ensures only that computationsthat directly use the multi-dimensional array package do not suffer overheard. Although thecompiler is not compatible with Java, this does not mean that the techniques they havedeveloped could not be incorporated into a JVM. These techniques would be especiallyattractive for quasi-static compilers [41].

‖Since first publishing this work [46], the Ninja group has decided to change their original term, “semanticinlining”, to “semantic expansion” [37].



This paper extends the Ninja group’s work by tackling the problem of eliminating arraybounds checks in the presence of indirection. The strategies, described in Section 4, generateclasses that are incorporated into a multi-dimensional array package; for example the proposedpackage in the JSR-083. If accepted, this JSR will define the standard Java multi-dimensionalarray package and it is a direct consequence of the Ninja group’s work.

4. Strategies

The strategies that are described in this section have a common goal: to produce a class forwhich JVMs can discover the pertinent invariants simply by examining the class and, thereby,derive information to eliminate array bounds checks. Each class provides at least the samefunctionality as Java arrays of type int and, thus, can substitute for them. Objects of theseclasses would be used instead of indirection arrays. The three strategies generate three differentclasses that naturally fit into the multi-dimensional array package. Figure 3 presents a subsetof the public methods that the three classes implement; the three classes extend the abstractclass IntIndirectionMultiarray1D.

Part of the class invariant that JVMs should discover is common to all three strategies; thevalues returned by the methods getMin and getMax are always lower bounds and upper bounds,respectively, of the elements stored. This common part of the invariant can be computed, forexample, using the algorithm for constraint propagation proposed in the ABCD algorithm [5],suitable for just-in-time dynamic compilation.

The reflection mechanism provided by Java can interfere with the three strategies. Forexample, a program using reflection can access instance variables and methods without knowingthe names. Even private instance variables and methods can be accessed! In this way aprogram can read from or write to instance variables and, thereby, violate the visibility rules.The circumstances under which this can happen depend on the security policy in-place foreach JVM. In order to avoid this interference, hereafter the security policy is assumed to:

1. have a security manager in-place;2. not allow programs to create a new security manager or replace the existing one (i.e.

permissions setSecurityManager and createSecurityManager are not granted; seejava.lang.RuntimePermission);

3. not allow programs to change the current security policy (i.e. permissions getPolicy,setPolicy, insertProvider, removeProvider, write to security policy files are notgranted; see java.security.SecurityPermission and java.io.FilePermission); and

4. not allow programs to gain access to private instance variables and methods (i.e.permission suppressAccessChecks are not granted; see java.lang.reflect.Reflect-Permission).

JVMs can test these assumptions in linear time by invoking specific methods in thejava.lang.SecurityManager and java.security.AccessController objects at start-up.These assumptions do not imply any loss of generality since CS&E applications do not requirereflection for accessing private instance variables or methods. In addition, the described


8 LUJAN ET AL.

package multiarray;

MultiarrayUncheckedException MultiarrayIndexOutOfBoundsException

MultiarrayRunTimeException

RunTimeException

Multiarray

Multiarray1D Multiarray2D Multiarray<rank>D

BooleanMultiarray<rank>D

IntMultiarray<rank>D

<type>Multiarray<rank>D

public abstract class IntMultiarray1D extends Multiarray1D{public abstract int get ( int i );public abstract void set ( int i, int value );public abstract int length ();

}

public abstract class IntIndirectionMultiarray1Dextends IntMultiarray1D {

public abstract int getMin ();public abstract int getMax ();

}

Figure 3. Public interface for classes that substitute Java arrays of int and constitute a multi-dimensional array package, multiarray.



security policy assumptions represent good security management practice for general purposeJava programs. For a more authoritative and detailed description of Java security, see [17].

The first strategy is the simplest. Given that the problem arises from parallel executionof multiple threads, a trivial situation occurs when no thread can write in the indirectionarray. In other words, part of the problem disappears when the indirection array is immutable:ImmutableIntMultiarray1D class.

The second strategy uses the synchronization mechanisms defined in Java. The objectsof this class, namely MutableImmutableStateIntMultiarray1D, are in either of two states.The default state is the mutable state and it allows writes and reads. The other state is theimmutable state which allows only reads. This second strategy can be thought of as a way tosimplify the general case to the trivial case proposed in the first strategy.

The third and final strategy takes a different approach and does not seek immutability.Only an index that is outside the bounds of an array can generate an ArrayIndexOutOf-BoundsException: i.e. JVMs need to include explicit bounds checks. The number of threadssimultaneously accessing (writing/reading) an indirection array is irrelevant as long as everyelement in the indirection array is within the bounds of the arrays accessed through indirection.The third class, ValueBoundedIntMultiarray1D, enforces that every element stored in anobject of this class is within the range of zero to a given parameter. The parameter must begreater than or equal to zero, cannot be modified and is passed in to the constructors.

4.1. ImmutableIntMultiarray1D

The methods of a given class can be divided into constructors, queries and commands [32].Constructors are those methods of a class that, once executed (without anomalies), create anew object of that class. Moreover, in Java, a constructor must have the same name as itsclass. Queries are those methods that return information about the state (instance variables)of an object. These methods do not modify the state of any object, and can depend on anexpression of several instance variables. Commands are those methods that change the state(modify the instance variables) of an object.

The class ImmutableIntMultiarray1D follows the simple idea of making its objectsimmutable. Consider the abstract class IntIndirectionMultiarray1D (see Figure 3), themethods get, length, getMin and getMax are query methods. The method set is a commandmethod. Since the class is abstract, it does not declare constructors.

Figure 4 presents a simplified implementation of the class ImmutableIntMultiarray1D.In order to make ImmutableIntMultiarray1D objects immutable, the command methods areimplemented simply by throwing a MultiarrayUncheckedException. By definition, the querymethods do not modify any instance variable. The instance variables (array, length, minand max) are declared as final (which means they cannot be modified once they have beeninitialised) and every instance variable is initialised by each constructor.

Note that the only statements (bytecodes in the JVMs) that write to the instancevariables occur in constructors and that the instance variables are private and final.These two conditions are almost enough to derive that every object is immutable. Howeverthey do not guarantee that a method of another class cannot modify an object of classImmutableIntMultiarray1D. For this purpose, it is also necessary that the instance variables


10 LUJAN ET AL.

public final class ImmutableIntMultiarray1D extendsIntIndirectionMultiarray1D {

private final int array[];private final int length;private final int min;private final int max;

public ImmutableIntMultiarray1D ( int values[] ) {int temp, auxMin, auxMax;auxMin = 0;auxMax = 0;length = values.length;array = new int [length];for (int i = 0; i < length; i++){

temp = values[i]; array[i] = temp; // element-by-element copyif ( auxMin > temp ) auxMin = temp;if ( auxMax < temp ) auxMax = temp;

}max = auxMax; min = auxMin;

}public int get ( int i ) {

if ( i >= 0 && i < length ) return array[i];else throw new MultiarrayIndexOutOfBoundsException();

}public void set ( int i , int value ) {

throw new MultiarrayUncheckedException();}public int length () { return length; }public int getMin () { return min; }public int getMax () { return max; }

}

Figure 4. Simplified implementation of class ImmutableIntMultiarray1D.

do not escape the scope of the class (i.e. escape analysis — see Section 3 — is required). Thisis ensured when

• these instance variables are created by a method of the class ImmutableIntMultiarray-1D;

• they are all declared as private; and• none of the methods of class ImmutableIntMultiarray1D returns a reference to any

instance variable.

The life span of these instance variables therefore cannot exceed that of its creator object.In contrast the instance variable array would escape the scope of the class if a constructor

were to be implemented as shown in Figure 5. It would also escape the scope of the class if the



public final class ImmutableIntMultiarray1D extendsIntIndirectionMultiarray1D {

...public ImmutableIntMultiarray1D ( int values[] ) {

int temp, auxMin, auxMax;length = values.length;array = values; // both reference the same arrayfor (int i = 0; i < length; i++){

temp = values[i];if ( auxMin > temp ) auxMin = temp;if ( auxMax < temp ) auxMax = temp;

}max = auxMax;min = auxMin;

}...

public Object getArray () { return array; }...}

Figure 5. Methods that enable the instance variable array to escape the scope of the classImmutableIntMultiarray1D.

non-private method getArray (see Figure 5) were included in the class. In both cases, anynumber of threads can get a reference to the instance variable array and modify its contents(as shown in Figure 6).

Consider an algorithm which checks that:

• only bytecodes in constructors write to any instance variable;• every instance variable is private and final; and• any instance variable whose type is not primitive does not escape the class.

Such an algorithm can:

• determine whether any class produces immutable objects; and• it is of complexity O(#b × #iv), where #b is the number of bytecodes and #iv is the

number of instance variables.

Hence, the algorithm is suitable for just-in-time compilers. Further, once the JSR-083becomes the Java standard multi-dimensional array package, JVMs can treat the classImmutableIntMultiarray1D as a special case and produce the invariant without doing anycheck.

The algorithm presented above is restrictive in the sense that ensures immutability for thewhole life of those objects of a given class. A less restrictive approach to immutability and itsapplications to program optimisation are described in [40].


12 LUJAN ET AL.

public class ExampleBreakImmutability {public static void main ( String args[] ) {

int fib[] = {1, 1, 2, 3, 5, 8};ImmutableIntMultiarray1D indx = new

ImmutableIntMultiarray1D( fib );fib[0] = -1;System.out.println( indx.get( 0 ) ); // Output: -1int aux[] = (int[]) indx.getArray();aux[0] = -8;System.out.println( indx.get( 0 ) ); // Output: -8

}}

Figure 6. An example program modifies the contents of the instance variable array using the methodand the constructor implemented in Figure 5.

The constructor provided in the class ImmutableIntMultiarray1D is inefficient in termsof memory requirements. This constructor implies that, at some point during execution, aprogram would consume double the necessary memory space in order to hold the elementsof an indirection array. This constructor is included mainly for the sake of clarity. A morecomplete implementation of this class will provide other constructors that read the elementsfrom files or input streams.

The implementation of the method get includes a specific test for the parameter i beforeaccessing the instance variable array. This test ensures that accesses to array are not out ofbounds. Tests of this kind are implemented in every method set and get in the subsequentclasses. For clarity, these tests are omitted in the implementations of classes shown hereafter.

4.2. MutableImmutableStateIntMultiarray1D

Figure 7 presents a simplified and non-optimised implementation of the second strategy. Theidea behind this strategy is to ensure that objects of class MutableImmutableStateInt-Multiarray1D can be in only one of two states:

Mutable state – Default state. The elements stored in an object of the class can be modifiedand read (at the user’s own risk).

Immutable state – The elements stored in an object of the class can be read but notmodified.

The strategy relies on the synchronization mechanisms provided by Java to implement theclass. Every object in Java has associated with it a lock. The execution of a synchronized methodor synchronized block is a critical section. Given an object with a synchronized method, beforeany thread can start execution of the method it must first acquire the lock of that object.



Upon return from the method the thread releases the lock. The same applies to synchronizedblocks. At any point in time at most one thread can be executing a synchronized method ora synchronized block for the given object.

The Java syntax and the standard Java API do not provide the concept of acquiring andreleasing an object’s lock. Thus, a Java application does not contain special keywords nordoes it invoke a method of the standard Java API to access the lock of an object. Theseconcepts are part of the specification for execution of Java applications. Further details aboutmulti-threading in Java can be found in [18, 27].

Consider an object indx of class MutableImmutableStateIntMultiarray1D. Theimplementation of this class (see Figure 7) enforces that indx starts in the mutable state.The state is stored in the boolean instance variable isMutable whose value is kept equivalentto the boolean expression (reader == null). For the mutable state, the implementations ofthe methods get and set are as expected.

From the default state (mutable) the object indx can only change its state when a threadinvokes its synchronized method passToImmutable. When the state is mutable indx changesits instance variable isMutable to false and stores the thread that executed the method inthe instance variable reader. However if the state is already immutable, the thread executingthe method stops until the state becomes mutable and then proceeds as explained for themutable state.

Once indx is in the immutable state, the get method is implemented as expected while theset method cannot modify the elements of array until indx returns to the mutable state.

The object indx returns to the mutable state when the same thread that successfullyprovoked the transition mutable-to-immutable invokes in indx the synchronized methodreturnToMutable. When the transition is completed, this thread notifies one of the threads,if any, waiting in indx of the state transition.

Given the complexity of matching the lock-wait-notify logic with the statements (bytecodes)that access the instance variables, it is unlikely that JVMs will incorporate tests for this kindof class invariant. Thus, the remaining alternative is that JVMs recognise the class as beingpart of the standard Java API and automatically produce the desired class invariant.

Note again that to guarantee the class invariant, the instance variables of the classMutableImmutableStateIntMultiarray1D cannot escape the scope of the class.

4.3. ValueBoundedIntMultiarray1D

Figure 8 presents a simplified implementation of the third strategy. The implementation of thisclass, ValueBoundedIntMultiarray1D, ensures that its objects can store only elements greater-than-or-equal-to zero and less-than-or-equal-to a parameter. This parameter, upperBound, ispassed in to the constructor and cannot be modified thereafter.

The implementation of the method get is the same as in the previous strategies. Theimplementation of the method set includes a test which ensures that only elements in therange [0..upperBound] are stored. The methods getMin and getMax, in contrast with previousclasses, do not return the actual minimum and maximum stored elements, but lower (zero)and upper bounds.


14 LUJAN ET AL.

public final class MutableImmutableStateIntMultiarray1Dextends IntIndirectionMultiarray1D {

private Thread reader;private final int array[];private final int length;private int min;private int max;private boolean isMutable;

public MutableImmutableStateIntMultiarray1D ( int length ) {this.length = length;array = new int [length];reader = null;isMutable = true;min = 0; max = 0;

}public int get ( int i ) { return array[i]; }public synchronized void set ( int i , int value ) {

while ( !isMutable ){try{ wait(); }catch (InterruptedException e) {

throw new MultiarrayUncheckedException();}

}array[i] = value;if ( min > value ) min = value;if ( max < value ) max = value;

}public int length () { return length; }public synchronized int getMin () { return min; }public synchronized int getMax () { return max; }public synchronized void passToImmutable () {

while ( !isMutable ) {try { wait(); }catch (InterruptedException e) {

throw new MultiarrayUncheckedException();}

}isMutable = false; reader = Thread.currentThread();

}public synchronized void returnToMutable () {

if ( reader == Thread.currentThread() ) {reader = null; isMutable = true; notify();

}else throw new MultiarrayUncheckedException();

}}

Figure 7. Simplified implementation of class MutableImmutableStateIntMultiarray1D.



public final class ValueBoundedIntMultiarray1D extendsIntIndirectionMultiarray1D {

private final int array[];private final int length;private final int upperBound;private final int lowerBound = 0;

public ValueBoundedIntMultiarray1D ( int length,int upperBound ) {

this.length = length;this.upperBound = upperBound;array = new int [length];

}public int get ( int i ) { return array[i]; }public void set ( int i , int value ) {

if ( value >= lowerBound && value <= upperBound )array[i] = value;

else throw new MultiarrayUncheckedException();}public int length () { return length; }public int getMin () { return lowerBound; }public int getMax () { return upperBound; }

}

Figure 8. Simplified implementation of class ValueBoundedIntMultiarray1D.

The tests that JVMs need to perform to extract the class invariant include the escape analysisfor the instance variables of the class (described in Section 4.1). In addition, as mentioned inthe introduction to Section 4 with respect to the common part of the invariant, JVMs computethe lower and upper bound using data flow analysis restricted to the instance variables. Aswith previous classes, JVMs could also recognise the class and produce the invariant withoutperforming any test.

4.4. Usage of the Classes

This section revisits the matrix-vector multiplication kernel where the matrix is sparse andstored in COO format (see Figure 1) in order to illustrate how the three different classescan be used. Figure 9 presents the same kernel but includes conditions that follow therecommendations of the Ninja group [35, 36, 37] to facilitate loop versioning (the statementsinside generate neither NullPointerExceptions nor ArrayIndexOutOfBoundsExceptions anddo not use synchronization mechanisms). This new implementation checks that the parametersare not null and that the accesses to indx and jndx are within the bounds. These checks aremade prior to execution of the for-loop. If the checks fail, then none of the statements in the


16 LUJAN ET AL.

public class ExampleSparseBLASWithNinja {// y = A*x + ypublic static void mvmCOO ( int indx[], int jndx[], double value[],

double y[], double x[] ) throws SparseBLASException {if ( indx != null && jndx != null && value != null &&

y != null && x != null && indx.length >= value.length &&jndx.length >= value.length ) {

for (int k = 0; k < value.length; k++)y[ indx[k] ] += value[k] * x[ jndx[k] ];

}else throw new SparseBLASException();

}}

Figure 9. Sparse matrix-vector multiplication using coordinate storage format and Ninja group’srecommendations.

int fib[] = {1, 1, 2, 3, 5, 8};int aux[] = fib;aux[0] = -8;System.out.println("Are they aliases?: " + fib[0] == -8 );// Output: Are they aliases?: true

Figure 10. An example of array aliases.

for-loop is executed and the method terminates by throwing an exception. For the sake ofclarity, the implementation omits the checks to generate loops free of aliases (for example, thevariables fib and aux are aliases of the same array according to their declaration in Figure 10.)

Note that checks to ensure that accesses to y and x are within bounds will require traversalof complete local copies of the arrays indx and jndx. Local copies of the arrays are necessary,since both indx and jndx escape the scope of this method. This makes it possible, for example,for another thread to modify their contents after the checks but before execution of the for-loop.The creation of local copies is a memory inefficient approach and the overhead of copying andchecking element-by-element for the maximum and minimum is similar to (or greater than)the overhead of the explicit checks.

Figure 11 presents the implementations of mvmCOO using the three classes described inprevious sections. In the interest of brevity, the checks recommended by IBM’s Ninja group arereplaced with a comment. The implementations of mvmCOO for ImmutableIntMultiarray1Dand ValueBoundedIntMultiarray1D are identical; they contain the same statements but



have different method signatures. The implementation for MutableImmutableStateInt-Multiarray1D builds on the previous implementations, but it first needs to ensure that theobjects indx and jndx are in immutable state. The implementation also requires that theseobjects are returned to the mutable state upon completion or abnormal interruption.

5. Experiments

This section reports two sets of experiments. The first set determines the overhead of arraybounds checks for a typical CS&E kernel with array indirection. The aim is to find anexperimental lower bound for the performance improvement that can be achieved when arraybounds checks are eliminated. This is a lower bound because array bounds checks, together withJava’s strict exception model, are inhibitors of other optimising transformations [35, 36, 37]that can further improve the performance. The second set of experiments, also for the sameCS&E kernel, determines the overhead of using the classes proposed in Section 4 rather thanaccessing Java arrays directly. In this second set of experiments we use off-the-shelf Sun JVMsthat are unaware of, and thus unable to exploit, the work presented in this paper.

The CS&E kernel considered is the solution of systems of linear equations using iterativesolvers [16]. The experiments focus on the matrix-vector multiplication. Specifically, matrix-vector multiplication is used where the matrix is sparse and in COO format (mvmCOO). Notethat this is the only proposed benchmark for CS&E which addresses array indirection (seethe JGF Benchmark and Scimark). The iterative nature of the solver implies that the matrix-vector multiplication operation is executed repeatedly until sufficient accuracy is achieved or anupper bound on the number of iterations is reached. The experiments implement an iterativemethod that executes mvmCOO 100 times.

Results are reported for four different implementations for mvmCOO (see Figures 9 and 11).The four implementations of mvmCOO are derived using:

1. only Java arrays (JA implementation);2. objects of class ImmutableIntMultiarray1D (I-MA implementation);3. objects of class MutableImmutableStateIntMultiarray1D (MI-MA implementation);

and4. objects of class ValueBoundedIntMultiarray1D (VB-MA implementation).

The experiments consider three different square sparse matrices from the Matrix Marketcollection [8] (see Figure 12). The three matrices are: utm5940 (size 5940 with 83842 nonzeroelements), s3rmt3m3 (symmetric and size 5357 with 106526 nonzero entries), and s3dkt3m2(symmetric and size 90449 with 1921955 nonzero entries). The implementations do not takeadvantage of symmetry and, thus, s2rmt3m3 and s3dkt3m2 are stored and operated on usingall their nonzero elements. The vector x (see Figures 9 and 11) is initialised with randomnumbers from a uniform distribution with values ranging between 0 and 5, and the vector yis initialised with zeros.

The experiments are performed on a Pentium III at 1 GHz with 256 MB running Windows2000 service pack 2, Sun Java 2 SDK 1.3.1 04 Standard Edition, Sun Java 2 SDK 1.4.0 Standard


18 LUJAN ET AL.

public class KernelSparseBLAS {// y = A*x + ypublic static void mvmCOO ( ImmutableIntMultiarray1D indx,

ImmutableIntMultiarray1D jndx, double value[], double y[],double x[] ) throws SparseBLASException {

if ( /* checks */ ) {for (int k = 0; k < value.length; k++)

y[ indx.get( k ) ] += value[k] * x[ jndx.get( k ) ];}else throw new SparseBLASException();

}// y = A*x + ypublic static void mvmCOO ( MutableImmutableStateIntMultiarray1D indx,

MutableImmutableStateIntMultiarray1D jndx, double value[],double y[], double x[] ) throws SparseBLASException {

if ( indx != null && jndx != null ) {indx.passToImmutable();if ( indx != jndx ) jndx.passToImmutable();if ( /* checks */ ) {

for (int k = 0; k < value.length; k++)y[ indx.get( k ) ] += value[k] * x[ jndx.get( k ) ];

}else {

indx.passToMutable();if ( indx != jndx ) jndx.returnToMutable();throw new SparseBLASException();

}indx.returnToMutable();if ( indx != jndx ) jndx.returnToMutable();

}else throw new SparseBLASException();

}// y = A*x + ypublic static void mvmCOO ( ValueBoundedIntMultiarray1D indx,

ValueBoundedIntMultiarray1D jndx, double value[],double y[], double x[] ) throws SparseBLASException {

if ( /* checks */ ) {for (int k = 0; k < value.length; k++)

y[ indx.get( k ) ] += value[k] * x[ jndx.get( k ) ];}else throw new SparseBLASException();

}}

Figure 11. Sparse matrix-vector multiplication using coordinate storage format, and the classesdescribed in Figures 4, 7 and 8.



utm5940

s3rmt3m3

s3dkt3m2

Figure 12. Graphical representations for the sparse matrices used in the experiments.

Edition and Jove∗∗ version 2.0 associated with the Sun Java 2 Runtime Environment 1.3.0Standard Edition. For both set of the experiments the programs are compiled with the flag -Oand the Sun JVMs are executed with the flags -server -Xms128Mb -Xmx256Mb.

For the first set of experiments Jove is used with the following two configurations. Thefirst configuration, hereafter referred to as Jove, creates an executable Windows program

∗∗Jove is a static optimising native compiler for Java. Web page http://www.instantiations.com/jove/


20 LUJAN ET AL.

Table I. Times in seconds for the JA implementation of mvmCOO.

using default optimisation parameters and optimization level 1 (the lowest level). The secondconfiguration, hereafter referred to as Jove no ABC, is the same configuration plus a flag thateliminates every array bounds check.

The second set of the experiments are performed on a Sun Ultra-5 at 333Mhz with 256MBof memory and Solaris 5.8, and on a Pentium III 550Mhz with 128MB of memory and LinuxRed Hat release 7.2 kernel 2.4.9-34. Both machines run off-the-shelf Sun Java 2 SDK 1.3.1 04Standard Edition and Sun Java 2 SDK 1.4.0 Standard Edition, but not Jove which is availableonly on Windows. Hereafter the machines are referred to by their operating system names.

Each experiment is run 20 times and the results shown are the minimum execution time inseconds. The timers are accurate to the millisecond on Solaris and Linux, and to 10 millisecondson Windows. For each run, the total execution time of the 100 executions of mvmCOO is recorded.

Table I presents the execution times on Windows for the JA implementation compiled withthe two configurations described for the Jove compiler. The row Overhead ABC presents theoverhead induced by array bounds checks. This overhead is between 8.43 and 9.99 percent ofthe execution time. The row Overhead ABC-Ind refines the previous overhead to the specificoverhead induced by the array bounds checks in the presence of indirection (3 out of 7).

For the second set of experiments (i.e. four implementations of mvmCOO using off-the-shelfJVMs), Tables II and III present the execution times, and the associated performance resultsexpressed as percentages with respect to JA, on the machines with the JVMs versions 1.4.0and 1.3.1 04, respectively. Among the I-MA, IM-MA and VB-MA implementations, the VB-MA implementation is the fastest in 9 experiments, while the I-MA implementation and theIM-MA implementation are the fastest in 8 and 4 experiments, respectively. (Note that forthese counts more than one timing result can be the fastest and that two timings that differonly to the accuracy of the timer are considered to be the same.) Consider only the results inTable II (i.e. JVM 1.4.0), then the I-MA implementation is the fastest on 6 ocasions while theVB-MA implementation and the IM-MA implementation are the fastest on 2 and 3 occasions,



respectively. On the other hand, consider only the results in Table III (i.e. JVM 1.3.1 04), thenthe VB-MA implementation is the fastest on 7 occasions while the I-MA implementation andthe IM-MA implementation are the fastest on 2 and 1 occasions, respectively. When comparingthe results of one table with the other, Table III (i.e. JVM 1.3.1 04) presents results that aregenerally faster than the equivalent results in Table II (i.e. JVM 1.4.0). On Windows theresults present a set of odd cases where the three implementations with the multi-dimensionalarray package are faster than the implementation using only Java arrays (JA). This also occursin Table III for the IM-MA implementation with s3dkt3m2 on Solaris, and for the VB-MAimplementation with s3dkt3m2 on Linux.

No definitive conclusions can be drawn from these results other than the followingobservations:

• in most cases the I-MA and VB-MA implementations offer better performance than theIM-MA implementation;

• the performance difference between the I-MA and VB-MA implementations favours thesecond implementation on JVM 1.3.1 04 and favours the first on JVM 1.4.0;

• in most cases JVM 1.3.1 04 produces faster times than JVM 1.4.0; and• the overhead of using classes rather than Java arrays varies for each implementation

(I-MA, VB-MA, IM-MA), JVM, machine architecture and matrix, but it is an averageof 8 percent of the execution time.

Table IV takes a closer look at the execution times of the JA and VB-MA implementations onWindows with JVM version 1.4.0 and combines them with semantic expansion [46, 34, 35, 36].Experiments with semantic expansion reported in [34, 36] indicate that, for scientific codeswith real number arithmetic, benchmarks using a multi-dimensional array package run between2.5 and 5 times faster than the same benchmarks using Java arrays. In other words, semanticexpansion removes the overhead of using a multi-dimensional array package instead of Javaarrays and enables other optimisations that further improve performance. The row VB-MA-SE presents an estimate (based on the above arguments) of the times for the VB-MAimplementation after applying semantic expansion. The row VB-MA-SE no-ABC-Ind presentsan estimate (based on the results of Table I) of the times for the VB-MA-SE after also removingarray bounds checks in the presence of indirection. Finally, the row Perf. on the right hand sidepresents an estimate of the percentage of performance improvement due to the elimination ofarray bounds checks in the presence of indirection.

6. Discussion

The following paragraphs try to determine whether one of the classes from Section 4 is the“best”, or whether the multi-dimensional array package needs more than one class.

Consider first an object indx of class MutableImmutableStateIntMultiarray1D. In orderto obtain the benefit of array bounds checks elimination when using indx as an indirectionarray, programs need to follow these steps:

1. change the state of indx to immutable;


22 LUJAN ET AL.

Table II. Average times in seconds for the four different implementations of mvmCOO.



Table III. Average times in seconds for the four different implementations of mvmCOO.


24 LUJAN ET AL.

Table IV. Comparison between JA and VB-MA considering semantic expansion.

2. execute any other actions (normally inside loops) that access the elements stored in indx;and

3. change the state of indx back to mutable.

An example of these steps is the implementation of matrix-vector multiplication using classMutableImmutableStateIntMultiarray1D (see Figure 11).

If the third step is omitted, possibly accidentally, the indirection array indx becomes uselessfor the purpose of eliminating array bounds checks. Other threads (or a thread that abandonedexecution without adequate clean up) would be left waiting indefinitely for notification thatindx has returned to the mutable state.

Another problem can arise when several threads are executing in parallel and at least twothreads need indx and jndx (another object of the same class) at the same time. Dependingon the order of execution, or if both are aliases of the same object, a deadlock can occur.

One might think that these problems could be overcome by modifying the implementationof the class so that it maintains a complete list of thread readers instead of just one threadreader. However, omission of the third step then leads to starvation of writers.

Given that these problems are not inherent in the other two classes, and that the experimentsof Section 5 do not show a performance benefit in favour of class MutableImmutableState-IntMultiarray1D, the decision is to disregard this class.

The discussion now considers scenarios in CS&E applications where the two remainingclasses can be used. The functionalities of classes ImmutableIntMultiarray1D and Value-BoundedIntMultiarray1D are contrasted with regard to the requirements of such applications.

Remember that the two classes have the same public interface, but provide differentfunctionality. As the name suggests, class ImmutableIntMultiarray1D implements the methodset without modifying any instance variable; it simply throws an unchecked exception(see Figure 4). In contrast, class ValueBoundedIntMultiarray1D provides the expected



functionality for this method as long as the parameter value of the method is positive andless than or equal to the instance variable upperBound (see Figure 8).

In CS&E scenarios, such as LU-factorisation of dense and banded matrices with pivoting,and Gauss Elimination for sparse matrices with fill-in, applications need to update the contentof indirection arrays [16, 21]. For example, fill-in means that new nonzero elements are createdwhere zero elements existed previously. Thus, the Gauss Elimination algorithm progressivelycreates new nonzero matrix elements. Assuming that the matrix is stored in COO format,the indirection arrays indx and jndx need to be updated with the row and column indices,respectively, for each new nonzero element.

Given that, in other CS&E applications, indirection arrays remain unaltered afterinitialisation, the only reason for including class ImmutableIntMultiarray1D would beperformance. However the performance evaluation reveals that there is no significanceperformance advantage. Therefore, the conclusion is to incorporate only the class Value-BoundedIntMultiarray1D into the multi-dimensional array package.

Note that for existing Java CS&E codes there is a migration cost associated with modifyingthem to use a multi-dimensional array package. This cost needs to be weighted up against theperformance improvement of eliminating array bounds checks in the presence of indirectionand further improvements that are allowed by guarantying exception free sections of the code.Obviously, the migration cost does not exist when considering new Java CS&E codes and,until now, few Java CS&E codes have been developed.

7. Conclusions

Array indirection is ubiquitous in CS&E applications. With the specification of array accessesin Java and with current techniques for eliminating array bounds checks, applications witharray indirection suffer the overhead of explicitly checking each access made through anindirection array.

Building on previous work by IBM’s Ninja and the ABCD algorithm [5], three new strategiesto eliminate array bounds checks in the presence of indirection have been presented. Eachstrategy is implemented as a Java class that can replace indirection arrays of type int. Theaim of the three strategies is to provide extra information to JVMs so that array bounds checkscan be removed even in the presence of indirection. For normal Java arrays of type int thisextra information would require access to the whole application (i.e. no dynamic loading) orbe heavyweight. The algorithm to remove the array bounds checks is a combination of loopversioning (as used by the Ninja group [33, 35, 36, 37]), escape analysis [11] and constructionof constraints based on data flow analysis (ABCD algorithm).

The experiments have estimated the performance benefit of eliminating array bound checksin the presence of indirection. The benefit has been estimated at about 4 percent of theexecution time. The experiments have also evaluated the overhead of using a Java class toreplace Java arrays on off-the-shelf JVMs. This overhead varies for each class, JVM, machinearchitecture and matrix, but it is an average of 8 percent of the execution time. This kindof overhead essentially can be eliminated using semantic expansion [46, 36] or a sequence ofstandard transformations as described and illustrated in [28].


26 LUJAN ET AL.

The evaluation of the three proposed strategies also includes a discussion of their advantagesand disadvantages. Overall, the third strategy, class ValueBoundedIntMultiarray1D, is thebest. It takes a different approach by not seeking immutability of objects. The number ofthreads accessing an indirection array at a given time is irrelevant as long as every element inthe indirection array is within the bounds of the arrays accessed through indirection. The classenforces that every element stored in an object of the class contains a value between zero anda given parameter. The parameter must be greater than or equal to zero, cannot be modifiedand is passed in to the constructor.

REFERENCES

1. A. Aggarwal and K. H. Randall. Related field analysis. In Proceedings of the ACM Conference onProgramming Language Design and Implementation – PLDI’01, pages 214–20, 2001.

2. B. Alpern, C. R. Attanasio, J. J. Barton, M. G. Burke, P. Cheng, J.-D. Choi, A. Cocchi, S. J. Fink,D. Grove, M. Hind, S. F. Hummel, D. Lieber, V. Litvinov, M. F. Mergen, T. Ngo, J. R. Russell, V. Sarkar,M. J. Serrano, J. C. Shepherd, S. E. Smith, V. C. Sreedhar, H. Srinivassan, and J. Whaley. The Jalapenovirtual machine. IBM Systems Journal, 39(1):211–238, 2000.

3. J. M. Asuru. Optimization of array subscript range. ACM Letters on Programming Languages andSystems, 1(2):109–118, 1992.

4. B. Blanchet. Escape analysis for object oriented languages. Application to Java. In Proceedings of theACM Conference on Object-Oriented Programming, Systems, Languages, and Application – OOPSLA’99,pages 20–34, 1999.

5. R. Bodik, R. Gupta, and V. Sarkar. ABCD: Eliminating array bounds checks on demand. In Proceedingsof the ACM Conference on Programming Language Design and Implementation – PLDI 2000, pages321–333, 2000.

6. J. Bogda and U. Holzle. Removing unnecessary synchronization. In Proceedings of the ACM Conferenceon Object-Oriented Programming, Systems, Languages, and Application – OOPSLA’99, pages 35–46,1999.

7. R. F. Boisvert, J. E. Moreira, M. Philippsen, and R. Pozo. Java and numerical computing. IEEEComputing in Science and Engineering, 3(2):18–24, 2001.

8. R. F. Boisvert, R. Pozo, K. Remington, R. Barrett, and J. J. Dongarra. The Matrix Market: A webresource for test matrix collections. In Quality of Numerical Software, Assessment and Enhancement,pages 125–137. Chapman & Hall, 1997. Matrix Market http://math.nist.gov/MatrixMarket/.

9. J. M. Bull, L. A. Smith, L. Pottage, and R. Freeman. Benchmarking Java against C and Fortran forscientific applications. In Proceedings of the ACM 2001 Java Grande/ISCOPE Conference, pages 97–105, 2001.

10. W.-N. Chin and E.-K. Goh. A reexamination of “optimization of array subscript range checks”. ACMTransactions on Programming Languages and Systems, 17(2):217–227, 1996.

11. J.-D. Choi, M. Gupta, M. Serrano, B. C. Sreedhar, and S. Midkiff. Escape analysis for Java. In Proceedingsof the ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications –OOPSLA’99, pages 1–19, 1999.

12. M. Cierniak, G.-Y. Lueh, and J. M. Stichnoth. Practicing JUDO: Java under dynamic optimizations.In Proceedings of the ACM Conference on Programming Language Design and Implementation – PLDI2000, pages 13–26, 2000.

13. R. Fitzgerald, T. B. Knoblock, E. Ruf, B. Steensgaard, and D. Tarditi. Marmot: an optimizing compilerfor Java. Sofware – Practice and Experience, 30(3):199–232, 2000.

14. E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object OrientedSoftware. Addison-Wesley, 1995. ISBN: 0201633612.

15. S. Ghemawat, K. H. Randall, and D. J. Scales. Field analysis: Getting useful and low-costinterprocedural information. In Proceedings of the ACM Conference on Programming Language Designand Implementation – PLDI’00, pages 334–344, 2000.

16. G. H. Golub and C. F. van Loan. Matrix Computations. John Hopkins University Press, 3rd edition,1996. ISBN: 0801854148.



17. L. Gong. Java 2 Platform Security Architecture. Sun Microsystems, 1998. Available athttp://java.sun.com/j2se/1.3/docs/.

18. J. Gosling, B. Joy, G. Steele, and G. Bracha. The Java Language Specification. The Java Series. Addison-Wesley, 2nd edition, 2000. ISBN: 0201310082.

19. R. Gupta. A fresh look at optimizing array bound checking. In Proceedings of the ACM Conference onProgramming Language Design and Implementation – PLDI’90, pages 272–282, 1990.

20. R. Gupta. Optimizing array bound checks using flow analysis. ACM Letters on Programming Languagesand Systems, 2(1–4):135–150, 1993.

21. N. J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM Publications, 2nd edition, 2002.ISBN: 0898715210.

22. Java Grande Forum. Making Java Work for High-End Computing, November 1998. Available athttp://www.javagrande.org/reports.htm.

23. Java Grande Forum. Interim Java Grande Forum Report, June 1999. Available athttp://www.javagrande.org/reports.htm.

24. R. L. Johnston. Numerical Methods: A Sofware Approach. John Wiley and Sons, 1982. ISBN: 0471866644.25. I. H. Kazi, H. H. Chen, B. Stanley, and D. Lilja. Techniques for obtaining high performance in Java

programs. ACM Computing Surveys, 32(3):213–240, 2000.26. P. Kolte and M. Wolfe. Elimination of redundant array subscript range checks. In Proceedings of the ACM

Conference on Programming Language Design and Implementation – PLDI’95, pages 270–278, 1995.27. D. Lea. Concurrent Programming in Java: Design Principles and Patterns. The Java Series. Addison-

Wesley, 2nd edition, 1999. ISBN: 0201310090.28. M. Lujan. OoLaLa – From Numerical Linear Algebra to Compiler Technology for Design Patterns. PhD

thesis, Department of Computer Science, University of Manchester, 2002.29. M. Lujan, T. L. Freeman, and J. R. Gurd. OoLaLa: an object oriented analysis and design of numerical

linear algebra. In Proceedings of the ACM Conference on Object-Oriented Programming, Systems,Languages, and Applications – OOPSLA’00, pages 229–252, 2000.

30. M. Lujan, J. R. Gurd, and T. L. Freeman. OoLaLa: Transformations for implementations of matrixoperations at high abstraction levels. In Proceedings of the 4th Workshop on Parallel Object-OrientedScientific Computing – POOSC’01, 2001.

31. V. Markstein, J. Cocke, and P. Markstein. Optimization of range checking. In Proceedings of a Symposiumon Compiler Optimization, pages 114–119, 1982.

32. B. Meyer. Object Oriented Software Construction. Prentice Hall, 2nd edition, 1997. ISBN: 0136291554.33. S. P. Midkiff, J. E. Moreira, and M. Snir. Optimizing array reference checking in Java programs. IBM

Systems Journal, 37(3):409–453, 1998.34. J. E. Moreira, S. P. Midkiff, and M. Gupta. A standard Java array package for technical computing. In

Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999.35. J. E. Moreira, S. P. Midkiff, and M. Gupta. From flop to megaflops: Java for technical computing. ACM

Transactions on Programming Languages and Systems, 22(2):265–295, 2000.36. J. E. Moreira, S. P. Midkiff, M. Gupta, P. V. Artigas, M. Snir, and R. D. Lawrence. Java programming

for high-performance numerical computing. IBM Systems Journal, 39(1):21–56, 2000.37. J. E. Moreira, S. P. Midkiff, M. Gupta, P. V. Artigas, P. Wu, and G. Almasi. The Ninja project.

Communications of the ACM, 44(10):102–109, 2001.38. G. C. Necula and P. Lee. The design and implementation of a certifying compiler. In Proceedings of

the ACM Conference on Programming Language Design and Implementation – PLDI’98, pages 333–344,1998.

39. M. Paleczny, C. Vick, and C. Click. The Java Hotspot server compiler. In Proceedings of the USENIXSymposium on Java Virtual Machine Research and Technology, 2001.

40. I. Pechtchanski and V. Sarkar. Immutability specification and its applications. In Proceedings of the 2002joint ACM-ISCOPE conference on Java Grande, pages 202–211, 2002.

41. M. J. Serrano, R. Bordawekar, S. P. Midkiff, and M. Gupta. Quicksilver: a quasi-static compiler forJava. In Proceedings of the ACM Conference on Object-Oriented Programming, Systems, Languages,and Application – OOPSLA’00, pages 66–82, 2000.

42. T. Suganuma, T. Ogasawara, M. Takeuchi, T. Yasue, M. Kawahito, K. Ishizaki, H. Komatsu, andT. Nakatani. Overview of the IBM Java just-in-time compiler. IBM Systems Journal, 39(1):175–193,2000.

43. N. Suzuki and K. Ishihata. Implementation of an array bound checker. In Conference Record of the ACMSymposium of Principles of Programming Languages, pages 132–143, 1977.


28 LUJAN ET AL.

44. G. K. Thiruvathukal. Java at middle age: Enabling Java for computational science. IEEE Computing inScience and Engineering, 4(1):74–84, 2002.

45. J. Whaley and M. Rinard. Compositional pointer and escape analysis for Java programs. In Proceedingsof the ACM Conference on Object-Oriented Programming, Systems, Languages, and Application –OOPSLA’99, pages 187–206, 1999.

46. P. Wu, S. P. Midkiff, J. E. Moreira, and M. Gupta. Improving Java performance through semantic inlining.Technical Report 21313, IBM Research Division, 1998.

47. Z. Xu, B. P. Miller, and T. Reps. Safety checking of machine code. In Proceedings of the ACM Conferenceon Programming Language Design and Implementation – PLDI’00, pages 70–82, 2000.


Documents

Elimination of Java Array Bounds Checks in the Presence of