A Static Slicing Tool for Sequential Java Programs
A Thesis
Submitted For the Degree of
Master of Science (Engineering)
in the Faculty of Engineering
by
Arvind Devaraj
Computer Science and Automation
Indian Institute of Science
BANGALORE – 560 012
March 2007
Abstract
A program slice consists of a subset of the statements of a program that can potentially
affect values computed at some point of interest. Such a point of interest along with a set
of variables is called a slicing criterion. Slicing tools are useful for several applications,
such as program understanding, testing, program integration, and so forth. Slicing object
oriented programs poses special problems that must be addressed due to features
like inheritance, polymorphism and dynamic binding. Alias analysis is important for
the precision of slices. In this thesis we implement a slicing tool for sequential Java programs
in the SOOT framework. SOOT is a front-end for Java developed at McGill University
and it provides several forms of intermediate code. We have integrated the slicer into
the framework. We also propose an improved technique for intraprocedural points-to
analysis. We have implemented this technique and compare the results of the analysis
with those for a flow-insensitive scheme in SOOT. Performance results of the slicer are
reported for several benchmarks.
Contents
Abstract

1 Introduction
  1.1 Slicing
  1.2 The SOOT Framework
  1.3 Contributions of the thesis

2 Slicing
  2.1 Intraprocedural Slicing using PDG
    2.1.1 Program Dependence Graph
    2.1.2 Slicing using the Program Dependence Graph
    2.1.3 Construction of the Data Dependence Graph
    2.1.4 Control Dependence Graph
    2.1.5 Slicing in presence of unstructured control flow
    2.1.6 Reconstructing CFG from the sliced PDG
  2.2 Interprocedural Slicing using SDG
    2.2.1 System Dependence Graph
    2.2.2 Calling context problem
    2.2.3 Computing Summary Edges
    2.2.4 The Two Phase Slicing Algorithm
    2.2.5 Handling Shared Variables
  2.3 Slicing Object Oriented Programs
    2.3.1 Dependence Graph for Object Oriented Programs
    2.3.2 Handling Inheritance
    2.3.3 Handling Polymorphism
    2.3.4 Case Study - Elevator Class and its Dependence Graph

3 Points to Analysis
  3.1 Need for Points to Analysis
  3.2 Pointer Analysis using Constraints
  3.3 Dimensions of Precision
  3.4 Andersen’s Algorithm for C
  3.5 Andersen’s Algorithm for Java
    3.5.1 Model for references and heap objects
    3.5.2 Computation of points to sets in SPARK
  3.6 CallGraph Construction
    3.6.1 Handling Virtual Methods
  3.7 Improvements to Points to Analysis
  3.8 Improving Flow Sensitivity
    3.8.1 Computing Valid Subgraph at each Program Point
    3.8.2 Computation of Access Expressions
    3.8.3 Checking for Satisfiability

4 Implementation and Experimental Results
  4.1 Soot - A bytecode analysis framework
  4.2 Steps in performing slicing in Soot
  4.3 Points to Analysis and Call Graph
  4.4 Computing Required Classes
  4.5 Side effect computation
  4.6 Preprocessing
  4.7 Computing the Class Dependence Graph
  4.8 Experimental Results

5 Conclusion and Future Work

Bibliography
List of Tables
3.1 Constraints for C
3.2 Constraints for Java
3.3 Data flow equations for computing valid edges
3.4 Computation of Valid edges
4.1 Benchmarks Description
4.2 Number of Edges in the Class Dependence Graph
4.3 Timing Requirements
4.4 Program Statistics - Partial Flow Sensitive
4.5 Precision Comparison
List of Figures
1.1 A program and its slice
2.1 A Control Flow Graph
2.2 Post Dominator Tree for the CFG in Figure 2.1
2.3 Dominance Frontiers
2.4 A program and its PDG (taken from [39])
2.5 Augmented CFG and PDG for the program in Figure 2.4 (taken from [39])
2.6 A program with function calls
2.7 System Dependence Graph for an interprocedural program
2.8 Slicing the System Dependence Graph
2.9 Program
2.10 The Dependence Graph for the main function (from [67])
2.11 The Dependence Graphs for functions C() and D() (from [67])
2.12 Interface Dependence Graph (from [58])
2.13 The Elevator program
2.14 Dependence Graph for Elevator program
3.1 Need for Points to Analysis
3.2 Points to Graphs
3.3 Imprecision due to context insensitive analysis
3.4 Object Flow Graph
3.5 An example program
3.6 Access Expressions
3.7 OFG Subgraph
3.8 Access Expressions (for a DAG)
3.9 Access Expressions (for general graph)
3.10 Simplified Access Expressions
3.11 Dominator Tree
4.1 Soot Framework Overview
4.2 Computation of the class dependence graph
4.3 Jimple code and its slice
Chapter 1
Introduction
1.1 Slicing
A program slice consists of the parts of a program that can potentially affect the value of
variables computed at some point of interest. Such a point is called the slicing criterion
and is specified by a pair (program point, set of variables). The original concept of a
program slice was proposed by Mark Weiser [61]. According to his definition:
A slice s of program p is a subset of the statements of p that retains some
specified behavior of p. The desired behavior is detailed by means of a slicing
criterion c. Generally, a slicing criterion c is a set of variables V and a
program point l. When the slice s is executed, it must always have the same
values as program p for the variables in V at point l.
Weiser claimed that a program slice was the abstraction that users had in mind as
they debugged programs. There have been variations in the definitions of program slices
depending on the application in mind. Weiser’s original definition required a slice S of
a program to be an executable subset of the program, whereas another common defini-
tion defines a slice as a subset of statements that directly or indirectly affect the values
computed at the point of interest but are not necessarily an executable segment.
Figure 1.1 shows a program sliced with respect to the slicing criterion (print(product),
read(n);
i = 1;
sum = 0;
product = 1;
while (i<=n) {
sum = sum + i;
product = product * i;
i = i + 1;
}
print(sum);
print(product);
read(n);
i = 1;
product = 1;
while (i<=n) {
product = product * i;
i = i + 1;
}
print(product);
Figure 1.1: A program and its slice
product). Since the transformed program is expected to be much smaller than the
original, it is hoped that dependencies between statements in the program will be more
explicit. Surveys on program slicing are presented in [45], [73]. Slicing tools have been
used for several applications, such as program understanding [82], testing [74] [75], pro-
gram integration [78], model checking [79] and so forth.
1. Program Understanding: Software engineers are often assigned to understand a
massive piece of code and modify parts of it. When modifying a program, we need
to comprehend only a section of the program rather than the whole program. Backward
and forward slicing can be used to browse the code and understand the interde-
pendence between various parts of the program.
2. Testing: In the context of testing, a problem that is often encountered is that of
finding the set of program statements that are affected by a change in the program.
This analysis is termed impact analysis. To determine what tests need to be re-run
to test a modified statement S, a backward slice on S will give the statements
that actually influence the behavior of the program.
3. Debugging: Quite often the statement that is actually responsible for a bug that
shows up at some program point P is statically far away from P . To reduce the
search space of possible causes for the error the programmer can use a backward
slice to eliminate parts of the code that could not have been the cause of the
problem.
4. Model Checking: Model checking is a verification technique that performs an
exhaustive exploration of a program’s state space. Typically the execution of a
program is simulated, and the paths and states encountered in the simulation are checked
against correctness specifications phrased as temporal logic formulas. The use of
slicing here is to reduce the size of a program P being checked for a property
by eliminating statements and variables that are irrelevant to the formula.
There is an essential difference between static and dynamic slices. A static slice
disregards the actual inputs to a program, whereas a dynamic slice relies on a specific
test case and is therefore, in general, more precise.
When slicing a program P we are concerned with both correctness as well as precision.
For correctness we demand that the slice S produced by the tool is a superset of the
actual slice S(p) for the slicing criterion p. Precision has to do with the size of the slice.
For two correct slices S1 and S2, S1 is more precise than S2 if the statements of S1
are a subset of the statements of S2. Obtaining the most precise slice is in general not
computable; hence our aim is to compute a correct slice that is as precise as possible.
The slicing problem can be addressed by viewing it as a reachability problem in a
Program Dependence Graph (PDG) [54]. A PDG is a directed graph with vertices cor-
responding to statements and predicates and edges corresponding to data and control
dependences. For the sequential intraprocedural case, the backward slice with respect
to a node in the PDG is the set of all nodes in the PDG on which this node is tran-
sitively dependent. Thus given the PDG, a simple reachability algorithm on the PDG
will construct the slice. However when considering interprocedural slices, the process
is more complicated as mere reachability will produce imprecise slices. One needs to
track only interprocedural realizable paths, where a realizable path corresponds to legal
call/return pairs where a procedure always returns to the call site where it was invoked.
The structure on which interprocedural slicing is generally implemented is the System
Dependence Graph [63] (SDG). This graph is a collection of graphs corresponding to
PDGs for individual procedures, augmented with some extra edges that capture the
interaction between them. Slicing of interprocedural programs is described by Horwitz
et al. [63]. They use the SDG to track dependencies in a program and use a two-phase
algorithm to ensure that only feasible paths are tracked, that is, those in which procedure
calls are matched with the correct return statements.
Slicing object oriented programs adds yet another dimension of complexity to the
slicing problem. Object-oriented concepts such as classes, objects, inheritance, poly-
morphism and dynamic binding make representation and analysis techniques used for
imperative programming languages inadequate for object-oriented programs. Larsen and
Harrold [66] introduced the Class Dependence Graph, which can represent the class
hierarchy, data members and polymorphism. Further features were added by
Liang and Harrold [67].
The resolution of aliases is required for the correct computation of data dependencies.
To compute the dependence graph, it is necessary to build a call graph. The construction
of the call graph becomes complicated in the presence of dynamic binding, i.e. when the target
of a method call depends on the runtime type of a variable. Algorithms like Rapid Type
Analysis (RTA) [26] compute call graphs using type information.
A key analysis for object oriented languages is alias analysis. The objective here is
to follow an object O from its point of allocation to find out which objects reference
O and which other objects are referenced by the fields of O. Resolving aliasing is
important for the correct computation of data dependencies in the dependence graph.
The precision of the analysis depends on various factors like flow sensitivity, context
sensitivity and handling of field references. Andersen [64] gives a flow insensitive method
for finding aliases using subset constraints. Lhotak [70] describes the method adapted
for Java programs.
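As a concrete illustration of subset constraints, the sketch below is a deliberately tiny, flow-insensitive Andersen-style solver of our own design (names and structure are hypothetical, not the SPARK implementation). It handles only allocations (x = new T()) and copies (x = y), propagating points-to sets to a fixed point:

```java
import java.util.*;

// Minimal Andersen-style points-to propagation (illustrative sketch).
// alloc("x", "o1") records "x points to object o1"; assign("y", "x")
// records the subset constraint pts(x) ⊆ pts(y).
class TinyAndersen {
    Map<String, Set<String>> pts = new HashMap<>();
    List<String[]> subsets = new ArrayList<>(); // {src, dst}

    void alloc(String x, String obj) {
        pts.computeIfAbsent(x, k -> new HashSet<>()).add(obj);
    }

    void assign(String dst, String src) {
        subsets.add(new String[] { src, dst });
    }

    // Propagate points-to sets along subset edges until a fixed point.
    void solve() {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] s : subsets) {
                Set<String> dst = pts.computeIfAbsent(s[1], k -> new HashSet<>());
                changed |= dst.addAll(pts.getOrDefault(s[0], Collections.emptySet()));
            }
        }
    }
}
```

Run on the statements x = new obj(); y = x; the solver concludes pts(y) ⊇ {o1}, which is the kind of fact needed to resolve aliases between field accesses.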
In this thesis we implement a slicing tool for sequential Java programs and integrate
it into the SOOT framework. We briefly describe the framework and the contributions
of the thesis.
1.2 The SOOT Framework
The SOOT analysis and transformation framework [69] is a Java optimization framework
developed by the Sable Research Group at McGill University and it is intended to be a
robust, easy-to-use research framework. It has been used extensively for program analy-
sis, instrumentation, and optimization. It provides several forms of intermediate code for
analyzing and optimizing Java bytecode. Jimple is a typed three address representation,
which we have used in our implementation.
Our objective is to implement a slicing tool within the Soot framework [69] and make
it publicly available. At the time this work was begun there was no publicly available
slicing infrastructure for Java. The Indus [81] project addresses the slicing problem for
Java programs; its source code was made available in February 2007.
1.3 Contributions of the thesis
The following are the contributions of this thesis:
1. We have implemented the routines for creating the program dependence graphs
and the class dependence graph for an input Java program that is represented in
the form of Jimple intermediate code.
2. We have integrated a slicer into the framework. For inter-procedural slicing we
have implemented the two-phase slicing algorithm of [63].
3. We propose an improved technique for intraprocedural points-to analysis. This uses
path expressions to track paths that encode valid points-to information. A simple
data-flow analysis formulation collects valid edges, i.e. those that are added to
the object flow graph. Reachability queries are handled in a reasonable amount of
time. We have implemented this technique and compare the results of the analysis
with those for a flow-insensitive scheme in SOOT.
4. The slicing tool has been run on several benchmarks and we report on times taken
to build the class dependence graph, its size, slice sizes for some given slicing criteria
and slicing times.
Chapter 2
Slicing
In this chapter, we discuss techniques for slicing a program and in particular issues that
arise when slicing object oriented programs. The first part of the chapter describes the
Program Dependence Graph (PDG), its construction and the algorithm for intraproce-
dural slicing. For slicing programs with function calls, the System Dependence Graph
(SDG) is used. The SDG is a collection of PDGs for individual procedures with additional
edges for modeling procedure calls and parameter bindings. The second part of the
chapter describes the construction of SDG and the algorithm for interprocedural slicing.
The third part of the chapter describes dependence graph computation of object ori-
ented programs, which is complicated because objects can be passed as parameters and
methods can be invoked upon objects. We also need the results of points-to analysis to
determine which objects are pointed to by each reference variable. We then describe the
extension of the algorithm for computing the dependence graph in the presence of inheritance
and polymorphic function calls.
2.1 Intraprocedural Slicing using PDG
Weiser’s approach [61] to program slicing is based on dataflow equations. In his approach,
the set of relevant variables is iteratively computed till a fixed point is reached. Slicing
via graph reachability was introduced by Ottenstein [54]. In this approach a dependence
graph of the program is constructed and the problem of slicing reduces to computing
reachability on the dependence graph. We adopt this in our implementation.
2.1.1 Program Dependence Graph
A program dependence graph (PDG) represents the data and control dependencies in
the program. Nodes of PDG represent statements and predicates in a source program,
and its edges denote dependence relations. The PDG can be constructed as follows.
1. Build the program’s CFG, and use it to compute data and control dependencies:
Node N is data dependent on node M iff M defines a variable x, N uses x, and
there is an x-definition-free path in the CFG from M to N . Node N is control
dependent on node M iff M is a predicate node whose evaluation to true or false
determines whether N will be executed.
2. Build the PDG. The nodes of the PDG are almost the same as the nodes of the
CFG. However, in addition, there is a special enter node, and a node for each
predicate. The PDG does not include the CFG’s exit node. The edges of the PDG
represent the data and control dependencies computed using the CFG.
2.1.2 Slicing using the Program Dependence Graph
To compute the slice from statement (or predicate) S, start from the PDG node that
represents S and follow the data- and control-dependence edges backwards in the PDG.
The components of the slice are all of the nodes reached in this manner.
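This backward traversal is a plain worklist reachability computation over reversed dependence edges. The following minimal sketch uses invented names (it is not the representation used in our implementation):

```java
import java.util.*;

// Backward slicing as graph reachability (illustrative sketch).
// For each node we store the nodes it directly depends on,
// i.e. dependence edges are traversed backwards.
class PdgSlicer {
    Map<Integer, List<Integer>> dependsOn = new HashMap<>();

    void addDependence(int node, int on) {
        dependsOn.computeIfAbsent(node, k -> new ArrayList<>()).add(on);
    }

    // The slice w.r.t. a criterion node is every node it transitively
    // depends on (including the criterion itself).
    Set<Integer> slice(int criterion) {
        Set<Integer> result = new HashSet<>();
        Deque<Integer> work = new ArrayDeque<>();
        work.push(criterion);
        while (!work.isEmpty()) {
            int n = work.pop();
            if (result.add(n)) {
                for (int m : dependsOn.getOrDefault(n, Collections.emptyList())) {
                    work.push(m);
                }
            }
        }
        return result;
    }
}
```

Because the traversal visits each node and edge at most once, slicing is linear in the size of the PDG.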
The computation of the data dependence graph is described in Section 2.1.3. Com-
puting the control dependence graph is described in Section 2.1.4. Figure 2.4 shows an
example program and its corresponding PDG. Solid lines represent control dependencies
while dashed lines represent data dependencies.
2.1.3 Construction of the Data Dependence Graph
A data dependence graph represents the association between definitions and uses of a
variable. There is an association (d, u) between a definition of variable v at d and a use
of variable v at u iff there is at least one control flow path from d to u with no intervening
definition of v.
Each node represents a statement. An edge represents a flow dependency between
statements. Though there are many kinds of data dependencies between statements,
only flow dependencies are necessary for the purpose of slicing as only flow dependence
needs to be traced back in order to compute the PDG nodes comprising the slice. Output
and anti dependence edges do not represent true data dependence. Instead they encode
a partial order on program statements, which is necessary to preserve since there is no
explicit control flow relation between PDG nodes. However, PDG slices are normally
mapped back to high-level source code, where control flow is explicitly represented. Thus
there is no need for any such control flow information to be present in the computed
PDG slice.
Flow dependencies are computed by solving the reaching definitions problem,
a classical bitvector problem solvable in a monotone dataflow framework.
The analysis associates each program point with the set of
definitions reaching that point. The definitions reaching a program point along with the
use of a variable form flow dependencies.
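The iterative fixed-point computation can be sketched as follows. This is a toy formulation over explicit GEN/KILL bit sets, not the dataflow framework SOOT provides for this purpose:

```java
import java.util.*;

// Iterative reaching-definitions analysis (illustrative sketch).
// Definitions are numbered; gen[i]/kill[i] are the definitions generated
// and killed by node i, and preds[i] lists i's CFG predecessors.
class ReachingDefs {
    static BitSet[] analyze(int[][] preds, BitSet[] gen, BitSet[] kill) {
        int n = gen.length;
        BitSet[] in = new BitSet[n];
        BitSet[] out = new BitSet[n];
        for (int i = 0; i < n; i++) {
            in[i] = new BitSet();
            out[i] = new BitSet();
        }
        boolean changed = true;
        while (changed) { // iterate until a fixed point is reached
            changed = false;
            for (int i = 0; i < n; i++) {
                BitSet newIn = new BitSet();
                for (int p : preds[i]) newIn.or(out[p]); // IN = union of preds' OUT
                BitSet newOut = (BitSet) newIn.clone();
                newOut.andNot(kill[i]); // OUT = GEN ∪ (IN − KILL)
                newOut.or(gen[i]);
                if (!newIn.equals(in[i]) || !newOut.equals(out[i])) {
                    in[i] = newIn;
                    out[i] = newOut;
                    changed = true;
                }
            }
        }
        return in; // definitions reaching each node
    }
}
```

Pairing the definitions in IN with the uses at each node then yields the flow dependence edges of the data dependence graph.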
Dependence in presence of arrays and records
In the presence of composite data types like arrays, records and pointers, the most
conservative method is to assume a definition of a variable to be the definition of the
entire composite object [83]. A definition (or use) of an element of an array can be
considered as definition (or use) of the entire array. For example, consider the statement
a[i] = x
Here the variable a is defined and variables i, x are used. Thus DEF = {a} and
REF = {i, x}. The value of a is used in computing the address of a[i] and thus a must
also be included in the REF set. The correct value for REF is {a, i, x} [45]. This
approach is conservative, leading to large slices due to spurious dependencies.
Our current implementation handles composite data types in this manner, though more
refined methods have been proposed in the literature. Agrawal et al. [53] propose a
modified algorithm for computing reaching definitions that determines the memory loca-
tions defined and used in statements and computes whether the intersection among those
locations is complete or partial or statically indeterminable. Another method to avoid
spurious dependencies is to use array index tests, like the GCD test, which can determine
that there is no dependence between two array access expressions.
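For affine subscripts the GCD test takes only a few lines. The sketch below is our own illustration (not part of the slicer): accesses a[c1*i + d1] and a[c2*j + d2] can name the same element only when a linear Diophantine equation is solvable, which requires gcd(c1, c2) to divide d2 − d1.

```java
// GCD dependence test for affine array subscripts (illustrative sketch).
// a[c1*i + d1] and a[c2*j + d2] can refer to the same element only if
// gcd(c1, c2) divides (d2 - d1); otherwise there is provably no dependence.
class GcdTest {
    static int gcd(int a, int b) {
        return b == 0 ? Math.abs(a) : gcd(b, a % b);
    }

    // true if a dependence between the two accesses is possible
    static boolean mayDepend(int c1, int d1, int c2, int d2) {
        int g = gcd(c1, c2);
        return g == 0 ? d1 == d2 : (d2 - d1) % g == 0;
    }
}
```

For example, a[2*i] and a[2*i + 1] can never conflict, since gcd(2, 2) = 2 does not divide 1, so no dependence edge is needed between them.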
Data dependencies in presence of aliasing
When computing data dependencies, the major problem occurs due to the presence of aliasing.
Consider the following example. Here there is a data dependency between x.a = ... and ...
= y.a since both x and y point to the object o1. Without alias analysis this dependency
is missed because the syntactic expressions x.a and y.a are different. Thus resolving
aliases is necessary for the correct computation of data dependencies. Also if worst case
assumptions are made for field loads and stores, many spurious dependencies are created.
void fun() {
    obj x, y;
    x = new obj();   // o1 is the object created
    y = x;
    x.a = ...;
    ... = y.a;
}
P   if (x > y)
S1      max = x;
    else
S2      max = y;
2.1.4 Control Dependence Graph
Another kind of dependence between statements arises due to the presence of control
structure.
For example, in the above code, the execution of S1 is dependent on the predicate
x > y . Thus S1 is said to be control dependent on P. A slice with respect to S1 has to
include P, because the execution of S1 depends on the outcome of the predicate node P.
Two nodes Y and Z should be identified as having identical control conditions if, in
every run of the program, node Y is executed if and only if node Z is executed. In Figure
2.1, nodes 2 and 5 are said to be control dependent on the true branch of node 1,
since their execution is dependent conditionally on the outcome of node 1. The original
method for computing control dependence information using postdominators is presented
by Ferrante et al. [47]. Cytron et al. [46] give an improved method for constructing
control dependence information by using dominance frontiers.
Finding control dependence using postdominators relationship
A node X is said to be a postdominator of node Y if all possible paths from Y to the exit
node must pass through X. A node N is said to be control dependent on edge a → b if
1. N postdominates b
2. N does not postdominate a
In Figure 2.1, to find the nodes that are control dependent on edge 1 → 2, we find
nodes that postdominate node 2 but not node 1. Nodes 2 and 5 are such nodes. So
nodes 2 and 5 are control dependent on the edge 1 → 2.
This observation suggests that to find the nodes that are control dependent on the
edge X → Y, we can traverse the postdominator tree and mark all nodes that postdominate
Y as control dependent on the edge, stopping when we reach the postdominator of X.
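Concretely, the walk can be written as below. This is an illustrative sketch with invented names, following the scheme of Ferrante et al. [47], and it assumes the immediate postdominators ipdom[] have already been computed:

```java
import java.util.*;

// Control dependence from postdominator information (illustrative sketch).
// For each CFG edge X -> Y, every node on the postdominator-tree path from
// Y up to (but excluding) ipdom(X) is control dependent on X.
class ControlDeps {
    static Map<Integer, Set<Integer>> compute(int[][] edges, int[] ipdom) {
        Map<Integer, Set<Integer>> cd = new HashMap<>();
        for (int[] e : edges) {
            int x = e[0];
            int stop = ipdom[x];
            // climb the postdominator tree from the edge target
            for (int n = e[1]; n != stop; n = ipdom[n]) {
                cd.computeIfAbsent(n, k -> new HashSet<>()).add(x);
            }
        }
        return cd;
    }
}
```

On a simple if-then-else (predicate 1, then-branch 2, join 3), only the then-branch comes out as control dependent on the predicate, as expected.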
[diagram omitted]
Figure 2.1: A Control Flow Graph

[diagram omitted]
Figure 2.2: Post Dominator Tree for the CFG in Figure 2.1
Using Dominance Frontiers to compute Control Dependence
Control dependencies between statements can be computed in an efficient manner us-
ing the dominance frontier information. Cytron et al. [46] describe the method for
computing dominance frontiers.
A dominance frontier for vertex vi contains all vertices vj such that vi dominates an
immediate predecessor of vj, but vi does not strictly dominate vj [62]:

DF(vi) = { vj ∈ V | (∃ vk ∈ Pred(vj)) (vi dom vk) ∧ ¬(vi sdom vj) }
Informally, the set of nodes lying just outside the dominated region of Y is said to
[diagram omitted]
Figure 2.3: Dominance Frontiers
be in the dominance frontier of Y. In the example in Figure 2.3, Y dominates nodes
Y′, Y′′ and Y′′′, and X lies just outside the dominated region. So X is said to be in the
dominance frontier of Y.
Note that if X is in the dominance frontier of Y, then there are at least two
incoming paths to X, of which one contains Y and another does not. If the CFG is
reversed, then we have two outgoing paths from X, one containing Y and another not
containing Y. This is the same as the condition for Y to be control dependent on X. Thus
to find control dependence it is enough to find the dominance frontiers on the reverse
control flow graph. Algorithm 1 computes the control dependence information.
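The frontiers themselves can be computed directly from immediate dominators. The sketch below uses the compact formulation popularized by Cooper, Harvey and Kennedy (names are ours) rather than the original algorithm of Cytron et al.; run on the reversed CFG, the resulting frontiers give control dependence as described above.

```java
import java.util.*;

// Dominance frontiers from immediate dominators (illustrative sketch).
// idom[n] is n's immediate dominator; the root is its own idom.
class DomFrontier {
    static Map<Integer, Set<Integer>> compute(int[][] preds, int[] idom) {
        Map<Integer, Set<Integer>> df = new HashMap<>();
        for (int b = 0; b < preds.length; b++) {
            if (preds[b].length < 2) continue; // only join nodes contribute
            for (int p : preds[b]) {
                // walk from each predecessor up to idom(b), adding b to DF
                for (int runner = p; runner != idom[b]; runner = idom[runner]) {
                    df.computeIfAbsent(runner, k -> new HashSet<>()).add(b);
                }
            }
        }
        return df;
    }
}
```

On a diamond CFG, for instance, the join node lands in the dominance frontier of both branch nodes but not of the node that strictly dominates it.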
Algorithm 1 Algorithm to compute the Control Dependence Graph
compute dominance frontiers of the reversed CFG G
for all N in G do
    let RDF(N) be the reverse dominance frontier of N
    if RDF(N) is empty then
        N is made control dependent on the method entry node
    end if
    for all nodes P in RDF(N) do
        for all nodes S in CFG successors of P do
            if S = N or N postdominates S then
                N is made control dependent on P
            end if
        end for
    end for
end for
2.1.5 Slicing in presence of unstructured control flow
In the presence of unstructured control flow caused by jump statements like goto,
break, continue and return, the algorithm for slicing can produce an incorrect slice. While
Java does not have goto statements, break and continue statements do cause unstructured
control flow. Consider computing a slice with respect to the statement print(prod) in
Figure 2.4. When the slicing algorithm discussed in Section 2.1.2 is applied, the
statement break is not included, which is incorrect.
This was discovered by Choi and Ferrante [38] and by Ball and Horwitz [37] who
present a method to compute a correct slice in presence of unstructured control flow
statements. Their method to correct for such statements is based on the observation
that jumps are similar to predicate nodes in a way - both affect flow of control. Thus
jumps are also made to be sources of control dependence edges. A jump vertex has an
outgoing true edge to the target of the jump, and an outgoing false edge to the statement
that would execute if the jump were a no-op. A jump vertex is considered as a pseudo
predicate since the outgoing false edge is non-executable. The original CFG augmented
with these non-executable edges is called the Augmented Control Flow Graph (ACFG).
Kumar and Horwitz [39] describe the following algorithm for slicing in presence of
jump statements.
(a) Example Program:

prod = 1;
k = 1;
while (k <= 10) {
    if (MAXINT/k > prod)
        break;
    prod = prod * k;
    k++;
}
print(k);
print(prod);

(b) CFG and (c) PDG diagrams omitted.

Figure 2.4: A program and its PDG (taken from [39])
[(a) ACFG and (b) corresponding APDG diagrams omitted]
Figure 2.5: Augmented CFG and PDG for the program in Figure 2.4 (taken from [39])
1. Build the program’s augmented control flow graph described previously. Labels
are treated as separate statements; i.e., each label is represented in the ACFG by
a node with one outgoing edge to the statement that it labels.
2. Build the program’s augmented PDG. Ignore the non-executable ACFG edges when
computing data-dependence edges; do not ignore them when computing control-
dependence edges. (This way, the nodes that are executed only because a jump
is present, as well as those that are not executed but would be if the jump were
removed, are control dependent on the jump node, and therefore the jump will be
included in their slices.)
3. To compute the slice from node S, follow data- and control-dependence edges
backwards from S. A label L is included in a slice iff a statement “goto L” is in the
slice.
2.1.6 Reconstructing CFG from the sliced PDG
Reconstructing the CFG from the PDG is described in [71]. From the CFG and the
PDG slice, a sliced CFG is constructed by walking through all nodes. For each node n,
we execute the following.
1. If n is a goto statement or a return statement, leave it in the slice.

2. If n is a conditional statement, there are three cases:

(a) If n is not in the PDG slice, it can be removed.

(b) If n is in the PDG slice, but one of the branches is not, replace the jump to
that branch with a jump to the convergence node of the branch (the node
where the two branches reconnect). If no such node exists, replace the jump
with a jump to the return statement of the program.

(c) If n is present in the PDG slice and both branches are present, leave n in the
CFG.
main() {
sum=0;
i=1;
while(i<11)
{
sum=add(sum,i);
i=add(i,1);
}
print(sum);
print(i);
}
int add(int a,int b) {
result=a+b;
return result;
}
Figure 2.6: A program with function calls
3. Otherwise, check whether n is present in the PDG slice; if not, remove it.
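The cases above can be sketched as a per-node decision routine. The `CfgNode` fields and the `inSlice` set below are hypothetical simplifications for illustration, not the representation used in the implementation.

```java
import java.util.Set;

// Hypothetical minimal CFG node; the field names are assumptions for
// this sketch, not the actual CFG representation.
class CfgNode {
    boolean isJump;          // goto or return statement
    boolean isConditional;
    CfgNode trueBranch, falseBranch;
    CfgNode convergence;     // node where the two branches reconnect (may be null)
}

public class CfgSlicer {
    // Decide what happens to node n, given the set of PDG nodes in the slice.
    static String decide(CfgNode n, Set<CfgNode> inSlice) {
        if (n.isJump) return "keep";                              // case 1
        if (n.isConditional) {
            if (!inSlice.contains(n)) return "remove";            // case 2(a)
            boolean t = inSlice.contains(n.trueBranch);
            boolean f = inSlice.contains(n.falseBranch);
            if (t && f) return "keep";                            // case 2(c)
            return n.convergence != null                          // case 2(b)
                 ? "redirect jump to convergence node"
                 : "redirect jump to program return";
        }
        return inSlice.contains(n) ? "keep" : "remove";           // case 3
    }
}
```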
We next describe the interprocedural slicing algorithm implemented in this thesis.
2.2 Interprocedural Slicing using SDG
2.2.1 System Dependence Graph
For interprocedural slicing, Horwitz et al. [63] introduce the System Dependence Graph
(SDG). A system-dependence graph is a collection of program-dependence graphs, one
for each procedure, with additional edges for modeling parameter passing. Figure 2.6
shows a program with function calls. Figure 2.7 displays its SDG.
Each PDG contains an entry node that represents entry to the procedure. To model
procedure calls and parameter passing, an SDG introduces additional nodes and edges.
Accesses to global variables are modeled via additional parameters of the procedure.
They assume parameters are passed by value-result, and introduce additional nodes in
Figure 2.7: System Dependence Graph for an interprocedural program
the interprocedural case. The following additional nodes are introduced.
1. Call-site nodes representing the call sites.
2. Actual-in and actual-out nodes representing the input and output parameters at
the call sites. They are control dependent on the call-site node.
3. Formal-in and formal-out nodes representing the input and output parameters at
the called procedure. They are control dependent on the procedure’s entry node.
They also introduce additional edges to link the program dependence graphs together:
1. Call edges link the call-site nodes with the procedure entry nodes.
2. Parameter-in edges link the actual-in nodes with the formal-in nodes.
3. Parameter-out edges link the formal-out nodes with the actual-out nodes.
2.2.2 Calling context problem
For computing an intraprocedural slice, a simple reachability algorithm on the PDG is
sufficient. However, in the interprocedural case, simple reachability over the SDG does not
work, since not all paths are valid. For example, in Figure 2.7, the path a_in = sum →
a = a_in → result = a + b → r_out = result → i = r_out is not valid interprocedurally.
In an interprocedurally valid path, a call edge must be matched with its corresponding
return edge.
To address this problem, Horwitz et al. [63] introduce the concept of summary edges,
which summarize the effect of a procedure call. There is a summary edge between
an actual-in and an actual-out node of a call site if there is a dependency between the
corresponding formal-in and formal-out nodes of the called procedure.
2.2.3 Computing Summary Edges
We describe the computation of summary edges in Algorithm 2. The algorithm takes the
given SDG and adds summary edges. P is the set of path edges. Each edge (n, m) in P
encodes the information that there is a realizable path in the SDG from n to m. The
worklist W contains path edges that need to be processed. The algorithm begins by
asserting that there is a realizable path from each formal-out node to itself. The set of
realizable paths P is then extended by traversing backwards through dependence edges.
If during the traversal a formal-in node is encountered, then we have a realizable path
from a formal-in node to a formal-out node, and a summary edge is added between the
actual-in and actual-out nodes of the corresponding call sites. Because the insertion of
summary edges makes more paths realizable, this process continues iteratively until no
more summary edges can be added.
Computing the summary edges is equivalent to the functional approach suggested by
Sharir and Pnueli [41].
2.2.4 The Two Phase Slicing Algorithm
Horwitz et al. [63] describe a two-phase algorithm for interprocedural backward slicing.
The first phase traverses backwards from the node in the SDG that represents the slicing
criterion along all edges except parameter-out edges, and marks those nodes that are
reached. The second phase traverses backwards from all nodes marked during the first
phase along all edges except call and parameter-in edges, and marks the reached nodes.
The slice is the union of the marked nodes. Let s be the slicing criterion in procedure P.
1. Phase 1 identifies vertices that can reach s and are either in P itself or in a
procedure that calls P (directly or transitively). Because parameter-out
edges are not followed, the traversal in Phase 1 does not descend into procedures
Algorithm 2 Computing Summary Information

W = ∅   {W is the worklist}
P = ∅   {P is the set of path edges}
for all n ∈ N which is a formal-out node do
    W = W ∪ {(n, n)}
    P = P ∪ {(n, n)}
end for
while W ≠ ∅ do
    remove one element (n, m) from W
    if n is a formal-in node then
        for all n′ → n which is a parameter-in edge do
            for all m → m′ which is a parameter-out edge do
                if n′ and m′ belong to the same call site then
                    E = E ∪ {n′ → m′}   {add a new summary edge}
                    for all (m′, x) ∈ P do
                        P = P ∪ {(n′, x)}
                        W = W ∪ {(n′, x)}
                    end for
                end if
            end for
        end for
    else
        for all n′ → n ∈ E do
            if (n′, m) ∉ P then
                P = P ∪ {(n′, m)}
                W = W ∪ {(n′, m)}
            end if
        end for
    end if
end while
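As a rough illustration, the worklist computation of summary edges can be coded over a minimal SDG encoding. The integer node ids, the flat edge list, and the explicit call-site map below are simplifying assumptions for this sketch, not the structures used in the implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of summary-edge computation over a minimal SDG encoding.
public class SummaryEdges {
    public enum Kind { PARAM_IN, PARAM_OUT, OTHER }
    public static class Edge {
        public final int src, dst; public final Kind kind;
        public Edge(int src, int dst, Kind kind) { this.src = src; this.dst = dst; this.kind = kind; }
    }

    // Returns the discovered summary edges as [actualIn, actualOut] pairs.
    // callSiteOf maps actual-in/actual-out node ids to their call site id.
    public static List<int[]> compute(List<Edge> edges, Set<Integer> formalIn,
                                      Set<Integer> formalOut, Map<Integer, Integer> callSiteOf) {
        Map<Integer, List<Edge>> incoming = new HashMap<>();
        for (Edge e : edges) incoming.computeIfAbsent(e.dst, k -> new ArrayList<>()).add(e);

        Set<List<Integer>> P = new HashSet<>();       // path edges (n, m): realizable n -> m
        Deque<List<Integer>> W = new ArrayDeque<>();  // worklist
        for (int f : formalOut) propagate(P, W, List.of(f, f));

        List<int[]> summaries = new ArrayList<>();
        while (!W.isEmpty()) {
            List<Integer> pe = W.poll();
            int n = pe.get(0), m = pe.get(1);
            if (formalIn.contains(n)) {
                // realizable formal-in -> formal-out: add summary edges at matching call sites
                for (Edge in : incoming.getOrDefault(n, List.of())) {
                    if (in.kind != Kind.PARAM_IN) continue;
                    for (Edge out : edges) {
                        if (out.src != m || out.kind != Kind.PARAM_OUT) continue;
                        if (!callSiteOf.get(in.src).equals(callSiteOf.get(out.dst))) continue;
                        summaries.add(new int[]{in.src, out.dst});
                        incoming.computeIfAbsent(out.dst, k -> new ArrayList<>())
                                .add(new Edge(in.src, out.dst, Kind.OTHER));
                        for (List<Integer> q : new ArrayList<>(P))   // extend path edges
                            if (q.get(0) == out.dst) propagate(P, W, List.of(in.src, q.get(1)));
                    }
                }
            } else {
                for (Edge e : incoming.getOrDefault(n, List.of()))
                    propagate(P, W, List.of(e.src, m));
            }
        }
        return summaries;
    }

    private static void propagate(Set<List<Integer>> P, Deque<List<Integer>> W, List<Integer> pe) {
        if (P.add(pe)) W.add(pe);   // enqueue only newly discovered path edges
    }
}
```

For a call site whose actual-in feeds a formal-in that reaches a formal-out, the sketch reports one summary edge from that actual-in to the matching actual-out.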
called by P. Though the algorithm does not descend into the called procedures, the
effects of such procedures are not ignored, due to the presence of summary edges.
2. Phase 2 identifies vertices that can reach s from procedures (transitively) called by P,
or from procedures called by procedures that (transitively) call P. Because call
edges and parameter-in edges are not followed, the traversal in Phase 2 does not
ascend into calling procedures; the transitive flow dependence edges from actual-in
to actual-out vertices make such ascents unnecessary.
We implemented a variation of the two-phase slicing algorithm as described by Krinke
[49]. Figure 2.8 shows the vertices of the SDG marked during phase 1 and phase 2 when
the statement print(i) is given as the slicing criterion. The first phase traverses backwards
along all edges except the parameter-out edge r_out = result → i = r_out; thus the
first phase does not descend into the procedure add. The second phase traverses backwards
along all edges except parameter-in and call edges; thus in the second phase neither
the edge a_in = sum → a = a_in nor the edge call add → a = a_in is traversed.
2.2.5 Handling Shared Variables
This section deals with handling variables that are shared across procedures. Shared
variables include global variables in imperative languages. Though Java does not have
global variables, instance members of a class can be treated as global variables that are
accessible by the member functions.
Shared variables are handled by passing them as additional parameters to every
function. Treating every shared variable as a parameter is correct but inefficient, as
it increases the number of nodes. We can reduce the number of parameters passed by
performing interprocedural analysis and using the GMOD and GREF information [42].
1. GMOD(P): the set of variables that might be modified by P itself or by a procedure
(transitively) called from P.

2. GREF(P): the set of variables that might be referenced by P itself or by a procedure
(transitively) called from P.
Figure 2.8: Slicing the System Dependence Graph (nodes marked in phase 1 and phase 2)
Algorithm 3 Two phase slicing algorithm (Krinke’s version)

input: G = (N, E), the given SDG; s ∈ N, the slicing criterion
output: S ⊆ N, the slice
W_up = {s}, W_down = ∅, S = {s}
{First phase}
while W_up ≠ ∅ do
    remove one element n from W_up
    for all m → n ∈ E do
        if m ∉ S then
            if m → n is a parameter-out edge then
                W_down = W_down ∪ {m}
                S = S ∪ {m}
            else
                W_up = W_up ∪ {m}
                S = S ∪ {m}
            end if
        end if
    end for
end while
{Second phase}
while W_down ≠ ∅ do
    remove one element n from W_down
    for all m → n ∈ E do
        if m ∉ S then
            if m → n is not a parameter-in edge or a call edge then
                W_down = W_down ∪ {m}
                S = S ∪ {m}
            end if
        end if
    end for
end while
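The two phases can be sketched directly over a simplified SDG. The edge kinds and the incoming-adjacency map below are illustrative assumptions, not the SOOT-based structures used in the implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Two-phase backward slice over a simplified SDG representation.
public class TwoPhaseSlicer {
    public enum Kind { CONTROL, DATA, SUMMARY, CALL, PARAM_IN, PARAM_OUT }
    public static class Edge {
        public final int src; public final Kind kind;
        public Edge(int src, Kind kind) { this.src = src; this.kind = kind; }
    }

    // incoming.get(n) lists the edges m -> n as (m, kind) pairs.
    public static Set<Integer> slice(Map<Integer, List<Edge>> incoming, int criterion) {
        Set<Integer> S = new HashSet<>(Set.of(criterion));
        Deque<Integer> up = new ArrayDeque<>(List.of(criterion));
        Deque<Integer> down = new ArrayDeque<>();
        while (!up.isEmpty()) {                  // phase 1: nodes reached through a
            int n = up.poll();                   // parameter-out edge are deferred
            for (Edge e : incoming.getOrDefault(n, List.of())) {
                if (!S.add(e.src)) continue;
                if (e.kind == Kind.PARAM_OUT) down.add(e.src);
                else up.add(e.src);
            }
        }
        while (!down.isEmpty()) {                // phase 2: never ascend via call
            int n = down.poll();                 // or parameter-in edges
            for (Edge e : incoming.getOrDefault(n, List.of())) {
                if (e.kind == Kind.CALL || e.kind == Kind.PARAM_IN) continue;
                if (S.add(e.src)) down.add(e.src);
            }
        }
        return S;
    }
}
```

On a chain criterion ← data ← param-out ← data ← param-in, the traversal descends through the parameter-out edge in phase 2 but stops at the parameter-in edge, mirroring the matched-call-and-return restriction.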
GMOD and GREF sets are used to determine which parameter vertices are included
in the procedure dependence graphs. At procedure entry, the following nodes are inserted:

1. A formal-in node for each variable in GMOD(P) ∪ GREF(P)

2. A formal-out node for each variable in GMOD(P)

Similarly, at a call site, the following nodes are inserted:

1. An actual-in node for each variable in GMOD(P) ∪ GREF(P)

2. An actual-out node for each variable in GMOD(P)
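The rules above can be captured in a small helper that derives the parameter vertices for a procedure from its GMOD and GREF sets; the set-of-variable-names representation is an illustrative assumption, not the representation used in the thesis.

```java
import java.util.HashSet;
import java.util.Set;

// Derives the parameter vertices needed for a procedure P from its
// GMOD(P) and GREF(P) sets.
public class ParamVertices {
    // Formal-in (and actual-in) vertices: one per variable in GMOD(P) ∪ GREF(P)
    public static Set<String> inVertices(Set<String> gmod, Set<String> gref) {
        Set<String> r = new HashSet<>(gmod);
        r.addAll(gref);
        return r;
    }

    // Formal-out (and actual-out) vertices: one per variable in GMOD(P) only
    public static Set<String> outVertices(Set<String> gmod) {
        return new HashSet<>(gmod);
    }
}
```

For the program of Figure 2.6, a procedure that modifies sum and only reads i would get in-vertices for both variables but an out-vertex for sum alone.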
2.3 Slicing Object Oriented Programs
The System Dependence Graph (SDG) is not sufficient to represent all dependencies
in object oriented programs. An efficient graph representation of an object oriented
program should employ a class representation that can be reused in the construction of
other classes and of applications that use the class. Section 2.3.1 discusses dependence
graph representations for object oriented programs. Sections 2.3.2 and 2.3.3 discuss
inheritance and polymorphism, respectively.
2.3.1 Dependence Graph for Object Oriented Programs
The dependencies within a single method are represented using a Method Dependence
Graph (MDG), which is composed of a data dependence subgraph and a control dependence
subgraph. The MDG has a method entry node, which represents the start of the method.
The method entry vertex has a formal-in vertex for every formal parameter and a formal-out
vertex for each formal parameter that may be modified. Each call site has a call vertex
and a set of actual parameter vertices: an actual-in vertex for each actual parameter at
the call site and an actual-out vertex for each actual parameter that may be modified
by the called procedure. Parameter-out edges are added from each formal-out node to
the corresponding actual-out node. The effects of return statements are modeled by
connecting the return statement to its corresponding call vertex using a parameter-out
edge. Summary edges are added from actual-in to actual-out nodes as described in
Section 2.2.3.
Larsen and Harrold [66] represent the dependencies in a class using the class dependence
graph (ClDG). A ClDG is a collection of MDGs constructed for the individual
methods in the program. In addition, it contains a class entry vertex that is connected to
the method entry vertex of each method in the class by a class member edge. Class entry
vertices and class member edges let us track dependencies that arise due to interactions
among classes.
In the presence of multiple classes, additional dependence edges are required to record
the interaction between classes. For example, when a class C1 creates an object of class
C2, there is an implicit call to C2’s constructor. When there is a call site in method m1
of class C1 to method m2 of class C2, there is a call dependence edge from the call site
in m1 to the method start vertex of m2. Parameter-in edges are added from each actual-in
node to the corresponding formal-in node, and parameter-out edges are added from each
formal-out node to the corresponding actual-out node.
In object oriented programs, data dependence computation is complicated by the
fact that statements can read from and write to fields of objects, i.e., a statement can
have side effects. Computation of side effect information requires points-to analysis and is
further discussed in Chapter 3. Also, methods can be invoked on objects, and objects can
be passed as parameters; an algorithm for computing data dependence must take this
into account.
Handling Objects at Call Sites

In the presence of a function call invoked on an object, such as o.m1(), the function call can
modify the data members of o. Larsen and Harrold observe that the data member variables
of a class are accessible to all methods of the class and hence can be treated as global
variables. They use additional parameters to represent the data members referenced by a
method. Thus the data dependence introduced by two consecutive method calls via data
class Base {
    int a, b;
    protected void vm() {
        a = a + b;
    }
    public Base() {
        a = 0;
        b = 0;
    }
    public void m2(int i) {
        b = b + i;
    }
    public void m1() {
        if (b > 0) vm();
        b = b + 1;
    }
    public void main1() {
        Base o = new Base();
        Base ba = new Base();
        ba.m1();
        ba.m2(1);
        o.m2(1);
    }
    public void C(Base ba) {
        ba.m1();
        ba.m2(1);
    }
    public void D() {
        Base o = new Base();
        C(o);
        o.m1();
    }
}

class Derived extends Base {
    long d;
    public void vm() {
        d = d + b;
    }
    public Derived() {
        super();
        d = 0;
    }
    public void m3() {
        d = d + 1;
        m2(1);
    }
    public void m4() {
        m1();
    }
    public void main2() {
        int i = read();
        Base p;
        if (i > 0)
            p = new Base();
        else
            p = new Derived();
        C(p);
        p.m1();
    }
}

Figure 2.9: Program
Figure 2.10: The Dependence Graph for the main function (from [67])
Figure 2.11: The Dependence Graphs for functions C() and D() (from [67])
member variables can be represented as data dependences between the actual parameters
at the method call sites. Figure 2.10 shows the dependence graph constructed for the
main program of Figure 2.9. Variables a and b are treated as global variables shared
across the methods m1(), m2() and Base(). The data member variables are treated as
additional parameters that are passed to the function. This method of slicing includes
only those statements that are necessary for the data members at the slicing criterion to
receive correct values. For example, slicing with respect to the node b = b_out associated
with the statement o.m2() will exclude statements that assign to data member a.

One source of imprecision in this method is that it does not consider the fact that
data members may belong to different objects, and it creates spurious dependencies between
data members of different objects. In the above example, the slice wrongly includes the
statements ba.m1() and ba.m2(). Liang and Harrold [67] give an improved algorithm for
object-sensitive slicing.
In the dependence graph representation of [67], the constructor has no formal-in
vertices for the instance variables, since these variables cannot be referenced before they
are allocated by the class constructor. Thus the algorithm omits formal-in vertices
for instance variables in the class constructor. In the approaches of [66] and [67], the data
members of the class are treated as additional parameters to be passed to the function.
This increases the number of parameter nodes. The number of additional nodes can
be reduced using GMOD/GREF information: actual-out and formal-out vertices are
needed only for those data members that are modified by the member function, and
actual-in and formal-in vertices are needed only for those data members accessed by the
function.
Handling Parameter Objects
Tonella [59] represents an object as a single vertex when the object is used as a parameter.
This representation can lead to imprecise slices because it considers a modification (or
access) of an individual field of an object to be a modification (or access) of the entire
object. For example, if the slicing criterion is o.b at the end of D() (in Figure 2.9), then
C(o) must be included. This in turn causes the slicer to include the parameter ba,
which causes ba.a and ba.b to be included, though ba.a does not affect o.b. To overcome
this limitation, Liang and Harrold [67] expand the parameter object into a tree. Figure
2.11 shows the parameter ba being expanded into a tree. At the first level, the node
representing ba is expanded into two nodes, Base and Derived, each representing a type
that ba can possibly have. At the next level, each node is expanded into its constituent data
members. Since data members can themselves be objects, the expansion is done recursively
until we reach primitive data types. In the presence of recursive data types, where the tree
height can be infinite, k-limiting is used to limit the height of the tree to k. At the call
statement C(o) in Figure 2.9, the parameter object o is expanded into its data members.
At the function call, actual-in and actual-out vertices are created for the data members
of o. Summary edges are added between the actual-in and actual-out vertices if a
dependence is possible through the called procedure.
2.3.2 Handling Inheritance
Java provides a single inheritance model, which means that a new Java class can be
designed that inherits state variables and functionality from an existing class. The
functionality of base class methods can be overridden by simply redefining the methods
in the derived class. Larsen and Harrold [66] construct dependence graph representations
for the methods defined by the derived class. The representations of all methods that
are inherited from superclasses are simply reused. To construct the dependence graph
representation of class Derived (Figure 2.9), new representations are constructed for
methods such as m3() and m4(). The representation of m1() is reused from class Base.
Liang and Harrold [67] illustrate that in the presence of virtual methods, it is not
possible to directly reuse the representations of the methods of the superclass. For example,
we cannot directly reuse the representation of m1() in class Base when we construct
the representation for class Derived. In the Base class, the call statement vm() in
m1() resolves to Base::vm(). If a class derived from Base redefines vm(), then the call
statement vm() no longer resolves to Base::vm(), but to the newly defined vm() of the
derived class. The call sites in the representation of m1() for class Derived therefore have
to be changed. A method needs a new representation if

1. the method is declared in the new class, or

2. the method is declared in a superclass and calls a newly redefined virtual method
directly or indirectly.

For example, the methods declared in Derived need a new representation because they
satisfy (1); Base.m1() also needs a new representation because it satisfies (2):
Base.m1() calls Derived.vm(), which is redefined in class Derived.
Handling Interfaces
In Java, interfaces declare methods but leave the responsibility of defining them to
the concrete classes that implement the interface. Interfaces allow the programmer to work
with objects through the interface behavior they implement, rather than through their
class definition.
Single Interfaces
We use the interface representation graph [58] to represent a Java interface and the
classes that implement it. There is a unique vertex, called the interface start
vertex, for the entry of the interface. Each method declaration in the interface can be
regarded as a call to its corresponding method in a class that implements the interface, and
therefore a call vertex is created for each method declaration in the interface. The interface
start vertex is connected to each such call vertex by an interface membership
dependence arc. If more than one class implements the interface, we
connect a method call in the interface to every corresponding method that implements it
in those classes.
Extending Interfaces Similar to extending classes, the representation of an extended
interface is constructed by reusing the representations of all methods that are inherited
from superinterfaces. For newly defined methods in the extended interface, new
representations are created.
ie1 interface A {
c1 void method1(int h);
c2 void method2(int v);
}
ie3 interface B extends A {
c4 void method3(int u);
}
ce5 class C1 implements A {
s6 int h, v;
e7 public void method1(int h1) {
s8 this.h = h1;
}
e9 public void method2(int v1) {
s10 this.v = v1;
}
}
ce11 class C2 implements A {
s12 int h, v;
e13 public void method1(int h2) {
s14 this.h = h2+1;
}
e16 public void method2(int v2) {
s17 this.v = v2+1;
}
}
ce18 class C3 implements B {
s19 int h, v, u;
e20 public void method1(int h1) {
s21 this.h = h1+2;
}
e22 public void method2(int v1) {
s23 this.v = v1+2;
}
e24 public void method3(int u1) {
s25 this.u = u1+2;
}
}
Figure 2.12: Interface Dependence Graph (from [58])
2.3.3 Handling Polymorphism
In Java, method calls are bound to their implementations at runtime. A method invocation
expression such as o.m(args) is executed as follows:

1. The runtime type T of o is determined.

2. The class T is loaded, if it is not already loaded.

3. T is checked for an implementation of method m. If T does not define one, the
search proceeds to T’s superclass, and then upwards through the superclass chain,
until an implementation is found.

4. Method m is invoked with the argument list args, and o is also passed to the method,
where it becomes the this value for method m.
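The effect of this lookup is ordinary Java dynamic dispatch, as in the small example below; the class names are illustrative, not the classes of Figure 2.9.

```java
// Dynamic dispatch: the method executed depends on the runtime type of
// the receiver, not on its declared type.
class Animal {
    String m() { return "Animal.m"; }
}

class Cat extends Animal {
    @Override
    String m() { return "Cat.m"; }
}

public class Dispatch {
    // The declared type of o is Animal; the implementation actually
    // invoked is chosen at runtime from o's dynamic class.
    static String call(Animal o) { return o.m(); }
}
```

A static analysis that only looks at the declared type of o would resolve the call to Animal.m; this is exactly why a points-to analysis is needed to build a precise call graph.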
A polymorphic reference can refer to instances of more than one class. A class
dependence graph represents such a polymorphic method call using a polymorphic
choice vertex [66]. A polymorphic choice vertex represents the selection of a particular
call from a set of possible destinations. In this method, a message sent to a polymorphic
object is represented as a set of call sites, one for each candidate message handling method,
connected to a polymorphic choice vertex by polymorphic choice edges. This approach
may give incorrect results: in function main2() (Figure 2.9), Larsen’s approach uses only
one call site to represent the statement p.m1(), because m1() is declared only in Base.
However, when m1() is called on an object of class Derived, it invokes Derived.vm() to
modify d, and when m1() is called on an object of class Base, it invokes Base.vm() to
modify a. One call site cannot precisely represent both cases. This approach also computes
spurious dependences: it is equivalent to using several objects, each belonging to a
different type, to represent a polymorphic object. The data dependence construction
algorithm cannot distinguish data members with the same names in these different objects.
Liang and Harrold [67] give an improved method of representing polymorphism to
overcome this limitation. A polymorphic object is represented as a tree: the root of the
tree represents the polymorphic object and the children of the root represent objects of
the possible types. When the polymorphic object is used as a parameter, the children
are further expanded into trees; when the polymorphic object receives a message, the
children are further expanded into call sites. In Figure 2.11, the call site ba.m1() can have
receiver types Base and Derived; thus the call site is expanded into two call sites, one for
each type of receiver.
2.3.4 Case Study - Elevator Class and its Dependence Graph
Figure 2.13 shows the Elevator program and the slice with respect to line 59. Figure
2.14 shows the class dependence graph constructed for the program. The C++ Elevator
class discussed in [72] has been modified for Java.
1  class Elevator {
2    static int UP = 1, DOWN = -1;
3    public Elevator(int t) {
4      current_floor = 1;
5      current_direction = UP;
6      top_floor = t;
7    }
8    public void up() {
9      current_direction = UP;
10   }
11   public void down() {
12     current_direction = DOWN;
13   }
14   int which_floor() {
15     return current_floor;
16   }
17   public int direction() {
18     return current_direction;
19   }
20   public void go(int floor) {
21     if (current_direction == UP) {
22       while (current_floor != floor
23              && current_floor <= top_floor)
24         current_floor = current_floor + 1;
25     }
26     else {
27       while (current_floor != floor
28              && current_floor > 0)
29         current_floor = current_floor - 1;
30     }
     }
31   int current_floor;
32   int current_direction;
33   int top_floor;
34 }

35 class AlarmElevator extends Elevator {
36   public AlarmElevator(int top_floor) {
37     super(top_floor);
38     alarm_on = 0;
39   }
40   public void set_alarm() {
41     alarm_on = 1;
42   }
43   public void reset_alarm() {
44     alarm_on = 0; }
45   public void go(int floor) {
46     if (!alarm_on)
47       super.go(floor);
48   }
49   protected int alarm_on;
50 }

51 class Test {
52   public static void main(String args[]) {
53     Elevator e;
54     if (condition)
55       e = new Elevator(10);
56     else
57       e = new AlarmElevator(10);
58     e.go(5);
59     System.out.print(e.which_floor());
60   }
61 }

Figure 2.13: The Elevator program
Figure 2.14: Dependence Graph for the Elevator program
Chapter 3
Points to Analysis
In this chapter we first discuss the need for points-to analysis. In the context of slicing,
points-to analysis is essential for the correct computation of data dependencies and for the
construction of the call graph. We summarize some issues related to computing points-to
sets, including methods for their computation and various factors that affect precision.
We next describe Andersen’s algorithm for pointer analysis for C and its adaptation
to Java. We then describe a new method for intraprocedural alias analysis which is an
improvement over flow insensitive analysis, but not as precise as a flow sensitive analysis.
3.1 Need for Points to Analysis
The goal of pointer analysis is to statically determine the set of memory locations that
can be pointed to by a pointer variable. If two variables can access the same memory
location, the variables are said to be aliased. Alias analysis is needed for program analysis
and optimization, and for the correct computation of the data dependences required for
slicing. Consider the computation of data dependence in Figure 3.1. Here the statement
print(y.a) is dependent on x.a = ..., since x and y are aliased due to the execution
of the statement y = x. Without alias analysis, it is not possible to infer that statement 7
is dependent on statement 4.
A points to graph gives information about the set of memory locations pointed at by
Chapter 3. Points to Analysis 39
1 void fun() {
2   obj x, y;
3   x = new obj(); // O1 represents the object allocated
4   x.a = ...;
5   ... = y.a;
6   y = x;
7   print(y.a);
8 }
Figure 3.1: Need for Points to Analysis
each variable. Figure 3.2 shows example programs and their associated points-to graphs.
In C a variable can point to another stack variable or to dynamically allocated memory
on the heap, whereas in Java a reference variable can point only to objects allocated on
the heap, since stack variables cannot be pointed to due to the lack of an address-of
operator (&). Dynamically allocated memory locations on the heap are not named. One
convention is to refer to objects (memory locations) by the statement at which they are
created. A statement can be executed many times and can therefore create a new object
each time; thus approximations are introduced into the points-to graph when this
convention is used. Another cause of approximation is the presence of recursion and
dynamic memory allocation, which leads to a statically unbounded number of memory
locations.
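For example, under the allocation-site naming convention, both objects created by the loop below are represented by the single abstract object for site h1, so an analysis must conservatively assume that any two references to that abstract object may denote the same location.

```java
// Under allocation-site abstraction, every object created at site h1 is
// modelled by one abstract object, even though each loop iteration
// allocates a distinct concrete object at runtime.
public class AllocSite {
    static Object[] twoFromOneSite() {
        Object[] cells = new Object[2];
        for (int i = 0; i < 2; i++) {
            cells[i] = new Object();   // h1: a single abstract object in the analysis
        }
        return cells;
    }
}
```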
3.2 Pointer Analysis using Constraints
Our aim is to derive the points-to graph from the program text. One method to derive
the points-to graph uses constraints [64]. If pts(q) denotes the set of objects initially
pointed to by q, then after an assignment such as p = q, p can additionally point to those
objects initially pointed to by q. Thus we have the constraint pts(p) ⊇ pts(q). Every
statement in the program has an associated constraint, and a solution to the constraints
gives the points-to set associated with every variable.

Constraints such as pts(p) ⊇ pts(q) are also called subset constraints or inclusion-based
constraints. Andersen uses subset constraints for analyzing C programs; his
algorithm is described in Section 3.4.
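A minimal sketch of solving such subset constraints by fixed-point iteration, restricted to allocation statements (p = new O, which seeds O into pts(p)) and copy statements (p = q, which imposes pts(p) ⊇ pts(q)); field accesses and calls are omitted.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Fixed-point solver for subset (inclusion) constraints over two
// statement forms only; a full Andersen-style analysis also needs
// constraints for field reads/writes and calls.
public class SubsetConstraints {
    public static Map<String, Set<String>> solve(List<String[]> newStmts,    // {var, allocSite}
                                                 List<String[]> copyStmts) { // {lhs, rhs}
        Map<String, Set<String>> pts = new HashMap<>();
        for (String[] s : newStmts)                     // seed: p = new O
            pts.computeIfAbsent(s[0], k -> new HashSet<>()).add(s[1]);
        boolean changed = true;
        while (changed) {                               // iterate until no set grows
            changed = false;
            for (String[] s : copyStmts) {              // apply pts(lhs) ⊇ pts(rhs)
                Set<String> lhs = pts.computeIfAbsent(s[0], k -> new HashSet<>());
                Set<String> rhs = pts.getOrDefault(s[1], Set.of());
                if (lhs.addAll(rhs)) changed = true;
            }
        }
        return pts;
    }
}
```

For the program of Figure 3.1, seeding x with O1 and processing the copy y = x propagates O1 into pts(y).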
int a = 1, b = 2;
int *p, *q;
void *r, *s;
p = &a;
q = &b;
h1: r = malloc(...);
h2: s = malloc(...);

Points to graph for a C program

class Obj { int f; }
Obj r, s, t;
h1: r = new Obj();
h2: s = new Obj();
h3: r.f = new Obj();
t = s;

Points to graph for a Java program

Figure 3.2: Points to Graphs
Subset vs Unification Constraints
The constraints generated can be either subset based or equality based. A subset
constraint such as pts(p) ⊇ pts(q) says that the points-to set of p contains the points-to set
of q. Instead of subset constraints, Steensgaard [13] uses equality-based constraints,
where after each assignment p = q the points-to sets of p and q are unified, i.e., the
points-to sets of both variables are made identical.

Steensgaard’s approach is based on a non-standard type system, where a type does not
refer to a declared type in the program source. Instead, the type of a variable describes
the set of locations possibly pointed to by the variable at runtime. At initialization each
variable is described by a different type. When two variables can point to the same memory
location, the types of the variables are merged. The stronger constraints, however,
make the analysis less precise. The equality-based approach is also called
unification because it treats assignments as bidirectional. This unification merges the
points-to sets of both sides of the assignment; it essentially computes an equivalence
relation defined by the assignments, which is done with the fast union-find algorithm [22].
If all the variables can be assigned types subject to the constraints, then the system
of constraints is said to be satisfiable, or well-typed. Points-to analysis then reduces to
the problem of assigning types to all locations (variables) in a program such that the
program is well-typed. At the end of the analysis, two locations are
assigned different types unless they must be described by the same type in order for
the system of constraints to be well-typed.
3.3 Dimensions of Precision
The various factors that contribute to the precision of the analysis computed are flow
sensitivity, field sensitivity, context sensitivity and heap modelling. Ryder [17] discusses
various parameters that contribute to the precision of the analysis.
Flow Sensitive vs Flow Insensitive approach
A flow sensitive analysis takes into account the control flow structure of the program.
Thus the points-to set associated with a variable is dependent on the program point. It
computes the mapping variable × program point → memory location. This is precise
but requires a large amount of memory since the points to sets of the same variable at
two different program points may be different and their points-to sets have to be recorded
separately. Flow sensitive analysis allows us to take advantage of strong updates, where
after a statement x = ..., the points to information about x prior to that statement can
be removed.
A flow insensitive approach computes conservative information that is valid at all
program points. It considers the program as a set of statements and computes points-to
information ignoring control flow. Flow insensitive analysis computes a single points to
relation that holds regardless of the order in which assignment statements are actually
executed.
A flow insensitive analysis produces imprecise results. Consider the computation of
data dependence for the program in Figure 3.1. If we apply flow insensitive alias anal-
ysis, then the analysis will conclude that x and y can both point to O1 , and thus the
statement ... = y.a (line 5) is made dependent on x.a = ... . But y can point to O1
only after the statement y = x. Thus flow insensitive analysis leads to spurious data
dependence.
Field Sensitivity
Aggregate objects such as structures can be handled by one of three approaches: field-
insensitive, where field information is discarded by modeling each aggregate with a single
constraint variable; field-based, where one constraint variable models all instances of a
field; and finally, field-sensitive, where a unique variable models each field instance of an
object. The following table describes these approaches for the code segment
x.a = new object();
y.b = x.a ;
field-based         pts(b) ⊇ pts(a)
field-insensitive   pts(y) ⊇ pts(x)
field-sensitive     pts(y.b) ⊇ pts(x.a)
Heap Abstraction
Two variables are aliased if they can refer to the same object in memory. Thus we need
to keep track of objects that can be present at runtime. The objects created at runtime
cannot be determined statically and have to be conservatively approximated. The least
precise manner is to consider the entire heap as a single object. The most common man-
ner of abstraction is to have one abstract object per program point. This abstract object
is a representative of all the objects that can be created at runtime due to that program
main() {
    object a, b, c, d;
    a = new object();   // pts(a) ⊇ {o1}
    b = new object();   // pts(b) ⊇ {o2}
    c = id(a);          // pts(r) ⊇ pts(a), pts(c) ⊇ pts(r)
    d = id(b);          // pts(r) ⊇ pts(b), pts(d) ⊇ pts(r)
}

object id(object r) {
    return r;
}
Figure 3.3: Imprecision due to context insensitive analysis
point. A more precise abstraction is to take context sensitivity into account using the
calling context to distinguish between various objects created at the same program point.
Context Sensitivity
A context sensitive analysis distinguishes between different calling contexts and does not
merge data flow information from multiple contexts. In Figure 3.3, a and b point to o1
and o2 respectively. Due to the function calls, c is made to point to o1 and d is made
to point to o2. So the actual points-to sets are a → o1, b → o2, c → o1 and d → o2. A
context insensitive analysis models parameter bindings as explicit assignments. Thus r
points to both the objects o1 and o2. This leads to smearing of information making c
and d point to both o1 and o2.
One method to incorporate context sensitivity is to summarize each procedure and
embed that information at the call sites. A method can change the points to sets of
all data reachable through static variables, incoming parameters and all objects created
by the method and its callees. A method’s summary must include the effect of all the
updates that the function and all its callees can make, in terms of incoming parameters.
Thus summaries are huge. There is a further difficulty due to the callback mechanism.
In the presence of dynamic binding, we do not know which method will be called, making
it difficult to summarize the method [1].
Another method to incorporate context sensitivity is the cloning based approach.
Cloning based approaches expand the call graph for each calling context, so that there
is a separate path for each calling context. A context insensitive algorithm can then be
run on the expanded graph, but this leads to an exponential blowup. Whaley and Lam
[18] use Binary Decision Diagrams (BDDs) to handle the exponential increase in
complexity caused by cloning. BDDs were first used for pointer analysis by Berndl
et al. [31]. Milanova et al. [20] introduce object sensitivity, which is a form of context
sensitivity: instead of using the call stack to distinguish different contexts, they use the
receiver object.
3.4 Andersen’s Algorithm for C
Andersen proposed a flow insensitive, context insensitive version of points to analysis
for C. His analysis modeled the heap using a separate concrete location to represent all
memory allocated at a given dynamic allocation site. The implementation expressed the
analysis using subset constraints and then solved the constraints.
Andersen’s algorithm [64] models the points to relations as subset constraints. After a
statement such as p=q, p additionally points to those objects, which are initially pointed
by q. Thus we have the constraint pts(p) ⊇ pts(q). The list of constraints for C is given
in Table 3.1.
p = &x     x ∈ pts(p)
p = q      pts(p) ⊇ pts(q)
p = ∗q     ∀x ∈ pts(q), pts(p) ⊇ pts(x)
∗p = q     ∀x ∈ pts(p), pts(x) ⊇ pts(q)
Table 3.1: Constraints for C
Constraints are represented using a constraint graph. Each node N in the constraint
graph represents a variable and is annotated with pts(N), the set of objects the variable
can point to. A statement such as p = &x initializes pts(p) to {x}. Each edge q → p
represents that p can point to whatever q can point to.
Solving the constraints involves propagating points to information along the edges.
As the points to information associated with the node changes, new edges may be added
due to statements p = ∗q and ∗p = q. The statement p = ∗q creates an edge from each
variable in pts(q) to p. The statement ∗p = q creates an edge from q to each variable in
pts(p).
An iterative algorithm is used to compute the points to sets until a fixed point is
reached. This is equivalent to computing the transitive closure of the graph and has
complexity O(n^3), as discussed in [14].
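The edge-adding and propagation steps above can be sketched as follows (illustrative class and variable names, not Andersen's original implementation): loads and stores repeatedly add copy edges derived from the current points-to sets, and sets are propagated along copy edges until nothing changes.

```java
import java.util.*;

// Sketch of Andersen's subset-based analysis for the four C statement
// forms of Table 3.1, solved by iterating to a fixed point.
public class Andersen {
    Map<String, Set<String>> pts = new HashMap<>();
    Map<String, Set<String>> copyEdges = new HashMap<>(); // q -> {p} for each p = q
    List<String[]> loads = new ArrayList<>();   // p = *q stored as {p, q}
    List<String[]> stores = new ArrayList<>();  // *p = q stored as {p, q}

    Set<String> pts(String v) { return pts.computeIfAbsent(v, k -> new HashSet<>()); }
    void addrOf(String p, String x) { pts(p).add(x); }                 // p = &x
    void copy(String p, String q) {                                    // p = q
        copyEdges.computeIfAbsent(q, k -> new HashSet<>()).add(p);
    }
    void load(String p, String q) { loads.add(new String[]{p, q}); }   // p = *q
    void store(String p, String q) { stores.add(new String[]{p, q}); } // *p = q

    void solve() {
        boolean changed = true;
        while (changed) {
            changed = false;
            // p = *q adds an edge from every x in pts(q) to p
            for (String[] l : loads)
                for (String x : new ArrayList<>(pts(l[1])))
                    copy(l[0], x);
            // *p = q adds an edge from q to every x in pts(p)
            for (String[] s : stores)
                for (String x : new ArrayList<>(pts(s[0])))
                    copy(x, s[1]);
            // propagate along copy edges: pts(p) grows to include pts(q)
            for (var e : new ArrayList<>(copyEdges.entrySet()))
                for (String dst : e.getValue())
                    changed |= pts(dst).addAll(pts(e.getKey()));
        }
    }

    public static void main(String[] args) {
        Andersen an = new Andersen();
        an.addrOf("p", "a");  // p = &a
        an.addrOf("q", "b");  // q = &b
        an.addrOf("r", "p");  // r = &p
        an.load("s", "r");    // s = *r
        an.store("r", "q");   // *r = q
        an.solve();
        if (!an.pts("p").equals(Set.of("a", "b"))) throw new AssertionError();
        if (!an.pts("s").equals(Set.of("a", "b"))) throw new AssertionError();
    }
}
```

In the example, the store ∗r = q funnels pts(q) into p through r, so the solver derives pts(p) = {a, b} even though p = &a is the only direct assignment to p.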
3.5 Andersen’s Algorithm for Java
3.5.1 Model for references and heap objects
It is impossible for two locals to be aliased in Java, since there is no mechanism that
allows another variable to refer/point to a local variable on the stack. The following memory
model is discussed in [1]:
1. Certain variables are references to T, where T is a declared type. These variables are
either static or live on the runtime stack.
2. There is a heap of objects. All variables point to heap objects, not to other variables.
3. A heap object can have fields, and the value of a field can be a reference to a heap
object.
In Java, aliases arise due to assignments (either explicit in case of assignment state-
ment or implicit in case of actual to formal parameters binding occurring in method
calls). The following are the effects of various statements on the points to graph.
1. Object creation: h : T v = new T () : This statement creates a new heap object
denoted by h and makes the variable v point to h. All objects created at line h are
represented by a representative abstract object named h.
2. Copy statement: v = w : The statement makes v point to whatever heap objects
w currently points to.
3. Field Store : v.f = w : The type of object that v points to must have a field f and
this field must be of some reference type. Let h denote an object pointed to by v.
This statement makes the field f in h point to whatever heap objects w currently
points to.
4. Field Load: v = w.f : Here w is a variable pointing to some heap object that has a
field f and f points to some heap object h. The statement makes variable v point
to h.
5. Cast statement: Points to analysis in Java can take advantage of type safety. A
reference variable can only point to objects of its declared type or a subtype of it. A cast
statement of the form p = (T) q causes the pointer stored in the variable q to be
assigned to the variable p, provided that the type of the target of the pointer is a
subtype of T. Only objects oi ∈ pts(q) whose type typeof(oi) is a subtype
of T are added to pts(p).
6. Method Invocation: l = r0.m(r1, r2, ...rn):
Using the call graph, the call targets of m are found. Call graph construction is
discussed in Section 3.6. The following implicit assignments are created due to
parameter bindings.
(a) The formal parameters of m are assigned the objects pointed to by actual
parameters. The actual parameters include not just the parameters passed in
directly, but also the receiver object itself. Every method invocation assigns
the receiver object to the this variable.
(b) The returned object of m is assigned to the lhs variable l of the assignment
statement.
3.5.2 Computation of points to sets in SPARK
Lhotak [70] describes Andersen’s algorithm adapted for Java. Lhotak’s algorithm forms
the basis of SPARK, a part of the Soot framework. The constraints for Java are given
in Table 3.2.
p = new object()    o1 ∈ pts(p), where o1 is the representative object
q = p               pts(q) ⊇ pts(p)
q = p.f             ∀o ∈ pts(p), pts(q) ⊇ pts(o.f)
q.f = p             ∀o ∈ pts(q), pts(o.f) ⊇ pts(p)
Table 3.2: Constraints for Java
In SPARK, the constraints are represented using the constraint graph. A node represents
either an object allocation such as oi, a variable v, or a field dereference such as a.f.
1. Allocation node: Runtime objects may be grouped based on allocation site or
based on their runtime type.
2. Variable node: Variable nodes represent the local variables and parameters of a
method; they are also used to represent static fields, and may be used to represent
instance fields if instances of a field are being modeled together in a field-based
analysis.
3. Field reference node: A field reference node p.f represents field f of the object
pointed to by the base variable p.
Each node n has an associated set pts(n) which denotes the set of objects it can
point to. An assignment statement q = p creates an assignment edge from p → q. A
store statement q.f = p creates a store edge p → q.f . A load statement q = p.f creates a
load edge p.f → q. An allocation statement p = new object(); initializes pts(p) to {o1}.
The points to sets are propagated as given in Algorithm 4 which is due to Lhotak [70].
Algorithm 4 Lhotak’s algorithm for computing points-to sets
initialize sets according to allocation edges
repeat
    propagate sets along each assignment edge p → q
    for each load statement p.f → q do
        for each a ∈ pts(p) do
            propagate sets pts(a.f) → pts(q)
        end for
    end for
    for each store edge p → q.f do
        for each a ∈ pts(q) do
            propagate sets pts(p) → pts(a.f)
        end for
    end for
until no changes
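A minimal executable rendering of this propagation step is sketched below (the class and field names are hypothetical; SPARK's real implementation uses typed node classes and specialized set representations). Variables and field instances o.f carry points-to sets, and load/store edges are resolved through the objects their base variable points to.

```java
import java.util.*;

// Sketch of the propagation step for the Java constraints of Table 3.2.
// Field instances are keyed as "object.field" strings for brevity.
public class JavaPts {
    Map<String, Set<String>> pts = new HashMap<>();      // var or "o.f" -> objects
    Map<String, Set<String>> assigns = new HashMap<>();  // p -> {q} for each q = p
    List<String[]> loads = new ArrayList<>();            // q = p.f as {q, p, f}
    List<String[]> stores = new ArrayList<>();           // q.f = p as {q, f, p}

    Set<String> pts(String n) { return pts.computeIfAbsent(n, k -> new HashSet<>()); }
    void alloc(String p, String o) { pts(p).add(o); }    // p = new ... at site o

    void solve() {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (var e : assigns.entrySet())             // q = p
                for (String q : e.getValue())
                    changed |= pts(q).addAll(pts(e.getKey()));
            for (String[] l : loads)                     // q = p.f
                for (String o : new ArrayList<>(pts(l[1])))
                    changed |= pts(l[0]).addAll(pts(o + "." + l[2]));
            for (String[] s : stores)                    // q.f = p
                for (String o : new ArrayList<>(pts(s[0])))
                    changed |= pts(o + "." + s[1]).addAll(pts(s[2]));
        }
    }

    public static void main(String[] args) {
        JavaPts a = new JavaPts();
        a.alloc("r", "h1");                              // r = new Obj()
        a.alloc("x", "h3");                              // x = new Obj()
        a.stores.add(new String[]{"r", "f", "x"});       // r.f = x
        a.assigns.computeIfAbsent("r", k -> new HashSet<>()).add("t"); // t = r
        a.loads.add(new String[]{"y", "t", "f"});        // y = t.f
        a.solve();
        if (!a.pts("t").equals(Set.of("h1"))) throw new AssertionError();
        if (!a.pts("y").equals(Set.of("h3"))) throw new AssertionError();
    }
}
```

The example shows the indirection the field rules capture: y = t.f picks up h3 because t aliases r, so the store r.f = x and the load y = t.f are matched through the shared abstract object h1.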
3.6 CallGraph Construction
Computation of the call graph is necessary for points to set computation because the call
graph establishes parameter bindings. This section describes how call targets are com-
puted in SPARK for various method call statements in Jimple.
1. invokestatic: This statement occurs when there is a call to a static method. The
target method of this statement is known at compile time.
2. invokespecial : In Java, invokespecial is used to invoke (a) instance initialization
methods, (b) private methods, and (c) superclass methods. The target method is known
at compile time.
3. invokevirtual : To compute the call targets of a statement r0.m(r1, r2..., rn), the
types of the receiver (i.e. the types of objects pointed to by r0) need to be computed.
This is described in Section 3.6.1. If C represents a receiver type, the algorithm
checks for m() in the declared class C. If the method is not found, the class
hierarchy is traversed until a superclass is found which declares a method with the
same signature as m().
4. invokeinterface: This statement occurs when a virtual method is invoked on an
interface. The handling of this statement is similar to invokevirtual.
3.6.1 Handling Virtual Methods
The targets of a virtual method r0.m(r1, r2, ..., rn) are not known at compile time. The
target of these statements depends on the type of receiver objects. The types that the
receiver r0 can point to can be computed in the following ways.
Computing receiver types using points to information
This method uses the result of points to analysis to find what types r0 can point to. But
points to analysis requires the call graph to know the parameter bindings. So both points
to analysis and call graph construction are carried out simultaneously. This method is
called on-the-fly call graph construction.
Computing receiver types using subclass relationships
Another approach is to statically compute the types of objects that can be pointed to by
r0. Variations of this technique are as follows.
Class Hierarchy Analysis: Class Hierarchy Analysis (CHA) [27] is a method to
conservatively estimate the types of receiver. It uses subclass relationships to resolve
method targets. Given a receiver o of a declared type d, receiver-types(o,d) for Java is
defined as follows:
1. If d is a class type C, receiver-types(o,d) includes C plus all subclasses of C .
2. If d is an interface type I, receiver-types(o,d) includes:
(a) the set of all classes that implement I or implement a sub-interface of I, which
we call implements(I), plus
(b) all subclasses of implements(I).
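The definition above can be sketched over a toy hierarchy (the classes A-D and the interface I below are invented for illustration, and sub-interfaces are omitted for brevity):

```java
import java.util.*;

// Sketch of receiver-types(o, d) from Class Hierarchy Analysis on a toy
// hierarchy: class A implements I; B and C extend A; D extends B.
public class CHA {
    static Map<String, String> superOf = Map.of("B", "A", "C", "A", "D", "B");
    static Map<String, Set<String>> impls = Map.of("I", Set.of("A"));
    static Set<String> allClasses = Set.of("A", "B", "C", "D");

    static boolean isSubclassOf(String c, String d) {
        for (String k = c; k != null; k = superOf.get(k))  // walk up the hierarchy
            if (k.equals(d)) return true;
        return false;
    }

    static Set<String> receiverTypes(String declared, boolean isInterface) {
        Set<String> result = new TreeSet<>();
        if (!isInterface) {
            // class type: the class itself plus all its subclasses
            for (String c : allClasses)
                if (isSubclassOf(c, declared)) result.add(c);
        } else {
            // interface type: every implementer plus all its subclasses
            for (String root : impls.getOrDefault(declared, Set.of()))
                for (String c : allClasses)
                    if (isSubclassOf(c, root)) result.add(c);
        }
        return result;
    }

    public static void main(String[] args) {
        if (!receiverTypes("B", false).equals(Set.of("B", "D"))) throw new AssertionError();
        if (!receiverTypes("I", true).equals(Set.of("A", "B", "C", "D"))) throw new AssertionError();
    }
}
```

For a receiver declared as B the possible runtime types are {B, D}, while a receiver declared as the interface I conservatively admits every class in the example hierarchy, which illustrates why CHA alone can be very imprecise for interface calls.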
Rapid Type Analysis: Rapid Type Analysis (RTA) [26] is an extension of CHA.
The RTA algorithm maintains a set variable S for the whole program, which keeps
track of all the instantiated classes. The idea is that if no instance of a
class C is created in the program, then there can be no calls to C's methods. This can greatly
reduce the set of executable virtual functions and so increase the precision of CHA.
Variable Type Analysis: Variable Type Analysis (VTA) uses subset constraints
to express the possible sets of runtime types of objects each variable may hold [25].
3.7 Improvements to Points to Analysis
Various techniques have been proposed to speed up Andersen’s analysis. These are based
on the observation that a constraint graph can have cycles and the points to sets of all
variables in the cycle are the same. Fahndrich et al. [10], Rountev and Chandra [12], and
Heintze and Tardieu [11] use this technique to speed up the analysis.
Shapiro [24] describes tradeoffs between the more precise Andersen analysis and the
more efficient Steensgaard analysis. The idea was to separate the variables in a program
into k categories. When two variables are in the same category, the constraints between
them are treated as equality constraints; only variables in different categories have subset
constraints among them. Das [30] observes that in C programs, many pointers are used
to implement call by reference. He proposed an analysis that uses subset constraints
between stack variables that do not have their address taken and equality constraints
among other variables. The remaining pointers which could slow down a subset based
analysis are analyzed using the fast but imprecise equality based analysis.
Diwan et al. [33] use type information to refine the analysis. They describe three
different analyses. The first analysis was to treat variables as possibly aliased whenever
the type of one variable is a subtype of the other. The second analysis added the
constraint that a field in an object may be aliased to the same field of another object. The
third was an equality based analysis similar to Steensgaard's.
Context sensitivity was improved by Wilson and Lam [29], who implemented a
flow sensitive, context sensitive subset based analysis using partial transfer functions to
summarize the effect of each function on points to sets. Their analysis did not have to
analyze the function for every calling context; rather it had to apply the partial transfer
function in every calling context.
Field sensitivity was improved by Rountev et al. [28] in their framework
called BANE. They were unsuccessful in expressing an efficient field based analysis
directly in BANE, so they modified it to allow a subset constraint to be annotated with a
field. During the analysis the declared type of each variable was not considered; however
objects of incompatible type were removed from final points to sets. Whaley and Lam
[34] adapt the fast points-to algorithm of Heintze and Tardieu [11] by incorporating field
sensitivity and respecting declared types.
Demand driven alias analysis for Java is presented by Sridharan et al. [32]. The
stores and the corresponding loads must be matched for reachability in the constraint
graph. They formulate points to analysis for Java as a balanced parentheses problem,
which is based on context free language reachability.
3.8 Improving Flow Sensitivity
Usual methods to perform points to analysis are flow insensitive. We now present a new
algorithm which is more precise than a flow insensitive algorithm but less precise than a
flow sensitive algorithm.
To incorporate flow sensitivity we observe that at any program point, only a subgraph
of the constraint graph (which will be referred to as the Object Flow Graph) is valid, and
we compute what objects are accessed by a variable in this subgraph. In other words, we
need to answer queries of the form reaches(O,V,S), where O is an object allocation
node, V is a variable node and S is the subgraph comprising the valid edges at that point.
A flow insensitive algorithm answers queries of the form reaches(O,V). This reachabil-
ity problem is solvable by computing transitive closure. The standard transitive closure
algorithm cannot handle queries of the form reaches(O,V,S) since information about
what edges are necessary for reachability is not maintained. To track this information,
we introduce the concept of access expressions. An access expression Eij tracks the
conditions necessary for node j to be reachable from node i. An access expression is a set of
terms; each term represents the set of edges present on a distinct path from i to j.
The following algorithm computes whether a variable node V is reachable from an
object allocation node O at a particular program point P.
1. Construct the OFG G = (N, E) (as described in Section 3.5.2).
2. At a program point, find the subset of edges in the OFG that are valid. This gives
a mapping P → 2^E. This is described in Section 3.8.1.
3. Construct the access expressions for each pair of nodes of the form (O,V) in the
subgraph. This is described in Section 3.8.2.
4. Check whether the set of valid edges S satisfies the access expression constructed
for (O,V). This is described in Section 3.8.3.
Before we describe the algorithm in detail, here is a brief description of how it works.
Consider the query reaches(o1,d,7) which asks if o1 is accessible by variable d at program
point 7 in our example (Figure 3.5). Figure 3.4 shows the OFG constructed for the
program. At line 7, the valid edges are {0,4,5,6,7}. Section 3.8.1 describes the algorithm
to compute the set of edges that are valid at every program point. Figure 3.6 shows the
access expressions computed by Algorithm 5 (Section 3.8.2). The expression 0.1.2.3 + 0.5.3
computed for (O1,d) says that O1 reaches d if either all the edges in {0,1,2,3}
are present or all the edges in {0,5,3} are present. Reachability is possible if the set of
valid edges satisfies the access expression, as computed by Algorithm 7 (Section 3.8.3).
Here the set of valid edges doesn’t satisfy the access expression. Thus d cannot access
o1 at line 7.
[Object Flow Graph for the program of Figure 3.5. Nodes: the allocation nodes o1, o2
and the variable nodes a, b, c, d, e. Edges: 0: o1 → a, 1: a → b, 2: b → c, 3: c → d,
4: o2 → d, 5: a → c, 6: c → b, 7: d → e.]
Figure 3.4: Object Flow Graph
3.8.1 Computing Valid Subgraph at each Program Point
We need to compute the edges of the OFG that are valid at every program point (i.e.
the mapping Program Point → Valid Edges). This can be considered as a data flow
problem. Each edge Ei in the OFG is created by a statement Si. Thus the GEN set of
Si is initialized to Ei. The dataflow equations are shown in Table 3.3.
GEN(Si) = Ei                                The GEN set of statement Si is initialized to Ei.
IN(Si) = ⋃ {OUT(S′) : S′ ∈ pred(Si)}        The valid edges at the entry of a statement are
                                            the union of the valid edges over all predecessors.
OUT(Si) = GEN(Si) ∪ IN(Si)                  The valid edges at the exit of a statement.
Table 3.3: Data flow equations for computing valid edges
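The iteration these equations imply can be sketched as follows (illustrative code, not the thesis implementation), using the statement numbers of the example program both as statement labels and as the labels of the OFG edges they generate:

```java
import java.util.*;

// Sketch of the valid-edge computation of Table 3.3: each statement
// generates the OFG edge it creates, IN is the union of the predecessors'
// OUT sets, and OUT = GEN ∪ IN, iterated to a fixed point.
public class ValidEdges {
    public static Map<Integer, Set<Integer>> solve(
            Map<Integer, Set<Integer>> preds,  // statement -> predecessor statements
            Map<Integer, Integer> gen,         // statement -> OFG edge it creates
            int numStmts) {
        Map<Integer, Set<Integer>> out = new HashMap<>();
        for (int s = 0; s < numStmts; s++) out.put(s, new HashSet<>());
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int s = 0; s < numStmts; s++) {
                Set<Integer> in = new HashSet<>();
                for (int p : preds.getOrDefault(s, Set.of()))
                    in.addAll(out.get(p));     // meet: merge edges over predecessors
                if (gen.containsKey(s)) in.add(gen.get(s));
                if (!in.equals(out.get(s))) { out.put(s, in); changed = true; }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // CFG of the example: 0; then-branch 1,2,3; else-branch 4..8; join at 9
        Map<Integer, Set<Integer>> preds = Map.of(
                1, Set.of(0), 2, Set.of(1), 3, Set.of(2), 4, Set.of(0),
                5, Set.of(4), 6, Set.of(5), 7, Set.of(6), 8, Set.of(7),
                9, Set.of(3, 8));
        // statements 0..7 create edges e0..e7; 8 (d.f = 1) and 9 create none
        Map<Integer, Integer> gen = Map.of(0, 0, 1, 1, 2, 2, 3, 3,
                4, 4, 5, 5, 6, 6, 7, 7);
        Map<Integer, Set<Integer>> out = solve(preds, gen, 10);
        if (!out.get(8).equals(Set.of(0, 4, 5, 6, 7))) throw new AssertionError();
        if (!out.get(9).equals(Set.of(0, 1, 2, 3, 4, 5, 6, 7))) throw new AssertionError();
    }
}
```

The asserted results reproduce the last two rows of Table 3.4: at line 8 only the else-branch edges are valid, while at the join (line 9) the meet merges the edges of both branches, which is exactly the source of the imprecision discussed below.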
The meet operator merges the set of valid edges along each of the program paths. An
0   a = new obj();   // o1
    if (P) {
1       b = a;
2       c = b;
3       d = c;
    } else {
4       d = new obj();   // o2
5       c = a;
6       b = c;
7       e = d;
8       d.f = 1;
    }
Figure 3.5: An example program
(o1,a)  0
(o1,b)  0.1 + 0.5.6
(o1,c)  0.1.2 + 0.5
(o1,d)  0.1.2.3 + 0.5.3
(o1,e)  0.1.2.3.7 + 0.5.3.7
(o2,d)  4
(o2,e)  4.7
Figure 3.6: Access Expressions
iterative algorithm is used to arrive at a fixed point. This associates with each program
point the set of edges of the OFG (i.e. the OFG subgraph) that are valid at that point.
Thus we obtain the mapping Program Point → Valid Edges. Table 3.4 computes this
information for the program fragment of Figure 3.5.
                        GEN    OUT
0   a = new obj();      e0     e0
    if (P) {
1       b = a;          e1     e0,e1
2       c = b;          e2     e0,e1,e2
3       d = c;          e3     e0,e1,e2,e3
    } else {
4       d = new obj();  e4     e0,e4
5       c = a;          e5     e0,e4,e5
6       b = c;          e6     e0,e4,e5,e6
7       e = d;          e7     e0,e4,e5,e6,e7
8       d.f = 1;        -      e0,e4,e5,e6,e7
    }
9   print(e);           -      e0,e1,e2,e3,e4,e5,e6,e7
Table 3.4: Computation of Valid edges
The advantage of querying the valid subgraph is illustrated by considering "d.f" at line
8 (Table 3.4). It is clear from the program that d cannot access O1. This fact is captured
by the OFG subgraph (comprising e0, e4, e5, e6, e7) in Figure 3.7. The dotted lines show
the edges that are invalid at that program point. Information can flow only through
e0, e4, e5, e6, e7. This shows that d cannot access O1. Though considering the OFG
subgraph helps in refining the points to sets, imprecision is caused by the merging of
valid edges and the absence of strong updates, as described below.
Imprecision due to merging the set of valid edges
As we have seen, the meet operator merges the set of valid edges along each of the control
flow paths. This leads to imprecision. At line 9, all of the edges in the OFG are valid,
so node e is reachable from o1. However, from the program we can see that e cannot
access o1.
Imprecision due to absence of strong updates
In computing the valid edges at a program point, the edges are not killed. In our program,
suppose there is a reassignment to d at a statement S after line 7. It might seem feasible
to kill the edge e4 at S. However this would be incorrect, since it would disrupt the
reachability information from O2 → e. O2 would reach e even if there is a reassignment
to d. Removing e4 would make it unreachable. Therefore edges are not killed, which
leads to imprecision.
3.8.2 Computation of Access Expressions
An access expression is associated with every pair of nodes of the form (O,V) where O is
an allocation node and V is a variable node. The access expression tracks the conditions
for node V to be reachable from O. We have seen that the OFG is comprised of three
types of nodes - variable nodes, object allocation nodes and field dereference nodes.
Algorithm 5 describes the computation of access expressions for a simple graph without
considering field dereference nodes. Algorithm 6 extends this to handle field references
as well.
[The OFG of Figure 3.4 with the edges invalid at line 8 (edges 1, 2 and 3) drawn
dotted; only edges 0, 4, 5, 6 and 7 are valid at that point.]
Figure 3.7: OFG Subgraph
The computation of access expression for each variable can be considered as a data
flow problem. Algorithm 5 computes the access expressions.
If the graph is a DAG (Figure 3.8), the access expressions can be computed in a
single pass by considering the nodes in topological order. In the presence of cycles, as in
Figure 3.9, we may have to process a node multiple times. For computing
access expressions in Figure 3.9, the worklist is initialized to node a (which is the
allocation node) and is assigned the expression ε. a's successors b and c are added to
the worklist, which now has b, c. We get the assignment (a, b) → 1. Next c is evaluated
to get (a, c) → 1.3 + 4. Next b is re-evaluated to get 1 + 1.3.2 + 4.2, which simplifies
to 1 + 4.2. (Simplification of access expressions is discussed later in this section.) Since
the access expression of b is changed, its successor c is added to the worklist. The access
Algorithm 5 Constructing access expressions for a simple graph
input: Object Flow Graph G
output: access expressions for every pair of nodes (O,V) such that O is an allocation
node and V is a variable node
for all Oi ∈ allocation nodes do
    initialize the access expression of Oi to ε
    let W be the worklist containing the nodes to be processed
    add the successors of Oi to the worklist
    while the worklist is not empty do
        remove a node N from the worklist
        expr(Oi, N) = expr(Oi, N) + Σ_{P ∈ predecessors(N)} expr(Oi, P) · E_PN
            { E_PN denotes the edge label present on P → N }
        if the access expression of N changed, add the successors of N to the worklist
    end while
end for
expression of c is re-evaluated as 4.2.3 + 1.3 + 4, which simplifies to 1.3 + 4. The iteration
stops when there is no change to the access expressions. We get (a, b) → 1 + 4.2 and
(a, c) → 1.3 + 4.
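The worklist computation above can be sketched by representing an access expression as a set of terms, each term a set of edge labels (illustrative code, not the thesis implementation; superset terms are dropped eagerly so that the re-evaluation around the cycle converges on this example):

```java
import java.util.*;

// Sketch of Algorithm 5: nodes are re-evaluated from a worklist until
// no access expression changes; a term that is a superset of another
// term is redundant and dropped (1.2 + 1.2.3 simplifies to 1.2).
public class AccessExpr {
    record Edge(String from, String to, int label) {}

    static Set<Set<Integer>> simplify(Set<Set<Integer>> terms) {
        Set<Set<Integer>> kept = new HashSet<>();
        for (Set<Integer> t : terms) {
            boolean redundant = false;
            for (Set<Integer> u : terms)
                if (!u.equals(t) && t.containsAll(u)) { redundant = true; break; }
            if (!redundant) kept.add(t);
        }
        return kept;
    }

    static Map<String, Set<Set<Integer>>> compute(String alloc, List<Edge> edges) {
        Map<String, Set<Set<Integer>>> expr = new HashMap<>();
        expr.put(alloc, Set.of(Set.of()));              // the empty term is epsilon
        Deque<String> work = new ArrayDeque<>(List.of(alloc));
        while (!work.isEmpty()) {
            String p = work.poll();
            for (Edge e : edges) {
                if (!e.from().equals(p)) continue;
                Set<Set<Integer>> terms = new HashSet<>(expr.getOrDefault(e.to(), Set.of()));
                for (Set<Integer> t : expr.get(p)) {    // extend each term of p with this edge
                    Set<Integer> ext = new HashSet<>(t);
                    ext.add(e.label());
                    terms.add(ext);
                }
                terms = simplify(terms);
                if (!terms.equals(expr.get(e.to()))) {  // successor changed: re-evaluate it
                    expr.put(e.to(), terms);
                    work.add(e.to());
                }
            }
        }
        return expr;
    }

    public static void main(String[] args) {
        // the cyclic graph of Figure 3.9: 1: a -> b, 4: a -> c, 3: b -> c, 2: c -> b
        List<Edge> g = List.of(new Edge("a", "b", 1), new Edge("a", "c", 4),
                new Edge("b", "c", 3), new Edge("c", "b", 2));
        Map<String, Set<Set<Integer>>> expr = compute("a", g);
        // (a,b) -> 1 + 4.2 and (a,c) -> 1.3 + 4, matching the worked example
        if (!expr.get("b").equals(Set.of(Set.of(1), Set.of(2, 4)))) throw new AssertionError();
        if (!expr.get("c").equals(Set.of(Set.of(4), Set.of(1, 3)))) throw new AssertionError();
    }
}
```

Running this on the cyclic example reproduces the hand computation above: b is pushed back onto the worklist once when the path through edges 4 and 2 is discovered, and the iteration stops after the redundant terms 1.2.3 and 2.3.4 are simplified away.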
Handling Load and Store statements
Load and store statements can create additional reachable paths from object allocation
nodes to variable nodes. Consider a program in which a store statement b.f = c is
followed by a load statement a = b.f. The statement Es b.f = c induces an edge from
Oc → Ob.f. The statement El a = b.f induces an edge from Ob.f → Oa. Thus due to
loads and stores, a new reachable path is established from Oc → a.
We annotate the condition under which the flow happens through loads and stores
using access expressions. The flow from Oc → Ob.f is possible if the set of valid edges
contain Es. The function process-stores records this information. A flow from Ob.f → Oa
is possible when two conditions are met: a) the edges required for a store to Ob.f must
be valid, and b) the set of valid edges must contain El. The function process-loads records
this information.
[A DAG with nodes r, a, b, c, d and edges 1: r → a, 2: a → b, 3: a → c, 4: b → d,
5: c → d. The access expressions are (r,r) → ε, (r,a) → 1, (r,b) → 1.2, (r,c) → 1.3
and (r,d) → 1.2.4 + 1.3.5.]
Figure 3.8: Access Expressions(for a DAG)
The algorithm for computing access expressions is given in Algorithm 6, which
constructs the expression that tracks the condition for reachability, instead of propagating
the points to sets as in Algorithm 4.
Simplification of Access Expressions
To reduce the space for storing access expressions, they can be simplified by eliminating
redundant terms and factors. Redundant terms as in expressions like 1.2 + 1.2.3 can be
simplified to 1.2 since reachability is already established if edges 1 and 2 alone are present.
In general any term which is a superset of an existing term is redundant. Redundant
factors in a term can be eliminated using dominator information. Let e1 and e2 be the
edges created by nodes n1 and n2 respectively. If n1 dominates n2 in the control flow
graph, then e1 would be a factor in any term involving e2. It is redundant to record
the factor e1. This simplifies terms of the form ...e1.e2... to e2. Figure 3.10 shows the
access expressions after simplification of the original access expressions in Figure 3.6.
Algorithm 6 Computing Access expressions with Loads and Stores
program main
input: the Object Flow Graph
output: access expressions for (Oi, Vj) (Oi ∈ allocation nodes, Vj ∈ variable nodes),
taking into consideration the effect of loads and stores
repeat
    compute access expressions for (Oi, Vj) (Oi ∈ allocation nodes, Vj ∈ variable
    nodes) using Algorithm 5
    process-stores
    process-loads
until no changes occur to the access expressions
end program

function process-stores
    for each store statement Es: a.f = b do
        for each Oa ∈ pts(a) do
            for each Ob ∈ pts(b) do
                expr(Ob, Oa.f) = expr(Ob, Oa.f) + Es
            end for
        end for
    end for
end function

function process-loads
    for each load statement El: a = b.f do
        for each Ob ∈ pts(b) do
            for each Oa ∈ pts(a) do
                expr(Oa, a) = expr(Oa, a) + expr(Oa, Ob.f) · El
            end for
        end for
    end for
end function
[A graph with a cycle: nodes a, b, c and edges 1: a → b, 4: a → c, 3: b → c,
2: c → b. The access expressions are (a,a) → ε, (a,b) → 1 + 4.2 and (a,c) → 1.3 + 4.]
Figure 3.9: Access Expressions (for general graph)
Dominator information is necessary for removing redundant factors. Figure 3.11 shows
the dominator tree constructed for the program in Figure 3.5.
o1,a 0
o1,b 1 + 6
o1,c 2 + 5
o1,d 3
o1,e 3.7
o2,d 4
o2,e 7
Figure 3.10: Simplified Access Expressions
[Dominator tree for the program of Figure 3.5: node 0 is the root with children 1
and 4; 1 dominates 2, which dominates 3; 4 dominates 5, followed by 6, 7 and 8 in
a chain.]
Figure 3.11: Dominator Tree
3.8.3 Checking for Satisfiability
Once we have a set of valid edges (which form a subgraph), we can test whether the set
of valid edges S satisfies the access expression for (O,V) denoted by EOV . Each term in
EOV represents the set of edges present on a path from O → V. If there is a path that
can be formed with the set of valid edges S, then S satisfies EOV. Algorithm 7 computes
this information.
Algorithm 7 Algorithm to check satisfiability of an access expression
input: an access expression E expressed as a sum of terms, and a set of valid edges S
output: a boolean value indicating whether the set of valid edges satisfies the access
expression
{ The access expression is expressed as a sum of terms; each term represents a set of
edges }
for each term Ti in E do
    if S ⊇ Ti then
        return true
    end if
end for
return false
Thus our algorithm computes whether a variable V can point to an object allocation
node O in the subgraph that is valid at a given program point. Since only the valid
subgraph of the object flow graph is considered, it avoids computing spurious points to
sets, thereby gaining improvement in precision over flow insensitive approaches.
Chapter 4
Implementation and Experimental
Results
In this chapter we discuss the details of our implementation and provide some experimen-
tal results based on the slicing infrastructure developed in this thesis. We first describe
the framework into which we have integrated the slicer.
4.1 Soot: A bytecode analysis framework
Soot [69] is a framework capable of analyzing and optimizing bytecode. There are
four kinds of intermediate representations in Soot, namely Baf, Jimple, Shimple and
Grimp. Baf is a stack based code useful for low level optimizations such as peephole
transformations. Jimple is a typed three address code. Shimple is an SSA variant of
Jimple. Grimp is an aggregated form of Jimple. Figure 4.1 is a pictorial view of the
framework.
We found Jimple to be the most suitable representation for the analyses required to build
dependence graphs. Jimple statements are in three-address code form, x := y op z.
The main problem in analyzing stack code is keeping track of the flow of values, so
three-address code is better suited for program analysis than stack code. Since the operand
stack present in the bytecode is eliminated, stack locations are represented in
Jimple as local variables. The declared types of variables are also present in Jimple; the
typing information is inferred from the bytecode using the explicit references to types
present in method signatures and instantiations. In Jimple there are just 15 kinds of
statements, as compared to more than 200 bytecode instructions, making Jimple simpler
to analyze than bytecode.

Soot provides facilities for scalar optimizations like constant propagation, branch
elimination and dead code elimination, as well as whole program analyses such as
points-to analysis and side effect analysis. Apart from optimizations and analyses,
Soot has facilities to create, instrument and annotate bytecode.
We now describe some important classes and methods available in the Soot framework.
The Scene class contains information about the application being analyzed.
The method loadClassAndSupport(String className) loads the given class and resolves
all the classes necessary to support it. As each class is read, it is converted
into the Jimple representation. After this conversion, each class is stored in an instance of
SootClass, which contains information such as its superclass, the list of interfaces it
implements, and a collection of SootFields and SootMethods. Each SootMethod contains
information such as the list of local variables defined, the parameters, and a list of
three-address code instructions. At the beginning of the Jimple instruction list there are
special identity statements that provide explicit assignments from the parameters
(including the implicit this parameter) to locals within the SootMethod. This ensures
that every variable is defined at least once before it is used. The control flow graph
can be constructed from the method body using the class UnitGraph.
To represent data, Soot provides the Value interface. Different types of values include
Locals, Constants and Expressions; passed parameters are represented by ParameterRef
and the this pointer by ThisRef. The Unit interface is used to represent statements.
In Jimple, the Stmt interface, which extends Unit, represents a three-address code
statement. Boxes encapsulate Values and Units, providing indirect access to Soot
objects. The Unit interface contains the following useful methods:
1. getDefBoxes returns the list of Value Boxes which contain definitions of values in this Unit

2. getUseBoxes returns the list of Value Boxes which contain uses of values in this Unit

[Figure: internal representations of Soot (.java and .class inputs translated into Baf, Jimple and Grimp) and the containment hierarchy Scene, SootClass, SootMethod, JimpleBody, UnitGraph, with the chain of Units and Locals and their UseBoxes and DefBoxes]

Figure 4.1: Soot Framework Overview
Soot supports transformations at the whole program level and at the method level through
the classes SceneTransformer and BodyTransformer respectively. To create a new
whole program analysis, it is enough to extend the SceneTransformer class and override
its internalTransform method.
4.2 Steps in performing slicing in Soot
1. The first step is to use Spark [70] to compute both the points-to information and the
call graph.

2. The second step is to preprocess the source code to insert additional assignment
statements that model parameter passing, and to make the control flow graph a single
entry, single exit graph.

3. The third step is to compute the dependence graph on this processed code.

4. Given a slicing criterion, we run the two phase slicing algorithm and mark the
included nodes, from which the CFG is reconstructed using the Soot framework.
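The two-phase traversal of step 4 (the Horwitz-Reps-Binkley style algorithm described in Chapter 2) can be sketched as backward reachability over the dependence graph: phase 1 skips parameter-out edges so it never descends into callees, and phase 2 skips parameter-in and call edges so it never ascends back to callers. The classes, node ids and edge kinds below are illustrative, not the thesis's actual data structures:

```java
import java.util.*;

enum EdgeKind { DATA, CONTROL, CALL, PARAM_IN, PARAM_OUT, SUMMARY }

class Edge {
    final int src, dst;
    final EdgeKind kind;
    Edge(int src, int dst, EdgeKind kind) { this.src = src; this.dst = dst; this.kind = kind; }
}

class TwoPhaseSlicer {
    static Set<Integer> slice(List<Edge> sdg, int criterion) {
        // Phase 1: do not descend into called procedures.
        Set<Integer> phase1 = reach(sdg, Set.of(criterion), EnumSet.of(EdgeKind.PARAM_OUT));
        // Phase 2: do not ascend back to calling procedures.
        return reach(sdg, phase1, EnumSet.of(EdgeKind.PARAM_IN, EdgeKind.CALL));
    }

    // Backward reachability over all edges except the given kinds.
    static Set<Integer> reach(List<Edge> sdg, Set<Integer> start, Set<EdgeKind> skip) {
        Deque<Integer> work = new ArrayDeque<>(start);
        Set<Integer> seen = new HashSet<>(start);
        while (!work.isEmpty()) {
            int n = work.pop();
            for (Edge e : sdg)
                if (e.dst == n && !skip.contains(e.kind) && seen.add(e.src))
                    work.push(e.src);
        }
        return seen;
    }
}
```

On a real system dependence graph the summary edges let phase 1 step over call sites without entering the callee.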
We now describe the individual steps in greater detail.
4.3 Points to Analysis and Call Graph
We have seen in Chapter 3 that call graph construction and points-to set computation
are dependent on each other. To obtain better precision, we used the on-the-fly option
in Spark to compute the call graph.
The class SparkTransformer is used to compute the points-to information. SparkTransformer
is a subclass of SceneTransformer that performs points-to set computation for
the whole program. It is necessary to compute the points-to information before the
call graph can be queried. Once the points-to information is computed, the call graph
can be queried using the class CallGraph. The following code illustrates how to get the
possible methods that can be called by a particular method.

[Figure: pipeline from bytecode or source through the Jimple IR; the Soot/SPARK call graph builder and points-to sets computation (feeding receiver types, side effect analysis, and the computation of required classes and methods); SESE graph computation, where explicit parameter assignments are introduced; control dependence and data dependence graph computation; summary computation; and System Dependence Graph and Class Dependence Graph computation using class hierarchy information]

Figure 4.2: Computation of the class dependence graph
main() {
    /* load necessary classes */
    /* set Spark options */
    SparkTransformer.v().transform("", opt);
    SootMethod method = Scene.v().getMethodByName("fun");
    Iterator targets = possibleCallees(method);
}

Iterator possibleCallees(SootMethod src) {
    CallGraph cg = Scene.v().getCallGraph();
    Iterator targets = new Targets(cg.edgesOutOf(src));
    return targets;
}
4.4 Computing Required Classes
Most often the input to Soot framework is a jar file containing the classes to be analyzed.
Therefore Scene may contain lot of classes that are not necessary for the construction of
dependence graph. The set of required entities (classes, methods and fields) is calculated
as follows [68]
1. A set of compulsory entities such as methods and fields of the java.lang.Object
class.
2. The main method of the main class to be compiled is required.
3. If a method m is required, the following also become required: the class declaring
m, all methods that may possibly be called by m, all fields accessed in the body of
m, the classes of all local variables and arguments of m, the classes corresponding to
all exceptions that may be caught or thrown by m, and the method corresponding
to m in all required subclasses of the class declaring m.
4. If a field f is required, the following also become required: the class declaring f, the
class corresponding to the type of f if f is a reference type (not a primitive type)
and the field corresponding to f in all required subclasses of the class declaring it.
5. If a class c is required, the following also become required: all superclasses of c,
the class initialization method of c, and the instance initialization method of c.
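Rules 3-5 above define a transitive closure, which can be sketched with a worklist. This is a toy model: the map from an entity to the entities it pulls in stands in for the concrete rules, and the entity names are illustrative strings rather than Soot classes.

```java
import java.util.*;

// Toy sketch of the required-entities computation. pullsIn.get(e) lists the
// entities that become required once e is required (rules 3-5 above);
// compulsory seeds the set (rules 1-2).
class RequiredEntities {
    static Set<String> compute(Map<String, List<String>> pullsIn, Set<String> compulsory) {
        Set<String> required = new HashSet<>(compulsory);
        Deque<String> work = new ArrayDeque<>(compulsory);
        while (!work.isEmpty()) {
            String e = work.pop();
            for (String dep : pullsIn.getOrDefault(e, List.of()))
                if (required.add(dep))   // newly required: process its own rules
                    work.push(dep);
        }
        return required;
    }
}
```

Classes never reached from the compulsory seeds, such as unused library code in the input jar, are simply left out of the result.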
4.5 Side effect computation
Side Effect information gives information about the memory locations read and written
by a procedure. This information becomes necessary for dependence computation. In the
following program, there is a dependence between statements x.f=1 and print(y.f).
Here a dependence exists because the reads and writes are to the same object created at
line 3. We use the side effect analysis algorithm provided in the Soot framework.
void f() {
    Foo x, y;
    x = new Foo();
    x.f = 1;
    y = x;
    print(y.f);
}
The side effect analysis algorithm uses the points-to information computed by Spark
to compute the read and write sets of every statement. Spark computes that variables
x and y can point to the same object, and thus the statement print(y.f) can read
from the location written by x.f=1. Thus there is a data dependence between these
statements. The read and write sets are analogous to the GMOD and GREF information
for procedural programs.

The side effect information is calculated as follows. For each statement s, the algorithm
computes the sets read(s) and write(s), containing every static field read (written) by
s and a pair (o,f) for every field f of an object o that may be read (written) by s. These sets
also include fields read (written) by all code executed during the execution of s, including
any other methods that may be called directly or transitively.
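With the read and write sets in hand, the dependence test itself is a set intersection. A minimal sketch, assuming locations such as the pair (o, f) are encoded as strings like "o3.f" for brevity:

```java
import java.util.*;

// A data dependence S1 -> S2 exists when something written by S1 may be
// read by S2, i.e. when write(S1) and read(S2) intersect.
class SideEffectDep {
    static boolean dependent(Set<String> writeS1, Set<String> readS2) {
        return !Collections.disjoint(writeS1, readS2);
    }
}
```

In the example above, write(x.f=1) and read(print(y.f)) both contain the location (object created at line 3, field f), so the two statements are dependent.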
4.6 Preprocessing
The flow of values in method calls implicitly caused due to parameters is made explicit
by adding additional assignment statements. This step is necessary before computing
the data dependence graph, since the additional assignment statements are also present
in the data dependence graph.
Additional statements are inserted at call sites and at the beginning of the methods called
from those call sites. For this, we need the call graph information. If s represents a
call statement, the method edgesOutOf(Unit u) present in the CallGraph class can be
queried to get the target methods called by s. The following assignment statements are
created and inserted into the Jimple code.
1. Actual-in statements, representing assignments to parameters that are read, and
actual-out statements, representing assignments to parameters that are written, are
created at the call site. These statements are made control dependent on the call
site.

2. Formal-in statements, representing assignments to parameters that are read, and
formal-out statements, representing assignments to parameters that are written, are
created at method entry. These statements are made control dependent on the
method entry.
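As a concrete illustration, the inserted statements follow the AI/FI naming visible in Figure 4.3. At a call site, an actual-in copy is inserted just before the call:

```text
AI:sum_ = sum
virtualinvoke $r0.<java.io.PrintStream: void print(int)>(sum_)
```

and at the entry of the called method, a formal-in assignment such as FI:args = args is inserted after the identity statements.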
Additionally, in this stage, the control flow graph represented by UnitGraph is made a
single entry, single exit graph by adding unique start and end nodes. This step is
necessary because the computation of the control dependence graph requires the graph to
be single entry, single exit.
The preprocessing stage is thus a prerequisite for the computation of data dependence and
control dependence information. However, other dependence edges can already be added at this
stage. Parameter-in edges are added from actual-in statements to the corresponding
formal-in statements. Parameter-out edges are added from formal-out statements to
the corresponding actual-out statements. Call dependence edges and edges representing
class interaction are added using information present in the CallGraph class. Class
membership edges, from the node representing a class to the method entry nodes, are added
for all the methods.
4.7 Computing the Class Dependence Graph
Once the Jimple source is in preprocessed form, the dependence graph is computed as
outlined in Chapter 2.
Algorithm 8 Computation of Class Dependence Graph

for all C, where C is a required class do
  for all M, where M is a method in C do
    get the UnitGraph G associated with M
    compute the control dependence graph (CDG) of G
    compute the data dependence graph (DDG) of G
      { If M's representation from the parent class can be reused, then there is no need
        to build the CDG and DDG of M }
    build summary edges for M
  end for
end for
The data dependence graph for simple local variables is computed using reaching
definitions, via the class SimpleLocalDefs. This class takes the UnitGraph of the method
as input and computes the definitions reaching each program point. The definitions
reaching a program point (def boxes) can be queried using the getDefsOfAt
function. These definitions are paired with the uses at the program point. The use
boxes of the current statement can be queried using getUseBoxes. Data dependence
edges are added from the def boxes reaching the current statement to the use boxes in the
current statement.
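For straight-line code, this def-use pairing reduces to remembering the last definition of each local. A self-contained sketch follows; the Stm class is illustrative, standing in for Jimple Units, and Soot's SimpleLocalDefs handles the general case with branches and loops:

```java
import java.util.*;

// Each statement has an id, at most one defined local, and a list of used
// locals. A data dependence edge runs from the reaching definition of a
// local to each statement that uses it.
class Stm {
    final int id;
    final String def;           // null if the statement defines nothing
    final List<String> uses;
    Stm(int id, String def, List<String> uses) { this.id = id; this.def = def; this.uses = uses; }
}

class DataDepEdges {
    static List<int[]> edges(List<Stm> code) {
        Map<String, Integer> lastDef = new HashMap<>();
        List<int[]> dd = new ArrayList<>();
        for (Stm s : code) {
            for (String u : s.uses)
                if (lastDef.containsKey(u))
                    dd.add(new int[]{lastDef.get(u), s.id});   // def -> use edge
            if (s.def != null) lastDef.put(s.def, s.id);
        }
        return dd;
    }
}
```

For instance, the sequence x = ...; y = x; print(y) yields the edges (1, 2) and (2, 3).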
Apart from the dependences arising from simple local variables, another kind of dependence
arises due to the presence of side effects. There is a dependence between statements
S1 and S2 if there is an intersection between the write set of S1 and the read set of S2.

The computation of the control dependence graph and of summary edges is discussed
in Chapter 2. Once the class dependence graph is computed, the two phase slicing
algorithm is used to compute the slice.
4.8 Experimental Results
We computed dependence graphs for some programs from SourceForge and the SPEC JVM98
benchmark suite. All analyses were performed on a 3.20 GHz Intel Pentium 4
processor with 1 GB RAM. Table 4.1 gives the benchmark characteristics. Table 4.2
gives the number of edges of each kind in the dependence graph.
Table 4.3 gives the time required for computation of the dependence graph; it also
shows the average running time of the slicing algorithm and the average slice size,
calculated over a set of slicing criteria. The number of summary edges seems to be the
determining factor in the time taken for dependence graph computation. Table 4.4 gives the
memory and time requirements of our partially flow sensitive algorithm
in the intraprocedural case. Incorporating partial flow sensitivity reduces the points-to
sets as compared to the flow insensitive Andersen analysis. This information is given in
Table 4.5.

Figure 4.3 shows the input Jimple program and the sliced version obtained when line
16 is given as the slicing criterion.
Benchmark     bytecode    description                  classes  methods  statements
              size (kb)
jlex                96    Lexer generator for Java          26      164        8230
junit              193    Java Unit Testing                100      591        6159
mpegaudio-7        409    MPEG decoder                     154      915       20659
nfc                814    Distributed Chat                 224     1550       20364
jgraph             312    Graph drawing component           90     1423       21534
compress            16    Modified Lempel Ziv method        37      288        6274
db                  12    Memory resident database          28      278        6275
check               36    Checker for JVM features          42      352        7714
jess               447    Java Expert Shell System         288     1796       28197
raytrace            56    Ray tracing                       50      420        9023

Table 4.1: Benchmarks Description
Benchmark     nodes   data    control  param-in  param-out  summary  call
                      edges   edges    edges     edges      edges    edges
jlex           8230   12450     8055       672        504     3181     598
junit          6159    9010     9847       759        424     4017     902
mpegaudio-7   20659   34338    19632      1516       1178    59271    2188
nfc-chat      20364   30745    27438      2196        976    54266    2089
jgraph        21534   37420    26437      1816       2068    36123    2158
compress       6274    9199     7334       322        302     1295     372
db             6275    9170     7368       303        117      880     357
check          7714   10476     9260       440        406     3809     463
jess          28197   46101    35412      3397       4525   114245    4908
raytrace       9023   14842    10989       755        782     4108     308

Table 4.2: Number of Edges in the Class Dependence Graph
Name          Dependence graph          Slicing time   Slice Size
              computation time (sec)    (sec)
jlex                 15                       1             70
junit                15                       1             48
mpegaudio-7         242                       2            173
nfc-chat            220                       2            180
jgraph              211                       1             66
compress             21                       2             41
db                   23                       1             58
check                25                       1             42
jess                332                       2            165
raytrace             35                       1             46

Table 4.3: Timing Requirements
Name          Load time   Analysis time   Memory used
              (seconds)   (seconds)       (MB)
jlex              22            6              55
junit             10            3              45
mpegaudio-7       58            9              75
nfc-chat         107           15              80
jgraph            37           10              66
compress           3            2              45
db                 3            2              28
check              5            4              45
jess              32           13              65
raytrace           9            4              48

Table 4.4: Program Statistics - Partial Flow Sensitive
Benchmark     points-to sets   points-to sets   percentage
              PFS              Andersen         reduction
jlex               3711             3998             7.1
junit              2529             2762             8.4
mpegaudio-7        7235             7270             0.4
nfc-chat           8363             9124             8.3
jgraph             6847             7229             5.2
compress           3179             4261            25.3
db                 3068             4126            25.6
check              3327             4375            23.9
jess               8557             8842             3.2
raytrace           4170             5223            20.1

Table 4.5: Precision Comparison
1 : args := @parameter0: java.lang.String[]
2 : FI:args = args
3 : sum = 0
4 : i = 1
5 : product = 1
6 : goto [?= (branch)]
7 : sum = sum + i
8 : product = product * i
9 : i = i + 1
10 : if i < 11 goto sum = sum + i
11 : $r0 = <java.lang.System: java.io.PrintStream out>
12 : AI:sum_ = sum
13 : virtualinvoke $r0.<java.io.PrintStream: void print(int)>(sum_)
14 : $r0 = <java.lang.System: java.io.PrintStream out>
15 : AI:product_ = product
16 : virtualinvoke $r0.<java.io.PrintStream: void print(int)>(product_)
17 : $r0 = <java.lang.System: java.io.PrintStream out>
18 : AI:i_ = i
19 : virtualinvoke $r0.<java.io.PrintStream: void print(int)>(i_)
20 : return
The Slice obtained
args := @parameter0: java.lang.String[]
FI:args = args
i = 1
product = 1
goto [?= (branch)]
product = product * i
i = i + 1
if i < 11 goto product = product * i
$r0 = <java.lang.System: java.io.PrintStream out>
AI:product_ = product
virtualinvoke $r0.<java.io.PrintStream: void print(int)>(product_)
return
Figure 4.3: Jimple code and its slice
Chapter 5
Conclusion and Future Work
In this thesis, we have described the implementation of a slicing tool for Java programs.
We first describe the implementation of the two phase interprocedural slicing algorithm
of Horwitz et al. [63]. We then discuss the issues in computing the dependences of
object oriented programs. Computation of data dependences in object oriented programs
requires the computation of side effect information. We then describe the computation
of the dependence graph in the presence of inheritance and polymorphism.
We use the SPARK framework for side effect analysis and call graph construction.
Both side effect analysis and call graph construction require the computation of
points-to information. We describe Lhotak's algorithm [70] for computing points-to sets,
which is implemented in SPARK. We have also implemented an intraprocedural algorithm
that enhances flow sensitivity while maintaining minimal additional information.
We next discuss the limitations of our slicing tool and possible scope for future work.
Supporting the entire Java language requires handling of threads,
exceptions and reflection. Dependence between statements in multi-threaded programs
is not transitive; Krinke [49] proposes algorithms for slicing multi-threaded programs.
Handling of exceptions is described by Allen et al. [44]. Features such as reflection and
dynamic class loading, which allow classes to be loaded at runtime, complicate dependence
computation.
We have run our slicing tool on a set of benchmarks and have reported statistics on the
size of the class dependence graphs and the time required for their construction. In our
experiments, we found that the time required for computing the dependence graph is
dominated by the summary edge computation phase. Improvements to the summary
computation algorithm could therefore vastly decrease the time for computing the
dependence graph.
Bibliography
[1] A. V. Aho, M. Lam, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques,
and Tools. Addison-Wesley.
[2] Kildall, G. A. , A unified approach to global program optimization , in Proc. First
Annual ACM SIGACT-SIGPLAN Symp. Principles of Programming Languages.
1973
[3] W. Landi and B. G. Ryder. A safe approximate algorithm for interprocedural pointer
aliasing , ACM SIGPLAN Notices 1992
[4] M. Emami, R. Ghiya, and L. J. Hendren. Context-sensitive interprocedural points-
to analysis in the presence of function pointers. In Proceedings of PLDI94, pages
242-256, 1994.
[5] L. O. Andersen. Program Analysis and Specialization for the C Programming Lan-
guage. PhD thesis, University of Copenhagen, DIKU, 1994.
[6] Alexander Aiken. Introduction to set constraint-based program analysis. Science of
Computer Programming, 35(2-3):79-111, 1999.
[7] Ondrej Lhotak, Laurie Hendren. Scaling Java Points-To Analysis using SPARK
In Proceedings of the conference on Compiler Construction (CC), volume 2622 of
Lecture Notes in Computer Science, pages 153–169. Springer-Verlag, April 2003.
[8] M. Hind. Pointer analysis: Haven't we solved this problem yet? In Proceedings of
PASTE'01, pages 54-61, June 2001.
[9] A. Diwan, K. S. McKinley, and J. E. B. Moss. Type-based alias analysis. In Pro-
ceedings of PLDI98, pages 106-117, 1998.
[10] M. Fahndrich, J. S. Foster, Z. Su, and A. Aiken. Partial online cycle elimination in
inclusion constraint graphs. In Proceedings of PLDI98, pages 85-96, June 1998.
[11] N. Heintze and O. Tardieu. Ultra-fast aliasing analysis using CLA: A million lines
of C code in a second. In Proceedings of PLDI01, volume 36.5 of ACM SIGPLAN
Notices, pages 254-263, June 2001.
[12] Atanas Rountev and Satish Chandra. Off-line variable substitution for scaling
points-to analysis. In PLDI, 2000
[13] B. Steensgaard. Points-to analysis in almost linear time. In Conference Record of
23rd POPL96, pages 32-41, Jan. 1996.
[14] David J. Pearce. Some directed graph algorithms and their application to pointer
analysis. Ph.D thesis. University of London Imperial College of Science, Technology
and Medicine Department of Computing Feb.2005
[15] Ondrej Lhotak. Spark: A flexible points-to analysis framework for Java. Master's
thesis, McGill University, December 2002.
[16] R. Hasti and S.Horwitz. Using static single assignment form to improve flow-
insensitive pointer analysis. In SIGPLAN 98: Conference on Programming Lan-
guage Design and Implementation, (Montreal, Canada, June 1998).
[17] Barbara G. Ryder. Dimensions of Precision in Reference Analysis of Object-
oriented Programming Languages In CC, pages 126–137, 2003.
[18] John Whaley and Monica S. Lam. Cloning-based context-sensitive pointer alias
analysis using Binary Decision Diagrams. In Proceedings of the ACM conference
on Programming Language Design and Implementation (PLDI), pages 131–144.
ACM Press, June 2004.
[19] Manu Sridharan, Denis Gopan, Lexin Shan, Rastislav : Demand-driven points-to
analysis for Java. OOPSLA 2005
[20] Ana Milanova, Atanas Rountev, and Barbara Ryder. Parameterized object sensi-
tivity for points-to and side-effect analyses for Java. In Proceedings of the ACM
International Symposium on Software Testing and Analysis (ISSTA), pages 1–11.
ACM Press, July 2002.
[21] R. E. Tarjan. Fast algorithms for solving path problems. Journal of the
ACM, 28(3):591-642, July 1981.
[22] R. E. Tarjan. Efficiency of a good but not linear set union algorithm. J. ACM, 22:215-
225, 1975.
[23] Steven W. K. Tjiang and John L. Hennessy. Sharlit - a tool for building optimizers.
PLDI 1992.
[24] Marc Shapiro and Susan Horwitz. Fast and accurate flow-insensitive points-to anal-
ysis. In Proceedings of the Symposium on Principles of Programming Languages
(POPL), pages 1–14. ACM Press, January 1997.
[25] Vijay Sundaresan and Laurie J. Hendren and Chrislain Razafimahefa and Raja
Vallee-Rai and Patrick Lam and Etienne Gagnon and Charles Godin Practical vir-
tual method call resolution for Java. OOPSLA 2000
[26] D. Bacon and P. Sweeney. Fast Static Analysis Of C++ Virtual Function Calls.
Proceedings of the ACM SIGPLAN 96 Conference on Object-Oriented Programming
Systems, Languages and Applications, San Jose, USA, October 1996, pp. 324–341.
[27] J. Dean, D. Grove, and C. Chambers. Optimization Of Object-Oriented Programs
Using Static Class Hierarchy Analysis. Proceedings of the 9th European Conference
on Object-Oriented Programming, Aarhus, Denmark, August 1995, Springer-Verlag
LNCS 952, pp. 77–101.
[28] A. Rountev, A. Milanova, and B. Ryder. Points-to Analysis For Java Using Anno-
tated Inclusion Constraints.
[29] R. Wilson and M. Lam. Efficient Context-Sensitive Pointer Analysis For C Pro-
grams. Proceedings of the ACM SIGPLAN 95 Conference on Programming Lan-
guage Design and Implementation, La Jolla, USA, June 1995, pp. 1–12.
[30] M. Das. Unification-Based Pointer Analysis With Directional Assignments. Pro-
ceedings of the ACM SIGPLAN 00 Conference on Programming Language Design
and Implementation, Vancouver, Canada, June 2000, pp. 35–46.
[31] Marc Berndl, Ondrej Lhotak, Feng Qian, Laurie J. Hendren, and Navindra Umanee.
Points-to analysis using BDDs. In Proceedings of the ACM conference on Program-
ming Language Design and Implementation (PLDI), pages 196–207. ACM Press,
June 2003.
[32] M. Sridharan, D. Gopan, L. Shan, and R. Bodik. Demand-driven points-to analysis
for Java. In Conference on Object-Oriented Programming, Systems, Languages, and
Applications (OOPSLA), 2005.
[33] A. Diwan, J. Moss, and K. McKinley. Simple And Effective Analysis Of Statically-
Typed Object-Oriented Programs. Proceedings of the ACM SIGPLAN ’96 Conference
on Object-Oriented Programming Systems, Languages and Applications, San Jose,
USA, October 1996, pp. 292–305.
[34] John Whaley and Monica S. Lam. An Efficient Inclusion-Based Points-To Analysis
for Strictly-Typed Languages. SAS 2002
[35] Karl J. Ottenstein and Linda M. Ottenstein. The program dependence graph in a
software development environment. In Proceedings of the ACM SIGSOFT/SIG-
PLAN Software Engineering Symposium on Practical Software Development Envi-
ronments, volume 19(5) of ACM SIGPLAN Notices, pages 177–184, 1984.
[36] Susan B. Horwitz, Thomas W. Reps, and David Binkley. Interprocedural slicing
using dependence graphs. ACM Transactions on Programming Languages and Sys-
tems, 12(1):26–60, January 1990.
[37] T. Ball and S. Horwitz. Slicing programs with arbitrary control flow. In Lecture Notes
in Computer Science, volume 749, New York, NY, November 1993. Springer-Verlag.
[38] J. Choi and J. Ferrante. Static slicing in the presence of goto statements. ACM
Trans. on Programming Languages and Systems, 16(4):1097-1113, July 1994.
[39] Sumit Kumar and Susan Horwitz. Better slicing of programs with jumps and
switches. In Proceedings of FASE 2002: Fundamental Approaches to Software
Engineering, volume 2306 of Lecture Notes in Computer Science, pages 96–112.
Springer, 2002.
[40] Thomas Reps, Susan Horwitz, Mooly Sagiv, and Genevieve Rosay. Speeding up
slicing. In Proceedings of the ACM SIGSOFT ’94 Symposium on the Foundations
of Software Engineering, pages 11–20, 1994.
[41] M. Sharir and A. Pnueli. Two approaches to interprocedural data flow analysis.
In S.S. Muchnick and N.D. Jones, editors, Program Flow Analysis: Theory and
Applications, chapter 7, pages 189-234. Prentice-Hall, Englewood Cliffs, NJ, 1981.
[42] Banning, J. P. An efficient way to find the side effects of procedure calls and the
aliases of variables. In Proceedings of the 6th Annual ACM Symposium on Princi-
ples of Programming Languages , ACM, New York, 29–41. (Jan. 1979)
[43] Cooper, K. D., and Kennedy, K. Efficient computation of flow-insensitive interpro-
cedural summary information. In Proceedings of the SIGPLAN 84 Symposium on
Compiler Construction; SIGPLAN Not. 19,6 , 247–258.(June 1984)
[44] Randy Allen and Ken Kennedy Optimizing Compilers for Modern Architectures
Elsevier Publications
[45] Frank Tip. A survey of program slicing techniques. Journal of programming lan-
guages, 3(3), September 1995.
[46] Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Ken-
neth Zadeck. Efficiently computing static single assignment form and the control
dependence graph. ACM Transactions on Programming Languages and Systems,
13(4):451-490, 1991.
[47] Jeanne Ferrante, Karl J. Ottenstein, and Joe D. Warren. The program dependence
graph and its use in optimization. ACM Transactions on Programming Languages
and Systems, 9(3):319-349, July 1987.
[48] Keith B. Gallagher, Notes on interprocedural slicing Proceedings of the Fourth IEEE
International Workshop on Source Code Analysis and Manipulation (SCAM04)
[49] Jens Krinke. Advanced Slicing of Sequential and Concurrent Programs Ph.D Thesis.
Universitat Passau April 2003
[50] L. Larsen and M. J. Harrold. Slicing object-oriented software. In 18th International
Conference on Software Engineering, pages 495- 505, 1996.
[51] Donglin Liang and Mary Jean Harrold. Slicing objects using system dependence
graphs. In Proceedings of the International Conference On Software Maintanence,
pages 358-367, 1998.
[52] Jianjun Zhao. Applying program dependence analysis to Java software. Proceedings
of Workshop on Software Engineering and Database Systems, pp. 162-169, 1998.
[53] Hiralal Agrawal, Richard A. DeMillo, and Eugene H. Spafford. Dynamic slicing in
the presence of unconstrained pointers. In Symposium on Testing, Analysis, and
Verification, pages 60-73, 1991.
[54] K.J. Ottenstein and L.M. Ottenstein. The program dependence graph in a software
development environment. In Proceedings of the ACM SIGSOFT/SIGPLAN Soft-
ware Engineering Symposium on Practical Software Development Environments,
pages 177-184, 1984. SIGPLAN Notices 19(5).
[55] L. Larsen and M. J. Harrold. Slicing object-oriented software. In 18th International
Conference on Software Engineering, pages 495- 505, 1996.
[56] Thomas Reps, Susan Horwitz, Mooly Sagiv, and Genevieve Rosay. Speeding up
slicing. In Proceedings of the ACM SIGSOFT ’94 Symposium on the Foundations
of Software Engineering, pages 11–20, 1994.
[57] Donglin Liang and Mary Jean Harrold. Slicing objects using system dependence
graphs. In Proceedings of the International Conference On Software Maintenance,
pages 358-367, 1998.
[58] Jianjun Zhao, Applying program dependence analysis to Java software, Proceedings
of Workshop on Software Engineering and Database Systems, pp. 162–169, 1998.
[59] Paolo Tonella, Giuliano Antoniol, Roberto Fiutem, and Ettore Merlo, Flow insen-
sitive C++ pointers and polymorphism analysis and its application to slicing , In
International Conference on Software Engineering, pp. 433–443, 1997.
[60] Chrislain Razafimahefa, A study of side effect analysis for Java M.Sc Thesis, McGill
University, 1999
[61] Mark Weiser. Program slicing. IEEE Transactions on Software Engineering,
10(4):352-357, July 1984.
[62] Steven S. Muchnick. Advanced Compiler Design and Implementation. Morgan Kauf-
mann Publishers, San Francisco, CA, 1997.
[63] Susan B. Horwitz, Thomas W. Reps, and David Binkley. Interprocedural slicing
using dependence graphs. ACM Transactions on Programming Languages and Sys-
tems, 12(1):26–60, January 1990.
[64] L. O. Andersen. Program Analysis and Specialization for the C Programming Lan-
guage. PhD thesis, University of Copenhagen, DIKU, 1994.
[65] Ondrej Lhotak, Laurie Hendren. Scaling Java Points-To Analysis using SPARK
In Proceedings of the conference on Compiler Construction (CC), volume 2622 of
Lecture Notes in Computer Science, pages 153.169. Springer-Verlag, April 2003.
[66] L. Larsen and M. J. Harrold. Slicing object-oriented software. In 18th International
Conference on Software Engineering, pages 495-505, 1996.
[67] Donglin Liang and Mary Jean Harrold. Slicing objects using system dependence
graphs. In Proceedings of the International Conference On Software Maintanence,
pages 358-367, 1998.
[68] Ankush Varma, A Retargetable Optimizing Java-to-C Compiler for Embedded Sys-
tems M.Sc Thesis
[69] R. Vallee-Rai, P. Co, E. Gagnon, L. Hendren, P. Lam, and V. Sundaresan. Soot -
a java bytecode optimization framework. In CASCON 99: Proceedings of the 1999
conference of the Centre for Advanced Studies on Collaborative research, page 13.
IBM Press, 1999. The framework is available from www.sable.mcgill.ca
[70] Ondrej Lhotak, Laurie Hendren. Scaling Java Points-To Analysis using SPARK
In Proceedings of the conference on Compiler Construction (CC), volume 2622 of
Lecture Notes in Computer Science, pages 153–169. Springer-Verlag, April 2003.
[71] Jelte Jansen. Slicing Midlets Technical Report
[72] Durga Prasad Mohapatra, Rajib Mall, Rajeev Kumar An Overview of Slicing Tech-
niques for Object-Oriented Programs Informatica 30 (2006) 253-277.
[73] Baowen Xu, Ju Qian, Xiaofang Zhang, Zhongqiang Wu ,Lin Chen A Brief Survey
Of Program Slicing ACM SIGSOFT Software Engineering Notes. 2005
[74] Keith Brian Gallagher and James R. Lyle, Using program slicing in software main-
tenance, IEEE Transactions on Software Engineering, 17(8), pp. 751-761, 1991.
[75] Samuel Bates and Susan Horwitz, Incremental program testing using program de-
pendence graphs, ACM Symposium on Principles of Programming Languages, pp.
384-396, 1993
[76] Mangala Gowri Nanda and S. Ramesh. Slicing concurrent programs. Software
Engineering Notes, 25(5), pp. 180-190, 2000.
[77] Srihari Sukumaran, Ashok Sreenivas: Identifying Test Conditions for Software
Maintenance. CSMR 2005.
[78] Thomas Reps and Wuu Yang, The semantics of program slicing and program in-
tegration, In Proceedings of the Colloquium on Current Issues in Programming
Languages, 352 of Lecture Notes in Computer Science, pp. 360-374, Springer 1989.
[79] John Hatcliff, Matthew B. Dwyer, and Hongjun Zheng, Slicing software for model
construction , Higher-Order and Symbolic Computation, 13(4), pp. 315-353, 2000.
[80] Ranganath, V. P. Object-Flow Analysis for Optimizing Finite-State Models of Java
Software. Master's thesis, Kansas State University, 2002.
[81] http://indus.projects.cis.ksu.edu/
[82] Panos E. Livadas and Scott D. Alden, A toolset for program understanding , Pro-
ceedings of the IEEE Second Workshop on Program Comprehension, 1993.
[83] James R. Lyle, Evaluating variations of program slicing for debugging , PhD thesis,
University of Maryland, College Park, Maryland, Dec. 1984.