Lightweight Defect Localization for Java
Valentin Dallmeier, Christian Lindig, and Andreas Zeller

Presented by Krishna Balasubramanian


Contents

Introduction
– Coverage method for defect localization
– Why use call sequences?

Motivation: Experiment performed
– Defect indicated by call sequences

Approach: Summarizing call sequences
– Deriving call sequences from traces
– Going from objects to classes
– Incoming vs. outgoing calls
– Collecting traces
– Overhead of the method

Introduction

Defect localization is time-consuming.

Coverage method: compare the coverage of passing and failing runs.

A method executed only in failing runs is assumed to be defective.

This might not always be the case.
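The coverage comparison described above can be sketched as a set difference. This is a minimal illustration, not the tooling used in the study; the class name `CoverageDiffSketch` and all method names in the data are hypothetical placeholders. The second call shows the weakness: when both runs execute the same methods, coverage yields no candidate at all.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Sketch of coverage comparison between one passing and one failing run.
// All method names below are hypothetical placeholders, not measured data.
class CoverageDiffSketch {

    /** Returns the methods executed in the failing run but not in the passing run. */
    static Set<String> onlyInFailing(Set<String> failing, Set<String> passing) {
        Set<String> suspects = new TreeSet<>(failing);
        suspects.removeAll(passing);
        return suspects;
    }

    public static void main(String[] args) {
        Set<String> passing = new TreeSet<>(Arrays.asList("Parser.parse", "Writer.emit"));
        Set<String> failing = new TreeSet<>(
                Arrays.asList("Parser.parse", "Writer.emit", "Writer.flush"));

        // Coverage differs: the extra method becomes the defect candidate.
        System.out.println(onlyInFailing(failing, passing)); // prints [Writer.flush]

        // Same methods executed in both runs (possibly in a different order):
        // coverage comparison finds nothing to report.
        System.out.println(onlyInFailing(passing, passing)); // prints []
    }
}
```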

Call Sequences Pointing to a Defect

Certain failures occur only through a sequence of method calls tied to an object.

For streams in Java:
– The destructor (finalizer) closes the stream after use if it has not been closed already
– File handles run out if too many streams are left open
– A call to read() followed by finalize() without an explicit close(): read() --> finalize() -X-> close()
– Defect occurs

The defect is indicated by a sequence of method calls.

3 Questions Explored

1. Sequences of method calls vs. single calls?
2. Per-object vs. global collection?
3. Defect indication in callee or caller?

Experiment conducted
– Instrumented a Java program
– Collected sequences on a per-object basis

Contributions

Results:
– Sequences of calls are better defect predictors than single calls
– Per-object sequences are better defect predictors than global sequences
– The caller is more likely to be defective than the callee
– Lightweight: performance comparable to coverage-based approaches

Defect Indication using Call Sequences

    16 aspect Log {
    17   pointcut assign(Object newval, Object targ):
    18     set(* test..*) && args(newval) && target(targ);
    19
    20   before(Object newval, Object targ): assign(newval,targ) {
    21     Signature sign = thisJoinPoint.getSignature();
    22     System.out.println(targ.toString() + "." + sign.getName() +
    23       ":=" + newval);
    24   }
    25
    26   pointcut tracedCall():
    27     call(* test..*(..)) && !within(Log);
    28
    29   after() returning (Object o): tracedCall() {
    30     // Works if you comment out either of these two lines
    31     thisJoinPoint.getSignature();
    32     System.out.println(thisJoinPoint);
    33   }
    34 }

Figure 1: Part of an AspectJ program that causes the Java Virtual Machine to crash

The JVM crashes on the compiled AspectJ program -> failing run rfail

The AspectJ compiler has 2929 classes

Passing run: comment out line 32 -> rpass

Compare the coverage of rfail and rpass:
– Method getThisJoinPointVar() of class BcelShadow is called only in rfail
– BcelShadow.getThisJoinPointVar() is therefore a potential defect candidate
– This proved to be incorrect: the bug was fixed elsewhere

Defect Indication using Call Sequences

Failure -> a sequence of method calls that occurs only in rfail

Sequences are collected per object for incoming and outgoing calls

Sequence of outgoing calls for a ThisJoinPointVisitor object, collected only in rfail:

    ThisJoinPointVisitor.isRef(),
    ThisJoinPointVisitor.canTreatAsStatic(),
    MethodDeclaration.traverse(),
    ThisJoinPointVisitor.isRef(),
    ThisJoinPointVisitor.isRef()

The AspectJ bug was fixed in class ThisJoinPointVisitor!

A difference in coverage may not point to a defect.

A difference in call sequences may do so.

Performance Measures

The comparison yielded 556 differing sequences of length 5.

The originating class of each sequence is determined.

Each sequence is assigned a weight.

Classes with the most important sequences are ranked at the top.
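The ranking step can be sketched as follows. The slides do not say how a sequence's weight is computed, so the scoring here is an assumption made purely for illustration: a class's score is the share of its sequences that occur in only one of the two runs. The class name `ClassRankingSketch` and the example data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ClassRankingSketch {

    /** ASSUMED weight: share of a class's sequences that occur in only one of the two runs. */
    static double score(Set<String> passing, Set<String> failing) {
        Set<String> all = new HashSet<>(passing);
        all.addAll(failing);
        if (all.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(passing);
        common.retainAll(failing);
        return (all.size() - common.size()) / (double) all.size();
    }

    /** Returns class names, most suspicious first; ties are broken alphabetically. */
    static List<String> rank(Map<String, Set<String>> passing,
                             Map<String, Set<String>> failing) {
        Set<String> classes = new HashSet<>(passing.keySet());
        classes.addAll(failing.keySet());
        Map<String, Double> scores = new HashMap<>();
        for (String c : classes) {
            scores.put(c, score(
                    passing.getOrDefault(c, Collections.<String>emptySet()),
                    failing.getOrDefault(c, Collections.<String>emptySet())));
        }
        List<String> ranking = new ArrayList<>(classes);
        ranking.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a)); // higher score first
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return ranking;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> passing = new HashMap<>();
        Map<String, Set<String>> failing = new HashMap<>();
        // Hypothetical class names and sequences, for illustration only.
        passing.put("Visitor", new HashSet<>(Arrays.asList("ab", "bc")));
        failing.put("Visitor", new HashSet<>(Arrays.asList("ab", "cd", "de")));
        passing.put("Shadow", new HashSet<>(Arrays.asList("xy", "yz")));
        failing.put("Shadow", new HashSet<>(Arrays.asList("xy", "yz")));
        System.out.println(rank(passing, failing)); // prints [Visitor, Shadow]
    }
}
```

The programmer would then inspect classes from the top of the returned list downward.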

Observations

Class ThisJoinPointVisitor is ranked 10th out of 542 executed classes

The programmer has to examine only:
– 1.8% of the executed classes
– 3.3% of the executed code
– 0.3% of the total classes
– 0.8% of the entire code

The coverage method does not include class ThisJoinPointVisitor among its candidates

This is worse than a random guess!

Approach: Summarizing Call Sequences

An object can receive and initiate millions of method calls

A means to capture and summarize them is required.

Approach:
– Record observed sequences in sets, not the full trace
– Sequence sets are aggregated per class
– Incoming and outgoing calls are considered
– Overhead of collecting and analyzing traces is kept to a minimum

Deriving Call Sequences from Traces

Trace: a recording of all calls an object receives

Recording the whole trace becomes unmanageable because of the large number of calls the object receives

Abstract representation:
– Record only the characteristic sequences of the trace

Deriving Call Sequences from Traces

Figure 2: The trace of an object is abstracted to a sequence set using a sliding window.

1. A window is slid over the trace to obtain the sequence set.
2. The window contents characterize the trace.
3. A wider window gives a more precise characteristic set.

Deriving Call Sequences from Traces: Effect of Window Size

Let the trace S be defined as a string of calls: S = <m1, m2, …, mn>

For window size k, the sequence set P(S, k) is:
P(S, k) = {w | w is a substring of S ∧ |w| = k}

Example, window size k = 2:
Trace S = <abcabcdc>
Then the resulting set of sequences is
P(S, 2) = {ab, bc, ca, cd, dc}

Different traces may lead to the same set:
For trace T = <abcdcdca>, P(S, 2) = P(T, 2)
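The definition of P(S, k) can be checked with a few lines of Java. This is a minimal sketch that, like the slide's example, represents each method call by a single letter; the class name `SequenceSetSketch` is made up for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class SequenceSetSketch {

    /** Computes P(S, k): the set of all substrings of length k of the trace S. */
    static Set<String> sequenceSet(String trace, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("window size must be positive");
        }
        Set<String> result = new LinkedHashSet<>(); // keeps first-seen order for printing
        for (int i = 0; i + k <= trace.length(); i++) {
            result.add(trace.substring(i, i + k)); // slide the window by one call
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> s = sequenceSet("abcabcdc", 2);
        Set<String> t = sequenceSet("abcdcdca", 2);
        System.out.println(s);           // prints [ab, bc, ca, cd, dc]
        System.out.println(s.equals(t)); // prints true: different traces, same set
    }
}
```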

Deriving Call Sequences from Traces: Effect of Window Size

Trace -> sequence set: entails a loss of information

Window size is important

For window size k > 2, P(S, k) != P(T, k)

The context sensitivity of the approach depends on the window size.

Exponential growth of sequence sets does not happen:
– Method calls do not happen randomly.
– They are part of static code with loops that lead to similar sequences of calls.

The underlying regularity makes sequence sets a useful and compact abstraction
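To see the effect of the window size, the sketch below compares the sequence sets of the example traces S = <abcabcdc> and T = <abcdcdca> for several values of k. It is an illustration under the same one-letter-per-call simplification; `WindowSizeSketch` and its helper `calls` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class WindowSizeSketch {

    /** Splits a string such as "abc" into single-call names ["a", "b", "c"]. */
    static List<String> calls(String s) {
        List<String> result = new ArrayList<>();
        for (char c : s.toCharArray()) {
            result.add(String.valueOf(c));
        }
        return result;
    }

    /** P(trace, k): the set of all length-k windows of the trace. */
    static Set<List<String>> sequenceSet(List<String> trace, int k) {
        Set<List<String>> result = new HashSet<>();
        for (int i = 0; i + k <= trace.size(); i++) {
            result.add(new ArrayList<>(trace.subList(i, i + k))); // copy the window
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> s = calls("abcabcdc");
        List<String> t = calls("abcdcdca");
        for (int k = 1; k <= 4; k++) {
            boolean same = sequenceSet(s, k).equals(sequenceSet(t, k));
            System.out.println("k=" + k + " same sets: " + same
                    + ", |P(S,k)|=" + sequenceSet(s, k).size());
        }
        // For k=2 the sets are equal; for k=3 they differ,
        // e.g. the window <b, c, a> occurs only in S.
    }
}
```

In this small example the set size stays at 5 for k = 2, 3, and 4, which is consistent with the observation that sequence sets do not grow exponentially with the window size.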

Going from Objects to Classes

Collecting call traces for each object individually poses memory issues

Results are therefore collected at class level:
– Objects are traced individually
– The sequence sets of all objects of a class are aggregated

Traces X and Y of two objects are
X = <a b c d d c>; Y = <a a b c a b>

P(X, 2) = {ab, bc, cd, dd, dc}
P(Y, 2) = {ab, bc, ca, aa}
P(X, 2) ∪ P(Y, 2) = {aa, ab, bc, cd, dd, dc, ca}

The union characterizes the behavior of the class.

To compare classes in passing and failing runs, their sequence sets are compared.
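A short sketch of the aggregation and comparison steps, using the traces X and Y from the example. The failing-run trace `"abdc"` and the names `ClassAggregationSketch`, `aggregate`, and `newInFailing` are invented for illustration; each letter again stands for one method call.

```java
import java.util.Set;
import java.util.TreeSet;

class ClassAggregationSketch {

    /** P(trace, k) for a trace written as one letter per call. */
    static Set<String> sequenceSet(String trace, int k) {
        Set<String> result = new TreeSet<>();
        for (int i = 0; i + k <= trace.length(); i++) {
            result.add(trace.substring(i, i + k));
        }
        return result;
    }

    /** Class-level set: the union of the sequence sets of all traced objects. */
    static Set<String> aggregate(int k, String... objectTraces) {
        Set<String> union = new TreeSet<>();
        for (String trace : objectTraces) {
            union.addAll(sequenceSet(trace, k));
        }
        return union;
    }

    /** Sequences of the failing run that never occur in the passing run. */
    static Set<String> newInFailing(Set<String> passing, Set<String> failing) {
        Set<String> diff = new TreeSet<>(failing);
        diff.removeAll(passing);
        return diff;
    }

    public static void main(String[] args) {
        // Passing run: objects with traces X = <abcddc> and Y = <aabcab>.
        Set<String> passing = aggregate(2, "abcddc", "aabcab");
        System.out.println(passing); // prints [aa, ab, bc, ca, cd, dc, dd]

        // Hypothetical failing run with a single object of the same class.
        Set<String> failing = aggregate(2, "abdc");
        System.out.println(newInFailing(passing, failing)); // prints [bd]
    }
}
```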

Incoming vs Outgoing Calls

An object receives incoming calls
– The trace shows how the object is used by the client

An object initiates outgoing calls
– The trace shows how the object is implemented

Both traces are used to detect control-flow deviations between a passing and a failing run

However, they differ in their ability to relate deviations to defects.

How do they differ?

Incoming vs Outgoing Calls: An Example

Figure 3: Traces of incoming calls (left) and outgoing calls (right) for the aQueue object.

Queue object aQueue is implemented as a linked list

Incoming calls are:
– add()
– get()

Outgoing calls are:
– add()
– firstElement()
– removeFirst()

Incoming Calls

Incoming calls characterize client behavior

They detect non-conforming clients.

Client behavior is recorded as sequence sets

The class of the receiving object is known -> only method names are recorded.

Trace of incoming calls for the aQueue object:
<add(), isEmpty(), . . . , add(), add()>

Deviation is detected …

Incoming Calls

Advantages
– The set of method calls an object can receive is restricted by its class
  – Leads to smaller traces
  – Window size can be tuned to the number of methods
– Class behavior can be learnt across different applications

Limitation
– Difficult to identify the client that causes the deviation

This led to analyzing outgoing calls

Outgoing Calls

Method calls for aQueue:
<LinkedList.add(), LinkedList.size(), Logger.add(), . . . >

The object calls several classes

Method names are no longer unique

Class name and method name are recorded in the trace

Detection of a sequence not in the learned set leads to the Queue class

The trace of outgoing calls guides the programmer to the defect

Collecting Traces

Before the program is executed, its bytecode is instrumented

The instrumented program collects traces, computes the sequence sets, and emits them in XML format

Analysis of the sequence sets is done offline

The Byte Code Engineering Library (BCEL) is used for instrumentation
– Requires only the program's class files
– Works on any JVM

Collecting Traces

Each object builds a trace of its incoming and outgoing calls

Trace data is stored in global hash tables, indexed by an object's identity

Each object creates a unique integer for identification in its constructor

Incoming call
– The callee adds its name and signature to its own trace

Outgoing call
– The callee adds its name, signature, and class to the caller's trace
– This requires the caller's id

Example of instrumentation to trace outgoing calls

    class Caller extends Object {
        ...
        public void m() {
            Callee c;
            ...
            Tracer.storeCaller(this.id);
            c.message(anObject);
            { body of m }
        }
    }

    class Callee extends Object {
        ...
        public void message(Object x) {
            Tracer.addCall({ message id for Callee.message });
            { body of message }
        }
    }

Figure 4: Instrumentation of caller and callee to capture outgoing calls.

1. The id of the caller is stored in Tracer in method Caller.m before the invocation of Callee.message.
2. At the start of Callee.message, Tracer.addCall adds the method id of Callee.message to the trace of the calling object.
3. Hence addCall only receives the message id.
4. This is an integer key associated with a method, its class, and signature.
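The handshake between storeCaller and addCall can be made runnable as a hand-written sketch. In the approach itself these calls are inserted by bytecode instrumentation; here they are written out manually. Everything beyond the names storeCaller, addCall, Caller, Callee, and Tracer is an assumption for illustration: the constant `MESSAGE_ID = 42`, the helpers `newObjectId` and `outgoingTrace`, passing the callee as a parameter of `m`, and the single-threaded global state.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical runtime support; single-threaded for simplicity.
class Tracer {
    // Global hash table: object id -> trace of outgoing calls (method ids).
    private static final Map<Integer, List<Integer>> outgoing = new HashMap<>();
    private static int nextObjectId = 0;
    private static int currentCaller = -1;

    static int newObjectId() { return nextObjectId++; }

    /** Called by the caller just before it invokes another method. */
    static void storeCaller(int callerId) { currentCaller = callerId; }

    /** Called at the start of the callee; appends to the stored caller's trace. */
    static void addCall(int methodId) {
        outgoing.computeIfAbsent(currentCaller, id -> new ArrayList<>()).add(methodId);
    }

    static List<Integer> outgoingTrace(int objectId) {
        return outgoing.getOrDefault(objectId, Collections.<Integer>emptyList());
    }
}

class Callee {
    static final int MESSAGE_ID = 42;     // assumed integer key for Callee.message(Object)
    final int id = Tracer.newObjectId();  // unique id created at construction

    public void message(Object x) {
        Tracer.addCall(MESSAGE_ID);       // would be inserted by instrumentation
        // ... original body of message ...
    }
}

class Caller {
    final int id = Tracer.newObjectId();

    public void m(Callee c) {
        Tracer.storeCaller(this.id);      // would be inserted by instrumentation
        c.message("anObject");
        // ... original body of m ...
    }
}

class TracerDemo {
    public static void main(String[] args) {
        Caller caller = new Caller();
        Callee callee = new Callee();
        caller.m(callee);
        caller.m(callee);
        System.out.println(Tracer.outgoingTrace(caller.id)); // prints [42, 42]
    }
}
```

Because the caller's id is stored globally before the call, addCall needs only the method id, which matches steps 3 and 4 above.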

Generating Sequence Sets

The original trace is not stored because of its large size

The sequence set for each class is computed online

Sequence sets are
– Small in size
– Kept in memory
– Emitted when the program quits

Limitations
– The window size must be fixed for a program run
  (sequence sets for many window sizes could be computed offline from a raw trace)
– A trace is ordered, a sequence set is not
  (some of the trace's inherent notion of time is lost)
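One way to compute a class's sequence set online, without keeping the trace, is to hold only the last k calls of each object in a small window. The sketch below assumes this design; it is not taken from the tool itself, and the class name `OnlineSequenceSetSketch` and its methods are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sequence set of ONE class, updated call by call; the full trace is never stored.
class OnlineSequenceSetSketch {
    private final int k;                                               // fixed window size
    private final Map<Integer, Deque<String>> windows = new HashMap<>(); // last k calls per object
    private final Set<List<String>> sequences = new HashSet<>();       // aggregated per class

    OnlineSequenceSetSketch(int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("window size must be positive");
        }
        this.k = k;
    }

    /** Records one call for the object with the given id. */
    void record(int objectId, String call) {
        Deque<String> window = windows.computeIfAbsent(objectId, id -> new ArrayDeque<>());
        window.addLast(call);
        if (window.size() > k) {
            window.removeFirst();                    // forget the oldest call
        }
        if (window.size() == k) {
            sequences.add(new ArrayList<>(window));  // snapshot of the current window
        }
    }

    Set<List<String>> sequences() {
        return sequences;
    }

    public static void main(String[] args) {
        OnlineSequenceSetSketch perClass = new OnlineSequenceSetSketch(2);
        String x = "abcddc"; // calls received by object 1
        String y = "aabcab"; // calls received by object 2
        for (int i = 0; i < x.length(); i++) {
            // Interleave the two objects' calls, as would happen in a real run.
            perClass.record(1, String.valueOf(x.charAt(i)));
            perClass.record(2, String.valueOf(y.charAt(i)));
        }
        System.out.println(perClass.sequences().size()); // prints 7
    }
}
```

Memory use is bounded by k entries per live object plus the set itself, but the window size cannot be changed after the run, which is the first limitation listed above.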

Evaluation of Overhead

Evaluation was done on programs from the SPEC JVM 98 benchmark suite

SPEC JVM 98 benchmark suite
– Collection of Java programs
– Deployed as 543 class files
– Total size of 1.48 MB

Overhead was compared with JCoverage, a tool for coverage analysis

Overhead of Instrumentation

Instrumenting to trace incoming calls processed 100 KB, or 38 class files, per second

Instrumented class files increased in size by 26%

Running the instrumented program takes longer and requires more memory than the original program

Tracing the AspectJ compiler with a window size of 8 gave a modest overhead

This is considered more typical for the approach followed

Questions?
