

Automated Breakpoint Generation for Debugging

Cheng Zhang,1 Dacong Yan,2 Jianjun Zhao,1,3 Shengqian Yang3

1 Department of Computer Science and Engineering, Shanghai Jiao Tong University
2 Department of Computer Science and Engineering, The Ohio State University

3 School of Software, Shanghai Jiao Tong University
[email protected], [email protected], [email protected], [email protected]

ABSTRACT

During debugging processes, breakpoints are frequently used to inspect and understand the run-time program behavior. Although modern development environments provide convenient breakpoint utilities, it mostly requires considerable human effort to create useful breakpoints. Before setting a breakpoint or typing a breakpoint condition, developers usually have to make judgements and hypotheses based on their observations and experience. To reduce this kind of effort, we present an approach to automatically generating breakpoints for debugging. We combine the nearest neighbor queries method, dynamic program slicing, and memory graph comparison to identify suspicious program statements and states. Based on this information, breakpoints are generated and divided into two groups, where the primary group contains conditional breakpoints and the secondary group contains unconditional ones. We implement the presented approach on top of the Eclipse JDT platform. Our case studies indicate that the generated breakpoints can be useful in debugging work.

1. INTRODUCTION

Debugging has long been a laborious task for software developers [21, 23]. Despite the existence of various automated debugging techniques, practical debugging activities are mainly manual and interactive. It is difficult to mechanically deal with diversified bugs, some of which may be extremely tricky. Thus the fundamental tool in debugging is the human brain; the common debugging process consists of several iterations of observing, making hypotheses, and testing the hypotheses [29].

Most modern development environments provide comprehensive utilities to aid debugging work. Among these utilities, the breakpoint is one of the most frequently used [20]. By using breakpoints, developers can suspend the program execution and explore the program state space to find clues to bugs. During the debugging process, however, developers usually have to ask themselves a question: where to set


the breakpoints? Setting breakpoints involves observations, judgements, and hypotheses about the current task, and even experience of similar situations. It is common that a veteran sets a small number of breakpoints to reveal a bug immediately, whereas a novice sets excessive useless breakpoints but still fails to find the bug. Therefore, for a developer, a set of well-defined breakpoints presented in a familiar debugger may be quite a pleasant starting point for the subsequent debugging work. Some modern IDEs, such as Eclipse [3], provide the function to export and import breakpoints, which facilitates their reuse and sharing. In addition, we believe that breakpoints can also be automatically generated to further accelerate debugging processes.

On the other hand, to reduce the manual effort of debugging, researchers have proposed a variety of automated debugging techniques. Some techniques [24, 11, 17, 6] focus on the abnormality of execution traces by collecting and analyzing information about the statements executed during the runs of passing and failing test cases. Various measurements are taken to indicate which statements are more involved in failing runs, and the most involved statements are identified as the places where the faults reside. Another group of techniques puts emphasis on abnormal run-time program states [9, 14, 15]. They observe selected program states at certain locations and identify the most suspicious states, which are much more likely to appear in failing runs. Ko and Myers [18] view the debugging process as a question-and-answer session and develop the innovative Whyline tool to prepare candidate questions and help find the answers using both static and dynamic analysis techniques. However, developers may still encounter problems when using these techniques for real-world debugging tasks. For example, they may want to try different techniques to find the fault more effectively; the problem is then how they could use the outcomes of the different methods. They may have to familiarize themselves with various kinds of reports in the form of tables of suspicious states, lists of faulty statements, or novel visualizations. Moreover, when the reports fail to indicate the faulty statements, which is often the case, developers probably have to switch to their normal methods and make further effort to find the bugs. In this case, they may need a seamless integration between the innovative techniques and practical productivity tools. Thus it is helpful to improve the usability of automated debugging techniques and the way they are combined, so as to provide continuous support for debugging.

In this paper, we propose a method to generate breakpoints based on a synergy of existing fault localization1 techniques. With regard to the typical situation of debugging, where there are much fewer failing test cases than passing ones, we first use the nearest neighbor queries method [24] to select the passing case that is most similar to the failing case with respect to execution flow, and compare them to find the suspicious basic blocks. Then we use backward dynamic slicing [19] to augment the report of the nearest neighbor method in order to reveal the causes of the differences between the two execution flows. Line breakpoints are generated at the conditional statements contained in the dynamic slices. At last, we extract memory graphs at specific breakpoints, compare the graphs to identify the most interesting differences in program states, and generate breakpoint conditions with respect to the state differences. The goal of our work is to automatically generate useful breakpoints to aid real-world debugging activities. In a case study using Java programs retrieved from Apache Commons [1] and SIR [10], we demonstrate the usefulness of our approach. In summary, the main contributions of this paper are:

• We propose the idea of automated breakpoint generation by leveraging the power of fault localization techniques. This approach might be a viable way of utilizing academic results in real-world debugging environments.

• We present a way of combining the nearest neighbor queries method, dynamic slicing, and memory graph comparison that is well suited to breakpoint generation.

• We implement a prototype on top of the Eclipse JDT platform and use it to perform a case study of our approach. The results are encouraging.

The rest of this paper is organized as follows. In Section 2, we briefly describe the background of our approach. Then, in Section 3, we illustrate the whole process of the presented approach with a small example. We give detailed explanations of the considerations, strategies, and trade-offs in our approach in Section 4. The architecture and implementation details of our automated breakpoint generator are described in Section 5. Section 6 presents the evaluation results and observations. Section 7 compares our approach with related work. In Section 8, we conclude this paper and present future work.

2. BACKGROUND

In this section, we give a brief description of the concepts and terminology needed to present our approach.

2.1 Nearest Neighbor Queries

In [24], Renieris and Reiss adopt the k-NN algorithm to select from a set of passing runs the one that most resembles the failing run, compare it with the failing run, and report suspicious basic blocks from the differences. The distance between a pair of runs (one passing and one failing) is defined in terms of common distance metrics, such as Hamming distance, Ulam distance, and so on. Before computing the distance, execution traces are converted to either binary coverage spectra or exact execution counts. In binary coverage spectra, the number of execution times is mapped to 0 or 1; in exact execution count spectra, the number of times each basic block is executed matters. Take asymmetric Hamming distance (AHD) and binary coverage spectra as an example. Binary coverage spectra are usually represented as binary vectors. Thus the Hamming distance between two runs f (failing run) and p (passing run) is defined as the number of positions at which the two representing binary vectors disagree. To report features that are only present in the failing run f, the asymmetric variant of Hamming distance, AHD, which considers the non-symmetric difference f − p, is used. A sample calculation is given in Table 1.

1 Fault localization is the activity of locating bugs during debugging.

Table 1: AHD between two runs

                     Block #1   Block #2   Block #3
  Failing Run f         3          0          2
  Passing Run p         0          1          3
  Block Difference      1          0          0

  Distance(f, p) = (1 + 0 + 0)/3 = 1/3

When all the pairwise distances are computed, the pair with the smallest distance value is selected and compared. Finally, those basic blocks executed only in the failing run are reported as suspicious.
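To make the metric concrete, the AHD computation and the nearest-neighbor selection it drives can be sketched as follows. This is our illustration under the definitions above, not the authors' implementation; class and method names are ours.

```java
import java.util.*;

// Sketch of asymmetric Hamming distance (AHD) over binary coverage
// spectra, and the selection of the passing run nearest to a failing run.
public class NearestNeighbor {

    // AHD counts basic blocks covered by the failing run but not by the
    // passing run (the non-symmetric difference f - p), normalized by
    // the number of blocks.
    static double ahd(boolean[] failing, boolean[] passing) {
        int diff = 0;
        for (int i = 0; i < failing.length; i++) {
            if (failing[i] && !passing[i]) diff++;
        }
        return (double) diff / failing.length;
    }

    // Index of the passing run with the smallest AHD to the failing run.
    static int nearest(boolean[] failing, boolean[][] passingRuns) {
        int best = -1;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < passingRuns.length; i++) {
            double d = ahd(failing, passingRuns[i]);
            if (d < bestDist) { bestDist = d; best = i; }
        }
        return best;
    }

    public static void main(String[] args) {
        // Coverage of the three blocks from Table 1, counts mapped to 0/1.
        boolean[] f = {true, false, true};   // failing run
        boolean[] p = {false, true, true};   // passing run
        System.out.println(ahd(f, p));       // 1/3, as in Table 1
    }
}
```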

2.2 Dynamic Program Slicing

Program slicing was first introduced by Weiser [33] to extract portions of program code corresponding to program behaviors of interest. Since then it has been widely applied to program testing, debugging, understanding, and other maintenance activities. During program debugging it can be used to reduce the amount of code developers need to focus on. In program slicing, a program is sliced with regard to a slicing criterion. A slicing criterion is a pair <p, V>, where p is the specified program point and V is a set of variables of interest. The conventional concept of program slicing is called static program slicing, because it is performed based upon static program dependencies and finds all the statements that might affect the specific program behavior. Although it is, to some extent, helpful for debugging, static program slicing often produces slices too large to reduce developers' effort. In order to narrow down the search space further, Korel and Laski [19] proposed dynamic program slicing, which is based upon dynamic program dependencies and finds the statements that actually affect the specific program behavior. The backward dynamic slice of a variable at a point in the execution trace includes all the executed statements which directly or indirectly affect the value of the variable at that point. In [39], Zhang et al. compare three dynamic backward slicing methods: data slicing, full slicing, and relevant slicing. As demonstrated in their work, a full slice can reveal dynamic data or control dependence chains of specified statements. Therefore, it is helpful to perform full backward dynamic slicing in order to locate the root cause of failures more accurately.
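Conceptually, once the dynamic data and control dependences of one execution have been recorded, computing a backward dynamic slice reduces to backward reachability over the dependence graph. The toy sketch below illustrates this; the dependence edges are hypothetical (hand-made, loosely modeled on the example in Section 3), not output of the actual slicer.

```java
import java.util.*;

// Toy sketch: a backward dynamic slice as backward reachability over
// recorded dynamic data/control dependences, keyed by line number.
public class BackwardSlice {

    // depends.get(s) = statements that statement s dynamically depends on.
    static Set<Integer> slice(Map<Integer, List<Integer>> depends, int criterion) {
        Set<Integer> result = new TreeSet<>();
        Deque<Integer> work = new ArrayDeque<>();
        work.push(criterion);
        while (!work.isEmpty()) {
            int s = work.pop();
            if (!result.add(s)) continue;          // already visited
            for (int d : depends.getOrDefault(s, List.of())) work.push(d);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical dependences for one failing run: line 15 depends on
        // lines 12 (data) and 14 (control), and so on back to line 2.
        Map<Integer, List<Integer>> dep = Map.of(
            15, List.of(12, 14),
            14, List.of(12),
            12, List.of(4),
            4,  List.of(3),
            3,  List.of(2));
        System.out.println(slice(dep, 15));  // [2, 3, 4, 12, 14, 15]
    }
}
```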

2.3 Memory Graph

The concept of a memory graph was first proposed by Zimmermann and Zeller [40] to capture and explore program states. Intuitively, a memory graph represents a program's state at a specific moment during its execution in the form of a directed graph whose vertices contain information about variable values and whose edges represent the relations between variables. In its formal definition in [40], a memory graph G is defined as a triple (V, E, root), where V is a set of vertices, E is a set of edges, and root is a dedicated vertex. Each vertex v in V is a triple (val, tp, addr) which stands for the value, type, and memory address of a variable. Each edge e in E is a triple (v1, v2, op), where v1 and v2 are the source and target nodes of e, respectively, and op constructs v2's name from v1's name, indicating the way of accessing v2 from v1. The dedicated vertex root is designed to reference all the base variables2 so that each v in G is accessible from root. root itself contains no variable information.

As a typical application of memory graphs in software debugging, memory graph comparison finds the common subgraph of two memory graphs and pinpoints the differences between them. Details of the comparison algorithms can be found in [40] and [37]. In our approach, we adapt the comparison method to aid the generation of breakpoint conditions.
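The essence of the comparison can be sketched in a deliberately simplified form. Here a memory graph is flattened to a map from access path (how a value is reached from root, e.g. "frame[0].c") to its value; the real graphs in [40] are general directed graphs and the real algorithm computes a common subgraph, but flat maps suffice to show how state differences are pinpointed. Names below are ours.

```java
import java.util.*;

// Simplified sketch of memory graph comparison over flattened states.
public class MemoryGraphDiff {

    // Bindings present in the failing state but absent, or bound to a
    // different value, in the passing state.
    static <V> Map<String, V> onlyInFailing(Map<String, V> failing,
                                            Map<String, V> passing) {
        Map<String, V> diff = new TreeMap<>();
        for (Map.Entry<String, V> e : failing.entrySet()) {
            if (!Objects.equals(passing.get(e.getKey()), e.getValue())) {
                diff.put(e.getKey(), e.getValue());
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        // States resembling those extracted at bp14 in Section 3.
        Map<String, Integer> failing = Map.of(
            "frame[0].c", 2, "frame[0].d", 2, "frame[1].a", 1, "frame[1].b", 5);
        Map<String, Integer> passing = Map.of(
            "frame[0].c", 5, "frame[0].d", 5, "frame[1].a", 4, "frame[1].b", 5);
        System.out.println(onlyInFailing(failing, passing));
        // {frame[0].c=2, frame[0].d=2, frame[1].a=1}
    }
}
```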

3. EXAMPLE

In this section, we illustrate our approach by applying it to a simple contrived Java program, shown in Figure 2. In the program, the correct version of the statement at line 12 should be c += 2. Suppose we have three test cases t1, t2, and t3, whose inputs and expected outputs, described in the form (input, expected output), are (1, 8), (4, 10), and (7, 12), respectively. If we run the test cases, t1 fails, while t2 and t3 pass, because the actual output of t1 is 7. The debugging task is then to find out why t1 fails and fix the program to make it pass.

 1  public int method_to_test(int a) {
 2      int b = 5;
 3      if (a < b)
 4          a = methodA(a);
 5
 6      b += a;
 7      return b;
 8  }
 9
10  public int methodA(int c) {
11      int d = c + 1;
12      c += 1;  // this statement should be c += 2
13
14      if (c < 4)
15          return c;
16
17      return d;
18  }

Figure 2: An example Java program

Our approach is applied just before the manual debugging session begins. Figure 1 gives an overview of the process. Briefly, we first use nearest neighbor queries and backward dynamic slicing to determine the locations for breakpoints. Then we use memory graph comparison to equip some of the generated breakpoints with conditions.

In the first phase, we instrument the program and run the test cases t1, t2, and t3 to collect their execution traces. Table 2 shows the test results and executed basic blocks, in the form (begin line, end line), for the three test cases3. Then we calculate the distances between the failing and passing test cases, and get Distance(t1, t2) = 1/6 and Distance(t1, t3) = 3/5. Since Distance(t1, t2) is smaller than Distance(t1, t3), we identify t2 as the case nearest to t1 and report the difference in executed basic blocks between t1 and t2 as suspicious. In this example, the only statement contained in the suspicious basic block is at line 15.

2 Base variables are those which can be directly accessed from the current context, such as the local variables in the top stack frame.
3 Here we use AHD to perform the nearest neighbor queries.

Table 2: Test results and executed basic blocks

  test case   result   executed basic blocks (begin, end)
  t1          fail     (2, 3), (4, 4), (6, 7), (11, 14), (15, 15)
  t2          pass     (2, 3), (4, 4), (6, 7), (11, 14), (17, 17)
  t3          pass     (2, 3), (6, 7)

In the second phase, we perform backward dynamic slicing using the suspicious statement and the relevant variable c as the slicing criterion, <Line 15, c>. The resultant slice contains the statements at lines 2, 3, 4, 12, 14, and 15. In the third phase of our approach, we adopt the strategy of generating breakpoints at conditional statements. Thus, two breakpoints are generated at lines 3 and 14 (denoted bp3 and bp14 hereafter), respectively.

In the last phase, we re-run the test cases t1 and t2, extract a memory graph during each run when bp14 is hit, and compare the two memory graphs to identify the program states which only appear in the failing memory graph. We regard these states as suspicious program states and generate the breakpoint condition for bp14 based on them. The memory graphs in this example are illustrated in Figure 3, where the suspicious program states are colored grey4. As a result, the breakpoint condition of bp14 is set to (c == 2) && (d == 2). Here we choose to perform memory graph comparison at bp14 rather than bp3, because the run-time values of the branch predicate c < 4 differ between the failing and passing runs, while those of the predicate a < b are both true. Moreover, we focus on the variables accessible from the current context, whose names are prefixed with frame[0], when generating breakpoint conditions. The rules for choosing breakpoints for memory graph comparison and for generating breakpoint conditions are explained in Section 4.2.
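The condition-synthesis step of this example can be sketched as follows: the suspicious bindings are filtered to the current frame (prefix frame[0].) and joined into a boolean expression. This is our illustration of the step, not the tool's API; the class and method names are hypothetical.

```java
import java.util.*;
import java.util.stream.Collectors;

// Sketch: composing a conditional-breakpoint expression from the
// suspicious state bindings found at a breakpoint.
public class ConditionSynthesis {

    // Keep only variables of the current frame and join equality tests.
    static String condition(Map<String, Integer> suspicious) {
        return suspicious.entrySet().stream()
                .filter(e -> e.getKey().startsWith("frame[0]."))
                .map(e -> "(" + e.getKey().substring("frame[0].".length())
                          + " == " + e.getValue() + ")")
                .collect(Collectors.joining(" && "));
    }

    public static void main(String[] args) {
        Map<String, Integer> suspicious = new TreeMap<>();
        suspicious.put("frame[0].c", 2);
        suspicious.put("frame[0].d", 2);
        suspicious.put("frame[1].a", 1);   // not in the current frame
        System.out.println(condition(suspicious));  // (c == 2) && (d == 2)
    }
}
```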

[Figure 3: Memory graphs extracted at bp14. In the failing run, root reaches frame[0].c = 2, frame[0].d = 2, frame[1].a = 1, and frame[1].b = 5; in the passing run, frame[0].c = 5, frame[0].d = 5, frame[1].a = 4, and frame[1].b = 5.]

So far, two breakpoints, bp14 and bp3, have been produced. At last, we classify the conditional breakpoints, such as bp14, as primary breakpoints, and the other, unconditional ones, such as bp3, as secondary breakpoints. When presented to users, primary breakpoints are enabled and secondary breakpoints are disabled (but not removed) by default. During the subsequent debugging work, if we run t1, the execution will be suspended at line 14, and the breakpoint condition (c == 2) && (d == 2) indicates the suspicious program states. If we enable bp3 and re-run t1, we may find out

4 Irrelevant states, such as the this reference, are not displayed.


Figure 1: Overview of the automated breakpoint generation process

that the value of variable a is 1 at that time. Since in the correct program the state (a == 1) implies (c == 3) at line 14, the contrast between (c == 2) and (c == 3) strongly indicates a faulty statement between lines 3 and 14.

4. APPROACH

This section describes the technical details of automated breakpoint generation and the underlying considerations and trade-offs. The process of our approach can be subdivided into two major steps: 1) selecting breakpoint locations, which includes the nearest neighbor queries and the dynamic program slicing, and 2) generating breakpoint conditions, which is mainly based on the memory graph comparison.

4.1 Selecting Breakpoint Locations

There are different kinds of breakpoints to support debugging, such as line breakpoints, exception breakpoints, method entry breakpoints, and watchpoints. According to a survey [4] and our own experience, line breakpoints are the most frequently used. Thus, in the presented work, all the generated breakpoints are line breakpoints. For a line breakpoint, the primary property is its location, that is, the line of code at which it is set.

Regarding the typical debugging scenario, where there is one failing test case and several passing test cases, we first adopt the strategy of generating breakpoints at a certain type of statement: the conditional statements where the execution flow of the failing test case diverges from that of its nearest passing neighbor. The nearest passing test case is the one whose execution flow most resembles that of the failing test case. We focus on the nearest passing test case because, as shown in [24, 11], when the execution traces of the failing and passing test cases are similar (but not identical), the differences between the traces are more likely to indicate faults. On the contrary, if we compare two runs that are fundamentally different, it is probably very difficult to find useful information in the comparison.

 1  public int method_to_test(int a) {
 2      int b = 5;
 3      if (a < b)
 4          a = methodA(a);
 5
 6      b += a;
 7      return b;
 8  }
 9
10  public int methodA(int c) {
11      if (c < 3)
12          c += 3;  // faulty statement
13
14      return c;
15  }

Figure 4: Another example Java program

We use the nearest neighbor queries method from [24]

to find the nearest passing test case for a given failing one and to identify the differences in execution traces. Compared with other fault localization techniques, the nearest neighbor queries approach has an acceptable accuracy of 50% and less technical complexity. As described in Section 2, it requires only the execution traces of the failing and passing test cases and suits the debugging scenario well. We collect execution traces by instrumenting the programs and then running the test cases. In the nearest neighbor queries, we use AHD to compute the distance between execution traces and determine the nearest passing test case. Finally, the differences are recorded in the form of basic blocks. The conditional statements anterior to these basic blocks are proper locations at which to set breakpoints, because they may be close to the faulty statements and contain information to explain the divergence of execution flow. The simple Java program shown in Figure 4 presents a common case where the faulty statement will be executed only if c < 3 evaluates to true. Thus it might be quite indicative to generate a breakpoint at line 11 to suspend the execution just before the faulty statement and show the value of c at that time.

However, it is often inadequate to generate breakpoints only at such conditional statements. There are common cases where control-flow based techniques, like the nearest neighbor queries method, do not perform well. For example, in Figure 2, the control-flow difference is found at line 15, but it is just a late symptom of executing the faulty statement at line 12. Although the developer may discover the fault by manually analyzing data dependences and searching backward, the task can be laborious if the dependences and program structure are complex. To address such problems, we use backward dynamic slicing to trace back to the program points where the variable values initially enter the incorrect program states. We add all the statements reported by the nearest neighbor queries method, together with the variables they reference, to the slicing criterion set, and re-run the failing test case to perform backward dynamic slicing. Breakpoints are generated at the conditional statements contained in the resultant slice.

4.2 Generating Breakpoint Conditions

A major advantage of breakpoints over other debugging facilities, such as assertions and printing statements, is that developers can use them to suspend the execution, inspect the run-time program states, make some judgements, and then resume the execution. Therefore, it is important for developers to choose the right program states to inspect when a breakpoint is hit.

We believe that breakpoint conditions can be good indicators of suspicious program states. A breakpoint condition is a boolean expression that is evaluated when the breakpoint is hit. The behavior of a breakpoint is usually decided by the evaluation of its condition. A common usage of a breakpoint condition is to suspend the execution at the breakpoint only if the condition evaluates to true. Thus, once equipped with a proper condition, a breakpoint may suspend the execution just before or after the incorrect program state occurs. This is especially helpful when a breakpoint may be hit many times, but only a small number of hits are worth inspecting.

In our approach, we adapt memory graph comparison to synthesize breakpoint conditions for some of the generated breakpoints. Since memory graphs represent the run-time program states exhaustively in a structural way, the comparison results are usually precise with respect to presenting incorrect states, and also convenient for condition generation. In the example described in Section 3, the breakpoint condition (c == 2) && (d == 2) is directly composed from the grey vertices and their incoming edges. However, extracting a memory graph may incur heavy performance overhead when the program states are large in number and have complex data structures. Therefore, it is prohibitive to extract a memory graph at every breakpoint each time it is hit. We believe that it is also unnecessary, since too many duplicate, meaningless breakpoint conditions may be generated. Given that the breakpoint locations are selected based on the differences in execution flow, we propose the following rule for choosing the breakpoints and the opportunities to perform memory graph comparisons. Given a breakpoint bp_i, we gather the run-time values of its corresponding condition expression^5 into two sequences V^p_i = (v^p_{i,1}, v^p_{i,2}, v^p_{i,3}, ..., v^p_{i,m}) and V^f_i = (v^f_{i,1}, v^f_{i,2}, v^f_{i,3}, ..., v^f_{i,n}), which stand for the value sequences in the passing and failing runs, respectively. Then we define the set of comparison points for bp_i:

CP_i = {(i, j) | j = min({k | v^f_{i,k} = true, v^p_{i,k} = false})}
       ∪ {(i, j) | j = max({k | v^f_{i,k} = true, v^p_{i,k} = false})},
           if v^p_{i,j}, v^f_{i,j} ∈ {true, false};

CP_i = {(i, j) | j = min({k | v^f_{i,k} ≠ v^p_{i,k}})}
       ∪ {(i, j) | j = max({k | v^f_{i,k} ≠ v^p_{i,k}})},  otherwise

where 0 < k ≤ min(m, n). As conditional statements include if, for, while, and switch, their condition expressions may have boolean values or values of other types, such as integer and enumeration in Java; thus we distinguish between these two cases in the definition. Suppose there are N breakpoints in total. The set of all the comparison points, ∪_{1≤i≤N} CP_i, contains, for each breakpoint site, the first and last times (if they exist) where the execution flow of the failing run diverges from that of the passing run. Note that, for efficiency, we do not consider all the divergent points, in case V^p_i and V^f_i are (false, false, false, ..., false) and (true, true, true, ..., true). On the other hand, we take into account some extra divergent points that do not contribute to the distances in nearest neighbor queries. For example, when V^p_i and V^f_i are (false, true, ...) and (true, false, ...), (i, 1) is a comparison point even though the basic block posterior to bp_i will be executed in both the failing and passing runs. We find that interesting program states can often be observed at such points.
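The boolean case of this definition can be sketched as follows (a hypothetical helper using 1-based hit indices; the class and method names are ours):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the boolean case of the comparison-point definition: given the
// value sequences of one breakpoint's condition in the passing and failing
// runs, return the first and last hit indices k (1-based, k <= min(m, n))
// where the failing run evaluated the condition to true and the passing
// run evaluated it to false.
public class ComparisonPoints {
    public static List<Integer> booleanCase(boolean[] passing, boolean[] failing) {
        int len = Math.min(passing.length, failing.length);
        int first = -1, last = -1;
        for (int k = 0; k < len; k++) {
            if (failing[k] && !passing[k]) {
                if (first < 0) first = k;
                last = k;
            }
        }
        List<Integer> points = new ArrayList<>();
        if (first >= 0) {
            points.add(first + 1);                    // the min index
            if (last != first) points.add(last + 1);  // the max index, if distinct
        }
        return points;                                // at most two points
    }
}
```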

For each comparison point (i, j), we extract two memory graphs G^f_{i,j} and G^p_{i,j} when the breakpoint bp_i is hit for the jth time during the failing and passing runs, respectively. Then we compare the two graphs, using the algorithm borrowed from [40] and [37], to find an approximation of their largest common subgraph, the edges only in the failing graph, and the edges only in the passing graph. Since there may be zero, one, or two comparison points for each breakpoint, zero, one, or two sets of graph differences may be produced for the breakpoint during the graph comparison. No condition will be generated for a breakpoint that has no corresponding graph differences. If a breakpoint has exactly one set of graph differences, this set will be used to generate the breakpoint condition. When there are two sets of differences for a breakpoint, we follow the rules below to select one of them for breakpoint condition generation:

• When n_c and n′_c are both zero, we choose d if n_ef + n_ep ≤ n′_ef + n′_ep; otherwise we choose d′.

• When n_c is non-zero and n′_c is zero, we choose d.

• When n_c is zero and n′_c is non-zero, we choose d′.

• When n_c and n′_c are both non-zero, we choose d if (n_ef + n_ep)/n_c ≤ (n′_ef + n′_ep)/n′_c; otherwise we choose d′.

Here d, n_c, n_ef, and n_ep (d′, n′_c, n′_ef, and n′_ep) stand for the set of edges only in the failing graph, the number of edges in the common subgraph, the number of edges only

5 Recall that every breakpoint is located at a conditional statement, which may be executed several times during a single execution.

 1: function ConditionGeneration(Edges)
 2:   SubExprs ← {}
 3:   for all ei ∈ Edges do
 4:     nodei ← getTargetNode(ei)
 5:     opi ← getOperation(ei)
 6:     if isAccessible(opi) then
 7:       VarName ← getVarName(opi)
 8:       if isArrayNode(nodei) then
 9:         SubExprs ← SubExprs ∪
10:           {VarName.length == nodei.getLength()}
11:       end if
12:       if isNullNode(nodei) then
13:         SubExprs ← SubExprs ∪
14:           {VarName == null}
15:       end if
16:       if isPrimitiveNode(nodei) then
17:         SubExprs ← SubExprs ∪
18:           {VarName == nodei.getValue()}
19:       end if
20:       if isStringNode(nodei) then
21:         SubExprs ← SubExprs ∪
22:           {VarName.equals(nodei.getStrValue())}
23:       end if
24:     end if
25:   end for
26:   return connect(SubExprs)
27: end function

Figure 5: Procedure for breakpoint condition generation

in the failing graph, and the number of edges only in the passing graph, for the first comparison (the second comparison). We propose these rules based on the idea that the set of graph differences between two memory graphs that are more similar to each other should be preferable. Based on the selected set of graph differences, which comprises a set of graph edges, we compose the breakpoint condition using the algorithm shown in Figure 5.
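The four selection rules can be collapsed into one function, sketched below (a hypothetical helper; the parameters correspond to the counts defined above, and the return value says which set of differences to keep):

```java
// Sketch of the selection rules (names are ours). Given the statistics of
// the two graph comparisons, return 0 to keep d (the first comparison's
// failing-only edge set) or 1 to keep d'. Intuitively, the comparison whose
// graphs are more similar -- fewer difference edges per common edge -- wins.
public class DiffSelection {
    public static int choose(int nc, int nef, int nep,
                             int nc2, int nef2, int nep2) {
        if (nc == 0 && nc2 == 0) {
            return (nef + nep <= nef2 + nep2) ? 0 : 1;
        }
        if (nc != 0 && nc2 == 0) return 0;
        if (nc == 0) return 1;
        // Both non-zero: compare the ratios (nef + nep) / nc.
        return ((double) (nef + nep) / nc <= (double) (nef2 + nep2) / nc2) ? 0 : 1;
    }
}
```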

The function ConditionGeneration takes as input the set of edges that exist only in the failing memory graph and outputs a breakpoint condition in the form of a character string. During condition generation we focus on the variables accessible from the top stack frame (checked by the function isAccessible), because only these variables are visible and can be used to evaluate the breakpoint condition when a breakpoint is hit. The function connect composes the output string by concatenating all the subexpressions with logical conjunction operators.
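A runnable sketch of this procedure follows. The edge model is our own simplification: the real implementation walks memory-graph edges, which we abbreviate here to a variable name, a node kind, and the target node's value.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, runnable rendering of Figure 5 (the edge model is ours). Each
// failing-only edge is reduced to the variable it reaches, the kind of its
// target node, and that node's value; connect(...) conjoins the pieces.
public class ConditionGen {
    public enum Kind { ARRAY, NULL_REF, PRIMITIVE, STRING }

    public static String subExpr(String varName, Kind kind, String value) {
        switch (kind) {
            case ARRAY:     return varName + ".length == " + value;
            case NULL_REF:  return varName + " == null";
            case PRIMITIVE: return varName + " == " + value;
            default:        return varName + ".equals(\"" + value + "\")";
        }
    }

    // Conjoin all sub-expressions into one breakpoint condition string.
    public static String connect(List<String> subExprs) {
        return "(" + String.join(") && (", subExprs) + ")";
    }

    public static void main(String[] args) {
        List<String> subs = new ArrayList<>();
        subs.add(subExpr("c", Kind.PRIMITIVE, "2"));
        subs.add(subExpr("d", Kind.PRIMITIVE, "2"));
        System.out.println(connect(subs)); // prints (c == 2) && (d == 2)
    }
}
```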

Finally, each breakpoint is classified as primary or secondary according to whether it is conditional or not. Primary breakpoints are enabled by default; thus, when they are hit during an execution, their conditions will always be evaluated and the execution may be suspended accordingly. Secondary breakpoints, on the other hand, are disabled by default, that is, they will not come into effect until users enable them. We divide the breakpoints into two groups in order to 1) distinguish the breakpoints that are more likely to reveal suspicious program states and 2) reduce the number of breakpoints that users have to inspect in the first iteration of debugging.

5. IMPLEMENTATION

As illustrated in Figure 6, our automated breakpoint generator is composed of two subsystems: the fault localization subsystem and the breakpoint generation subsystem. This section describes the implementation details of these two subsystems.

Figure 6: Architecture of the implementation

5.1 Fault Localization Subsystem

We combine several techniques in a pipe-and-filter architecture to implement our fault localization subsystem. The input of the subsystem is ordinary Java bytecode, which is instrumented at runtime by InsECTJ [25], a generic instrumentation framework for collecting dynamic information for Java programs. While running the instrumented program, we record the number of times each basic block is executed as traces. Then we use NN4J to perform nearest neighbor queries on the collected traces to select a passing test case that most resembles the failing one, and report the differences between their traces. NN4J is our implementation of the nearest neighbor queries method, whose major algorithms are borrowed from [24]. After finding the differences in execution traces, we re-run the failing case, using JSlice [31], to compute a backward dynamic slice with the statements reported by NN4J, and the variables referenced by these statements, as the slicing criteria. The calculated slice is the final output of the fault localization subsystem.
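The nearest-neighbor selection step can be sketched as follows. This is a stand-in for NN4J, not its implementation: we compare binary basic-block coverage with a Hamming distance, whereas the actual metric follows [24].

```java
// Hypothetical stand-in for the NN4J selection step: pick the passing trace
// closest to the failing one. Traces are per-basic-block execution counts;
// here we reduce them to binary coverage and use a Hamming distance (the
// real distance metric is borrowed from [24]).
public class NearestNeighbor {
    static int distance(int[] a, int[] b) {
        int d = 0;
        for (int i = 0; i < a.length; i++) {
            boolean coveredA = a[i] > 0, coveredB = b[i] > 0;
            if (coveredA != coveredB) d++;   // blocks covered in only one run
        }
        return d;
    }

    // Returns the index of the passing trace nearest to the failing trace.
    public static int nearestPassing(int[][] passingTraces, int[] failingTrace) {
        int best = 0;
        for (int i = 1; i < passingTraces.length; i++) {
            if (distance(passingTraces[i], failingTrace)
                    < distance(passingTraces[best], failingTrace)) {
                best = i;
            }
        }
        return best;
    }
}
```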

5.2 Breakpoint Generation Subsystem

We build the breakpoint generation subsystem on top of Eclipse JDT. One major function of this subsystem is to generate breakpoints at branch statements contained in the slice computed by the fault localization subsystem. We implement this function using the Java parser and the debug platform provided in Eclipse JDT. The parser parses a Java source file into its AST representation, with which we can easily locate the branch statements in the source file. The breakpoint framework in the Eclipse JDT debug platform provides a group of utilities to manipulate breakpoints; we use the framework directly to generate breakpoints. The other major function of the subsystem is to identify comparison points, perform memory graph comparisons, and generate breakpoint conditions. We use the Soot framework [5] to lightly instrument the source code to record the run-time value sequence at each branch statement where there is a breakpoint. We reuse the memory graph component of the DDstate plugin [2] to implement a breakpoint listener. Once the listener has been registered with the debug model, memory graphs are extracted and compared when related comparison points are reached during a debugging session.

Finally, we implement the condition generation algorithm directly and leverage the breakpoint framework to set the conditions and default status of the breakpoints.

6. EVALUATION

To evaluate the usefulness of the breakpoint generation approach, we performed a user study on 24 cases of program debugging. The participants were provided with the automatically generated breakpoints in 12 cases, and their performance was evaluated and compared with that in the other 12 cases.

6.1 Design of the User Study

The user study involved six student volunteers, each of whom has at least three years of experience in Java programming and debugging. They also use Eclipse as their primary Java development environment. Each participant was assigned four debugging tasks and provided with the corresponding breakpoints generated by our tool in two of the tasks. As the experimental setup, we retrieved four programs from Apache Commons [1] and SIR [10] and manually seeded a bug in each of them. Each bug was seeded by slightly modifying, instead of adding or deleting, one line of code in the program, so that there was an explicit faulty statement for the participants to find. Moreover, we believe that these are typical bugs introduced by careless programming or an incomplete understanding of some corner cases. Details of the programs and bugs are given in Table 3.

To avoid bias caused by differences between individuals, we divided the six participants into two groups so that the groups had similar programming experience^6. In addition, we arranged the task assignments as described in Table 4. Therefore, each group was allowed to use the generated breakpoints in only two debugging tasks, which reduced the chance that a better group might amplify (or diminish) the evaluated usefulness of the breakpoints by using (or not using) them.

Table 4: Debugging tasks arrangement

Groups  Persons  b1       b2       b3       b4
g1      p1       not use  use      use      not use
        p2       not use  use      use      not use
        p3       not use  use      use      not use
g2      p4       use      not use  not use  use
        p5       use      not use  not use  use
        p6       use      not use  not use  use

As preparation for the study, we generated the breakpoints for each bug using our tool before the participants began debugging. Additionally, we gave each breakpoint user a 5-minute introduction briefly describing the basic idea of the breakpoint generation. The failing test case used in our breakpoint generation was pointed out as the symptom of the bug; thus the participants could check whether they had made a correct fix by running the test case. As the participants were not familiar with the programs, it might be too difficult for them to correctly discover and fix the bugs. Therefore, we loosely set a time limit for each debugging task, that is, we suggested that they could choose to give up the current task if they had spent more than 90 minutes on it. However, they could also choose to continue debugging

6 The programming experience of a group was estimated as the sum of each group member's duration of programming.

after 90 minutes. To analyze and evaluate the debugging sessions, we recorded all the on-screen actions using a screen recorder. We evaluated the debugging performance based on the time spent when the bug was successfully fixed. We also asked the participants who gave up their tasks to point out the statements they felt were suspicious, and evaluated the debugging work by estimating the distance between these statements and the faulty ones^7.

6.2 Results

[Bar chart omitted: y-axis "Time for debugging (min.)" from 0 to 100; x-axis "Bugs" (bug1 to bug4); one bar per participant, distinguishing the "use" and "unuse" groups.]

Figure 7: Time spent on each debugging task

Figure 7 shows the time spent on the debugging tasks. Each bar represents the duration of a participant's debugging session if the bug was successfully fixed in that session. Thus, for each of the four bugs there can be six bars, corresponding to the six participants' debugging times. Note that there are only three and two bars for bug3 and bug4, respectively, because three participants failed to fix bug3 and four participants failed to fix bug4. To compare the performance between the two groups, we fill the bars of the participants aided by the breakpoints with stripes and those of the participants in the control group with a check pattern.

We can see that the group provided with the generated breakpoints spent less debugging time on average for three of the four bugs. We may also notice that all the participants spent much more time on bug3 and bug4 than on bug1 and bug2. This is because bug3 and bug4, which are seeded in JTopas and NanoXML, are far more complicated than bug1 and bug2, each of which involves just one Java source file. As a result, it is very difficult for the participants to find and fix the bugs without correctly understanding the business logic of the two real-world applications. This is also the reason why some participants failed to fix bug3 and bug4.

Table 5: Ranks of participants

Bugs  R1  R2  R3  R4  R5  R6
b1    •   •   ◦   •   ◦   ◦
b2    •   •   ◦   •   ◦   ◦
b3    •   •   ◦   ◦   •   ◦
b4    ◦   •   ◦   •   ◦   •

To take into account the performance of all the participants, we ranked them for each bug according to the following rules:

1. Participants who fixed the current bug are ranked higher than those who failed to do so.

2. For successful participants, the less time they spent, the higher they are ranked.

7 The distance is roughly represented by the number of lines of code between the two statements.

Table 3: Subject programs and seeded bugs

Bugs  Programs        Versions  Files                      Description
b1    Apache Commons  2.0       FastArrayList.java         line 417: "return (false)" -> "return (true)"
b2    Apache Commons  2.0       CursorableLinkedList.java  line 265: "||" -> "&&"
b3    JTopas          0.6       StandardTokenizer.java     line 1658: "return 1" -> "return -1"
b4    NanoXML         2.2       ContentReader.java         line 132: '&' -> ' '

3. For unsuccessful participants, we ranked them based on the evaluation of the suspicious statements they pointed out. The way of performing this evaluation is described in Section 6.1.
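The three rules can be sketched as a comparator (field names are ours; the distance only matters for participants who gave up):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of the ranking rules (names are ours): participants who fixed the
// bug come first, ordered by debugging time; the rest follow, ordered by the
// distance (in lines) between their suspicious statements and the fault.
public class ParticipantRanking {
    public static class Result {
        final String name; final boolean fixed; final int minutes; final int distance;
        public Result(String name, boolean fixed, int minutes, int distance) {
            this.name = name; this.fixed = fixed;
            this.minutes = minutes; this.distance = distance;
        }
    }

    public static List<String> rank(List<Result> results) {
        List<Result> sorted = new ArrayList<>(results);
        sorted.sort(Comparator
            .comparing((Result r) -> !r.fixed)                         // rule 1
            .thenComparingInt(r -> r.fixed ? r.minutes : r.distance)); // rules 2, 3
        List<String> names = new ArrayList<>();
        for (Result r : sorted) names.add(r.name);
        return names;
    }
}
```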

The ranks of the participants are shown in Table 5. The columns "R1", "R2", ..., and "R6" list the marks of the participants ranked as "Rank 1", "Rank 2", ..., and "Rank 6", respectively. For each bug, a participant using generated breakpoints is marked with a bullet (•) and a participant not using generated breakpoints is marked with a circle (◦). From Table 5 we can see that the participants using generated breakpoints are generally ranked higher, which means they performed better than the other participants in general.

We analyzed the recorded on-screen actions to investigate how much the generated breakpoints affected the debugging performance. During our analysis, we focused on how many times each participant used print statements and breakpoints for debugging. Then we calculated the using rate with the following formula:

Rate_using = n_ugb / (n_umb + n_up + n_ugb)

where n_up, n_umb, and n_ugb stand for the number of times the participant used print statements, manually added breakpoints, and generated breakpoints, respectively. Since print statements and breakpoints were the major debugging utilities in the user study, we use the using rate to roughly estimate how often the generated breakpoints were used.
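For instance, a participant who used print statements twice, manual breakpoints twice, and generated breakpoints six times has a using rate of 6/10 = 0.6. A minimal sketch of the formula (the method name is ours):

```java
// Using rate of one debugging case, per the formula above: uses of generated
// breakpoints over all uses of print statements, manually added breakpoints,
// and generated breakpoints.
public class UsingRate {
    public static double rate(int nup, int numb, int nugb) {
        int total = numb + nup + nugb;
        return total == 0 ? 0.0 : (double) nugb / total;
    }
}
```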

[Bar chart omitted: y-axis "Using rate (%)" from 0 to 110; x-axis the 12 debugging cases, descendingly ordered by using rate.]

Figure 8: Using rates of the debugging cases supported by generated breakpoints

Figure 8 shows the using rates of the 12 debugging cases where generated breakpoints were provided. We can see that the using rate is higher than 45% in 9 of the 12 cases, which indicates that the generated breakpoints were frequently used during the debugging sessions. Moreover, in the three cases where the using rate is 0, the generated breakpoints were still used many times for code inspection; however, the participants did not use them to suspend the program executions.

6.3 Additional Observations

During the analysis of the records, we found a common pattern of using the generated breakpoints. At the beginning of debugging sessions, the users usually browsed all the breakpoints, especially the conditional ones, and briefly inspected the source code near the breakpoint locations. Then they performed the ordinary debugging process, searching backwards from the failed assertion for clues to the bug. When the users decided to check run-time states at a program point, they tended to use a generated breakpoint near the point, instead of writing a print statement or setting a breakpoint by themselves.

For simple cases, such as bug1 and bug2, it was usually adequate to use the generated breakpoints to find the bugs. However, when the bugs were complicated, the users had to use utilities other than the generated breakpoints. Some participants commented that they found it helpful to open all the source files where the generated breakpoints were located, because they could obtain an overall impression of the execution traces. In these cases, the breakpoints can be viewed as a special kind of dynamic slice.

6.4 Limitations of the Evaluation

Since the user study was performed on a small sample of participants with a small number of bugs, the generality of the results is obviously limited. In addition, all the participants were unfamiliar with the subject programs, which deviates from practical debugging scenarios. This indeed affected the results, as some participants failed to accomplish their tasks. To reduce the possible bias, we carefully grouped the participants and allocated the debugging tasks, as described in Section 6.1. We also tried to objectively evaluate the participants' performance in unsuccessful cases and compared them using a ranking method.

7. RELATED WORK

Breakpoints are generally acknowledged as a helpful tool in program debugging. Chern and De Volder [8] design a novel form of dynamic breakpoints, named control-flow breakpoints, to improve debugging tools. Their Control-flow Breakpoint Debugger (CBD) provides a pointcut language to specify dynamic breakpoints in program execution flows. Developers can use the language to attach much more specific runtime conditions to static breakpoints to produce dynamic ones, which are believed to be more helpful for debugging. Our work is greatly inspired by CBD. We observe that, although CBD facilitates developers in specifying breakpoints, it is still cumbersome to manually add breakpoints and describe them in detail. Our work aims to facilitate this process by automatically generating breakpoints based upon fault localization techniques.

Zeller et al. develop the Delta Debugging technique to systematically analyze the differences in program inputs, source code, and run-time program states to isolate failure-inducing inputs, extract cause-effect chains, and link cause transitions to program failures [38, 36, 9]. Compared with the fully automated Delta Debugging techniques, our approach is primarily an aid to interactive debugging. Moreover, we put more emphasis on finding the locations to set breakpoints and the opportunities to extract and compare memory graphs, while Delta Debugging generally performs its searches in space and time in a divide-and-conquer style. In our approach, we borrow the memory graph comparison from [36] and reuse its DDstate implementation. Although the DDstate implementation systematically changes the memory graph to isolate relevant states, we just compare the graphs and analyze the graph differences to find accessible suspicious states from which to generate breakpoint conditions.

As an essential step of debugging, fault localization has been extensively studied. In particular, the spectra-based fault localization techniques, which share the commonality of comparing program spectra, have been well investigated. Harrold et al. classify spectra-based approaches in [13] and adopt code coverage spectra to build the Tarantula tool [17]. Additionally, a variety of techniques, such as [14, 12, 32], have been proposed to improve the Tarantula approach. In [12], Hao et al. introduce a breakpoint-based interactive debugging aid, which is similar to our work. However, we combine different fault localization techniques to generate breakpoints, instead of using visualization to recommend breakpoint locations. Moreover, we generate conditions for some breakpoints to strengthen their usability. Another group of fault localization techniques [24, 11, 28] are based on comparing execution traces. They share the viewpoint that faults are more likely to be disclosed by comparing two runs that are similar rather than fundamentally different. In [24], Renieris and Reiss develop the nearest neighbor queries technique, which adopts the k-NN algorithm to select a passing run that most resembles the failing one. This technique is claimed to be independent of input structure, which is good for building tools that are widely applicable. To achieve satisfactory report accuracy, it also assumes the availability of a large set of test cases, which might not always be possible in real-world debugging situations. Although the nearest neighbor queries method is the basis of our approach, we can probably overcome this restriction, since the successive steps in our approach are capable of improving the accuracy. In addition to the fault localization techniques that report suspicious statements or states, Jiang and Su propose an approach to generate faulty control flow paths, which are more useful for understanding the bugs [16]. Similar to [16], our approach focuses on conditional statements; thus a group of generated breakpoints is likely to convey control flow information to users. Although not refined by machine learning techniques, the breakpoints can also be useful in interactive debugging.

Program slicing was first introduced in [33] and has proved to be helpful in program debugging [34, 30]. Since our breakpoint generation focuses on the dynamic aspects of programs, we use dynamic program slicing in our approach. While the slices are static positions in programs, we append dynamic information by attaching conditions to the calculated locations, producing conditional breakpoints. After all, the size and precision of the slices can significantly affect the quality of the generated breakpoints. Therefore, we may further improve our approach by leveraging recent advances in program slicing techniques, such as [26]. By combining dynamic and static program slicing, Ko and Myers develop a tool called Whyline [18] to support interrogative debugging. Using Whyline, developers can easily query both dynamic and static dependencies among program elements and execution events, which may greatly accelerate the debugging process. Compared with Whyline, our tool is more of an enhancement to the breakpoint utility than a reinvented debugging approach. Thus our approach may not be as user-friendly as Whyline, but the generated breakpoints may be more acceptable.

8. CONCLUSION AND FUTURE WORK

The breakpoint is one of the most frequently used utilities in debugging activities. In this paper, we have proposed an approach to generating breakpoints automatically. The evaluation has shown that the generated breakpoints can usually save some human effort in debugging. In our future work, we will try to design new strategies to select better locations at which to generate breakpoints. Existing work on cause-effect relations in debugging, such as [36, 27], may provide good candidates for improving our current strategy. Furthermore, we are planning to investigate automatic prioritization and refinement of breakpoints based on bug taxonomies and fix patterns [22], and to explore the reuse opportunities of debugging artifacts, such as breakpoints, based on recent work [7, 35].

Acknowledgements

We are grateful to Dr. Yuting Chen for his insightful discussion and suggestions, to Hao Xu and Fangyue Wang for programming assistance, and to Karsten Lehmann for his instructions on the usage of the DDstate plugin. This work was supported in part by the National High Technology Development Program of China (Grant No. 2006AA01Z158) and the National Natural Science Foundation of China (NSFC) (Grants No. 60673120 and No. 60970009).

9. REFERENCES

[1] Apache Commons. http://commons.apache.org/.
[2] DDstate: Failure-Inducing States. http://www.st.cs.uni-saarland.de/eclipse/.
[3] Eclipse project. http://www.eclipse.org/.
[4] Results from NetBeans Debugger Survey. http://debugger.netbeans.org/survey.html.
[5] Soot: a Java optimization framework. http://www.sable.mcgill.ca/soot/.
[6] H. Agrawal, J. Horgan, S. London, W. Wong, and M. Bellcore. Fault localization using execution slices and dataflow tests. In Proceedings of the Sixth International Symposium on Software Reliability Engineering, pages 143–151, 1995.

[7] B. Ashok, J. Joy, H. Liang, S. K. Rajamani, G. Srinivasa, and V. Vangala. DebugAdvisor: a recommender system for debugging. In ESEC/FSE '09: Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, pages 373–382, New York, NY, USA, 2009. ACM.
[8] R. Chern and K. De Volder. Debugging with control-flow breakpoints. In Proceedings of the 6th International Conference on Aspect-Oriented Software Development, pages 96–106, 2007.
[9] H. Cleve and A. Zeller. Locating causes of program failures. In 27th International Conference on Software Engineering, pages 342–351, 2005.
[10] H. Do, S. G. Elbaum, and G. Rothermel. Supporting controlled experimentation with testing techniques: An infrastructure and its potential impact. Empirical Software Engineering: An International Journal, 10(4):405–435, 2005.
[11] L. Guo, A. Roychoudhury, and T. Wang. Accurately choosing execution runs for software fault localization. In 15th International Conference on Compiler Construction, LNCS 3923, pages 80–95, 2006.
[12] D. Hao, L. Zhang, L. Zhang, J. Sun, and H. Mei. VIDA: Visual interactive debugging. In ICSE '09: Proceedings of the 31st International Conference on Software Engineering, pages 583–586, Washington, DC, USA, 2009. IEEE Computer Society.

[13] M. J. Harrold, G. Rothermel, K. Sayre, R. Wu, and L. Yi. An empirical investigation of the relationship between spectra differences and regression faults. Softw. Test., Verif. Reliab., 10(3):171–194, 2000.
[14] T.-Y. Huang, P.-C. Chou, C.-H. Tsai, and H.-A. Chen. Automated fault localization with statistically suspicious program states. In ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, pages 11–20, 2007.
[15] D. Jeffrey, N. Gupta, and R. Gupta. Fault localization using value replacement. In International Symposium on Software Testing and Analysis, pages 167–178, 2008.
[16] L. Jiang and Z. Su. Context-aware statistical debugging: from bug predictors to faulty control flow paths. In ASE '07: Proceedings of the 22nd IEEE/ACM International Conference on Automated Software Engineering, pages 184–193, New York, NY, USA, 2007. ACM.
[17] J. A. Jones and M. J. Harrold. Empirical evaluation of the Tarantula automatic fault-localization technique. In 20th IEEE/ACM International Conference on Automated Software Engineering, pages 273–282, 2005.
[18] A. J. Ko and B. A. Myers. Debugging reinvented: asking and answering why and why not questions about program behavior. In Proceedings of the 30th International Conference on Software Engineering, pages 301–310, 2008.
[19] B. Korel and J. W. Laski. Dynamic program slicing. Inf. Process. Lett., 29(3):155–163, 1988.

[20] G. C. Murphy, M. Kersten, and L. Findlater. How are Java software developers using the Eclipse IDE? IEEE Software, 23(4):76–83, 2006.
[21] G. J. Myers. The Art of Software Testing. John Wiley and Sons, 1979.
[22] K. Pan, S. Kim, and E. J. Whitehead, Jr. Toward an understanding of bug fix patterns. Empirical Software Engineering, 14(3):286–315, 2009.
[23] R. Pressman. Software Engineering: A Practitioner's Approach, Sixth Edition. McGraw-Hill, 2004.
[24] M. Renieris and S. P. Reiss. Fault localization with nearest neighbor queries. In IEEE/ACM International Conference on Automated Software Engineering, pages 30–39, 2003.
[25] A. Seesing and A. Orso. InsECTJ: a generic instrumentation framework for collecting dynamic information within Eclipse. In Proceedings of the 2005 OOPSLA Workshop on Eclipse Technology eXchange, pages 45–49, 2005.
[26] M. Sridharan, S. J. Fink, and R. Bodik. Thin slicing. In PLDI '07: Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 112–122, New York, NY, USA, 2007. ACM.

[27] W. N. Sumner and X. Zhang. Automatic failure inducing chain computation through aligned execution comparison. Technical Report CSD-TR-08-023, Department of Computer Science, Purdue University, West Lafayette, Indiana 47907, August 2008.
[28] W. N. Sumner and X. Zhang. Algorithms for automatically computing the causal paths of failures. In FASE, pages 355–369, 2009.
[29] M. Telles and Y. Hsieh. The Science of Debugging. Coriolis Group Books, 2001.
[30] F. Tip. A survey of program slicing techniques. Technical report, Amsterdam, The Netherlands, 1994.
[31] T. Wang and A. Roychoudhury. Using compressed bytecode traces for slicing Java programs. In 26th International Conference on Software Engineering, pages 512–521, 2004.
[32] X. Wang, S. C. Cheung, W. K. Chan, and Z. Zhang. Taming coincidental correctness: Coverage refinement with context patterns to improve fault localization. In ICSE '09: Proceedings of the 31st International Conference on Software Engineering, pages 45–55, Washington, DC, USA, 2009. IEEE Computer Society.

[33] M. Weiser. Program slicing. In Proceedings of the 5th International Conference on Software Engineering, pages 439–449, 1981.
[34] M. Weiser. Programmers use slices when debugging. Commun. ACM, 25(7):446–452, 1982.
[35] T. Xie, S. Thummalapenta, D. Lo, and C. Liu. Data mining for software engineering. IEEE Computer, 42(8):35–42, August 2009.
[36] A. Zeller. Isolating cause-effect chains from computer programs. In Proceedings of the Tenth ACM SIGSOFT Symposium on Foundations of Software Engineering, pages 1–10, 2002.
[37] A. Zeller. Why Programs Fail: A Guide to Systematic Debugging. Morgan Kaufmann Publishers, 2005.
[38] A. Zeller and R. Hildebrandt. Simplifying and isolating failure-inducing input. IEEE Trans. Software Eng., 28(2):183–200, 2002.
[39] X. Zhang, H. He, N. Gupta, and R. Gupta. Experimental evaluation of using dynamic slices for fault location. In Proceedings of the Sixth International Workshop on Automated Debugging, pages 33–42, 2005.
[40] T. Zimmermann and A. Zeller. Visualizing memory graphs. In Software Visualization, pages 191–204, 2001.