Is Bytecode Instrumentation as Good as Source Instrumentation?

An Empirical Study with Industrial Tools

Nan Li, Xin Meng, Jeff Offutt, and Lin Deng

Software Engineering

George Mason University

Fairfax, VA USA

www.cs.gmu.edu/~offutt/

[email protected]

Software Testing

• The main purpose of testing is to discover failures early
  – Failures found during system testing cost 10 times as much as if found during unit testing
  – Failures found after deployment cost 50 times as much
• Repairing faults early saves money
  – Fixing faults after release is like having surgery
  – Fixing faults during development is like having a healthy lifestyle
• Overall goals of test design :
  1. Few tests
  2. Efficient process
  3. Effective tests

ISSRE 2013 © Li, Meng, Offutt, and Deng

Health care web application ?

Test Coverage Criteria

Test Requirements : Specific things that must be satisfied or covered during testing

Test Criterion : A collection of rules and a process that define test requirements

Coverage criteria balance the three test design goals

Code Coverage Criteria

• Statement Coverage (SC) : Every statement in the software must be covered
• Branch Coverage (BC) : Every branch in the software must be covered
• Clause Coverage (CC) : Every clause in every predicate must evaluate to true and false

1. if (a && b)
2.     print ("yes");
3. print ("Finished");

SC : [1, 2, 3]

BC : [1, 2, 3] (a=T, b=T) [1, 3] (a=F, b=F)

CC : a=T, a=F, b=T, b=F
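The paths listed above can be reproduced with a small sketch. The class name `CriteriaExample` and the method `path` are illustrative assumptions; the line numbers are recorded by hand rather than by a coverage tool.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: the slide's three-line example, with the executed
// line numbers recorded by hand so the paths can be compared.
class CriteriaExample {

    // Returns the line numbers (1, 2, 3) executed for the given inputs.
    static List<Integer> path(boolean a, boolean b) {
        List<Integer> executed = new ArrayList<>();
        executed.add(1);          // 1. if (a && b)
        if (a && b) {
            executed.add(2);      // 2. print ("yes");
        }
        executed.add(3);          // 3. print ("Finished");
        return executed;
    }

    public static void main(String[] args) {
        System.out.println(path(true, true));   // [1, 2, 3]
        System.out.println(path(false, false)); // [1, 3]
    }
}
```

The first test alone reaches every statement, so it satisfies SC; the second test is needed only to cover the false branch for BC.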

Criteria Subsumption

• If branch coverage is satisfied, then all statements are guaranteed to be reached
  – That is, BC subsumes SC
• Clause coverage does not subsume either branch coverage or statement coverage
• For the previous example, clause coverage can be satisfied with two tests : (a=T, b=F) and (a=F, b=T)
• The predicate is false for both, so both take the path [1, 3]

This is why nobody wants to use CC
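A minimal sketch of the subsumption argument, assuming clause values are taken directly from the test inputs for the predicate `a && b`. The names `SubsumptionExample`, `Outcomes`, and `evaluate` are hypothetical.

```java
// Illustrative check of whether a test set satisfies clause coverage (CC)
// and branch coverage (BC) for the single predicate "a && b".
class SubsumptionExample {

    // Tracks which outcomes have been observed for one boolean condition.
    static final class Outcomes {
        boolean sawTrue;
        boolean sawFalse;

        void record(boolean value) {
            if (value) { sawTrue = true; } else { sawFalse = true; }
        }

        boolean both() {
            return sawTrue && sawFalse;
        }
    }

    // Each test is {a, b}. Returns {ccSatisfied, bcSatisfied}.
    static boolean[] evaluate(boolean[][] tests) {
        Outcomes a = new Outcomes();
        Outcomes b = new Outcomes();
        Outcomes predicate = new Outcomes();
        for (boolean[] t : tests) {
            a.record(t[0]);
            b.record(t[1]);
            predicate.record(t[0] && t[1]);
        }
        return new boolean[] { a.both() && b.both(), predicate.both() };
    }

    public static void main(String[] args) {
        boolean[] r = evaluate(new boolean[][] { { true, false }, { false, true } });
        System.out.println("CC=" + r[0] + " BC=" + r[1]); // CC=true BC=false
    }
}
```

Both clauses take both values, yet the predicate is never true, so line 2 is never executed and neither BC nor SC is satisfied.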

Measuring Coverage Criteria

• Tools measure coverage by instrumenting the software

if (a > b || b == 0) {
    bc.trueReached (1);
    return true;
} else {
    bc.falseReached (1);
    return false;
}

• All definitions and theory are based on source code
  – SCI : Source Code Instrumentation
• Some tools instrument bytecode
  – Easier to build tools
  – BCI : ByteCode Instrumentation
• This leaves significant questions
  – Does BCI preserve the definitions and theory behind coverage ?
  – Does BCI == SCI ?

What do tools do?
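The instrumented fragment above can be made runnable with a small recorder. This is a sketch under assumptions: the slide names only `bc.trueReached` and `bc.falseReached`, so the `BranchRecorder` class, its storage, and `coveredBranches` are hypothetical and do not describe how any of the studied tools work.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of source code instrumentation (SCI): probes are inserted into
// the source, one per branch outcome, and report to a recorder.
class InstrumentedExample {

    // Hypothetical recorder: remembers which outcomes of each branch ran.
    static final class BranchRecorder {
        // Branch id -> {true outcome seen, false outcome seen}
        private final Map<Integer, boolean[]> seen = new HashMap<>();

        void trueReached(int id) {
            seen.computeIfAbsent(id, k -> new boolean[2])[0] = true;
        }

        void falseReached(int id) {
            seen.computeIfAbsent(id, k -> new boolean[2])[1] = true;
        }

        // Number of distinct branch outcomes executed so far.
        int coveredBranches() {
            int n = 0;
            for (boolean[] outcomes : seen.values()) {
                if (outcomes[0]) n++;
                if (outcomes[1]) n++;
            }
            return n;
        }
    }

    // The slide's fragment, with the recorder passed in explicitly.
    static boolean check(int a, int b, BranchRecorder bc) {
        if (a > b || b == 0) {
            bc.trueReached(1);
            return true;
        } else {
            bc.falseReached(1);
            return false;
        }
    }

    public static void main(String[] args) {
        BranchRecorder bc = new BranchRecorder();
        check(2, 1, bc);
        check(1, 2, bc);
        System.out.println(bc.coveredBranches() + " of 2 branches covered");
    }
}
```

The probes sit on the two outcomes of the whole predicate, so this sketch counts source-level branches regardless of how many clauses the predicate contains.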

3 Research Questions


RQ 1 : What language control structures do the tools instrument?

RQ 2 : Do the tools reflect the theory that branch coverage subsumes statement coverage?

RQ 3 : Is bytecode instrumented branch coverage the same as source code instrumented branch coverage?

Benefits of This Research

1. Testers need consistency from tools

2. Testers need to understand what tool results mean

3. Tool builders need to know how to build tools


Analyzing Instrumentation Methods (RQ1 & RQ2)

• Three tools compared
  – One used bytecode instrumentation
  – Two used source code instrumentation
• Studied open source software (FindBugs)
  – 19 classes
  – 105 methods


Tools Considered

• 31 tools considered (Table I in paper)
• Access : 19 free, 12 charge for use
  – 9 had free student trials (AgitarOne, Jtest, VectorCAST did not)
• Active : 13 had websites updated in past four years
• Branch Coverage : Six support, 22 do not, three unclear
• Method of Instrumentation
  – BCI : 14
  – SCI : 5
  – Both : 1
  – Did not say : 11

BCI : EclEmma

SCI : Clover, CodeCover

Control Structures Instrumented (RQ1)

• Determined by comparing tool results with hand analysis
• All tools had the same omission :
  – If a method had no branches, the tool reported 0% coverage
  – This is as if I don’t ask for a glass of wine, and call my waiter a failure for not bringing it
• Control structures instrumented (Table II in paper)
  – EclEmma (BCI) : if-else, while, do-while, for, enhanced for, assignments expressions, ternary, return, assert, try, catch, finally, switch (all but try-catch)
  – CodeCover (SCI) : if-else, while, do-while, for, try, catch, finally
  – Clover (SCI) : if-else, while, do-while, for, ternary, assert, try, catch, finally

Does BC Subsume SC? (RQ2)

• When control structures are not instrumented, some statements are not covered, even if BC is fully satisfied

• Reporting 0% coverage on methods with no branches results in statements not being covered


Thus, none of the tools implement branch coverage correctly
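The zero-branch case can be illustrated with a small scoring function. This is only one possible convention, offered as an assumption and not as the behavior of any studied tool or a recommendation from the paper: a method with no branches counts as fully covered once it has been entered. The names `BranchScore` and `percent` are hypothetical.

```java
// Sketch of a branch coverage score that handles branch-free methods.
class BranchScore {

    // Returns branch coverage as a percentage in [0, 100].
    // Assumed convention: a method with zero branches scores 100% if it
    // was entered at least once, and 0% if it was never executed.
    static double percent(int covered, int total, boolean methodEntered) {
        if (covered < 0 || total < 0 || covered > total) {
            throw new IllegalArgumentException("invalid branch counts");
        }
        if (total == 0) {
            return methodEntered ? 100.0 : 0.0;
        }
        return 100.0 * covered / total;
    }

    public static void main(String[] args) {
        System.out.println(percent(0, 0, true));  // 100.0
        System.out.println(percent(1, 2, true));  // 50.0
    }
}
```

Under this convention, executing a branch-free method is credited, so full branch coverage again implies that every statement in that method was reached.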

Does BCI == SCI ? (RQ3)

• We compared methods that only used control structures both tools instrumented
• Compared tools pair-wise :
  – CodeCover (SCI) vs EclEmma (BCI)
  – Clover (SCI) vs EclEmma (BCI)

Tool        Branches   Covered
EclEmma     170        55.29%
CodeCover   130        61.53%

Tool        Branches   Covered
EclEmma     394        54.56%
Clover      342        56.14%

• Only if-else in common
• EclEmma skips dead code
• Discovered a fault in CodeCover
• while, do-while, for, assert, try, catch, finally, ternary

Method Comparison

• BCI (EclEmma) compared with SCI (CodeCover & Clover)
  – Same score on 49 methods
  – BCI lower on 11 methods
  – BCI higher on 4 methods
• Because of the way Java is translated to bytecode, BCI requires each clause in each predicate to be true and false
  – Thus it measures clause coverage (CC), not branch coverage (BC)
  – Most predicates have only one clause
• CC is harder to satisfy than BC with multiple clauses
  – So BCI scores are usually the same or lower
  – Exceptions occur when some predicates are not covered
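Why a bytecode-level count can come out lower is easiest to see on a two-clause predicate. The sketch below is a hand simulation, not real bytecode analysis: it assumes the usual compilation of `if (a && b)` into two conditional jumps, one per clause, with the second reached only when `a` is true. The names `ShortCircuitExample` and `covered` are hypothetical.

```java
// Simulates branch counting for "if (a && b)" at source level (two
// outcomes of the whole predicate) and at bytecode level (two outcomes
// for each of the two conditional jumps).
class ShortCircuitExample {

    // Each test is {a, b}.
    // Returns {source branches covered (of 2), bytecode branches covered (of 4)}.
    static int[] covered(boolean[][] tests) {
        boolean[] source = new boolean[2];   // predicate true, predicate false
        boolean[] bytecode = new boolean[4]; // a true, a false, b true, b false
        for (boolean[] t : tests) {
            boolean a = t[0];
            boolean b = t[1];
            bytecode[a ? 0 : 1] = true;      // first conditional jump tests a
            if (a) {
                bytecode[b ? 2 : 3] = true;  // second jump is reached only if a is true
            }
            source[(a && b) ? 0 : 1] = true;
        }
        return new int[] { count(source), count(bytecode) };
    }

    private static int count(boolean[] flags) {
        int n = 0;
        for (boolean f : flags) {
            if (f) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        int[] r = covered(new boolean[][] { { true, true }, { false, false } });
        System.out.println("source " + r[0] + "/2, bytecode " + r[1] + "/4"); // source 2/2, bytecode 3/4
    }
}
```

The two tests that give full branch coverage at source level leave one bytecode-level outcome unexercised; adding (a=T, b=F) closes the gap.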


Threats to Validity

• As with most software research, we do not know whether our subjects are representative

• Results are based on three tools

• Some of the tools could have faults that affected the results

– We identified one fault

• We performed hand analysis of branch measurement to verify tools’ approach

– Used two raters


Four Findings


1. None of the tools evaluate all Java branches

2. None of the tools implement branch coverage correctly
   • Branch coverage should subsume statement coverage, but does not

3. Instrumenting bytecode measures clause coverage, not branch coverage

4. Bytecode instrumentation is not valid for measuring branch coverage

Future Directions

We would like to gather more information along several lines

1. We focused on branch coverage, and it would be good to perform a similar study for statement coverage

• More tools are available for SC

2. The study could be extended to more tools; however, some tools are expensive to access

3. A major extension would be to measure whether either instrumentation technique helps testers design better tests

4. Can we modify branch coverage instrumentation to measure branch coverage correctly?


Contact


Jeff Offutt

[email protected]

http://cs.gmu.edu/~offutt/
