Is Bytecode Instrumentation as Good as Source Instrumentation?
An Empirical Study with Industrial Tools
Nan Li, Xin Meng, Jeff Offutt, and Lin Deng
Software Engineering
George Mason University
Fairfax, VA USA
www.cs.gmu.edu/~offutt/
Software Testing
• The main purpose of testing is to discover failures early
– Failures found during system testing cost 10 times as much as if found during unit testing
– Failures found after deployment cost 50 times as much
• Repairing faults early saves money
– Fixing faults after release is like having surgery
– Fixing faults during development is having a healthy lifestyle
• Overall goals of test design :
1. Few tests
2. Efficient process
3. Effective tests
ISSRE 2013 © Li, Meng, Offutt, and Deng
Health care web application ?
Test Coverage Criteria
Test Requirements : Specific things that must be satisfied or covered during testing
Test Criterion : A collection of rules and a process that define test requirements
Coverage criteria balance the three test design goals
Code Coverage Criteria
• Statement Coverage (SC) : Every statement in the software must be covered
• Branch Coverage (BC) : Every branch in the software must be covered
• Clause Coverage (CC) : Every clause in every predicate must evaluate to true and false
1. if (a && b)
2.    print ("yes");
3. print ("Finished");
SC : [1, 2, 3]
BC : [1, 2, 3] (a=T, b=T) [1, 3] (a=F, b=F)
CC : a=T, a=F, b=T, b=F
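The paths listed for SC and BC can be reproduced with a small Java sketch. The class `CriteriaExample` and its method `path` are hypothetical names introduced only for illustration; the numbers recorded are the line numbers of the three-line example above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the slides.
class CriteriaExample {
    // Returns the line numbers (from the slide's example) executed for inputs a and b.
    static List<Integer> path(boolean a, boolean b) {
        List<Integer> lines = new ArrayList<>();
        lines.add(1);          // 1. if (a && b)
        if (a && b) {
            lines.add(2);      // 2. print ("yes");
        }
        lines.add(3);          // 3. print ("Finished");
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(path(true, true));   // prints [1, 2, 3]
        System.out.println(path(false, false)); // prints [1, 3]
    }
}
```

The single test (a=T, b=T) already reaches every statement, while branch coverage also needs a test that skips line 2.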
Criteria Subsumption
• If branch coverage is satisfied, then all statements are guaranteed to be reached
– That is, BC subsumes SC
• Clause coverage does not subsume either branch coverage or statement coverage
• For the previous example, clause coverage can be satisfied with two tests : (a=T, b=F) and (a=F, b=T)
• The predicate is false for both, so both take the path [1, 3]
This is why nobody wants to use CC
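The subsumption argument can be checked mechanically. The following is a minimal sketch, assuming the predicate `a && b` from the earlier example and treating each test as a pair of input values; the class and method names are hypothetical and chosen only for this illustration.

```java
// Hypothetical checker for the three criteria on the predicate (a && b).
class SubsumptionCheck {
    // BC: the predicate as a whole must evaluate to both true and false.
    static boolean branchCoverage(boolean[][] tests) {
        boolean sawTrue = false, sawFalse = false;
        for (boolean[] t : tests) {
            if (t[0] && t[1]) sawTrue = true; else sawFalse = true;
        }
        return sawTrue && sawFalse;
    }

    // CC: each clause (a and b) must take both truth values across the tests.
    static boolean clauseCoverage(boolean[][] tests) {
        boolean aT = false, aF = false, bT = false, bF = false;
        for (boolean[] t : tests) {
            if (t[0]) aT = true; else aF = true;
            if (t[1]) bT = true; else bF = true;
        }
        return aT && aF && bT && bF;
    }

    // SC: lines 1 and 3 always run, so only line 2 (predicate true) can be missed.
    static boolean statementCoverage(boolean[][] tests) {
        for (boolean[] t : tests) {
            if (t[0] && t[1]) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        boolean[][] ccTests = { {true, false}, {false, true} };
        System.out.println(clauseCoverage(ccTests));    // true
        System.out.println(branchCoverage(ccTests));    // false
        System.out.println(statementCoverage(ccTests)); // false
    }
}
```

By contrast, the branch coverage tests (a=T, b=T) and (a=F, b=F) satisfy all three checks in this sketch.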
Measuring Coverage Criteria
• Tools measure coverage by instrumenting the software

    if (a > b || b == 0) {
        bc.trueReached (1);
        return true;
    } else {
        bc.falseReached (1);
        return false;
    }

• All defs and theory are based on source code
– SCI : Source Code Instrumentation
• Some tools instrument bytecode
– Easier to build tools
– BCI : ByteCode Instrumentation
• This leaves significant questions
– Does BCI preserve the definitions and theory behind coverage ?
– Does BCI == SCI ??
What do tools do?
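The instrumented snippet on this slide calls `bc.trueReached` and `bc.falseReached` without showing what `bc` is. A minimal sketch of such a recorder follows; the class `BranchRecorder`, its constructor, and `coverage()` are assumptions made for illustration and do not come from any of the tools studied.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical recorder standing in for the slide's "bc" object.
class BranchRecorder {
    private final Set<String> reached = new HashSet<>();
    private final int totalBranches;

    // Each instrumented predicate contributes a true branch and a false branch.
    BranchRecorder(int predicates) {
        this.totalBranches = 2 * predicates;
    }

    void trueReached(int id)  { reached.add(id + "T"); }
    void falseReached(int id) { reached.add(id + "F"); }

    // Percentage of branches reached so far.
    double coverage() {
        return 100.0 * reached.size() / totalBranches;
    }
}

class InstrumentedExample {
    // The slide's predicate with source-level probes inserted by hand.
    static boolean check(BranchRecorder bc, int a, int b) {
        if (a > b || b == 0) {
            bc.trueReached(1);
            return true;
        } else {
            bc.falseReached(1);
            return false;
        }
    }

    public static void main(String[] args) {
        BranchRecorder bc = new BranchRecorder(1);
        check(bc, 5, 3);                   // true branch
        System.out.println(bc.coverage()); // 50.0
        check(bc, 1, 2);                   // false branch
        System.out.println(bc.coverage()); // 100.0
    }
}
```

Because the probes sit on the two outcomes of the whole predicate, this style of source instrumentation counts exactly two branches per decision, regardless of how many clauses it contains.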
3 Research Questions
RQ 1 : What language control structures do the tools instrument ?
RQ 2 : Do the tools reflect the theory that branch coverage subsumes statement coverage ?
RQ 3 : Is bytecode instrumented branch coverage the same as source code instrumented branch coverage ?
Benefits of This Research
1. Testers need consistency from tools
2. Testers need to understand what tool results mean
3. Tool builders need to know how to build tools
Analyzing Instrumentation Methods (RQ1 & RQ2)
• Three tools compared
– One used bytecode instrumentation
– Two used source code instrumentation
• Studied open source software (FindBugs)
– 19 classes
– 105 methods
Tools Considered
• 31 tools considered (Table I in paper)
• Access : 19 free, 12 charge for use
– 9 had free student trials (AgitarOne, Jtest, VectorCAST did not)
• Active : 13 had websites updated in past four years
• Branch Coverage : Six support, 22 do not, three unclear
• Method of Instrumentation
– BCI : 14
– SCI : 5
– Both : 1
– Did not say : 11
BCI : EclEmma
SCI : Clover, CodeCover
Control Structures Instrumented (RQ1)
• Determined by comparing tool results with hand analysis
• All tools had the same omission :
– If a method had no branches, the tool reported 0% coverage
– This is as if I don’t ask for a glass of wine, and call my waiter a failure for not bringing it
• Control structures instrumented (Table II in paper)
– EclEmma (BCI) : if-else, while, do-while, for, enhanced for, assignments expressions, ternary, return, assert, try, catch, finally, switch (all but try-catch)
– CodeCover (SCI) : if-else, while, do-while, for, try, catch, finally
– Clover (SCI) : if-else, while, do-while, for, ternary, assert, try, catch, finally
Does BC Subsume SC? (RQ2)
• Control structures not instrumented means some statements are not covered, even if BC is fully satisfied
• Reporting 0% coverage on methods with no branches results in statements not being covered
Thus, none of the tools implement branch coverage correctly
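The 0% problem is easy to see in the arithmetic of a coverage score. The sketch below contrasts the behaviour the slides describe with one possible alternative convention; the alternative (treating a branch-free method as fully covered once it has been executed) is an assumption made here for illustration, not a fix proposed in the slides, and all names are hypothetical.

```java
// Hypothetical scoring helpers; neither is taken from any of the tools studied.
class BranchScore {
    // Behaviour described on the slides: a method with no branches is reported as 0%.
    static double reportedByTools(int covered, int total) {
        return total == 0 ? 0.0 : 100.0 * covered / total;
    }

    // Assumed alternative: with no branches to cover, executing the method
    // at all counts as full coverage, so its statements are not left out.
    static double vacuouslySatisfied(int covered, int total, boolean methodExecuted) {
        if (total == 0) {
            return methodExecuted ? 100.0 : 0.0;
        }
        return 100.0 * covered / total;
    }

    public static void main(String[] args) {
        System.out.println(reportedByTools(0, 0));          // 0.0
        System.out.println(vacuouslySatisfied(0, 0, true)); // 100.0
    }
}
```

Under the first convention a tester can reach 100% on every method that has branches and still never execute the straight-line methods, which is how the subsumption of statement coverage is lost.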
Does BCI == SCI ? (RQ3)
• We compared methods that only used control structures both tools instrumented
• Compared tools pair-wise :
– CodeCover (SCI) vs EclEmma (BCI)
– Clover (SCI) vs EclEmma (BCI)

Tool        Branches   Covered
EclEmma     170        55.29%
CodeCover   130        61.53%

Tool        Branches   Covered
EclEmma     394        54.56%
Clover      342        56.14%

• Only if-else in common
• EclEmma skips dead code
• Discovered a fault in CodeCover
• while, do-while, for, assert, try, catch, finally, ternary
Method Comparison
• BCI (EclEmma) compared with SCI (CodeCover & Clover)
– Same score on 49 methods
– BCI lower on 11 methods
– BCI higher on 4 methods
• Because of the way Java is translated to bytecode, BCI requires each clause in each predicate to be true and false
– Thus it measures clause coverage (CC), not branch coverage (BC)
– Most predicates have only one clause
• CC is harder to satisfy than BC with multiple clauses
– So BCI scores are usually the same or lower
– Exceptions occur when some predicates are not covered
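The gap between the two counts can be sketched for the predicate `a && b`. A Java compiler typically translates a short-circuit `&&` inside an `if` into one conditional jump per clause, so a bytecode-level tool sees four branch outcomes where the source has two. The model below is a simplified simulation under that assumption; it does not inspect real bytecode, and the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of how the same tests score at source level and bytecode level.
class ShortCircuitBranches {
    // Source view: one decision, two branches (predicate true, predicate false).
    static double sourceBranchCoverage(boolean[][] tests) {
        Set<String> seen = new HashSet<>();
        for (boolean[] t : tests) {
            seen.add((t[0] && t[1]) ? "pred=T" : "pred=F");
        }
        return 100.0 * seen.size() / 2;
    }

    // Bytecode view: one conditional jump per clause, two outcomes each (four in total).
    static double bytecodeBranchCoverage(boolean[][] tests) {
        Set<String> seen = new HashSet<>();
        for (boolean[] t : tests) {
            seen.add(t[0] ? "a=T" : "a=F");
            if (t[0]) {
                // Short-circuit evaluation: b is only tested when a is true.
                seen.add(t[1] ? "b=T" : "b=F");
            }
        }
        return 100.0 * seen.size() / 4;
    }

    public static void main(String[] args) {
        boolean[][] tests = { {true, true}, {false, false} };
        System.out.println(sourceBranchCoverage(tests));   // 100.0
        System.out.println(bytecodeBranchCoverage(tests)); // 75.0
    }
}
```

The two tests that fully satisfy branch coverage at the source level leave the `b` is false outcome unexercised at the bytecode level, which is consistent with BCI scores being the same or lower when predicates have more than one clause.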
Threats to Validity
• As with most software research, we do not know whether our subjects are representative
• Results are based on three tools
• Some of the tools could have faults that affected the results
– We identified one fault
• We performed hand analysis of branch measurement to verify tools’ approach
– Used two raters
Four Findings
1. None of the tools evaluate all Java branches
2. None of the tools implement branch coverage correctly
• Branch coverage should subsume statement coverage, but does not
3. Instrumenting bytecode measures clause coverage, not branch coverage
4. Bytecode instrumentation is not valid for measuring branch coverage
Future Directions
We would like to gather more information along several lines
1. We focused on branch coverage, and it would be good to perform a similar study for statement coverage
• More tools are available for SC
2. The study could be extended to more tools; however, some tools are expensive to access
3. A major extension would be to measure whether either instrumentation technique helps testers design better tests
4. Can we modify branch coverage instrumentation to measure branch coverage correctly?
Contact
Jeff Offutt
http://cs.gmu.edu/~offutt/