Must.Kill.Mutants. Introduction to Mutation Testing Gerald Mücke DevCon5 GmbH, Switzerland

Must.kill.mutants. TopConf Tallinn 2016


Page 1: Must.kill.mutants. TopConf Tallinn 2016

Must.Kill.Mutants.
Introduction to Mutation Testing

Gerald Mücke
DevCon5 GmbH, Switzerland

Page 2: Must.kill.mutants. TopConf Tallinn 2016


AGENDA

Quality Assurance

Value of Testing

Mutation Testing

Tool Demo: Pit

Conclusion

Page 3: Must.kill.mutants. TopConf Tallinn 2016

About me

• Gerald Mücke

• Founder & CEO of DevCon5 GmbH

• Passionate Software Developer
• Focal Points
  • Performance Analysis
  • Test Automation
  • Mutation Testing
  • DevOps

• Using mutation testing for > 2 years

Page 4: Must.kill.mutants. TopConf Tallinn 2016

Effectiveness & Efficiency

[Figure: 2×2 matrix. Horizontal axis: doing the right thing (ineffectively, effectively). Vertical axis: doing things right (inefficiently, efficiently). Quadrants: Die Quickly, Thrive, Die Slowly, Survive. Callouts: «Get Shit Done», «Do right things right».]

Page 5: Must.kill.mutants. TopConf Tallinn 2016

«Quality Assurance» “is a way of preventing mistakes or defects in manufactured products and avoiding problems when delivering solutions or services to customers” (Wikipedia)

Page 6: Must.kill.mutants. TopConf Tallinn 2016

«manufactured» products

• «The process of converting raw materials, components, or parts into finished goods that meet a customer's expectations or specifications.»
• Most of the critical code is written manually
• Raw Materials
  • existing software or parts of it
  • brain, ideas, knowledge, experience, requirements
• Every product is unique (may look similar, though)

Page 7: Must.kill.mutants. TopConf Tallinn 2016

«Preventing» defects

• Defects are «created» in development
  • Cannot be prevented: it’s human to make mistakes
  • Could be detected: the earlier, the better
• Defects manifest in production
  • Or during test
  • Can be prevented: the earlier, the better

Page 8: Must.kill.mutants. TopConf Tallinn 2016

Sources of a Product

• Internal Development
  • QA embeddable
  • QA along the pipeline
  • Quality is a shared effort
  • Easier to change or influence

• External Development
  • Software vendors
  • More effort required for dedicated QA
  • Less easy to change
  • Handoff «Waterfall» style

Page 9: Must.kill.mutants. TopConf Tallinn 2016

«We have tested it»
Anonymous Developer

Page 10: Must.kill.mutants. TopConf Tallinn 2016

Real-Life Bugs

    if (isThreadSafe()) {
        computeSingleThreaded();
    } else {
        computeMultiThreaded();
    }

Made it to production. Performance impact: 500% duration of end-of-day processing.

Page 11: Must.kill.mutants. TopConf Tallinn 2016

Real-Life Bugs

    if (!isDevelopmentMode()) {
        collectProfileDataAndSendDeveloperReport();
    }

In production. Impact:

• 20% performance loss
• Compliance violation

Page 12: Must.kill.mutants. TopConf Tallinn 2016

Real-Life Bugs

    void function(LocalDate begin, LocalDate end, LocalDate minFrom, ...) {
        //...
        outerLoop:
        while (it.hasNext()) {
            Object current = it.next();
            Local from = funcA(current);
            Local upto = funcB(current);
            while (true) {
                if (!isBeforeOrEqual(from, upto)) {
                    continue outerLoop;
                }
                if (condY(from, minFrom)) {
                    from = DateUtil.addDaysToDate(upto, 1);
                    upto = DateUtil.getLastOfMonth(from);
                    from = DateUtil.min(new LocalDate[]{ end, from });
                    upto = DateUtil.min(new LocalDate[]{ end, upto });

Page 13: Must.kill.mutants. TopConf Tallinn 2016

«This will never happen in production»
Anonymous Developer

Page 14: Must.kill.mutants. TopConf Tallinn 2016
Page 15: Must.kill.mutants. TopConf Tallinn 2016

How to make informed decisions?

… without having a clue

Page 16: Must.kill.mutants. TopConf Tallinn 2016

Product Delivery Pipeline

Development → Continuous Integration → Quality Assurance → Release → Operations

Decision Point

Page 17: Must.kill.mutants. TopConf Tallinn 2016

Good Decisions are based on Information

Simple
• Metrics
  • Number of Unit Tests
  • Line Coverage
  • Branch Coverage

Complex
• Test Results
• Code Review
• Static Code Analysis
• …

Page 18: Must.kill.mutants. TopConf Tallinn 2016

Code Coverage

Information about what elements of a product have been touched by a test.

Common Coverage Metrics
• Line Coverage
• Condition Coverage
• Branch Coverage

Semantics?

[Figure: diagram relating Code, Test and Test Oracle]

Page 19: Must.kill.mutants. TopConf Tallinn 2016

Would you release a product based on
• 100% Line Coverage
• 100% Branch Coverage
• and all tests are green?

Page 20: Must.kill.mutants. TopConf Tallinn 2016

«Line or Branch coverage provide no value»

Page 21: Must.kill.mutants. TopConf Tallinn 2016

Arcane Arts

To most non-developers, software development seems increasingly like an arcane art:

• Languages, Paradigms, Frameworks
• Algorithms & Data Structures
• O(n), ByteCode, Lambdas, ...

Page 22: Must.kill.mutants. TopConf Tallinn 2016

Magic Delivery Pipeline

Development → Continuous Integration → Quality Assurance → Release → Operations

Magic Happens Here

Decision Point

Page 23: Must.kill.mutants. TopConf Tallinn 2016

Quality Gates

Decision Point
• List of checks for when the product is ready to be released
• Based on information
• Based on agreement between stakeholders
• Part of the Definition of Done
• Evolves over time
• Should not replace human judgement

Page 24: Must.kill.mutants. TopConf Tallinn 2016

«Testing is about gaining new information»

Page 25: Must.kill.mutants. TopConf Tallinn 2016

Perspectives

Programmers
• Implement the solution
• Provide indication that the solution is working
• Claim they did it «right»

Testers
• Show if and how the solution will fail
• Have to provide information for stakeholders to make informed decisions
• Usually don’t understand arcane arts

Page 26: Must.kill.mutants. TopConf Tallinn 2016

Checking vs. Testing

[Figure: 2×2 matrix. Horizontal axis: Understanding (Unknowns, Knowns). Vertical axis: Awareness (Unknown, Known). Quadrants: things we are aware of but don’t understand; things we are aware of and understand; things we are neither aware of nor understand; things we understand but are not aware of. Regions labelled Testing, Checking and Automated Checking.]

Page 27: Must.kill.mutants. TopConf Tallinn 2016

The Testing Pyramid of Functional Tests

[Figure: pyramid with UI Tests at the top, Integration Tests in the middle and Unit Tests at the base. Side axis: Degree of Automation.]

Page 28: Must.kill.mutants. TopConf Tallinn 2016

Information Gain (without Testing)

[Figure: Information plotted across the pipeline stages Development → Continuous Integration → Quality Assurance → Release → Operations.]

Page 29: Must.kill.mutants. TopConf Tallinn 2016

The Testing Pyramid of Functional Tests

[Figure: pyramid with UI Tests at the top, Integration Tests in the middle and Unit Tests at the base. Side axes: Degree of Automation and Degree of Exploratory Testing.]

Page 30: Must.kill.mutants. TopConf Tallinn 2016

Information Gain (with Testing)

[Figure: Information plotted across the pipeline stages Development → Continuous Integration → Quality Assurance → Release → Operations, with the gain labelled Value of Testing.]

Page 31: Must.kill.mutants. TopConf Tallinn 2016

The Testing Pyramid of Functional Tests

[Figure: pyramid with UI Tests at the top, Integration Tests in the middle and Unit Tests at the base. Side axes: Degree of Automation and Degree of Exploratory Testing. A region is labelled Information Gap.]

Page 32: Must.kill.mutants. TopConf Tallinn 2016

Test Coverability

[Figure: % Coverable Semantics plotted across the pipeline stages Development → Continuous Integration → Quality Assurance → Release → Operations.]

Page 33: Must.kill.mutants. TopConf Tallinn 2016

Cost of Defects

[Figure: Cost plotted across the pipeline stages Development → Continuous Integration → Quality Assurance → Release → Operations.]

Page 34: Must.kill.mutants. TopConf Tallinn 2016

Where to improve?

[Figure: Cost / Defect, Coverable Semantics, Information and Information Gap plotted across the pipeline stages Development → Continuous Integration → Quality Assurance → Release → Operations. Annotation: Magic Happens Here.]

Page 35: Must.kill.mutants. TopConf Tallinn 2016

How to prove that we test the right thing right?

Page 36: Must.kill.mutants. TopConf Tallinn 2016

Mutation Testing – History

Mutation testing injects faults, based on rules, into a product to verify whether the test suite is capable of finding them.

• Fault injection technique
• Concept has been known since ~1970
• First implementation of a mutation testing tool in 1980
• For most of that time it was subject to academic research only
• Recently, with increasing processing power, there is growing interest
  • More academic research ongoing
  • Practical tooling available

Page 37: Must.kill.mutants. TopConf Tallinn 2016

Mutation Testing – Some Theory

• Mutation testing is a special form of fault injection
• Based on two hypotheses
  1. Most software faults are due to small syntactic errors
  2. Simple faults can cascade to more emergent faults
• Assumption: “if a mutant was introduced without the behavior of the test suite being affected, this indicated either that the code that had been mutated was never executed (dead code) or that the test suite was unable to locate the faults represented by the mutant” (Wikipedia)

Page 38: Must.kill.mutants. TopConf Tallinn 2016

Mutation Testing - Definitions

Mutant
  A variation P’ of the product P created by applying a mutation operator m: P’ = m(P)

Killed Mutant
  A variation P’ in which a test has found at least ONE error

Live Mutant
  A variation P’ in which a test has found NO errors

Mutation Operators
  • A function m() that creates a variation of the product P by applying a set of modification rules
  • Inject faults into the product
  • Based on bug taxonomies

Mutation Score
  • Number of killed mutants / total number of mutants
  • Also called Mutation Coverage
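
The mutation score defined above is a plain ratio. A minimal sketch of that calculation follows; the `MutationScore` class is a hypothetical helper written for this illustration and is not part of PIT or any other tool.

```java
// Hypothetical helper for illustration only; not part of any mutation testing tool.
final class MutationScore {

    /** Returns killed / total, the mutation score as defined above. */
    static double of(int killedMutants, int totalMutants) {
        if (totalMutants <= 0) {
            throw new IllegalArgumentException("totalMutants must be positive");
        }
        if (killedMutants < 0 || killedMutants > totalMutants) {
            throw new IllegalArgumentException("killedMutants must be between 0 and totalMutants");
        }
        return (double) killedMutants / totalMutants;
    }

    public static void main(String[] args) {
        // Assumed numbers: 40 of 50 generated mutants were killed by the test suite.
        System.out.println(MutationScore.of(40, 50)); // prints 0.8
    }
}
```

Note that equivalent mutants can never be killed, so a score computed this way understates the suite's quality whenever equivalent mutants are counted in the total.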

Page 39: Must.kill.mutants. TopConf Tallinn 2016

Some more definitions

Equivalent Mutation
  A variation P’ that is semantically identical to P

Duplicate Mutation
  A variation P’ that is equivalent to another variation P’’

Weak Mutation
  Fault does not lead to incorrect output

Strong Mutation
  Fault propagates to incorrect output

Unstable Mutation
  Any test can find the mutations generated by it

High-Order Mutants
  Mutants that are defined by a set of Low-Level Mutants

Subsumed Mutants
  One mutant subsumes another if at least one test kills the first and every test that kills the first also kills the second.
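
The subsumption definition above can be checked mechanically once it is known which tests kill which mutant. The following is a sketch under that assumption; the `Subsumption` class and the test names are hypothetical, and the kill sets would have to come from an actual mutation testing run.

```java
import java.util.Set;

// Hypothetical helper for illustration; kill sets are assumed to be known.
final class Subsumption {

    /**
     * Applies the definition above: the first mutant subsumes the second if
     * at least one test kills the first and every test that kills the first
     * also kills the second.
     */
    static boolean subsumes(Set<String> testsKillingFirst, Set<String> testsKillingSecond) {
        return !testsKillingFirst.isEmpty()
                && testsKillingSecond.containsAll(testsKillingFirst);
    }

    public static void main(String[] args) {
        Set<String> killsA = Set.of("testLowerBound");
        Set<String> killsB = Set.of("testLowerBound", "testUpperBound");
        System.out.println(subsumes(killsA, killsB)); // prints true
        System.out.println(subsumes(killsB, killsA)); // prints false
    }
}
```

In this example, killing mutant A guarantees that mutant B is killed too, so B adds no information and is a candidate for removal as redundant.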

Page 40: Must.kill.mutants. TopConf Tallinn 2016

Mutation Operators

Boundaries
• Conditional Boundary
• Negate Conditionals
• Remove Conditional

Return values
• Return Values
• Argument Propagation

Method Calls
• Non Void Method Calls
• Void Method Calls
• Constructor Calls

Calculations
• Invert Negatives
• Increments / Remove Increments
• Math

Members and Constants
• Inline Constants
• Member Variable (experimental)

Java Language
• Switch (experimental)
• Modifiers
• ...
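
To make one of these operators concrete, here is a hand-written sketch of what a Conditional Boundary mutation does: it replaces `>=` with `>`. The class and method names are hypothetical, and the mutant is written out by hand rather than generated by a tool.

```java
// Hand-written illustration of a conditional boundary mutation; names are hypothetical.
final class ConditionalBoundaryExample {

    /** Original code: adults are 18 or older. */
    static boolean isAdult(int age) {
        return age >= 18;
    }

    /** Mutant: the relational operator >= has been changed to >. */
    static boolean isAdultMutant(int age) {
        return age > 18;
    }

    public static void main(String[] args) {
        // A test input far from the boundary cannot tell original and mutant apart.
        System.out.println(isAdult(30) == isAdultMutant(30)); // prints true: mutant survives
        // A test input exactly at the boundary distinguishes them.
        System.out.println(isAdult(18) == isAdultMutant(18)); // prints false: mutant is killed
    }
}
```

A surviving mutant of this kind usually points to a missing boundary-value test.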

Page 41: Must.kill.mutants. TopConf Tallinn 2016

«Alive Mutants will eventually turn into a Bug»

Page 42: Must.kill.mutants. TopConf Tallinn 2016

Approaches to Mutation Testing

Byte Code Mutation
• Can be done on-the-fly
• Faster to apply and execute
• Might be affected by compiler optimizations

Source Code Mutation
• Requires recompilation after every change
• Takes very long
• Is not affected by compiler optimizations

Higher Level Mutations
• Configuration, Architecture, Specification, Use/Business Case, ...
• No tooling support (yet?)

Page 43: Must.kill.mutants. TopConf Tallinn 2016

Mutation Testing Phases

1. Mutant generation: analyzing classes and generating mutations for them
2. Test selection: selecting the tests to run against the mutations
3. Mutant insertion: loading the mutations into a JVM / runtime environment
4. Mutant detection: executing tests against the loaded mutants

Page 44: Must.kill.mutants. TopConf Tallinn 2016

Mutation Testing 101

1. Modify your code (mutant generation)
2. Re-run the test (test selection + loading)
3. Check if the test is failing (detection)

    class Builder {
        private String value;

        Builder withValue(String in) {
            this.value = in; // statement targeted by the mutation
            return this;
        }
    }

    @Test
    public void testLeft() {
        Builder b = new Builder().withValue("one");
        assertNotNull(b);
    }

If the test is green, it’s a fail!!!
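
The three steps can be played through by hand. The sketch below assumes the mutation removes the assignment in `withValue`; it uses plain boolean checks instead of a test framework, and all names (`Mutation101`, `MutantBuilder`, `weakTestPasses`, `strongTestPasses`) are hypothetical.

```java
// Hand-rolled illustration without a test framework; all names are hypothetical.
final class Mutation101 {

    /** Original builder: stores the value. */
    static class Builder {
        String value;

        Builder withValue(String in) {
            this.value = in;
            return this;
        }
    }

    /** Mutant (assumed mutation): the assignment has been removed. */
    static class MutantBuilder extends Builder {
        @Override
        Builder withValue(String in) {
            return this;
        }
    }

    /** Weak check, like the not-null assertion: only verifies a builder is returned. */
    static boolean weakTestPasses(Builder builder) {
        return builder.withValue("one") != null;
    }

    /** Stronger check: also verifies that the value was stored. */
    static boolean strongTestPasses(Builder builder) {
        return "one".equals(builder.withValue("one").value);
    }

    public static void main(String[] args) {
        System.out.println(weakTestPasses(new MutantBuilder()));   // prints true: mutant survives
        System.out.println(strongTestPasses(new MutantBuilder())); // prints false: mutant is killed
        System.out.println(strongTestPasses(new Builder()));       // prints true: original still passes
    }
}
```

The weak check stays green against the mutant, which is exactly the failure the exercise is looking for: the test executes the code but asserts nothing about its effect.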

Page 45: Must.kill.mutants. TopConf Tallinn 2016

Related Techniques

Bebugging / Fault Seeding
• Randomly adding bugs; programmers are tasked with finding them

Fuzzing
• Injecting faults into test data

For Operations: Chaos Monkey (Simian Army, Netflix)
• Randomly terminating running processes or servers to test operational procedures or fitness

Page 47: Must.kill.mutants. TopConf Tallinn 2016

Tool: PIT

Mutation Testing for Java / JVM
• Operates on bytecode modification
• Easy to use: works with Ant, Maven, Gradle and others
• ~20 mutation operators for altering your code

Parallel execution
• Fast: can analyze in minutes what would take earlier systems days

Active Community
• Actively developed & supported

Mature Tooling
• Good documentation
• HTML & XML reports

Page 48: Must.kill.mutants. TopConf Tallinn 2016

Example Output

Page 49: Must.kill.mutants. TopConf Tallinn 2016

Interpreting Results

Live Mutants
• Reflect unspecified behavior
• Superfluous code / unrequired semantics
• Could be an actual bug that is not covered by the test suite
• Could be an equivalent mutation

Killed by Timeout or Memory Error
• Could be a “real kill” (e.g. endless loop)
• Could be still alive

Mutation Score
• Gives an indication of the overall quality of your test suite

Page 50: Must.kill.mutants. TopConf Tallinn 2016

Unit Test Maturity Model

Level  Description
0      No test
1      We have a test
2      1 + We have > 0% Line Coverage
3      2 + We have > 50% Branch Coverage
4      3 + We have at least 1 effective assertion per test
5      4 + We have > 80% Mutation Coverage

Page 51: Must.kill.mutants. TopConf Tallinn 2016

Demo


Page 52: Must.kill.mutants. TopConf Tallinn 2016

Timeouts

Mutating abort conditions of loops can cause timeouts.

Loop runs endlessly
• Mutation is effectively killed

Mutation might not be killed
• Loop runs longer (e.g. counter underrun / overrun) -> mutation might eventually survive
• Your system is just too slow / the tests take too long

When to stop the test? Will the test fail?

If a loop runs longer, the machine performance is important for choosing the timeout.
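
A sketch of how a timeout turns an endless loop into a kill. It assumes a mutation that removes the loop increment, and it is not how PIT implements timeouts internally; `TimeoutExample` and its methods are hypothetical, and the timeout values are arbitrary.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustration only; names and timeout values are assumptions.
final class TimeoutExample {

    /** Original loop: counts up to the limit and terminates. */
    static int countOriginal(int limit) {
        int i = 0;
        while (i < limit) {
            i++;
        }
        return i;
    }

    /** Mutant (assumed): the increment was removed, so the abort condition is never met. */
    static int countMutant(int limit) throws InterruptedException {
        int i = 0;
        while (i < limit) {
            // Interrupt check exists only so this sketch can stop the runaway loop.
            if (Thread.interrupted()) {
                throw new InterruptedException("stopped after timeout");
            }
        }
        return i;
    }

    /** Returns true if the task finished within the timeout, false if it timed out. */
    static boolean finishesWithin(Callable<Integer> task, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<Integer> result = executor.submit(task);
        try {
            result.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // treated as "killed by timeout"
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow(); // interrupts the worker if it is still running
        }
    }

    public static void main(String[] args) {
        System.out.println(finishesWithin(() -> countOriginal(1_000), 2_000)); // prints true
        System.out.println(finishesWithin(() -> countMutant(1_000), 200));     // prints false
    }
}
```

The hard part is the value of the timeout: too short and slow machines report false kills, too long and the analysis crawls.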

Page 53: Must.kill.mutants. TopConf Tallinn 2016

Limitations

Fault Coverage
• ~¼ of real faults are not coverable by mutation testing

Mutation Score
• PIT does not recognize subsumed or equivalent mutations
• Mutation score may not be “academically” precise – context matters!

Mutation Operators
• PIT has no Java concurrency mutation operators
• PIT has no high-order mutation operators
• PIT has no Java-language-specific mutation operators

Techniques
• PIT does not support sampling

Page 54: Must.kill.mutants. TopConf Tallinn 2016

Value has its cost

• Mutation testing is computationally expensive
• Duration of a mutation test depends on
  • number of tests
  • test suite execution time
  • number of mutation operators
  • processing power

Basically: D = xn
  n = number of mutation operators
  x = number of tests

Page 55: Must.kill.mutants. TopConf Tallinn 2016

Deviation in Mutation Score

Impact of Mutation Operator Selection

[Figure: chart with axes Size of Codebase and Computational Effort; curves labelled Mutations Found, More Operators and Fewer Operators.]

Page 56: Must.kill.mutants. TopConf Tallinn 2016

Mutation Analysis of Large Code bases

[Figure: chart of Computational Effort against Size of Codebase with a Time Cap. The codebase is broken into chunks that are analyzed across the week (Mo, Tu, We, Th, Fr).]

Page 57: Must.kill.mutants. TopConf Tallinn 2016

Other techniques

Incremental Analysis
• Based on historical data
• Only test code that has changed
• Increases deviation

Sampling
• Good for mutation scoring
• Increases deviation
• No support for sampling in PIT
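
Sampling can be sketched as estimating the score from a random subset of mutants. In the illustration below the kill results are precomputed purely for demonstration; in practice only the sampled mutants would be executed. `MutantSampling` and `estimateScore` are hypothetical names, and the seed is arbitrary.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustration only; in a real run only the sampled mutants would be executed.
final class MutantSampling {

    /** Estimates the mutation score from a random sample of mutant results (true = killed). */
    static double estimateScore(List<Boolean> killedFlags, int sampleSize, long seed) {
        if (sampleSize <= 0 || sampleSize > killedFlags.size()) {
            throw new IllegalArgumentException("sampleSize must be between 1 and the number of mutants");
        }
        List<Boolean> shuffled = new ArrayList<>(killedFlags);
        Collections.shuffle(shuffled, new Random(seed));
        long killed = shuffled.subList(0, sampleSize).stream()
                .filter(Boolean::booleanValue)
                .count();
        return (double) killed / sampleSize;
    }

    public static void main(String[] args) {
        // Assumed data: 1000 mutants, every fifth one survives, so the true score is 0.8.
        List<Boolean> results = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            results.add(i % 5 != 0);
        }
        System.out.println(estimateScore(results, 1000, 42L)); // full sample, prints 0.8
        System.out.println(estimateScore(results, 100, 42L));  // estimate from 10% of the mutants
    }
}
```

The smaller sample costs a tenth of the effort but its result varies with the seed, which is the increased deviation noted above.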

Page 58: Must.kill.mutants. TopConf Tallinn 2016

Challenges of Mutation Testing

Redundant Mutants
• Subsuming
• Duplicates
• Equivalent

Equivalent Detection
• Current algorithm achieves a 50% detection rate in research

High Order Mutations
• Computational cost
• Mutation and test selection

[Figure: diagram showing Equivalent Mutants and Subsuming/Duplicate Mutants]

Page 59: Must.kill.mutants. TopConf Tallinn 2016

Conclusion

Page 60: Must.kill.mutants. TopConf Tallinn 2016

Some Advice

Unit tests are usually owned by development
• Challenge them with mutation testing!
• It’s NOT unit tested until mutation tested.

Don’t go on a killing spree
• Set achievable goals for mutation score
• Triage surviving mutants
• A mutation score > 0.8 is considered good (it depends…)

Determine the mutation score regularly, at a sensible interval
• Every build vs. every release
• Use historical data & SCM support

Find concrete mutants as needed
• Adjust mutators & scope

Page 61: Must.kill.mutants. TopConf Tallinn 2016

Use Cases

Finding Gaps in Test Suite

Testing Highly Exposed Code
• Algorithms and calculations
• Security-related code
• Transaction-related code

Assessing Test Suites / Testing Strategies / Methodologies
• By comparing the mutation scores, i.e.

Developing Test Suites for Legacy Code
• Finding semantic hotspots
• Finding gaps in test suite
• Forced to break code base into more manageable pieces

Minimizing Test Suites
• Reduce number of tests while keeping mutation score stable
• (!) Reduces the effectiveness of the suite for detecting real faults

Page 62: Must.kill.mutants. TopConf Tallinn 2016

«Test First does not lead to better test suites»

Page 63: Must.kill.mutants. TopConf Tallinn 2016

Takeaways

• Don’t trust your unit tests unless you have mutation-tested them.
• Mutation testing is the practice of finding bugs in your test suite.
• Forget about other coverage metrics: cheap to get, but next to no value.
• Include mutation testing in your project. Always.
• Use it with common sense: don’t go on a killing spree.
• For Java, PIT is the tool to use.

Page 64: Must.kill.mutants. TopConf Tallinn 2016

Gerald Mücke

DevCon5 @gmuecke