225
An Overview of Soſtware Tesng CMPT 745 Soſtware Engineering Nick Sumner [email protected]

An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

  • Upload
    others

  • View
    12

  • Download
    0

Embed Size (px)

Citation preview

Page 1: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

An Overview ofSoftware Testing

CMPT 745Software Engineering

Nick [email protected]

Page 2: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Software Testing

● The most common way of measuring and ensuring program correctness

Input Program

Page 3: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Software Testing

● The most common way of measuring and ensuring program correctness

Input Program ObservedBehavior

Page 4: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Software Testing

● The most common way of measuring and ensuring program correctness

Input Program ObservedBehavior

Oracle

Page 5: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Software Testing

● The most common way of measuring and ensuring program correctness

Input Program ObservedBehavior

Oracle Outcome

Page 6: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Software Testing

Input Program ObservedBehavior

Oracle Outcome

● The most common way of measuring and ensuring program correctness

Test SuiteTest 1 Input OracleTest 2 Input OracleTest 3 Input OracleTest 4 Input OracleTest 5 Input OracleTest 6 Input OracleTest 7 Input Oracle

Page 7: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Software Testing

● The most common way of measuring and ensuring program correctness

Input Program ObservedBehavior

Oracle Outcome

Test SuiteTest 1 Input OracleTest 2 Input OracleTest 3 Input OracleTest 4 Input OracleTest 5 Input OracleTest 6 Input OracleTest 7 Input Oracle

Key Issues

Page 8: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Software Testing

Input Program ObservedBehavior

Oracle Outcome

Test 1 Input OracleTest 2 Input OracleTest 3 Input OracleTest 4 Input OracleTest 5 Input OracleTest 6 Input OracleTest 7 Input Oracle

Key Issues● Test suite adequacy Test Suite

● The most common way of measuring and ensuring program correctness

Page 9: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Software Testing

● The most common way of measuring and ensuring program correctness

Input Program ObservedBehavior

Oracle Outcome

Test SuiteTest 1 Input OracleTest 2 Input OracleTest 3 Input OracleTest 4 Input OracleTest 5 Input OracleTest 6 Input OracleTest 7 Input Oracle

Key Issues● Test suite adequacy● Automated input generation

Page 10: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Software Testing

● The most common way of measuring and ensuring program correctness

Input Program ObservedBehavior

Oracle Outcome

Test SuiteTest 1 Input OracleTest 2 Input OracleTest 3 Input OracleTest 4 Input OracleTest 5 Input OracleTest 6 Input OracleTest 7 Input Oracle

Key Issues● Test suite adequacy● Automated input generation● Automated oracle generation

Page 11: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Software Testing

● The most common way of measuring and ensuring program correctness

Input Program ObservedBehavior

Oracle Outcome

Test SuiteTest 1 Input OracleTest 2 Input OracleTest 3 Input OracleTest 4 Input OracleTest 5 Input OracleTest 6 Input OracleTest 7 Input Oracle

Key Issues● Test suite adequacy● Automated input generation● Automated oracle generation● Robustness/flakiness/maintainability

Page 12: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Software Testing

Input Program ObservedBehavior

Oracle Outcome

Test 1 Input OracleTest 2 Input OracleTest 3 Input OracleTest 4 Input OracleTest 5 Input OracleTest 6 Input OracleTest 7 Input Oracle

● The most common way of measuring and ensuring program correctness

Key Issues● Test suite adequacy● Automated input generation● Automated oracle generation● Robustness/flakiness/maintainability● Regression test selection

Test Suite

Page 13: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Software Testing

● The most common way of measuring and ensuring program correctness

Input Program ObservedBehavior

Oracle Outcome

Test SuiteTest 1 Input OracleTest 2 Input OracleTest 3 Input OracleTest 4 Input OracleTest 5 Input OracleTest 6 Input OracleTest 7 Input Oracle

Key Issues● Test suite adequacy● Automated input generation● Automated oracle generation● Robustness/flakiness/maintainability● Regression test selection● Fault localization & automated debugging

Page 14: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Software Testing

● The most common way of measuring and ensuring program correctness

Input Program ObservedBehavior

Oracle Outcome

Test SuiteTest 1 Input OracleTest 2 Input OracleTest 3 Input OracleTest 4 Input OracleTest 5 Input OracleTest 6 Input OracleTest 7 Input Oracle

Key Issues● Test suite adequacy● Automated input generation● Automated oracle generation● Robustness/flakiness/maintainability● Regression test selection● Fault localization & automated debugging● Automated program repair● ...

Page 15: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Software Testing

● The most common way of measuring and ensuring program correctness

Input Program ObservedBehavior

Oracle Outcome

Test SuiteTest 1 Input OracleTest 2 Input OracleTest 3 Input OracleTest 4 Input OracleTest 5 Input OracleTest 6 Input OracleTest 7 Input Oracle

Key Issues● Test suite adequacy● Automated input generation● Automated oracle generation● Robustness/flakiness/maintainability● Regression test selection● Fault localization & automated debugging● Automated program repair● ...

We will discuss a few basics nowand revisit the problem as we learn new techniques

Page 16: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Test Suite Design

● Objectives– Functional correctness– Nonfunctional attributes (performance, ...)

Page 17: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Test Suite Design

● Objectives– Functional correctness– Nonfunctional attributes (performance, ...)

● Components – The Automated Testing Pyramid

Unit

API/Integration/Component/

System UI

Page 18: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Test Suite Design

● Objectives– Functional correctness– Nonfunctional attributes (performance, ...)

● Components – The Automated Testing Pyramid

Unit

API/Integration/Component/

System UI

Page 19: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Test Suite Design

● Objectives– Functional correctness– Nonfunctional attributes (performance, ...)

● Components – The Automated Testing Pyramid

Unit

API/Integration/Component/

System UI

Page 20: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Test Suite Design

● Objectives– Functional correctness– Nonfunctional attributes (performance, ...)

● Components – The Automated Testing Pyramid

Unit

API/Integration/Component/

System UI

Page 21: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Test Suite Design

● Objectives– Functional correctness– Nonfunctional attributes (performance, ...)

● Components – The Automated Testing Pyramid

Unit

API/Integration/Component/

System UI

Integrated

Isolated

Page 22: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Test Suite Design

● Objectives– Functional correctness– Nonfunctional attributes (performance, ...)

● Components – The Automated Testing Pyramid

Unit

API/Integration/Component/

System UI

Integrated

Isolated

Slow

Fast

Page 23: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Designing a Unit Test

● Common structure

Page 24: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Designing a Unit Test

● Common structure

TEST_CASE("empty") { Environment env; ExprTree tree;

auto result = evaluate(tree, env);

CHECK(!result.has_value());}

Page 25: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Designing a Unit Test

● Common structure

TEST_CASE("empty") { Environment env; ExprTree tree;

auto result = evaluate(tree, env);

CHECK(!result.has_value());}

Set up a scenario

Page 26: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Designing a Unit Test

● Common structure

TEST_CASE("empty") { Environment env; ExprTree tree;

auto result = evaluate(tree, env);

CHECK(!result.has_value());}

Run the scenario

Page 27: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Designing a Unit Test

● Common structure

TEST_CASE("empty") { Environment env; ExprTree tree;

auto result = evaluate(tree, env);

CHECK(!result.has_value());} Check the outcome

Page 28: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Designing a Unit Test

● Common structure

● Tests should run in isolation

struct Frob { Frob() : conn{getDB().connect()} { } DBConnection conn;};

Page 29: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Designing a Unit Test

● Common structure

● Tests should run in isolation

struct Frob { Frob() : conn{getDB().connect()} { } DBConnection conn;};

TEST_CASE("bad test 1") { Frob frob; ...}

TEST_CASE("bad test 2") { Frob frob; ...}

Page 30: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Designing a Unit Test

● Common structure

● Tests should run in isolation

struct Frob { Frob() : conn{getDB().connect()} { } DBConnection conn;};

TEST_CASE("bad test 1") { Frob frob; ...}

TEST_CASE("bad test 2") { Frob frob; ...}

Page 31: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Designing a Unit Test

● Common structure

● Tests should run in isolation

struct Frob { Frob() : conn{getDB().connect()} { } DBConnection conn;};

TEST_CASE("bad test 1") { Frob frob; ...}

TEST_CASE("bad test 2") { Frob frob; ...}

The order of the test can affect the results!

Page 32: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Designing a Unit Test

● Common structure

● Tests should run in isolation

struct Frob { Frob() : conn{getDB().connect()} { } DBConnection conn;};

TEST_CASE("bad test 1") { Frob frob; ...}

TEST_CASE("bad test 2") { Frob frob; ...}

The order of the test can affect the results!

A flaky DB can affect results!

Page 33: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Designing a Unit Test

● Common structure

● Tests should run in isolation!

Page 34: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Designing a Unit Test

● Common structure

● Tests should run in isolation

struct Frob { Frob(Connection& inConn) : conn{inConn} { } Connection& conn;};

Page 35: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Designing a Unit Test

● Common structure

● Tests should run in isolation

struct Frob { Frob(Connection& inConn) : conn{inConn} { } Connection& conn;};

Dependency injection allowsthe user of a class tocontrol its behavior

Page 36: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Designing a Unit Test

● Common structure

● Tests should run in isolation

struct Frob { Frob(Connection& inConn) : conn{inConn} { } Connection& conn;};

Connection

Dependency injection allowsthe user of a class tocontrol its behavior

Page 37: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Designing a Unit Test

● Common structure

● Tests should run in isolation

struct Frob { Frob(Connection& inConn) : conn{inConn} { } Connection& conn;};

Connection

DBConnection

Dependency injection allowsthe user of a class tocontrol its behavior

Page 38: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Designing a Unit Test

● Common structure

● Tests should run in isolation

struct Frob { Frob(Connection& inConn) : conn{inConn} { } Connection& conn;};

Connection

DBConnection FakeConnection

Dependency injection allowsthe user of a class tocontrol its behavior

Page 39: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Designing a Unit Test

● Common structure

● Tests should run in isolation

struct Frob { Frob(Connection& inConn) : conn{inConn} { } Connection& conn;};

TEST_CASE("better test 1") { FakeDB db; FakeConnection conn = db.connect(); Frob frob{conn}; ...}

Connection

DBConnection FakeConnection

Page 40: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Designing a Unit Test

● Common structure

● Tests should run in isolation

struct Frob { Frob(Connection& inConn) : conn{inConn} { } Connection& conn;};

TEST_CASE("better test 1") { FakeDB db; FakeConnection conn = db.connect(); Frob frob{conn}; ...}

Connection

DBConnection

Mocks & stubs isolate and examinehow a component interacts with

dependenciesFakeConnection

Page 41: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Designing a Unit Test

● Common structure

● Tests should run in isolation

● Key problem to resolve:– How do you define your inputs & oracles?

Page 42: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Selecting Inputs

● Two broad categories

Page 43: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Selecting Inputs

● Two broad categories– Black box testing – treat the program as opaque/unkown

Page 44: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Selecting Inputs

● Two broad categories– Black box testing – treat the program as opaque/unkown

specification based (BDD?)model drivennaive fuzzingboundary value analysis

Page 45: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Selecting Inputs

● Two broad categories– Black box testing – treat the program as opaque/unkown– White box testing – program structure & semantics can be used

Page 46: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Selecting Inputs

● Two broad categories– Black box testing – treat the program as opaque/unkown– White box testing – program structure & semantics can be used

symbolic executioncall chain synthesiswhitebox fuzzing

Page 47: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Designing Oracles

● Sometimes it is simple– For a known scenario, a specific output is expected

Page 48: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Designing Oracles

● Sometimes it is simple– For a known scenario, a specific output is expected

● Invariants & properties are powerful

Page 49: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Designing Oracles

● Sometimes it is simple– For a known scenario, a specific output is expected

● Invariants & properties are powerful– foo-1(foo(x)) == x (e.g. archive & unarchive a file)

Page 50: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Designing Oracles

● Sometimes it is simple– For a known scenario, a specific output is expected

● Invariants & properties are powerful– foo-1(foo(x)) == x (e.g. archive & unarchive a file)– turn(360, direction) == direction

Page 51: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Designing Oracles

● Sometimes it is simple– For a known scenario, a specific output is expected

● Invariants & properties are powerful– foo-1(foo(x)) == x (e.g. archive & unarchive a file)– turn(360, direction) == direction

Metamorphic testing

Page 52: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Designing Oracles

● Sometimes it is simple– For a known scenario, a specific output is expected

● Invariants & properties are powerful– foo-1(foo(x)) == x (e.g. archive & unarchive a file)– turn(360, direction) == direction– program1(x) == program2(x)

Page 53: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Designing Oracles

● Sometimes it is simple– For a known scenario, a specific output is expected

● Invariants & properties are powerful– foo-1(foo(x)) == x (e.g. archive & unarchive a file)– turn(360, direction) == direction– program1(x) == program2(x)

Differential testing

Page 54: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Designing Oracles

● Sometimes it is simple– For a known scenario, a specific output is expected

● Invariants & properties are powerful– foo-1(foo(x)) == x (e.g. archive & unarchive a file)– turn(360, direction) == direction– program1(x) == program2(x)

General invariants can be exploitedin (semi)automated test generation

(e.g. property based)

Page 55: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Designing Oracles

● Sometimes it is simple– For a known scenario, a specific output is expected

● Invariants & properties are powerful– foo-1(foo(x)) == x (e.g. archive & unarchive a file)– turn(360, direction) == direction– program1(x) == program2(x)

● Fully automated tests benefit from fully automated oracles– But the problem is hard

Page 56: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Test Suite Adequacy

● A test suite should provide a metric on software quality

Page 57: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Test Suite Adequacy

● A test suite should provide a metric on software quality– Passing a test should increase the metric– Failing a test should decrease the metric

Page 58: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Test Suite Adequacy

● A test suite should provide a metric on software quality– Passing a test should increase the metric– Failing a test should decrease the metric

● But a test suite samples from the input space

Page 59: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Test Suite Adequacy

● A test suite should provide a metric on software quality– Passing a test should increase the metric– Failing a test should decrease the metric

● But a test suite samples from the input space– Is it representative/biased?

Page 60: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Test Suite Adequacy

● A test suite should provide a metric on software quality– Passing a test should increase the metric– Failing a test should decrease the metric

● But a test suite samples from the input space– Is it representative/biased?– Can we know?

Page 61: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Test Suite Adequacy

● A test suite should provide a metric on software quality– Passing a test should increase the metric– Failing a test should decrease the metric

● But a test suite samples from the input space– Is it representative/biased?– Can we know?– Can we measure how likely a test suite is to measure what we want?

Page 62: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Test Suite Adequacy

● A test suite should provide a metric on software quality– Passing a test should increase the metric– Failing a test should decrease the metric

● But a test suite samples from the input space– Is it representative/biased?– Can we know?– Can we measure how likely a test suite is to measure what we want?

● High level decision making– Is a test suite good enough? (Will a higher score mean fewer defects?)

Page 63: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Test Suite Adequacy

● A test suite should provide a metric on software quality– Passing a test should increase the metric– Failing a test should decrease the metric

● But a test suite samples from the input space– Is it representative/biased?– Can we know?– Can we measure how likely a test suite is to measure what we want?

● High level decision making– Is a test suite good enough? (Will a higher score mean fewer defects?)– What parts of a program should be tested better?

Page 64: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Test Suite Adequacy

● Metrics Remember: A higher scoreshould mean fewer defects

Page 65: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Test Suite Adequacy

● Metrics– Statement coverage

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

Is each statement coveredby at least one test

in the test suite?

score =# covered

# statements

Page 66: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Test Suite Adequacy

● Metrics– Statement coverage– Branch coverage

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

Is each branch coveredby at least one test

in the test suite?

score =# covered

# branches

a

b c

#T #F

p

Page 67: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Test Suite Adequacy

● Metrics– Statement coverage– Branch coverage

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

Is each branch coveredby at least one test

in the test suite?

score =# covered

# branches

a

b c

#T #F

p

Page 68: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Test Suite Adequacy

● Metrics– Statement coverage– Branch coverage– MC/DC coverage*

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

Does each term determinethe outcome of at least one condition

in the test suite?

Page 69: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Test Suite Adequacy

● Metrics– Statement coverage– Branch coverage– MC/DC coverage*

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

Does each term determinethe outcome of at least one condition

in the test suite?

Page 70: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Test Suite Adequacy

● Metrics– Statement coverage– Branch coverage– MC/DC coverage*

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

Does each term determinethe outcome of at least one condition

in the test suite? a=#T b=#T c=#F ↦ #Ta=#F b=#T c=#F ↦ #F

Page 71: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Test Suite Adequacy

● Metrics– Statement coverage– Branch coverage– MC/DC coverage*

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

Does each term determinethe outcome of at least one condition

in the test suite? a=#T b=#T c=#F ↦ #Ta=#F b=#T c=#F ↦ #F

a in this conditionis covered by the test suite

Page 72: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Test Suite Adequacy

● Metrics– Statement coverage– Branch coverage– MC/DC coverage*

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

Required by regulationin (e.g.) avionics,

safety critical systems,automotive software

Does each term determinethe outcome of at least one condition

in the test suite?

Page 73: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Test Suite Adequacy

● Metrics– Statement coverage– Branch coverage– MC/DC coverage*– Mutation coverage*

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

How many injected bugscan be detected by the test suite?

Page 74: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Test Suite Adequacy

● Metrics– Statement coverage– Branch coverage– MC/DC coverage*– Mutation coverage*

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

How many injected bugscan be detected by the test suite?

Page 75: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Test Suite Adequacy

● Metrics– Statement coverage– Branch coverage– MC/DC coverage*– Mutation coverage*

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

How many injected bugscan be detected by the test suite?

Page 76: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Test Suite Adequacy

● Metrics– Statement coverage– Branch coverage– MC/DC coverage*– Mutation coverage*

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

score =# non-equivalent mutants

# covered/killed

How many injected bugscan be detected by the test suite?

Page 77: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Test Suite Adequacy

● Metrics– Statement coverage– Branch coverage– MC/DC coverage*– Mutation coverage*– Path coverage– ...

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

a

b c

#T #F

p

Is each path coveredby at least one test

in the test suite?

Page 78: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Test Suite Adequacy

● Metrics– Statement coverage– Branch coverage– MC/DC coverage*– Mutation coverage*– Path coverage– ...

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

a

b c

#T #F

p

abTabcTabcFacTacF

Is each path coveredby at least one test

in the test suite?

Page 79: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Test Suite Adequacy

● Metrics– Statement coverage– Branch coverage– MC/DC coverage*– Mutation coverage*– Path coverage– ...

def my_lovely_fun(a,b,c): if (a and b) or c: ... else: ...print(‘awesome’)

a

b c

#T #F

p

abTabcTabcFacTacF

Is each path coveredby at least one test

in the test suite?

Page 80: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Test Suite Adequacy

● Metrics– Statement coverage– Branch coverage– MC/DC coverage*– Mutation coverage*– Path coverage– ...

But shrinking test suites while maintainingSt, Br, MC/DC decreases defect detection.

There is more going on here.

Page 81: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Testing

Page 82: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● Logic & conditional behaviors are pervasive

Page 83: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● Logic & conditional behaviors are pervasive

● if statements are the most frequently fixed statements in bug fixes[Pan, ESE 2008]

Page 84: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● Logic & conditional behaviors are pervasive

● if statements are the most frequently fixed statements in bug fixes[Pan, ESE 2008]

● Safety critical systems often involve many complex conditions(avionics, medical, automotive, ...)

Page 85: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● Logic & conditional behaviors are pervasive

● if statements are the most frequently fixed statements in bug fixes[Pan, ESE 2008]

● Safety critical systems often involve many complex conditions(avionics, medical, automotive, ...)

● We should place more effort/burden on ensuring correctness of conditions

Page 86: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● A predicate is simply a boolean expression.

Page 87: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● A predicate is simply a boolean expression.

● Predicate Coverage requires each predicate to be true in one test & be false in one test.

Page 88: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● A predicate is simply a boolean expression.

● Predicate Coverage requires each predicate to be true in one test & be false in one test.

if (a || b) && (c || d): s

if (a | b) & (c | d): s

How does it do in these cases?

Page 89: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

if (a || b) && (c || d): s

if (a | b) & (c | d): s

MC/DC Coverage

● A predicate is simply a boolean expression.

● Predicate Coverage requires each predicate to be true in one test & be false in one test.

T F T F

Page 90: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● A predicate is simply a boolean expression.

● Predicate Coverage requires each predicate to be true in one test & be false in one test.

● Clause Coverage requires each clause to be true in one test & be false in one test.

Page 91: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● A predicate is simply a boolean expression

● Predicate Coverage requires each predicate to be true in one test & be false in one test.

● Clause Coverage requires each clause to be true in one test & be false in one test.

if (a || b) && (c || d): s

if (a | b) & (c | d): s

How does it do in these cases?

Page 92: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● A predicate is simply a boolean expression

● Predicate Coverage requires each predicate to be true in one test & be false in one test.

● Clause Coverage requires each clause to be true in one test & be false in one test.

if (a || b) && (c || d): s

if (a | b) & (c | d): s

T F T FT F T F T F T F T F T F

Page 93: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● A predicate is simply a boolean expression

● Predicate Coverage requires each predicate to be true in one test & be false in one test.

● Clause Coverage requires each clause to be true in one test & be false in one test.

if (a || b) && (c || d): s

if (a | b) & (c | d): s

T F T FT F T F T F T F T F T F

How many tests?

Page 94: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● A predicate is simply a boolean expression

● Predicate Coverage requires each predicate to be true in one test & be false in one test.

● Clause Coverage requires each clause to be true in one test & be false in one test.

if (a || b) && (c || d): s

if (a | b) & (c | d): s

T F T FT F T F T F T F T F T F

How many tests?

Minimum of 2 tests

Page 95: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● A predicate is simply a boolean expression

● Predicate Coverage requires each predicate to be true in one test & be false in one test.

● Clause Coverage requires each clause to be true in one test & be false in one test.

if (a || b) && (c || d): s

if (a | b) & (c | d): s

T F T FT F T F T F T F T F T F

How many tests?

Minimum of 2 tests

a=true, b=true, c=false, d=falsea=false, b=false, c=true, d=true

Page 96: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● A predicate is simply a boolean expression

● Predicate Coverage requires each predicate to be true in one test & be false in one test.

● Clause Coverage requires each clause to be true in one test & be false in one test.

if (a || b) && (c || d): s

if (a | b) & (c | d): s

T F T FT F T F T F T F T F T F

How many tests?

Minimum of 2 tests

a=true, b=true, c=false, d=falsea=false, b=false, c=true, d=true

Page 97: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● Modified Condition/Decision Coverage

Page 98: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● Modified Condition/Decision Coverage1) Each entry & exit is used

Page 99: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● Modified Condition/Decision Coverage1) Each entry & exit is used2) Each decision/branch takes every possible outcome

Page 100: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● Modified Condition/Decision Coverage1) Each entry & exit is used2) Each decision/branch takes every possible outcome3) Each clause takes every possible outcome

Page 101: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● Modified Condition/Decision Coverage1) Each entry & exit is used2) Each decision/branch takes every possible outcome3) Each clause takes every possible outcome4) Each clause independently impacts the the outcome

Page 102: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● Modified Condition/Decision Coverage1) Each entry & exit is used2) Each decision/branch takes every possible outcome3) Each clause takes every possible outcome4) Each clause independently impacts the the outcome

● Use in safety critical systems: avionics, spacecraft, …

Page 103: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● Modified Condition/Decision Coverage1) Each entry & exit is used2) Each decision/branch takes every possible outcome3) Each clause takes every possible outcome4) Each clause independently impacts the the outcome

● Use in safety critical systems: avionics, spacecraft, …

● Not only ensures that clauses are tested, but that each has an impact

Page 104: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● A clause determines the outcome of a predicate when changing only the value of that clause changes the outcome of the predicate

Page 105: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● A clause determines the outcome of a predicate when changing only the value of that clause changes the outcome of the predicate

φ(a,b,c) ≠ φ(a,b,¬c)

Page 106: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● A clause determines the outcome of a predicate when changing only the value of that clause changes the outcome of the predicate

φ(a,b,c) ≠ φ(a,b,¬c)(a || b && c)

Page 107: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● A clause determines the outcome of a predicate when changing only the value of that clause changes the outcome of the predicate

φ(a,b,c) ≠ φ(a,b,¬c)(a || b && c)

a=Fb=Tc=T

T

Page 108: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● A clause determines the outcome of a predicate when changing only the value of that clause changes the outcome of the predicate

φ(a,b,c) ≠ φ(a,b,¬c)(a || b && c)

a=Fb=Tc=T

T

a=Fb=Tc=F

F

Page 109: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● A clause determines the outcome of a predicate when changing only the value of that clause changes the outcome of the predicate

φ(a,b,c) ≠ φ(a,b,¬c)(a || b && c)

a=Fb=Tc=T

T

a=Fb=Tc=F

F

Page 110: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● A clause determines the outcome of a predicate when changing only the value of that clause changes the outcome of the predicate

φ(a,b,c) ≠ φ(a,b,¬c)(a || b && c)

a=Fb=Tc=T

T

a=Fb=Tc=F

F

This pair of tests shows the impact of c.

Page 111: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● A clause determines the outcome of a predicate when changing only the value of that clause changes the outcome of the predicate

● The basic steps come from & and |

Page 112: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● A clause determines the outcome of a predicate when changing only the value of that clause changes the outcome of the predicate

● The basic steps come from & and |

a & bIf a=True, b determines

the outcome.

Page 113: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● A clause determines the outcome of a predicate when changing only the value of that clause changes the outcome of the predicate

● The basic steps come from & and |

a | bIf a=False, b determines

the outcome.

a & bIf a=True, b determines

the outcome.

Page 114: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● A clause determines the outcome of a predicate when changing only the value of that clause changes the outcome of the predicate

● The basic steps come from & and |

● By definition, solve φc=true ⊕ φc=false

a | bIf a=False, b determines

the outcome.

a & bIf a=True, b determines

the outcome.

Page 115: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● Given a | (b & c), generate tests for a

a has impact ↔

Page 116: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● Given a | (b & c), generate tests for a

#T | (b & c)a has impact ↔ ≠ #F | (b & c)

Page 117: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● Given a | (b & c), generate tests for a

#T | (b & c)a has impact ↔ ≠ #F | (b & c)

#T↔ ≠ b & c

Page 118: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● Given a | (b & c), generate tests for a

#T | (b & c)a has impact ↔ ≠ #F | (b & c)

#T↔ ≠ b & c

#T↔ = ¬b | ¬c

Page 119: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● Given a | (b & c), generate tests for a

#T | (b & c)a has impact ↔ ≠ #F | (b & c)

#T↔ ≠ b & c

#T↔ = ¬b | ¬c

b is false or c is false↔

Page 120: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● Given a | (b & c), generate tests for a

#T | (b & c)a has impact ↔ ≠ #F | (b & c)

#T↔ ≠ b & c

#T↔ = ¬b | ¬c

b is false or c is false↔

defines two different ways to test a

Page 121: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● Given a | (b & c), generate tests for a

#T | (b & c)a has impact ↔ ≠ #F | (b & c)

#T↔ ≠ b & c

#T↔ = ¬b | ¬c

b is false or c is false↔

defines two different ways to test a

a=#T, b=#F, c=#Ta=#F, b=#F, c=#T

Have b be #F

Page 122: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● Given a | (b & c), generate tests for a

#T | (b & c)a has impact ↔ ≠ #F | (b & c)

#T↔ ≠ b & c

#T↔ = ¬b | ¬c

b is false or c is false↔

defines two different ways to test a

a=#T, b=#F, c=#Ta=#F, b=#F, c=#T

a=#T, b=#T, c=#Fa=#F, b=#T, c=#F

Have b be #F Have c be #F

Page 123: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● What about (a & b) | (a & ¬b)? – Can you show the impact of a?

Page 124: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● What about (a & b) | (a & ¬b)? – Can you show the impact of a?– Can you show the impact of b?

Page 125: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● What about (a & b) | (a & ¬b)? – Can you show the impact of a?– Can you show the impact of b?

Lack of MC/DC coveragecan also identify bugs.

Page 126: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

MC/DC Coverage

● What about (a & b) | (a & ¬b)? – Can you show the impact of a?– Can you show the impact of b?

● BUT NASA recommended not generating MC/DC coverage.– Use MC/DC as a means of evaluating test suites generated by other means

Page 127: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Mutation Testing

Page 128: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Mutation Analysis

● Instead of covering program elements,estimate defect finding on a sample of representative bugs

Page 129: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Mutation Analysis

● Instead of covering program elements,estimate defect finding on a sample of representative bugs

● Mutant– A valid program that behaves differently than the original

Page 130: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Mutation Analysis

● Instead of covering program elements,estimate defect finding on a sample of representative bugs

● Mutant– A valid program that behaves differently than the original– Consider small, local changes to programs

Page 131: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Mutation Analysis

● Instead of covering program elements,estimate defect finding on a sample of representative bugs

● Mutant– A valid program that behaves differently than the original– Consider small, local changes to programs

a = b + c a = b * c

Page 132: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Mutation Analysis

● Instead of covering program elements,estimate defect finding on a sample of representative bugs

● Mutant– A valid program that behaves differently than the original– Consider small, local changes to programs– A test t kills a mutant m if t produces a different outcome on m than

the original program

Page 133: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Mutation Analysis

● Instead of covering program elements,estimate defect finding on a sample of representative bugs

● Mutant– A valid program that behaves differently than the original– Consider small, local changes to programs– A test t kills a mutant m if t produces a different outcome on m than

the original program What does this mean?

Page 134: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Mutation Analysis

● Instead of covering program elements,estimate defect finding on a sample of representative bugs

● Mutant– A valid program that behaves differently than the original– Consider small, local changes to programs– A test t kills a mutant m if t produces a different outcome on m than

the original program

● Systematically generate mutants separately from original program

Page 135: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Mutation Analysis

● Instead of covering program elements,estimate defect finding on a sample of representative bugs

● Mutant– A valid program that behaves differently than the original– Consider small, local changes to programs– A test t kills a mutant m if t produces a different outcome on m than

the original program

● Systematically generate mutants separately from original program

● The goal is to:– Mutation Analysis – Measure bug finding ability– Mutation Testing – create a test suite that kills a representative set of

mutants

Page 136: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Mutation Analysis

● Instead of covering program elements,estimate defect finding on a sample of representative bugs

● Mutant– A valid program that behaves differently than the original– Consider small, local changes to programs– A test t kills a mutant m if t produces a different outcome on m than

the original program

● Systematically generate mutants separately from original program

● The goal is to:– Mutation Analysis – Measure bug finding ability– Mutation Testing – create a test suite that kills a representative set of

mutants Depending on the source, these may swap...

Page 137: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

137

Mutation

● What are possible mutants?

int foo(int x, int y) { if (x > 5) {return x + y;} else {return x;}}

Page 138: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

138

Mutation

● What are possible mutants?

● Once we have a test case that kills a mutant, the mutant itself is no longer useful.

int foo(int x, int y) { if (x > 5) {return x + y;} else {return x;}}

Page 139: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

139

Mutation

● What are possible mutants?

● Once we have a test case that kills a mutant, the mutant itself is no longer useful.

● Some are not generally useful:

– (Still Born) Not compilable

int foo(int x, int y) { if (x > 5) {return x + y;} else {return x;}}

Page 140: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

140

Mutation

● What are possible mutants?

● Once we have a test case that kills a mutant, the mutant itself is no longer useful.

● Some are not generally useful:

– (Still Born) Not compilable– (Trivial) Killed by most test cases

int foo(int x, int y) { if (x > 5) {return x + y;} else {return x;}}

Page 141: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

141

Mutation

● What are possible mutants?

● Once we have a test case that kills a mutant, the mutant itself is no longer useful.

● Some are not generally useful:

– (Still Born) Not compilable– (Trivial) Killed by most test cases– (Equivalent) Indistinguishable from original program

int foo(int x, int y) { if (x > 5) {return x + y;} else {return x;}}

Page 142: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

142

Mutation

● What are possible mutants?

● Once we have a test case that kills a mutant, the mutant itself is no longer useful.

● Some are not generally useful:

– (Still Born) Not compilable– (Trivial) Killed by most test cases– (Equivalent) Indistinguishable from original program– (Redundant) Indistinguishable from other mutants

int foo(int x, int y) { if (x > 5) {return x + y;} else {return x;}}

Page 143: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

143

Mutation

int min(int a, int b) { int minVal; minVal = a; if (b < a) { minVal = b; } return minVal;}

– Mimic mistakes– Encode knowledge from other techniques

Page 144: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

144

Mutation

int min(int a, int b) { int minVal; minVal = a; if (b < a) { minVal = b; } return minVal;}

int min(int a, int b) { int minVal; minVal = a; if (b < a) { minVal = b; } return minVal;}

– Mimic mistakes– Encode knowledge from other techniques

Page 145: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

145

Mutation

int min(int a, int b) { int minVal; minVal = a; if (b < a) { minVal = b; } return minVal;}

int min(int a, int b) { int minVal; minVal = a; minVal = b; if (b < a) { minVal = b; } return minVal;}

Mutant 1: minVal = b;

– Mimic mistakes– Encode knowledge from other techniques

Page 146: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

146

Mutation

int min(int a, int b) { int minVal; minVal = a; if (b < a) { minVal = b; } return minVal;}

int min(int a, int b) { int minVal; minVal = a; minVal = b; if (b < a) { if (b > a) { minVal = b; } return minVal;}

Mutant 1: minVal = b;

Mutant 2: if (b > a) {

– Mimic mistakes– Encode knowledge from other techniques

Page 147: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

147

Mutation

int min(int a, int b) { int minVal; minVal = a; if (b < a) { minVal = b; } return minVal;}

int min(int a, int b) { int minVal; minVal = a; minVal = b; if (b < a) { if (b > a) { if (b < minVal) { minVal = b; } return minVal;}

Mutant 1: minVal = b;

Mutant 2: if (b > a) {Mutant 3: if (b < minVal) {

– Mimic mistakes– Encode knowledge from other techniques

Page 148: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

148

Mutation

int min(int a, int b) { int minVal; minVal = a; if (b < a) { minVal = b; } return minVal;}

int min(int a, int b) { int minVal; minVal = a; minVal = b; if (b < a) { if (b > a) { if (b < minVal) { minVal = b; BOMB(); } return minVal;}

Mutant 1: minVal = b;

Mutant 2: if (b > a) {Mutant 3: if (b < minVal) {

Mutant 4: BOMB();

– Mimic mistakes– Encode knowledge from other techniques

Page 149: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

149

Mutation

int min(int a, int b) { int minVal; minVal = a; if (b < a) { minVal = b; } return minVal;}

int min(int a, int b) { int minVal; minVal = a; minVal = b; if (b < a) { if (b > a) { if (b < minVal) { minVal = b; BOMB(); minVal = a; } return minVal;}

Mutant 1: minVal = b;

Mutant 2: if (b > a) {Mutant 3: if (b < minVal) {

Mutant 4: BOMB();Mutant 5: minVal = a;

– Mimic mistakes– Encode knowledge from other techniques

Page 150: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

150

Mutation

int min(int a, int b) { int minVal; minVal = a; if (b < a) { minVal = b; } return minVal;}

int min(int a, int b) { int minVal; minVal = a; minVal = b; if (b < a) { if (b > a) { if (b < minVal) { minVal = b; BOMB(); minVal = a; minVal = failOnZero(b); } return minVal;}

Mutant 1: minVal = b;

Mutant 2: if (b > a) {Mutant 3: if (b < minVal) {

Mutant 4: BOMB();Mutant 5: minVal = a;Mutant 6: minVal = failOnZero(b);

– Mimic mistakes– Encode knowledge from other techniques

Page 151: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

151

Mutation

int min(int a, int b) { int minVal; minVal = a; if (b < a) { minVal = b; } return minVal;}

int min(int a, int b) { int minVal; minVal = a; minVal = b; if (b < a) { if (b > a) { if (b < minVal) { minVal = b; BOMB(); minVal = a; minVal = failOnZero(b); } return minVal;}

Mutant 1: minVal = b;

Mutant 2: if (b > a) {Mutant 3: if (b < minVal) {

Mutant 4: BOMB();Mutant 5: minVal = a;Mutant 6: minVal = failOnZero(b);

What mimicsstatement coverage?

– Mimic mistakes– Encode knowledge from other techniques

Page 152: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

152

Mutation Analysis

MutantsMutant 1Mutant 2Mutant 3Mutant 4Mutant 5Mutant 6

Page 153: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

153

Mutation Analysis

Mutants Test Suitemin(1,2) Ô 1min(2,1) Ô 1

Mutant 1Mutant 2Mutant 3Mutant 4Mutant 5Mutant 6

Page 154: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

154

Mutation Analysis

Mutants Test Suitemin(1,2) Ô 1min(2,1) Ô 1

Try every mutant on test 1.

Mutant 1Mutant 2Mutant 3Mutant 4Mutant 5Mutant 6

Page 155: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

155

Mutation Analysis

Mutants Test Suitemin(1,2) Ô 1min(2,1) Ô 1Ki

lledMutant 1

Mutant 2Mutant 3Mutant 4Mutant 5Mutant 6

Page 156: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

156

Mutation Analysis

Mutants Test Suitemin(1,2) Ô 1min(2,1) Ô 1Ki

lled

Try every live mutant on test 2.

Mutant 1Mutant 2Mutant 3Mutant 4Mutant 5Mutant 6

Page 157: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

157

Mutation Analysis

Mutants Test Suitemin(1,2) Ô 1

Kille

d

min(2,1) Ô 1Ki

lled

Mutant 1Mutant 2Mutant 3Mutant 4Mutant 5Mutant 6

Page 158: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

158

Mutation Analysis

Mutants Test Suitemin(1,2) Ô 1

Kille

d

min(2,1) Ô 1Ki

lled

So the mutation score is...

Mutant 1Mutant 2Mutant 3Mutant 4Mutant 5Mutant 6

Page 159: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

159

Mutation Analysis

Mutants Test Suitemin(1,2) Ô 1

Kille

d

min(2,1) Ô 1Ki

lled

So the mutation score is... 4/5. Why?

Mutant 1Mutant 2Mutant 3Mutant 4Mutant 5Mutant 6

Page 160: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

160

Mutation Analysis

Mutants Test Suitemin(1,2) Ô 1

Kille

d

min(2,1) Ô 1Ki

lled

min3(int a, int b): int minVal; minVal = a; if (b < minVal) minVal = b; return minVal;

min6(int a, int b): int minVal; minVal = a; if (b < a) minVal = failOnZero(b); return minVal;

So the mutation score is... 4/5. Why?

Mutant 1Mutant 2Mutant 3Mutant 4Mutant 5Mutant 6

Page 161: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

161

Mutation Analysis

Mutants Test Suitemin(1,2) Ô 1

Kille

d

min(2,1) Ô 1Ki

lled

min3(int a, int b): int minVal; minVal = a; if (b < minVal) minVal = b; return minVal;

min3(int a, int b): int minVal; minVal = a; if (b < a) minVal = failOnZero(b); return minVal;

Equivalent to the original!There is no injected bug.

So the mutation score is... 4/5. Why?

Mutant 1Mutant 2Mutant 3Mutant 4Mutant 5Mutant 6

Page 162: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

162

Equivalent Mutants

● Equivalent mutants are not bugs and should not be counted

Page 163: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

163

Equivalent Mutants

● Equivalent mutants are not bugs and should not be counted

● New Mutation Score:

Page 164: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

164

Equivalent Mutants

● Equivalent mutants are not bugs and should not be counted

● New Mutation Score:

# Killed

# MutantsStart with the simplest score

from fault seeding

Page 165: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

165

Equivalent Mutants

● Equivalent mutants are not bugs and should not be counted

● New Mutation Score:

# Killed

# Mutants−# Equivalent

Traditional mutation scorefrom literature

Page 166: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

166

Equivalent Mutants

● Equivalent mutants are not bugs and should not be counted

● New Mutation Score:

# Killed−# KilledDuplicates

# Mutants−# Equivalent−# Duplicates

Updated for modern handlingof duplicate & equivalent mutants

Page 167: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

167

Equivalent Mutants

● Equivalent mutants are not bugs and should not be counted

● New Mutation Score:

● Detecting equivalent mutants is undecidable in general

# Killed−# KilledDuplicates

# Mutants−# Equivalent−# Duplicates

Page 168: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

168

Equivalent Mutants

● Equivalent mutants are not bugs and should not be counted

● New Mutation Score:

● Detecting equivalent mutants is undecidable in general

● So why are they equivalent?

Reachability Infection Propagation

# Killed−# KilledDuplicates

# Mutants−# Equivalent−# Duplicates

Page 169: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

169

Equivalent Mutants

● Identifying equivalent mutants is one of the most expensive / burdensome aspects of mutation analysis.

Page 170: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

170

Equivalent Mutants

● Identifying equivalent mutants is one of the most expensive / burdensome aspects of mutation analysis.

min3(int a, int b): int minVal; minVal = a; if (b < minVal) minVal = b; return minVal;

Requires reasoning about whythe result was the same.

Page 171: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

171

Mutation Operators

● Are the mutants representative of all bugs?

● Do we expect the mutation score to be meaningful?

Ideas? Why? Why not?

Page 172: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

172

Mutation Operators

● Are the mutants representative of all bugs?

● Do we expect the mutation score to be meaningful?

2 Key ideas are missing....

Ideas? Why? Why not?

Page 173: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

173

Competent Programmer Hypothesis

Programmers tend to write code that is almost correct

Page 174: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

174

Competent Programmer Hypothesis

Programmers tend to write code that is almost correct– So most of the time simple mutations should reflect the real bugs.

Page 175: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

175

Coupling Effect

Tests that cover so much behavior that even simple errors are detected should also be sensitive enough to detect more complex errors

Page 176: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

176

Coupling Effect

Tests that cover so much behavior that even simple errors are detected should also be sensitive enough to detect more complex errors

– By casting a fine enough net, we'll catch the big fish, too (sorry dolphins)

Page 177: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

177

Mutation Testing

● Considered one of the strongest criteria

Page 178: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

178

Mutation Testing

● Considered one of the strongest criteria– Mimics some input specifications– Mimics some traditional coverage (statement, branch, …)

Page 179: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

179

Mutation Testing

● Considered one of the strongest criteria– Mimics some input specifications– Mimics some traditional coverage (statement, branch, …)

● Massive number of criteria.

Why?

Page 180: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

180

Traditional Coverage vs Mutation

● Statement & branch based coverage are the most popular adequacy measures in practice.

Page 181: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

181

Traditional Coverage vs Mutation

● Statement & branch based coverage are the most popular adequacy measures in practice.– Covstmt(T1) > Covstmt(T2) → ?

Page 182: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

182

Traditional Coverage vs Mutation

● Statement & branch based coverage are the most popular adequacy measures in practice.– Covstmt(T1) > Covstmt(T2) → T1 is more likely to find

more bugs?

Page 183: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

183

Traditional Coverage vs Mutation

● Statement & branch based coverage are the most popular adequacy measures in practice.– Covstmt(T1) > Covstmt(T2) → T1 is more likely to find

more bugs?

What if you change |T|?

Page 184: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

184

Traditional Coverage vs Mutation

● Statement & branch based coverage are the most popular adequacy measures in practice.– Covstmt(T1) > Covstmt(T2) → – Covstmt(T) increases with the |T|

T1 is more likely to findmore bugs?

Page 185: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

185

Traditional Coverage vs Mutation

● Statement & branch based coverage are the most popular adequacy measures in practice.– Covstmt(T1) > Covstmt(T2) → – Covstmt(T) increases with the |T|

→ You cannot assume that better coverage increases defect finding ability!

T1 is more likely to findmore bugs?

Then does coverage serve a purpose?

Page 186: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

186

Traditional Coverage vs Mutation

● Statement & branch based coverage are the most popular adequacy measures in practice.– Covstmt(T1) > Covstmt(T2) → – Covstmt(T) increases with the |T|

→ You cannot assume that better coverage increases defect finding ability!– Coverage still tells you which portions of a program haven't been tested!

T1 is more likely to findmore bugs?

Page 187: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

187

Traditional Coverage vs Mutation

● Statement & branch based coverage are the most popular adequacy measures in practice.– Covstmt(T1) > Covstmt(T2) → – Covstmt(T) increases with the |T|

→ You cannot assume that better coverage increases defect finding ability!– Coverage still tells you which portions of a program haven't been tested!– It just cannot safely measure defect finding capability.

T1 is more likely to findmore bugs?

Page 188: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

188

Traditional Coverage vs Mutation

● Mutation analysis/testing correlates with defect finding independent of code coverage! [Just 2014]

Page 189: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

189

Traditional Coverage vs Mutation

● Mutation analysis/testing correlates with defect finding independent of code coverage! [Just 2014]

So is that it?Can we just do mutation

testing & be done?

Page 190: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Regression Testing

Page 191: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

191

Regression Testing

● Regression Testing

Page 192: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

192

Regression Testing

● Regression Testing– Retesting software as it evolves to ensure previous functionality

Page 193: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

193

Regression Testing

● Regression Testing– Retesting software as it evolves to ensure previous functionality

● Useful as a tool for ratcheting software quality

Page 194: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

194

Regression Testing

● Regression Testing– Retesting software as it evolves to ensure previous functionality

● Useful as a tool for ratcheting software quality

What is a ratchet?

Page 195: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

195

Regression Testing

● Regression Testing– Retesting software as it evolves to ensure previous functionality

● Useful as a tool for ratcheting software quality

What is a ratchet?

Page 196: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

196

Regression Testing

● Regression Testing– Retesting software as it evolves to ensure previous functionality

● Useful as a tool for ratcheting software quality

● Regression tests further enable making changes

Page 197: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

197

Why Use Regression Testing

● As software evolves, previously working functionality can fail.

Page 198: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

198

Why Use Regression Testing

● As software evolves, previously working functionality can fail– Software is complex & interconnected.

Page 199: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

199

Why Use Regression Testing

● As software evolves, previously working functionality can fail– Software is complex & interconnected.– Changing one component can unintentionally impact another.

Page 200: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

200

Why Use Regression Testing

● As software evolves, previously working functionality can fail– Software is complex & interconnected.– Changing one component can unintentionally impact another.

ContentsparseFile(std::path& p) { ... auto header = parseHeader(...); ...}

Page 201: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

201

Why Use Regression Testing

● As software evolves, previously working functionality can fail– Software is complex & interconnected.– Changing one component can unintentionally impact another.

HeaderparseHeader(std::ifstream& in) { ...} Contents

parseFile(std::path& p) { ... auto header = parseHeader(...); ...}

Page 202: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

202

Why Use Regression Testing

● As software evolves, previously working functionality can fail– Software is complex & interconnected.– Changing one component can unintentionally impact another.

HeaderparseHeader(std::ifstream& in) { ...} Contents

parseFile(std::path& p) { ... auto header = parseHeader(...); ...}

Page 203: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

203

Why Use Regression Testing

● As software evolves, previously working functionality can fail– Software is complex & interconnected.– Changing one component can unintentionally impact another.– New environments can introduce unexpected behavior in components that

originally work.

Page 204: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

204

Why Use Regression Testing

● As software evolves, previously working functionality can fail– Software is complex & interconnected.– Changing one component can unintentionally impact another.– New environments can introduce unexpected behavior in components that

originally work.

● Most testing is regression testing

Page 205: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

205

Why Use Regression Testing

● As software evolves, previously working functionality can fail– Software is complex & interconnected.– Changing one component can unintentionally impact another.– New environments can introduce unexpected behavior in components that

originally work.

● Most testing is regression testing

● Ensuring previous functionality can require large test suites.Are they always realistic?

Page 206: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

206

Limiting Regression Suites

● Be careful not to add redundant test to the test suite.

Page 207: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

207

Limiting Regression Suites

● Be careful not to add redundant test to the test suite.– Every bug may indicate a useful behavior to test– Test adequacy criteria can limit the other tests

Page 208: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

208

Limiting Regression Suites

● Be careful not to add redundant test to the test suite.– Every bug may indicate a useful behavior to test– Test adequacy criteria can limit the other tests

But this is more or less where we started...

Page 209: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

209

Limiting Regression Suites

● Be careful not to add redundant test to the test suite.– Every bug may indicate a useful behavior to test– Test adequacy criteria can limit the other tests

● Sometimes not all tests need to run with each commit

Page 210: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

210

Limiting Regression Suites

● Be careful not to add redundant test to the test suite.– Every bug may indicate a useful behavior to test– Test adequacy criteria can limit the other tests

● Sometimes not all tests need to run with each commit– Run a subset of sanity or smoke tests for commits

Page 211: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

211

Limiting Regression Suites

● Be careful not to add redundant test to the test suite.– Every bug may indicate a useful behavior to test– Test adequacy criteria can limit the other tests

● Sometimes not all tests need to run with each commit– Run a subset of sanity or smoke tests for commits

These mostly validate the build process& core behaviors.

Page 212: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

212

Limiting Regression Suites

● Be careful not to add redundant test to the test suite.– Every bug may indicate a useful behavior to test– Test adequacy criteria can limit the other tests

● Sometimes not all tests need to run with each commit– Run a subset of sanity or smoke tests for commits– Run more thorough tests nightly

Page 213: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

213

Limiting Regression Suites

● Be careful not to add redundant test to the test suite.– Every bug may indicate a useful behavior to test– Test adequacy criteria can limit the other tests

● Sometimes not all tests need to run with each commit– Run a subset of sanity or smoke tests for commits– Run more thorough tests nightly– “ ” weekly– “ ” preparing for milestones/ integration

Page 214: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

214

Limiting Regression Testing

● Can we be smarter about which test we run & when?

What else could we do?

Page 215: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

215

Limiting Regression Testing

● Can we be smarter about which test we run & when?

● Change Impact Analysis– Identify how changes affect the rest of software

Page 216: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

216

Limiting Regression Testing

● Can we be smarter about which test we run & when?

● Change Impact Analysis– Identify how changes affect the rest of software

● Can decide which tests to run on demand

Page 217: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

217

Limiting Regression Testing

● Can we be smarter about which test we run & when?

● Change Impact Analysis– Identify how changes affect the rest of software

● Can decide which tests to run on demand– Conservative: run all tests– Cheap: run tests with test requirements related to the changed lines

Page 218: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

218

Limiting Regression Testing

● Can we be smarter about which test we run & when?

● Change Impact Analysis– Identify how changes affect the rest of software

● Can decide which tests to run on demand– Conservative: run all tests– Cheap: run tests with test requirements related to the changed lines

Is the cheap approach enough?

Page 219: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

219

Limiting Regression Testing

● Can we be smarter about which test we run & when?

● Change Impact Analysis– Identify how changes affect the rest of software

● Can decide which tests to run on demand– Conservative: run all tests– Cheap: run tests with test requirements related to the changed lines– Middle ground: Run those tests affected by how changes propagate

through the software?

Page 220: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

220

Limiting Regression Testing

● Can we be smarter about which test we run & when?

● Change Impact Analysis– Identify how changes affect the rest of software

● Can decide which tests to run on demand– Conservative: run all tests– Cheap: run tests with test requirements related to the changed lines– Middle ground: Run those tests affected by how changes propagate

through the software?In practice, tools can assist in findingout which tests need to be run

Page 221: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Using Test SuitesFor Other Purposes

Page 222: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Bug Prediction

● Does Bug Prediction Support Human Developers?Findings From a Google Case Study– https://static.googleusercontent.com/media/research.google.com/en//pub

s/archive/41145.pdf

Page 223: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Fault Localization

● Doric: Foundations for Statistical Fault Localisation– https://homes.cs.washington.edu/~mernst/pubs/fault-localization-icse2017

.pdf

● An Empirical Study of Fault Localization Families and Their Combinations– https://arxiv.org/abs/1803.09939

Page 224: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Automated Program Repair

● http://software-lab.org/publications/cacm2019_program_repair.pdf

● Empirical Review of Java Program Repair Tools:A Large-Scale Experiment on 2,141 Bugs and 23,551 Repair Attempts– https://arxiv.org/abs/1905.11973

Page 225: An Overview of Software Testingwsumner/teaching/745/04-testing.pdf · Software Testing The most common way of measuring and ensuring program correctness Input Program Observed Behavior

Summary (& directions to look for)