CS 406 Fall 98 Software Testing Part III: Test Assessment and Improvement. Aditya P. Mathur, Purdue University. Last update: July 19, 1998


Learning Objectives

To understand the relevance and importance of test assessment.

To learn the fundamental principle underlying test assessment.

To learn various methods and tools for test assessment.


To understand the relative strengths/weaknesses of test assessment methods.

To learn how to improve tests based on a test assessment procedure.


What is test assessment?

Once a test set T, a collection of test inputs, has been developed, we ask: How good is T?

It is the measurement of the goodness of T which is known as test assessment.

Test assessment is carried out based on one or more criteria.


Test assessment-continued

These criteria are known as test adequacy criteria.

Test assessment is also known as test adequacy assessment.


Test assessment-continued

Test assessment provides the following information:

– A metric, also known as the adequacy score or coverage, usually between 0 and 1.

– A list of all the weaknesses found in T which, when removed, will raise the score to 1.

– The weaknesses depend on the criteria used for assessment.
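The two outputs of test assessment described above can be sketched in C. This is only an illustration under stated assumptions: the coverage domain is taken to be a finite array of elements, and the names adequacy_score and list_weaknesses are hypothetical, not part of any tool mentioned in these slides.

```c
#include <assert.h>
#include <stddef.h>

/* Adequacy score (coverage): the fraction of coverage elements that the
   test set has covered. covered[i] != 0 means element i was covered. */
double adequacy_score(const int covered[], size_t n)
{
    assert(n > 0);  /* assumption: the coverage domain is non-empty */
    size_t hit = 0;
    for (size_t i = 0; i < n; i++)
        if (covered[i])
            hit++;
    return (double)hit / (double)n;
}

/* Writes the indices of the uncovered elements (the weaknesses) into
   out[], which must have room for n entries; returns how many there are. */
size_t list_weaknesses(const int covered[], size_t n, size_t out[])
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (!covered[i])
            out[k++] = i;
    return k;
}
```

With four coverage elements of which one is uncovered, the score is 0.75 and the list of weaknesses holds that one element. Adding a test that covers it raises the score to 1.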


Test assessment-continued

Once the coverage has been computed, and the weaknesses identified, one can improve T.

Improvement of T is done by examining one or more weaknesses and constructing new test requirements designed to overcome the weakness(es).

The new test requirements lead to new test specifications and to further testing of the program.


Test assessment-continued

This is continued until all weaknesses are overcome, i.e. the adequacy criterion is satisfied (coverage=1).

In some instances it may not be possible to satisfy the adequacy criteria for one or more of the following reasons:

• Lack of sufficient manpower.

• Weaknesses that cannot be removed because they are infeasible.

• The cost of removing the weaknesses is not justified.

While improving T by removing its weaknesses, one usually tests the program more thoroughly than it has been tested so far.

This additional testing is likely to result in the discovery of remaining errors.


Test assessment-continued

Hence we say that test assessment and improvement helps in the improvement of software reliability.

Test assessment and improvement is applicable throughout the testing process and during all stages of software development.


Test assessment-summary procedure

[Flowchart of the procedure. Boxes: Develop T; Select an adequacy criterion C; Measure adequacy of T w.r.t. C; Is T adequate? (Yes/No); Improve T; More testing is warranted? (Yes/No); Done.]


Principle underlying test assessment

There is a uniform principle that underlies test assessment throughout the testing process.

This principle is known as the coverage principle.

It has come about as a result of intensive research in software testing at Purdue and by other research groups.


The coverage principle

To formulate and understand the coverage principle, we need to understand:

– coverage domains

– coverage elements

A coverage domain is a finite domain, related to the program under test, that we want to cover. Coverage elements are the individual elements of this domain.


The coverage principle-continued

[Figure: coverage domains and their coverage elements. Labels shown: Requirements, Classes, Functions, Interface mutations, Exceptions.]


The coverage principle-continued

Measuring test adequacy and improving a test set against a sequence of well defined, increasingly strong, coverage domains leads to improved confidence in the reliability of the system under test.


The coverage principle-continued

Note the following properties of a coverage domain:

– It is related to the program under test.

– It is finite.

– It may come from program requirements, related to the inputs and outputs.

– It may come from program code. Can you think of a coverage domain that comes from the program code?

– It aids in measuring test adequacy as well as the progress made in testing. How?


The coverage principle-continued

Example:

– It is required to write a program that takes in the name of a person as a string and searches for the name in a file of names. The program must output the record ID which matches the given name. In case of no match a -1 is returned.

What coverage domains can be identified from this requirement?


The coverage principle-continued

As we learned earlier, improving coverage improves our confidence in the correct functioning of the program under test.

Given a program P and a test T suppose that T is adequate w.r.t. a coverage criterion C.

Does this mean that P is error free? Obviously……???


Test effort

There are several measures of test effort.

One measure is the size of T. By this measure, a test set with a larger number of test cases corresponds to higher effort than one with a smaller number of test cases.


Error detection effectiveness

Each coverage criterion has its error detection ability. This is also known as the error detection effectiveness or simply effectiveness of the criterion.

One measure of the effectiveness of criterion C is the fraction of faults guaranteed to be revealed by a test T that satisfies C.


Effectiveness-continued

Another measure is the probability that at least fraction f of the faults in P will be revealed by test T that satisfies C.

Unfortunately there is no absolute measure of the effectiveness of any given coverage criterion for a general class of programs and for arbitrary test sets.


Effectiveness-continued

One coverage criterion results in an exception to this rule: What is it?

Empirical studies conducted by researchers give us an idea of the relative goodness of various coverage criteria.

Thus, for a variety of criteria we can make a statement like: Criterion C1 is definitely better than criterion C2.


Effectiveness-continued

In some cases we may be able to say: Criterion C1 is probably better than criterion C2.

Such information allows us to construct a hierarchy of coverage criteria.

This hierarchy is helpful in organizing and managing testing. How?


Strength of a coverage criterion

The effectiveness of a coverage criterion is also referred to as its strength.

Strength is a measure of the criterion’s ability to reveal faults in a program.

Criterion C1 is considered stronger than criterion C2 if C1 is capable of revealing more faults than C2.


The Saturation Effect

The rate at which new faults are discovered reduces as test adequacy with respect to a finite coverage domain increases; it reduces to zero when the coverage domain has been exhausted.

[Figure: saturation curve plotted against coverage, which ranges from 0 to 1.]


Saturation Effect: Fault View

[Figure: remaining faults (vertical axis, marked 0, M and N) versus testing effort (horizontal axis, marked Functional, tfs, tfe, tds, tdfe, tme).]


Saturation Effect: Reliability View

Functional, decision, dataflow and mutation coverage provide various test evaluation criteria.

[Figure: reliability versus testing effort, showing true reliability (R), estimated reliability (R’) and a saturation region for each of functional, decision, dataflow and mutation testing. Reliability levels marked: Rf, Rd, Rdf, Rm and R’f, R’d, R’df, R’m. Effort axis marks: tfs, tfe, tds, tde, tdfs, tdfe, tms, tfe.]


Coverage principle-discussion

Discuss: How will you use the knowledge of coverage principle and the saturation effect in organizing and managing testing?

Can you think of any other uses of the coverage principle and the saturation effect?


Control flow graph

The control flow graph (CFG) of a program is a representation of the flow of execution within the program.

It is useful in program analysis such as that required during test assessment and improvement.

More formally, a CFG G is:

– G=(N,A), where N is a set of nodes and A is a set of arcs.

– There is a unique entry node en in N.

– There is a unique exit node ex in N.

– A node represents a single statement or a block.

– A block is a single-entry-single-exit sequence of instructions that are always executed in a sequence without any diversion of path except at the end of the block.


Control flow graph-continued

– Every statement in a block, except possibly the first one, has exactly one predecessor.

– Similarly, every statement in the block, except possibly the last one, has exactly one successor.

– An arc a in A is a pair (n,m) of nodes from N which represent transfer of control from node n to node m.

– A path of length k in G is an ordered sequence of arcs $a_1, a_2, \ldots, a_k$ from A such that:

• The first node in $a_1$ is en.

• The last node in $a_k$ is ex.

• For any two adjacent arcs $a_i = (n,m)$ and $a_j = (p,q)$, m=p.

– A path is considered executable or feasible if there exists a test case which causes this path to be traversed during program execution; otherwise the path is unexecutable or infeasible.
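The three conditions on a path can be checked mechanically. The following C sketch is one possible reading of the definition; the type struct arc and the function is_path are hypothetical names introduced for illustration, and the check that every arc actually belongs to A is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* An arc (n, m): transfer of control from node n to node m. */
struct arc { int from; int to; };

/* Returns 1 if arcs[0..k-1] is a path in the sense defined above:
   it starts at the entry node, ends at the exit node, and each arc
   begins at the node where the previous arc ended. Returns 0 otherwise. */
int is_path(const struct arc arcs[], size_t k, int entry, int exit_node)
{
    assert(k > 0);  /* a path has at least one arc */
    if (arcs[0].from != entry || arcs[k - 1].to != exit_node)
        return 0;
    for (size_t i = 0; i + 1 < k; i++)
        if (arcs[i].to != arcs[i + 1].from)
            return 0;
    return 1;
}
```

Whether such a path is feasible is a separate question: it depends on whether some test input drives execution along it, which this structural check cannot decide.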


Control flow graph-example

Class exercise:

Draw a CFG for the following program:

1. scanf (x,y); if (y<0)
2.   pow=0-y;
3. else pow=y;
4. z=1.0;
5. while (pow !=0)
6.   {z=z*x; pow=pow-1;}
7. if (y<0)
8.   z=1.0/z;
9. printf(z);

What does the above program compute?


Control flow graph-example

Class exercise:

For the CFG you have drawn, list all paths of length at most 10.

Are there more paths than what you have listed?

What does the above program compute?


Structure-based test adequacy

Based on the CFG of a program several test adequacy criteria can be defined.

Some are:

• statement coverage criterion

• branch coverage criterion

• condition coverage criterion

• path coverage criterion


Statement coverage

The coverage domain consists of all statements in the program. Restated, in terms of the control flow graph, it is the set of all nodes in G.

A test T satisfies the statement coverage criterion if upon execution of P on each element of T, each statement of P has been executed at least once.


Statement coverage-continued

Restated in terms of G, T is adequate w.r.t. the statement coverage criterion if each node in N is on at least one of the paths traversed when P is executed on each element of T.


Statement coverage-continued

Class exercise:

– For the program for which you have drawn the control flow graph, develop a test set that satisfies the statement coverage criterion.

– Follow the procedure for test assessment and improvement suggested earlier.


Statement coverage-weakness

Consider the following program:

int abs (int x)
{
  if (x>=0) x=0-x;
  return x;
}


Statement coverage-weakness

Suppose that T= {(x=0)}.

Clearly, T satisfies the statement coverage criterion.

But is the program correct, and is the error revealed by T, which is adequate w.r.t. the statement coverage criterion?

What do you suggest we do to improve T?
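The situation can be tried out with a small C sketch. It keeps the slide's faulty condition on purpose, renames the function to faulty_abs to avoid clashing with the standard library's abs, and compares it against an independently written oracle; faulty_abs, expected_abs and reveals_fault are hypothetical names introduced here.

```c
#include <assert.h>
#include <limits.h>

/* The program from the slide, kept faulty on purpose. */
int faulty_abs(int x)
{
    if (x >= 0)
        x = 0 - x;
    return x;
}

/* Oracle: the expected absolute value, written independently. */
int expected_abs(int x)
{
    assert(x != INT_MIN);  /* -INT_MIN is not representable as an int */
    return x < 0 ? -x : x;
}

/* Returns 1 if test input x makes the program's output differ from
   the expected output, i.e. if the test reveals the error. */
int reveals_fault(int x)
{
    return faulty_abs(x) != expected_abs(x);
}
```

The input x=0 executes every statement yet produces the expected output, so the error stays hidden. Any non-zero input, positive or negative, produces an output that differs from the expected one.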


Branch (or edge) coverage

In G there may be nodes which correspond to conditions in P. Such nodes, also called condition nodes, contain branches in P.

Each such node is considered covered if the condition evaluates to true during some execution of P and to false during some execution of P; these executions need not be the same.


Branch coverage

The coverage domain consists of all branches in G. Restated, in terms of the control flow graph, it is the set of all arcs exiting the condition nodes.

A test T satisfies the branch coverage criterion if upon execution of P on each element of T, each branch of P has been executed at least once.


Branch coverage

Class exercise:

• Identify all condition nodes in the flow graph you have drawn earlier.

• Does T= {(x=0)} satisfy the branch coverage criterion?

• If not, then improve it so that it does.


Branch coverage-weakness

Consider the following program, which is supposed to check that the input data item is in the range 0 to 100, inclusive:

int check(int x)
{
  if ((x>=0 )&& (x<=200))
    return 1;   /* true */
  else
    return 0;   /* false */
}


Branch coverage-weakness

Class exercise:

• Do you notice the error in this program?

• Find a test set T which is adequate w.r.t. statement coverage and does not reveal the error.

• Improve T so that it is adequate w.r.t. branch coverage and does not reveal the error.

• What do you conclude about the weakness of the branch coverage criterion?
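One way to explore this exercise is to instrument the check program so that it records which branch was taken. The C sketch below is only an illustration: branch_hit, faulty_check, intended_check and run_tests are hypothetical names, and intended_check encodes the stated requirement (0 to 100, inclusive).

```c
#include <assert.h>
#include <stddef.h>

/* branch_hit[0]: the condition evaluated to true at least once;
   branch_hit[1]: the condition evaluated to false at least once. */
static int branch_hit[2];

/* The faulty check from the slide (upper bound 200 instead of 100),
   instrumented to record which branch was taken. */
int faulty_check(int x)
{
    if ((x >= 0) && (x <= 200)) {
        branch_hit[0] = 1;
        return 1;
    }
    branch_hit[1] = 1;
    return 0;
}

/* The intended behaviour: is x in the range 0 to 100, inclusive? */
int intended_check(int x)
{
    return x >= 0 && x <= 100;
}

/* Runs the program on every input; returns the number of failing tests. */
int run_tests(const int inputs[], size_t n)
{
    assert(inputs != NULL);
    int failures = 0;
    branch_hit[0] = branch_hit[1] = 0;
    for (size_t i = 0; i < n; i++)
        if (faulty_check(inputs[i]) != intended_check(inputs[i]))
            failures++;
    return failures;
}
```

With the inputs 50 and -1 both branches are taken and no test fails, so a branch-adequate test set can miss the error. An input between 101 and 200, such as 150, is needed to make the program fail.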


Condition coverage

Condition nodes in G might have compound conditions.

For example, in the check program the condition node contains the condition:

((x>=0 ) && (x<=200))

This is a compound condition which consists of the elementary conditions x>=0 and x<=200.


Condition coverage-continued

A compound condition is considered covered if each of its constituent elementary conditions evaluates to true during some execution of P and to false during some execution of P.

A test set T is adequate w.r.t. condition coverage if all conditions in P are covered when P is executed on elements of T.


Condition coverage-continued

Class exercise:

• Improve T from the previous exercise so that it is adequate w.r.t. the condition coverage criterion for the check function and does not reveal the error.

• Do you find the above possible?
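Condition coverage for the check function can be measured by recording the outcome of each elementary condition separately. The following C sketch is a minimal illustration; seen, record, check_instrumented and condition_covered are hypothetical names, and the sketch assumes C's short-circuit evaluation of &&.

```c
#include <assert.h>

/* seen[c][v] is set once elementary condition c has evaluated to v.
   c = 0 is (x >= 0), c = 1 is (x <= 200); v = 0 is false, v = 1 is true. */
static int seen[2][2];

/* Records the outcome of elementary condition c and passes it through. */
static int record(int c, int value)
{
    assert(c == 0 || c == 1);
    seen[c][value != 0] = 1;
    return value;
}

/* The compound condition of the check program, instrumented.
   && short-circuits: (x <= 200) is not evaluated when x < 0. */
int check_instrumented(int x)
{
    return record(0, x >= 0) && record(1, x <= 200);
}

/* Returns 1 once both elementary conditions have been seen
   to evaluate to true and to false. */
int condition_covered(void)
{
    return seen[0][0] && seen[0][1] && seen[1][0] && seen[1][1];
}
```

The inputs 50 and -1 leave x<=200 without a false outcome. Adding 250 completes condition coverage, and none of these three inputs lies between 101 and 200, so none of them makes the faulty program disagree with the requirement.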


Branch coverage-weakness, continued

Consider the following program:

0. int set_z(x,y); {
1.   int x,y;
2.   if (x!=0)
3.     y=5;
4.   else z=z-x;
5.   if (z>1)
6.     z=z/x;
7.   else
8.     z=y; }

What might happen here?


Branch coverage-weakness

Class exercise:

• Construct T for set_z such that (a) T is adequate w.r.t. the branch coverage criterion and (b) does not reveal the error.

• What do you conclude about the effectiveness of the branch and condition coverage criteria?
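The set_z program can be sketched in C so that the problematic statement is detected rather than executed. Several things here are assumptions: z is treated as a global because the slide does not declare it, the guard around the division exists only in this sketch, and triggers_fault and div_by_zero are hypothetical names.

```c
#include <assert.h>

static int z;            /* not declared on the slide; assumed global */
static int div_by_zero;  /* set when z=z/x would divide by zero */

void set_z(int x, int y)
{
    if (x != 0)
        y = 5;
    else
        z = z - x;
    if (z > 1) {
        if (x == 0) {    /* guard added for this sketch only */
            div_by_zero = 1;
            return;
        }
        z = z / x;
    } else {
        z = y;
    }
}

/* Runs set_z with z initially z0; returns 1 if the division by zero
   would have been executed. */
int triggers_fault(int x, int y, int z0)
{
    z = z0;
    div_by_zero = 0;
    set_z(x, y);
    assert(!div_by_zero || x == 0);  /* the fault requires x == 0 */
    return div_by_zero;
}
```

The tests (x=1, z=5) and (x=0, z=0) take both outcomes of both conditions without reaching the division with x equal to 0. Only the combination of the else branch of the first if with the then branch of the second, for example (x=0, z=2), triggers it.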


Path coverage

As mentioned before, a path through a program is a sequence of statements such that the entry node of the program CFG is the first node on the path and the exit node is the last one on the path.

Is this definition equivalent to the one given earlier?


Path coverage-continued

A test set T is considered adequate w.r.t. the path coverage criterion if all paths in P are executed at least once upon execution of P on each element of T.

Class exercise:

• Construct T for set_z such that T is adequate w.r.t. the path coverage criterion and does not reveal the error.

• Is the above possible?


Path coverage-weakness

The number of paths in a program is usually very large.

How many paths in set_z?

How many paths in check?

How many in the program that computes $x^y$?


Path coverage-weaknesses

It is the infinite or prohibitively large number of paths that prevents the use of this criterion in practice.

Suppose that a test set T covers all paths. Will it guarantee that all errors in P are revealed ?

Is obtaining 100% path coverage equivalent to exhaustive testing?


Variants of path coverage

As path coverage is usually impossible to attain, other heuristics have been proposed.

Loop coverage:

– Make sure that each loop is executed 0, 1, and 2 times.

Try several combinations of if and switch statements. The combinations must come from requirements.
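Loop coverage can be illustrated on the exponentiation program used earlier. The C sketch below rewrites that program as a function and counts loop iterations; power and loop_iterations are hypothetical names, and the local variable p plays the role of pow on the slide.

```c
#include <assert.h>

static int loop_iterations;  /* iterations of the while loop in the last call */

/* x raised to the integer power y, following the slide's program. */
double power(double x, int y)
{
    assert(x != 0.0 || y >= 0);  /* 0 to a negative power divides by zero */
    int p = (y < 0) ? 0 - y : y;
    double z = 1.0;
    loop_iterations = 0;
    while (p != 0) {
        z = z * x;
        p = p - 1;
        loop_iterations++;
    }
    if (y < 0)
        z = 1.0 / z;
    return z;
}
```

Under this reading, the inputs y=0, y=1 and y=-2 execute the loop 0, 1 and 2 times, and the negative exponent also exercises both outcomes of the two if statements.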


Hierarchy in Control flow criteria

[Figure: hierarchy of criteria, drawn from top to bottom as Path coverage, Condition coverage, Branch coverage, Statement coverage. Legend: an arrow from X to Y means X subsumes Y.]


Exercise

Develop a test set T that is adequate w.r.t. the statement, condition, and the loop coverage criteria for the exponentiation program.


Testing technique or strategy

One can develop a testing strategy based on any of the criteria discussed.

Example:

– A testing strategy based on the statement coverage criterion will begin by evaluating a test set T against this criterion. Then new tests will be added to T until all the statements are covered, i.e. T satisfies the criterion.


Definitions

Error-sensitive path: a path whose execution might lead to eventual detection of an error.

Error revealing path: a path whose execution will always cause the program to fail and the error to be detected.


Definitions

Reliable: A testing technique is reliable for an error if it guarantees that the error will always be detected.

– This implies that a reliable testing technique must lead to the exercising of at least one error-revealing path.


Definitions

Weakly reliable: A testing technique is weakly reliable if it forces the execution of at least one error sensitive path.


Example: error detection

Let us go over the example in Korel and Laski’s paper.

It is a sorting program which uses the bubble sort algorithm.

It sorts an array a[0:N] in descending order.

There are two nested loops in the program.

The inner loop, from i6-i10, finds the largest element of a[R1:N].


Example: error detection

The largest element is saved in R0 and R3 points to the location of R0 in a.

The outer loop swaps a[R1] with a[R3].

The completion of one iteration of the outer loop ensures that the sub-array a[0:R1-1] has been sorted and that a[R1-1] is greater than or equal to any element of a[R1:N].


Example: error detection

There is a missing re-initialization of R3 to R1 at the beginning of the inner loop.

In some cases this will cause the program to fail.

What are these cases?

We will get back to this error later!


Class exercise

Is the path testing strategy reliable for the sort program and for the missing initialization error in it?

Is it viable?

What about the branch testing strategy?

What about loop testing?


Data flow graph

It represents the flow of data in a program.

The graph is constructed from the control flow graph (CFG) of the program.

A statement that occurs within a node of the CFG might contain variable occurrences.

Each variable occurrence is classified as a def or a use.


defs and uses

A def represents the definition of a variable. Here are some sample defs of variable x:

• x=y*x; (the x on the left-hand side)

• scanf(&x,&y);

• int x;

• x[i-1]=y*x; (the x on the left-hand side)

A use represents the use of a variable in a statement. Here are a few examples of uses of variable x:

• x=x+1; (the x on the right-hand side)

• printf ("x is %d, y is %d", x,y);

• cout << x << endl << y

• z=x[i+1]

• if (x<y)…

Uses of a variable in output statements and assignments are classified as c-uses. Those in conditions are classified as p-uses.


def-use-continued

c-use stands for computational use and p-use for predicate-use.

Both c- and p-uses affect the flow of control: p-uses directly as their values are used in evaluating conditions and c-uses indirectly as their values are used to compute other variables which in turn affect the outcome of condition evaluation.


def-use-continued

A path from node i to node j is said to be def-clear w.r.t. a variable x if there is no def of x in the nodes along the path from node i to node j. Nodes i and j may have a def of x.

A def-clear path from node i to edge (j,k) is one in which no node on the path has a def of x.


global-def

A def of a variable x is considered global to its block if it is the last def of x within that block.

A c-use of x in a block is considered global c-use if there is no def of x preceding this c-use within this block.


def-use graph: definitions

def(i): set of all variables for which there is a global def in node i.

c-use(i): set of all variables that have a global c-use in node i.

p-use(i,j): set of all variables for which there is a p-use for the edge (i,j).

dcu(x,i): set of all nodes j such that x is in c-use(j), x is in def(i), and there is a def-clear path w.r.t. x from node i to node j.


def-use graph: definitions

dpu(x,i): set of all edges (j,k) such that x is in p-use(j,k), x is in def(i), and there is a def-clear path w.r.t. x from node i to edge (j,k).

The def-use graph of program P is constructed by associating defs, c-use, and p-use sets with nodes of a flow graph.

The next example is from Jalote’s text, pp. 425-428.


def-use graph-continued

Sample program:

1. scanf (x,y); if (y<0)
2.   pow=0-y;
3. else pow=y;
4. z=1.0;
5. while (pow !=0)
6.   {z=z*x; pow=pow-1;}
7. if (y<0)
8.   z=1.0/z;
9. printf(z);


def-use graph-continued

[Figure: def-use graph of the sample program, with nodes 1 to 9.]

Node   def        c-use
1      {x,y}      {}
2      {pow}      {y}
3      {pow}      {y}
4      {z}        {}
5      {}         {}
6      {z,pow}    {z,x,pow}
7      {}         {}
8      {z}        {z}
9      {}         {z}

Edge p-use sets: (1,2) and (1,3): {y}; (5,6) and (5,7): {pow}; (7,8) and (7,9): {y}.

Unlabeled edges imply empty p-use set.


def-use graph-class exercise

Draw a def-use graph for the following program.

0. int set_z(x,y); {
1.   int x,y;
2.   if (x!=0)
3.     y=5;
4.   else z=z-x;
5.   if (z>1)
6.     z=z/x;
7.   else
8.     z=y; }


def-use graph-continued

Traverse the graph to determine dcu and dpu sets.

(node, var)   dcu        dpu
(1,x)         {6}
(1,y)         {2,3}      {(1,2),(1,3),(7,8),(7,9)}
(2,pow)       {6}        {(5,6),(5,7)}
(3,pow)       {6}        {(5,6),(5,7)}
(4,z)         {6,8,9}
(6,z)         {6,8,9}
(6,pow)       {6}        {(5,6),(5,7)}
(8,z)         {9}
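The dcu column of the table can be reproduced by a traversal of the graph. The C sketch below is one possible way to do it, not the method used by any particular tool: it encodes the arcs and the def and c-use sets of the sample program by hand, then follows arcs from node i, stopping at any node that redefines the variable. All identifiers (arcs, def_set, cuse_set, dcu) are hypothetical, and sets of nodes are returned as bit masks.

```c
#include <assert.h>
#include <stddef.h>

#define NODES 10                 /* nodes 1..9; index 0 is unused */
#define BIT(v) (1u << (v))
enum { X, Y, POW, Z };           /* variables of the sample program */

/* Arcs of the control flow graph of the sample program. */
static const int arcs[][2] = {
    {1,2},{1,3},{2,4},{3,4},{4,5},{5,6},{6,5},{5,7},{7,8},{7,9},{8,9}
};
#define NARCS (sizeof arcs / sizeof arcs[0])

/* def(i) and c-use(i) for each node, as bit masks over the variables. */
static const unsigned def_set[NODES] = {
    0, BIT(X)|BIT(Y), BIT(POW), BIT(POW), BIT(Z), 0, BIT(Z)|BIT(POW), 0, BIT(Z), 0
};
static const unsigned cuse_set[NODES] = {
    0, 0, BIT(Y), BIT(Y), 0, 0, BIT(Z)|BIT(X)|BIT(POW), 0, BIT(Z), BIT(Z)
};

/* Returns dcu(var, i) as a bit mask over node numbers. */
unsigned dcu(int var, int i)
{
    assert(i >= 1 && i < NODES && (def_set[i] & BIT(var)));
    unsigned result = 0, visited = 0;
    int stack[NODES + 1];        /* each node is pushed at most once */
    int top = 0;
    stack[top++] = i;
    while (top > 0) {
        int n = stack[--top];
        for (size_t a = 0; a < NARCS; a++) {
            if (arcs[a][0] != n)
                continue;
            int m = arcs[a][1];
            if (cuse_set[m] & BIT(var))
                result |= BIT(m);    /* m has a global c-use of var */
            if (!(def_set[m] & BIT(var)) && !(visited & BIT(m))) {
                visited |= BIT(m);   /* the path stays def-clear through m */
                stack[top++] = m;
            }
        }
    }
    return result;
}
```

For example, dcu(Z, 4) yields the mask for nodes 6, 8 and 9, matching the row (4,z). The dpu sets could be computed in the same way by collecting labeled edges instead of nodes.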


Test generation

Class exercises:

– For the above graph generate a test set that satisfies

• the branch coverage criterion

• the all-defs criterion: for each definition of each variable, at least one use (c- or p-use) must be exercised.

• the all-uses criterion: all p-uses and all c-uses of all variable definitions must be covered.

Develop the tests incrementally, i.e. by modifying the previous test set!


Data flow testing tool

We will use SUDS, a data flow testing tool developed at Bellcore and available commercially from IBM.

The acronym SUDS stands for Software Understanding and Debugging System.

SUDS is a collection of tools of which ATAC is the one that measures control flow and data flow coverage.


ATAC processing: phase I

[Figure: P, the program under test, goes through "Preprocess, compile and instrument", which generates the .atac files and an instrumented (executable) version of P. The test set is input to the instrumented version, which upon execution produces the program output and the .trace file.]


ATAC processing: phase II

[Figure: the coverage analyzer takes the .atac files and the .trace file and produces control flow and data flow coverage values.]


ATAC demo

Open DOS window.

Go to /Program Files/bellcore/xSUDS/tutorial

Type

ataccl /Fedemo main.c wc.c

Type

xsuds *.atac

You may now view program complexity statistics in the suds window.


ATAC demo-continued

Go back to the DOS window and type: demo -c input1

Go to the xSUDS window and examine various coverage values.

Go back to the DOS window and type: demo -c input2

Go to the xSUDS window and examine how various coverage values have changed.


ATAC demo-continued

Repeat the above steps of executing demo on several test inputs. Analyze coverage values and observe how they change with new test data.

Other tools in SUDS will be discussed in the laboratory.


Mutation testing

What is mutation testing?

– Mutation testing is a code-based test assessment and improvement technique.

– It relies on the competent programmer hypothesis, which is the following assumption:

Given a specification, a programmer develops a program that is either correct or differs from the correct program by a combination of simple errors.


Mutation testing-continued

Program development is considered an iterative process in which an initial version of the program is refined towards the final version by making simple changes, or combinations of simple changes.


Mutation testing-definitions

Given a program P, a mutant of P is obtained by making a simple change in P.

Program:

int x,y;
if (x!=0)
  y=5;
else z=z-x;
if (z>1)
  z=z/x;
else
  z=y;

Mutant:

int x,y;
if (x!=0)
  y=5;
else z=z-x;
if (z>1)
  z=z/zpush(x);
else
  z=y;

What is zpush?


Another mutant

Program:

int x,y;
if (x!=0)
  y=5;
else z=z-x;
if (z>1)
  z=z/x;
else
  z=y;

Mutant:

int x,y;
if (x!=0)
  y=5;
else z=z-x;
if (z<1)
  z=z/x;
else
  z=y;


Mutant

A mutant M is considered distinguished by a test case $t \in T$ iff:

• $P(t) \neq M(t)$

where P(t) and M(t) denote, respectively, the observed behavior of P and M when executed on test input t.

A mutant M is considered equivalent to P iff:

• $P(t) = M(t) \; \forall t \in T$, where T is taken to be the set of all possible test inputs for P.
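The definition of a distinguished mutant can be tried out on the earlier mutant in which z>1 is replaced by z<1. The following is a minimal sketch, not part of the original slides: it assumes that x, y and z are supplied as test inputs (the slide fragment leaves them uninitialized) and that the final value of z is the observed behavior. The function names run_p, run_m and distinguishes are hypothetical.

```c
/* Program P from the slides. Assumption: x, y, z are test inputs and the
   final z is the observed behavior. Use x != 0 so z / x is always defined. */
int run_p(int x, int y, int z)
{
    if (x != 0)
        y = 5;
    else
        z = z - x;
    if (z > 1)
        z = z / x;
    else
        z = y;
    return z;
}

/* Mutant M: the relational operator in (z > 1) is replaced, giving (z < 1). */
int run_m(int x, int y, int z)
{
    if (x != 0)
        y = 5;
    else
        z = z - x;
    if (z < 1)
        z = z / x;
    else
        z = y;
    return z;
}

/* Returns 1 if the test input t = (x, y, z) distinguishes M from P,
   i.e. P(t) != M(t), and 0 otherwise. */
int distinguishes(int x, int y, int z)
{
    return run_p(x, y, z) != run_m(x, y, z);
}
```

Under these assumptions the input (x=2, y=0, z=8) distinguishes the mutant, since P yields 4 and M yields 5, whereas (x=2, y=0, z=1) does not, since both yield 5.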


Mutation score

During testing a mutant is considered live if it has not been distinguished or proven equivalent.

Suppose that a total of #M mutants are generated for program P.

The mutation score of a test set T, designed to test P, is computed as:

mutation score = (number of distinguished mutants) / (#M - number of equivalent mutants)
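As a small worked sketch of this computation: the mutation score is the fraction of non-equivalent mutants that the test set distinguishes, so it reaches 1 only when no live mutant remains. The function names and the convention of returning 1.0 when every mutant is equivalent are assumptions made for illustration.

```c
/* Mutation score: distinguished / (total - equivalent).
   Assumed convention: if no non-equivalent mutant exists, the score is 1.0. */
double mutation_score(int total, int distinguished, int equivalent)
{
    int candidates = total - equivalent;
    if (candidates <= 0) {
        return 1.0;
    }
    return (double)distinguished / (double)candidates;
}

/* Live mutants are those neither distinguished nor proven equivalent. */
int live_mutants(int total, int distinguished, int equivalent)
{
    return total - distinguished - equivalent;
}
```

For example, with #M = 10 mutants of which 2 are equivalent and 6 are distinguished, the score is 6/8 = 0.75 and 2 mutants are still live.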


Test adequacy criterion

A test set T is considered adequate w.r.t. the mutation criterion if its mutation score is 1.

The number of mutants generated depends on P and the mutant operators applied on P.

A mutant operator is a rule that when applied to the program under test generates zero or more mutants.


Mutant operators

Consider the following program:

int abs (x)
int x;
{
  if (x>=0) x=0-x;
  return x;
}


Mutation operator

Consider the following rule:

• Replace each relational operator in P by all possible relational operators excluding the one that is being replaced.

Assuming the set of relational operators to be {<, >, <=, >=, ==, !=}, the above mutant operator will generate a total of 5 mutants of P, because P contains exactly one relational operator (>=).
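The rule can be sketched in a few lines of C. This is only an illustration of the replacement rule on a single operator, not how Mothra or Proteum implement it; the function name relational_replacements and the table RELATIONAL_OPS are hypothetical.

```c
#include <string.h>

/* The set of relational operators assumed on the slide. */
static const char *const RELATIONAL_OPS[] = { "<", ">", "<=", ">=", "==", "!=" };
enum { NUM_RELATIONAL_OPS = 6 };

/* Stores in out[] every relational operator other than `original` and
   returns how many were stored. out[] must have room for
   NUM_RELATIONAL_OPS entries. Each stored operator yields one mutant. */
int relational_replacements(const char *original, const char *out[])
{
    int count = 0;
    for (int i = 0; i < NUM_RELATIONAL_OPS; i++) {
        if (strcmp(RELATIONAL_OPS[i], original) != 0) {
            out[count++] = RELATIONAL_OPS[i];
        }
    }
    return count;
}
```

Applied to the operator >= in the condition x>=0, the sketch yields the five replacements <, >, <=, == and !=, that is, the mutant conditions x<0, x>0, x<=0, x==0 and x!=0.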


Mutation operators

Mutation operators are language dependent.

For Fortran a total of 22 operators were proposed.

For C a total of 77 operators were proposed.

None have been proposed for C++ though most of the operators for C are applicable to C++ programs.


Equivalent mutant

Consider the following program P:

int x,y,z;
scanf("%d %d",&x,&y);
if (x>0) {
  x=x+1; z=x*(y-1);
} else {
  x=x-1; z=x*(y-1);
}

Here z is considered the output of P.


Equivalent mutant-continued

Now suppose that a mutant of P is obtained by changing x=x+1 to x=abs(x)+1.

This mutant is equivalent to P as no test case can distinguish it from P: the mutated statement is executed only when x>0, and abs(x)=x for every such x.
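A bounded check is consistent with this argument, although running tests alone can never prove equivalence. The sketch below assumes that x and y are passed as parameters instead of being read with scanf; the names p_output, m_output and count_differences are hypothetical.

```c
#include <stdlib.h>

/* P from the slide, with x and y as parameters; returns the output z. */
int p_output(int x, int y)
{
    int z;
    if (x > 0) {
        x = x + 1;
        z = x * (y - 1);
    } else {
        x = x - 1;
        z = x * (y - 1);
    }
    return z;
}

/* Mutant: x = x + 1 is replaced by x = abs(x) + 1. */
int m_output(int x, int y)
{
    int z;
    if (x > 0) {
        x = abs(x) + 1;
        z = x * (y - 1);
    } else {
        x = x - 1;
        z = x * (y - 1);
    }
    return z;
}

/* Counts the inputs (x, y) with lo <= x, y <= hi on which P and the
   mutant produce different outputs. */
int count_differences(int lo, int hi)
{
    int differences = 0;
    for (int x = lo; x <= hi; x++) {
        for (int y = lo; y <= hi; y++) {
            if (p_output(x, y) != m_output(x, y)) {
                differences++;
            }
        }
    }
    return differences;
}
```

Over any small range, such as -20 to 20, the count is 0, as expected for an equivalent mutant.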


Mutation testing procedure

Given P and a test set T:

1. Generate mutants.

2. Compile P and the mutants.

3. Execute P and the mutants on each test case.

4. Determine equivalent mutants.

5. Determine mutation score.

6. If mutation score is not 1 then improve the test set and repeat from step 3.
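The steps above can be sketched for the abs program shown earlier and its five relational-operator mutants. This is an illustrative sketch only: the mutants are written out by hand rather than generated by a tool, the equivalent mutant is marked manually (step 4), and all names (p_abs, MUTANTS, IS_EQUIVALENT, score_for) are hypothetical.

```c
#include <stddef.h>

typedef int (*unit_fn)(int);

/* Program under test: the abs example, kept exactly as written on the slide. */
static int p_abs(int x) { if (x >= 0) x = 0 - x; return x; }

/* Step 1: the five mutants obtained by replacing >= with each other operator. */
static int m_lt(int x) { if (x <  0) x = 0 - x; return x; }
static int m_gt(int x) { if (x >  0) x = 0 - x; return x; }
static int m_le(int x) { if (x <= 0) x = 0 - x; return x; }
static int m_eq(int x) { if (x == 0) x = 0 - x; return x; }
static int m_ne(int x) { if (x != 0) x = 0 - x; return x; }

enum { NUM_MUTANTS = 5 };
static const unit_fn MUTANTS[NUM_MUTANTS] = { m_lt, m_gt, m_le, m_eq, m_ne };

/* Step 4, decided by hand: m_gt differs from p_abs only in how x == 0 is
   handled, and 0 - 0 == 0, so it is equivalent to the program. */
static const int IS_EQUIVALENT[NUM_MUTANTS] = { 0, 1, 0, 0, 0 };

/* Steps 3 and 5: run P and each mutant on every test input and return
   distinguished / (NUM_MUTANTS - equivalent). */
double score_for(const int *tests, size_t num_tests)
{
    int distinguished = 0;
    int equivalent = 0;
    for (int i = 0; i < NUM_MUTANTS; i++) {
        if (IS_EQUIVALENT[i]) {
            equivalent++;
            continue;
        }
        for (size_t t = 0; t < num_tests; t++) {
            if (p_abs(tests[t]) != MUTANTS[i](tests[t])) {
                distinguished++;
                break;  /* one distinguishing test is enough */
            }
        }
    }
    return (double)distinguished / (double)(NUM_MUTANTS - equivalent);
}
```

Step 6 then shows up as successive test sets: {0} gives a score of 0, adding 3 raises it to 0.75, and adding -3 distinguishes the last live mutant, giving a score of 1.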


Mutation testing procedure

In practice the above procedure is implemented incrementally.

One applies a few selected mutant operators to P and computes the mutation score w.r.t. the mutants generated.

Once these mutants have been distinguished or proven equivalent, another set of mutant operators is applied.


Mutation testing procedure

This procedure is repeated until either all the mutants have been exhausted or some external condition forces testing to stop.

We will not discuss the details of practical application of mutation testing.


Tools for mutation testing

Mothra: for Fortran, developed at Purdue, 1990

Proteum: for C, developed at the University of São Paulo at São Carlos in Brazil.


Uses of Mutation testing

Mutation testing is useful during integration testing to check for integration errors.

Only the variables that are in the interfaces of the components being integrated are mutated. This reduces the complexity of mutation testing.


Summary

Test adequacy criterion

Test improvement

Coverage principle

Saturation effect

Control flow criteria

Data flow criteria

– def, use, p-use, c-use, all-uses


Summary continued

xSUDS, data flow testing tool.

Mutation testing

– mutant, distinguishing a mutant, live mutant, mutation score, competent programmer hypothesis.