Regression Testing

2

Regression Testing

So far:

Unit testing

System testing

Test coverage

All of these are about the first round of testing

Testing is performed from time to time during the software life cycle

Test cases / oracles can be reused in all rounds

Testing during the evolution phase is regression testing

3

Regression Testing

When we enhance the software, we may also introduce bugs

If the software worked yesterday but does not work today, this is called a “regression”

Numbers from an empirical study on Eclipse, 2005:

11% of commits are bug-inducing

24% of fixing commits are bug-inducing

4

Regression Example

Original version:

public int[] reverse(int[] origin) {
    int[] target = new int[origin.length];
    int index = 0;
    while (index < origin.length - 1) {
        index++;
        // mirrored position of origin[index]
        target[origin.length - 1 - index] = origin[index];
    }
    return target;
}

// Bug: origin[0] is never copied

After the fix:

public int[] reverse(int[] origin) {
    int[] target = new int[origin.length];
    int index = 0;
    while (index < origin.length - 1) {
        index++;
        target[origin.length - 1 - index] = origin[index];
    }
    target[origin.length - 1] = origin[0];
    return target;
}

// Regression: now crashes when the length of origin is 0
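One way to avoid both the original bug and the regression is sketched below. It assumes that an empty input should simply produce an empty output, which the slide does not state; the class name `ReverseExample` is illustrative only.

```java
import java.util.Arrays;

class ReverseExample {
    // Copies every element to its mirrored position.
    // The loop body never runs for an empty array, so no index is out of bounds.
    static int[] reverse(int[] origin) {
        int[] target = new int[origin.length];
        for (int i = 0; i < origin.length; i++) {
            target[origin.length - 1 - i] = origin[i];
        }
        return target;
    }

    public static void main(String[] args) {
        // Old test case: guards against the original bug (origin[0] missing).
        System.out.println(Arrays.toString(reverse(new int[] {1, 2, 3})));
        // New test case: guards against the regression (empty input).
        System.out.println(Arrays.toString(reverse(new int[] {})));
    }
}
```

Keeping both inputs in the suite means that a later change reintroducing either problem is caught in the next round of testing.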

5

Regression Testing

Run old test cases on the new version of software

It will cost a lot if we run the whole suite each time

Try to save time and cost in new rounds of testing:

Test prioritization

Test relevant code

Record and replay

6

Test prioritization

Rank all the test cases

Run test cases according to the ranked sequence

Stop when resources are used up

How to rank test cases:

To discover bugs sooner

Or, as an approximation, to achieve higher coverage sooner

7

APFD: Measurement of Test Prioritization

Average Percentage of Faults Detected (APFD)

Compares two test case sequences

A cumulative number of faults (bugs) has been detected after each test case

Which of the following two sequences is better?

S1: T1(2), T2(3), T3(5)

S2: T2(1), T1(3), T3(5)

APFD is the average of these numbers (each normalized by the total number of faults), with 0 for the initial state

APFD(S1) = (0/5 + 2/5 + 3/5 + 5/5) / 4 = 0.5

APFD(S2) = (0/5 + 1/5 + 3/5 + 5/5) / 4 = 0.45
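The calculation above can be sketched in a few lines. This follows the definition used on this slide (other formulations of APFD exist); the class and method names are illustrative.

```java
class ApfdExample {
    // cumulativeFaults[i] = number of distinct faults detected after the first i + 1 test cases.
    static double apfd(int[] cumulativeFaults, int totalFaults) {
        int sum = 0; // the initial state contributes 0 detected faults
        for (int detected : cumulativeFaults) {
            sum += detected;
        }
        // Average over (number of test cases + 1) points, each normalized by the total number of faults.
        return (double) sum / ((long) totalFaults * (cumulativeFaults.length + 1));
    }

    public static void main(String[] args) {
        System.out.println("APFD(S1) = " + apfd(new int[] {2, 3, 5}, 5));
        System.out.println("APFD(S2) = " + apfd(new int[] {1, 3, 5}, 5));
    }
}
```

With the two sequences from the slide, this reproduces 0.5 for S1 and 0.45 for S2, so S1 is the better ordering.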

8

APFD: Illustration

APFD can be regarded as the area under the TestCase-Fault curve

Consider t1(f1, f2), t2(f3), t3(f3), t4(f1, f2, f3, f4)

9

Coverage-based test case prioritization

Code-coverage based: requires code-coverage information recorded in previous testing

Combination-coverage based: requires an input model

Mutation-coverage based: requires recorded mutation-killing statistics

10

Total Strategy

The simplest strategy

Always select the unselected test case that has the best coverage

11

Example

Consider code coverage on five test cases:

T1: s1, s3, s5

T2: s2, s3, s4, s5

T3: s3, s4, s5

T4: s6, s7

T5: s3, s5, s8, s9, s10

Ranking: T5, T2, T1 / T3, T4
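A minimal sketch of the total strategy on this example follows. The class name and the tie-breaking rule (ties keep their input order) are assumptions; the slide only says that T1 and T3 are tied.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TotalStrategy {
    // Ranks test cases by the number of statements each one covers, largest first.
    // List.sort is stable, so tied test cases keep their original order.
    static List<String> rank(Map<String, Set<String>> coverage) {
        List<String> order = new ArrayList<>(coverage.keySet());
        order.sort(Comparator.comparingInt((String t) -> coverage.get(t).size()).reversed());
        return order;
    }

    // Coverage data from the slide.
    static Map<String, Set<String>> slideExample() {
        Map<String, Set<String>> coverage = new LinkedHashMap<>();
        coverage.put("T1", Set.of("s1", "s3", "s5"));
        coverage.put("T2", Set.of("s2", "s3", "s4", "s5"));
        coverage.put("T3", Set.of("s3", "s4", "s5"));
        coverage.put("T4", Set.of("s6", "s7"));
        coverage.put("T5", Set.of("s3", "s5", "s8", "s9", "s10"));
        return coverage;
    }

    public static void main(String[] args) {
        System.out.println(rank(slideExample()));
    }
}
```

Note that T4 comes last even though it is the only test case that reaches s6 and s7; the total strategy ignores what earlier test cases already cover.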

12

Additional Strategy

An adaptation of the total strategy

Instead of always choosing the test case with the highest coverage, choose the test case that results in the most extra coverage

Starts from the test case with the highest coverage

13

Example

Consider code coverage on five test cases:

T1: s1, s3, s5

T2: s2, s3, s4, s5

T3: s3, s4, s5

T4: s6, s7

T5: s3, s5, s8, s9, s10

Ranking: T5(5), T2(2, s2, s4) / T4(2, s6, s7), T1(1, s1), T3
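The additional strategy can be sketched as a greedy loop over the same data. The class name is illustrative, and breaking ties in favour of the earlier test case is an assumption (the slide leaves the T2 / T4 tie open).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class AdditionalStrategy {
    // Repeatedly picks the test case that adds the most not-yet-covered statements.
    static List<String> rank(Map<String, Set<String>> coverage) {
        List<String> remaining = new ArrayList<>(coverage.keySet());
        List<String> order = new ArrayList<>();
        Set<String> covered = new HashSet<>();
        while (!remaining.isEmpty()) {
            String best = null;
            int bestGain = -1;
            for (String t : remaining) {
                Set<String> extra = new HashSet<>(coverage.get(t));
                extra.removeAll(covered);
                if (extra.size() > bestGain) { // strict '>' keeps the earlier test case on ties
                    best = t;
                    bestGain = extra.size();
                }
            }
            order.add(best);
            remaining.remove(best);
            covered.addAll(coverage.get(best));
        }
        return order;
    }

    // Coverage data from the slide.
    static Map<String, Set<String>> slideExample() {
        Map<String, Set<String>> coverage = new LinkedHashMap<>();
        coverage.put("T1", Set.of("s1", "s3", "s5"));
        coverage.put("T2", Set.of("s2", "s3", "s4", "s5"));
        coverage.put("T3", Set.of("s3", "s4", "s5"));
        coverage.put("T4", Set.of("s6", "s7"));
        coverage.put("T5", Set.of("s3", "s5", "s8", "s9", "s10"));
        return coverage;
    }

    public static void main(String[] args) {
        System.out.println(rank(slideExample()));
    }
}
```

On the slide's data this yields T5, T2, T4, T1, T3: T3 drops to the end because everything it covers has already been covered by T5 and T2.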

14

Combination-coverage based prioritization

Use combination coverage instead of code coverage

Total strategy does not work for combination coverage, why?

Use the additional strategy (for n-wise combinations)

Example input model: (coke, sprite), (icy, normal), (receipt, not)

Test cases: {coke, icy, not}, {coke, normal, not}, {sprite, icy, receipt}, {sprite, normal, receipt}

Ranking for 2-wise prioritization: {coke, icy, not} (+3), {sprite, icy, receipt} (+3), {coke, normal, not} (+2), {sprite, normal, receipt} (+2)
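The 2-wise ranking above can be reproduced with the additional strategy applied to value pairs instead of statements. This is a sketch: the class name, the string encoding of pairs, and tie-breaking by input order are all assumptions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PairwisePrioritization {
    // All 2-wise value combinations of one test case, tagged with their parameter positions.
    static Set<String> pairs(String[] test) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i < test.length; i++) {
            for (int j = i + 1; j < test.length; j++) {
                result.add(i + "=" + test[i] + "," + j + "=" + test[j]);
            }
        }
        return result;
    }

    // Additional strategy: pick the test case that covers the most new pairs.
    static List<String> rank(List<String[]> tests) {
        List<String[]> remaining = new ArrayList<>(tests);
        List<String> order = new ArrayList<>();
        Set<String> covered = new HashSet<>();
        while (!remaining.isEmpty()) {
            String[] best = null;
            int bestGain = -1;
            for (String[] t : remaining) {
                Set<String> extra = pairs(t);
                extra.removeAll(covered);
                if (extra.size() > bestGain) { // earlier test case wins ties
                    best = t;
                    bestGain = extra.size();
                }
            }
            order.add(String.join(", ", best));
            remaining.remove(best);
            covered.addAll(pairs(best));
        }
        return order;
    }

    // Test cases from the slide, in the order they are listed.
    static List<String[]> slideExample() {
        List<String[]> tests = new ArrayList<>();
        tests.add(new String[] {"coke", "icy", "not"});
        tests.add(new String[] {"coke", "normal", "not"});
        tests.add(new String[] {"sprite", "icy", "receipt"});
        tests.add(new String[] {"sprite", "normal", "receipt"});
        return tests;
    }

    public static void main(String[] args) {
        rank(slideExample()).forEach(System.out::println);
    }
}
```

Every test case covers exactly three pairs, so the total strategy would see a four-way tie; only the additional gain distinguishes them.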

15

Combination-coverage based prioritization

Multi-wise coverage based prioritization

Problem: it may not be reasonable to consider combinations at only one fixed N. After {coke, icy, not}, 2-wise coverage ties (sprite, normal, receipt) and (sprite, icy, receipt), although the former covers more new values: (sprite, normal, receipt) > (sprite, icy, receipt)

Multi-wise prioritization: select the test case with the best additional 1-wise coverage. If there is a tie, go to 2-wise, and then 3-wise, …

Results: {coke, icy, not}, {sprite, normal, receipt} (1-wise + 3), {coke, normal, not} (2-wise + 2, 3-wise + 1), {sprite, icy, receipt} (2-wise + 2, 3-wise + 1)
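A sketch of multi-wise prioritization follows: the gains at 1-wise, 2-wise, ... are collected into an array and compared lexicographically. The class name, the bitmask enumeration of combinations, and tie-breaking by input order are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MultiWisePrioritization {
    // All k-wise value combinations of one test case, tagged with parameter positions.
    static Set<String> combos(String[] test, int k) {
        Set<String> result = new HashSet<>();
        for (int mask = 1; mask < (1 << test.length); mask++) {
            if (Integer.bitCount(mask) != k) continue;
            StringBuilder key = new StringBuilder();
            for (int i = 0; i < test.length; i++) {
                if ((mask & (1 << i)) != 0) key.append(i).append('=').append(test[i]).append(';');
            }
            result.add(key.toString());
        }
        return result;
    }

    // gains[k - 1] = number of k-wise combinations in the test case that are not covered yet.
    static int[] gains(String[] test, Set<String> covered) {
        int[] g = new int[test.length];
        for (int k = 1; k <= test.length; k++) {
            Set<String> extra = combos(test, k);
            extra.removeAll(covered);
            g[k - 1] = extra.size();
        }
        return g;
    }

    static List<String> rank(List<String[]> tests) {
        List<String[]> remaining = new ArrayList<>(tests);
        List<String> order = new ArrayList<>();
        Set<String> covered = new HashSet<>();
        while (!remaining.isEmpty()) {
            String[] best = null;
            int[] bestGains = null;
            for (String[] t : remaining) {
                int[] g = gains(t, covered);
                // Lexicographic comparison: 1-wise first, then 2-wise, then 3-wise, ...
                if (bestGains == null || Arrays.compare(g, bestGains) > 0) {
                    best = t;
                    bestGains = g;
                }
            }
            order.add(String.join(", ", best));
            remaining.remove(best);
            for (int k = 1; k <= best.length; k++) covered.addAll(combos(best, k));
        }
        return order;
    }

    // Test cases from the slide, in the order they are listed.
    static List<String[]> slideExample() {
        List<String[]> tests = new ArrayList<>();
        tests.add(new String[] {"coke", "icy", "not"});
        tests.add(new String[] {"coke", "normal", "not"});
        tests.add(new String[] {"sprite", "icy", "receipt"});
        tests.add(new String[] {"sprite", "normal", "receipt"});
        return tests;
    }

    public static void main(String[] args) {
        rank(slideExample()).forEach(System.out::println);
    }
}
```

On the slide's four test cases this selects {sprite, normal, receipt} second, because it contributes three new single values, whereas {sprite, icy, receipt} contributes only two.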

16

Mutation-coverage based prioritization

Similar to code coverage based prioritization

Run mutation testing for the test suite

Use killed mutants of each test case as criteria

Work for both total and additional strategy

17

Setting the threshold

Prioritization helps us find bugs earlier

Due to resource limits, we do not want to execute all test cases

The testing should stop at some point in the prioritized rank list

Resource limit: money, time

Coverage based:

Cover all / a certain percentage of statements

Cover all / a certain percentage of n-wise combinations

Cover all / a certain percentage of mutants
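A coverage-based threshold can be sketched as "walk down the ranked list until the target is reached". The sketch below reuses the statement-coverage data from the earlier prioritization example; the total of 10 statements, the 80% target, and all names are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CoverageThreshold {
    // Returns the prefix of the ranked list needed to cover the target fraction of statements.
    static List<String> selectUntil(List<String> ranked, Map<String, Set<String>> coverage,
                                    int totalStatements, double targetFraction) {
        List<String> selected = new ArrayList<>();
        Set<String> covered = new HashSet<>();
        for (String t : ranked) {
            if (covered.size() >= targetFraction * totalStatements) break; // threshold reached
            selected.add(t);
            covered.addAll(coverage.get(t));
        }
        return selected;
    }

    static Map<String, Set<String>> exampleCoverage() {
        return Map.of(
            "T1", Set.of("s1", "s3", "s5"),
            "T2", Set.of("s2", "s3", "s4", "s5"),
            "T3", Set.of("s3", "s4", "s5"),
            "T4", Set.of("s6", "s7"),
            "T5", Set.of("s3", "s5", "s8", "s9", "s10"));
    }

    public static void main(String[] args) {
        // Ranked list as produced by the additional strategy on the same data.
        List<String> ranked = List.of("T5", "T2", "T4", "T1", "T3");
        System.out.println(selectUntil(ranked, exampleCoverage(), 10, 0.8));
    }
}
```

The same loop works for n-wise combinations or killed mutants by swapping what the sets contain.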

18

Test Relevant Code

Basic idea: only use test cases that cover the changed code

Can be combined with test prioritization

Give more priority to the test cases that cover more code affected by the change

Determine the affected code with program slicing

19

Which test case is better?

Consider the following change and test cases

void main() {
    int sum, i;
    sum = 0; -> sum = 1;
    i = read;
    if (i >= 12) {
        String rep = report(invalid, i);
        sendReport(rep);
    } else {
        while (i < 11) {
            sum = add(sum, i);
            i = add(i, 1);
        }
    }
}

Test case: 0

Test case: 13

Test case 0 is better because it covers more code in the forward slice

20

Program slicing

Observation: the more code affected by a change a test case covers, the more likely the result of the test case is to change

Only test the parts that are related to the revision

Program slicing: locating all parts of the code base that are affected by the value of a variable

21

Program slicing

Forward slice of variable v at statement s: all the code that is control- or data-dependent on v at statement s

Backward slice of variable v at statement s: all the code that v at statement s depends on (through either control or data dependencies)

22

Data Dependencies

A data dependency is the dependency of a use of a variable on the definition of that variable

Example:

s1: x = 3;
s2: if (y > 5) {
s3:     y = y + x; // data-dependent on x in s1
s4: }

23

Control Dependencies

A control dependency is the dependency of the basic blocks in a branch on the branch predicate

Example:

s1: x = 3;
s2: if (y > 5) {
s3:     y = y + x; // control-dependent on the predicate over y in s2
s4: }
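Once data and control dependencies are stored as edges, forward and backward slices reduce to graph reachability. The sketch below works at statement granularity on the s1..s3 example and includes the starting statement in its own slice; both choices, and all names, are simplifying assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SliceExample {
    // dependents.get(a) = statements that directly depend (by data or control) on statement a.
    static Set<String> forwardSlice(Map<String, List<String>> dependents, String start) {
        Set<String> slice = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(start);
        while (!work.isEmpty()) {
            String s = work.pop();
            if (slice.add(s)) { // first visit: follow its outgoing dependence edges
                for (String d : dependents.getOrDefault(s, List.of())) work.push(d);
            }
        }
        return slice;
    }

    // Backward slice: reverse every edge, then reuse the forward traversal.
    static Set<String> backwardSlice(Map<String, List<String>> dependents, String start) {
        Map<String, List<String>> reversed = new HashMap<>();
        for (Map.Entry<String, List<String>> e : dependents.entrySet()) {
            for (String d : e.getValue()) {
                reversed.computeIfAbsent(d, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        return forwardSlice(reversed, start);
    }

    static Map<String, List<String>> slideExample() {
        Map<String, List<String>> dependents = new HashMap<>();
        dependents.put("s1", List.of("s3")); // data: s3 uses x, which is defined in s1
        dependents.put("s2", List.of("s3")); // control: s3 runs only if the predicate in s2 holds
        return dependents;
    }

    public static void main(String[] args) {
        System.out.println("forward slice from s1:  " + forwardSlice(slideExample(), "s1"));
        System.out.println("backward slice from s3: " + backwardSlice(slideExample(), "s3"));
    }
}
```

A real slicer tracks individual variables and call contexts; this sketch only shows the traversal.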

24

Example: call-site -> actual arguments

void main() {
    int sum, i;
    sum = 0;
    i = 1;
    while (i < 11) {
        sum = add(sum, i);
        i = add(i, 1);
    }
}

[Figure: dependence graph of main. Nodes: entry: main; expression: sum=0; expression: i=1; control-point: while i<11; call-site: add$0 (actual-in: sum$0, actual-in: i$0, actual-out: add$0); expression: sum=add$0; call-site: add$1 (actual-in: i$1, actual-in: 1, actual-out: add$1); expression: i=add$1]

25

Example: program slicing

static int add(int a, int b) {
    return a + b;
}

[Figure: dependence graph of add. Nodes: entry: add; formal-in: a; formal-in: b; expression: add$result=a+b; formal-out: add$result]

26

Example: Inter-Procedure

sum = add(sum, i);
i = add(i, 1);

static int add(int a, int b) {
    return a + b;
}

[Figure: the two call sites connected to entry: add. Nodes: call-site: add$0 (actual-in: sum$0, actual-in: i$0, actual-out: add$0); call-site: add$1 (actual-in: i$1, actual-in: 1, actual-out: add$1); entry: add]

27

Example: Full dependence graph

[Figure: full dependence graph combining main and add. Nodes: entry: main; expression: sum=0; expression: i=1; two call sites of add with actual-in: sum$0, i$0, i$1, 1 and actual-out: add$0, add$1; expression: i=add$1; entry: add]

28

Program slicing for sum = 0 -> sum = 1

[Figure: the full dependence graph of main and add again, annotated with “???”]

29

Context Sensitivity

A property that describes whether an analysis is sensitive to the method-invocation context

The actual-in / actual-out of the same method invocation should match each other

The actual-in / actual-out of different invocations should not

How to do this: bracket matching

Consider the actual-in / actual-out of $0 to be ‘(’ and ‘)’

Consider the actual-in / actual-out of $1 to be ‘{’ and ‘}’

A real path should have all brackets matched
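The bracket check can be sketched with a stack. The path is represented here as the string of bracket labels met along it, which is an assumption made for illustration, and the check follows the slide's simplified rule that every bracket must be matched.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ContextSensitivity {
    // '(' and ')' label the actual-in / actual-out edges of call site $0,
    // '{' and '}' label those of call site $1.
    static boolean isRealPath(String brackets) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : brackets.toCharArray()) {
            if (c == '(' || c == '{') {
                stack.push(c); // entering the callee through this call site
            } else if (c == ')' || c == '}') {
                if (stack.isEmpty()) return false;
                char open = stack.pop();
                // Returning to a different call site than the one entered is not a real path.
                if ((c == ')' && open != '(') || (c == '}' && open != '{')) return false;
            }
        }
        return stack.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isRealPath("()")); // enter via $0, return to $0
        System.out.println(isRealPath("(}")); // enter via $0, return to $1
    }
}
```

A path that enters add through call site $0 and leaves through call site $1 is rejected, which keeps the change to sum from leaking into i = add(i, 1).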

30

Program slicing for sum = 0 -> sum = 1

[Figure: the full dependence graph of main and add, with bracket labels “({” and “})” on the edges around entry: add]

31

Program Slicing based Test Selection

Retrieve the forward slice of the changed code

Select test cases that will cover more statements in the forward slice

void main() {
    int sum, i;
    sum = 0; -> sum = 1;
    i = read;
    if (i >= 12) {
        String rep = report(invalid, i);
        sendReport(rep);
    }
    while (i < 11) {
        sum = add(sum, i);
        i = add(i, 1);
    }
}

Test case: 0

Test case: 13

Test case 0 is better because it covers more code in the forward slice
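The selection step can be sketched as ranking test cases by how many forward-slice statements they execute. The statement labels, the hand-derived slice {sum = 1, sum = add(sum, i)}, and the hand-derived coverage sets for inputs 0 and 13 are assumptions read off the code above, not tool output.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SliceBasedSelection {
    // Number of statements that a test case executes inside the forward slice.
    static int overlap(Set<String> covered, Set<String> slice) {
        Set<String> common = new HashSet<>(covered);
        common.retainAll(slice);
        return common.size();
    }

    // Test cases that cover more of the forward slice come first.
    static List<String> rank(Map<String, Set<String>> coverage, Set<String> forwardSlice) {
        List<String> order = new ArrayList<>(coverage.keySet());
        order.sort(Comparator.comparingInt((String t) -> overlap(coverage.get(t), forwardSlice)).reversed());
        return order;
    }

    // Forward slice of the change sum = 0 -> sum = 1 (derived by hand).
    static Set<String> exampleSlice() {
        return Set.of("sum=1", "sum=add(sum,i)");
    }

    // Statements executed for each input (derived by hand from the code).
    static Map<String, Set<String>> exampleCoverage() {
        Map<String, Set<String>> coverage = new LinkedHashMap<>();
        coverage.put("input 13", Set.of("sum=1", "i=read", "if", "report", "sendReport", "while"));
        coverage.put("input 0", Set.of("sum=1", "i=read", "if", "while", "sum=add(sum,i)", "i=add(i,1)"));
        return coverage;
    }

    public static void main(String[] args) {
        System.out.println(rank(exampleCoverage(), exampleSlice()));
    }
}
```

Input 0 executes both slice statements while input 13 executes only the changed assignment, so input 0 is ranked first.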

32

Record and Replay

A waste of resources in regression testing:

We change the code only a little

Yet we need to run all the unchanged code during test execution

Record and replay, for all / some of the unchanged modules:

Do not run the modules

Use the results of previous tests instead

33

Record and Replay

Example: testing an expert system for finance

It has two components: a UI and an interest calculator (based on the inputs from the UI)

In the first round of testing, store the results of the interest calculator as a map: (a, b) -> 5%, (a, c) -> 10%, (d, e) -> 7.7%

In regression testing, if the change is made to the UI, you can rerun the software with the data map

Recording more objects saves more time in regression testing. Should we record every object?
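A minimal sketch of the data-map idea follows: a wrapper that records the calculator's results in the first round and answers from the map in later rounds. The class name, the two-string key, and the `startReplay` switch are hypothetical; a real setup would also persist the map between test runs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

class RecordReplay {
    private final Map<String, Double> recorded = new HashMap<>();
    private final BiFunction<String, String, Double> realModule;
    private boolean replayOnly = false;

    RecordReplay(BiFunction<String, String, Double> realModule) {
        this.realModule = realModule;
    }

    // Recording round: run the real module and store its result.
    // Replay round: answer from the stored map without running the module.
    double interest(String x, String y) {
        String key = x + "," + y;
        if (replayOnly) {
            Double stored = recorded.get(key);
            if (stored == null) {
                throw new IllegalStateException("no recording for (" + key + ")");
            }
            return stored;
        }
        double result = realModule.apply(x, y);
        recorded.put(key, result);
        return result;
    }

    void startReplay() {
        replayOnly = true;
    }

    public static void main(String[] args) {
        // Stand-in for the expensive interest calculator.
        RecordReplay calculator = new RecordReplay((x, y) -> 5.0);
        calculator.interest("a", "b");  // first round: recorded
        calculator.startReplay();
        System.out.println(calculator.interest("a", "b")); // regression round: replayed
    }
}
```

Failing loudly on an unrecorded input, rather than silently calling the real module, makes it visible when a UI change starts sending inputs that the first round never exercised.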

34

Pros & Cons

Pros: saves time in regression testing

Cons:

Be careful when recording non-deterministic components, e.g., a recorded getSystemTime() may conflict with another call

Recording data maps can take a lot of time

The stored data map can become very large

When the recorded module is changed, the data map must be updated

35

Selection of recorded modules

Rules:

Record time-consuming modules, so that you save more time

The recorded module should be stable, e.g., libraries

The interface should carry only a small data flow, e.g., numeric inputs and return values

36

Selection of recording modules

Recording UI Components

Recording Internet Components

Recording components that affect the real world:

Sending an email

Transferring money from credit cards

37

Review of Regression Testing

Test prioritization: try only the most important test cases

Test relevant code: try the most relevant test cases

Record and replay: reuse the execution results of previous test cases
