Testing, the necessity
• Cost of software failure
• Effort needed to locate and fix bugs
A lot, in particular if they occur in the later stages of your development: 0.5 – 3.5 hours at the “unit” level; 6 – 10 × more at the “system” level.
Test early, and test often!
Terminology
• Test object = the thing you test. Also called SUT (system under test), CUT (class under test), …
• Each “test” is also called a “test-case”: a single interaction, or a sequence of interactions, with the test object, where we check whether along these interactions the test object satisfies one or more expectations. (You will have to provide inputs for those interactions!)
• A test-suite = a set of test-cases.
Typical setups

(Diagram: either the Test Suite <<use>>s the Test Object directly, or the Test Suite <<use>>s a Test Interface in front of the Test Object. If the TO can be directly interacted with by the TS, the first setup suffices; else we’ll have to use a TI.)
Example: testing the function sqrt(x:double) : double
test1() { r = sqrt(4) ; assert r == 2 }
test2() { r = sqrt(0) ; assert r == 0 }

These could be our test-cases, each implementing one of our expectations.

(Actually too strong when dealing with non-integral numbers. Weaken the expectations to allow a small degree of inaccuracy, e.g. assert |r – 2| < epsilon, for some pre-specified, small epsilon.)
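In Java, the weakened expectation looks like this; a minimal sketch using plain assert statements (the EPSILON name and its value are our own choice, not from the slides):

```java
public class SqrtTest {
    // Pre-specified, small allowed inaccuracy for double comparisons.
    static final double EPSILON = 1e-9;

    static void test1() {
        double r = Math.sqrt(4);
        // Weakened expectation: |r - 2| < epsilon instead of r == 2.
        assert Math.abs(r - 2) < EPSILON;
    }

    static void test2() {
        double r = Math.sqrt(0);
        assert Math.abs(r - 0) < EPSILON;
    }

    public static void main(String[] args) {
        test1();
        test2();
        System.out.println("both test-cases pass");
    }
}
```

Run with assertions enabled (java -ea SqrtTest); a failed expectation then aborts the test-case.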
Testing objects
• Is a bit different, because an object has operations that may influence each other through the object’s state.
• An object is interacted with through a sequence of calls to its operations.
• So it is natural to use such a sequence as a test-case to test an object.
• What do you want to verify in your test-case?
  – Post-conditions of the operations of Person
  – Person’s class invariant
(Class diagram: Person, with attribute − credit : int and operations + buy(product), + getMoreCredit(e : Euro).)
Testing objects is a bit different
class MyTestClass {
  test1() {
    x = new Person(“Bob”)
    x.getMoreCredit(10)
    c0 = x.credit
    x.buy(apple)
    assert x.credit == c0 – apple.price()
    assert x.credit >= 0
  }

  test2() { … }
}
Issue-1: you may have to test with respect to different interaction sequences, e.g. how about
  moreCredit ; buy ; moreCredit
  moreCredit ; moreCredit ; buy
Issue-2a: subclassing. We need to test that subclasses of Person do not violate Liskov’s substitution principle.
Issue-2b: we can’t be sure that e.g. every subclass of Product respects Liskov; this may break your class Person. We can still test Person by calling buy with instances of subclasses of Product; unfortunately this explodes the cost.
Well, while we are talking about testing… writing a test-class using JUnit
import org.junit.* ;
import static org.junit.Assert.* ;

public class PersonTest {

  @Test
  public void test0() {
    System.out.print(“** testing if the initial credit is ok…”) ;
    Person P = new Person(“Bob”) ;
    assertTrue(P.getCredit() == 0) ;
    System.out.println(“pass”) ;
  }

  @Test
  public void test2() {
    ….
    Person P = new Person(“Bob”) ;
    P.getMoreCredit(10) ;
    Product a = new Apple() ;
    P.buy(a) ;
    assertTrue(P.getCredit() == 98) ;
    …
  }
}
How to determine which test-cases to give?
• We can try to just choose the inputs randomly.
  Not very systematic; too expensive if each test-case has to be hand-crafted (which is usually the case in practice).
• Idea: to systematically test, divide the “input domain” of the SUT into (disjoint) “partitions”.
  Hypothesis: the SUT behaves “equivalently” on inputs from the same partition. Therefore it is sufficient to cover each partition once.
• This is also called partition-based testing. Easy to do, quite effective in practice.
Example

isAdult(person) { … }

Propose these partitions:
• persons older than 17 yrs
• persons 17 yrs or younger
• invalid persons

Test cases, e.g.:
tc1 : test with 1x person of age 40y
tc2 : test with 1x person of age 4y
tc3 : test with person = null
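The three partitions above can each be covered by one test-case; a sketch in Java (the Person class and the exact treatment of invalid persons are hypothetical, chosen here for illustration):

```java
public class IsAdultTest {
    // Hypothetical Person: only the age matters for this SUT.
    static class Person {
        final int age;
        Person(int age) { this.age = age; }
    }

    // SUT: persons older than 17 yrs are adults; null is an invalid person.
    static boolean isAdult(Person p) {
        if (p == null) throw new IllegalArgumentException("invalid person");
        return p.age > 17;
    }

    public static void main(String[] args) {
        assert isAdult(new Person(40));    // tc1: older than 17 yrs
        assert !isAdult(new Person(4));    // tc2: 17 yrs or younger
        boolean rejected = false;          // tc3: invalid person
        try { isAdult(null); }
        catch (IllegalArgumentException e) { rejected = true; }
        assert rejected;
        System.out.println("one test-case per partition: all pass");
    }
}
```

One input per partition suffices under the equivalence hypothesis; adding more inputs from the same partition would, by hypothesis, not reveal new faults.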
Using a “classification tree”: hierarchical partitioning
• When you divide a partition P into sub-partitions X, Y, Z, …, the sub-partitions must be disjoint.
• You can now, for example, require that each lowest-level partition (called a “class”) of each category is covered at least once.
(Classification tree for Student.addGrade(grade). This operation actually has two parameters: the student that receives the operation, and the grade you pass to it. A top-level node such as Student or Grade is called a “category”; a lowest-level node, such as < 5.0, is called a “class”.
• Student: Invalid, Bachelor, Master
• Grade: Invalid (< 0, > 10), NotSufficient (< 5.0, 5.0 ≤ c < 5.5), Sufficient)
CTE: a graphical tool to manually specify the combinations you want.

(Combination table: the rows are the classes of the categories Student (Invalid, Bachelor, Master) and Grade (Invalid: < 0, > 10; Insufficient: < 5.0, 5.0 ≤ c < 5.5; Sufficient: 5.5 ≤ c ≤ 6.0, > 6.0); the columns TC1, TC2, TC3, … mark which classes each test-case covers. These test-cases would fully cover the combinations of Invalid student and Invalid grade; the rest are minimally covered.)
Combinatoric testing
• We can try to test all possible combinations of the Student and Grade classes.
• This is called the “full combinations set”. It can generate a lot of test-cases:
  N = #Student × #Grade
  where #C is the number of “classes” in the category C.
• This explodes if you have more categories (imagine hundreds of thousands of test-cases!)
Combinatoric testing
• We can instead try the “minimal combinations set”: the smallest set that contains each class at least once.
• It generates few test-cases:
  N = max(#Student, #Grade)
• It could be too few!
• Alternatively, in some approaches you can declaratively specify the combinations you want, e.g. something like
  (Student.Invalid /\ Grade.Invalid) , (Master /\ NotSufficient)
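The minimal combinations set can be computed mechanically; a sketch (pairing the classes cyclically is one possible strategy, not prescribed by the slides):

```java
import java.util.*;

public class MinimalCombinations {
    // Build a minimal combinations set over two categories: each class
    // occurs at least once, using max(#A, #B) test-cases by pairing the
    // classes cyclically.
    static List<String[]> minimalSet(List<String> catA, List<String> catB) {
        int n = Math.max(catA.size(), catB.size());
        List<String[]> testCases = new ArrayList<>();
        for (int i = 0; i < n; i++)
            testCases.add(new String[] {
                catA.get(i % catA.size()),
                catB.get(i % catB.size())
            });
        return testCases;
    }

    public static void main(String[] args) {
        List<String> student = List.of("Invalid", "Bachelor", "Master");
        List<String> grade   = List.of("Invalid", "NotSufficient", "Sufficient");
        // max(3, 3) = 3 test-cases, instead of 3 x 3 = 9 for the full set.
        for (String[] tc : minimalSet(student, grade))
            System.out.println(tc[0] + " x " + tc[1]);
    }
}
```

For the Student/Grade example this yields 3 abstract test-cases, versus 9 for the full combinations set.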
Boundary testing
• Hypothesis: faults are often made at the boundaries of your input domains. Therefore, also test boundary values.
• More thorough… but this can explode your number of combinations.
(Example with the Grade partitions < 5.0 and 5.0 ≤ c < 5.5. Without specifically trying to pick boundary values, the concrete test inputs might be 0.0, 4.5, 4.99 and 5.0, 5.2, 5.49. Adding boundaries means deliberately including, per partition, its lower bound (LB), a middle value (M), and its upper bound (HB).)
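Picking boundary values per partition can also be mechanized; a small sketch (the 0.01 grade granularity is an assumption):

```java
import java.util.*;

public class BoundaryValues {
    // For a half-open grade partition [lo, hi), return boundary-aware test
    // inputs: the lower bound (LB), a middle value (M), and the largest
    // admissible value just below the upper bound (HB).
    static double[] boundaryInputs(double lo, double hi) {
        double step = 0.01;   // assumed grade granularity
        return new double[] { lo, (lo + hi) / 2, hi - step };
    }

    public static void main(String[] args) {
        // The partition 5.0 <= c < 5.5 from the slide:
        System.out.println(Arrays.toString(boundaryInputs(5.0, 5.5)));
    }
}
```

Applying this to every class of every category multiplies the number of concrete inputs per partition by three, which is how boundary testing inflates the combinations.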
Positive and negative tests
• Positive test: test the test object against valid inputs.
• Negative test: test it against invalid inputs. This is usually done to check the test object’s error-handling mechanism.
  It may not be relevant when testing unit-level functions or classes.
  You must do it when testing a system, e.g. to make sure that it does not crash on invalid inputs.
Using concrete values as expectations
test2() {
  Person P = new Person(“Bob”) ;
  P.getMoreCredit(10) ;
  Product a = new Apple() ;
  P.buy(a) ;
  assertTrue(P.getCredit() == 98) ;
}
Comparisons to concrete values are often used to express test expectations, but this has drawbacks:
• You have to calculate them by hand, and they can be non-trivial to calculate.
• If the business logic changes, you have to recalculate them → a maintenance issue.
Property-based testing, more robust
test() {
  p = new Person(“BOB”)
  e = new Email(“BOB@BOB.NET”)
  p.addEmail(e)
  assert p.getEmails().contains(e)
  assert e.getOwner() == p
}

(Class diagram: Person has an operation addEmail(email); a Person “has” * emails : Email, and an Email has 0..1 owner : Person.)

prop_personEmail(p,e) {
  assert p.getEmails().contains(e)
  assert e.getOwner() == p
}

prop_personEmail(p,e)

But sometimes your expectation can be generalized to a “property”, which is parametric. Properties are much more robust for maintenance. Furthermore, you can now use generators to generate your test inputs, since you will re-use the same properties as expectations.
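A runnable sketch of the property with a random generator (the Person/Email implementation below is hypothetical, written only so the property has something to run against):

```java
import java.util.*;

public class PersonEmailProperty {
    // Hypothetical minimal Person/Email pair satisfying the association:
    static class Person {
        final String name;
        final List<Email> emails = new ArrayList<>();
        Person(String name) { this.name = name; }
        void addEmail(Email e) { emails.add(e); e.owner = this; }
        List<Email> getEmails() { return emails; }
    }
    static class Email {
        final String address;
        Person owner;
        Email(String address) { this.address = address; }
        Person getOwner() { return owner; }
    }

    // The property: parametric in p and e, reusable for any generated input.
    static void prop_personEmail(Person p, Email e) {
        p.addEmail(e);
        assert p.getEmails().contains(e);
        assert e.getOwner() == p;
    }

    public static void main(String[] args) {
        // Generator: random test inputs, all checked against the same property.
        Random rnd = new Random(42);
        for (int i = 0; i < 100; i++) {
            Person p = new Person("p" + rnd.nextInt(1000));
            Email e = new Email("u" + rnd.nextInt(1000) + "@example.org");
            prop_personEmail(p, e);
        }
        System.out.println("property holds on 100 generated inputs");
    }
}
```

Note how the hand-calculated value (the 98 of the earlier test) disappears: the property states the expectation once, and the generator supplies as many inputs as you like.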
Your OCL specs can be converted to “test properties”
context p : Person
inv : p.emails->forAll(e | e.owner = p)

context p : Person :: addEmail(e : Email)
pre : e <> null
post : p.emails->includes(e)

Person_classinv(p) {
  for (Email e : p.getEmails()) assert e.getOwner() == p
}

addMail_spec(p,e) {
  if (e == null) throw IllegalArg
  r = p.addEmail(e)
  assert p.getEmails().contains(e)
  return r
}

A test case now looks like this:

test0() {
  p = …
  e = …
  addMail_spec(p,e)
  Person_classinv(p)
}
22
Concrete expectations vs property-based

Property-based:
+ Makes automated testing much easier.
+ Properties are more robust.
− The completeness of your property set determines the strength of your test; but it is not always easy to write a complete set of properties.

Concrete expectations:
− Cannot be automated.
− Not robust.
+ You don’t need to formalize nor implement any specification.
Coverage
• Because your “resources” are limited, you can’t test all possible behavior → we have to decide when it is enough.
• A pragmatic approach is to define a quantitative goal, like:
  – Every method in every class C must be exercised.
  – Every line in every method of a class C must be exercised.
  – Every partition in the classification tree of method m must be tried.
• Coverage: how much (in %) of such a goal is accomplished.
• Too little coverage implies you are not done yet.
• Full coverage (100%) gives you grounds to stop testing; but it does not imply correctness.
Code-based coverage
• Line coverage (previous slide)
• Decision coverage: all decision branches are exercised
• Path coverage: all possible “execution paths” are exercised

P(x,y) {
  if (even(x)) x = x/2 else x--
  if (even(y)) y = y/2 else y--
  return x + y
}

(Abstractly, P’s control flow has two decision points, even(x) and even(y), each with two decision branches.)
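The P(x,y) example can be turned into a concrete decision-coverage demonstration; a sketch (even(x) is written as x % 2 == 0):

```java
public class DecisionCoverage {
    // The example SUT from the slide.
    static int P(int x, int y) {
        if (x % 2 == 0) x = x / 2; else x--;
        if (y % 2 == 0) y = y / 2; else y--;
        return x + y;
    }

    public static void main(String[] args) {
        // Two test-cases suffice for full decision coverage: together they
        // exercise both branches of both decision points.
        assert P(4, 3) == 4;   // even(x) true,  even(y) false: 2 + 2
        assert P(3, 4) == 4;   // even(x) false, even(y) true:  2 + 2
        // Full path coverage needs all 4 branch combinations, hence 4 tests.
        System.out.println("decision coverage achieved with 2 tests");
    }
}
```

The same two inputs would already give full line coverage; one input alone cannot, since each input takes only one branch per decision point.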
Coverage strength
• Coverage criteria differ in “strength”.
• Criterion A is stronger than criterion B = full coverage w.r.t. A implies full coverage w.r.t. B.
• Path coverage is stronger than decision coverage; and decision coverage is stronger than line coverage.
• But a stronger criterion typically means you need more test-cases → cost.
For the previous example:

Cov. criterion     | Min #test-cases
Full line cov.     | 1
Full decision cov. | 2
Full path cov.     | 4
Path-based coverage
• Strong, but unfortunately the number of paths to cover can explode exponentially, or even be infinite if you have a loop or recursion.
• A more practical solution: pair-wise path coverage. The goal is to cover all execution subpaths of length 2.
• Example:

(Graph with nodes 0, 1, 2, 3 and edges a, b, c, d, e.)

Subpaths of length 2: ae, bc, bd, ce, da, db
Pair-wise path vs decision coverage

(Same graph as before; subpaths of length 2: ae, bc, bd, ce, da, db.)

2 test-cases can give full decision coverage (the test-cases giving the executions marked with the colored arrows in the figure). But with those test-cases we still miss the subpath db: full pair-wise path coverage requires 3 test-cases, e.g. the executions
  bce   bdae   bdbce
Pair-wise path coverage can be generalized to k-wise path coverage; stronger as k gets bigger.
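Whether a set of test-cases achieves pair-wise path coverage can be checked mechanically; a sketch over the example graph:

```java
import java.util.*;

public class PairwiseCoverage {
    // The six length-2 subpaths of the example graph (edge names a..e).
    static final Set<String> REQUIRED =
        new HashSet<>(List.of("ae", "bc", "bd", "ce", "da", "db"));

    // Collect the length-2 subpaths exercised by a set of executions,
    // where each execution is written as a string of edge names.
    static Set<String> covered(List<String> executions) {
        Set<String> pairs = new HashSet<>();
        for (String path : executions)
            for (int i = 0; i + 1 < path.length(); i++)
                pairs.add(path.substring(i, i + 2));
        return pairs;
    }

    public static void main(String[] args) {
        // The 3 test-cases from the slide cover all six pairs:
        List<String> tests = List.of("bce", "bdae", "bdbce");
        assert covered(tests).containsAll(REQUIRED);
        // With only the first two, the subpath db is still missed:
        assert !covered(List.of("bce", "bdae")).contains("db");
        System.out.println("pair-wise path coverage: full with 3 tests");
    }
}
```

The same sliding-window idea generalizes to k-wise coverage by taking substrings of length k instead of 2.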
Testing your candy machine
• In this case, the sequence in which we call the operations matters a lot.
• Most sequences are invalid; how do we make sure that we sufficiently cover the valid sequences?
• Use your “state machine model” as guidance.

(Class diagram: CandyMachine, with operations insertCoin(c), turnCrank(), getCandy().)
The model of our candy machine

(State machine. States: “no coin”, “has coin”, “sold”, “bonus”, “empty”. Transitions: insert coin [1eur] takes “no coin” to “has coin”, [else] / [coin is ejected]; turn crank / [counter is updated] takes “has coin” to “sold”, or with [n>1] to “bonus”; get candy leaves “sold” or “bonus” for “no coin” if [not empty], or for “empty” if [empty].)
Model-based testing
• The model specifies which paths are valid. Use it to guide the choice of which paths to use as test-cases.
• You can even automatically generate the test-cases (without the expectations)!
• Expectations: what do we check?
• Which coverage criterion do you take? E.g.:
  – k-wise path coverage
  – all paths from the start up to depth k
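Generating test sequences from the model can be sketched as follows (the transition map is a simplification of the candy-machine model: guards, actions, and the bonus/empty behavior are omitted):

```java
import java.util.*;

public class ModelBasedGen {
    // Simplified transition relation of the candy-machine model:
    // state -> (event -> next state).
    static final Map<String, Map<String, String>> MODEL = Map.of(
        "no coin",  Map.of("insert coin", "has coin"),
        "has coin", Map.of("turn crank", "sold"),
        "sold",     Map.of("get candy", "no coin")
    );

    // All valid event sequences from 'state' of length at most k; these are
    // candidate test-cases (the expectations still have to be added).
    static List<List<String>> paths(String state, int k) {
        List<List<String>> result = new ArrayList<>();
        result.add(new ArrayList<>());            // the empty sequence
        if (k == 0) return result;
        for (Map.Entry<String, String> t
                 : MODEL.getOrDefault(state, Map.of()).entrySet())
            for (List<String> rest : paths(t.getValue(), k - 1)) {
                List<String> p = new ArrayList<>();
                p.add(t.getKey());
                p.addAll(rest);
                result.add(p);
            }
        return result;
    }

    public static void main(String[] args) {
        // "All paths from the start up to depth k" with k = 3:
        for (List<String> p : paths("no coin", 3))
            System.out.println(p);
    }
}
```

Each generated sequence is replayed against the real CandyMachine; the model’s target states and actions would supply the expectations to check along the way.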
Commonly used overall “testing strategy”: the V-model

(V-model figure: the left leg descends from requirements (requirement documents + use cases), via detailed requirements, design, and detailed design (analysis + design models), down to the implementation. Each level on the left “prepares” a test level on the right leg: requirements → acceptance testing, detailed requirements → system testing, design → integration testing, detailed design → unit testing. Unit and integration testing are development testing, done by the developers; acceptance testing is done by the project owner, or delegated to a 3rd party. The “preparations” can be done in parallel with development; the actual testing must wait until we have an implementation.)
Fitting the V in an iterative SDLC

E.g. in the Unified Process, the phases Inception, Elaboration, Construction, and Transition are divided over iterations 1, 2, 3, 4, 5, … in time. UP’s core workflows (requirements, analysis, design, implementation, test) are exercised in every iteration, so every iteration runs through its own V.
Regression testing
• During and after development (maintenance) you modify your software:
  – advancing to the next iteration
  – bug fixes, new features, refactoring
• Regression test: test that the new version does not introduce any new errors with respect to the unchanged part of the specification, by re-executing old test-cases (those that are still relevant).
• Problem: you may have accumulated a huge set of test-cases. Executing them all (TESTALL) may take a long time…
  Solution: apply a selection strategy; but you will need to invest in an infrastructure to facilitate this.
Performance testing
• Goal: to see how your application reacts to an increasing workload.
• Not to be underestimated!
• Typical setup:

(Setup diagram: one or more client machines run “virtual users” that interact with the App server, which in turn uses a DB. A “virtual user” is a program that simulates a user interacting with the App. By creating more virtual users, you increase the load on the App. You can typically run multiple virtual users from a single client machine; if you need more load, you can then add more client machines.)
Several standard forms of performance testing

(Graph of #VUs (load) against time, showing the normal load and the expected peaks:
• load test: to see the app’s response time under its typical load
• stress test: to see what maximum load the app can handle before breaking; the X marks where the app crashes)
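A minimal sketch of the virtual-user idea (the appRequest stub stands in for a real interaction with the App; a real load test would fire network requests and measure response times):

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.*;

public class LoadTestSketch {
    static final AtomicInteger handled = new AtomicInteger();

    // Stub for one interaction with the App (an assumption; a real virtual
    // user would send a request and record the response time).
    static void appRequest() { handled.incrementAndGet(); }

    // Run 'virtualUsers' concurrent users, each performing
    // 'requestsPerUser' interactions; return the total handled.
    static int runLoad(int virtualUsers, int requestsPerUser) {
        ExecutorService pool = Executors.newFixedThreadPool(virtualUsers);
        for (int u = 0; u < virtualUsers; u++)
            pool.submit(() -> {
                for (int i = 0; i < requestsPerUser; i++) appRequest();
            });
        pool.shutdown();
        try { pool.awaitTermination(10, TimeUnit.SECONDS); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return handled.get();
    }

    public static void main(String[] args) {
        // 20 virtual users x 50 requests each = a load of 1000 requests.
        System.out.println("handled " + runLoad(20, 50) + " requests");
    }
}
```

Ramping virtualUsers up over time gives a load test; increasing it until the App misbehaves gives a stress test.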
Related issues that can get in the way
• Persistence
• Concurrency
• GUI: testability, explosion
• Privacy / data security
• Bad management
Persistence
• The DB and files form an implicit part of your SUT state!
• Issues:
  – Interference: via the SUT, a test-case may have side effects on your persistent data. You need a mechanism to undo these effects before you start the next test-case.
  – Interactions with persistent storage are slow.
  – How to create a representative persistent state (for testing your SUT)?
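One way to undo a test-case’s side effects is to snapshot the persistent state before the test and restore it afterwards; a toy sketch with an in-memory store (real setups would typically use DB transactions and roll back instead):

```java
import java.util.*;

public class PersistenceReset {
    // Toy in-memory "persistent store"; real setups would use a DB.
    static Map<String, Integer> store = new HashMap<>();

    // Run a test-case in isolation: snapshot the store first, and restore
    // it afterwards so the next test-case starts from the same state.
    static void runIsolated(Runnable testCase) {
        Map<String, Integer> snapshot = new HashMap<>(store);
        try { testCase.run(); }
        finally { store = snapshot; }   // undo the test-case's side effects
    }

    public static void main(String[] args) {
        store.put("credit:Bob", 10);
        runIsolated(() -> store.put("credit:Bob", 0)); // test mutates store
        assert store.get("credit:Bob") == 10;          // effect was undone
        System.out.println("store restored between test-cases");
    }
}
```

Snapshot/restore is simple but copies the whole state per test; transactional rollback avoids the copy but requires the store to support transactions.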
Creating a representative DB
• Copy from a production DB (if you already have one):
  + no effort is needed to construct it
  – usually big, slowing down queries
  – contains real data → privacy and data security issues
• Or construct it from scratch, e.g. by applying the classification-tree approach:
(Example model: Person (OID, name) with a “buys” association to Product (OID, name, currency, price). Every attribute and relation induces categories to partition, e.g. price ≤ 1.00 vs price > 1.00; currency: Euro, USD, Other; the multiplicities of “buys”: 0, > 0 and 1, > 1. Once you have your classification tree, you can proceed with determining the combinations you want, as explained before.)
Concurrency
• Concurrent execution is timing-sensitive.
• Problems:
  – Some errors may be very difficult to trigger.
  – If you do find them, they can be hard to reproduce.

(Scenario: P grabs fork F; Q grabs fork F; P thinks…; Q thinks… Who actually gets the fork depends on the speed of P and Q. Consequently, when P causes an error upon grabbing F, there are two issues: (1) for this error to surface we must get the timing right so that P grabs the fork instead of Q; and (2) to reproduce the error we must be able to reproduce the timing.)

Ideally you need a separate infrastructure that lets you fully control the concurrency/timing; but such an infrastructure is hard to set up, and even then it won’t be able to control the concurrency of components outside your system.
Risk-based test plan
• Seems to be getting popular.
• You have limited resources → fall back to some prioritization strategy over the various modules of the SUT, which in turn determines the effort allocated to those modules.
• Typically you make calculated predictions of the risk and impact of those modules. The product, possibly weighted, determines the priority. E.g.:

Module | Risk | Impact | Priority
M      | 1    | 1      | 1
N      | 5    | 3      | 15
O      | 3    | 3      | 9

Risk of module x: your estimated chance that x fails.
Impact of x: your estimated damage when x fails.

• Issue: how do we predict risk and impact?
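The priority computation from the table is just the (unweighted) product; a tiny sketch:

```java
public class RiskPriority {
    // Priority = risk x impact (the unweighted product from the table).
    static int priority(int risk, int impact) { return risk * impact; }

    public static void main(String[] args) {
        assert priority(1, 1) == 1;   // module M
        assert priority(5, 3) == 15;  // module N
        assert priority(3, 3) == 9;   // module O
        System.out.println("priorities match the table");
    }
}
```

A weighted variant would multiply risk and impact by project-specific weights before taking the product; the ranking of modules drives the test effort allocated to each.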
An example of a discovery-handling procedure

The state machine below describes the life of a bug report, and thereby also its handling procedure. (Class Report, with operations create(), review(), ….)

(State machine, from: Foundations of Software Testing, 2008. States: reported, rejected, opened, assigned, fixed, closed, re-opened, deferred. Transitions: create → reported; review [bad report] → back for editing; review [report ok] → opened; reject → rejected; approve for fixing → assigned (or deferred); fix → fixed; test [fail] → re-opened; test [pass] → closed; a closed report can be re-opened when the problem returns.)