Testing, the necessity
• Cost of software failure
• Effort needed to locate and fix bugs
A lot, in particular if they occur in the later stages of your development: 0.5 – 3.5 hours at the “unit” level; 6 – 10 × more at the “system” level.
Test early, and test often!
Terminology
• Test object = the thing you test. Also called SUT (system under test), CUT (class under test), …
• Each “test” is also called a “test-case”: a single interaction, or a sequence of interactions, with the test object, where we check whether along these interactions the test object satisfies one or more expectations. (You will have to provide inputs for those interactions!)
• A test-suite = a set of test-cases.
Typical setups

(Diagram: either the Test Suite <<use>>s the Test Object directly, or the Test Suite <<use>>s a Test Interface in front of the Test Object. If the TO can be directly interacted with by the TS, the first setup suffices; else we’ll have to use a TI.)
Example: testing the function sqrt(x:double) : double
test1() { r = sqrt(4) ; assert r == 2 }
test2() { r = sqrt(0) ; assert r == 0 }

These could be our test-cases, each implementing one of our expectations.

(Actually too strong when dealing with non-integral numbers. Weaken the expectations to allow a small degree of inaccuracy, e.g. assert |r – 2| < epsilon, for some pre-specified, small epsilon.)
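In Java, the weakened expectation looks like this; a minimal sketch using plain assert statements (the EPSILON name and its value are our own choice, not from the slides):

```java
public class SqrtTest {
    // Pre-specified, small allowed inaccuracy for double comparisons.
    static final double EPSILON = 1e-9;

    static void test1() {
        double r = Math.sqrt(4);
        // Weakened expectation: |r - 2| < epsilon instead of r == 2.
        assert Math.abs(r - 2) < EPSILON;
    }

    static void test2() {
        double r = Math.sqrt(0);
        assert Math.abs(r - 0) < EPSILON;
    }

    public static void main(String[] args) {
        test1();
        test2();
        System.out.println("both test-cases pass");
    }
}
```

Run with assertions enabled (java -ea SqrtTest); a failed expectation then aborts the test-case.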
Testing objects
• Is a bit different, because an object has operations that may influence each other through the object’s state.
• An object is interacted with through a sequence of calls to its operations.
• So it is natural to use such a sequence as a test-case to test an object.
• What do you want to verify in your test-case?
  – Post-conditions of the operations of Person
  – Person’s class invariant
(Class diagram: Person, with attribute − credit : int and operations + buy(product), + getMoreCredit(e : Euro).)
Testing objects is a bit different
class MyTestClass {
  test1() {
    x = new Person(“Bob”)
    x.getMoreCredit(10)
    c0 = x.credit
    x.buy(apple)
    assert x.credit == c0 – apple.price()
    assert x.credit >= 0
  }

  test2() { … }
}
Issue-1: you may have to test with respect to different interaction sequences, e.g. how about
  moreCredit ; buy ; moreCredit
  moreCredit ; moreCredit ; buy
Issue-2a: subclassing. We need to test that subclasses of Person do not violate Liskov’s substitution principle.
Issue-2b: we can’t be sure that e.g. every subclass of Product respects Liskov; this may break your class Person. We can still test Person by calling buy with instances of subclasses of Product; unfortunately this explodes the cost.
Well, while we are talking about testing… writing a test-class using JUnit
import org.junit.* ;
import static org.junit.Assert.* ;

public class PersonTest {

  @Test
  public void test0() {
    System.out.print(“** testing if the initial credit is ok…”) ;
    Person P = new Person(“Bob”) ;
    assertTrue(P.getCredit() == 0) ;
    System.out.println(“pass”) ;
  }

  @Test
  public void test2() {
    ….
    Person P = new Person(“Bob”) ;
    P.getMoreCredit(10) ;
    Product a = new Apple() ;
    P.buy(a) ;
    assertTrue(P.getCredit() == 98) ;
    …
  }
}
How to determine which test-cases to give?
• We can try to just choose the inputs randomly.
  Not very systematic; too expensive if each test-case has to be hand-crafted (which is usually the case in practice).
• Idea: to systematically test, divide the “input domain” of the SUT into (disjoint) “partitions”.
  Hypothesis: the SUT behaves “equivalently” on inputs from the same partition. Therefore it is sufficient to cover each partition once.
• This is also called partition-based testing. Easy to do, quite effective in practice.
Example

isAdult(person) { … }

Propose these partitions:
• persons older than 17 yrs
• persons 17 yrs or younger
• invalid persons

Test cases, e.g.:
tc1 : test with 1x person of age 40y
tc2 : test with 1x person of age 4y
tc3 : test with person = null
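The three partitions above can each be covered by one test-case; a sketch in Java (the Person class and the exact treatment of invalid persons are hypothetical, chosen here for illustration):

```java
public class IsAdultTest {
    // Hypothetical Person: only the age matters for this SUT.
    static class Person {
        final int age;
        Person(int age) { this.age = age; }
    }

    // SUT: persons older than 17 yrs are adults; null is an invalid person.
    static boolean isAdult(Person p) {
        if (p == null) throw new IllegalArgumentException("invalid person");
        return p.age > 17;
    }

    public static void main(String[] args) {
        assert isAdult(new Person(40));    // tc1: older than 17 yrs
        assert !isAdult(new Person(4));    // tc2: 17 yrs or younger
        boolean rejected = false;          // tc3: invalid person
        try { isAdult(null); }
        catch (IllegalArgumentException e) { rejected = true; }
        assert rejected;
        System.out.println("one test-case per partition: all pass");
    }
}
```

One input per partition suffices under the equivalence hypothesis; adding more inputs from the same partition would, by hypothesis, not reveal new faults.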
Using a “classification tree”: hierarchical partitioning
• When you divide a partition P into sub-partitions X, Y, Z, …, the sub-partitions must be disjoint.
• You can now, for example, require that each lowest-level partition (called a “class”) of each category is covered at least once.
(Classification tree for Student.addGrade(grade). This operation actually has two parameters: the student that receives the operation, and the grade you pass to it. A top-level node such as Student or Grade is called a “category”; a lowest-level node, such as < 5.0, is called a “class”.
• Student: Invalid, Bachelor, Master
• Grade: Invalid (< 0, > 10), NotSufficient (< 5.0, 5.0 ≤ c < 5.5), Sufficient)
CTE: a graphical tool to manually specify the combinations you want.

(Combination table: the rows are the classes of the categories Student (Invalid, Bachelor, Master) and Grade (Invalid: < 0, > 10; Insufficient: < 5.0, 5.0 ≤ c < 5.5; Sufficient: 5.5 ≤ c ≤ 6.0, > 6.0); the columns TC1, TC2, TC3, … mark which classes each test-case covers. These test-cases would fully cover the combinations of Invalid student and Invalid grade; the rest are minimally covered.)
Combinatoric testing
• We can try to test all possible combinations of the Student and Grade classes.
• This is called the “full combinations set”. It can generate a lot of test-cases:
  N = #Student × #Grade
  where #C is the number of “classes” in the category C.
• This explodes if you have more categories (imagine hundreds of thousands of test-cases!)
Combinatoric testing
• We can instead try the “minimal combinations set”: the smallest set that contains each class at least once.
• It generates few test-cases:
  N = max(#Student, #Grade)
• It could be too few!
• Alternatively, in some approaches you can declaratively specify the combinations you want, e.g. something like
  (Student.Invalid /\ Grade.Invalid) , (Master /\ NotSufficient)
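The minimal combinations set can be computed mechanically; a sketch (pairing the classes cyclically is one possible strategy, not prescribed by the slides):

```java
import java.util.*;

public class MinimalCombinations {
    // Build a minimal combinations set over two categories: each class
    // occurs at least once, using max(#A, #B) test-cases by pairing the
    // classes cyclically.
    static List<String[]> minimalSet(List<String> catA, List<String> catB) {
        int n = Math.max(catA.size(), catB.size());
        List<String[]> testCases = new ArrayList<>();
        for (int i = 0; i < n; i++)
            testCases.add(new String[] {
                catA.get(i % catA.size()),
                catB.get(i % catB.size())
            });
        return testCases;
    }

    public static void main(String[] args) {
        List<String> student = List.of("Invalid", "Bachelor", "Master");
        List<String> grade   = List.of("Invalid", "NotSufficient", "Sufficient");
        // max(3, 3) = 3 test-cases, instead of 3 x 3 = 9 for the full set.
        for (String[] tc : minimalSet(student, grade))
            System.out.println(tc[0] + " x " + tc[1]);
    }
}
```

For the Student/Grade example this yields 3 abstract test-cases, versus 9 for the full combinations set.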
Boundary testing
• Hypothesis: faults are often made at the boundaries of your input domains. Therefore, also test boundary values.
• More thorough… but this can explode your number of combinations.
(Example with the Grade partitions < 5.0 and 5.0 ≤ c < 5.5. Without specifically trying to pick boundary values, the concrete test inputs might be 0.0, 4.5, 4.99 and 5.0, 5.2, 5.49. Adding boundaries means deliberately including, per partition, its lower bound (LB), a middle value (M), and its upper bound (HB).)
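Picking boundary values per partition can also be mechanized; a small sketch (the 0.01 grade granularity is an assumption):

```java
import java.util.*;

public class BoundaryValues {
    // For a half-open grade partition [lo, hi), return boundary-aware test
    // inputs: the lower bound (LB), a middle value (M), and the largest
    // admissible value just below the upper bound (HB).
    static double[] boundaryInputs(double lo, double hi) {
        double step = 0.01;   // assumed grade granularity
        return new double[] { lo, (lo + hi) / 2, hi - step };
    }

    public static void main(String[] args) {
        // The partition 5.0 <= c < 5.5 from the slide:
        System.out.println(Arrays.toString(boundaryInputs(5.0, 5.5)));
    }
}
```

Applying this to every class of every category multiplies the number of concrete inputs per partition by three, which is how boundary testing inflates the combinations.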
Positive and negative tests
• Positive test: test the test object against valid inputs.
• Negative test: test it against invalid inputs. This is usually done to check the test object’s error-handling mechanism.
  It may not be relevant when testing unit-level functions or classes.
  You must do it when testing a system, e.g. to make sure that it does not crash on invalid inputs.
Using concrete values as expectations
test2() {
  Person P = new Person(“Bob”) ;
  P.getMoreCredit(10) ;
  Product a = new Apple() ;
  P.buy(a) ;
  assertTrue(P.getCredit() == 98) ;
}
Comparisons to concrete values are often used to express test expectations, but this has drawbacks:
• You have to calculate them by hand, and they can be non-trivial to calculate.
• If the business logic changes, you have to recalculate them → a maintenance issue.
Property-based testing, more robust
test() {
  p = new Person(“BOB”)
  e = new Email(“BOB@BOB.NET”)
  p.addEmail(e)
  assert p.getEmails().contains(e)
  assert e.getOwner() == p
}

(Class diagram: Person has an operation addEmail(email); a Person “has” * emails : Email, and an Email has 0..1 owner : Person.)

prop_personEmail(p,e) {
  assert p.getEmails().contains(e)
  assert e.getOwner() == p
}

prop_personEmail(p,e)

But sometimes your expectation can be generalized to a “property”, which is parametric. Properties are much more robust for maintenance. Furthermore, you can now use generators to generate your test inputs, since you will re-use the same properties as expectations.
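A runnable sketch of the property with a random generator (the Person/Email implementation below is hypothetical, written only so the property has something to run against):

```java
import java.util.*;

public class PersonEmailProperty {
    // Hypothetical minimal Person/Email pair satisfying the association:
    static class Person {
        final String name;
        final List<Email> emails = new ArrayList<>();
        Person(String name) { this.name = name; }
        void addEmail(Email e) { emails.add(e); e.owner = this; }
        List<Email> getEmails() { return emails; }
    }
    static class Email {
        final String address;
        Person owner;
        Email(String address) { this.address = address; }
        Person getOwner() { return owner; }
    }

    // The property: parametric in p and e, reusable for any generated input.
    static void prop_personEmail(Person p, Email e) {
        p.addEmail(e);
        assert p.getEmails().contains(e);
        assert e.getOwner() == p;
    }

    public static void main(String[] args) {
        // Generator: random test inputs, all checked against the same property.
        Random rnd = new Random(42);
        for (int i = 0; i < 100; i++) {
            Person p = new Person("p" + rnd.nextInt(1000));
            Email e = new Email("u" + rnd.nextInt(1000) + "@example.org");
            prop_personEmail(p, e);
        }
        System.out.println("property holds on 100 generated inputs");
    }
}
```

Note how the hand-calculated value (the 98 of the earlier test) disappears: the property states the expectation once, and the generator supplies as many inputs as you like.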
Your OCL specs can be converted to “test properties”
context p : Person
inv : p.emails->forAll(e | e.owner = p)

context p : Person :: addEmail(e : Email)
pre : e <> null
post : p.emails->includes(e)

Person_classinv(p) {
  for (Email e : p.getEmails()) assert e.getOwner() == p
}

addMail_spec(p,e) {
  if (e == null) throw IllegalArg
  r = p.addEmail(e)
  assert p.getEmails().contains(e)
  return r
}

A test case now looks like this:

test0() {
  p = …
  e = …
  addMail_spec(p,e)
  Person_classinv(p)
}
22
Concrete expectations vs property-based

Property-based:
+ Makes automated testing much easier.
+ Properties are more robust.
− The completeness of your property set determines the strength of your test; but it is not always easy to write a complete set of properties.

Concrete expectations:
− Cannot be automated.
− Not robust.
+ You don’t need to formalize nor implement any specification.
Coverage
• Because your “resources” are limited, you can’t test all possible behavior → we have to decide when it is enough.
• A pragmatic approach is to define a quantitative goal, like:
  – Every method in every class C must be exercised.
  – Every line in every method of a class C must be exercised.
  – Every partition in the classification tree of method m must be tried.
• Coverage: how much (in %) of such a goal is accomplished.
• Too little coverage implies you are not done yet.
• Full coverage (100%) gives you grounds to stop testing; but it does not imply correctness.
Code-based coverage
• Line coverage (previous slide)
• Decision coverage: all decision branches are exercised
• Path coverage: all possible “execution paths” are exercised

P(x,y) {
  if (even(x)) x = x/2 else x--
  if (even(y)) y = y/2 else y--
  return x + y
}

(Abstractly, P’s control flow has two decision points, even(x) and even(y), each with two decision branches.)
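The P(x,y) example can be turned into a concrete decision-coverage demonstration; a sketch (even(x) is written as x % 2 == 0):

```java
public class DecisionCoverage {
    // The example SUT from the slide.
    static int P(int x, int y) {
        if (x % 2 == 0) x = x / 2; else x--;
        if (y % 2 == 0) y = y / 2; else y--;
        return x + y;
    }

    public static void main(String[] args) {
        // Two test-cases suffice for full decision coverage: together they
        // exercise both branches of both decision points.
        assert P(4, 3) == 4;   // even(x) true,  even(y) false: 2 + 2
        assert P(3, 4) == 4;   // even(x) false, even(y) true:  2 + 2
        // Full path coverage needs all 4 branch combinations, hence 4 tests.
        System.out.println("decision coverage achieved with 2 tests");
    }
}
```

The same two inputs would already give full line coverage; one input alone cannot, since each input takes only one branch per decision point.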
Coverage strength
• Coverage criteria differ in “strength”.
• Criterion A is stronger than criterion B = full coverage w.r.t. A implies full coverage w.r.t. B.
• Path coverage is stronger than decision coverage; and decision coverage is stronger than line coverage.
• But a stronger criterion typically means you need more test-cases → cost.
For the previous example:

Cov. criterion     | Min #test-cases
Full line cov.     | 1
Full decision cov. | 2
Full path cov.     | 4
Path-based coverage
• Strong, but unfortunately the number of paths to cover can explode exponentially, or even be infinite if you have a loop or recursion.
• A more practical solution: pair-wise path coverage. The goal is to cover all execution subpaths of length 2.
• Example:

(Graph with nodes 0, 1, 2, 3 and edges a, b, c, d, e.)

Subpaths of length 2: ae, bc, bd, ce, da, db
Pair-wise path vs decision coverage

(Same graph as before; subpaths of length 2: ae, bc, bd, ce, da, db.)

2 test-cases can give full decision coverage (the test-cases giving the executions marked with the colored arrows in the figure). But with those test-cases we still miss the subpath db: full pair-wise path coverage requires 3 test-cases, e.g. the executions
  bce   bdae   bdbce
Pair-wise path coverage can be generalized to k-wise path coverage; stronger as k gets bigger.
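Whether a set of test-cases achieves pair-wise path coverage can be checked mechanically; a sketch over the example graph:

```java
import java.util.*;

public class PairwiseCoverage {
    // The six length-2 subpaths of the example graph (edge names a..e).
    static final Set<String> REQUIRED =
        new HashSet<>(List.of("ae", "bc", "bd", "ce", "da", "db"));

    // Collect the length-2 subpaths exercised by a set of executions,
    // where each execution is written as a string of edge names.
    static Set<String> covered(List<String> executions) {
        Set<String> pairs = new HashSet<>();
        for (String path : executions)
            for (int i = 0; i + 1 < path.length(); i++)
                pairs.add(path.substring(i, i + 2));
        return pairs;
    }

    public static void main(String[] args) {
        // The 3 test-cases from the slide cover all six pairs:
        List<String> tests = List.of("bce", "bdae", "bdbce");
        assert covered(tests).containsAll(REQUIRED);
        // With only the first two, the subpath db is still missed:
        assert !covered(List.of("bce", "bdae")).contains("db");
        System.out.println("pair-wise path coverage: full with 3 tests");
    }
}
```

The same sliding-window idea generalizes to k-wise coverage by taking substrings of length k instead of 2.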
Testing your candy machine
• In this case, the sequence in which we call the operations matters a lot.
• Most sequences are invalid; how do we make sure that we sufficiently cover the valid sequences?
• Use your “state machine model” as guidance.

(Class diagram: CandyMachine, with operations insertCoin(c), turnCrank(), getCandy().)
The model of our candy machine

(State machine. States: “no coin”, “has coin”, “sold”, “bonus”, “empty”. Transitions: insert coin [1eur] takes “no coin” to “has coin”, [else] / [coin is ejected]; turn crank / [counter is updated] takes “has coin” to “sold”, or with [n>1] to “bonus”; get candy leaves “sold” or “bonus” for “no coin” if [not empty], or for “empty” if [empty].)
Model-based testing
• The model specifies which paths are valid. Use it to guide the choice of which paths to use as test-cases.
• You can even automatically generate the test-cases (without the expectations)!
• Expectations: what do we check?
• Which coverage criterion do you take? E.g.:
  – k-wise path coverage
  – all paths from the start up to depth k
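Generating test sequences from the model can be sketched as follows (the transition map is a simplification of the candy-machine model: guards, actions, and the bonus/empty behavior are omitted):

```java
import java.util.*;

public class ModelBasedGen {
    // Simplified transition relation of the candy-machine model:
    // state -> (event -> next state).
    static final Map<String, Map<String, String>> MODEL = Map.of(
        "no coin",  Map.of("insert coin", "has coin"),
        "has coin", Map.of("turn crank", "sold"),
        "sold",     Map.of("get candy", "no coin")
    );

    // All valid event sequences from 'state' of length at most k; these are
    // candidate test-cases (the expectations still have to be added).
    static List<List<String>> paths(String state, int k) {
        List<List<String>> result = new ArrayList<>();
        result.add(new ArrayList<>());            // the empty sequence
        if (k == 0) return result;
        for (Map.Entry<String, String> t
                 : MODEL.getOrDefault(state, Map.of()).entrySet())
            for (List<String> rest : paths(t.getValue(), k - 1)) {
                List<String> p = new ArrayList<>();
                p.add(t.getKey());
                p.addAll(rest);
                result.add(p);
            }
        return result;
    }

    public static void main(String[] args) {
        // "All paths from the start up to depth k" with k = 3:
        for (List<String> p : paths("no coin", 3))
            System.out.println(p);
    }
}
```

Each generated sequence is replayed against the real CandyMachine; the model’s target states and actions would supply the expectations to check along the way.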
Commonly used overall “testing strategy”: the V-model

(V-model figure: the left leg descends from requirements (requirement documents + use cases), via detailed requirements, design, and detailed design (analysis + design models), down to the implementation. Each level on the left “prepares” a test level on the right leg: requirements → acceptance testing, detailed requirements → system testing, design → integration testing, detailed design → unit testing. Unit and integration testing are development testing, done by the developers; acceptance testing is done by the project owner, or delegated to a 3rd party. The “preparations” can be done in parallel with development; the actual testing must wait until we have an implementation.)
Fitting the V in an iterative SDLC

E.g. in the Unified Process, the phases Inception, Elaboration, Construction, and Transition are divided over iterations 1, 2, 3, 4, 5, … in time. UP’s core workflows (requirements, analysis, design, implementation, test) are exercised in every iteration, so every iteration runs through its own V.
Regression testing
• During and after development (maintenance) you modify your software:
  – advancing to the next iteration
  – bug fixes, new features, refactoring
• Regression test: test that the new version does not introduce any new errors with respect to the unchanged part of the specification, by re-executing old test-cases (those that are still relevant).
• Problem: you may have accumulated a huge set of test-cases. Executing them all (TESTALL) may take a long time…
  Solution: apply a selection strategy; but you will need to invest in an infrastructure to facilitate this.
Performance testing
• Goal: to see how your application reacts to an increasing workload.
• Not to be underestimated!
• Typical setup:

(Setup diagram: one or more client machines run “virtual users” that interact with the App server, which in turn uses a DB. A “virtual user” is a program that simulates a user interacting with the App. By creating more virtual users, you increase the load on the App. You can typically run multiple virtual users from a single client machine; if you need more load, you can then add more client machines.)
Several standard forms of performance testing

(Graph of #VUs (load) against time, showing the normal load and the expected peaks:
• load test: to see the app’s response time under its typical load
• stress test: to see what maximum load the app can handle before breaking; the X marks where the app crashes)
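A minimal sketch of the virtual-user idea (the appRequest stub stands in for a real interaction with the App; a real load test would fire network requests and measure response times):

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.*;

public class LoadTestSketch {
    static final AtomicInteger handled = new AtomicInteger();

    // Stub for one interaction with the App (an assumption; a real virtual
    // user would send a request and record the response time).
    static void appRequest() { handled.incrementAndGet(); }

    // Run 'virtualUsers' concurrent users, each performing
    // 'requestsPerUser' interactions; return the total handled.
    static int runLoad(int virtualUsers, int requestsPerUser) {
        ExecutorService pool = Executors.newFixedThreadPool(virtualUsers);
        for (int u = 0; u < virtualUsers; u++)
            pool.submit(() -> {
                for (int i = 0; i < requestsPerUser; i++) appRequest();
            });
        pool.shutdown();
        try { pool.awaitTermination(10, TimeUnit.SECONDS); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return handled.get();
    }

    public static void main(String[] args) {
        // 20 virtual users x 50 requests each = a load of 1000 requests.
        System.out.println("handled " + runLoad(20, 50) + " requests");
    }
}
```

Ramping virtualUsers up over time gives a load test; increasing it until the App misbehaves gives a stress test.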
Related issues that can get in the way
• Persistence
• Concurrency
• GUI: testability, explosion
• Privacy / data security
• Bad management
Persistence
• The DB and files form an implicit part of your SUT state!
• Issues:
  – Interference: via the SUT, a test-case may have side effects on your persistent data. You need a mechanism to undo these effects before you start the next test-case.
  – Interactions with persistent storage are slow.
  – How to create a representative persistent state (for testing your SUT)?
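One way to undo a test-case’s side effects is to snapshot the persistent state before the test and restore it afterwards; a toy sketch with an in-memory store (real setups would typically use DB transactions and roll back instead):

```java
import java.util.*;

public class PersistenceReset {
    // Toy in-memory "persistent store"; real setups would use a DB.
    static Map<String, Integer> store = new HashMap<>();

    // Run a test-case in isolation: snapshot the store first, and restore
    // it afterwards so the next test-case starts from the same state.
    static void runIsolated(Runnable testCase) {
        Map<String, Integer> snapshot = new HashMap<>(store);
        try { testCase.run(); }
        finally { store = snapshot; }   // undo the test-case's side effects
    }

    public static void main(String[] args) {
        store.put("credit:Bob", 10);
        runIsolated(() -> store.put("credit:Bob", 0)); // test mutates store
        assert store.get("credit:Bob") == 10;          // effect was undone
        System.out.println("store restored between test-cases");
    }
}
```

Snapshot/restore is simple but copies the whole state per test; transactional rollback avoids the copy but requires the store to support transactions.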
Creating a representative DB
• Copy from a production DB (if you already have one):
  + no effort is needed to construct it
  – usually big, slowing down queries
  – contains real data → privacy and data security issues
• Or construct it from scratch, e.g. by applying the classification-tree approach:
(Example model: Person (OID, name) with a “buys” association to Product (OID, name, currency, price). Every attribute and relation induces categories to partition, e.g. price ≤ 1.00 vs price > 1.00; currency: Euro, USD, Other; the multiplicities of “buys”: 0, > 0 and 1, > 1. Once you have your classification tree, you can proceed with determining the combinations you want, as explained before.)
Concurrency
• Concurrent execution is timing-sensitive.
• Problems:
  – Some errors may be very difficult to trigger.
  – If you do find them, they can be hard to reproduce.

(Scenario: P grabs fork F; Q grabs fork F; P thinks…; Q thinks… Who actually gets the fork depends on the speed of P and Q. Consequently, when P causes an error upon grabbing F, there are two issues: (1) for this error to surface we must get the timing right so that P grabs the fork instead of Q; and (2) to reproduce the error we must be able to reproduce the timing.)

Ideally you need a separate infrastructure that lets you fully control the concurrency/timing; but such an infrastructure is hard to set up, and even then it won’t be able to control the concurrency of components outside your system.
Risk-based test plan
• Seems to be getting popular.
• You have limited resources → fall back to some prioritization strategy over the various modules of the SUT, which in turn determines the effort allocated to those modules.
• Typically you make calculated predictions of the risk and impact of those modules. The product, possibly weighted, determines the priority. E.g.:

Module | Risk | Impact | Priority
M      | 1    | 1      | 1
N      | 5    | 3      | 15
O      | 3    | 3      | 9

Risk of module x: your estimated chance that x fails.
Impact of x: your estimated damage when x fails.

• Issue: how do we predict risk and impact?
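The priority computation from the table is just the (unweighted) product; a tiny sketch:

```java
public class RiskPriority {
    // Priority = risk x impact (the unweighted product from the table).
    static int priority(int risk, int impact) { return risk * impact; }

    public static void main(String[] args) {
        assert priority(1, 1) == 1;   // module M
        assert priority(5, 3) == 15;  // module N
        assert priority(3, 3) == 9;   // module O
        System.out.println("priorities match the table");
    }
}
```

A weighted variant would multiply risk and impact by project-specific weights before taking the product; the ranking of modules drives the test effort allocated to each.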
An example of a discovery-handling procedure

The state machine below describes the life of a bug report, and thereby also its handling procedure. (Class Report, with operations create(), review(), ….)

(State machine, from: Foundations of Software Testing, 2008. States: reported, rejected, opened, assigned, fixed, closed, re-opened, deferred. Transitions: create → reported; review [bad report] → back for editing; review [report ok] → opened; reject → rejected; approve for fixing → assigned (or deferred); fix → fixed; test [fail] → re-opened; test [pass] → closed; a closed report can be re-opened when the problem returns.)