Applied Soft Computing - Bournemouth University

Applied Soft Computing 10 (2010) 730–745

Nature-inspired techniques for conformance testing of object-oriented software

A. Bouchachia *, R. Mittermeir, P. Sielecky, S. Stafiej, M. Zieminski

Group of Software Engineering and Soft Computing, Department of Informatics-Systems, University of Klagenfurt, 9020 Klagenfurt, Austria

ARTICLE INFO

Article history:

Received 7 December 2008

Received in revised form 17 May 2009

Accepted 5 September 2009

Available online 17 September 2009

Keywords:

Conformance testing

Testing data generation

Genetic algorithms

Evolutionary programming

Ant-colony systems

ABSTRACT

Soft computing offers a plethora of techniques for dealing with hard optimization problems. In particular, nature-based techniques have been shown to be very efficient in optimization applications. The present paper investigates the suitability of various nature-inspired meta-heuristics (genetic algorithms, evolutionary programming and ant-colony systems) to the problem of software testing. The present study is part of the nature-inspired techniques for object-oriented testing (NITOT) environment. It aims at addressing the problem of conformance testing of object-oriented software to its specification expressed in terms of finite state machines. Detailed description, adaptation and evaluation of the various nature-inspired meta-heuristics are discussed, showing their potential in this context of conformance testing.

© 2009 Elsevier B.V. All rights reserved.

Contents lists available at ScienceDirect

Applied Soft Computing

journal homepage: www.elsevier.com/locate/asoc

1. Introduction

Nature-inspired search meta-heuristics have been the focus of many research studies in the context of software engineering and in particular in software testing [41]. The typical application of these techniques is the task of automatic generation of test data. Because such a task can be formulated as an optimization problem (that is, the test goal is transformed into an objective function or a combination of objective functions), the suitability of search techniques for software testing is an indisputable issue.

For procedural software, Ferguson and Korel [21], Marre and Bertolino [40], and Hierons [39], among others, addressed the problem of automatically generating efficient and effective test suites. For object-oriented problems, the issue appears in a slightly different context. As noted in Binder [7], object-oriented software poses a host of new problems to testers. The state-encapsulation problem stands out particularly among them. At the method level, the methodology of testing procedural programs still applies. The key difference, though, is that object-oriented testing assumes that the code under test is state-based due to encapsulation, while procedural testing assumes that the code can be strictly functional, i.e., state-free. At the level of class (unit), the notion of state becomes even more central, since each instance is identified by its state, which can be changed only by a sequence of method (modifier or modifier/inspector) calls. This also holds for the cluster level and system level. Hence, as expressed by Jorgensen and Erickson [31], testing object-oriented software has, from the class level onwards, characteristics of integration testing. The challenges of developing an efficient and effective test suite become even more serious.

* Corresponding author.

E-mail address: [email protected] (A. Bouchachia).

1568-4946/$ – see front matter © 2009 Elsevier B.V. All rights reserved.

doi:10.1016/j.asoc.2009.09.003

The present research study focuses on conformance testing of object-oriented software. This assumes that a sufficiently formal specification exists to devise testing data in the form of input, output pairs necessary to execute the corresponding implementation. That is, the specification serves as a basis for an oracle and testing aims at assessing whether the actual behavior of the code conforms to the one expected by the oracle. In this context, a test case is defined as a sequence of method calls that brings the object to a certain state. Departing from a testing goal derived from the specification at hand, the testers need to be supported in their search for test sequences that satisfy that testing goal. For instance, the testing goal could be setting the object into a target state, hence testers need support to obtain test data that achieves the desired state. The testing environment NITOT we are currently constructing aims at providing this support. We will show through various experiments and using different strategies and heuristics how test cases can be generated meeting the test goal.

As various nature-inspired meta-heuristics have already been successfully used for identifying test data for procedural programmes (see Section 3), it appeared reasonable to explore their suitability in the context of conformance testing of object-oriented programming. Three heuristics are studied for this purpose: genetic algorithms, evolutionary programming, and ant-colony systems. While the first two heuristics appear to be intuitively the first candidates disregarding their technical formulation and adaptation, ant-colony optimization is appealing since state coverage can be conceived as a graph traversal problem. All three have been shown to be efficient in dealing with various optimization problems.

Due to these considerations, the paper highlights the contributions of the current investigation in Section 2. Then, it describes the testing framework under development in Section 4. In Section 5, it focuses on the three strategies evaluated; the adaptations required for each of these techniques are described. After a brief introduction to the whole testing framework (Section 6), Section 7 reports the results from the experiments conducted.

2. Contributions

While in the existing literature pertaining to object-oriented software testing [12,13,16,24] conventional finite state machines (FSMs) have been used to model the behavior of the individual classes and even the behavior of class clusters and systems, in this paper we introduce a new type of FSM, called class finite state machine (CFSM). It is an extension of the classical FSM. It explicitly allows dealing with parameterized messages (i.e., object-oriented methods involving parameters of different types) and guarded transitions (messages are executed only if the predicates on the transitions are fulfilled).

From the operational perspective, these CFSMs themselves are refined into fine grained state machines (GFSMs), which serve as a basis for effective generation of test data.

The novelty of the approach proposed is also enhanced by further aspects:

• Possibility to handle parameterized methods via augmented transitions of the CFSM.
• Application of search techniques to handle the problem of state explosion.
• Proposal of adaptation mechanisms for these search techniques to fit the context of conformance testing.
• Proposal of an integrated testing framework that is generic and can integrate any other nature-inspired search technique.

The empirical evaluation shows how these aspects are assessed by the present paper.

3. Related work

Due to their key relevance to this study, the application of finite state machines and nature-inspired meta-heuristics to the problem of software testing will be highlighted in the next subsections.

3.1. FSM as basis for testing

In the context of object-oriented programming, FSMs are used to represent the behavior of the individual classes, and combinations of FSMs to represent class clusters and systems. Therefore, except for the algorithmic level (method level), all other testing levels (i.e., class level, cluster level, and system level) can rely on the FSM formalism [12,13,16,24].

The W-Method [14] can be considered as the precursor of state-based testing methods. It was later renamed as round-trip path testing [7]. Round-trip testing guarantees covering each transition at least once. It relies on a transition tree that is obtained from the state machine. It is then traversed using traversal algorithms such as depth-first or breadth-first traversal. This tree shows all the round-trip paths which correspond to transition sequences starting and ending with the same state. Each path produces a test case. The round-trip method is quite popular in the state-based testing community [1,12]. This technique can also be applied to flattened statecharts as is originally done in [7] on unified modeling language (UML) diagrams.
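The tree-based derivation of test paths can be illustrated with a short Python sketch. It is only an approximation of the idea under stated assumptions: the tree is built breadth-first, a branch is closed as soon as it reaches a state that is already in the tree, and the two-state machine `TOY`, its state names and the function name `round_trip_paths` are hypothetical.

```python
from collections import deque

def round_trip_paths(transitions, start):
    """Return the root-to-leaf paths of a breadth-first transition tree.

    transitions maps a state to a list of (event, next_state) pairs.
    A branch is closed as soon as it re-enters a state already in the tree.
    """
    paths = []
    seen = {start}
    queue = deque([(start, [])])
    while queue:
        state, path = queue.popleft()
        outgoing = transitions.get(state, [])
        if not outgoing and path:
            paths.append(path)  # dead-end state: the branch ends here
        for event, nxt in outgoing:
            step = path + [(state, event, nxt)]
            if nxt in seen:
                paths.append(step)  # state revisited: close the branch
            else:
                seen.add(nxt)
                queue.append((nxt, step))
    return paths

# Hypothetical two-state machine, used only for illustration.
TOY = {
    "Empty": [("deposit", "Loaded")],
    "Loaded": [("deposit", "Loaded"), ("withdraw", "Empty")],
}
PATHS = round_trip_paths(TOY, "Empty")
```

Each returned path is a candidate test case. Because every reachable state is expanded exactly once, every transition leaving a reachable state appears on at least one path.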

There have been several attempts to apply state charts and state machines, especially in the context of communication protocols. In [23], a method called “partial W-Method”, an extension of the W-Method [14], has been introduced, which aims at reducing the size of the testing set at the price of a complex test generation procedure. A similar method, called harmonized state identification method, targeting the same goal has been proposed in [38]. There also exist other methods, like Unique Input-Output (UIO) or Unique Input-Output Circuit (UIOC), which are reviewed in [45,26,27,20]. Some of these methods, such as UIO, do not guarantee complete fault detection by testing [9].

In [46], a detailed study of four test sequence generation methods is presented. These include Transition Tours (T-Method), Unique Input/Output Sequence (U-Method), Distinguishing Sequence (D-Method), and Characterizing Set (W-Method, already mentioned above). The study has shown that the U-, D-, and W-Methods outperform the T-Method. However, the W- and U-Methods appear to be the established state-based testing methodologies. The W-Method (round trip) has been applied in many recent papers [1,12,24,2] due to its acceptable cost and coverage criteria. On the other hand, the U-Method [45,26,27,20,9] has been widely used because not all FSMs have a distinguishing sequence (DS), whereas most FSMs have UIOs for each state. Moreover, the length of a UIO sequence is no longer than that of a DS.

Recently a heuristic approach based on UIO has been proposed in [27] to diagnose faults when testing from finite state machines. Very close to the idea we investigate in this paper, Guo et al. compare the implementation under test (IUT) against a finite state machine (for which the complete transition diagram is available) describing the specifications of the program. An IUT is faulty if the output sequences resulting from both the specification and the IUT are different. This can be caused by either an incorrect output (output fault) or an incorrect state transfer (state transfer fault). The approach proposed by the authors provides mechanisms to identify the set of transitions that are responsible for the faulty behavior of the IUT. To realize that, a set of reduction techniques is used to reduce the set of suspect transitions.

Hartmann, Imoberdorf and Meisinger suggest in [28] the application of UML statecharts to perform conformance testing of distributed component based systems. In order to generate test cases automatically, developers first define the dynamic behavior of the components via UML Statecharts, specify the interactions among them and finally annotate them with test requirements. Test cases are then derived from these annotated statecharts. A very similar approach using extended finite state machines has been investigated in [24].

Similar to control and data flow coverage criteria, state-based testing relies on coverage criteria defined in [43]. These include: all-transitions, all-transition-pairs, all-predicates and all-sequences. Other coverage criteria like all-states and all-n-transitions are very often used. However, all-states (each state must be reached at least once) is subsumed by the all-transitions criterion, which is in turn subsumed by all-n-transitions. While these criteria are typical for state-based testing, in many research publications [6,30,24,34] state machines have been extended to deal with data-flow coverage criteria [50].
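As a small illustration of the all-states and all-transitions criteria, the following Python sketch measures both for a test suite given as lists of (source, event, target) steps. The three-state machine `FSM3`, the two suites and the function name `coverage` are hypothetical and only serve to show that a suite can reach every state without exercising every transition.

```python
def coverage(transitions, suite):
    """Return (state_coverage, transition_coverage) of a suite, each in [0, 1]."""
    all_transitions = {(s, e, n) for s, outs in transitions.items() for e, n in outs}
    all_states = set(transitions) | {n for _, _, n in all_transitions}
    # Only steps that really exist in the machine count as covered.
    hit_transitions = {step for path in suite for step in path} & all_transitions
    hit_states = ({s for s, _, _ in hit_transitions}
                  | {n for _, _, n in hit_transitions})
    return (len(hit_states) / len(all_states),
            len(hit_transitions) / len(all_transitions))

# Hypothetical machine with three states and three transitions.
FSM3 = {"A": [("x", "B")], "B": [("y", "A"), ("z", "C")], "C": []}

PARTIAL_SUITE = [[("A", "x", "B"), ("B", "z", "C")]]
FULL_SUITE = PARTIAL_SUITE + [[("A", "x", "B"), ("B", "y", "A")]]
```

Here `PARTIAL_SUITE` satisfies all-states but misses the transition labeled `y`, whereas `FULL_SUITE` satisfies both criteria.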

3.2. Heuristics for test case generation

It is, however, important to notice that conformance testing is still an active research area due to the complexity of systems being developed in contemporary applications [5]. This complexity is due to the existence of a large number of states and transitions when such systems are modeled by state machines. This applies in particular to control and reactive systems. Moreover, little work has been done in the context of system (or integration) and cluster testing using FSMs. Again at these testing levels, a large number of states and transitions are required to exhibit the interactions between the system components and the overall behavior of the system. Due to these problems, testing such FSMs may become unmanageable and subject to the state explosion problem.

Fig. 1. Equivalence between formalisms.

To deal with such problems, there have been recently some attempts to apply search techniques. Among other population-based search techniques that have been used for structural test data generation [11], evolutionary algorithms and ant-colony systems have been used to conduct state-based testing. Because such techniques are not popular in the context of state-based testing, there have been few studies showing their relevance for testing.

In the following we review these studies from the perspective of state-based (be it procedural, object-oriented or conformance) testing for the sake of completeness. Note that structural testing aims at finding test suites relying on the structure of the code, hence the name of white-box testing. Moreover, state-based testing can be understood from two perspectives: (1) as traditional structural testing when the searched test cases aim at achieving a statement depending on a variable whose value (called state) is set only by executing other parts of the code (e.g. method call) and (2) as testing object-based code in which objects change their state after executing messages. Test cases in the first scheme represent tuples of input/output, while in the second scheme a test case represents a sequence of message calls that leads to a state of interest.

In [42], evolutionary algorithms have been applied to search for a sequence of inputs that allows reaching a test target (statement). Such an approach does not deal explicitly with state machines, but rather with procedural programming. It handles the problem of reachability. The authors used the chaining approach [21] to identify a sequence of statements, involving internal object variables, that need to be executed prior to the test target. This may require the execution of functions. In a similar approach, Baresel et al. [4] suggested a particular encoding that allows taking function calls into account.

Tonella suggests in [47] an evolutionary approach that aims at generating input sequences for the structural testing of classes. The encoding of chromosomes, which stand for test cases in the form of method sequences, incorporates information on which objects to create, which methods to invoke and which values to use as inputs (i.e., methods’ parameters). To preserve consistency when applying genetic operators, Tonella proposed a grammar that defines the syntax of chromosomes. The mutation operation relies on different operators: mutation of input value, constructor change, insertion of method invocation and removal of method invocation. A one-point crossover has been used for the combination of chromosomes. The proposed algorithm is applied with the aim of maximizing the branch coverage measure.

In a similar work, but also restricted to structural testing, Wappler and Lammermann [48] propose an evolutionary approach where each chromosome represents a sequence of statements. Each statement consists of a target object, a method and its parameters. An algorithm to decode and encode the test program is proposed. Since chromosomes are of fixed length, the maximal number of statements must be predefined. An interesting idea proposed in this approach is a multi-level optimization, i.e., each part of the statement (a target object, a method and parameters) can be optimized on a different level.

Departing from the fact that the computation of Unique Input/Output sequences (UIOs) is NP-hard [36], the authors in [25] proposed genetic algorithms (GAs) to deal with the problem of generating UIOs. An initial population is produced by randomly generating input sequences. Based on the state splitting tree proposed in [36], a fitness function is proposed that encourages candidates to split the set of all states into more discrete units and punishes the length of the sequences. Roulette wheel selection and uniform crossover are used as genetic operators. Experimental results suggest that, in a small system, the performance of the GA based approaches is no worse than that of random search while, in a more complex system, the GA based approaches outperform random search.

Ant-colony optimization (ACO) has also been used in previous research. For instance, in [37], Li et al. suggested an ACO approach to generate test suites from the UML Statechart diagrams for state-based testing. During test case generation, only feasible and optimal (shortest) ones are retained. This approach does not deal with object oriented programming features and does not assume that transitions are equipped with methods having parameters. In [41], the authors reported on the application of ACO to generate sequences of method calls in the context of structural testing. This work does not deal explicitly with state machines, but rather with procedural programming that handles the problem of reachability.

An ACO-based approach relying on the Markov Usage Model (MUM) was proposed in [32] to derive test cases (test paths) for a software system. In this approach, the possible uses of the software system are modeled by a MUM that reflects the operational distribution of the software system and is enriched by estimates of failure probabilities, losses in case of failure and testing costs. In the proposed MUM approach the arcs represent the processing steps of the program (functions) and the nodes represent the state of the program after executing the function(s) on the incident arc. In order to generate test paths with minimum loss (due to failure) and a cost not exceeding a given budget, the authors face an NP-hard combinatorial problem. They applied ACO, where the transition probabilities are computed using a combination of the estimates of failure probability, loss probability, and cost probability.

4. A framework for object-oriented conformance testing

State-bearing software, notably object-oriented software, is quite often modeled by means of finite state machines (FSM) such as UML state diagrams. A finite state machine is defined as $A = (Q, \Sigma, \delta, q_0, F)$ where:

• $Q$ is a finite set of states,
• $\Sigma$ is a finite alphabet of input symbols,
• $q_0 \in Q$ is an initial start state,
• $F \subseteq Q$ is a set of final states and
• $\delta : Q \times \Sigma \to Q$ is a transition function.

A language $L \subseteq \Sigma^*$ is called a regular language if there exists a finite state machine, $A$, such that $L = L(A)$ (see Fig. 1). Moreover, a regular language can be intensionally expressed in terms of a regular grammar or extensionally in terms of the set of terms/phrases that belong to it. The three formulations (FSM, grammar and set of sentences) are equivalent representations of the same regular language. This correspondence is relevant for the test data generation problem since we can consider the specification of the class under test (CUT) as an FSM that accepts exactly the “language” made up of all messages this class can deal with. We refer to this language as the CUT’s input language $L_I(CUT)$. In case one is also interested in the CUT’s output language, $L_O(CUT)$, one would have to extend the FSM to a Mealy machine. As messages consist of method-ids followed by a list of parameters of the methods, $L_I(CUT)$ has to represent what will be referred to later as message call blocks (MCBs), with each MCB consisting of the method-id and possibly a list of parameter values.
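To make the definition concrete, the following Python sketch encodes a deterministic FSM and checks whether a message sequence belongs to its input language. The file-handle specification `HANDLE`, the class name `FSM` and the message names are hypothetical and serve only as an illustration; parameters of messages are left out here.

```python
class FSM:
    """Deterministic finite state machine A = (Q, Sigma, delta, q0, F).

    Q and Sigma are implied by the keys and values of delta.
    """

    def __init__(self, delta, q0, finals):
        self.delta = delta          # dict: (state, symbol) -> next state
        self.q0 = q0
        self.finals = set(finals)

    def accepts(self, word):
        """Return True if the sequence of symbols is in L(A)."""
        state = self.q0
        for symbol in word:
            if (state, symbol) not in self.delta:
                return False        # no transition defined: word rejected
            state = self.delta[(state, symbol)]
        return state in self.finals

# Hypothetical specification of a file-like class under test.
HANDLE = FSM(
    delta={
        ("Closed", "open"): "Open",
        ("Open", "read"): "Open",
        ("Open", "close"): "Closed",
    },
    q0="Closed",
    finals={"Closed"},
)
```

With this encoding, `HANDLE.accepts(["open", "read", "close"])` holds, while a sequence starting with `read` is rejected because no such transition leaves the initial state.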

Departing from the assumption that each class under test is specified by means of a finite state machine, conformance testing, as shown in Fig. 2, consists of comparing the actual behavior of the code against the expected behavior of the class FSM. This is done in a systematic way by:

(1) defining a testing goal (for instance, a certain state),
(2) generating test cases in the form of method call sequences based on the FSM and then,
(3) exercising the test cases on the code of CUT and
(4) comparing the output of the execution against the testing goal.
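The four steps can be sketched in Python as follows. This is a minimal illustration and not the NITOT implementation: the specification `SPEC`, the class `Lamp`, its observable `state()` method and the use of breadth-first search for step (2) are all assumptions made for the example.

```python
from collections import deque

# Hypothetical specification: (state, method) -> next state.
SPEC = {
    ("Off", "press"): "On",
    ("On", "press"): "Off",
    ("On", "dim"): "Dimmed",
    ("Dimmed", "press"): "Off",
}

class Lamp:
    """Hypothetical class under test with an observable state."""
    def __init__(self):
        self.level = 0
    def press(self):
        self.level = 0 if self.level else 2
    def dim(self):
        if self.level == 2:
            self.level = 1
    def state(self):
        return {0: "Off", 1: "Dimmed", 2: "On"}[self.level]

def shortest_sequence(spec, start, goal):
    """Step 2: breadth-first search for a shortest call sequence reaching goal."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, calls = queue.popleft()
        if state == goal:
            return calls
        for (src, method), dst in spec.items():
            if src == state and dst not in seen:
                seen.add(dst)
                queue.append((dst, calls + [method]))
    return None

def conforms(cut_factory, calls, goal):
    """Steps 3 and 4: run the calls on a fresh object, compare with the goal."""
    obj = cut_factory()
    for method in calls:
        getattr(obj, method)()
    return obj.state() == goal

# Step 1: the testing goal is the state "Dimmed".
CALLS = shortest_sequence(SPEC, "Off", "Dimmed")
```

In this toy setting an exhaustive search suffices for step (2); the meta-heuristics discussed in this paper target cases where the state space is too large for that.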
Because at the level of implementation, states are defined by the cross-product of specification states with implementation states defined by class variables, we refer to the corresponding FSM as fine granular finite state machine (GFSM). Transitions in the GFSM are the transitions specified in the specification augmented with the actual parameter value permitted at the implementation level.

Formally, this GFSM is expressed as a finite state machine $FM = (Q, \Sigma, q_0, \delta, F)$ with

• $Q = K \times V = \{\langle k, v \rangle \mid k \in SpecificationStates \wedge v \in range(associatedImplementationVariable)\}$.
The set of states $Q$ is the cross-product of the names of the states an object might assume according to its specification, $K$, with the fine grained states this object might assume in the implementation, $V$.

In case the implementation uses a user-defined scalar type that directly corresponds to the states of the specification, only the one-to-one mappings of specification state label to implementation state label will lead to reachable GFSM-states while all other mappings will lead to isolated states. In case the implementation uses a more fine granular program variable (as in the bank account example used in Sect. 7), this cross-product contains all correspondences between specification states and implementation states. Again, only combinations legitimate according to the requirements of the system will be reachable by transitions. The remaining items of the cross-product will be isolated states in the GFSM.

Fig. 2. Concept of conformance testing.

• $\Sigma = T \times P = \{\langle t, p \rangle \mid t \in MethodName \wedge p \in (range(ParameterType) \cup \{\varepsilon\})\}$.
The alphabet $\Sigma$ of the “language” accepted by this automaton consists of the messages the class under test can accept. This set of messages is defined by the identifiers of the methods the class consists of, $T$. Since methods might have some parameters (or none), the specification foresees the combination of the message descriptor with a list of actual parameter values $P$. As this list is possibly empty, the union with the singleton set containing the empty word, $\{\varepsilon\}$, as elements of the alphabet is needed.
It is to be mentioned already here that in the experiments we restricted the parameter types to match the type of the implementation variable and that the parameter type was constrained to be drawn from either integer, character, or boolean.
• $q_0$ with $q_0 \in Q$ is the initial state of the GFSM.
• $\delta$ with $\delta \subseteq Q \times \Sigma \to Q$ is the state transition function.
Due to the definition of $Q$ and $\Sigma$, we have $\delta \subseteq (K \times V) \times (T \times P) \to (K \times V)$. Apparently, only those combinations of states and method invocations figure in $\delta$ that represent legitimate transitions from (specification × implementation)-states conforming to the specification to a new, legitimate (specification × implementation)-state.
• $F$ with $F \subseteq (Q \setminus \{q_0\})$ is the set of final states.
In coverage testing, final states of the automaton do not play a special role. All states but $q_0$ have to be considered as targets and hence $F = Q \setminus \{q_0\}$.
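The size of the cross-product $Q = K \times V$ and the share of isolated states can be illustrated with a few lines of Python. The state labels, the value range and the legitimacy rule below are hypothetical; they are not taken from the bank account example of the experiments.

```python
from itertools import product

K = ["Overdrawn", "Solvent"]   # hypothetical specification states
V = range(-2, 3)               # hypothetical balance values -2..2

def legitimate(k, v):
    """Assumed requirement: the label must agree with the sign of the balance."""
    return (k == "Overdrawn") == (v < 0)

Q = list(product(K, V))                            # all GFSM states, K x V
LEGITIMATE = [q for q in Q if legitimate(*q)]      # candidates for reachability
ISOLATED = [q for q in Q if not legitimate(*q)]    # never entered by a transition
```

Even in this tiny example half of the ten product states are isolated, which hints at why grouping states into equivalence classes becomes attractive once realistic value ranges are used.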

The GFSM was defined to obtain a solid formal basis for the state-based testing process. Apparently, it contains many dead states (or states that should be dead according to the specification), and even states situated on a path from $q_0$ to an element $f \in F$ appear in a multiplicity that calls for the definition of equivalence classes. To cope with this multiplicity in the GFSM, a class finite state machine, CFSM, has been defined. This CFSM $CM = (Q, \Sigma, q_0, \delta, F)$ has the same granularity as the GFSM, but transitions are grouped into equivalence classes according to predicates establishing relationships between the values of the state(s) they are emanating from. Likewise, states are grouped into equivalence classes by predicates which usually range over values of the implementation state variables. To facilitate the implementation of the GFSM, it groups states with a common label (specification state) but different implementation values into an equivalence class, treating the implementation value like a parameter of the respective specification state.

The crucial difference is with the transition function $\delta$: $\delta_{CFSM}$ corresponds to $\delta_{GFSM}$ in so far as the concrete values of the implementation state and of the parameter value are represented in the originating state and in the message by variables. In the receiving state, the implementation state is replaced by an expression using the variable representing the implementation state and the message parameter values. The resulting equivalence class of elementary transitions is guarded by a predicate over the parameter value(s) and the pre-transitional value of the implementation state (the value of the implementation state in the originating state). Thus, $\delta_{CFSM}$ has the appearance:

$$\delta \subseteq (K, v) \times (T, p) \to (K', funct(v, p)) \mid pred(v, p)$$

with $v$ representing the value of the implementation state and $p$ representing the parameter value attached to the message. The specification-state of the receiving state, $K'$, might be different from $K$ or remain the same. The value of the implementation state, $v'$, will be the result of applying function $funct$ on the parameters $v$ and $p$. The legitimacy of choosing this transition will be ensured by the predicate $pred(v, p)$.
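A guarded transition of this form can be sketched in Python as a record holding the source label $K$, the method-id $T$, the predicate $pred(v, p)$, the update $funct(v, p)$ and the target label $K'$. The account-like machine below, including its labels, guards and the names `GuardedTransition` and `step`, is a hypothetical illustration and not the CFSM used in the experiments.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class GuardedTransition:
    source: str                        # K
    method: str                        # T
    pred: Callable[[int, int], bool]   # pred(v, p), the guard
    funct: Callable[[int, int], int]   # funct(v, p), the new implementation value
    target: str                        # K'

# Hypothetical account-like CFSM with an integer balance v.
CFSM = [
    GuardedTransition("Empty", "deposit",
                      lambda v, p: p > 0, lambda v, p: v + p, "Loaded"),
    GuardedTransition("Loaded", "deposit",
                      lambda v, p: p > 0, lambda v, p: v + p, "Loaded"),
    GuardedTransition("Loaded", "withdraw",
                      lambda v, p: 0 < p < v, lambda v, p: v - p, "Loaded"),
    GuardedTransition("Loaded", "withdraw",
                      lambda v, p: p == v, lambda v, p: v - p, "Empty"),
]

def step(state, message):
    """Apply the first transition whose guard holds; None if the message is illegitimate."""
    (k, v), (t, p) = state, message
    for tr in CFSM:
        if tr.source == k and tr.method == t and tr.pred(v, p):
            return (tr.target, tr.funct(v, p))
    return None
```

Four guarded transitions stand in for the arbitrarily many elementary transitions of the corresponding GFSM, one per concrete pair of balance and parameter value.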

Obviously, testing object-oriented software has to cope with a potentially very complex state space. This holds due to the interaction between the different granularity of specification variables and implementation variables. Moreover, the CFSM has syntactically a different appearance from the initial GFSM, but it is semantically equivalent. It can be seen as an automaton recognizing (respectively generating) the language of all message sequences generated from the fine-grained FSM. Being closer to the implementation, the CFSM is used to perform conformance testing as explained earlier.

5. Nature-inspired techniques for test case generation

The crucial part of conformance testing is the generation of test cases. However, because the computation of particular sequences (e.g., unique input/output sequences) is NP-hard [36], it is natural to think about applying meta-heuristics to deal with such a combinatorial problem. It has been shown in many studies [11,35,44,49] how the test data generation problem can be conceived as an optimization problem in the context of structural testing.

Likewise, for the sake of conformance testing the goal is to find a minimal set of (shortest) message sequences needed to obtain the testing target relying on some meta-heuristics. In this paper, three different nature-inspired meta-heuristics for generating test sequences are proposed. These are genetic algorithms, evolutionary programming, and ant-colony systems. The following section describes how these heuristics are tuned to the specifics of the test data generation problem based on the concept of CFSM.

5.1. Genetic algorithms

Genetic algorithms (GA) [29] are seen as a powerful method for solving ill-structured problems. For solving problems with GA, one usually finds a genotypical representation for solutions of the problem, randomly generates a population of n individuals, computes the fitness (adequacy with respect to the optimization criterion) for each of them, selects candidates for mating considering the candidates’ fitness, and generates offspring forming a new generation of candidate solutions by crossing over the chromosomes of two members of the respective ancestor generation. The specifics of these operations vary with the different approaches. A particular characteristic of the genotypical approach is that chromosomes are highly abstract and of fixed length. This allows conceiving the chromosome as a bit string that admits crossover at arbitrary positions. However, there also exist variants of GA where the basic idea of the algorithm just outlined is preserved, but the representation by genetic material is closer to the problem domain. Consequently, one has to be more selective in choosing spots for applying the crossover operation. These approaches are referred to as phenotypical.

As the issue of state-based conformance testing maps to an optimization problem of finding the smallest set of shortest message sequences that meet a given test goal, fixed length representations of the problem do not seem adequate. Testing problems can be posed where any fixed length representation of a solution would be just a little too short. In this case the whole algorithm needs to be invoked again with a larger chromosome length. To avoid such inefficiencies, we opted for a phenotypical representation. This decision led to an easy encoding such that the chromosome consists of fixed length message blocks (or method call blocks, MCB), with each block consisting of the method-id and a parameter space large enough to admit all parameters of the method admitting the longest parameter list.

Therefore, crossover can be implemented in such a way that crossover points are either before or right after a method-id position of an MCB. That is, either a partial proper message sequence is combined with the remainder of another message sequence or, in case the cut is after the method-id, the new string consists of a substring ending with a method-id that obtained new parameters from another sequence. Growth and shrinking of the representation is obtained by means of mutation. Four mutation operators are proposed: MCB-insertion, MCB-deletion, Method-id-substitution, and Parameter-value update.
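The two kinds of crossover points and the four mutation operators can be sketched in Python as follows. The sketch makes several simplifying assumptions that are not part of the original description: a chromosome is a list of `(method_id, params)` tuples, every MCB carries exactly one integer parameter, cut positions are passed in explicitly, and the method-ids in `METHODS` are hypothetical.

```python
import random

METHODS = ["deposit", "withdraw"]   # hypothetical method-ids

def crossover(a, b, i, j, after_method_id=False):
    """Combine the head of a (cut at MCB i) with the tail of b (cut at MCB j)."""
    if after_method_id:
        # Cut right after the method-id: a's method-id receives b's parameters.
        hybrid = (a[i][0], b[j][1])
        return a[:i] + [hybrid] + b[j + 1:]
    # Cut before the method-id: whole MCBs are exchanged.
    return a[:i] + b[j:]

def mutate(seq, op, pos, rng, low=0, high=9):
    """Apply one of the four mutation operators at position pos; seq is not modified."""
    seq = list(seq)
    if op == "insert":          # MCB-insertion
        seq.insert(pos, (rng.choice(METHODS), (rng.randint(low, high),)))
    elif op == "delete":        # MCB-deletion
        del seq[pos]
    elif op == "substitute":    # Method-id-substitution, parameters kept
        seq[pos] = (rng.choice(METHODS), seq[pos][1])
    elif op == "update":        # Parameter-value update, method-id kept
        seq[pos] = (seq[pos][0], (rng.randint(low, high),))
    else:
        raise ValueError(op)
    return seq

PARENT_A = [("deposit", (5,)), ("withdraw", (2,))]
PARENT_B = [("deposit", (9,)), ("deposit", (1,)), ("withdraw", (3,))]
```

For instance, `crossover(PARENT_A, PARENT_B, 1, 0, after_method_id=True)` yields a child whose second block is `("withdraw", (9,))`, i.e. a method-id from the first parent with parameters from the second. Both operators are blind with respect to the CFSM, so the result may be an illegitimate sequence.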

As both crossover and mutation are essentially blind operations, one risks that they lead to strings that do not conform to the language admitted by the CFSM. This legitimacy problem can be treated in various ways.

• An obvious alternative would be to delete illegitimate strings right away. This has the advantage that one has only proper message sequences in the population. However, it has the disadvantage that it would constrain the diversity of the population.
• An alternative is to assign a very low (e.g., minimal) fitness value to illegitimate offspring. With this strategy one has to take care that the population will not be dominated by illegitimate candidates. Appropriate low fitness values will ensure that this is not the case. However, this strategy has the advantage that there remains a low probability that an illegitimate candidate survives and partakes in future crossover and mutation operations.
• Of course, one could also constrain both mutation and crossover operations in such a way that out of two syntactically correct parents only syntactically correct descendants would be produced. This, however, would not only lead to complex operations, but it also deviates too drastically from the principal ideas of GA. Therefore, this option was not pursued.

We decided on a mixed strategy: illegitimacy due to an illegitimate message selector led to the immediate death of such strings, whereas illegitimacy due to illegitimate parameter values allowed such strings to survive. To diminish the effect of illegitimate sequences, a penalty parameter is added to the objective function to punish semantic violations. Therefore, the objective function to be minimized consists of 4 parameters:

(1) The coarse grain distance D between the final state in the SMS and the target state's equivalence class in the CFSM currently aimed at.
(2) The fine grain distance d estimating the distance yet to be covered on the fine-grained level of implementation states. It consists of a component assessed from the d-values in the SMS so far established ($d_{\mathrm{historic}}$) and the distance between the final state and the target state computed at the level of the message causing divergence ($d_{\mathrm{future}}$).
(3) The current length of the message sequence L. This parameter is used to value short message sequences over longer ones.
(4) The semantic violation penalty P to punish candidates with currently illegitimate parameter values.

The fitness function is then given as:

$$\min \{\, 100\,D + 10\,d_{\mathrm{future}} + d_{\mathrm{historic}} + L + P \,\} \qquad (1)$$

Two selection strategies have been implemented: proportional selection (roulette wheel) and ranking selection [8].
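As a minimal sketch, the objective of Eq. (1) can be written as a plain weighted sum. The class `Evaluation` and its field names are hypothetical; how D, d, L and P are measured for a concrete candidate is outside the scope of this illustration.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    """Hypothetical container for the four ingredients of Eq. (1)."""
    coarse_distance: int   # D: distance on the level of CFSM equivalence classes
    d_future: float        # fine grain distance at the diverging message
    d_historic: float      # fine grain distance accumulated along the sequence
    length: int            # L: number of messages in the sequence
    penalty: float         # P: penalty for illegitimate parameter values


def fitness(e: Evaluation) -> float:
    """Objective of Eq. (1); smaller values are better."""
    return (100 * e.coarse_distance
            + 10 * e.d_future
            + e.d_historic
            + e.length
            + e.penalty)
```

Because D is weighted by 100, any candidate that has not yet reached the target equivalence class (D of at least 1) scores at least 100, whereas a short legitimate sequence ending in the target state scores far below that.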


5.2. Evolutionary programming

Evolutionary programming (EP) [15] was originally proposed by Fogel [22]. Conceptually, there are many similarities between EP and GA. The initial considerations that motivated comparing these two approaches were that GA usually uses fixed-length genotypical representations while EP uses variable-length phenotypical representations. Further, GA uses sexual reproduction while EP uses only asexual reproduction, i.e., new offspring are derived only by mutation. In the end, only the second argument remained, since, as described above, we had to use variable-length phenotypical representations in the GA solution as well. Strategically though, the CFSM played a slightly different role. While in the GA approach both crossover and mutation were performed blindly and the CFSM was used only to check the resulting offspring for their legitimacy and to provide information for computing the overall fitness, in the EP approach information from the CFSM was indirectly used to constrain mutation. In order to do so, a grammar $G = (N, \Sigma, P, S)$ was derived from the CFSM such that

• The set of non-terminal symbols $N$ corresponds to the states $Q$ of the CFSM.
• The set of terminal symbols $\Sigma$ was identical for both the grammar ($\Sigma_G$) and the finite state machine ($\Sigma_{CFSM}$).
• The set of production rules $P$ was derived from the transition function $\delta$. However, here a distinction had to be made between nodes that were transient (i.e., the test sequence is not yet completed) and nodes that were to be considered as target states (the test sequence should lead to exactly those states). Thus, one needs type-1 productions of the form $\forall n_i, n_k \in N, \sigma_j \in \Sigma \mid (\delta(n_i, \sigma_j) = n_k \wedge n_k \notin F) \Rightarrow n_i \to \sigma_j n_k$ and type-2 productions¹ of the form $\forall n_i, n_k \in N, \sigma_j \in \Sigma \mid (\delta(n_i, \sigma_j) = n_k \wedge n_k \in F) \Rightarrow n_i \to \sigma_j$. Each of these production rules is guarded by the same predicate that guarded the respective item of the transition function.
• The start symbol $S$ corresponds to the initial state $q_0$ of the CFSM.

The grammar $G$ generates the test sequences admissible for the class under test, i.e., the language $L_I(CUT)$. Hence, the phenotypical representation sought was exactly this test sequence.
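The derivation of the production rules from the transition function can be sketched as follows. This is an illustration under simplifying assumptions: the transition function is given as a dictionary, the predicates guarding the productions are omitted, and the function names and the three-state example machine are hypothetical.

```python
import random


def derive_productions(transitions, final_states):
    """Map each transition delta(n_i, sigma_j) = n_k to a production.

    transitions: dict mapping (state, symbol) -> next state.
    Returns a list of (lhs, rhs) pairs, where rhs is a tuple.
    """
    productions = []
    for (src, symbol), dst in sorted(transitions.items()):
        if dst in final_states:
            productions.append((src, (symbol,)))      # type-2: n_i -> sigma_j
        else:
            productions.append((src, (symbol, dst)))  # type-1: n_i -> sigma_j n_k
    return productions


def generate(productions, start, rng, max_steps=50):
    """Derive one sentence by always expanding the trailing non-terminal."""
    sentence, current = [], start
    for _ in range(max_steps):
        options = [rhs for lhs, rhs in productions if lhs == current]
        if not options:
            return None               # dead end: no production applicable
        rhs = rng.choice(options)
        sentence.append(rhs[0])
        if len(rhs) == 1:             # type-2 production ends the derivation
            return sentence
        current = rhs[1]
    return None                       # step budget exhausted


# Hypothetical machine: q0 -a-> q1, q1 -b-> q1, q1 -c-> q2 (target state).
example = {("q0", "a"): "q1", ("q1", "b"): "q1", ("q1", "c"): "q2"}
rules = derive_productions(example, final_states={"q2"})
```

Since every type-1 production leaves exactly one non-terminal at the end of the string, a derivation can only grow at its end, which matches the restriction placed on insertion mutation.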

Following a generative strategy, the EP algorithm started by creating an initial population of n (trivial) elements consisting just of the start symbol. Through the application of different randomly chosen mutation operations, these initially identical elements diverge into a set of different members (message sequences with proper parameters). Thus, in EP, mutation is the crucial operation. In this case, two different versions of mutation operations are needed: insertion mutation appends a new method invocation with randomly chosen parameter settings to a given string, and parameter value mutation randomly changes a parameter value but leaves the method-ids unchanged. Therefore, parameter value mutation also leaves the length of the message string constant.

For insertion mutation, the productions of the grammar play a role in so far as insertions can only take place at the end of the string (location of the non-terminal) and the algorithm randomly chooses only among productions eligible on the basis of the transformed grammar. Parameter values are generated randomly by a parameter value generator. A length-increasing mutation takes place only if the generated parameter is legitimate according to the predicate guarding the production rule used. Thus, with insertion mutation, only legitimate strings can be generated.

¹ One could dispense with the type-2 productions by allowing ε-productions after a target state, i.e., a state in F, has been reached. This option is relevant in so far as the actual implementation kept information about the states that were traversed throughout the process.

Parameter value mutation scans over the string of parameterized messages generated so far and changes the respective parameters according to a random regime considering the parameters' type. Thus, in a single reproductive step, various parameters might be changed, and these changes happen blindly w.r.t. the grammar and the CFSM. Hence, as with GA, illegitimate strings might result. However, different from GA, there is no reason to keep them for simulating parallel mutations. Any parameter has a chance to mutate anyway due to multiple parameter mutation in a single step. Thus, with EP a check-and-repair mechanism is invoked. It scans the string up to the first conflicting parameter that makes this string illegitimate. Here, an attempt to solve the problem is made by replacing the conflicting value by the parameter value from the parent string, and the check-and-repair mechanism continues. If this strategy is insufficient due to domino effects of earlier, locally admissible parameter changes, backtracking occurs that might eventually lead to retaining the original individual. Thus, none of the mutation operations causes persisting syntactic violations in the population subjected to EP.

The parameters of the fitness function for EP are similar to those of the fitness function of GA (Eq. (1)). Parameters D and L are used as in GA, P is not necessary, since all elements of the population are syntactically correct, and d is redefined to take advantage of the information in the predicates attached to production (resp. transition) rules. d is defined by the divergence of the actual parameter value from the value(s) in the predicate that would allow this string to progress with minimal D to the target state. This distance can be computed for each node, as the CFSM often contains only a moderate number of equivalence classes.

After mutation the population of n individuals is duplicated. Before entering the next evolutionary cycle, the population has to be reduced again to n members. This selection is done by a q-tournament selection strategy [3] such that the parameter q determines how many randomly selected members take part in a tournament. The rank obtained in the tournament on the basis of the fitness value of the individual with respect to other tournament members determines the member's survival.
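The reduction step can be sketched with a win-counting variant of q-tournament selection that is common in evolutionary programming. Whether the implementation ranks tournament members in exactly this way is an assumption; the function name, the sampling of opponents with replacement and the tie-breaking rule are choices made for this illustration only.

```python
import random


def q_tournament(pool, fitness, n, q, rng):
    """Reduce `pool` (parents plus offspring) to n survivors.

    Every individual meets q randomly drawn opponents and scores one win
    for each opponent whose fitness is not better. Fitness is minimized.
    """
    scored = []
    for idx, individual in enumerate(pool):
        own = fitness(individual)
        opponents = rng.choices(pool, k=q)   # drawn with replacement
        wins = sum(1 for opp in opponents if own <= fitness(opp))
        scored.append((wins, own, idx))
    # Most wins first; ties are broken in favour of the lower fitness value.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [pool[idx] for _, _, idx in scored[:n]]
```

With a small q the selection pressure is weak and weaker individuals survive more often; with a large q the outcome approaches a plain truncation to the n best individuals.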

Two stopping criteria were defined. The EP algorithm terminates either after a predefined maximal number of iterations has been reached or if all of the population members have reached the target state. Stopping only after all members have reached the target state, and not already when one member does, serves to avoid premature termination with suboptimal (i.e., too long) message sequences.

5.3. Ant-colony system

This approach was designed in full contrast to the GA and EP solutions. It had its justification in the successful use of ant systems and ant-colony algorithms (AS and ACS) for graph traversal problems such as the traveling salesman problem. As the GFSM and the related CFSM are graphs, and both the state-reachability and the state-coverage problem can be seen as graph traversal problems, it seemed appropriate to compare the effectiveness of ACS with the strategies mentioned so far.

Ant systems fall into the category of swarm intelligence systems [10,33]. The basic idea behind ant systems (AS) is that a number of relatively simple individuals (ants) autonomously explore a search space. These individuals cooperate indirectly by leaving some information in their environment which helps others with heuristic orientation in the search space. In nature, this information transfer is provided by ants marking the trail they passed with a chemical substance called pheromone. Foraging ants have a higher probability of following a trail with a high concentration of pheromone than of following an unmarked trail or a trail with low pheromone concentration. As ants returning from a food source reinforce the pheromone concentration, others will recognize a short path. As pheromone evaporates over time, paths to depleted food sources will eventually be given up. The classical paper on using ant systems for optimization problems reported the application of AS to the traveling salesman problem by Dorigo et al. [18]. The ant-colony system (ACS) is an extension of the AS algorithm as proposed by Dorigo and Gambardella [17,19].

Formally, ACS rests on the idea that a combinatorial optimization problem $P = (S, \Omega, f)$ is defined by a search space $S$ over a finite set of discrete decision variables $X_i$, a set of constraints $\Omega$, and an objective function $f : S \to \mathbb{R}_0^+$ which is to be minimized. The generic variables $X_i$ take values from a domain set $D_i = \{v_i^1, \ldots, v_i^{|D_i|}\}$. A feasible solution $s \in S$ is a complete assignment of values to variables that satisfies all constraints in $\Omega$. $s^* \in S$ is a global optimum iff $\forall s \in S : f(s^*) \le f(s)$.

The solution is sought by having a population of artificial ants traverse the problem space (graph), evaluate the solution, and, depending on the quality of the solution obtained by an individual ant, update the pheromone concentration on the path it has followed. This algorithm runs until a termination condition is met. It can be further improved by some local search heuristics.

With the test data generation problem, the state-coverage problem amounts to finding shortest paths from the origin to all states. The reachability problem involves finding the shortest path to a given target state. One might note here a difference to the two approaches discussed above. While in GA and EP the target has essentially been addressed statewise, in ACS the full graph of the CFSM is always seen as the space where ants can travel. Hence, even while some overall iterative strategy has to be foreseen to obtain full state coverage, it seems reasonable to check off those states that are covered while striving for a more remote target. To benefit from such accidental hits, a scoreboard for states takes note of premature coverage of states. The starting state $q_0$ is initially considered as the "nest" of all n ants. The target state is considered as the "source of food" to be reached in an efficient manner. When an ant reaches the target state it is deactivated and waits for the others to complete their journey.

To capture traffic information, the edges of the CFSM have to be extended such that they can capture pheromone laid by ants.² To stay on the CFSM level, ants were also slightly empowered so that they "carry" the information contained in the GFSM. Thus, ants know the actual value of the implementation variable. To traverse a transition, the ant has to generate a message satisfying the condition related to the respective transition (message). On traversing the respective transition, the action related to this transition is performed. Thus, the ant arrives at the next state with an updated value of the fine granular implementation variable. This allows retrieving the message sequence from the ant's memory after it has reached the target state.

² To keep the search space of moderate size, the pheromone information is attached to the CFSM transition equivalence classes and not to the GFSM raw transitions.

Selection of an edge occurs in a dual procedure. If a randomly drawn value is less than a predefined threshold, the next edge is chosen randomly. If it meets or exceeds this threshold, the edge with the highest pheromone concentration is chosen deterministically. After the transition to use has been chosen, the ant randomly generates the parameters syntactically needed for this transition. These, in conjunction with the GFSM-state value, have to be checked against this transition's predicate. The specifics of the interdependence between parameter generation and the transition predicate need to be considered. If the evaluation determines that an ant cannot reach an intended state, this state is put on a temporary tabu list. If the ant has exhausted all possibilities of leaving the state it is currently in (i.e., due to interactions with previous choices it got stuck), it is transferred back to the "nest".
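The dual edge-selection procedure can be sketched in a few lines. The sketch follows the wording of the paragraph above literally (a draw below the threshold triggers a random choice); the function name `choose_edge`, the parameter name `threshold` and the dictionary-based pheromone table are hypothetical, and the tabu list as well as the predicate check are left out.

```python
import random


def choose_edge(edges, pheromone, threshold, rng):
    """Pick the next transition for an ant.

    edges: outgoing transitions of the current state (non-empty list).
    pheromone: dict mapping each edge to its pheromone concentration.
    """
    if rng.random() < threshold:
        return rng.choice(edges)                   # random exploration
    return max(edges, key=lambda e: pheromone[e])  # follow strongest trail


# Hypothetical example: two transitions leaving state q0.
edges = [("q0", "q1"), ("q0", "q2")]
pheromone = {("q0", "q1"): 0.2, ("q0", "q2"): 0.7}
```

With `threshold = 0.0` the ant always follows the strongest trail, and with `threshold = 1.0` it always picks an edge at random.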

Pheromone update is performed after every iteration. According to ACS, a local and a global pheromone update are performed. Locally, the amount of pheromone $\tau_{ij}$ on edge (transition) $t_{ij}$ corresponding to $\delta(i, pv) = j$ receives the new value $\tau'_{ij} = (1 - \varphi)\,\tau_{ij} + \varphi\,\tau_0$, with $\tau_0$ being the initial value of pheromone and $\varphi \in (0, 1]$ being the pheromone decay coefficient. Global pheromone update is performed to emphasize links on the best path discovered so far. The amount of pheromone accumulated on transitions of the CFSM lying on the shortest path from the origin ($q_0$) to the target state is increased according to the equation $\tau'_{ij} = (1 - \rho)\,\tau_{ij} + \rho\,\Delta\tau_{ij}^{best}$, where $\rho \in (0, 1]$ is a global pheromone update coefficient and $\Delta\tau_{ij}^{best}$ is 1 over the length of the best path if arc $t_{ij}$ is located on this path. Otherwise, it is 0.
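Both update rules can be sketched directly from the two equations. The names `local_update`, `global_update`, `phi` (local decay coefficient) and `rho` (global update coefficient) are hypothetical, the numeric values are purely illustrative, and applying the global rule only to the arcs of the best path is an assumption that follows the usual ACS formulation.

```python
def local_update(tau, edge, phi, tau0):
    """Local rule: pull the pheromone on `edge` back towards tau0."""
    tau[edge] = (1 - phi) * tau[edge] + phi * tau0


def global_update(tau, best_path, rho):
    """Global rule: reinforce every arc on the best path found so far."""
    delta = 1.0 / len(best_path)       # 1 over the length of the best path
    for edge in best_path:
        tau[edge] = (1 - rho) * tau[edge] + rho * delta


# Illustrative values only.
tau = {("q0", "q1"): 0.5, ("q1", "q2"): 0.2}
local_update(tau, ("q0", "q1"), phi=0.09, tau0=0.2)
after_local = tau[("q0", "q1")]
global_update(tau, [("q0", "q1"), ("q1", "q2")], rho=0.4)
```

The local rule makes a frequently used arc less attractive for the ants that follow, which spreads exploration, while the global rule concentrates pheromone on short paths because the reinforcement term grows as the best path gets shorter.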

As stopping criterion, reaching a maximum number of iterations is used. When the optimal solution was known beforehand, an ant arriving at an optimal solution was used as an alternative stopping criterion.

5.4. Summary of algorithms

Before entering the evaluation section, one might mention already that the three meta-heuristics used fall into two diverse categories: variations of genetic algorithms and swarm intelligence. In all three cases, the interaction between the parameter values to be chosen and the fine granular state information that has to be preserved to adequately account for the curricular history of the CUT caused specific problems for the algorithms. In the case of GA and EP it may cause offspring to be illegitimate according to the "grammar" of the input language admitted by the CUT; in the case of ACS it might cause ants to get stuck at a certain node. It will be interesting to see how the solutions foreseen cope with this problem during the evaluation.

A further aspect to be mentioned is that, due to the phenotypical representation used with GA, the basic ideas underlying the GA and the EP solution differed less than foreseen when designing this experiment. With different strategies for handling the case of illegitimate offspring, there is still some merit in exploring both solutions.

6. NITOT environment

The environment for nature-inspired techniques for object-oriented testing (NITOT environment) is intended to provide a general framework for software testing, in particular for conformance testing. As illustrated in Fig. 3, NITOT consists of the following components:

• The Program loader is responsible for loading into memory the class under test (CUT) of the system under test (SUT).
• The Specification loader is responsible for loading the specification of the CUT or SUT. Depending upon the meta-heuristic (triggered by the case generator), a particular form of the FSM (the CFSM corresponding to the CUT's specification) is used. This automaton provides the basic information for the nature-inspired optimizer.
• The Monitoring unit is responsible for monitoring the testing process by progressively specifying the testing adequacy criteria (test coverage) that the optimization should meet and by activating the simulator, which launches the search for test cases, and the execution driver, which executes the selected class with the automatically generated test cases. Thus, the monitoring unit includes:
  - Class collector, which specifies the class or the set of classes to be checked for conformance and their specifications.
  - Data manager, which tracks the optimization goals and manages the test cases during the optimization process.
  - Simulator, which coordinates the process of test case execution of the CUT. It activates and controls the execution driver and the case generator.
  - Comparator, which compares the results obtained from executing the test driver with the content of the actual state of the specification that served as target state of the respective message sequence.
• The Case generator gathers test cases from the nature-inspired algorithms. It also provides some utilities used by these algorithms. Nature-inspired algorithms are search strategies that are applied to the specifications of the CUT to generate adequate test cases. At this stage, only three algorithms (evolutionary programming, ant-colony optimization and genetic algorithm) are available, but alternative strategies can be plugged in in the future.
• The Execution driver serves to execute the CUT using the test cases obtained by the test case generator.

Fig. 3. Architecture of the NITOT environment.

7. Experimental evaluation

For the sake of evaluation, we use several benchmarks. Since the objective of the present study is to deal with isolated classes, we have chosen benchmarks that represent individual classes, each nevertheless modeling an entire application. In Section 7.1, each of these benchmarks is briefly described.

In this evaluation study, we applied the simple all-states coverage criterion, which is achieved when the test reaches every state in the model at least once. The coverage results and analysis of each of the proposed algorithms are discussed in Sections 7.2–7.4.
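The all-states criterion can be made concrete with a minimal sketch that replays message sequences on a state machine and reports the fraction of states visited. Guards, parameters and actions are ignored here, and the function names and the three-state example machine are hypothetical.

```python
def visited_states(transitions, start, sequence):
    """Replay one message sequence and return the list of states visited."""
    state, visited = start, [start]
    for message in sequence:
        state = transitions[(state, message)]
        visited.append(state)
    return visited


def all_states_coverage(transitions, start, states, suite):
    """Fraction of model states reached by at least one sequence in `suite`."""
    covered = set()
    for sequence in suite:
        covered.update(visited_states(transitions, start, sequence))
    return len(covered & set(states)) / len(states)


# Hypothetical machine: q0 -a-> q1 -b-> q2.
fsm = {("q0", "a"): "q1", ("q1", "b"): "q2"}
states = ["q0", "q1", "q2"]
```

In this example the suite `[["a"]]` reaches two of the three states, while `[["a", "b"]]` reaches all of them and therefore satisfies the all-states criterion.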

7.1. Benchmarks

The following five benchmark specifications have been used for assessing the relative merits of the evaluated algorithms:

(1) Account: This benchmark has been used on several occasions in the testing literature [30]. The class implements a bank account with possible operations on it (see Fig. 4). It has 5 states and 17 transitions.
(2) Locks: This class has been created by the authors. It models the consumer-producer problem and consists of 7 states and 12 transitions (see Fig. 5). This benchmark is useful because the messages have Boolean parameters, which are known to cause difficulties in white-box testing.
(3) Stack: This benchmark models the behavior of a stack. The Stack consists of 4 states and 7 transitions (see Fig. 6). It also contains messages with Boolean parameters and predicates with Boolean variables. Stack is further of interest as it has, on the basis of a very simple FSM model for specifying the stack's behavior, a tricky interaction between the usual FSM on the specification level and the related state model on the implementation level. State full will be reached only after a sufficient number of push operations without intermittent pop operations having been performed (strictly, push() − pop() > (stackCapacity − 1)).
(4) Coffee Vendor Machine (CVM): This benchmark models the behavior of a coffee machine after inserting a coin. CVM consists of 5 states and 10 transitions (see Fig. 7). It also contains messages with Boolean parameters and predicates with Boolean variables.
(5) Exam: This is also proposed by the authors. The particularity of this benchmark is that attributes and method parameters are of character type. It consists of 7 states and 6 transitions (see Fig. 8). Beyond this, Exam has the interesting feature that it models certain aspects of the curriculum of the object represented, but these aspects are dependent on some internal variable (Ans) that matters for the legitimacy of transitions from state After Second but are not explicitly captured in the state information After Init and After First. Thus, in an exam of a modeling class, a student producing such a state diagram to represent the problem would probably flunk the exam. However, in practical situations, one does meet state diagrams with such hidden interactions. Therefore, the example has been introduced into the evaluation benchmark to see how the algorithm would cope with such implicit interactions between specification states and implementation states.

Fig. 4. CFSM of Account.

Table 1. Setting of GA parameters.

| Parameter | Value/strategy |
|---|---|
| Population size | 40 |
| Penalty, K | 50 |
| Insert probability | 0.05 |
| Delete probability | 0.05 |
| Substitute probability | 0.05 |
| Parameter update probability | 0.05 |
| Generation | Simple |
| Selection | Ranking |
| Stopping criterion | avgFitness<30 |

The diagrams in Figs. 4–8 graphically represent CFSMs. The circles indicate the equivalence classes of states, labeled with the label of the specification FSM. The arrows show the equivalence classes of transitions. They are labeled by the method-id characterizing this transition equivalence class with its associated parameter. The angular brackets enclose the predicates on the transition. The curly brackets include the actions to be executed once the transition is traversed.

7.2. Evaluation of GA

Following the methodology described in Section 5.1, the genetic algorithm produces chromosomes whose quality (fitness) is controlled by the closeness (or inversely the distance) to the target state in the FSM. The GA starts with a population of n chromosomes of length 1 and iterates by applying the genetic operators. Depending on the evaluation goals, the algorithm stops when either a solution is found, the average fitness drops below a threshold, the execution time exceeds a certain threshold, or the number of iterations exceeds a certain maximum.

GA is run on the five benchmarks Account, CVM, Stack, Locks, and Exam using the parameter setting shown in Table 1. Note that the stopping criterion is avgFitness<30. This is because the value of D (in Eq. (1)) is multiplied by 100 (for scaling purposes). Any sequence with a fitness value less than 100 indicates that the final state is the target state; hence the criterion allows finding a group of solutions, not only one. We set the stopping criterion to 30, but it could be set to any value below 100 (smaller values are preferred to obtain multiple solutions). The other parameters are set based on either default values (mutation, crossover) or initial empirical experiments (typical parameters: penalization, average fitness, etc.). As these parameter values are crucial with all algorithms, we ran the experiments 30 times each.

Fig. 5. CFSM of Locks.

Fig. 6. CFSM of Stack.

Table 2. Time required for full coverage by GA.

| Benchmark | Cov. time (s) |
|---|---|
| Account | 1.589 |
| CVM | 2.021 |
| Stack | 2.264 |
| Locks | 6.187 |
| Exam | 7.253 |

The values of the parameters are obtained by varying each parameter (or strategy) in its allowed range (resp. set). This analysis was guided by the legitimacy ratio of the population (meaning the proportion of legitimate runnable sequences). GA is then executed on the five benchmarks to fulfill the all-states coverage. This has been achieved after different durations, as shown in Table 2. The most demanding benchmark is Exam, while the least demanding one is Account.

GA was able to achieve full coverage for each of the benchmarks. For the Account case (Table 3), each state is reached in less than a second. The shortest sequences of length 2 (e.g., account(), withdraw(100)) are found, as shown in Fig. 9. For some states the average legitimacy ratio is very low (15% of the population is legitimate). This is because during the algorithm's initialization the population is generated randomly, usually with a low legitimacy ratio. However, the most important goal, finding at least one test case, has been achieved.

In the CVM benchmark case, reaching the state busy took twice as long as the longest sequence in the Account's FSM (Table 4). GA was able to deal efficiently with this benchmark, which contains non-parameterized methods in the specification. Likewise, the state full in the Stack benchmark requires a long message sequence. However, in this example it can be seen from the average solution length that the algorithm is continuously adding messages to the sequence (Fig. 10).

The Locks benchmark is solved quickly by GA; 100% coverage is reached (Table 6). The Boolean parameters in the specification are efficiently handled. Fig. 11, pertaining to the state emptyCLock, shows the typical evolution of the message length together with the objective minimization values as the number of iterations increases. Moreover, the Stack benchmark is also processed conveniently by GA, achieving 100% coverage (Table 5). However, reaching the state Full requires a long message sequence, as shown in Fig. 12.

Fig. 7. CFSM of CVM.

Fig. 8. CFSM of Exam.

Table 3. Results of the benchmark Account.

| State | Avg #iter | Avg time | Avg coverage | Avg #iter for the 1st solution | Avg legitimacy ratio |
|---|---|---|---|---|---|
| Overdrawn | 16.7 | 0.812 | 100% | 12.40 | 82% |
| inCredit | 1.0 | 0.015 | 100% | 1.00 | 15% |
| Blocked | 7.6 | 0.381 | 100% | 1.00 | 15% |
| Final | 7.7 | 0.381 | 100% | 4.10 | 75% |

Fig. 9. Overview of the state overdrawn (Account).

Fig. 10. Overview of the state buy (CVM).

Fig. 11. Overview of the state empty (Locks).

The Exam benchmark is the most difficult one of all. On the path leading to the state good, it contains an equality condition which refers to a transition encountered in previous steps. Despite that, 100% coverage was reached in each run (Table 7). With the state good, the convergence requires more iterations because of the predicate on the transition leading to it. Usually equality conditions are hard to fulfill and might even lead to a plateau, as shown in Fig. 13. On average the state good was reached after 71 iterations.

These experiments show that GA, as proposed in this paper, is an interesting approach to deal with conformance testing. However, equality predicates pose a challenge to the algorithm. More discussion on this will follow in Section 7.5.

Table 4. Results of the benchmark CVM.

| State | Avg #iter | Avg time | Avg coverage | Avg #iter for the 1st solution | Avg legitimacy ratio |
|---|---|---|---|---|---|
| Busy | 21.6 | 0.990 | 100% | 18.30 | 75% |
| Idle | 10.3 | 0.309 | 100% | 6.30 | 69% |
| notEmpty | 18.6 | 0.715 | 100% | 6.30 | 69% |
| Off | 1.0 | 0.007 | 100% | 1.00 | 20% |

Table 5. Results of the benchmark Stack.

| State | Avg #iter | Avg time | Avg coverage | Avg #iter for the 1st solution | Avg legitimacy ratio |
|---|---|---|---|---|---|
| empty | 1.0 | 0.022 | 100% | 1.00 | 35% |
| notEmpty | 5.7 | 0.167 | 100% | 2.40 | 70% |
| full | 19.5 | 2.075 | 100% | 15.70 | 82% |

Table 6. Results of the benchmark Locks.

| State | Avg #iter | Avg time | Avg coverage | Avg #iter for the 1st solution | Avg legitimacy ratio |
|---|---|---|---|---|---|
| emptyUnlock | 1.0 | 0.017 | 100% | 1.00 | 23% |
| emptyCLocked | 40.2 | 2.601 | 100% | 36.50 | 75% |
| notEmptyPLocked | 12.7 | 0.399 | 100% | 8.90 | 75% |
| notEmptyUnlock | 21.9 | 1.065 | 100% | 17.70 | 75% |
| emptyPLocked | 10.0 | 0.301 | 100% | 4.30 | 71% |
| notEmptyCLocked | 32.9 | 1.804 | 100% | 29.10 | 76% |

Table 7. Results of the benchmark Exam.

| State | Avg #iter | Avg time | Avg coverage | Avg #iter for the 1st solution | Avg legitimacy ratio |
|---|---|---|---|---|---|
| AfterSecond | 8.9 | 0.287 | 100% | 8.9 | 83% |
| Bad | 11.7 | 0.539 | 100% | 7.9 | 78% |
| Medium | 53.6 | 2.782 | 100% | 50.2 | 78% |
| Good | 71.1 | 3.645 | 100% | 67.7 | 84% |

Fig. 12. Overview of the state full (Stack).

Fig. 13. Overview of the state good (Exam).

Fig. 14. Minimization of the objective function.

Table 8. Setting of EP parameters.

| Parameter | Value (Exam) | Value (Account, CVM, Stack, Locks) |
|---|---|---|
| n | 30 | 30 |
| pIM | 0.4 | 0.8 |
| pPVM | 0.2 | 0.05 |
| q | 2n | n |
| bmax | 10,000 | 10,000 |
| bmutation | 100 | 100 |
| NumOfIterations | 200 | 200 |

7.3. Evaluation of EP

EP relies on many parameters that need to be set. These include the population size (n), the insertion mutation probability (pIM), the parameter value mutation probability (pPVM), the tournament selection parameter (q), the maximum integer number (bmax), the maximum mutating value (bmutation), and the number of iterations (NumOfIterations). The assigned values are shown in Table 8.

Similar to GA, the values of these parameters are obtained by varying each parameter in its respective range. This analysis has been guided by the convergence (fitness curve) and execution time. As one can notice, for the benchmark Exam, parameters are set differently. This is due to the difficulty of reaching the state good using the same setting as the other benchmarks (see Table 9). Such difficulty has already been raised in Section 7.2 when applying GA.

Running the EP algorithm on the benchmarks, we have obtained the results shown in Table 9. Clearly, EP achieves the all-states coverage very quickly for the classes Account, CVM, Locks and Stack. The benchmark Exam required more time to be covered, due to the state good. Of importance is the length of the message sequences. In all coverage tests with respect to all benchmarks' states, the generated sequences are minimal.

Figs. 14 and 15 illustrate the evolution of the objective function (resp. fitness function) and the sequences' length for reaching the most demanding states portrayed in column 6 of Table 9 for the Account, CVM, Locks, and Stack benchmarks. Details related to the state good of the benchmark Exam are shown in Fig. 16.

Table 9. Coverage results of the benchmarks by EP.

| Benchmark | #Iter. | Time (ms) | Coverage | Optimal sol. | Difficult state |
|---|---|---|---|---|---|
| Account | 18.00 | 689 | 100% | 100% | Blocked |
| CVM | 24.30 | 460 | 100% | 100% | Busy |
| Stack | 26.40 | 976 | 100% | 100% | Full |
| Locks | 43.00 | 684 | 100% | 100% | EmptyCLocked |
| Exam | 206.78 | 434 | 100% | 100% | Good |

Fig. 15. Evolution of the sequences length.

Fig. 17. Coverage of Account states.

Fig. 18. Coverage of CVM states.

It is important to note that EP is very effective in handling the set of benchmarks. Compared to GA, EP looks even more efficient, as suggested by Tables 2 and 9.

7.4. Evaluation of ACS

In order to set the parameters of the ACS algorithm, a reasonably complex graph has been devised. It contains 27 states and 81 transitions, but without conditions on the transitions and without appended actions. The parameters influencing the cooperation between ants are $\tau_0 \in (0,1)$ (the initial pheromone on every transition), $p_0 \in [0,1]$ (the transition probability threshold; higher values of $p_0$ promote exploitation of the pheromone and low values of $p_0$ encourage ants to randomly explore the search space), $\rho \in [0,1]$ (pheromone evaporation, actually $1-\rho$; a higher value leads to faster evaporation), and $\varphi \in [0,1]$ (a parameter used in the local update; a high value leads to high evaporation).

The best values, corresponding to the shortest path between a given starting node and a given target node, were found to be $\tau_0 = 0.2$, $\rho = 0.4$, $\varphi = 0.09$, and $p_0 = 0.6$. Moreover, the number of ants is set to 4, the maximum number of iterations is set to 10, and the maximum number of steps (standing for the number of times an ant is allowed to restart a tour) is set to 60. Intuitively, the number of steps has to be at least as large as the length of the shortest path to the target state. In fact, on average the optimal path is 15 transitions long. However, increasing this number excessively implies much computational time.

Fig. 16. Objective and length evolution for the state good of Exam.

Relying on these parameters, the ACS algorithm has easily achieved the all-states coverage on the three benchmarks Account, CVM and Locks, as shown in Figs. 17–19. The maximum number of iterations required by ACS to reach any state in these benchmarks is 3, which is small. This illustrates the efficiency of the ACS algorithm.

More interesting is the case of the states full and good in the Stack and Exam problems (Figs. 20 and 22). They have not been covered by the algorithm. However, after a close look into the reason, it turned out that the state full requires a larger number of iterations to get covered. It is easy to obtain at least one sequence that reaches the state full, as shown in Fig. 21. As to the state good in Exam (see Fig. 22), the condition on the transition leading to it, that is Ans1 == 'B' && Ans2 == 'x', has to be fulfilled. It seems that the characters 'B' and 'x' could not be generated so that this condition gets satisfied, meaning that the objective function has reached a plateau such that successive iterations no longer produce better results.

Fig. 19. Coverage of Lock states.

Fig. 20. Coverage of Stack states.

Fig. 21. Coverage of the state full of Stack

Fig. 23. Difficult transition.


7.5. Discussion

The three algorithms look very efficient in dealing with theproblem of conformance testing. Although the benchmarks are nottoo complex (for the sake of illustration), some of them posechallenges as described in Sections 7.1–7.4. The most difficult issueis the equality predicates. In order to get them fulfilled, an exactvalue is required. Without additional execution time, equalityconditions are hardly satisfied. However, for an efficient treatment,additional investigations are required to devise mechanisms toguide the search towards particular values of interest.

Fig. 22. Coverage of Exam states.

Another problem that requires more attention arises when the fulfillment of a predicate depends indirectly on previous actions. For instance, in Fig. 23 the CFSM contains a single Boolean variable V1, initially set to false. The condition on the transition from S1 to S2 cannot be satisfied unless the transition from S1 to S1 is traversed and M1() is called with the value 7777. In such a scenario one needs some chaining mechanism [21] that allows tracking back the definition of the variable, so that an updated sequence including M1() is generated. This sets V1 to true, after which M1() can be executed to reach the goal state S2.
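The dependency in this example can be illustrated with a small sketch. The encoding below is hypothetical: the transition labels, the guards and the effect on V1 are inferred from the description of Fig. 23, and an exhaustive breadth-first search over pairs (state, V1) stands in for a real chaining mechanism, which would instead analyze where V1 is defined.

```python
from collections import deque

# Hypothetical encoding of the CFSM of Fig. 23.
# Each transition: (source, label, guard on V1, effect on V1, target).
TRANSITIONS = [
    # Self-loop on S1: calling M1 with 7777 is assumed to set V1 to true.
    ("S1", "M1(7777)", lambda v1: True, lambda v1: True, "S1"),
    # Transition to S2: enabled only once V1 is true; V1 is left unchanged.
    ("S1", "M1()", lambda v1: v1, lambda v1: v1, "S2"),
]


def shortest_sequence(start="S1", goal="S2", v1=False):
    """Breadth-first search over (state, V1) pairs; returns the call labels."""
    queue = deque([(start, v1, [])])
    seen = {(start, v1)}
    while queue:
        state, value, path = queue.popleft()
        if state == goal:
            return path
        for src, label, guard, effect, dst in TRANSITIONS:
            if src == state and guard(value):
                nxt = (dst, effect(value))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((dst, nxt[1], path + [label]))
    return None  # goal not reachable from the start configuration


sequence = shortest_sequence()
```

The search first takes the self-loop that defines V1 and only then the guarded transition, which is the ordering a chaining mechanism has to discover. Exhaustive search is feasible here only because the variable is Boolean; for integer or string variables the configuration space grows too quickly, which is why a targeted mechanism is needed.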

The proposed techniques are efficient when the CFSM reflects the correct specifications of the problem. They offer the opportunity to search the test suite via a pseudo-parallel exploration methodology. Moreover, EP and GA are similar in terms of coverage, but they differ in terms of efficiency. EP appears to be the most efficient approach. The reason is that EP constructs feasible solutions, while GA seems to work blindly, mainly because of the crossover operation. ACS, on the other hand, was trapped by examples where one specification state is mapped to a single point in a large number of implementation states.

As a last observation, the coverage criterion may not be strong enough for generating conformance test suites. Other criteria, such as all transitions, all n-transitions, etc., are planned to be studied in future investigations.

8. Conclusion

The present paper deals with the problem of conformance testing. It suggests the CFSM formalism, which allows a class to be checked against its specification. Three nature-inspired meta-heuristics are proposed, each with a particular perspective on the CFSM (set of words, regular grammar, graph). These meta-heuristics have been evaluated on various benchmarks. Evolutionary programming has been found to be slightly better than the other meta-heuristics due to its knowledge-based approach. However, the first aim is to show that such meta-heuristics can efficiently deal with the problem of test case generation for the purpose of conformance testing, knowing that each of these heuristics needs a particular representation.

As primary future work, we intend to follow the same line of research at the cluster level and the system level, relying on the combination of class CFSMs. Moreover, mechanisms for solving equality predicates and string-based predicates need to be devised.

Acknowledgment

The authors would like to thank the anonymous reviewers for their valuable comments, which helped to improve the manuscript.

References

[1] J. Al-Dallal, P. Sorenson, Reusing class-based test cases for testing object-oriented framework interface classes, Journal of Softw. Maintenance 17 (3) (2005) 169–196.

[2] S. Ali, L. Briand, M. Rehman, H. Asghar, M. Iqbal, A. Aamer Nadeem, A state-based approach to integration testing based on UML models, Inform. Softw. Technol. 49 (11–12) (2007) 1087–1106.

[3] T. Back, R. Gunter, H. Schwefel, Evolutionary programming and evolution strategies: Similarities and differences, in: Proceedings of the 2nd Annual Conference on Evolutionary programming, Evolutionary Programming Society, 1993.

[4] A. Baresel, H. Pohlheim, S. Sadeghipour, Structural and Functional Sequence Test of Dynamic and State-based Software with Evolutionary Algorithms, GECCO, 2003, pp. 2428–2441.

[5] A. Bertolino, Software testing research: achievements, challenges, dreams, in: Proceedings of the International Conference on Software Engineering (Future of Software Engineering), IEEE Computer Society, 2007, pp. 85–103.

[6] S. Beydeda, V. Gruhn, Integrating white- and black-box techniques for class-level regression testing, in: Proceedings of the 25th International Computer Software and Applications Conference on Investigating Software Development, 2001.

[7] R. Binder, Testing Object-oriented Systems: Models, Patterns, and Tools, Addison-Wesley, 1999.

[8] T. Blickle, L. Thiele, A comparison of selection schemes used in evolutionary algorithms, Evolut. Comput. 4 (4) (1997) 361–394.

[9] K. Bogdanov, M. Holcombe, F. Ipate, L. Seed, S. Vanak, Testing methods for x-machines: a review, Formal Asp. Comput. 18 (1) (2006) 3–30.

[10] E. Bonabeau, M. Dorigo, G. Theraulaz, Swarm Intelligence: From Natural to Artificial Systems, Oxford University Press, 1999.

[11] A. Bouchachia, An immune genetic algorithm for software test data generation, in: Proceedings of the 7th International Conference on Hybrid Intelligent Systems (HIS 2007), IEEE Computer Society, 2007, pp. 84–89.

[12] L. Briand, M. Di Penta, Y. Labiche, Assessing and improving state-based class testing: a series of experiments, IEEE Trans. Softw. Eng. 30 (11) (2004) 101–130.

[13] H. Chen, T. Tse, T. Chen, Taccle: a methodology for object-oriented software testing at the class and cluster levels, ACM Trans. Softw. Eng. Methodol. 10 (1) (2001) 56–109.

[14] T. Chow, Testing software design modeled by finite state machines, IEEE Trans. Softw. Eng. 4 (3) (1978) 178–187.

[15] K.A. De Jong, Evolutionary Computation: A Unified Approach, MIT Press, 2006.

[16] R. Doong, P. Frankl, The ASTOOT approach to testing object-oriented programs, ACM Trans. Softw. Eng. Methodol. 3 (2) (1994) 101–130.

[17] M. Dorigo, L. Gambardella, Ant colony system: a cooperative learning approach to the traveling salesman problem, IEEE Trans. Evolut. Comput. 1 (1) (1997) 53–66.

[18] M. Dorigo, V. Maniezzo, A. Colorni, The ant system: optimization by a colony of cooperating agents, IEEE Trans. Syst. Man Cybern.: Part B 26 (1) (1996) 29–41.

[19] M. Dorigo, T. Stutzle, Ant Colony Optimization, MIT Press, 2004.

[20] K. El-Fakih, N. Yevtushenko, G. Bochmann, Fsm-based incremental conformance testing methods, IEEE Trans. Softw. Eng. 30 (7) (2004) 425–436.

[21] R. Ferguson, B. Korel, The chaining approach for software test data generation, ACM Trans. Softw. Eng. Methodol. 5 (1) (1996) 63–86.

[22] L. Fogel, A. Owens, M. Walsh, Artificial Intelligence through Simulated Evolution, John Wiley, New York, 1966.

[23] S. Fujiwara, G.v. Bochmann, F. Khendek, M. Amalou, A. Ghedamsi, Test selection based on finite state models, IEEE Trans. Softw. Eng. 17 (6) (1991) 591–603.

[24] L. Gallagher, J. Offutt, A. Cincotta, Integration testing of object-oriented components using finite state machines: Research articles, Softw. Test. Verif. Reliab. 16 (4) (2006) 215–266.

[25] Q. Guo, R. Hierons, M. Harman, K. Derderian, Computing unique input/output sequences using genetic algorithms, in: Proceedings of Workshop on Formal Approaches to Testing of Software, 2003, pp. 146–177.

[26] Q. Guo, R. Hierons, M. Harman, K. Derderian, Improving test quality using robust unique input/output circuit sequences (uiocs), Inform. Softw. Technol. 48 (8) (2006) 696–707.

[27] Q. Guo, R. Hierons, M. Harman, K. Derderian, Heuristics for fault diagnosis when testing from finite state machines: research articles, Softw. Test. Verif. Reliab. 17 (1) (2007) 41–57.

[28] J. Hartmann, C. Imoberdorf, M. Meisinger, UML-based integration testing, in: Proceedings of the 2000 ACM SIGSOFT International Symposium on Software Testing and Analysis, 2000, pp. 60–70.

[29] J. Holland, Adaptation in Natural and Artificial Systems, The University of Michigan Press, Ann Arbor, 1975.

[30] H. Hong, Y. Kwon, S. Cha, Testing of object oriented programs based on finite state machines, in: Proceedings of the 2nd Asia-Pacific Software Engineering Conference, 1995.

[31] P. Jorgensen, C. Erickson, Object-oriented integration testing, Commun. ACM 37 (9) (1994) 30–38.

[32] K. Karl Doerner, W. Gutjahr, Extracting test sequences from a Markov software usage model by ACO, in: Proceedings of the GECCO’03, 2003, pp. 2465–2476.

[33] J. Kennedy, R. Eberhart, Y. Shi, Swarm Intelligence, Morgan Kaufmann Publ., 2001.

[34] Y. Kim, H. Hong, D. Bae, S. Cha, Test cases generation from UML state diagrams, IEE Proc. of the 9th International Conference on Engineering Complex Computer System, 2004, pp. 75–84.

[35] B. Korel, Dynamic method of software test data generation, Softw. Test. Verif. Reliab. 2 (4) (1992) 203–213.

[36] D. Lee, M. Yannakakis, Testing finite-state machines: State identification and verification, IEEE Trans. Comput. 43 (3) (1994) 306–320.

[37] H. Li, C. Lam, An ant colony optimization approach to test sequence generation for state-based software testing, in: Proceedings of the International Conference on Quality Software, 2005, pp. 255–264.

[38] G. Luo, G. von Bochmann, A. Petrenko, Test selection based on communicating nondeterministic finite-state machines using a generalized Wp-method, IEEE Trans. Softw. Eng. 20 (2) (1994) 149–162.

[39] R. Hierons, Comparing test sets and criteria in the presence of test hypotheses and fault domains, ACM Trans. Softw. Eng. Methodol. 11 (4) (2002) 427–448.

[40] M. Martina Marre, A. Bertolino, Using spanning sets for coverage testing, IEEE Trans. Softw. Eng. 29 (11) (2003) 974–984.

[41] P. McMinn, M. Holcombe, The state problem for evolutionary testing, in: Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), 2002.

[42] P. McMinn, M. Holcombe, Evolutionary testing of state-based programs, in: Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2005), 2005, pp. 1013–1020.

[43] A. Offutt, Y. Xiong, S. Liu, Criteria for generating specification-based tests, in: Proceedings of the 5th International Conference on Engineering of Complex Computer Systems, 1999, pp. 119–126.

[44] R. Pargas, M. Harrold, R. Peck, Test-data generation using genetic algorithms, Softw. Test. Verif. Reliab. 9 (4) (1999) 263–282.

[45] T. Ramalingam, A. Das, K. Thulasiraman, On testing and diagnosis of communication protocols based on the FSM model, Comput. Commun. 18 (5) (1995) 329–337.

[46] D. Sidhu, T. Leung, Formal methods for protocol testing: a detailed study, IEEE Trans. Softw. Eng. 15 (4) (1989) 413–426.

[47] P. Tonella, Evolutionary testing of classes, in: Proceedings of the ACM/SIGSOFT International Symposium on Software Testing and Analysis, 2004, pp. 119–128.

[48] S. Wappler, F. Lammermann, Using evolutionary algorithms for the unit testing of object-oriented software, in: Proceedings of GECCO’05, 2005, pp. 1053–1060.

[49] J. Wegener, A. Baresel, H. Sthamer, Evolutionary test environment for automatic structural testing, Information and Software Technology 43 (2001) 841–854.

[50] E. Weyuker, More experience with data flow testing, IEEE Trans. Softw. Eng. 19 (9) (1993) 912–919.
