
Int J Softw Tools Technol Transfer
DOI 10.1007/s10009-014-0316-3

TASE 12

Model-based test generation using extended symbolic grammars

Hai-Feng Guo · Mahadevan Subramaniam

© Springer-Verlag Berlin Heidelberg 2014

Abstract A novel, model-based test case generation approach for validating reactive systems, especially those supporting richly structured data inputs and/or interactions, is presented. Given an executable system model and an extended symbolic grammar specifying plausible system inputs, the approach performs a model-based simulation to (i) ensure the consistency of the model with respect to the specified inputs, and (ii) generate corresponding test cases for validating the system. The model-based simulation produces a state transition diagram (STD) automatically justifying the model runtime behaviors within the test case coverage. The STD can further be transformed to produce an evolved symbolic grammar, which can then be used to incrementally generate a refined set of test cases. As a case study, we present a live sequence chart (LSC) model-based test generator, named LCT in short, for LSC simulation and consistency testing. The evolved symbolic grammar produced by the simulator can either be used to generate practical test cases for software testing, or be further refined by applying our model-based test generation approach again with additional test coverage criteria. We further show that LSCs can also be used to specify and test certain temporal system properties during the model simulation. Their satisfaction, reflected in the STD, can either serve as a directive for selective test generation, or as a basis for further temporal property model checking.

Keywords Model-based test generation · Symbolic grammar · Live sequence chart

H.-F. Guo (B) · M. Subramaniam
Department of Computer Science, University of Nebraska at Omaha, Omaha, NE 68182, USA
e-mail: [email protected]

M. Subramaniam
e-mail: [email protected]

1 Introduction

Automatic test case generation has become indispensable to automated software testing, which can significantly reduce the cost of software development and maintenance. There has been extensive research on automatic test case generation [2,24,28,35,42] over the last decades. A test case is a description of external event sequences with certain coverage criteria, independent of the system requirement and design. A good test case is usually derived from a formal system model. Generating test cases from a system model often finds inconsistencies and ambiguities in the system requirement and design, and the generated test cases can be fed into the system implementation for conformance testing. Either way will significantly bring down the cost of building a software system.

Typical approaches for test case generation use techniques such as model checking to determine whether a system requirement, expressed as a temporal logic formula, holds over a given system model, and output correct/faulty traces, which are then used to generate test cases. System models represented as extended finite state machines (EFSM) [24,42], labeled transition systems (LTS) [28], state diagrams [2], and state-based specifications [35] have been used. While these approaches have been highly successful, they have not paid adequate attention to some basic but important practical concerns.

– Test cases in several applications, including language processing, protocols, hardware devices, multimedia, and web services, employ inputs and/or involve interactions with the system that are highly structured. An important aspect of test generation for such applications is to generate test cases satisfying the required structures in a systematic and robust manner. Even though test generation has played an important role in software testing, the formal notation for representing test cases, especially their syntactical structures, has often been ad hoc and has not been emphasized much in the research community. Test cases are often represented as a set of input sequences, individually created and maintained in a test set, in areas of software testing and maintenance [35,38,39,42]. Grammar-based testing [9,30,40] adopts context-free grammars to represent a set of test cases in a structured form; and recently, test generation using symbolic grammars has been shown effective [12,33].

– The transition systems underlying the various system models are indispensable for automatic test generation by approaches such as model checking. One could manually design a transition system, or generate one automatically from system requirements specified using UML [2,28,35] by static analysis and transformation. The former approach is tedious and likely to be error-prone. The latter approach may not be precise enough, since the UML specification mainly focuses on the sequences of events and may not be semantically rich enough to describe system runtime behaviors. Additionally, despite recent advances in static analysis, test generation based on static analysis is only as good as its specification of transition systems. These specifications could themselves be complex and error-prone.

– System models, especially those describing real reactive systems, tend to have multiple related components interacting in subtle ways. This can result in inconsistencies in the model due to contradictory assumptions about environmental/external inputs. It is well known that ensuring the consistency of these models is a formidable problem, yet it is necessary in order to use these models for test generation. Most of the current model-based test generation approaches implicitly assume the consistency of these models. Tightly integrating model consistency checking and test generation is crucial to generate meaningful test cases, and relating the transition systems to the generated test cases is crucial to control and refine test generation. In other words, how does the transition system, with domain constraints and possible dynamic system features, affect the generation of test cases? If there is a formal notation for a set of test cases, how will the notation be evolved or refined by considering the constraints and dynamic features of the state transition system?

This paper is intended to fill the gap by addressing the above concerns. We present a new test case generation approach [16,17], which takes an executable reactive system model and a preliminary test coverage criterion, performs an automated model-based simulation, and generates a state transition system describing the dynamic system model behaviors. The state transition system represents a set of symbolic test cases and can be further used for refining the test cases to validate the system.

We introduce a new formal notation for representing test cases having highly structured inputs and/or system interactions, in the form of an extended symbolic context-free production grammar, an extended symbolic grammar in short. An extended symbolic grammar is used to systematically generate test cases for a model-based system, where symbolic terminals and domains are introduced in the grammar to hide the complexity of different inputs which have common syntactic structures as well as similar expected running behaviors. Test generation using symbolic grammars has been shown effective recently [13,33]. We extend the concept of a symbolic grammar by allowing limited context sensitivity between symbolic terminals to be carried over via parameters among production rules. Automatic test case generation from a symbolic grammar is performed by simulating its leftmost derivation, where symbolic terminals and their associated domains are dynamically maintained.

In order to generate meaningful and complete test cases, our test generation procedure tightly integrates each derivation step with automated model-based simulation. The automated model-based simulation system takes a reactive system model and a preliminary symbolic grammar serving as test coverage criteria, and performs consistency checking. As a result, it produces either a state transition graph or a failure trace, with branched testing and symbolic domain decomposition if necessary, to justify the consistency testing results. The state transition graph can be further transformed into a set of refined test cases, represented as an evolved symbolic grammar. This cycle of symbolic grammar evolution through the model-based simulation system can be repeated with additional test coverage criteria.

We adopt live sequence charts (LSCs) [10,21] to specify an executable system model. LSCs extend the UML sequence charts with mandatory and forbidden behaviors; LSCs are able to specify system runtime behaviors in a semantically more precise manner, instead of only posing weak partial-order restrictions on possible behaviors as in sequence charts. We present an LSC model-based simulation system, named LCT, which takes LSCs and symbolic grammars as inputs and performs an automated scenario simulation for consistency testing. The LCT system is an improved version of our previous work [17], where we introduced a logic-based framework to implement an automated LSC simulator with practical user controls, and to test the consistency of an LSC specification with a set of enumerable test inputs. The LCT system utilizes a memoized depth-first testing strategy [45] to simulate how a reactive system model in LSCs behaves with continuous test cases as inputs. The computational tree has been adapted so that symbolic terminals are handled dynamically during the scenario simulation. Constraints on symbolic terminals may be collected and processed along the simulation to properly decompose their symbolic domains for branched testing.

This paper is an extended journal version based on our conference papers [16,17], where the extension includes detailed algorithms, transformation approaches, and a more comprehensive literature review and discussion of related work.

The paper is structured as follows. Section 2 briefly introduces the syntax and informal semantics of LSCs through a web order example. Section 3 presents the concept of extended symbolic grammars, shows how symbolic terminals are defined and passed as parameters of production rules for maintaining context sensitivity, and illustrates how an extended symbolic grammar is applied for automatic test case generation. Section 4 shows a flow chart explaining how our new model-based test generation works via grammar evolution. Section 5 illustrates how a running LSC reacts to test events using a simulation tree with symbolic terminals maintained dynamically, and Sect. 6 shows how the consistency of LSCs is justified through a state transition diagram (STD). Section 7 shows that LSCs can be specified to test runtime system temporal properties, which serve as a directive for selective test generation. Section 8 shows an automated transformation procedure that, given an STD, generates test cases in the form of a refined symbolic grammar. Finally, Sect. 9 further addresses related work, followed by our conclusions in Sect. 10.

2 Live sequence chart (LSC)

Live sequence charts [10,21] have been introduced as an inter-object scenario-based specification and visual programming language for reactive systems, with more expressive and semantically rich features compared to other alternatives [25,36]. The language introduces a universal chart to specify mandatory behaviors in a semantically more precise manner. A universal chart is used to specify a scenario-based if-then rule, which applies to all possible system runs. A universal chart typically contains a prechart, denoted by a top dashed hexagon, and a main chart, denoted by a solid rectangle right below the prechart; if the prechart is satisfied, then the system is forced to satisfy or run the defined scenario in the main chart. The capability of specifying such mandatory behaviors upgrades LSCs from a formal specification tool to an achievable dream of becoming a scenario-based programming language [18], because mandatory behaviors precisely tell what to expect at system runtime. LSCs have been successfully used in many real-life applications such as hardware and software verification [7,8], an air traffic control system [5], and a radio-based train system [3].

2.1 A web order protocol

We use a web order protocol, between a really big corporation (RBC) and a small T-shirt company (STC), to illustrate how LSCs are used to model a reactive system; the same example will be used for test generation throughout the paper. There are two types of orders created daily between RBC and STC, and each order has a unique numeric index ID from 1 to 300. One is a regular order with 1 ≤ ID ≤ 200, which the buyer of RBC could abort as long as the seller of STC has not confirmed the order; the other is a customized order with 200 < ID ≤ 300, for which, once placed by the buyer, no abort action will be accepted. The order index is reset to 1 daily, and increased by 1 automatically for each new order. For each order with an index ID, both RBC and STC have a state variable, conf[ID], with the three possible values {false, true, abort} denoting whether the order has been initiated, confirmed or aborted, respectively; the initial values of both state variables are true. For clarity and conciseness of the presentation, we assume that there will be at most one active order at any time; that is, a new order can be initiated only if all previous orders have either been confirmed or aborted.
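The order rules above can be sketched as a small executable model. This is our own minimal sketch for illustration, not the authors' LSC specification or the LCT tool; class and method names are hypothetical.

```python
# Minimal sketch (not the paper's implementation) of the web order rules:
# regular orders (1..200) may be aborted before confirmation; customized
# orders (201..300) may not be aborted; at most one active order at a time.

class WebOrderModel:
    REGULAR_MAX = 200   # regular orders: 1 <= ID <= 200
    MAX_ID = 300        # customized orders: 200 < ID <= 300

    def __init__(self):
        # conf[ID] in {"false", "true", "abort"}: initiated/confirmed/aborted.
        # Initially every order counts as settled, so a new order may be placed.
        self.conf = {}

    def _active(self):
        # An order is active while its state is "false" (initiated, unsettled).
        return any(v == "false" for v in self.conf.values())

    def place(self, order_id):
        if self._active() or not (1 <= order_id <= self.MAX_ID):
            return False
        self.conf[order_id] = "false"   # initiated, awaiting confirmation
        return True

    def abort(self, order_id):
        # Abort accepted only for an initiated, unconfirmed regular order.
        if self.conf.get(order_id) == "false" and order_id <= self.REGULAR_MAX:
            self.conf[order_id] = "abort"
            return True
        return False

    def confirm(self, order_id):
        if self.conf.get(order_id) == "false":
            self.conf[order_id] = "true"
            return True
        return False
```

For example, aborting order 10 (regular, unconfirmed) succeeds, while aborting order 250 (customized) is denied.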

As shown in Fig. 1, LSCs a–e illustrate scenarios for the RBC, and the remaining LSCs describe the scenarios for the STC. Buyer and Seller are the only external objects, denoted by waved clouds, which send the external inputs to the system model. Chart a shows that if a buyer creates an order with an index ID, RBC will initiate an order by setting RBC.conf[Id] to false, forwarding an order message to STC, and waiting for the acknowledgment. Chart d shows that if the buyer wants to abort the order, and at that point the order has been initiated but not confirmed yet, an abort message will be forwarded to STC; otherwise, an appropriate message on the order status will be issued to the Buyer. The three stacked rectangles within the main chart form a multi-way selection construct, where each hexagon contains a condition. Chart h says that if STC receives an abort request, and at that point the order has not been aborted or confirmed yet, and it is a regular order with Id ≤ 200, it will set STC.conf[Id] to abort and send a confirmation message to RBC; otherwise, an abort denial will be issued to RBC. Explanations for the other LSCs are quite straightforward and therefore omitted here.

2.2 Consistency testing of LSCs

Consistency checking [19,27,29,41] is one of the major and formidable problems on LSCs. In a complicated system consisting of many LSCs, inconsistency may arise from inherent contradictions among multiple charts or from inappropriate environmental/external event sequences. For example, suppose we use Fig. 2 to substitute the LSC in Fig. 1f. When


Fig. 1 An LSC specification for a symbolic web order system


Fig. 2 Inconsistent with the LSC in Fig. 1a

the event that a buyer sends order(Id) to RBC happens, it will activate both LSCs in Figs. 1a and 2. Figure 1a requires that the event conf[Id](false) happens first, followed by an acknowledgment event orderAck(Id) later. However, this partial order is inconsistent with the specification in Fig. 2, which requires that the event conf[Id](false) should happen after the acknowledgment event orderAck(Id).
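The inconsistency above amounts to contradictory event orderings: one chart requires conf[Id](false) before orderAck(Id), the substituted chart requires the reverse, so the combined partial order contains a cycle. A minimal sketch of detecting such a cycle (illustrative only; this is not the LCT consistency-testing algorithm, and the event names are our shorthand):

```python
# Detect whether a set of required (before, after) event orderings can be
# linearized; a cycle means the charts impose contradictory orders.

def has_order_conflict(orderings):
    """orderings: iterable of (before, after) event pairs.
    Returns True iff the pairs contain a cycle (no linearization exists)."""
    succ = {}
    nodes = set()
    for a, b in orderings:
        succ.setdefault(a, set()).add(b)
        nodes.update((a, b))

    # Standard DFS-based cycle detection with three node colors.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}

    def dfs(n):
        color[n] = GRAY
        for m in succ.get(n, ()):
            if color[m] == GRAY or (color[m] == WHITE and dfs(m)):
                return True   # back edge found: cycle
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and dfs(n) for n in nodes)
```

With the two contradictory requirements from Figs. 1a and 2, `has_order_conflict` reports a conflict; with the consistent orderings of the original specification it does not.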

Previous work on consistency checking is mainly focused on formal automata-based verification strategies. The consistency of LSCs has been shown in [19] to be a necessary and sufficient condition for the existence of its corresponding object system. Ensuring consistency of LSCs is thus reduced to the problem of whether a satisfying object system, in the form of an automaton, can be synthesized from the LSCs. Other formal methods include [4,27,29], which turn the LSC specification into variants of Büchi automata for verification. In [41], a semantic transformation from LSCs to communicating sequential processes (CSP) is presented, so that the consistency checking of LSCs can be done by reusing an existing tool for CSP. All those consistency testing approaches are based on analyzing LSCs statically, and may suffer from the complexities caused by the transformation itself, automata synthesis, or the blow-up in size of the transformed results [20,43]. Allowing users to simulate and test system models in a dynamic setting tends to be more robust and can potentially help them develop consistent large-scale systems.

3 Extended symbolic grammars

3.1 The concept

We use an extended symbolic grammar to represent a set of test cases. A symbolic grammar [12,33] extends a context-free grammar with symbolic terminals and their corresponding domains. Here we further extend the concept of a symbolic grammar with parametric production rules and give its formal definition as follows.

Definition 1 An extended symbolic grammar is defined in the form Gs = (V, Σ, Σs, W, P), where V is a set of variables and parametric variables (as explained shortly below), Σ is a set of terminals and parametric terminals (as explained below), Σs is a set of symbolic terminals, W is the main variable in V, and P is a set of production rules of the form A → x, where A ∈ V and x ∈ (V ∪ Σ)∗.

The 5-tuple notation extends the standard one of a context-free grammar by the following additions:

– It introduces symbolic terminals Σs and domains in the grammar to hide the complexity of large redundant inputs which may have a common syntactic pattern and similar expected behaviors. A symbolic terminal in Σs is represented in the form {α : D}, denoting a symbolic terminal name α with its associated finite domain D. The finite domain for a symbolic terminal can be represented as a list of ordered items, each in the form of either Lower..Upper or a single value. For example, [5..50, 60, 100..150] is a valid domain for a symbolic terminal.

– A symbolic terminal {α : D} ∈ Σs can be used as a traditional terminal (e.g., {α : D} ∈ Σ), a parameter of a traditional terminal (e.g., t({α : D}) ∈ Σ; we call t({α : D}) a parametric terminal), or even a parameter of a traditional variable (e.g., A({α : D}) ∈ V; we call A({α : D}) a parametric variable).

– Later occurrences of the same symbolic terminal {α : D} in the same production rule may simply be specified in the form {α}, omitting the domain. A major difference between the proposed symbolic grammar and a traditional CFG is that the same symbolic terminal and its finite domain may be carried over within a production rule, and multiple occurrences of the same symbolic terminal in a production rule share the common domain.

– A production rule is called a parametric production rule if the rule starts with a parametric variable, for example,

A({α}) → x

where A({α}) ∈ V and x ∈ (V ∪ Σ)∗. The formal parameter, {α}, represents a symbolic terminal that will be inherited from a parent production rule. It should be emphasized that only symbolic terminals are allowed to be passed as parameters of terminals or variables, with limited context sensitivity.

It is flexible to use a symbolic terminal and its associated domain to mitigate the complexity of different inputs which share common syntactic structures. A symbolic terminal is allowed to be passed among production rules to carry over context sensitivity, which is useful for collecting and solving related constraints during system model simulation.
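A finite domain such as [5..50, 60, 100..150] can be represented directly as a list of inclusive ranges. The following sketch (our own illustrative encoding, not the paper's implementation; function names are hypothetical) shows such a representation together with the kind of domain decomposition used later for branched testing, here splitting on a constraint of the form value ≤ bound:

```python
# Represent a symbolic terminal's finite domain as a list of inclusive
# (low, high) ranges, and decompose it against a simple constraint.

def normalize(items):
    # Accept entries like (5, 50) or a single value 60.
    return [(i, i) if isinstance(i, int) else i for i in items]

def size(domain):
    # Total number of concrete values the domain covers.
    return sum(hi - lo + 1 for lo, hi in domain)

def split_le(domain, bound):
    """Partition a domain by the constraint `value <= bound`,
    returning (satisfying part, violating part)."""
    sat, unsat = [], []
    for lo, hi in domain:
        if hi <= bound:
            sat.append((lo, hi))
        elif lo > bound:
            unsat.append((lo, hi))
        else:                       # range straddles the bound: split it
            sat.append((lo, bound))
            unsat.append((bound + 1, hi))
    return sat, unsat
```

For instance, splitting {id : [1..300]} on Id ≤ 200 yields [1..200] and [201..300], matching the regular/customized order branches of the web order example.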

123

Page 6: Model-based test generation using extended symbolic grammars

H.-F. Guo, M. Subramaniam

Example 1 Consider the following symbolic grammar G = (V, Σ, Σs, W, P) to represent a set of test cases for the LSC model in Fig. 1, where the variable set V = {W, X, Y({id})}, the terminal set Σ = {order({id}), orderConfirm({id}), orderAbort({id})}, the symbolic terminal set Σs = {{id : [1..300]}}, and P is a set of symbolic production rules defined as follows:

W → λ | X W (1)

X → order({id :[1..300]}) Y ({id}) (2)

Y ({id})→ orderConfirm({id}) (3)

Y ({id})→ orderAbort({id}) orderConfirm({id}) (4)

The symbolic grammar represents continuous inputs of orders, where each order is followed either by an order confirmation directly, or by a combination of an abort request and then an order confirmation. A symbolic terminal {id : [1..300]} is used to replace the production of order IDs from 1 to 300. The two occurrences of {id} in the same production rule [e.g., rule (2)] share the same domain, and are expected to share the same instantiation as well during the simulation; that is, any domain decomposition or instantiation of {id} in the order event may be carried over to the same {id} in the following event. The domain decomposition or instantiation of {id} can be further passed into rule (3) or (4) through the parametric variable Y({id}).
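The grammar of Example 1 can be encoded as data. The encoding below is our own sketch (the paper does not prescribe a representation): a symbolic terminal introduced with its domain is a dict, and later occurrences written as the bare name "id" refer back to it, so its domain is shared within a rule and passed on through the parametric variable Y({id}).

```python
# Encode the Example 1 grammar. Symbols are tuples: ("var", name) for a
# plain variable, ("pvar", name, arg) for a parametric variable, and
# ("term", name, arg) for a (parametric) terminal, where arg is either
# the introducing symbolic terminal (a dict) or a back-reference by name.

ID = {"name": "id", "domain": (1, 300)}          # {id : [1..300]}

RULES = {
    # W -> lambda | X W                                       (1)
    "W": [[], [("var", "X"), ("var", "W")]],
    # X -> order({id:[1..300]}) Y({id})                       (2)
    "X": [[("term", "order", ID), ("pvar", "Y", "id")]],
    # Y({id}) -> orderConfirm({id})                           (3)
    # Y({id}) -> orderAbort({id}) orderConfirm({id})          (4)
    "Y": [[("term", "orderConfirm", "id")],
          [("term", "orderAbort", "id"), ("term", "orderConfirm", "id")]],
}

def shares_id(symbol, name="id"):
    """True iff a grammar symbol carries the symbolic terminal `name`
    (either introducing it with its domain, or referencing it back)."""
    if len(symbol) < 3:
        return False
    arg = symbol[2]
    return (arg.get("name") if isinstance(arg, dict) else arg) == name
```

In rule (2), both `order` and `Y` carry {id}, so any decomposition of its domain during simulation is visible to both.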

3.2 Leftmost derivation for generating test cases

This subsection introduces how we can generate every symbolic test case from L(G), the language represented by G, in a backtracking manner, given a symbolic grammar G = (V, Σ, Σs, W, P). The automatic test generation is done by simulating the leftmost derivation from its main variable W. Consider the grammar defined in Example 1. A valid leftmost derivation from the grammar would be:

W ⇒ X W

⇒ order({id :[1..300]}) Y ({id}) W

⇒ order({id :[1..300]}) orderConfirm({id}) W

⇒ order({id :[1..300]}) orderConfirm({id}).

The result order({id :[1..300]}) orderConfirm({id}) is a symbolic test case.

The live scope of the symbolic terminal {id}, in a leftmost derivation, extends from the derivation of X by applying its production rule which contains an occurrence of {id},

X → order({id :[1..300]}) Y ({id}),

to the complete derivation of Y({id}), after which no more occurrences of {id} are left in the underived substring. Each symbolic terminal involved during the leftmost derivation has a live scope.

In general, a symbolic grammar can reduce the number of test cases to be enumerated by several orders of magnitude. For instance, the test case order({id :[1..300]}) orderConfirm({id}) represents 300 concrete tests. Nevertheless, the generated test cases with symbolic terminals have enough non-determinism to symbolically explore all running paths of the LSC simulation, incorporated with constraint solving and domain partition techniques. One could argue that a symbolic grammar does not expand the full syntactic structures of inputs due to the involvement of domain representations. However, the use of symbolic terminals fits system testing well, because it provides users great flexibility to shadow certain trivial details and reduce unnecessary testing complexity; at the same time, the introduction of symbolic terminals raises only limited constraint-solving complexity, due to the fact that symbolic terminals are only allowed at the leaf level of the grammar. In case a real system test does not support handling a symbolic terminal as an input, the symbolic terminal can easily be replaced with a random or enumerated value from its domain.
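The leftmost derivation and the instantiation step above can be sketched concretely. The generator below is our own bounded sketch of the Example 1 grammar (it specializes the rules rather than interpreting them generically, and the `tag` mechanism for identifying shared occurrences of {id} is our illustrative device): each application of rule (2) introduces a fresh copy of {id}, occurrences within the same rule share it, and a symbolic test case is expanded to concrete tests by enumerating the shared domain.

```python
# Enumerate symbolic test cases of the Example 1 grammar, bounded by the
# number of orders, then instantiate the shared symbolic terminals.

def symbolic_cases(max_orders, tag=0):
    """Yield symbolic test cases as lists of (event, id_tag); events of
    the same order carry the same tag, i.e. the same shared {id}."""
    yield []                                  # W -> lambda
    if max_orders == 0:
        return
    # W -> X W, with X -> order({id}) Y({id}) and the two Y alternatives
    for y in ([("orderConfirm", tag)],                          # rule (3)
              [("orderAbort", tag), ("orderConfirm", tag)]):    # rule (4)
        head = [("order", tag)] + y
        for rest in symbolic_cases(max_orders - 1, tag + 1):
            yield head + rest

def instantiate(case, values):
    """values maps each order's tag to a concrete ID from [1..300]."""
    return [f"{event}({values[tag]})" for event, tag in case]
```

For instance, the one-order symbolic case order({id}) orderConfirm({id}) instantiates to ["order(42)", "orderConfirm(42)"] under {0: 42}, and ranging over the domain [1..300] yields its 300 concrete tests.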

4 A flow chart of model simulation-based test generation

Figure 3 shows a flow chart of our model simulation-based test generation approach, where the LCT model-based simulation tool takes as inputs a preliminary symbolic grammar and an LSC model, simulates running the specified LSCs by feeding in the external events represented in the symbolic grammar, and performs consistency testing on the model design. The preliminary symbolic grammar does not simply serve as an external event generator for the LCT simulator; it also represents the coverage of the model-based simulation. The LCT simulator traces each simulation path to collect dynamic system constraints for test case generation refinement. If the consistency testing fails, the LCT simulation tool will generate a failure trace [17] as a counter-example to show how the inconsistency in the model design can be reached through a sequence of external events. On the other hand, if the consistency testing succeeds, an STD will be produced by the LCT system as an abstraction of the simulation. The produced STD can be transformed into a refined symbolic grammar via an automatic procedure. The resulting symbolic grammar may be used to enumerate concrete test cases for software testing, where a symbolic terminal in a test case can be randomly instantiated from its associated domain.

Fig. 3 Flow chart of model simulation-based test generation

As shown in Fig. 3, our model simulation-based test generation approach can further take other test generation directives, such as temporal properties, in the form of an LSC for runtime behavior testing and refined test case generation. The satisfaction of temporal properties will be illustrated in the generated STD, which becomes valuable information for model checking or other test case generation approaches. Given additional test coverage criteria, the evolved symbolic grammar can be further refined by going through another cycle of our model-based test generation approach.

It is worth emphasizing here that the consistency testing of the LCT simulator plays several important roles in our model-based test generation approach. (1) The use of extended symbolic grammars allows the executable LSC models to simulate over a set of pre-specified testing event sequences, and brings the capability to pass limited context-sensitive information along the continuous events for the LSC simulation. (2) It helps users find possible inconsistencies or design bugs in LSC specification models, and at the same time provides failure traces as counter test cases to help users re-construct the failure scenario. (3) Model-based consistency testing also acts as a guard, or necessary condition, for test generation; that is, only a consistent model is eligible for test generation. (4) The LSC simulation for consistency testing enriches the test generation procedure with dynamic system behaviors and properties, which are critical to generating a refined set of test cases. For example, a domain of a symbolic terminal may be partitioned to satisfy some required constraints during LSC model simulation; and such a domain partition may lead to different test generation results, especially when a test coverage criterion is given. Traditional test generation approaches [24,35] based on static model analysis may lack such dynamic features for generating precise test cases. Our LCT model-based simulation serves as a complement for enhancing other model-based test generation approaches, such as model checking and static analysis. (5) Through consistency testing, the LCT simulator produces an STD to justify the correctness of model consistency, and the STD, as an abstraction of the model simulation, will be transformed in a mechanical way into a refined symbolic grammar for test generation.

The LCT simulation and consistency testing tool has been implemented in SWI-Prolog [32,46], and is available online [15] for free download. The example of a web order protocol has been carefully tested through our implementation. More examples of practical applications and an integrated implementation of our simulation-based test generation will be continuously explored and presented in our future work.

In the following sections, we will focus on the methodology of model simulation-based test generation for reactive systems supporting highly structured inputs, and explain the details of each major step in our flow chart.

5 Model-based consistency testing

The LCT simulator takes as inputs an LSC specification Ls and a symbolic grammar G, and checks whether the running Ls will react consistently to any external event sequence w ∈ L(G). We adopt the super-step concept, introduced in the PLAY-engine [21] for the semantics of a running LSC model, to configure the running states during the LSC consistency testing. Given an external event, the LCT simulator continuously executes the steps associated with any enabled internal events1 until the running LSCs reach a stable state where no further internal or hidden events2 can be carried out.

A super state, defined as 〈Q, DS, B〉, is either an initial state or a stable state after symbolically executing an external event, where Q is a set of system object states, DS is a set of symbolic terminals, associated with their respective domains, which are currently used in Q, and B is a Boolean violation indicator, True or False, indicating whether or not the state is violating.

We include a symbolic terminal set DS to maintain the context-sensitive information during symbolic simulation of LSCs. Note that, different from the super state in [21], we hide one parameter, the set of running copies of LSCs, from each super state. This is because (i) in our web order example, no running LSCs are active initially; also, after receiving an external event and then processing all enabled internal events, all the activated running LSCs complete before the model receives the next external event; and (ii) for conciseness and clarity of presentation, we intentionally choose this web order example to leave aside the details of how exactly the internal transitions work in LSCs, and to focus on the uses of symbolic grammars in model-based testing.

Let Ss denote the set of all super states and Σ a set of external events which may involve symbolic terminals. We introduce a notation

∇ : Ss × Σ → Ss+

to denote a super-step transition similarly defined in [21]. Multiple next super states are possible because the enabled internal events may contain interleaving ones, which can be executed in a non-deterministic order, or, more importantly, because the domains of involved symbolic terminals, if any, may split into sub-domains due to branched constraints specified in LSCs.

In this paper, we do not address the detailed implementation of the super-step transition, but use the notation ∇ to show how a symbolic grammar can be applied in a model-based simulation for consistency testing and model-based test generation. The notation ∇ serves as a representative of the underlying system model, of which LSCs are our example.

1 Internal events are those not starting from objects with waved clouds, such as Buyer or Seller.
2 Hidden events are those related to the structural syntax of LSCs, e.g., end of a prechart, or entering a main chart.

5.1 Operational semantics of the LCT simulator

We introduce a succinct notation for describing the successive configurations of the LCT simulator given the input of a symbolic grammar G = (V, Σ, Σs, W, P) denoting a set of external event sequences. A four-tuple

(Q, DS, U, B),

is introduced to describe the running state of the LCT simulator, where Q and B are inherited from the super state definition, U is the unprocessed part of a string in (V ∪ Σ)∗ representing all possible test cases to be generated, and DS is a set of symbolic terminals, associated with their respective domains, which are currently used in Q and U. The running state completely determines all the possible ways in which the LCT simulator can proceed, where the symbolic terminal set DS maintains the context-sensitive information during the simulation.
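As a concrete illustration, the four-tuple running state can be represented directly as a record. The sketch below is a hypothetical Python rendering; the field names and example values are ours, not taken from the LCT implementation (which is written in SWI Prolog):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RunningState:
    Q: tuple    # system object states
    DS: tuple   # symbolic terminals with their domains, e.g. (("id", (1, 300)),)
    U: tuple    # unprocessed string over V ∪ Σ, leftmost symbol first
    B: bool     # violation indicator

# initial running state of the web order example: only the start variable W
# remains unprocessed, no symbolic terminals are live yet
s0 = RunningState(Q=("S1",), DS=(), U=("W",), B=False)
print(s0)
```

Making the record immutable (frozen) keeps each state hashable, which is convenient for the memoized traversal discussed later.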

Due to the existence of parametric production rules, a derivation from a symbolic grammar, dynamically maintaining the symbolic terminal set DS, is much more challenging than a traditional derivation from a CFG. We introduce a renaming operator ρ{α/β}(ω) = ω[α/β], where α and β are two distinct symbolic terminals, ω ∈ (V ∪ Σ)∗, and ω[α/β] is defined as the derived string with any occurrence of the symbolic terminal β renamed to α.
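Once a string over V ∪ Σ is modeled as a sequence of symbols, the renaming operator admits a one-line implementation. The following sketch is our own illustrative modeling, with symbols as plain strings:

```python
def rho(alpha, beta, omega):
    """Renaming operator ρ{α/β}(ω) = ω[α/β]: every occurrence of the
    symbolic terminal β in the string ω is renamed to α."""
    return tuple(alpha if sym == beta else sym for sym in omega)

# ω modeled as a tuple of grammar symbols
omega = ("orderConfirm", "id1", "W", "id1")
print(rho("id", "id1", omega))  # ('orderConfirm', 'id', 'W', 'id')
```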

We now define a move between running states as follows:

Definition 2 (Move) A move from one running state to another, denoted by the operator ⊢, can be one of the following cases:

– if a ∈ Σ is a symbolic external event (an external event possibly involving symbolic terminals), and U ∈ (Σ ∪ V)∗,

(Q1, DS1, aU, B1) ⊢ (Q2, DS2, U, B2)

is a valid terminal move if

(Q2, DS2, B2) ∈ ∇((Q1, DS1, B1), a).

– if A ∈ V is a variable, but not a parametric variable,

(Q, DS1, AU, B) ⊢ (Q, DS2, E1 · · · En U, B)

is a valid nonterminal move if A → E1 · · · En is a production rule in P, and DS2 is formed from DS1 with the following change:

– for any {α : D} occurring in E1 · · · En, add {α : D} into DS2 (note that the renaming operator ρ may be applied to rename α to avoid any name conflict with existing symbolic terminals in DS1).

– if A({α1}, . . . , {αk}) ∈ V is a parametric variable, where k ≥ 1,

(Q, DS1, A({α1}, . . . , {αk})U, B) ⊢ (Q, DS2, ρ{α1/α′1, . . . , αk/α′k}(E1 · · · En)U, B),

is a valid nonterminal move if

A({α′1}, . . . , {α′k}) → E1 · · · En

is a production rule in P, and DS2 is formed from DS1 with the following changes:

– for any new domain information D′i defined on a symbolic terminal α′i, in the form {α′i : D′i} where 1 ≤ i ≤ k, in the production rule A({α′1}, . . . , {α′k}) → E1 · · · En, update the domain of αi in DS2 to the intersection D′i ∩ Di, where Di is the original domain of αi in DS1;

– for any other {α : D} occurring in E1 · · · En, add {α : D} into DS2, renaming if necessary.

– (Q, DS1, U, B) ⊢ (Q, DS2, U, B) is called an ε-move if

DS2 = DS1 − {α | α ∈ DS1 but α does not occur in U},

where the operator − is set difference, and there exists a non-empty set of such α's.

The validity of a symbolic terminal move is defined by requiring that each symbolic external event have a valid corresponding super-step move in the PLAY-engine. For each symbolic terminal β ∈ DS, we use DS(β) to denote the domain of β in the symbolic terminal set DS. Thus, in a symbolic terminal move, for all β ∈ DS1 we have DS2(β) ⊆ DS1(β), due to possible branched testing or instantiation during the simulation. Different symbolic terminal moves from the same running state are possible due to inherent nondeterminism, which can be caused by interleaving messages in an LSC, or by domain decomposition for a symbolic terminal.

A nonterminal move expands the leftmost variable A with one of its matched production rules. If A is not a parametric variable, DS2 may gain new symbolic terminals involved in the production rule, if any. On the other hand, if the leftmost variable is a parametric variable with symbolic terminals, DS2 may additionally update existing symbolic terminals with decomposed sub-domains via domain intersection. Symbolic terminal renaming may be applied as needed to enable the application of a parametric production rule or to avoid name conflicts. Different nonterminal moves from the same state are also possible, since multiple production rules from the same variable may represent different event sequence compositions.

Each symbolic terminal involved in the simulation has a live scope. An ε-move is an internal dummy move introduced to remove a symbolic terminal from the domain set DS at the end of its live scope, when the symbolic terminal no longer occurs in the context of the unprocessed string U.
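Under the same illustrative modeling as before (DS as a dictionary from terminal names to domains), an ε-move amounts to a liveness check over U; the helper below is a sketch of ours, not the simulator's code, and the `mentions` callback (which reports the symbolic terminals a symbol refers to) is a hypothetical parameter:

```python
def epsilon_move(DS, U, mentions):
    """ε-move: drop from DS every symbolic terminal whose live scope has
    ended, i.e. that no longer occurs in the unprocessed string U."""
    live = {t for sym in U for t in mentions(sym)}
    if all(t in live for t in DS):
        return DS                 # no ε-move applicable
    return {t: d for t, d in DS.items() if t in live}

DS = {"id": (1, 300), "qty": (1, 10)}
U = ("orderConfirm(id)", "W")
print(epsilon_move(DS, U, lambda s: {t for t in DS if t in s}))
# drops 'qty', keeping {'id': (1, 300)}
```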

The four-tuple running state, possibly involving symbolic terminals, is introduced to configure the LCT simulator, while the super states and their transition ∇ are used to describe the operational semantics of running LSCs in the PLAY-engine [21]. The operational semantics of the LCT simulator, in terms of running states and their moves, is defined on top of the underlying super states. We say an LSC model is inconsistent if a violating running state of the form (_, _, _, True) is reached during the LCT simulation, where _ represents an anonymous value.

5.2 Testing the web order system

Consider the LCT simulation for testing the web order system in Fig. 1 with an input of the symbolic grammar shown in Example 1. Figure 4 shows a simulation tree illustrating how the running states move during consistency testing, where the bold solid box and the bold dashed one denote a leaf node and a variant node,3 respectively. The object state of the web order system is simply represented by the values of both RBC.conf[id] and STC.conf[id], where {id} ∈ Σs is a symbolic terminal with an initial domain [1..300]. The abbreviations S1 to S3 represent three different object states, and D1 to D3 represent three different sets of symbolic terminals during the simulation.

For clarification, we emphasize here that the word simulation refers to the behaviors of the LCT simulator, while the word derivation is used in the context of producing a string from a given symbolic grammar. We now highlight the important features of Fig. 4 as follows:

Consistency testing via a tree traversal The consistency of an LSC model is defined in terms of a simulation tree consisting of valid moves. The model is consistent if the simulation tree does not contain any violating running state of the form (_, _, _, True). Otherwise, a failure trace can be obtained along the path from the root to the violating running state. Even though the symbolic grammar G may contain infinite

3 A node is called a variant of another if both nodes have the same four-tuple running state except for the renaming of symbolic terminals.


Fig. 4 A simulation tree for testing the web order system

sequences, for each finite sequence w ∈ L(G), there is a corresponding finite path in the simulation tree, along which, if we concatenate all the labels of symbolic terminal moves from the root, the result is the finite event sequence w. Figure 4 shows that the LSC model of the web order system is consistent, because for each finite sequence w ∈ L(G) there is a corresponding success path, containing no violating running state, in the simulation tree.

Memoized depth-first testing strategy Given an LSC model, consistency testing is a traversal of its simulation tree to see whether there exists any failure path. Due to the recursive nature of the symbolic grammar G, there may exist infinitely many test cases, and each test case can be arbitrarily long. Thus, it is a great challenge for our LCT simulator to test each w ∈ L(G). Neither a depth-first nor a breadth-first strategy is good enough for completing the consistency check over a simulation tree. We apply a memoized depth-first search strategy [45] for traversing the tree from left to right, such that any variant node seen later along a path will not be explored again, because its running behaviors would be the same as those of the previous one. A memoized depth-first strategy is an extension of standard depth-first search in which visited nodes along the path are recorded, so that their later variant occurrences can be treated as a cycle of moves, thus dramatically simplifying the search process. The variant nodes are shown in the dashed bold boxes in Fig. 4, where renaming of symbolic terminals is allowed.

Leftmost derivation with context sensitivity Given a symbolic grammar G, the LCT simulator produces every symbolic input sequence in L(G) by simulating the leftmost derivation from its main variable W. The second component DS of running states is used to maintain the live scope of each symbolic terminal at runtime. When a nonterminal move introduces a new symbolic terminal in the production rule (e.g., X → order({id :[1..300]})Y({id})), the symbolic terminal and its domain are added into DS; thus,


the later occurrences of the same symbolic terminal can find their associated domain from the dynamically maintained DS. When a symbolic terminal has no more occurrences in the unprocessed derivation string U, an ε-move is applied to remove the symbolic terminal from DS. During the leftmost derivation, if the leftmost symbol is a parametric variable (e.g., Y({id})W), the system takes a nonterminal move by applying a matched production rule, in top-down order if multiple matched production rules exist. The symbolic terminal {id} is passed into the derivation as a parameter via a matched production rule (e.g., Y({I}) → orderConfirm({I})), while its associated domain is passed along with the symbolic terminal set DS.

Domain decomposition for branched testing In the LCT simulation, constraints on a symbolic terminal may be dynamically processed to properly instantiate the symbolic terminal or to decompose its domain for branched testing. As shown in Fig. 4, the symbolic terminal move from the node

(S2, D1, orderAbort({id})orderConfirm({id})W, False)

splits into two branches, where the symbolic domain of id is decomposed into two sub-domains, [1..200] and [201..300]. The domain decomposition is caused by the related constraint "id <= 200" defined in the LSC of Fig. 1h. Whether this constraint is satisfied or not leads to two different running states during the LCT simulation. Our LCT simulator processes such a conditional constraint "id <= 200", combined with the original domain id :[1..300], to reduce the domain to id :[1..200] for further simulation. Once the simulation is done on this branch, the LCT automatically backtracks to this conditional point to explore the branch with the negated constraint "NOT(id <= 200)". This is a very important feature of using symbolic terminals: it avoids the redundancy of testing each value in a domain, yet still maintains enough flexibility to explore nondeterminism and alternatives.
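For interval domains like id :[1..300], the decomposition against a constraint of the form x <= c can be sketched as follows. This is an illustrative helper of ours; the actual simulator handles general constraints over symbolic domains:

```python
def split_domain(lo, hi, bound):
    """Decompose the interval domain [lo..hi] on the constraint
    'x <= bound' into the satisfying sub-domain and the sub-domain
    of the negated constraint; None marks an empty sub-domain."""
    sat = (lo, min(hi, bound)) if lo <= bound else None
    neg = (max(lo, bound + 1), hi) if hi > bound else None
    return sat, neg

print(split_domain(1, 300, 200))  # ((1, 200), (201, 300))
```

The simulator would explore the first sub-domain and later backtrack to the conditional point to explore the second.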

5.3 A memoized depth-first search algorithm

We introduce a memoized depth-first search algorithm for traversing a simulation tree, such that any repeated IDs seen later during the traversal will not be explored again. A memoized depth-first strategy is an extension of standard depth-first search in which visited IDs are recorded, so that their later occurrences can be treated as a cycle of moves, thus dramatically simplifying the search process.

Figure 5 shows the pseudo code for consistency testing, where the recursive function mdft takes an initial running state and an initially empty trace λ as inputs, and tests in a memoized depth-first order whether the LSC specification is

Fig. 5 A memoized depth-first search algorithm

consistent over all the input event sequences. For example, mdft((S1, {}, W, False), λ) is called to test the model consistency of the web order example described in Fig. 1.

The function mdft, as defined in Fig. 5, returns a pair (B, Trace) consisting of a violation indicator B and a failure trace Trace if violated. A global table, named GT, is introduced to record all the seen running states (line 11). If a new node has a state which is already in the global table, then no exploration takes place below this node (lines 7–8); otherwise, appropriate moves take place as follows. First, we check whether the current running state has any symbolic terminal out of its live scope; if so, we remove it from the domain set D (lines 9–10), which corresponds to an ε-move. Second, the following scenarios are considered:

(i) if the leftmost symbol in the unprocessed event sequence U is an external event (lines 12–13), we take a symbolic terminal move by feeding the external event into the LSC super-step simulation, followed by a recursive call for further traversal to test the whole external event sequence (lines 14–18). The local variable Trace is used to concatenate all the external events along the current branch from the root to the current running node. Whenever a violation is found, the failure Trace (or a local variable Tr1) is immediately returned to the caller (lines 3–4, 17–18, 24–25, 38–39 under different scenarios).

(ii) If the leftmost symbol in U is a variable, but not a parametric variable (line 19), we take a nonterminal move by applying one defined production rule, where live symbolic terminals in D, as well as their domains, are carried over, and new symbolic terminals, if any, are added into the domain set D (lines 20–25). We use a joint union operator ⊎ to denote that when a new symbolic terminal β is added into the domain set D, a renaming operation is applied automatically if the name β conflicts with any existing symbolic terminal in D.

(iii) Otherwise, if the leftmost symbol in U is a parametric variable, then besides all the actions similar to scenario (ii) (lines 30–31), we have to consider passing domains from {α1, . . . , αk} to {α′1, . . . , α′k} when applying a parametric production rule (lines 33–37). The domain of a parametric symbolic terminal αi may be updated by intersection if any sub-domain is explicitly given in the applied parametric production rule for the corresponding symbolic terminal parameter α′i (lines 33–35). Additionally, renaming is necessary for the sake of correctness when mdft is called recursively (lines 36–37).

With a memoized depth-first search strategy, the LCT simulator shows whether an LSC model is consistent with respect to a given language defined over a set of symbolic external events. Note that LSC is a scenario-based programming paradigm, where termination can be a problem if the system is modeled in an infinite domain or the LSCs involve infinite loops. No assumption can be made at this point to guarantee termination, since termination is undecidable for a general programming language.

However, given a system Sys under test, where Q0 is an initial state, and a grammar G = (V, Σ, Σs, W, P), if the function mdft((Q0, {}, W, False), λ) terminates and returns (False, _), we say that Sys is consistent w.r.t. any finite input sequence ω ∈ L(G), which is assured by our memoized depth-first traversal algorithm exploring all the possible derivations from the start variable W in a systematic way. Otherwise, if the function terminates and (True, Trace) is returned, there must exist a super-step procedure call ∇((Q, D, B), A) (see line 15 in Fig. 5) during the LCT simulation which turns the violation indicator B into True. This implies a sequence of leftmost derivations of the following form,

W ∗⇒ [e]AU1 ∗⇒ [eA]U1

where e ∈ Σ∗, and the pair [e] denotes that each included terminal symbol in e has been processed by the super-step procedure ∇. Thus, [eA], a prefix of a string in L(G), is the returned failure trace Trace.
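The essence of the memoized traversal, abstracted away from LSC super-steps, can be sketched over an arbitrary move relation as follows. This is a simplified illustration of ours, not the Fig. 5 pseudo code; here a running state is any hashable tuple whose last component is the violation flag:

```python
def mdft(state, moves, seen=None):
    """Memoized depth-first traversal: returns (violated, trace).
    `moves(state)` yields successor states; a state already in `seen`
    is a variant occurrence and is not explored again."""
    if seen is None:
        seen = set()
    if state[-1]:                 # violation indicator B is True
        return True, [state]
    if state in seen:             # variant node: treat as a cycle of moves
        return False, []
    seen.add(state)
    for nxt in moves(state):
        violated, trace = mdft(nxt, moves, seen)
        if violated:
            return True, [state] + trace
    return False, []

# toy move relation with a cycle A -> B -> A and a violating state V
graph = {"A": ["B", "C"], "B": ["A"], "C": ["V"], "V": []}
mk = lambda n: (n, n == "V")
violated, trace = mdft(mk("A"), lambda s: [mk(n) for n in graph[s[0]]])
print(violated, [n for n, _ in trace])  # True ['A', 'C', 'V']
```

Without memoization, the A–B cycle would make the traversal diverge; recording visited states keeps the search finite.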

6 Justification

The LCT simulator returns a truth value of consistency as well as its justification [37], which provides evidence that users can easily follow to re-establish the simulation scenarios. If the consistency is false, it returns a failure trace as a negative justification; if the result is true, it returns an STD as a positive justification, showing labeled transitions between states while processing each external event.

Fig. 6 Justification with state transition diagram


6.1 State transition diagram (STD)

The STD, as shown in Fig. 6, is essentially a projection of the simulation tree in Fig. 4, focusing on how the symbolic external events affect the web order system. Each node in the diagram represents a simulation state, including the status of the system objects and an associated set of symbolic terminals. The transition from nowhere indicates the initial state, which is projected from the root of the tree. The double-circled node denotes successful system termination, meaning that the projected node in the tree can reach a leaf node (e.g., (S1, {}, λ, False)) without invoking any further external inputs. Each transition labeled with an external event is a projection of a sequence of moves, obtained by removing all the nonterminal moves as well as the intermediate running states following those moves. An ε-transition corresponds to an ε-move, removing an inactive symbolic terminal from the current simulation context without changing any web system states.

The STD is used as a positive justification for the consistency testing of the web order system. The self-loop transition orderConfirm({id :[1..200]}) on the state "S3, D2" tells that for any order id ∈ [1..200], if the order has been aborted, the following orderConfirm({id}) messages are void. The diagram also shows how a symbolic domain may be passed along the model-based simulation and may be partitioned for branched situational testing. The orderAbort transitions from the state "S2, D1" are branched due to the domain partition, where the left branch corresponds to the satisfaction of the system constraint id ≤ 200.

6.2 Failure trace caused by an anti-scenario

The LCT simulator returns a failure trace as a negative justification if the simulation fails, that is, if an inconsistency is detected. An example of an inconsistency scenario, caused by Figs. 2 and 1a, was shown in Sect. 2.2.

In this subsection, we introduce an anti-scenario, a special feature of LSCs allowing users to specify a potential violation explicitly. Anti-scenarios enable users to specify forbidden behaviors and to locate failure traces by simulating the LSCs through reachability testing, enriching the LSCs with a self-contained scenario testing capability.

Example 2 Consider a scenario in the web order example where the buyer tries to abort the order after the order has been confirmed. Let us assume this scenario is a forbidden one, and model it as an anti-scenario LSC in Fig. 7. Now consider the web order system with the LSCs in Figs. 1 and 7, with a symbolic grammar input specified in Example 1 with the following additional production rule:

Y({id}) → orderConfirm({id}) orderAbort({id}) (5)

Fig. 7 An anti-scenario LSC

Figure 7 shows an anti-scenario LSC which is specified using a combination of a universal chart and a hot-temperature FALSE condition. The universal chart tells that if the buyer creates an abort request when the order has actually been confirmed at that time, the LSC is compelled to execute the hot-temperature FALSE condition, thus causing a violation. It is easy to see that there exists a failure trace leading to the violation. The LCT simulator returns a failure trace from the root to a violating leaf as follows:

(S1, {}, W, False)
    ⇓  W → X W
(S1, {}, X W, False)
    ⇓  X → order(id) Y(id)
(S1, D1, order(id) Y(id) W, False)
    ⇓  [order(id)]
(S2, D1, Y(id) W, False)
    ⇓  Y(id) → orderConfirm(id) orderAbort(id)
(S2, D1, orderConfirm(id) orderAbort(id) W, False)
    ⇓  [orderConfirm(id)]
(S1, D1, orderAbort(id) W, False)
    ⇓  [orderAbort(id)]
(S1, D1, W, True)

where the representations of S1, S2, and D1 are shown below:


S1 : RBC.conf[id] = STC.conf[id] = true

S2 : RBC.conf[id] = STC.conf[id] = false

D1 : id :[1..300]

If we concatenate all the terminal events (those in square brackets) along the path, the corresponding event sequence, order · orderConfirm · orderAbort, is a prefix of the failure trace causing the anti-scenario. With failure traces, PLAY-engine users can easily re-construct the simulation scenarios to locate design defects in an interactive way.

7 Test generation directives in LSCs

LSCs can also be used to specify certain temporal properties of a system model for runtime behavior testing. Both a state predicate, denoting whether a predicate is true in a running state, and a path predicate, denoting whether a predicate becomes true along a sequence of external events, can be properly specified as a testing scenario in an LSC. Consider the LSC in Fig. 8. It specifies a testing scenario, between a pair of pre-defined virtual external events testStart and testEnd, asking whether, when the Buyer tries to abort an order, there exists a running path on which the order is still confirmed; that is, whether the abort can be denied. Our LSC simulator can dynamically test such specified temporal properties and eventually highlight their satisfaction in the generated STD. The highlights can serve as directives for selective test generation.

We introduce a testing operator 〈〉, as well as a virtual testing object TestControl, in LSCs for users to specify and test certain temporal properties. The testing operator 〈α〉,

Fig. 8 A testing scenario in LSC

where α can be a sequence of external events, can be embedded into a context-free grammar rule to trigger the testing LSC and detect its satisfaction during the LSC simulation. The notation 〈α〉 is automatically transformed into the external event sequence testStart α testEnd. For example, to test the scenario in Fig. 8, the first row of grammar rules in Example 1 may be changed into:

W → λ | 〈X〉W

which tests the specified path property for every single order. The auxiliary event testStart is used to trigger the virtual testing object TestControl at the beginning of each order, thus activating the corresponding testing LSC scenario shown in Fig. 8. The auxiliary event testEnd is used to find out whether all the external events specified in the prechart of a testing scenario have happened in the right order or not.
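The rewriting of 〈α〉 into testStart α testEnd can be sketched as a simple pre-processing pass over rule bodies. The modeling below is our own illustration, with the operator encoded as a tagged tuple:

```python
def expand_test_operator(body):
    """Replace each testing operator ('<>', alpha) in a rule body with
    the external event sequence testStart · alpha · testEnd."""
    out = []
    for sym in body:
        if isinstance(sym, tuple) and sym[0] == "<>":
            out += ["testStart", *sym[1], "testEnd"]
        else:
            out.append(sym)
    return out

# W -> <X> W  becomes  W -> testStart X testEnd W
print(expand_test_operator([("<>", ["X"]), "W"]))
# ['testStart', 'X', 'testEnd', 'W']
```

An empty operator 〈〉, used below for state predicates, expands to the adjacent pair testStart testEnd.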

During the LSC simulation, when the auxiliary event testEnd happens, if the internal events orderAbort(Id) and conf[Id] = true did happen as specified, the universal LSC enters the main chart, denoting that the testing property is true; otherwise, a violation occurs in the prechart. Such a prechart violation does not cause a whole-system violation; it simply tells that the running scenario does not match the defined testing LSC, and therefore the triggered testing LSC should be removed from the simulation environment and its main chart will not be executed.

The property testing operator 〈〉 and the virtual object TestControl, incorporated with our LSC simulator, provide an easy vehicle for modeling and testing temporal properties in running LSCs. The satisfaction of the testing property is illustrated in the generated STD as shown in Fig. 9, where the colored nodes and edges denote the satisfying paths.

State predicates can also be tested at individual running states by properly embedding an empty pair of testing operators 〈〉. For example, we can easily construct a testing LSC, as shown in Fig. 10, to check whether the safety property "RBC.conf[Id] = STC.conf[Id]" always holds. To trigger the testing process, we may modify the grammar in Example 1 by inserting an empty pair 〈〉 before each external event and the λ as follows:

W → 〈〉λ | X W

X → 〈〉order({id :[1..300]}) Y({id})
Y({id}) → 〈〉orderConfirm({id})
Y({id}) → 〈〉orderAbort({id}) 〈〉orderConfirm({id})

Thus, the satisfaction of the safety property is illustrated in a generated STD similar to Fig. 9, except that all the nodes are colored to denote the satisfaction of the state property, but no edges are colored.


Fig. 9 STD with satisfied property highlights

Fig. 10 Safety: RBC.conf[id] is always the same as STC.conf[id]

8 Generating test cases in symbolic grammar

Not only does the LCT system provide an executable environment for formal simulation and consistency testing, it also presents a new systematic way of model-based test generation. The LCT system accepts two inputs, the LSCs as a formal model of the software under test and a preliminary symbolic grammar as test generation directives, and provides mechanisms to evolve the symbolic grammar for refined test generation. The preliminary symbolic grammar may describe the test cases independently of the system requirement and design, while an evolved symbolic grammar takes the system model into consideration for refined test case generation.

We present a new test generation procedure, which takes an STD as input and generates a symbolic grammar via a two-step automated transformation scheme as follows.

8.1 Simplification of an STD

The first step is to create a simplified STD by removing both self-loop transitions and ε-transitions. A self-loop transition (e.g., in Fig. 6, the transition orderConfirm({id:[1..200]}) on the node "S3, D2") denotes void external events, which do not affect the system states; therefore, such a transition should be removed from the test generation for efficiency. An ε-transition is a dummy internal transition that gets rid of inactive symbolic terminals from the context. The removal of ε-transitions has no side effects on either the transformation procedure or the final transformed symbolic grammar, because the absence of those inactive symbolic terminals is already reflected in the destination node following the ε-transition. Thus, we obtain the simplified STD shown in Fig. 11.
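With the STD modeled as labeled edges, this simplification step can be sketched as below. The sketch is ours and slightly stronger than pure edge removal: it contracts each ε-edge by identifying its endpoints, which matches the observation that the destination node already reflects the dropped terminals:

```python
def simplify_std(edges):
    """Drop self-loop transitions (void external events) and contract
    ε-transitions in an STD given as (src, label, dst) triples."""
    merge = {s: d for s, l, d in edges if l == "ε"}
    def rep(n):                   # follow ε-edges to the final node
        while n in merge:
            n = merge[n]
        return n
    out = []
    for s, l, d in edges:
        if l == "ε":
            continue
        s, d = rep(s), rep(d)
        if s != d:                # self-loops are removed
            out.append((s, l, d))
    return out

std = [("S1,{}", "order({id:[1..300]})", "S2,D1"),
       ("S3,D2", "orderConfirm({id:[1..200]})", "S3,D2"),  # self-loop
       ("S3,D2", "ε", "S3,{}")]
print(simplify_std(std))  # [('S1,{}', 'order({id:[1..300]})', 'S2,D1')]
```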

8.2 From simplified STD to refined symbolic grammar

The second step transforms a simplified STD into a symbolic grammar using the following strategies:

– For each node "S, D" in the STD, we introduce a distinct variable VS,D into the grammar if D is an empty set {}; otherwise, if D contains the symbolic terminals α1, α2, . . . , αk, a parametric variable VS,D(α1, α2, . . . , αk) is introduced instead.

– For each transition from "S1, D1" to "S2, D2" with a label label in the STD, we introduce a production rule, V1 → label V2, into the grammar, where V1 and V2 are the two respective variables transformed from "S1, D1" and "S2, D2".

– The start variable V0 is transformed from the initial node in the STD.

– For each successful terminal node (double-circled node) "S, D" in the STD, we introduce a production rule, V → λ, into the grammar, where V is the variable transformed from the node "S, D".
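The strategies above can be sketched mechanically as follows. This is an illustrative rendering of ours that writes every node variable as a plain string V_<node> and omits the parametric-variable details for nodes with a non-empty D:

```python
def std_to_grammar(init, edges, finals):
    """Transform a simplified STD into production rules: one variable
    per node, one rule V_src -> label V_dst per transition, and a
    λ-rule for each successful (double-circled) terminal node."""
    var = lambda n: "V_" + n
    rules = [(var(s), [label, var(d)]) for s, label, d in edges]
    rules += [(var(f), []) for f in finals]        # V -> λ
    return var(init), rules

start, rules = std_to_grammar(
    "S1", [("S1", "order({id:[1..300]})", "S2"),
           ("S2", "orderConfirm({id})", "S1")],
    finals=["S1"])
for lhs, rhs in rules:
    print(lhs, "->", " ".join(rhs) if rhs else "λ")
```

Since every rule has the form V1 → label V2 or V → λ, the evolved grammar is right-linear up to its symbolic-terminal parameters.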

Therefore, given the simplified STD shown in Fig. 11, we obtain the following refined symbolic grammar.


Fig. 11 A simplified state transition diagram

V0 → λ | order({id :[1..300]}) V1({id})
V1({id}) → orderConfirm({id}) V0
V1({id}) → orderAbort({id :[1..200]}) V2
V1({id}) → orderAbort({id :[201..300]}) V3({id})
V2 → order({id :[1..300]}) V1({id})
V3({id}) → orderConfirm({id}) V0

Thus, a full cycle of our model-based test generation via grammar evolution has been described. The evolved symbolic grammar can either be used to generate practical test cases for software testing, where each symbolic terminal in a test case can be replaced with a random value from the associated domain, or be further refined by applying our model-based test generation approach again if additional test coverage criteria are given.

9 Related work

Model-based test generation has become increasingly important for increasing confidence in the correct functioning of today's complex systems [6]. In model-based test generation, test cases to validate a system are automatically produced using a model of the system under test (SUT). In addition, specifications of the test cases, such as coverage criteria and other properties, are often used to search the large space of possible test cases and generate interesting ones so that faults, if any, can be found early. Usually, the input part of each test case produced is fed into the SUT, whose output is then compared with the output part of the test case to detect faults. Several formal approaches, including theorem proving (including logic programming), symbolic execution, model checking, and their combinations, have been used to automatically generate test cases satisfying a given specification using a system model.

Theorem proving  Theorem proving approaches typically use formal specifications as their model to generate test cases. Usually, the specification is partitioned into equivalence classes, each representing one abstract test case. Each class represents the same behavior with respect to the corresponding concrete test cases, i.e., they all produce the same fault (or are non-faulty). Several approaches have been used to partition a specification into such equivalence classes. Helke et al. [22] translate Z specifications into the input language of the prover Isabelle [34] to perform a disjunctive normal form partitioning, which outputs one or more pairwise disjoint disjuncts corresponding to each predicate in a Z schema. Each disjunct represents an equivalence class and hence a test case. Bernot et al. [1] describe an approach for generating test cases from system specifications represented as a logic program using Prolog. The approach converts an algebraic specification of the system into a Prolog program by using a small number of syntactic rules that transform each equation into a Horn clause (Prolog rule). To partition the resulting program to generate test cases, a resolution tree is built using the Prolog resolution principle.⁴ Each branch of this tree represents one test case. Instantiating predicates in a bottom-up way from the leaves, the abstract test case of a branch can be obtained using the unifiers along the branch. To obtain concrete test cases, variables in the abstract test cases are systematically instantiated, governed by an upper limit on the term complexity.

Symbolic execution  Symbolic execution [26] has been used to generate test cases both from system models and from implementations (code). The model-based test generation approaches start from an abstract executable system specification and perform a search over the execution state space of the model by using a constraint logic programming framework or simply Prolog. The search is performed symbolically by using abstract system states, each of which corresponds to several concrete states. Legeard and Peureux [31] describe an approach to derive test cases from B specifications using their own constraint logic programming language CLPS-B. Their approach translates B operations to CLPS-B prototypes. With CLPS-B, execution state spaces of these prototypes can be automatically generated, whose symbolic traces can then be searched to identify interesting test cases. The traces are then instantiated as described above to generate concrete test cases. Pretschner et al. [44] use a CLP tool to symbolically execute an abstract UML-based model of the system, whose traces are then searched to identify interesting test cases. They use functional, structural, and stochastic coverage criteria over the executable state space to identify interesting test cases.

⁴ Iterative deepening instead of Prolog's standard depth-first search is used for completeness. Rewriting and heuristics-based selection of literals are used to aid termination.

Model checking  Test generation based on model checking is a push-button approach where the test generation details are handled transparently within a model-checking tool. Usually, test generation using a model-checking tool involves finding counter-examples or witnesses to system requirements expressed as temporal logic formulas. Current model-checking tools take a temporal logic formula and a system model as inputs and output a correct or an erroneous execution trace, which can then be used to test the SUT for correct traces or to analyze faults. Different types of temporal logic formulas, representing test coverage criteria, test purposes, and mutation analysis properties, have been used in model-checker-based test generation. Test coverage formulas represent the structural aspects of the input model, including control-flow and data-flow properties [23]. Test purpose formulas, such as those in [11], represent behavioral aspects of a model, including properties that must hold over a reachable set of states. In [35], model checkers are used to generate tests to analyze the unexpected behaviors observed in alternative models, called mutants, obtained by applying mutation operations to a given model.
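The witness-finding idea this paragraph describes can be illustrated with a toy reachability query. The model, states, and function below are entirely hypothetical (a plain explicit-state search, not any particular model-checking tool, which would work symbolically over temporal logic formulas):

```python
from collections import deque

# Toy illustration of model-checking-based test generation: search a
# finite system model for a witness trace reaching a state satisfying a
# "test purpose" predicate; the event sequence of the trace is the test.

def witness_trace(initial, successors, purpose):
    """Shortest event sequence (BFS) to a state satisfying `purpose`,
    or None if no such state is reachable."""
    queue, seen = deque([(initial, [])]), {initial}
    while queue:
        state, trace = queue.popleft()
        if purpose(state):
            return trace
        for event, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, trace + [event]))
    return None

# Hypothetical order-processing model in the spirit of the running example.
model = {
    "idle":      [("order", "pending")],
    "pending":   [("confirm", "confirmed"), ("abort", "aborted")],
    "confirmed": [],
    "aborted":   [],
}
print(witness_trace("idle", lambda s: model[s], lambda s: s == "aborted"))
# → ['order', 'abort']
```

The returned event sequence is exactly the kind of trace a model checker emits as a witness, ready to be replayed against the SUT.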

Test generation using extended symbolic grammars  The approach proposed in this paper automatically generates test cases for reactive systems supporting highly structured inputs and/or interactions. We have introduced a new grammar-based formal notation, called extended symbolic grammar, for describing the underlying structure such that each leftmost derivation of the grammar corresponds to one symbolic test case, which can be suitably instantiated to obtain concrete test cases to validate the system. A procedure that seamlessly integrates each grammar derivation step with a symbolic model-based simulator has been used to perform on-the-fly consistency testing of the model and to generate meaningful and complete test cases. Test generation using symbolic grammars has recently been shown effective [13,33]. We extend the concept of a symbolic grammar by allowing limited context sensitivity between symbolic terminals to be carried over via parameters among production rules.

Our approach is somewhat similar in spirit to the model-checking and symbolic execution based test generation approaches, where tests are generated by using a system model and system specifications. However, it has several significant differences from these approaches. First, because it uses simulation, it tests the consistency of the model itself to ensure that the model is consistent and eligible for test generation or for other verification approaches such as model checking and symbolic execution. Second, using a grammar that explicitly specifies the inputs and interactions serving as inputs for the generated tests helps us control and refine test generation, unlike model checking or symbolic execution, where the mapping between tests and the specification has to be inferred. Finally, our approach complements existing model-based test generation approaches, such as model checking and static analysis, by producing an STD automatically via a model-based simulation.

10 Conclusion

We presented extended symbolic grammars to systematically enumerate continuous test cases for model-based testing, where symbolic terminals and domains are introduced in the grammar to hide the complexity of large, redundant inputs that may share a common syntactic pattern and similar expected behaviors. A symbolic terminal in an extended symbolic grammar is not simply a set of terminal symbols; it also allows context sensitivity to be carried over as parameters through the definition of production rules. Test generation using symbolic grammars avoids both unnecessary tests and expensive constraint-solving techniques for automatic test generation.

We introduced a novel, model-based test generation approach for reactive systems supporting highly structured inputs and/or interactions. Our approach takes LSCs and symbolic grammars as inputs and performs an automated scenario simulation for consistency testing. We have developed a procedure that seamlessly integrates each grammar derivation step with a symbolic model-based simulator, and used the procedure to perform on-the-fly consistency testing to generate meaningful test cases of the model. The model-based simulation produces an STD automatically justifying the model's runtime behaviors. The STD can either serve as a directive for selective test generation, or as a basis for model checking.

We are currently working on applying our LCT tool to testing more practical reactive systems. One ongoing project is to model and simulate a game-based construction learning system [14]. We expect to report more experimental results in the near future.

References

1. Bernot, G., Gaudel, M., Marre, B.: Software testing based on formal specifications: a theory and a tool. Softw. Eng. J. 6(6), 387–405 (1991)


2. Bertolino, A., Marchetti, E., Muccini, H.: Introducing a reasonably complete and coherent approach for model-based testing. Electron. Notes Theor. Comput. Sci. 116, 85–97 (2005)

3. Bohn, J., Damm, W., Wittke, H., Klose, J., Moik, A.: Modeling and validating train system applications using statemate and live sequence charts. In: Proceedings of Conference on Integrated Design and Process Technology (IDPT2002), Society for Design and Process Science (2002)

4. Bontemps, Y., Heymans, P.: Turning high-level live sequence charts into automata. In: Proceedings of Scenarios and State Machines: Models, Algorithms and Tools, 24th International Conference on Software Engineering, May 2002, ACM (2003)

5. Bontemps, Y., Heymans, P., Kugler, H.: Applying LSCs to the specification of an air traffic control system. In: Proceedings of Workshop on Scenarios and State Machines: Models, Algorithms and Tools (2003)

6. Broy, M., Jonsson, B., Katoen, J., Leucker, M., Pretschner, A.: Model-based Testing of Reactive Systems. LNCS 3472, Springer (2005)

7. Bunker, A., Gopalakrishnan, G., Slind, K.: Live sequence charts applied to hardware requirements specification and verification: a VCI bus interface model. Softw. Tools Technol. Transf. 7(4), 250–341 (2005)

8. Combes, P., Harel, D., Kugler, H.: Modeling and verification of a telecommunication application using live sequence charts and the play-engine tool. In: Peled, D.A., Tsay, Y.-K. (eds.) Automated Technology for Verification and Analysis. Lecture Notes in Computer Science, vol. 3707, pp. 414–428. Springer, Berlin, Heidelberg (2005)

9. Coppit, D., Lian, J.: Yagg: an easy-to-use generator for structured test inputs. In: Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering, ASE '05, pp. 356–359. Long Beach, CA, USA (2005)

10. Damm, W., Harel, D.: LSCs: breathing life into message sequence charts. In: Proceedings of 3rd IFIP International Conference on Formal Methods for Open Object-based Distributed Systems, pp. 293–312 (1999)

11. Engels, A., Fiejs, L., Mauw, S.: Test generation for intelligent networks using model checking. In: Proceedings of Workshop on Tools and Algorithms for Construction and Analysis of Systems (TACAS), pp. 384–398 (1997)

12. Godefroid, P., Kiezun, A., Levin, M.Y.: Grammar-based whitebox fuzzing. In: Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pp. 206–215 (2008)

13. Godefroid, P., Levin, M.Y., Molnar, D.A.: SAGE: whitebox fuzzing for security testing. Commun. ACM 55(3), 40–44 (2012)

14. Goedert, J., Cho, Y., Subramaniam, M., Guo, H.F., Xiao, L.: A framework for virtual interactive construction education (VICE). Autom. Constr. 20, 76–87 (2011)

15. Guo, H.F.: The LCT tool. http://faculty.ist.unomaha.edu/hguo/lct.htm (2014). Accessed 1 Apr 2014

16. Guo, H.F., Subramaniam, M.: Model-based test generation using evolutional symbolic grammar. In: International Symposium on Theoretical Aspects of Software Engineering (TASE), pp. 111–118 (2012)

17. Guo, H.F., Zheng, W., Subramaniam, M.: L2C2: logic-based LSC consistency checking. In: 11th International ACM SIGPLAN Symposium on Principles and Practice of Declarative Programming (PPDP) (2009)

18. Harel, D.: From play-in scenarios to code: an achievable dream. In: Proceedings of Fundamental Approaches to Software Engineering (FASE), pp. 22–34 (2000)

19. Harel, D., Kugler, H.: Synthesizing state-based object systems from LSC specifications. Int. J. Found. Comput. Sci. 13(1), 5–51 (2002)

20. Harel, D., Maoz, S., Segall, I.: Some results on the expressive power and complexity of LSCs. In: Avron, A., Dershowitz, N., Rabinovich, A. (eds.) Pillars of Computer Science. Lecture Notes in Computer Science, vol. 4800, pp. 351–366. Springer, Berlin, Heidelberg (2008)

21. Harel, D., Marelly, R.: Come, Let's Play: Scenario-Based Programming Using LSCs and the Play-Engine. Springer, Berlin, Heidelberg (2003)

22. Helke, S., Neustupny, T., Santen, T.: Automating test case generation from Z specifications using Isabelle. In: Proceedings of International Conference of Z Users (ZUM), pp. 52–71 (1997)

23. Hong, H., Cha, S., Lee, I., Sokolsky, O., Ural, H.: Data flow testing as model checking. In: Proceedings of the International Conference on Software Engineering (ICSE), pp. 232–242 (2003)

24. Hong, H.S., Lee, I., Sokolsky, O., Ural, H.: A temporal logic based theory of test coverage and generation. In: Proceedings of the 8th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS '02, pp. 327–341 (2002)

25. ITU-T: Message sequence chart (MSC). Z.120 ITU-T recommendation. Available at https://www.itu.int/rec/T-REC-Z.120/en (1996). Accessed 1 Jan 2012

26. King, J.C.: Symbolic execution and program testing. Commun. ACM 19, 385–394 (1976)

27. Klose, J., Toben, T., Westphal, B., Wittke, H.: Check it out: on the efficient formal verification of live sequence charts. In: 18th International Conference on Computer Aided Verification (CAV), pp. 219–233 (2006)

28. Krenn, W., Schlick, R., Aichernig, B.K.: Mapping UML to labeled transition systems for test-case generation. In: 8th International Symposium on Formal Methods for Components and Objects, pp. 186–207. Springer, Berlin, Heidelberg (2009)

29. Kumar, R., Mercer, E.: Improving live sequence chart to automata translation for verification. Electron. Commun. EASST 10, 1–14 (2008)

30. Lammel, R., Schulte, W.: Controllable combinatorial coverage in grammar-based testing. In: Testing of Communicating Systems. Lecture Notes in Computer Science, vol. 3964, pp. 19–38. Springer, Berlin, Heidelberg (2006)

31. Legeard, B., Peureux, F., Utting, M.: Controlling test case explosion in test generation from B formal models. Softw. Test. Verif. Reliab. (STVR) 14(2), 81–103 (2004)

32. Liu, S., Li, L., Guo, H.F.: Generating test cases via model-based simulation. In: 13th IEEE International Conference on Information Reuse and Integration, pp. 17–24 (2012)

33. Majumdar, R., Xu, R.G.: Directed test generation using symbolic grammars. In: The ACM SIGSOFT Symposium on the Foundations of Software Engineering: Companion Papers, pp. 553–556 (2007)

34. Nipkow, T., Paulson, L.C., Wenzel, M.: Isabelle/HOL: A Proof Assistant for Higher-Order Logic. Lecture Notes in Computer Science, vol. 2283. Springer, Berlin, Heidelberg (2002)

35. Offutt, J., Liu, S., Abdurazik, A., Ammann, P.: Generating test data from state based specifications. J. Softw. Test. Verif. Reliab. 13, 25–53 (2003)

36. OMG: Unified modeling language superstructure specification, v2.0. The Object Management Group. http://www.uml.org/ (2005). Accessed 1 Jan 2012

37. Pemmasani, G., Guo, H.F., Dong, Y., Ramakrishnan, C., Ramakrishnan, I.: Online justification for tabled logic programs. In: The 7th International Symposium on Functional and Logic Programming, pp. 24–38 (2004)

38. Rothermel, G., Harrold, M.J.: A safe, efficient regression test selection technique. ACM Trans. Softw. Eng. Methodol. 6, 173–210 (1997)


39. Rothermel, G., Harrold, M.J., von Ronne, J., Hong, C.: Empirical studies of test suite reduction. J. Softw. Test. Verif. Reliab. 4, 219–249 (2002)

40. Sirer, E.G., Bershad, B.N.: Using production grammars in software testing. In: 2nd Conference on Domain Specific Languages, pp. 1–13 (1999)

41. Sun, J., Dong, J.S.: Model checking live sequence charts. In: The 10th IEEE International Conference on Engineering of Complex Computer Systems, pp. 529–538 (2005)

42. Tan, L., Sokolsky, O., Lee, I.: Specification-based testing with linear temporal logic. In: Proceedings of IEEE International Conference on Information Reuse and Integration (IRI'04), pp. 493–498 (2004)

43. Toben, T., Westphal, B.: On the expressive power of LSCs. In: The 32nd Conference on Current Trends in Theory and Practice of Computer Science, pp. 33–43 (2006)

44. Utting, M., Legeard, B.: Practical Model-Based Testing: A Tools Approach. Morgan Kaufmann, San Francisco, CA, USA (2007)

45. Warren, D.S.: Memoing for logic programs. Commun. ACM 35, 93–111 (1992)

46. Zheng, W.: Consistency checking on LSC specifications. Master Thesis, University of Nebraska at Omaha, Omaha (2009)
