
Dissertation

Automated Software Testing with Model Checkers

Gordon Fraser

October 2007

IST - Institute for Software Technology
Graz University of Technology

Supervised and evaluated by:

Univ.Prof. Dipl.-Ing. Dr.techn. Franz Wotawa
Paul Ammann, Ph.D., Associate Professor


Abstract

Abstract (English)

Testing is the most commonly applied technique to ensure a sufficiently high quality of software. Automation is desirable because testing is very complex, time consuming, and error prone when done manually. The use of model checkers, tools intended for formal verification, to automatically derive test cases is a promising technique. Model checkers produce counterexamples, which are linear sequences illustrating property violations. These counterexamples are suitable as test cases. While full automation is possible and different concrete techniques have been proposed, there are still many issues that need resolving to encourage industrial adoption.

The applicability of model checker based testing is limited by the performance of the model checker in use. While the well known state space explosion is a major contributor to this issue, there are further problems. Often, too many test cases are created, or identical test cases are created several times unnecessarily. This thesis describes techniques to improve the performance of the test case generation.

While test case generation is simple and automatic, the results are not always ideal. It is common that significant redundancy lowers the overall fault sensitivity. Some techniques tend to create large test suites of very short test cases, which is not always useful. Different techniques to optimize the quality of test suites created with model checkers are described in this thesis.

Finally, there are several application areas where model checker based testing is useful but has not been considered so far. This thesis describes property relevant testing, regression testing, and testing of nondeterministic systems.

Abstract (German)

In software development, testing is one of the most commonly used techniques for quality assurance. Testing is a complex, time-consuming process that requires considerable experience from the tester performing it, which makes it difficult to guarantee that the testing is done correctly. Automation of the testing process is intended to reduce these problems. One automation technique that has gained popularity in academia in recent years is test case generation with so-called model checkers. A model checker is a program that analyzes models and returns example execution sequences for diagnostic purposes. These sequences can be used as test cases. Although this approach is promising, there are still many problems to solve before the technique can be applied generally and in an industrial setting.

The applicability of model checkers for test case generation is limited by the processing speed of the model checkers. The main problem here is the rapidly growing size of the state space of models, but there are further reasons for unacceptable performance; for example, often too many, or too many similar, test cases are generated, which consumes a significant part of the generation time. This thesis describes methods with which the test case generation can be optimized.

Test cases can be generated quite easily with model checkers, but the quality of these test cases often leaves room for improvement. For example, there is often significant redundancy. Furthermore, the order of the test cases affects the rate at which faults are detected. This thesis therefore presents approaches for optimizing test cases generated with model checkers.

There are several special cases for which model checker based testing has not yet been considered. This thesis defines a formal relationship between software requirements and test cases, analyzes the impact of model changes on testing, and considers testing of nondeterministic systems.


Acknowledgement

First of all, I wish to thank my supervisor Franz Wotawa for his help and patience during the course of this work. I would also like to thank Paul Ammann for his support and encouragement during the last couple of months of this work. Further thanks go to Bernhard K. Aichernig, Martin Weiglhofer, and Arndt Mühlenfeld for fruitful discussions. Of course, thanks to my parents for their support during all the years. Finally, I am indebted to my wife Anneliese and my daughter Victoria for putting up with me during difficult times and encouraging me all the way.

This work has been partially supported by the Austrian Federal Ministry of Economics and Labour, the competence network Softnet Austria (http://www.soft-net.at), and the Austrian Federal Ministry of Transport, Innovation and Technology (BMVIT) and FFG under grant FIT-IT-809446. Furthermore, part of the research herein was conducted within the competence network Softnet Austria (www.soft-net.at) and funded by the Austrian Federal Ministry of Economics (bm:wa), the province of Styria, the Steirische Wirtschaftsförderungsgesellschaft mbH. (SFG), and the city of Vienna in terms of the center for innovation and technology (ZIT).


Contents

Abstract
Acknowledgement
List of Figures
List of Tables

1 Introduction
  1.1 Motivation
  1.2 Problem Statement
  1.3 Thesis Statement
  1.4 Contributions
  1.5 Organization

2 Software Testing Background
  2.1 What is Software Testing?
  2.2 Why Automate Software Testing?
  2.3 Software Testing Basics
  2.4 Evaluating the Effectiveness of Testing
  2.5 Formal Testing

3 Automated Testing with Model Checkers
  3.1 Introduction
  3.2 Model Checking Background
  3.3 A Simple Demonstration Model
  3.4 Model Checkers in Software Testing
  3.5 Coverage Based Test Case Generation
    3.5.1 Coverage of SCR Specifications
    3.5.2 Coverage of Transition Systems
    3.5.3 Control and Data Flow Coverage Criteria
    3.5.4 Coverage of Abstract State Machines
  3.6 Requirements Based Testing
    3.6.1 Vacuity Based Coverage
    3.6.2 Unique First Cause Coverage
    3.6.3 Dangerous Traces
    3.6.4 Property Relevance
  3.7 Mutation Based Test Case Generation
    3.7.1 Mutation Operators for Specifications
    3.7.2 Specification Mutation
    3.7.3 Reflection
    3.7.4 State Machine Duplication
  3.8 Test Suite Analysis with Model Checkers
    3.8.1 Symbolic Test Case Execution
    3.8.2 Coverage Analysis
    3.8.3 Mutation Analysis
  3.9 Issues in Testing with Model Checkers
    3.9.1 Abstraction
    3.9.2 Improving the Test Suite Generation Process
    3.9.3 Improving the Results of the Test Suite Generation Process
    3.9.4 Quality Concerns for Coverage Based Testing
    3.9.5 Regression Testing
    3.9.6 Fault Visibility
    3.9.7 Nondeterminism
  3.10 Further Uses of Model Checkers in Software Testing
    3.10.1 Testing with Software Model Checkers
    3.10.2 Coverage of Timed Automata
    3.10.3 Combinatorial Testing
    3.10.4 Testing Composite Webservices with Model Checkers
    3.10.5 Adaptive Model Checking
    3.10.6 On-the-fly Testing with Model Checkers
  3.11 Available Tools for Test Case Generation
  3.12 Formal Testing with Model Checkers
  3.13 Summary

4 Using CTL for Model Checker Based Test Case Generation
  4.1 Introduction
  4.2 Evidence Graphs - Counterexamples and Witnesses for CTL
    4.2.1 Non-linearity of Evidence Graphs
    4.2.2 Finiteness of Paths in Evidence Graphs
  4.3 Relating Test Coverage to Covering Evidence Graphs
    4.3.1 MC/DC Coverage
    4.3.2 Using Evidence Graphs to Test Dangerous Traces
  4.4 Summary

5 Test Suite Analysis with Test Suite Models
  5.1 Introduction
  5.2 Test Suite Analysis with Model Checkers
  5.3 Merging Test Cases to a Model
  5.4 Test Suite Model Analysis
  5.5 Test Suite Optimization
  5.6 Evaluation
  5.7 Summary

6 Property Relevance
  6.1 Introduction
  6.2 Property Relevance
    6.2.1 Relevance of Negative Test Cases
    6.2.2 Relevance of Positive Test Cases
  6.3 Property Relevant Coverage Analysis
    6.3.1 Measuring Property Relevance
    6.3.2 Relevance Based Coverage Criteria
  6.4 Property Relevant Test Case Generation
    6.4.1 Property Relevant Test Case Generation
  6.5 Empirical Results
  6.6 Summary

7 Specification Analysis
  7.1 Introduction
  7.2 Specification Coverage
  7.3 Specification Vacuity
  7.4 Summary

8 Nondeterministic Testing
  8.1 Introduction
  8.2 Preliminaries
  8.3 Deriving Test Cases from Nondeterministic Models
  8.4 NuSMV and Nondeterministic Models
  8.5 Creating Test Cases with Nondeterministic NuSMV Models
    8.5.1 Identifying Nondeterministic Choice
    8.5.2 Extending Nondeterministic Test Cases
    8.5.3 Improving the Test-Case Extension Process
  8.6 Coverage of Nondeterministic Systems
  8.7 Experimental Results
  8.8 Summary

9 Test Suite Minimization
  9.1 Introduction
  9.2 Traditional Test Suite Reduction
  9.3 Test Suite Redundancy
    9.3.1 Identifying Redundancy
    9.3.2 Removing Redundancy
  9.4 Empirical Evaluation
    9.4.1 Experiment Setup
    9.4.2 Lossy Minimization with Model Checkers
    9.4.3 Results
  9.5 Summary

10 Test Case Prioritization
  10.1 Introduction
  10.2 Test Case Prioritization
    10.2.1 Preliminaries
    10.2.2 Test Case Prioritization Techniques
  10.3 Determining Prioritization with Model Checkers
    10.3.1 Coverage Prioritization
    10.3.2 FEP Prioritization
    10.3.3 Property Prioritization
    10.3.4 Optimal Prioritization
  10.4 Prioritizing Test Cases at Creation Time
  10.5 Empirical Results
    10.5.1 Average Percentage of Faults Detected
    10.5.2 Experiment Setup
    10.5.3 Results
  10.6 Summary

11 Improving the Performance with LTL Rewriting
  11.1 Introduction
  11.2 Advanced Test Case Generation with LTL Rewriting
    11.2.1 LTL Rewriting
    11.2.2 Test Case Generation
  11.3 Empirical Evaluation
    11.3.1 Experiment Setup
    11.3.2 Results
  11.4 Summary

12 Mutant Minimization
  12.1 Introduction
  12.2 Characteristic Properties
  12.3 Using Characteristic Properties to Optimize Test Case Generation
  12.4 Empirical Results
  12.5 Summary

13 Regression Testing
  13.1 Introduction
  13.2 Handling Model Changes
    13.2.1 Identifying Obsolete Test Cases
    13.2.2 Creating New Test Cases
  13.3 Empirical Results
    13.3.1 Experiment Setup
    13.3.2 Results
  13.4 Summary

14 Conclusions
  14.1 Improving Model Checkers for Test Case Generation
    14.1.1 Alternative Witnesses and Counterexamples
    14.1.2 Minimization at Creation Time
    14.1.3 Nondeterministic Counterexamples
    14.1.4 Tree-like Counterexamples
    14.1.5 Abstraction for Testing
    14.1.6 Model Combination
    14.1.7 Explicit Setting of the Initial State
    14.1.8 Extensible Counterexamples
    14.1.9 Constraints on Counterexamples
    14.1.10 Organization of Counterexamples
  14.2 Summary

Bibliography


List of Figures

3.1 Example model as FSM and the according Kripke structure.
3.2 Execution tree of example automaton.
3.3 SMV syntax.
3.4 FSM of example model Car Controller (CC).
3.5 CC represented as SMV model.
3.6 Simple car controller as RSML−e specification.
3.7 Reactive system model.
3.8 Execution tree and counterexample.
3.9 Coverage based test case generation.
3.10 Example implementation of CC example.
3.11 Data flow graph for example implementation.
3.12 ASM specification for CC example.
3.13 Dangerous traces for safety properties.
3.14 Test cases from mutants violating the specification.
3.15 Mutation based test case generation with reflection.
3.16 State machine duplication based test case generation.
3.17 Counterexample for mutant.
3.18 Test case as verifiable SMV model.
3.19 Test case model combined with original model to simulate test case execution.
3.20 Simple car controller as SAL specification with trap variables for simple transition coverage.
3.21 Test goal list for SAL-ATG.
3.22 Simple example model as FSM and the according SMV model.
3.23 Example 1 as Kripke structure and IOLTS.
3.24 Example model 2 as FSM and the corresponding SMV model.
3.25 Example 2 as Kripke structure and IOLTS.

4.1 Evidence graphs for CTL connectives.
4.2 Evidence graph for (EX φ) ∧ (EX ¬φ).
4.3 A non-linear evidence graph for AF φ.
4.4 Evidence graphs for MC/DC.
4.5 An AX dangerous evidence graph for property P.
4.6 An EF dangerous evidence graph for property P.

5.1 Test case model.
5.2 Test suite model.

6.1 Combination of test case model and mutant model.
6.2 Example NuSMV model.
6.3 Combined model consisting of original model and mutant.

8.1 Traditional test case generation with model checkers.
8.2 Creating test cases from a nondeterministic model iteratively.
8.3 SMV ASSIGN section with nondeterminism.
8.4 Special variable for nondeterminism.
8.5 Deterministic example transition relation.
8.6 Nondeterministic example transition relation.
8.7 Counterexample containing nondeterministic choice.
8.8 Counterexample to extend nondeterministic test case.

9.1 Simple test suite with redundancy represented as execution tree.
9.2 Test suite transformation.
9.3 Test suite transformation with glue sequences.
9.4 Comparison of reduction methods.
9.5 Effects of the test case order.
9.6 Minimization time vs. test suite size.

10.1 APFD of Cruise Control model.
10.2 APFD of Cruise Control implementation.
10.3 APFD of Safety Injection System.
10.4 APFD of Car Model.
10.5 Fault detection rates for Cruise Control example.
10.6 Comparison of average APFD values.

11.1 Algorithm MON: Test case generation with monitoring by rewriting.
11.2 Algorithm EXT1: Extending test cases with affected trap properties.
11.3 Algorithm EXT2: Extending up to maximal depth.

12.1 Example model.
12.2 Mutant of example in Figure 12.1.
12.3 Transition relation in SMV.
12.4 Example model with non-boolean values.
12.5 Mutant of example in Figure 12.4.
12.6 Another mutant of example in Figure 12.4.
12.7 Optimized test case generation.

13.1 Symbolic execution of a test case.
13.2 Transition relation with special variable indicating changes.
13.3 Transition coverage test case generation.
13.4 Mutation of the reflected transition relation.
13.5 Test case generation via state machine duplication.


List of Tables

3.1 Simple transitions for velocity.
3.2 SCR Mode Transition Table for velocity in CC example.

5.1 Evaluation of example test suites.

6.1 Models and mutation results.
6.2 Numbers of test cases generated.
6.3 Mutant scores of test suites.
6.4 Wiper example coverage.

8.1 Test case generation and extension for the SIS example.
8.2 Test case generation and extension for the CC example.
8.3 Simple guard coverage on a nondeterministic IUT.

9.1 Results in average for Cruise Control example.
9.2 Results in average for SIS example.
9.3 Results in average for Car Control example.
9.4 Mutant scores for cruise-control implementation mutants.

10.1 Test Suite statistics.

11.1 Coverage criteria and resulting trap properties.
11.2 Average number of unique test cases.
11.3 Average test case length.
11.4 Total test suite length after removing duplicate/redundant test cases.
11.5 Standard deviation of test suite lengths.
11.6 Redundancy.
11.7 Creation time.
11.8 Average number of model checker calls.
11.9 Coverage: Transition coverage test suites.
11.10 Coverage: Condition coverage test suites.
11.11 Coverage: Transition-Pair coverage test suites.
11.12 Coverage: Reflection coverage test suites.
11.13 Coverage: Property coverage test suites.
11.14 Mutation scores using model mutants.
11.15 Mutation scores using implementation mutants.

12.1 Results of the test case generation.
12.2 Coverage and mutation scores.

13.1 Average number of unique new test cases.


Chapter 1
Introduction

1.1 Motivation

Testing is an important part of the software development process. It is estimated that 50% of the total effort is dedicated to testing and debugging (Myers, 1979; Korel, 1990). Testing is necessary in order to make sure that software actually performs what it is supposed to do, and does so correctly. Formal proof methods are sometimes suggested as an alternative. While formal proof methods can guarantee correctness, they are only applicable to systems of a limited size, and full automation is not possible. Software testing can be automated, and is computationally cheaper, but it can only increase confidence in the correctness and not prove it. Software testing would guarantee correctness if it were done exhaustively, but exhaustive testing is not feasible in general, as the number of possible tests quickly becomes intractable.

The severity of errors in software can range from simple visual glitches to software crashes, and errors can even lead to dangerous situations for people. As an example, consider software that is used in automotive systems. A problem with the software of an electronic steering system of a car driving at high speed can potentially lead to injury or death.

Software testing is important, but unfortunately there are many problems related to it. Testing is complicated and takes a lot of time. In general, complicated tasks that are performed by humans are prone to incompleteness and errors. This can be improved by automation, but unfortunately automation is difficult and requires changes in the software development process in order to be applied efficiently. In addition, exhaustive testing is usually not possible, so the decision of which tests to use out of an infinite choice is difficult. The same difficulty applies to creating an automated mechanism that makes this choice. There is significant effort in the research community to solve these problems. As is often the case in software engineering, however, there is a wide gap between the techniques that are proposed and applied by researchers and what is actually used in industry.

Obviously, the list of problems related to software testing is long. Just as with testing itself, exhaustive enumeration of all problems is hardly possible. This thesis addresses some of these issues. It was mainly conducted within the Te-DES project (Testing Distributed Embedded Systems), launched in 2005 by Graz University of Technology, Vienna University of Technology, Magna Steyr and TTTech. The aim of the project was to produce a prototype application that allows automated generation of requirement related test cases in a safety related scenario. Of importance are full traceability of requirement properties to test cases, automated analysis, and easy integration into the V-model software development process. The work presented in this thesis was conducted as part of this project.

1.2 Problem Statement

The problem considered in this thesis is the automatic creation and analysis of a set of test cases, given a formal model and a set of requirement properties, using model checkers. Although this task has been considered before, there is no solution that is generally accepted and in widespread industrial use. This indicates that there are still many issues that need to be solved. Summarizing, the main issues tackled in this thesis are:

• Requirements based testing: Model checker based testing usually aims to show that an implementation conforms to a specification automaton. Testing with regard to requirement properties entails several issues which have mostly been neglected in the literature so far. For example, traceability requires that each line of code, as well as each test case, has to be somehow related to a requirement property. How can this be integrated into test case generation and analysis?

• Performance of test case generation: As the automated generation of test cases is a complex task, performance is a major problem. The performance of model checkers is difficult to predict, and often problematic. How can test case generation be optimized?

• Performance of test case execution: Automatically created test suites are often large or misshapen. Even if creation is efficient, the applicability of any approach is limited if the resulting test cases cannot be efficiently executed. How can test suites be optimized? How can the test case generation be optimized in order to create efficient test suites in the first place?

• Limits of model checker based testing: Current literature on testing with model checkers makes assumptions about the underlying specifications; for example, specifications are assumed to be deterministic. Which of the common assumptions can be relaxed in order to increase the applicability of model checker based testing?

1.3 Thesis Statement

The applicability of model checker based testing and test analysis can be increased by improving the performance and quality of the test case generation, and by considering practically relevant scenarios where the state of the art is not sufficient.

1.4 Contributions

The results presented in this thesis extend and improve previous achievements in the field of testing with model checkers. The following papers were revised during the course of this dissertation:


• A formal relationship between test cases and requirement properties is defined in the concept of property relevance, published in (Fraser and Wotawa, 2006a) and (Fraser and Wotawa, 2007f).

• As the quality of a specification-based testing approach depends on the quality of the used specification, methods to analyze specifications as part of the test case generation process are described, and published in (Fraser and Wotawa, 2006b), (Fraser and Wotawa, 2006c), and (Fraser and Wotawa, 2007g).

• The performance of model checker based testing is critical for its industrial application. To improve the performance, the number of model checker calls is minimized for all known types of approaches. These findings were published in (Fraser and Wotawa, 2007c) and (Fraser and Wotawa, 2007d).

• Performance is also critical during the test case execution. Prioritization of test cases and minimization with regard to redundancy are proposed as methods to improve test case execution performance. These findings were published in (Fraser and Wotawa, 2007a), (Fraser and Wotawa, 2007b), and (Fraser and Wotawa, 2007h).

• A related problem is regression of test suites due to changes in the model used for test case generation. The number of test cases that need to be created and executed is minimized, as described in (Fraser et al., 2007).

• Several special cases limit the applicability of model checker based testing in general. The impact of nondeterministic models on model checker based testing is analyzed in (Fraser and Wotawa, 2007e) and (Fraser and Wotawa, 2007j). Suggestions to improve model checkers for software testing were collected in (Fraser and Wotawa, 2007i).

• Certain types of properties also have an impact on the direct application of model checker based techniques. Duminda Wijesekera, Lingya Sun, and Paul Ammann developed an idea on formalizing the relationship between counterexamples and test cases for all possible properties of the logic CTL. The findings were revised and published in (Wijesekera et al., 2007).

1.5 Organization

This thesis is structured as follows: First, an introduction to software testing is given in Chapter 2. The state of the art in model checker based testing and the necessary theoretical background are reviewed in Chapter 3. Chapter 4 defines a formal relationship between test cases and model checker counterexamples, and Chapter 5 discusses the implications of this relationship on test case analysis.

Chapters 6 and 7 focus on specification related issues. First, the concept of property relevance is described, which defines a relationship between test cases and requirement properties. Then, different types of analysis with regard to the specification quality are discussed.

Chapter 8 discusses the issue of testing with nondeterministic models and implementations.

Chapters 9 - 13 discuss performance related topics. The main obstacle to industrial adoption of model checker based testing is the limited performance of model checkers. Therefore, different issues related to the performance of the test case generation and execution are discussed in these chapters.


Finally, Chapter 14 summarizes the results achieved in the context of this thesis, and discusses how these results can be applied and extended.


Chapter 2
Software Testing Background

2.1 What is Software Testing?

Despite the proliferation of helpful development tools and defined development processes, software development remains a largely manual process. Therefore, errors are made when producing software. User requirements might be misinterpreted, system design might be violated, or maybe the programmer just makes a sloppy mistake. The sources of program errors are plentiful.

Software testing describes the process of interacting with a piece of software with the aim of revealing errors. The actual type of software to be examined has a large influence on how this testing is performed. For example, a graphical user interface needs different treatment than an embedded system. Software of realistic size usually offers an intractably large number of ways to experiment with it. Therefore, one of the main difficulties in software testing is deciding exactly which behavior to test. It is similarly difficult to decide during experimentation whether the observed behavior is correct or not.

2.2 Why Automate Software Testing?

The complexity of determining whether a given piece of software is correct is high. Traditionally, software testing is performed sporadically and manually. In industrial settings a systematic approach is necessary; especially in the case of safety related software, a systematic and traceable approach is essential. The current practice in industry is to use automated tools to organize and execute test cases. Automatic creation of test cases, however, is not common.

There are many reasons why automation is necessary: Creating test cases manually is a tedious and error prone process. Automation promises a significant improvement of the development process, as it supports one of the process's most time consuming parts. At the same time, automation often results in more complete sets of test cases because it is performed systematically.

When talking about automation of software testing, there are several different problems that need to be solved. For example, the execution of a set of test cases is easily automated. There are two main issues that need to be considered: The first is the selection of test data. This data is used as an input to the system under test. The second problem is to decide whether the execution of a test case detected a fault or not. This is referred to as the test oracle problem. Ideally, both problems are solved by an automated technique in order for it to be really useful.

Software testing research has proposed several different solutions for automating test case generation. Many of these have been evaluated on real applications, and some are even used in commercial products. There are, however, still many issues that need to be solved before widespread acceptance can be achieved.

2.3 Software Testing Basics

Software testing is performed as a means of validation and verification. Validation describes the process of determining whether the right software is built. For this, requirement documents describe what the software is supposed to do. In contrast, verification describes the process of determining whether the software is built right, i.e., if it is correct with regard to its design. As software testing cannot be exhaustive in general, it can never show the absence of errors. Therefore, the aim of testing is to detect as many errors in the software as possible.

Common terms such as error, fault, failure, bug or defect are sometimes used inconsistently. The available literature is full of different definitions of such terms. In order to avoid confusion, this thesis tries to avoid the most disputed terms, and uses the following definitions: A failure is defined as a deviation of the software from its expected delivery or service, while the cause of such a failure is a fault (Grindal and Lindström, 2002). The term error is used synonymously with fault in this thesis.

A fault can have many different faces. For example, Goodenough and Gerhart (1975) distinguish between performance and logical errors, where the former is a problem of wrong timing and the latter is a problem of wrong functionality. Goodenough and Gerhart further suggest distinguishing between different kinds of logical errors: whether an implementation does not conform to a specification, whether a specification does not correctly represent the design, whether a design does not fulfill the requirements, and so on. A well known, detailed taxonomy of faults is given in Beizer's classical book on software testing (Beizer, 1990). Each of these error types can manifest in different ways.

Some faults are more complex than others, and the effects of some faults might be more severe than the effects of other faults. Testing usually does not have to focus on one specific kind of fault, because the coupling effect (DeMillo et al., 1978) states that complex faults are linked to simpler faults. Consequently, it is sufficient to test for simple faults in order to detect complex faults.

Just as there are many different types of faults, there are many different types of testing. Several properties can be used in order to distinguish approaches. For example, the overall aim of testing can vary greatly:

• Performance testing tries to determine whether the software is correct with respect to timing requirements (Is it fast enough?).

• Stress testing evaluates the system behavior under heavy load.

• Reliability testing determines whether a system behaves correctly over a longer period of time.

• Functional testing examines whether a system is a correct refinement of its design or specification.

This list of test aims is not exhaustive, but illustrates the variety that exists. A different approach to differentiating test techniques is based on the level of abstraction at which testing is applied:

• Function testing is applied to the smallest unit of program code.

• Module testing tests the conformance of distinct components of a system (Component testing).

• System testing, as the name already suggests, is applied to a complete system. It is usually supported by related techniques such as Integration testing and Interface testing.

An important categorization of testing techniques is the distinction between black box and white box testing. Black box testing considers the system under test as a black box, where only inputs and outputs are known and the internal details are unknown. This type of testing is also known as functional testing. In contrast, white box testing or structural testing takes the actual source code into consideration. Structural testing makes it possible to create test cases that execute certain parts of the code. In contrast to functional testing, it is possible to test all the program code. Functional testing, however, has the potential to detect missing functionality. Often, a combination of such approaches is used, for example, when creating functional tests under consideration of a system's module structure. This type of testing is referred to as grey box testing.

The list of possible categorizations could be continued further. There are many different testing technique taxonomies available in the literature, for example in Beizer's classical book on software testing (Beizer, 1990). It is not the aim of this section to provide a complete testing taxonomy, only to provide enough background information to see where the techniques presented in this thesis fit in, which is mainly, but not exclusively, functional grey box testing of components. An exact categorization is not possible, because the techniques presented in this thesis can be applied to different test methods, depending on the underlying model.

2.4 Evaluating the Effectiveness of Testing

The objective of testing is to detect faults; a main problem here is to know when to stop. Certainty that there are no more undetected faults can only be achieved if all possible tests are executed. As the number of possible tests is usually very large or even infinite, this is not possible in practice. Therefore, some measurement of the thoroughness of testing is necessary. A tester can use such a measurement to decide when he or she is confident enough that the testing is sufficient. The two most common techniques used for the evaluation of a set of test cases are coverage analysis and mutation analysis. These two methods are deterministic methods, which means that they define items that should be exercised (e.g., coverage items, mutants, input partitions), and then check whether there is at least one test case for each required item. In contrast, statistical methods require several test cases selected according to some probability distribution, and make it possible to estimate the reliability of the system under test. In this thesis only deterministic analysis methods are considered.


Coverage Analysis   Coverage analysis uses some aspects of the system under test and measures how many of these aspects are exercised by the testing process. The actual properties of the system that should be exercised are described by a coverage criterion. For example, the statement coverage criterion requires that each line of code in a program is executed at least once during testing. If a line of code is executed, it is covered. Coverage is measured as the percentage of the items that are actually covered out of the set of items that should be covered. If all lines of program code are covered, 100% statement coverage is achieved. Even if black box testing is performed, structural information can be used to measure the success of a test case.
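As a small illustration (not part of the thesis; the line numbers and the way executed lines are recorded are assumed to come from some instrumentation tool), the computation behind a statement coverage figure can be sketched in Python as follows:

    def statement_coverage(executable_lines, executed_lines):
        # Coverage = covered items as a percentage of the items that should be covered.
        covered = executable_lines & executed_lines
        return 100.0 * len(covered) / len(executable_lines)

    # Hypothetical data: the program has five executable lines,
    # and running the test suite executed four of them.
    executable = {1, 2, 3, 5, 8}
    executed = {1, 2, 3, 8}
    print(statement_coverage(executable, executed))  # 80.0 (% statement coverage)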

A wealth of different coverage criteria has been proposed since the early beginnings of software testing. The more common coverage criteria are those based on the source code. Such criteria are quite easy to measure, and there are efficient and easy-to-use tools available. The above mentioned statement coverage is an example of such a criterion. Code based coverage criteria range from the simple statement coverage criterion up to path coverage, which requires all possible sequences from program entry to program exit to be covered. Other code based coverage criteria consider the lifetime of variables (data flow coverage), loops, race conditions, object code, etc. It is also possible to use a formal model or specification to measure coverage. The characteristics of these coverage criteria depend on the actual formalism used.

Mutation Analysis   A mutant is a syntactic variation of a program. The syntactic change possibly leads to behavior that differs from the original program. Such mutants are used to evaluate test cases. If a test case can distinguish between the mutant and the original program, then the mutant is killed. The idea of mutation analysis is to create a set of mutants that are representative of realistic faults. Mutants are created by applying mutation operators to the original program code. Each mutation operator represents a different fault class, and each application of a mutation operator results in a different mutant. The set of mutants is used to determine how many of these mutants a given set of test cases can distinguish from the original program. The success is measured as the mutation score. A high mutation score is representative of a high fault sensitivity. The term mutation testing describes techniques to derive test cases such that a certain mutation score is achieved; a resulting test suite is mutation adequate. In contrast to coverage measurement, mutation analysis is not yet widely accepted in commercial use. This is because there are many problems related to mutation testing, mainly caused by its high computational costs. For example, it is an undecidable problem to determine whether a mutant is equivalent to the original program (Budd and Angluin (1982) show that determining equivalence between two programs is undecidable). More details on mutation testing follow in Chapter 3, Section 3.7.
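For illustration (a minimal sketch, not from the thesis; the program, the mutants and the tests are made up), the mutation score of a test suite can be computed as follows: a mutant is killed if at least one test case observes an output that differs from the original program.

    def original(a, b):
        return a + b

    # Mutants: syntactic variations of the original program, as produced by
    # mutation operators (here: operator replacement and operand exchange).
    mutants = [
        lambda a, b: a - b,   # killed by any test where b != 0
        lambda a, b: a * b,   # killed e.g. by the test (2, 3)
        lambda a, b: b + a,   # equivalent mutant: can never be killed
    ]

    tests = [(2, 3), (0, 0)]

    def mutation_score(tests, mutants):
        killed = sum(any(m(*t) != original(*t) for t in tests) for m in mutants)
        return killed / len(mutants)

    print(mutation_score(tests, mutants))  # 2 of 3 mutants killed: ~0.67

The equivalent mutant in this toy example also shows why mutation analysis is expensive: no test case can ever kill it, and detecting such mutants automatically is undecidable in general.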

2.5 Formal Testing

Traditionally, testing is done ad hoc and without any theoretical underpinning. The availability of a formal specification makes it possible to turn testing into a well-defined process with a sound theory. Testing based on a formal specification usually aims to determine whether a given implementation conforms to the specification, where the notion of "conformance" has to be formally defined. This kind of testing is therefore known as conformance testing. The most common application for conformance testing is protocol testing, but testing of reactive systems is quite similar.


A popular framework for conformance testing was introduced by Tretmans (1999). A related framework based on algebraic specifications was presented by Gaudel (1995). In order to formally treat the relationship between a specification and an implementation, the test hypothesis describes the assumption that the implementation can be represented by a formal model; this model does not actually have to be known. The test hypothesis makes it possible to formulate a formal relationship between specification and implementation, the implementation relation.

The implementation is examined by executing test cases. This execution and the resulting observations are formally modeled with a test execution procedure. A verdict function decides after the execution whether the observations are correct, and therefore whether the implementation passes or fails the test suite.

Conformance testing aims to deduce from the test execution whether the implementation conforms to a specification. Ideally, a test suite for a given specification would be such that an implementation conforms to the specification if and only if it passes the test suite. Such a test suite is called a complete test suite, and is rarely practical for real applications because of the possibly infinite number of test cases.

In practice, weaker relations are formulated. A test suite is sound if any implementation that fails the test suite does indeed not conform to the specification. However, there can still be non-conforming implementations that pass the test suite. The requirement that all non-conforming implementations are detected is achieved with an exhaustive test suite; note that an exhaustive test suite may contain unsound test cases, depending on the considered conformance relation. Testing is complete if it is sound and exhaustive.
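Stated more formally (a sketch in the spirit of Tretmans' framework; here i ranges over implementations satisfying the test hypothesis, s is the specification, conf is the implementation relation, and "i passes T" means that i passes every test case of test suite T):

    % Soundness, exhaustiveness and completeness of a test suite T for specification s
    T \text{ sound: }      \forall i:\; i~\mathrm{conf}~s \;\Rightarrow\; i \text{ passes } T
    T \text{ exhaustive: } \forall i:\; i \text{ passes } T \;\Rightarrow\; i~\mathrm{conf}~s
    T \text{ complete: }   \forall i:\; i \text{ passes } T \;\Leftrightarrow\; i~\mathrm{conf}~s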

An important part of the test framework is the test derivation. Any algorithm to create test cases should result in only sound test suites.

This theoretical framework has to be instantiated with regard to a specific modeling formalism. Tretmans uses labelled transition systems (Tretmans, 1996). The selection of test cases is formally guided by certain implementation relations, such as trace preorders or testing preorders. Section 3.12 relates model checker based testing to the discussed formal framework.


Chapter 3
Automated Testing with Model Checkers

3.1 Introduction

Software testing research has resulted in several different approaches to automated test case generation. In general, a main prerequisite for any automated approach to test case generation is the availability of some kind of model or specification. Therefore, such techniques are known as model based testing. In principle, it is possible to derive test data without a model. The difficulty here is to define what exactly a model is. For example, a specification of the input variables and their domains can be considered as a model, and test cases can be created with regard to equivalence partitions or boundary values. In order to achieve complete automation it is also necessary to create a test oracle that decides whether a fault was detected or not. If test cases are only created with regard to equivalence partitions, the expected values are not automatically known; such inputs without expected outputs are often referred to as test data. In this thesis, we define a model as a functional, formal description of the system under test. Consequently, a formal model is very well suited as a test oracle. As the test oracle is so important, the term test case exclusively refers to a pair of test data and expected output in this thesis. There are other scenarios where automated testing without an oracle is conceivable. For example, test cases are sometimes created from the application source code and executed on the target platform in order to verify the execution platform and the tool chain that creates the platform specific executable from the source code.

A main characteristic of automated test case generation is the model formalism used. A well known and thoroughly explored area of research considers finite state machines. For example, protocol testing is often performed using such approaches. Model oriented languages such as VDM-SL, Z or Alloy are also used for test case generation, as well as certain UML subsets. Labelled transition systems have recently become very popular. Another approach that has received attention lately is the use of model checkers (Clarke et al., 1993) for automated testing. The latter approach is very intuitive and easy to use. A model checker is a tool for formal verification. There are many different efficient model checkers freely available, therefore it is easy to experiment with such an approach. The several different ways in which model checking has been used for test case generation illustrate its flexibility. Consequently, such an approach is also chosen in this thesis.


3.2 Model Checking Background

Basically, a model checker is a tool intended for formal verification. It takes as input an operational specification of the system that is considered. Then, it takes a temporal logic property and analyzes the entire state space of the model in order to determine whether the model violates the property or not. If the state space exploration shows no property violations, then correctness with regard to the property is proved. A basic feature of model checkers is the ability to generate witnesses and counterexamples for property satisfaction or violation, respectively. When a typical model checker detects that a property is violated, it returns a counterexample that illustrates the property violation. A human analyzer can use this counterexample to identify and fix the design fault. For testing purposes, the counterexample can be interpreted as a test case.

The formalism commonly used to describe model checking and to define the semantics of temporal logics is the Kripke structure.

Definition 1 (Kripke Structure). A Kripke structure K is a tuple K = (S, S0, T, L):

• S is a set of states.

• S0 ⊆ S is an initial state set.

• T ⊆ S × S is a total transition relation, that is, for every s ∈ S there is an s′ ∈ S such that (s, s′) ∈ T.

• L : S → 2^AP is a labeling function that maps each state to a set of atomic propositions that hold in this state.

AP is a countable set of atomic propositions.

Figure 3.1: Example model as FSM and the according Kripke structure. (a) Example FSM. (b) Example Kripke structure.

An infinite execution sequence of this model is a path. As paths are infinite, deadlocks cannot directly be modeled with Kripke structures. It is, however, possible to model a deadlock as a self-loop to a state. A Kripke structure defines all possible paths of a system.

Definition 2 (Path). A path p := 〈s0, s1, ...〉 of Kripke structure K is an infinite sequence such that ∀ i ≥ 0 : (si, si+1) ∈ T for K, and s0 ∈ S0.

Let Paths(K, s) denote the set of paths of Kripke structure K that start in state s. We use Paths(K) as an abbreviation to denote {Paths(K, s) | s ∈ S0}.


Figure 3.2: Execution tree of example automaton.

As an example, consider an automaton that takes one input (x), and tells us as output y whether x has ever been true (1) in the past or not. Figure 3.1(a) depicts this as a regular finite state machine (FSM), which moves to state y = 1 once input x is true; this state cannot be left regardless of the input. Figure 3.1(b) depicts the same automaton as a Kripke structure. The state of the Kripke structure results from the values of input x and output y. The possible executions of this Kripke structure can be illustrated as a tree (Figure 3.2). Each path in this tree is a path of the Kripke structure.

As infinite paths are not usable in practice, model checking uses finite sequences, commonly referred to as traces. If necessary, we can interpret a finite sequence as an infinite sequence where the final state is repeated infinitely.

Definition 3 (Trace). A trace t := 〈s0, . . . , sn〉 of Kripke structure K is a finite sequence such that ∀ 0 ≤ i < n : (si, si+1) ∈ T for K. There can be a dedicated state si such that si = sn and i ≠ n, which is a loopback state, and 〈s0, . . . , si−1, (si, . . . , sn)ω〉 is a path of K.

A trace t is either a finite prefix of an infinite path or a path that contains a loop, if a loopback state is given. The latter is called a lasso-shaped sequence, and has the form t := t1(t2)ω, where t1 and t2 are finite sequences. The sequence t2 is repeated infinitely often, denoted with ω, the infinite version of the Kleene star operator used for ω-languages. Lasso shaped sequences are used in practice to show violation of liveness properties, which requires infinite sequences. For example, the model checker NuSMV (Cimatti et al., 1999) interprets all identical states in a trace as possible points of loopback. The number of transitions a trace consists of is referred to as its length. For example, trace t := 〈s0, s1, ..., sn〉 has a length of length(t) = n. The set T of possible execution sequences of a model K is denoted as T = Traces(K).

Temporal logics are modal logics with special operators for time. Time can either be interpreted to be linear or branching. The most common logics are the linear time logic LTL (Pnueli, 1977) (Linear Temporal Logic), and the branching time logic CTL (Clarke and Emerson, 1982) (Computation Tree Logic). CTL*, introduced by Emerson and Halpern (1982), is the superset of these logics. Most current model checkers support either LTL or CTL, or sometimes both. Other temporal logics that are used in model checking are Hennessy-Milner Logic (Hennessy and Milner, 1985) (HML), Modal µ-calculus (Kozen, 1983), and different flavors of CTL such as timed, fair, or action CTL.


An LTL formula consists of atomic propositions, Boolean operators and temporal operators. The operator "©" refers to the next state. E.g., "© a" expresses that a has to be true in the next state. "U" is the until operator, where "a U b" means that a has to hold from the current state up to a state where b is true. "□" is the always operator, stating that a condition has to hold at all states of a path, and "◇" is the eventually operator that requires a certain condition to eventually hold at some time in the future. The syntax of LTL is given as follows, where AP denotes the set of atomic propositions:

Definition 4 (LTL Syntax). The BNF definition of LTL formulas is given below:

φ ::= true | false | a ∈ AP | ¬φ | φ1 ∧ φ2 | φ1 ∨ φ2 | φ1 → φ2 | φ1 ≡ φ2 | φ1 U φ2 | © φ | □ φ | ◇ φ

A property φ satisfied by path π of model K is denoted as K, π |= φ, which is also abbreviated as π |= φ if K is obvious from the context. A path π of model K violating property φ is denoted as K, π ⊭ φ or π ⊭ φ. The semantics of LTL is expressed for infinite paths of a Kripke structure. π^i denotes the suffix of the path π starting from the i-th state, and π_i denotes the i-th state of the path π, with i ∈ N0. The initial state of a path π is π_0.

Definition 5 (LTL Semantics). Satisfaction of LTL formulas by a path π ∈ Paths(K) of a Kripke structure K = (S, S0, T, L) is inductively defined as follows, where a ∈ AP:

K, π |= true for all π (3.1)

K, π ⊭ false for all π (3.2)

K, π |= a iff a ∈ L(π_0) (3.3)

K, π |= ¬φ iff K, π ⊭ φ (3.4)

K, π |= φ1 ∧ φ2 iff K, π |= φ1 ∧ K, π |= φ2 (3.5)

K, π |= φ1 ∨ φ2 iff K, π |= φ1 ∨ K, π |= φ2 (3.6)

K, π |= φ1 → φ2 iff K, π ⊭ φ1 ∨ K, π |= φ2 (3.7)

K, π |= φ1 ≡ φ2 iff K, π |= φ1 iff K, π |= φ2 (3.8)

K, π |= φ1 U φ2 iff ∃ i ∈ N0 : K, π^i |= φ2 ∧ ∀ 0 ≤ j < i : K, π^j |= φ1 (3.9)

K, π |= © φ iff K, π^1 |= φ (3.10)

K, π |= □ φ iff ∀ j ∈ N0 : K, π^j |= φ (3.11)

K, π |= ◇ φ iff ∃ j ∈ N0 : K, π^j |= φ (3.12)

The temporal logic CTL was introduced by Clarke and Emerson (1982). It can be viewed as a subset of CTL*, introduced by Emerson and Halpern (1982). CTL* formulas consist of atomic propositions, logical operators, temporal operators (F, G, U, R, X) and path quantifiers (A, E). The operator F ("finally") corresponds to the eventually operator ◇ in LTL, G ("globally") corresponds to □. U ("until") corresponds to U, and R ("release") is the logical dual of U. Finally, X ("next") corresponds to the next operator ©. The path quantifiers A ("all") and E ("some") require formulas to hold on all or some paths, respectively. CTL is the subset of CTL* obtained by requiring that each temporal operator is immediately preceded by a path quantifier. Consequently, the syntax of CTL can be defined as follows:


Definition 6 (CTL Syntax). The BNF definition of CTL formulas is given below:

φ ::= a ∈ AP | φ1 ∨ φ2 | φ1 ∧ φ2 | ¬φ |
AX φ | AF φ | AG φ | A φ1 U φ2 | A φ1 R φ2 |
EX φ | EF φ | EG φ | E φ1 U φ2 | E φ1 R φ2

The syntax of CTL* further includes all possible combinations of temporal operators with formulas, where the temporal operators are not preceded by path quantifiers. As CTL* model checking is complex, most model checkers use either CTL or LTL in practice. Consequently, we do not consider CTL* in detail. As all temporal operators are preceded by a path quantifier in CTL, the semantics of CTL can be expressed by satisfaction relations for state formulas. K, s |= φ denotes a state formula φ that is satisfied in state s of Kripke structure K. The set of paths that start in state s is denoted as Paths(K, s).

Definition 7 (CTL Semantics). Satisfaction of CTL formulas by a state s ∈ S of a Kripke structure K = (S, S0, T, L) is inductively defined as follows, where a ∈ AP:

K, s |= a iff a ∈ L(s) ∧ s ∈ S

K, s |= ¬φ iff ¬(K, s |= φ)

K, s |= φ1 ∨ φ2 iff (K, s |= φ1) ∨ (K, s |= φ2)

K, s |= φ1 ∧ φ2 iff (K, s |= φ1) ∧ (K, s |= φ2)

K, s |= AX φ iff ∀ π ∈ Paths(K, s) : K, π_1 |= φ

K, s |= AF φ iff ∀ π ∈ Paths(K, s) : ∃ i : K, π_i |= φ

K, s |= AG φ iff ∀ π ∈ Paths(K, s) : ∀ i : K, π_i |= φ

K, s |= A φ1 U φ2 iff ∀ π ∈ Paths(K, s) : ∃ i : K, π_i |= φ2 ∧ ∀ j < i : K, π_j |= φ1

K, s |= A φ1 R φ2 iff ∀ π ∈ Paths(K, s) : ∀ i : (∀ j < i : K, π_j ⊭ φ1) → K, π_i |= φ2

K, s |= EX φ iff ∃ π ∈ Paths(K, s) : K, π_1 |= φ

K, s |= EF φ iff ∃ π ∈ Paths(K, s) : ∃ i : K, π_i |= φ

K, s |= EG φ iff ∃ π ∈ Paths(K, s) : ∀ i : K, π_i |= φ

K, s |= E φ1 U φ2 iff ∃ π ∈ Paths(K, s) : ∃ i : K, π_i |= φ2 ∧ ∀ j < i : K, π_j |= φ1

K, s |= E φ1 R φ2 iff ∃ π ∈ Paths(K, s) : ∀ i : (∀ j < i : K, π_j ⊭ φ1) → K, π_i |= φ2

Commonly, three different types of verifiable properties are distinguished:

Safety Property: A safety property describes a behavior that may not occur on any path ("Something bad may not happen"). To verify a safety property, all execution paths have to be checked exhaustively. Safety properties are of the type □ ¬φ or AG ¬φ, where φ is a propositional formula.

Invariance Property: An invariance property describes a behavior that is required to hold on all execution paths. It is logically complementary to a safety property. Invariance properties are of the type □ φ or AG φ, where φ is a propositional formula.

Liveness Property: A liveness property describes that "something good eventually happens". With linear time logics, this means that a certain state will always be reached. For example, □ (φ1 → ◇ φ2) and AG (φ1 → AF φ2) are liveness properties.

The aim of model checking is to determine whether a given model fulfills a given property. Several different algorithms have been successfully used for this task, using different temporal logics and data structures. Once property violation or satisfaction is determined, a model checker can return an example of how this violation or satisfaction occurs. This is illustrated with a counterexample or witness, respectively. Satisfaction of LTL properties is defined using linear sequences. Consequently, witnesses and counterexamples for LTL formulas are also linear sequences. In contrast, CTL properties are state formulas. Therefore, the CTL model checking problem (Clarke et al., 2001, p. 35) is to find the set of states that satisfy a given formula in a given Kripke structure. Special algorithms are used to derive trace examples for witness or counterexample states (Clarke et al., 1995).

The first successful model checking approach is explicit model checking. There are different approaches based on LTL (Lichtenstein and Pnueli, 1985; Vardi and Wolper, 1986) and CTL (Clarke et al., 1983; Queille and Sifakis, 1982) properties. In all approaches, the state space is represented explicitly, and searched by forward exploration until a violation of a property is found. For example, in LTL model checking the negation of a property is represented as an automaton that accepts infinite words (Büchi automaton). If the synchronous product of model and Büchi automaton contains any accepting path, then this path proves property violation (the path shows that the negation of the property is accepted by the model automaton, and therefore the property itself is violated). The counterexample is simply the path back to the initial state. The search algorithm can either be depth- or breadth-first search; recently, heuristic search has also been considered. Breadth-first search always finds the shortest possible counterexamples, but the memory demands are significantly higher than for depth-first search. In CTL model checking, all states satisfying a given property are determined by recursively calculating the satisfied sub-formulas for each state. If all states are visited and no violation is detected, then the property is consistent with the model. Directed model checking (Edelkamp et al., 2001) extends explicit model checking with heuristic search to increase the speed with which errors are found and counterexamples are generated. Such a technique is applicable if the aim of model checking is not a proof of correctness, but the generation of counterexamples. As such, this idea is well suited for test case generation.

Symbolic model checking (McMillan, 1993), the second generation of model checking, uses ordered binary decision diagrams (BDDs (Bryant, 1986)) to represent states and function relations on these states efficiently. This allows the representation of significantly larger state spaces, but a large number of BDD variables has a negative impact on the performance, and the ordering of the BDD variables has a significant impact on the overall size. There are different heuristic approaches of how to order variables, as determining the optimal order is NP-complete (Bryant, 1992).

Bounded Model Checking (Biere et al., 1999), the third generation of model checking, reformulates the model checking problem as a constraint satisfaction problem (CSP). This allows the use of propositional satisfiability (SAT) solvers to calculate counterexamples up to a certain upper bound. As long as the boundary is not too big, this approach is very efficient. There are also approaches to extend bounded model checking to infinite state systems. Bounded model checking has been successfully applied to systems where traditional model checking fails. At the same time, there are many settings where a bounded model checker fails while a symbolic model checker is efficient. Therefore, bounded model checkers do not replace but supplement traditional model checking techniques.

The most commonly used model checkers in the context of testing are the explicit state model checker SPIN (Holzmann, 1997) (Simple Promela Interpreter), the Symbolic Analysis Laboratory SAL (de Moura et al., 2004), which supports both symbolic and bounded model checking, and the symbolic model checker SMV (K.L. McMillan, 1992) as well as its derivative NuSMV (Cimatti et al., 1999), which supports symbolic and bounded model checking. Other popular model checkers include Murφ (Dill, 1996), the process algebra based FDR2 (Formal Systems Europe, 1997), or COSPAN (Hardin et al., 1996); some of these have also been used for testing.

Many current model checkers such as NuSMV (Cimatti et al., 1999) or SAL (de Moura et al., 2004) support CTL model checking in addition to or instead of LTL model checking. In CTL model checking, special algorithms are applied to construct linear traces from an initial state to explain a violating state (Clarke et al., 1995). However, only certain restricted subsets of branching time temporal logics such as ACTLdet or LIN always result in linear counterexamples (Clarke and Veith, 2004). When using full CTL, linear counterexamples are not always sufficient as evidence for property violation or satisfaction. Most works on testing with model checkers only consider the linear subset when using CTL for properties. Therefore, we use the term counterexample to describe a linear trace that either shows an LTL property violation or violation of a CTL property that can be violated by a linear trace.

Recently, an algorithm to create tree-like counterexamples has been proposed by Clarke et al. (2002). In a related work, Wijesekera et al. (2007) define a formal relation between test cases and counterexamples for full CTL. The implications of using CTL instead of LTL are discussed in Chapter 4. If at any point a technique is applicable only to LTL or differs if used with CTL, then this is explicitly stated. A related approach has been presented by Meolic et al. (2004): Witness and counterexample automata represent the superset of all finite and linear witnesses/counterexamples for a limited subset of CTL.

There are further optimizations and approaches to model checking for different application scenarios. For example, model checking techniques have been adapted to timed systems, where time is explicitly represented. For the scope of this thesis, however, this distinction of major approaches is sufficient.

A recent trend in verification research is software model checking. Here, the verification is not applied to a model of the software, but to the software directly. Although the complexity of software model checking is very high and the maturity of related tools is not at the same level as established model checkers, testing with such tools has already been suggested (Visser et al., 2004; Beyer et al., 2004). The focus of this thesis is on model-based testing, therefore software model checkers are not considered except in a brief overview in Section 3.10.1.

In this thesis, the model checker NuSMV is used for experimentation, because this is an actively developed model checker that uses the language SMV. A large part of the available work on testing with model checkers defines examples with SMV. All following examples are given using the syntax of SMV, and experimentation is performed with NuSMV. Most examples should work with both SMV and NuSMV without modifications, therefore SMV and NuSMV are used synonymously when concerning the syntax.

MODULE main
VAR
  var declarations
ASSIGN
  next(var) := case
    condition1: next_value1;
    condition2: {next_valuea, next_valueb};
    ...
  esac;

Figure 3.3: SMV syntax. The transition relation of a variable var is given as a set of conditions and next values. The first transition is deterministic, while the second is nondeterministic.

In general, models are defined for a set of variables. Consequently, the atomic propositions AP are defined on these variables, and the state of the model is defined by a valuation of the variables. The variables are defined in the VAR section of the NuSMV file. For each variable, the transition relation is specified separately. In NuSMV, the transition relation of a variable is either defined in a TRANS or ASSIGN section. In this thesis, only the ASSIGN method is considered, but as TRANS is only a syntactic variation there are no limitations when applying the presented techniques to it. Figure 3.3 illustrates the structure of a NuSMV file. The first transition described in the transition relation in Figure 3.3 is deterministic, that is, whenever condition1 is encountered, var is assigned next_value1 in the next state. Here, condition1 can be any logical formula on the variables defined in the model.

NuSMV allows nondeterministic assignments for variables, where the assigned value is chosen out of a set expression or a numerical range. The second transition in Figure 3.3 is nondeterministic. Upon condition2, var is nondeterministically assigned either next_valuea or next_valueb. With regard to the Kripke structure, each condition corresponds to a set of transitions T′, where for each (si, sj) ∈ T′ the condition has to be fulfilled by L(si), and the proposition var = next_value is contained in L(sj).
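As a minimal concrete sketch of this structure (the variable names request and mode are chosen only for illustration and are not part of the CC example introduced later), the following fragment declares a free Boolean input and a controlled variable with one deterministic and one nondeterministic transition:

MODULE main
VAR
  request : boolean;          -- free input variable, not constrained by ASSIGN
  mode    : { idle, busy };   -- controlled variable
ASSIGN
  init(mode) := idle;
  next(mode) := case
    request & mode = idle : busy;     -- deterministic transition
    mode = busy : { idle, busy };     -- nondeterministic: either value may be chosen
    TRUE : mode;                      -- default: keep the current value
  esac;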

Rayadurgam and Heimdahl (2001b) have defined a formal framework that can also be used to interpret NuSMV models. In this framework, the system state is uniquely determined by the values of n variables {x1, x2, ..., xn}. Each variable xi has a domain Di, and consequently the reachable state space of a system is a subset of D = D1 × D2 × ... × Dn. The set of initial values for the variables is defined by a logical expression ρ. The valid transitions between states are described by the transition relation, which is a subset of D × D. The transition relation is defined separately for each variable using logical conditions. For variable xi, the condition αi,j defines the possible pre-states of the j-th transition, and βi,j is the j-th post-state condition. A simple transition for a variable xi is a conjunction of αi,j, βi,j and a guard condition γi,j: δi,j = αi,j ∧ βi,j ∧ γi,j.

The disjunction of all simple transitions for a variable xi is a complete transition δi. The transition relation ∆ is the conjunction of the complete transitions of all the variables {x1, ..., xn}. Consequently, a basic transition system is defined as follows:

Definition 8 (Basic Transition System). A transition system M over variables {x1, ..., xn} is a tuple M = (D, ∆, ρ), with D = D1 × D2 × ... × Dn, ∆ = ∧_{i} δi, and the initial state expression ρ. For each variable xi there is a transition relation δi, which is the disjunction of several simple transitions δi,j = αi,j ∧ βi,j ∧ γi,j, where αi,j, βi,j, and γi,j are pre-state, post-state, and guard conditions of the j-th simple transition of variable xi.

Each entry of a case statement in a NuSMV model (see Figure 3.3) corresponds to a simple transition δi,j for variable var = xi. The conditions conditionj correspond to the conjunctions of αi,j and γi,j, and βi,j corresponds to the atomic proposition x′i = ξj, where x′i denotes xi in the post-state. Conditions are ordered, so that strictly speaking they may be interpreted as φ1, ¬φ1 ∧ φ2, etc., and in the general case as (∧_{1≤j<k} ¬φj) ∧ φk.

3.3 A Simple Demonstration Model

Figure 3.4: FSM of example model Car Controller (CC), with states S0, S1, and S2 and transitions labeled with input/output pairs such as accelerate/slow, accelerate/fast, !accelerate/slow, !accelerate/stop, and brake/stop.

To illustrate the presented techniques, the following model serves as an example in this thesis: The model represents a simplified controller of a car (CC). It has two Boolean inputs that represent the user's decision to accelerate or brake. Upon acceleration, the car starts moving, with either slow or fast velocity. Upon braking the car immediately stops. Figure 3.4 depicts this model as an FSM, where transitions are labeled with the input and the resulting velocity as output. Figure 3.5 shows the SMV source code of the model.

In addition to the model, several simple requirement properties are expressed in LTL:

1. Whenever the brake is activated, movement has to stop.

□ (brake → © velocity = stop) (3.13)

2. When accelerating and not braking, the velocity has to increase gradually, until it is fast.

□ (¬brake ∧ accelerate ∧ velocity = stop → © velocity = slow) (3.14)

□ (¬brake ∧ accelerate ∧ velocity = slow → © velocity = fast) (3.15)


MODULE main
VAR
  accelerate: boolean;
  brake: boolean;
  velocity: { stop, slow, fast };

ASSIGN
  init(velocity) := stop;
  next(velocity) := case
    accelerate & !brake & velocity = stop : slow;
    accelerate & !brake & velocity = slow : fast;
    !accelerate & !brake & velocity = fast : slow;
    !accelerate & !brake & velocity = slow : stop;
    brake : stop;
    TRUE : velocity;
  esac;

Figure 3.5: CC represented as SMV model.

3. When not accelerating and not braking, the velocity has to decrease gradually, until the car stops.

□ (¬brake ∧ ¬accelerate ∧ velocity = fast → © velocity = slow) (3.16)

□ (¬brake ∧ ¬accelerate ∧ velocity = slow → © velocity = stop) (3.17)
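In NuSMV syntax, these requirement properties can be stated directly below the model of Figure 3.5 as LTL specifications, where G corresponds to □ and X to © (a sketch of properties (3.13)-(3.17)):

-- Requirement properties (3.13)-(3.17) in NuSMV LTL syntax
LTLSPEC G (brake -> X (velocity = stop))
LTLSPEC G ((!brake & accelerate & velocity = stop) -> X (velocity = slow))
LTLSPEC G ((!brake & accelerate & velocity = slow) -> X (velocity = fast))
LTLSPEC G ((!brake & !accelerate & velocity = fast) -> X (velocity = slow))
LTLSPEC G ((!brake & !accelerate & velocity = slow) -> X (velocity = stop))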

In Section 3.2, basic transition systems are described as part of a framework for specification based test case generation introduced by Heimdahl et al. (2000). This framework is instantiated in (Rayadurgam and Heimdahl, 2001b,c,a), where a general transition system definition is given. Other formalisms, such as SCR or RSML−e (Thompson et al., 1999), can be interpreted as such transition systems, and mapped to model checking specifications, e.g., SMV. The language RSML−e is based on the Statecharts-like language Requirements State Machine Language (RSML) (Leveson et al., 1994), and adds support for interfaces between the environment and the control software. Figure 3.6 illustrates the RSML−e table for our example CC model. The simple transitions (see Definition 8) for the CC model are listed in Table 3.1.

3.4 Model Checkers in Software Testing

The idea of testing with model checkers is to interpret counterexamples as test cases. A suitable test case execution framework can extract the test data as well as the expected results (i.e., the test oracle) from counterexamples. Early work on testing with model checkers required manually specified test purposes, either formulated negated as never-claims (Engels et al., 1997), or used to partition the execution tree (Callahan et al., 1996).


STATE_VARIABLE velocity:
  VALUES: { stop, slow, fast }
  INITIAL_VALUE: stop
  CLASSIFICATION: State

  EQUALS stop IF
  TABLE
    accelerate                  : * F F;
    brake                       : T F F;
    PREV_STEP(velocity) = stop  : * * T;
    PREV_STEP(velocity) = slow  : * T *;
  END TABLE

  EQUALS slow IF
  TABLE
    accelerate                  : T F;
    brake                       : F F;
    PREV_STEP(velocity) = stop  : T *;
    PREV_STEP(velocity) = fast  : * T;
  END TABLE

  EQUALS fast IF
  TABLE
    accelerate                  : T T;
    brake                       : F F;
    PREV_STEP(velocity) = slow  : T *;
    PREV_STEP(velocity) = fast  : * T;
  END TABLE

END STATE_VARIABLE

Figure 3.6: Simple car controller as RSML−e specification.

Table 3.1: Simple transitions for velocity.

Transition   α                  β                  γ
1            velocity = stop    velocity = slow    accelerate ∧ ¬brake
2            velocity = slow    velocity = fast    accelerate ∧ ¬brake
3            velocity = fast    velocity = slow    ¬accelerate ∧ ¬brake
4            velocity = slow    velocity = stop    ¬accelerate ∧ ¬brake
5            True               velocity = stop    brake
6            velocity = fast    velocity = fast    accelerate ∧ ¬brake
7            velocity = stop    velocity = stop    ¬accelerate ∧ ¬brake


Later, many different techniques were proposed to systematically and automatically derive complete sets of test cases. Most approaches follow the idea of never-claims and use counterexamples, but some also use witness traces instead of counterexamples; these two ideas are complementary, as a simple negation of the used properties is sufficient to switch from one to the other. This section only considers the basic idea of test case generation; systematic techniques are considered in later sections.

A test purpose describes the desired characteristics of a test case that should be created. For example, it could describe the final state of the test case, or a sequence of states that should be traversed. The test purpose is specified in temporal logic and then converted to a never-claim by negation; this asserts that the test purpose never becomes true. Model checking the never-claim on a model results in a counterexample, if the never-claim becomes false at some point. The counterexample illustrates how the never-claim is violated, and thus shows how the original test purpose is fulfilled. As will be shown in Sections 3.5 and 3.6, a popular approach is to automatically create never-claims based on coverage criteria. These never-claims are called trap properties (Gargantini and Heitmeyer, 1999), and for each item that should be covered one trap property is generated. A test purpose is not necessarily feasible, but fortunately infeasible test purposes are not a problem, because the never-claim for an infeasible test purpose simply results in no counterexample.
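To illustrate this workflow on the CC model of Figure 3.5 (a sketch; the test purpose is chosen only for demonstration), a test purpose such as "the car eventually drives fast" is negated into a never-claim and added to the model as an LTL specification:

-- Test purpose: reach a state where velocity = fast.
-- Negated into a never-claim: velocity = fast is never reached.
LTLSPEC G !(velocity = fast)

Since the model can reach velocity = fast, the model checker reports this property as false and returns a counterexample, for example a trace in which accelerate is true and brake false for two consecutive steps; this trace serves as a test case for the original test purpose.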

The exact interpretation of counterexamples as test cases depends on the system under test. In many cases, testing with model checkers is applied to reactive systems, which read input values from sensors and set output values accordingly. The model therefore consists of a set of variables representing input, output, and possibly internal variables, as depicted in Figure 3.7. The system reacts to inputs by setting output values, such that a logical step in a counterexample can be mapped to an execution cycle of the system under test (see Figure 3.8(b)). This is the scenario assumed in this thesis, but testing with model checkers is not limited to this specific type of automaton. For other scenarios the mapping from counterexamples to test cases will vary; for example when considering flow graphs (Hong and Lee, 2003).

Figure 3.7: Reactive system model.

In the reactive system scenario, counterexamples can directly be interpreted as test cases. Because test cases are always finite, it is necessary to distinguish between traces with or without loopback when mapping a trace to a test case.

Definition 9 (Test Case). A test case t := 〈s0, ..., sn〉 related to Kripke structure K is a finite sequence such that ∀ 0 ≤ i < n : (si, si+1) ∈ T for K.

The number of transitions a test case consists of is referred to as its length. E.g., test case t := 〈s0, s1, ..., sn〉 has a length of length(t) = n. Test cases can easily be created from traces (Definition 3). If a trace does not contain a loopback state, then trace and test case are identical. If the trace does contain a loopback, i.e., it is a lasso-shaped sequence, then the lasso needs to be unfolded.


(a) A counterexample is a trace in the execution tree.

(b) Counterexample, usable as test case.

Figure 3.8: Counterexamples are execution paths, where each state assigns values to all variables.

Tan et al. (2004) describe truncation strategies to create finite test cases from lasso-shaped sequences. When using a white-box testing technique, the complete internal state is known, and can be tracked in detail during test case execution. Therefore a test case can be terminated whenever the same state has been visited twice at the same position in the loop. When using a black-box testing approach, the loop part of the trace is repeated a finite number of times.

This interpretation of counterexamples as test cases can only be directly applied to deterministic systems. If there is nondeterminism, then a concrete counterexample contains only one possible choice for each nondeterministic choice. Applying such a test case to an implementation that makes a different but valid choice would falsely report a fault. There are considerations of how to extend model checker based testing to nondeterministic systems; see Section 3.9.7 for more details.

The result of the test case generation is a test suite (or test set). A test suite TS is a finite set of test cases.

Definition 10 (Test Suite). A test suite TS is a finite set of n test cases. The size of TS is n. The overall length of a test suite TS is the sum of the lengths of its test cases t: length(TS) = ∑_{t ∈ TS} length(t).

In order to describe how a test case is executed, further definitions are necessary. In general, the test case execution depends on the relation between the model and the system under test (SUT). The most common scenario in the literature is that of reactive systems, which are executed in an infinite loop. This scenario is also assumed here; if the mapping from model to SUT is different, then test case execution has to be adapted to this change. A time step in the model can easily be mapped to an execution cycle in such a reactive system. In each cycle, the SUT receives stimuli from the environment via its inputs. Using the inputs, the SUT performs some computations and makes some changes, which can be observed via its outputs. A test case is executed via interaction with the system under test (SUT). For this execution, the inputs are provided by the tester. The resulting outputs are observed, and compared to the expected values. This observation leads to a verdict, which can be either fail or pass, expressing that a fault was found or not, respectively.

So far, a counterexample was only described as a sequence of states. According to the definition of a Kripke structure, each state can be mapped to a set of atomic propositions that hold in this state. In order to use counterexamples as test cases, we need to identify inputs and outputs at each state. For this, we use the concept of modules, as described by Boroday et al. (2007) in the context of testing with model checkers, and originating from module checking theory (Kupferman and Vardi, 1997). In practice, the set of atomic propositions AP of a Kripke structure does not contain deliberate propositions; a system is defined by a set of variables (Clarke et al., 1995; Rayadurgam and Heimdahl, 2001b; Boroday et al., 2007) (see Figure 3.7). The labeling function for a state therefore results in a valuation of all variables in that state. The variables can be partitioned into input, output and internal variables.

Definition 11 (Module). (Boroday et al., 2007) A module M is a triple M = (K, I, O), where K = (S, S0, T, L) is a Kripke structure, and I, O ⊆ AP are disjoint sets of input and output variables. The set of hidden variables is defined as H = AP \ (I ∪ O).

Intuitively, a module works as follows: In every state s, M reads L(s) ∩ I, stores internally L(s) ∩ H, and outputs L(s) ∩ O. Inp(s) = L(s) ∩ I denotes the input of the module at state s, and Out(s) = L(s) ∩ O denotes the output of the module at state s. A trace t := 〈s0, s1, ..., sn〉 of module M can be interpreted such that the output sequence 〈Out(s0), Out(s1), ..., Out(sn)〉 is produced in response to the input sequence 〈Inp(s0), Inp(s1), ..., Inp(sn)〉.

Informally, a test case t := 〈s0, s1, ..., sn〉 is therefore executed on an SUT I by providing values for all input variables described by Inp(si) to I and comparing the output variables described by Out(si) with the values returned by I, for every state si, 0 ≤ i ≤ n. This actually allows two different interpretations: Synchronous languages (Benveniste et al., 2003) such as Lustre (Halbwachs et al., 1991), Esterel (Berry and Gonthier, 1992), or Signal (Benveniste et al., 1991) assume that the implementation responds immediately, which is known as the synchrony hypothesis. This hypothesis can be verified by showing that the program execution time is always smaller than the time between two successive external inputs. Under this assumption the expected output is contained in the same state as the input within a test case. Alternatively, the expected output can be assigned to the successor state of the state containing the inputs. There is little difference between these two choices, as long as both model and test case execution framework agree on the interpretation. Formal testing theory assumes the existence of a formal model representing the implementation. The formal model for the implementation does not actually have to exist; this is known as the test hypothesis (see e.g., (Tretmans, 1999)). In our setting, execution of a test case is defined by interpreting the implementation as a module MI = (KI, I, O).

Definition 12 (Test Case Execution). Execution of test case t := 〈t0, ..., tn〉 passes on implementation I = (KI, I, O), if I has a path p := 〈s0, ..., sn, ...〉, i.e., p ∈ Paths(I), such that for all states ti : 0 < i ≤ n : si = ti. If I does not have such a path, the test case t fails on I.


As initially proposed by Engels et al. (1997) and implemented in several different approaches (see Section 3.7), an alternative to converting test purposes to never-claims is to introduce deliberate errors in the model. The details of such an approach are discussed in Section 3.7, but in anticipation of this discussion we refine the definition of a test case here. A distinction is made between positive and negative test cases, depending on whether they contain a deliberate error or not.

We use the symbol "⊖" to denote a test case execution that detects a fault, and "⊕" to denote a test case execution that does not detect a fault. As a test case t is a path, t |= P also denotes that property P is satisfied by the path represented by t, and t ⊭ P that P is violated by t. Two different types of test cases can be distinguished. Positive test cases, suffixed with "+", describe correct (positive) behavior. Negative test cases, marked with the suffix "−", describe faulty (negative) behavior:

Definition 13 (Positive Test Case). A positive test case t+ detects a fault if its execution fails. I ⊕ t+ denotes a fault free execution of positive test case t+ on the implementation I, i.e., the test case passes. I ⊖ t+ denotes an execution fault, i.e., the test case fails.

Definition 14 (Negative Test Case). A negative test case t− detects a fault if its execution passes. Therefore, I ⊖ t− denotes an execution without deviation from the test case, i.e., the test case passes. I ⊕ t− denotes an execution of negative test case t− on the implementation I, where the implementation behaves differently than t−, i.e., the test case fails. A negative test case contains a transition (ti, tj′) which is not defined by the reference model Kripke structure M, i.e., (ti, tj′) ∉ T.

As a correct implementation is required to pass a positive test case, positive test cases are also referred to as passing tests. Execution of negative test cases on a correct implementation has to fail, hence such test cases are also known as failing tests. The majority of available approaches produce positive test cases. Therefore, if a test case is not specially denoted as t− or t+ in this thesis, it is a positive test case.

In order to create meaningful test cases, a prerequisite is a model that is assumed to be correct. There are several different approaches of how to systematically derive test cases from a given model. These approaches can be categorized into two general types. One category is based on special properties that are expected to be violated by the model and therefore lead to counterexamples. Such properties are referred to as trap properties. The other category of approaches uses a given requirements specification or defines special properties, and then modifies the model such that a property violation results. We can further distinguish approaches that include formal requirement properties in addition to a formal model, and methods that only use a formal model.

3.5 Coverage Based Test Case Generation

While manual specification of test purposes as proposed by Engels et al. (1997) can lead to efficient test cases, it is usually advantageous to systematically create complete test sets according to some test objective. It is difficult to ensure complete coverage of all possible system behaviors with manually specified test purposes.

Coverage criteria are a means to measure how thoroughly a system is exercised by a given test suite. A coverage criterion is defined on some aspect of a program or specification, for example statements or code branches. Full coverage is achieved if all items described by the coverage criterion are executed (= covered) by at least one test case.


Figure 3.9: Coverage based test case generation. A model and a set of trap properties (e.g., G (a -> X !b), G (b -> X !c), G (c -> X !d), ...) are given to the model checker, which returns counterexamples that serve as test cases.

Coverage is usually quantified as the percentage of items that are covered.

Model checkers can be used to automatically derive test suites for maximum coverage of a given criterion. This process is illustrated in Figure 3.9. For each item that should be covered, a distinct never-claim (trap property (Gargantini and Heitmeyer, 1999)) is specified. The test suite is created by model checking all trap properties against a given model. Again, infeasible trap properties are detected if the model checker creates no counterexamples.

For example, in order to create a test suite that covers all states of the system variables, a trap property for each possible state a of every variable x is needed, claiming that the value is not taken: □ ¬(x = a). A counterexample to such an example trap property is any trace that contains a state where x = a. In the car controller example introduced in Section 3.3, state coverage with respect to each variable could be achieved with the following set of trap properties:

□ (accelerate ≠ 0)

□ (accelerate ≠ 1)

□ (brake ≠ 0)

□ (brake ≠ 1)

□ (velocity ≠ stop)

□ (velocity ≠ slow)

□ (velocity ≠ fast)
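In NuSMV syntax, these trap properties could be appended to the model of Figure 3.5 as LTL specifications (a sketch; for the Boolean inputs, accelerate ≠ 0 and accelerate ≠ 1 are written as the propositions accelerate and !accelerate, respectively):

-- State coverage trap properties for the CC model
LTLSPEC G accelerate            -- counterexample contains a state with accelerate = FALSE
LTLSPEC G !accelerate           -- counterexample contains a state with accelerate = TRUE
LTLSPEC G brake
LTLSPEC G !brake
LTLSPEC G !(velocity = stop)
LTLSPEC G !(velocity = slow)
LTLSPEC G !(velocity = fast)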

While trap properties for state coverage are simple safety properties, trap properties can be deliberate temporal logic properties for which counterexamples exist; for example, they can be defined over transitions or sequences of transitions.


3.5.1 Coverage of SCR Specifications

The concept of trap properties was initially proposed by Gargantini and Heitmeyer (1999) with regard to SCR (Software Cost Reduction method (Heitmeyer, 2002)) specifications. An SCR model is defined as a quadruple (S, S0, Em, T), where S is the set of states, S0 ⊆ S is the initial state set, Em is the set of input events, and T is the transform describing the allowed state transitions. T is described with tables for events (predicates defined on a pair of system states implying that the value of at least one state variable has changed) and conditions (predicates defined on a system state) with regard to all variables controlled by the considered system.

SCR specifications consist of different types of tables. A condition table defines a variable as a function of a mode and a condition, and an event table defines a variable as a function of a mode and an event. Table 3.2 shows a mode table for the CC example model.

Table 3.2: SCR Mode Transition Table for velocity in CC example.

Current Mode   Event                           New Mode
stop           accelerate = 1 AND brake = 0    slow
slow           accelerate = 1 AND brake = 0    fast
fast           accelerate = 0 AND brake = 0    slow
slow           accelerate = 0 AND brake = 0    stop
fast           brake                           stop

The operational SCR specification is automatically converted to an SMV or SPIN model. SCR requirement properties can then be used as never-claims, as in (Engels et al., 1997). The novel idea presented in (Gargantini and Heitmeyer, 1999) is to automatically create trap properties from the SCR tables. Each table is converted to an if-else construct for the model checker SPIN, or a case statement for SMV. For each variable, a designated variable is added, which indicates which branch of the if-else/case construct is currently active. For each possible value of this special variable, a trap property is formulated claiming that this value is never taken. For example, the variable CaseVar represents the chosen case for variable Var. Resulting trap properties in LTL would be, e.g., □ ¬(CaseVar = 1), □ ¬(CaseVar = 2), etc. The trap properties automatically result in a test suite for branch coverage of the SCR model. Considering Table 3.2, the resulting model will contain five cases. Consequently, there will be six trap properties, because the "no-change" case has to be considered as well. The trap properties are:

{□ ¬(Case_velocity = i) | 1 ≤ i ≤ 6}
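A minimal NuSMV sketch of this idea for the CC model, assuming a hypothetical auxiliary variable Case_velocity is added that records which branch of the case statement of Figure 3.5 fired (the variable and its numbering are illustrative and not prescribed by the SCR-to-SMV translation):

VAR
  Case_velocity : 1..6;   -- auxiliary variable: active branch of the case statement
ASSIGN
  init(Case_velocity) := 6;
  next(Case_velocity) := case
    accelerate & !brake & velocity = stop : 1;
    accelerate & !brake & velocity = slow : 2;
    !accelerate & !brake & velocity = fast : 3;
    !accelerate & !brake & velocity = slow : 4;
    brake : 5;
    TRUE : 6;              -- the "no-change" case
  esac;
-- One trap property per branch; each counterexample exercises one branch
LTLSPEC G !(Case_velocity = 1)
LTLSPEC G !(Case_velocity = 2)
-- ... analogously for branches 3 to 6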

3.5.2 Coverage of Transition Systems

This section uses the framework for specification based test case generation related to basic transition systems, proposed by Heimdahl et al. (2000), and described in Section 3.2. This framework can be used for many different specification formalisms with similar semantics, e.g., SCR or RSML−e.

Trap properties can be derived from the basic transition system model (Definition 8). Structural coverage criteria are defined with regard to the transition relation ∆. These coverage criteria are similar to common code based coverage criteria such as decision or condition coverage, but refer to transitions. In general, the conjunction of pre-state condition α and guard condition γ can be interpreted as a logical predicate, which allows the use of common logic based coverage criteria (Ammann et al., 2003). The post-state condition β is used as part of the trap properties to force the creation of relevant counterexamples.

Simple Transition Coverage requires that each simple transition of every variable is executed. A simple transition consists of pre-state, post-state, and guard conditions. Consequently, a trap property to create a corresponding test case simply has to require that always when the pre-state condition and guard are true, the post-state condition may not be satisfied in the next state:

□ (α ∧ γ → © ¬β)

In the CC example (Figure 3.5), simple transition coverage is achieved with the following trap properties:

□ (velocity = stop ∧ accelerate ∧ ¬brake → © ¬(velocity = slow))

□ (velocity = slow ∧ accelerate ∧ ¬brake → © ¬(velocity = fast))

□ (velocity = fast ∧ ¬accelerate ∧ ¬brake → © ¬(velocity = slow))

□ (velocity = slow ∧ ¬accelerate ∧ ¬brake → © ¬(velocity = stop))

□ (brake → © ¬(velocity = stop))

□ (velocity = fast ∧ accelerate ∧ ¬brake → © ¬(velocity = fast))

□ (velocity = stop ∧ ¬accelerate ∧ ¬brake → © ¬(velocity = stop))
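In NuSMV syntax, the first two of these trap properties read as follows (a sketch; the remaining five are analogous):

-- Simple transition coverage, trap properties for the first two simple transitions
LTLSPEC G ((velocity = stop & accelerate & !brake) -> X !(velocity = slow))
LTLSPEC G ((velocity = slow & accelerate & !brake) -> X !(velocity = fast))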

Note that counterexamples to such trap properties end with the simple transition from α to β. If this is not an observable transition, then an additional postamble sequence is necessary. The transition coverage criterion described by Offutt et al. (1999) is basically identical to the simple transition coverage criterion.

Simple Guard Coverage is similar to decision coverage in code based testing. Simple guard coverage requires that for each simple transition there exists a test case s where the guard evaluates to true in a state where the pre-state condition is true, and a test case t where the guard evaluates to false in a state where the pre-state condition is true. This criterion corresponds to the predicate coverage criterion (Ammann et al., 2003) for logical expressions. For example, this could be expressed as a pair of trap properties:

□ (α ∧ γ → © ¬β)

□ (α ∧ ¬γ → © β)

Again, the post-state expression β is negated in order to force creation of suitable counterexamples for the case when the guard evaluates to true. When the guard evaluates to false, creation of a counterexample is forced by claiming β will be true in the next state. The CC example (Figure 3.5) requires fourteen trap properties to achieve simple guard coverage. To save space we only consider the first simple transition:


□ (velocity = stop ∧ accelerate ∧ ¬brake → © ¬(velocity = slow))

□ (velocity = stop ∧ ¬(accelerate ∧ ¬brake) → © (velocity = slow))
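The same pair in NuSMV syntax (a sketch):

-- Simple guard coverage for the first simple transition: guard true, then guard false
LTLSPEC G ((velocity = stop & (accelerate & !brake)) -> X !(velocity = slow))
LTLSPEC G ((velocity = stop & !(accelerate & !brake)) -> X (velocity = slow))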

Condition Coverage (Offutt et al., 1999) requires that for each condition (clause) in a predicate there is a test case where the condition evaluates to true, and a test case where the condition evaluates to false. This criterion corresponds to the clause coverage criterion (Ammann et al., 2003) for logical expressions, and can also be applied to guard conditions. It can be expressed as a pair of trap properties for clause c:

□ ¬(α ∧ c = true)

□ ¬(α ∧ c = false)

The trap property claims that there is no state where the pre-state condition α is true and the clause c evaluates to true or false, respectively. Resulting counterexamples do not necessarily execute the transition.

Again we consider the first simple transition of the example model:

□ ¬(velocity = stop ∧ accelerate = true)

□ ¬(velocity = stop ∧ accelerate = false)

□ ¬(velocity = stop ∧ (¬brake) = true)

□ ¬(velocity = stop ∧ (¬brake) = false)

Complete Guard Coverage is similar to the multiple condition criterion in code based coverage analysis, also known as combinatorial coverage (Ammann et al., 2003). A guard condition consists of several clauses (usually called conditions in code based coverage). Complete guard coverage requires that all possible combinations of truth values for the clauses of a guard are covered.

Let the clauses in a guard condition γ be {c1, . . . , cl}. Then complete guard coverage of γ requires a test case s for any given Boolean vector u of length l, such that for some i:

∧_{k=1..l} (ck(si, si+1) = uk).

This means that for every u there has to be a trap property of the type:

□ ¬(α ∧ ∧_{k=1..l} (ck(si, si+1) = uk))

The trap property claims that there is no state where the pre-state condition α is true and the clauses take on the values described by uk; this results in a trace that leads to the chosen valuation for the guard condition. Note that this trace does not necessarily execute the transition.


Again we consider the first simple transition of the example model:

□ ¬(velocity = stop ∧ accelerate = false ∧ (¬brake) = false)

□ ¬(velocity = stop ∧ accelerate = false ∧ (¬brake) = true)

□ ¬(velocity = stop ∧ accelerate = true ∧ (¬brake) = false)

□ ¬(velocity = stop ∧ accelerate = true ∧ (¬brake) = true)

The number of trap properties quickly increases with the number of clauses. Consequently, if the number of clauses is too big, then complete guard coverage can result in too many test cases to be useful. As a more practical solution, the modified condition/decision coverage (MC/DC) criterion (Chilenski and Miller, 1994) has been proposed in the context of code coverage. MC/DC requires that each condition (clause) is shown to independently affect the value of the decision (predicate) it is part of. This informal definition is ambiguous, and allows three different interpretations. Following the nomenclature used by Ammann et al. (2003), the considered clause in a predicate will be called the major clause, and the remaining clauses minor clauses. In all interpretations of MC/DC, it takes a pair of test cases to cover a clause. As test case pairs for different clauses need not be disjoint, the size of an MC/DC test suite can be as small as l+1 test cases for a predicate with l clauses. In the strictest variant (1), the values of all minor clauses are fixed while the value of the major clause is changed, which also has to result in a change of the value of the predicate. A slightly relaxed variant (2) still requires that the predicate takes on both values, but the values of the minor clauses do not need to be fixed. Finally, the decision coverage can also be relaxed, resulting in variant (3), which does not require the predicate to take on both values.

Clause-wise Guard Coverage is an adaptation of the strictest interpretation (1) of MC/DC to basic transition systems, introduced by Rayadurgam and Heimdahl (2001b). The authors assume some mechanism that calculates a pair of Boolean vectors u and v of equal length l for l clauses in guard γ, where only the m-th value differs. When the clauses ci are assigned the values in u, then γ evaluates to true, and when assigned the values in v, then γ evaluates to false. The vectors u and v could, for example, be derived using constraint satisfaction techniques. These vectors can be used to formulate the following trap properties for the m-th clause of guard γ:

□ (α ∧ ∧_{k=1..l} (ck = uk) → © ¬β)

□ (α ∧ ∧_{k=1..l} (ck = vk) → © β)

The first trap property results in a test case where the guard evaluates to true, and the second trap property results in a test case where the guard evaluates to false as a consequence of a different value for cm.

Considering the first simple transition of our example again, clause-wise guard coverage is achieved with the following set of trap properties (slightly rewritten to fit into the page width):

□ ((velocity = stop ∧ accelerate ∧ (¬brake) = true) → © ¬(velocity = slow))

□ ((velocity = stop ∧ ¬accelerate ∧ (¬brake) = true) → © (velocity = slow))

□ ((velocity = stop ∧ accelerate ∧ (¬brake) = false) → © (velocity = slow))

Heimdahl et al. (2003) also define Clause-wise Transition Coverage, which is identical to clause-wise guard coverage but defined in the context of the specification language RSML−e. In a case study (Heimdahl et al., 2003), a flight guidance system specified in RSML−e at varying levels of abstraction is analyzed with regard to test case generation. The case study shows that the performance of the model checker is a critical factor, but if the model size is within bounds, then coverage based test case generation is feasible. Complex criteria such as clause-wise transition coverage result in better test cases than very simple criteria like state coverage or simple transition coverage. A high complexity of trap properties has a negative effect on the performance.

The drawback of the solutions described in (Rayadurgam and Heimdahl, 2001b) and (Heimdahl et al., 2003) is that a mechanism that calculates appropriate valuations of all minor clauses is required. Furthermore, all clauses have to be independent, otherwise there might not exist a test case for every chosen valuation. As a solution, Rayadurgam and Heimdahl (2003) describe a method to create pairs of test cases for MC/DC. The solution consists of altering the model such that there are auxiliary Boolean variables that store the values of clauses. The initial values of these auxiliary variables are nondeterministically assigned by the model checker to true or false. After that, these values are not changed anymore. This results in a vector u of l value assignments to the l clauses of a guard condition. Note that the model checker selects suitable valuations automatically, and no additional mechanism to calculate vectors is necessary. The vector chosen by the model checker is used to create a counterexample for clause cm that represents two concatenated test cases: a test case where γ evaluates to true, and a test case where γ evaluates to false, where the values of all clauses except the major clause cm are defined by u. The first case is covered by any counterexample to the following:

□ ¬(α ∧ γ ∧ cm ≠ um ∧ ∧_{k≠m} (ck = uk))

The second case is covered by any counterexample to the following:

□ ¬(α ∧ ¬γ ∧ cm = um ∧ ∧_{k≠m} (ck = uk))

Combining these two trap properties into one trap property achieves that the same valuations for all clauses except the major clause are used. To reach the same decision point twice in a single run of a reactive system, a dedicated hard reset transition might be necessary. Resulting counterexamples can then be split at the hard reset transition into two test cases. Consequently, a trap property for a pair of MC/DC test cases for major clause cm results in the following:

□ ¬(α ∧ γ ∧ cm ≠ um ∧ ∧_{k≠m} (ck = uk)) ∨ □ ¬(α ∧ ¬γ ∧ cm = um ∧ ∧_{k≠m} (ck = uk))


As an example, consider the first transition of the CC model, and let accelerate be the major clause. This results in the following trap property, where the model checker performs the task of choosing suitable values for ua and ub:

□ ¬(velocity = stop ∧ (accelerate ∧ ¬brake) ∧ accelerate ≠ ua ∧ (¬brake) = ub) ∨
□ ¬(velocity = stop ∧ ¬(accelerate ∧ ¬brake) ∧ accelerate = ua ∧ (¬brake) = ub)
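A minimal NuSMV sketch of this construction (the auxiliary variables ua and ub and their wiring are illustrative; all that matters is that they are chosen nondeterministically at initialization and kept constant afterwards):

VAR
  ua : boolean;   -- frozen copy of the value chosen for the major clause (accelerate)
  ub : boolean;   -- frozen copy of the value chosen for the minor clause (!brake)
ASSIGN
  init(ua) := {TRUE, FALSE};   -- the model checker picks the initial value nondeterministically
  init(ub) := {TRUE, FALSE};
  next(ua) := ua;              -- values never change afterwards
  next(ub) := ub;
-- Combined trap property for major clause 'accelerate' of the first transition
LTLSPEC G !(velocity = stop & (accelerate & !brake) & accelerate != ua & (!brake) = ub)
      | G !(velocity = stop & !(accelerate & !brake) & accelerate = ua & (!brake) = ub)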

It is conceivable to modify this kind of trap property such that the considered transitions are actually executed. For example, the following trap property achieves that a counterexample will first take the considered transition (to make the left part of the implication false), and then reach a point where the guard evaluates to false (here, execution of a transition is forced by claiming © β, which is false as γ is false):

□ ((α ∧ γ ∧ cm ≠ um ∧ ∧_{k≠m} (ck = uk)) → □ ((α ∧ ¬γ ∧ cm = um ∧ ∧_{k≠m} (ck = uk)) → © β))

Full Predicate Coverage (Offutt et al., 1999) requires that each clause in each predicate is tested independently. In contrast to the previously discussed clause-wise coverage criteria, the values of minor clauses may change as long as the value of the predicate is still determined by the considered clause. Consequently, full predicate coverage corresponds to interpretation (2) of MC/DC as described above. Ammann et al. (2002) further relax this criterion to Uncorrelated Full Predicate Coverage. While this still requires that for a clause it is shown for both possible truth values that it influences the predicate, it is not required that the actual value of the predicate differs; that is, it drops the requirement for decision coverage. This corresponds to interpretation (3) of MC/DC as described above. In (Ammann et al., 2002), the Boolean derivative (Sheldon B. Akers, 1959) is used to create two trap properties for each clause. The Boolean derivative dP/dc of predicate P for condition c is a predicate on the remaining conditions that is true if the value of c determines the value of P. Trap properties for condition c in a predicate P can be formulated by claiming that the derivative always implies that c is false, and in a second property that c is true. We apply the derivative to the guard condition γ as follows:

□ ((α ∧ d(γ)/dc) → c)

□ ((α ∧ d(γ)/dc) → ¬c)

Again we consider the first simple transition of the example model:

� ((velocity = stop ∧ accelerate)→ (¬brake))

� ((velocity = stop ∧ accelerate)→ ¬(¬brake))

� ((velocity = stop ∧ ¬brake)→ (accelerate))

� ((velocity = stop ∧ ¬brake)→ ¬(accelerate))
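These four properties follow from computing the derivative as the exclusive-or of the two cofactors, d(P)/dc = P[c ← true] ⊕ P[c ← false], a standard formulation going back to Sheldon B. Akers (1959). For γ = accelerate ∧ ¬brake this gives d(γ)/d(accelerate) = (true ∧ ¬brake) ⊕ (false ∧ ¬brake) = ¬brake and d(γ)/d(brake) = (accelerate ∧ false) ⊕ (accelerate ∧ true) = accelerate, which are exactly the antecedents used above (in conjunction with α).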

A natural extension of MC/DC is the Reinforced Condition/Decision Coverage (RC/DC) criterion (Vilkomir and Bowen, 2002). The idea of RC/DC is that it is not sufficient to show for each condition that it independently affects the decision's outcome; it is also necessary to show that each condition independently keeps the outcome. This means that the values of the remaining conditions are fixed while the value of the considered condition is altered, and the value of the decision does not change because of this. As it might not be possible to keep the values of all minor clauses fixed, this requirement can be relaxed.

Following the MC/DC definition given by Rayadurgam and Heimdahl (2001b), RC/DC needs the following additional trap properties, where u′, v′ is a pair of Boolean vectors of equal length l for l clauses in guard γ, where only the m-th value differs. When the clauses take on the values described in these vectors, the guard condition γ shall evaluate to the same value in both cases. If γ evaluates to true in both cases, then the transition is taken and β evaluates to true in the next state. Therefore, the trap properties contain the negation of β:

� (α ∧ ∧_{k=1..l} (ck = uk) → ©¬β)

� (α ∧ ∧_{k=1..l} (ck = vk) → ©¬β)

If γ evaluates to false, then the negation of β has to be removed.

Considering the first simple transition of our example again, RC/DC is achieved with the following set of trap properties (slightly rewritten to fit into the page width):

� ((velocity = stop ∧ ¬accelerate ∧ (¬brake) = false) → © (velocity = slow))

� ((velocity = stop ∧ ¬accelerate ∧ (¬brake) = true) → © (velocity = slow))

� ((velocity = stop ∧ accelerate ∧ (¬brake) = false) → © (velocity = slow))

Finally, Transition Pair Coverage is another related coverage criterion given in (Offutt et al., 1999) that is also useful in the transition system context. In contrast to (simple) transition coverage, it requires that all feasible pairs of transitions are covered. As shown in (Ammann et al., 2002), this results in trap properties with two levels of next statements. The following trap property covers the transitions (α1, β1, γ1) and (α2, β2, γ2):

� (α1 ∧ γ1 → © (α2 ∧ γ2 → ©¬β2))

As an example, a test case covering the pair of the first two transitions of our example is achieved with the following trap property (slightly rewritten to save space):

� ((velocity = stop ∧ accelerate ∧ ¬brake) → © ((velocity = slow ∧ accelerate ∧ ¬brake) → ©¬(velocity = fast)))

3.5.3 Control and Data Flow Coverage Criteria

The previous section presented coverage criteria in the context of basic transition systems. Such coverage criteria, however, are not limited to this specific kind of model, but can be applied to any system or specification that uses Boolean predicates. For example, Hong and Lee (2003) use similar criteria (state coverage, transition coverage) to create test cases for the control flow of a program or EFSM model, and define trap properties to generate test cases for data flow coverage criteria. So far, the discussed coverage criteria only considered the control flow of a model. Control flow criteria are based on logical expressions in the specification, which determine the branching during the execution. In contrast, data flow oriented coverage criteria consider how variables are defined and used during execution.

Test case generation with regard to coverage of data flow graphs is considered in (Hong et al., 2002; Hong and Lee, 2003). A flow graph G is defined as a tuple G = (V, vs, vf, A), where V is a finite set of vertices, vs ∈ V is the start vertex, vf ∈ V is the final vertex, and A is a finite set of arcs. A vertex represents a statement and an arc represents possible flow of control between statements. The set of variables that is defined at a vertex v is denoted by DEF(v), and the set of variables that is used at a vertex v is denoted by USE(v). A flow graph can be interpreted as a Kripke structure K(G) = (V, vs, L, A ∪ {(vf, vf)}), where L(vs) = {start}, L(vf) = {final}, and L(v) = DEF(v) ∪ USE(v) for every v ∈ V − {vs, vf}.

v1 : input(accelerate, brake, previous_velocity);
v2 : velocity = previous_velocity;
v3 : if (brake) {
v4 :   velocity = stop;
     } else {
v5 :   if (accelerate) {
v6 :     if (velocity == stop)
v7 :       velocity = slow;
         else
v8 :       velocity = fast;
       } else {
v9 :     if (velocity == fast)
v10:       velocity = slow;
         else
v11:       velocity = stop;
       }
     }
v12: output(velocity);

Figure 3.10: Example implementation of CC example.

As an example, Figure 3.10 shows an implementation of our car controller example. The corresponding data flow graph is depicted in Figure 3.11, where vertices are annotated with their DEF and USE sets. Hong et al. (2003) show how model checkers can be used to derive test cases for different data flow coverage criteria using witness formulas. In this survey, we use the criteria by Rapps and Weyuker (1985) to illustrate this approach; trap properties for further criteria are given in Hong et al. (2003). The basic idea of data flow criteria is to find definition-use pairs (du-pairs). A pair (d(x, v), u(x, v′)) is a du-pair, if there exists a path 〈v, v1, . . . , vn, v′〉 from vertex v to v′, such that x is not defined in any vi for 1 ≤ i ≤ n, or if n = 0 (this is called a definition-clear path).

Hong et al. (2003) express du-pairs as WCTL formulas.


Figure 3.11: Data flow graph for example implementation.

WCTL formulas are CTL formulas where only the temporal operators EF, EX, and EU occur, and for any sub-formula of the form f1 ∧ . . . ∧ fn, all fi except at most one are atomic propositions. A WCTL formula for a du-pair (d(x, v), u(x, v′)) is expressed as follows:

wctl(d(x, v), u(x, v′)) := EF (d(x, v) ∧ EX E[¬def(x) U (u(x, v′) ∧ EF final)])

In this formula, def(x) is the disjunction of all definitions of x. This formula expresses that there exists a path from the initial state to d(x, v), such that there exists a definition-clear path to u(x, v′). In addition, EF final requires that the path continues to the final vertex, such that the path is a complete path. For example, the following formula results for the du-pair (d(brake, v1), u(brake, v3)):

wctl(d(brake, v1), u(brake, v3)) := EF (d(brake, v1) ∧ EX E[¬d(brake, v1) U (u(brake, v3) ∧ EF final)])

An example witness to this formula is 〈vs, v1, v2, v3, v4, v12, vf〉. Any given (d(x, v), u(x, v′)) is a du-pair iff the Kripke structure representing the data flow graph satisfies wctl(d(x, v), u(x, v′)).
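To make this concrete, the data flow graph of Figure 3.11 can itself be encoded as model checker input. The following NuSMV sketch (one possible encoding, not taken from the cited work) represents the vertices by a program counter variable v and expresses the above du-pair as a CTL specification whose counterexample is the desired witness:

MODULE main
VAR
  v : {vs, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, vf};
ASSIGN
  init(v) := vs;
  next(v) := case
    v = vs  : v1;
    v = v1  : v2;
    v = v2  : v3;
    v = v3  : {v4, v5};     -- branch on brake
    v = v4  : v12;
    v = v5  : {v6, v9};     -- branch on accelerate
    v = v6  : {v7, v8};
    v = v7  : v12;
    v = v8  : v12;
    v = v9  : {v10, v11};
    v = v10 : v12;
    v = v11 : v12;
    v = v12 : vf;
    v = vf  : vf;           -- final vertex loops: A ∪ {(vf, vf)}
  esac;
DEFINE
  final := v = vf;
-- negated WCTL formula for (d(brake,v1), u(brake,v3)); def(brake) holds only at v1
SPEC !EF (v = v1 & EX E [ !(v = v1) U (v = v3 & EF final) ])

A counterexample to this specification corresponds to a witness such as 〈vs, v1, v2, v3, v4, v12, vf〉 given above.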


The all-defs coverage criterion requires that for every definition d(x, v) there is a test case that uses a definition-clear path to some u(x, v′). Let DEF(G) denote the set of definitions in the data flow graph G, and USE(G) the set of uses in G. Then, a test suite T satisfies the all-defs coverage criterion iff it is a witness set for:

{ ∨_{u(x,v′) ∈ USE(G)} wctl(d(x, v), u(x, v′)) | d(x, v) ∈ DEF(G) }

Once more, a set of test cases can be created by using the model checker to derive witness sequences for this set of formulas, or equivalently, by calculating counterexamples to the negations of these formulas.

In our example data flow graph in Figure 3.11, we can identify the example set of du-pairs given below (note that other sets of du-pairs are also possible). A test suite satisfying all-defs can be created by calculating a witness for each wctl(d(x, v), u(x, v′)) or a counterexample to ¬(wctl(d(x, v), u(x, v′))) for each (d(x, v), u(x, v′)) in this set:

{(d(brake, v1), u(brake, v3)),

(d(accelerate, v1), u(accelerate, v5)),

(d(previous_velocity, v1), u(previous_velocity, v2)),

(d(velocity, v2), u(velocity, v6)),

(d(velocity, v4), u(velocity, v12)),

(d(velocity, v7), u(velocity, v12)),

(d(velocity, v8), u(velocity, v12)),

(d(velocity, v10), u(velocity, v12)),

(d(velocity, v11), u(velocity, v12))}

The all-uses coverage criterion can be defined in a similar manner. A test suite satisfies the all-uses coverage criterion, if for every definition d(x, v) and every use u(x, v′) there exists some definition-clear path with respect to x as part of a test case. Consequently, a test suite satisfies the all-uses coverage criterion if it is a witness set for the following set of formulas:

{wctl(d(x, v), u(x, v′)) | d(x, v) ∈ DEF(G), u(x, v′) ∈ USE(G)}

Obviously, the number of du-pairs is larger for the all-uses criterion than for the all-defs criterion.


In our example graph, we get the following set of du-pairs:

{(d(brake, v1), u(brake, v3)),

(d(accelerate, v1), u(accelerate, v5)),

(d(previous_velocity, v1), u(previous_velocity, v2)),

(d(velocity, v2), u(velocity, v6)),

(d(velocity, v2), u(velocity, v9)),

(d(velocity, v4), u(velocity, v12)),

(d(velocity, v7), u(velocity, v12)),

(d(velocity, v8), u(velocity, v12)),

(d(velocity, v10), u(velocity, v12)),

(d(velocity, v11), u(velocity, v12))}

In this example, only the du-pair (d(velocity, v2), u(velocity, v9)) is added in comparison to the all-defs criterion. Theoretically, the worst case number of du-pairs can be O(n²) for a flow graph of size n.

Further data flow criteria are considered in (Hong et al., 2003). Data flow coverage criteria are extended with control dependence information in (Hong and Ural, 2005b). In (Hong et al., 2001), data and control flow criteria are applied to Statecharts specifications. In (Hong and Lee, 2003), control and data flow coverage criteria are defined for extended finite state machines (EFSMs).

3.5.4 Coverage of Abstract State Machines

Countless specification formalisms have been defined in the past. In general, coverage criteria can be defined and used for testing for any specification language that is susceptible to model checking. For example, Abstract State Machines (ASMs) (Gurevich, 2000) are yet another formalism that has been considered in the context of coverage oriented test case generation (Gargantini and Riccobene, 2001; Gargantini et al., 2003). ASMs are semantically well defined pseudo-code over abstract structures. An ASM consists of states and a finite set of rules for guarded function updates. Rules are of the type if condition then updates, where condition is an arbitrary Boolean expression, and updates is a finite set of function updates that are executed simultaneously. There are many different types of functions, and basically a nullary function can be interpreted as a variable.

Figure 3.12 shows the car controller example as an ASM specification, where each transition is represented as a distinct rule, following the style used in the tool ATGT (see Section 3.11), which automatically creates test cases from ASM specifications with a model checker.

Similarly to the previously described approaches, rules are suitable for trap property generation. Rule coverage is similar to simple transition coverage, and requires a test case where the guard condition evaluates to true, and one where the guard condition evaluates to false: AG (¬condition). For example, rule 1 in Figure 3.12 results in two trap properties, the first one lets the guard evaluate


data Velocity = stop | slow | fast
instance AsmTerm Velocity

brake :: Dynamic Bool
brake = initVal "brake" False

accelerate :: Dynamic Bool
accelerate = initVal "accelerate" False

velocity :: Dynamic Velocity
velocity = initVal "velocity" stop

r1 :: Rule()
r1 = if (velocity == stop) && (accelerate == True) && (brake == False)
     then velocity := slow

r2 :: Rule()
r2 = if (velocity == slow) && (accelerate == True) && (brake == False)
     then velocity := fast

r3 :: Rule()
r3 = if (velocity == fast) && (accelerate == False) && (brake == False)
     then velocity := slow

r4 :: Rule()
r4 = if (velocity == slow) && (accelerate == False) && (brake == False)
     then velocity := stop

r5 :: Rule()
r5 = if (brake == True)
     then velocity := stop

Figure 3.12: ASM specification for CC example, each transition represented as a distinct rule.


to true, the second one to false:

� (¬(velocity = stop ∧ accelerate ∧ ¬brake))

� (velocity = stop ∧ accelerate ∧ ¬brake)

In a similar style, other coverage criteria based on logical predicates can be applied to rule guards. For example, MC/DC is used by (Gargantini and Riccobene, 2001; Gargantini et al., 2003).

In contrast to these control oriented coverage criteria, the rule update coverage requires each update function to be nontrivially executed at least once. This is a data flow coverage criterion, as it considers the value of a variable prior to a new assignment (i.e., definition). Rule update coverage for all five rules of our example specification results in the following trap properties, generated by ATGT:

� (velocity ≠ slow ∧ (velocity = stop ∧ accelerate ∧ ¬brake))

� (velocity ≠ fast ∧ (velocity = slow ∧ accelerate ∧ ¬brake))

� (velocity ≠ slow ∧ (velocity = fast ∧ ¬accelerate ∧ ¬brake))

� (velocity ≠ stop ∧ (velocity = slow ∧ ¬accelerate ∧ ¬brake))

� (brake ∧ velocity ≠ stop)

Further coverage criteria are defined by Gargantini and Riccobene (2001): parallel rule coverage requires combinations of updates to be executed in parallel, and strong parallel rule coverage requires all possible combinations of parallel update functions to be covered. This approach has been evaluated with the model checker SMV in (Gargantini and Riccobene, 2001), and with the model checker SPIN in (Gargantini et al., 2003).

3.6 Requirements Based Testing

The majority of coverage based approaches use some structural coverage criterion based on a behavioral model of the SUT. Sometimes it is desirable to create test cases with respect to a given set of requirement properties. The approach described by Engels et al. (1997) can be used for this, if requirement properties are used as test purposes. The drawback is that each requirement property only results in one test case. This test case is not necessarily a good exercise regarding the property. For example, consider the property � (x → © y), which is quite a common type. A counterexample might not contain a state where x is true, which obviously is not a good test case for the property. A straightforward approach is to require the antecedent to become true in a test case. For example, this is achieved with antecedent coverage (Whalen et al., 2006), where � (x → © y) is reformulated to � (x → © y) ∧ ^ (x). Further approaches are shown below.
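For instance, applied to requirement 1 of the CC example (Equation 3.13), antecedent coverage would reformulate � (brake → ©velocity = stop) to � (brake → ©velocity = stop) ∧ ^ (brake), so that a test case derived for this property necessarily contains at least one state in which brake is active.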

It is not always possible to create useful counterexamples by directly negating requirement properties. For example, negation of a safety property might result in a counterexample which consists of only one state (the initial state), which is not a useful test case.

An equivalence partitioning of the execution tree is suggested by Callahan et al. (1996, 1998). For a single requirement property, two kinds of paths can be distinguished within the expanded computation tree: those for which the property is fulfilled, and those where the property is violated.


The idea of this partitioning is that all paths within a partition are assumed to be very similar. That way, only a small number of test cases, or in fact only a single test case, per partition is necessary. A complete cover of disjoint partitions on infinite paths in the computation tree can be created by combining properties and their negations conjunctively.

For example, assume two requirement properties φ1 and φ2. There are four different possible partitions for these two properties: φ1 ∧ φ2, φ1 ∧ ¬φ2, ¬φ1 ∧ φ2, and ¬φ1 ∧ ¬φ2. Each such combination is a coverage property. This partitioning, called conjunctive complementary closure (CCC), creates partitions that are only disjoint when considering complete paths in the computation tree. Finite traces may fall into one or more partitions. Coverage properties can be used to validate existing test traces, determine to which partition a given test case belongs, or to create a new test case for a partition.

3.6.1 Vacuity Based Coverage

Tan et al. (2004) describe a method to derive trap properties from requirement properties. These trap properties result in test cases that show how a property is non-vacuously fulfilled. Vacuity describes the problem that a property is satisfied in a way that was not intended. A property is vacuously satisfied, if the model checker reports that the property is satisfied regardless of whether the model really fulfills what the specifier originally had in mind or not. For example, the property � (x → © y) is vacuously satisfied by any model where x is never true. A vacuous pass of a property is an indication of a problem in either the model or the property.

Beer et al. (1997) use witness formulas to detect vacuity for a subset of ACTL (CTL with only A-quantified temporal operators). This method is extended to CTL* by Kupferman and Vardi (1999). More efficient algorithms are considered by Purandare and Somenzi (2002). In general, vacuity of a property is detected by checking a formula and its witness formula against the model.

Witness formulas are derived from properties by changing sub-formulas. The idea is that if a model satisfies a property and also a corresponding witness formula, then the property is satisfied vacuously. If the witness formula is not satisfied by the model, then the property is properly satisfied. The replacement of sub-formula φ with ψ in formula f is denoted as f [φ ← ψ]. If a sub-formula can be replaced such that the model does not satisfy the resulting formula, then the sub-formula affects the formula:

Definition 15 (Affect). (Tan et al., 2004) A sub-formula φ of f affects f in model M if there is a formula ψ such that the truth values of f and f [φ ← ψ] are different with respect to M.

If a property f is vacuously satisfied by a model, then there exists a sub-formula φ in f that does not affect the property. This means that there exists no replacement ψ for φ such that f [φ ← ψ] is violated by a given model. Consequently, a property is satisfied vacuously iff the formula and its witness formula are both satisfied by the same model:

Definition 16 (Vacuity). (Tan et al., 2004) A model M satisfies f vacuously with respect to a sub-formula φ if M |= f and φ doesn't affect f in M. M satisfies f vacuously if there exists a sub-formula φ such that M satisfies f vacuously with respect to φ.


The replacement formula ψ can be any formula. Fortunately it is not necessary to replace φ with every possible ψ in order to detect vacuity. Kupferman and Vardi (1999) show that it is sufficient to replace φ with true or false, depending on the polarity of φ in the formula f. The polarity of a sub-formula φ is positive, if it is nested in an even number of negations in f, otherwise it is negative. The polarity of a sub-formula φ is denoted as �(φ). To avoid confusion with the LTL � operator, it is noted that � in this section always refers to the polarity. Replacement of φ according to its polarity makes it feasible to determine vacuity using witness formulas.

Theorem 3.6.1. (Kupferman and Vardi, 1999) A model M satisfies the formula f vacuously if and only if M |= ¬ f [φ ← �(φ)] for some (occurrence of) atomic proposition φ, where �(φ) = false if φ has positive polarity in f and �(φ) = true otherwise.

The idea of property coverage is that a test case that covers a property according to the property coverage criterion should not pass on any model that does not satisfy the property.

Definition 17 (Property-Coverage Metrics). (Tan et al., 2004) Given a property f, a test t covers a sub-formula φ of f if there is a mutation f [φ ← �(φ)] such that every model M that passes t will not satisfy the formula f [φ ← �(φ)].

The property coverage can be measured by creating a set of witness formulas. The percentage of these witness formulas that are violated by at least one test case represents the property coverage value. Details of how coverage is measured on existing test cases are given in Section 11.3. Furthermore, test cases can be generated by using witness formulas as trap properties, following the approach shown in Figure 3.9. For every requirement property f there is a trap property for every sub-formula φ of the following type:

f [φ← �(φ)]

For example, requirement 1 of the CC example model (Equation 3.13, � (brake → ©velocity = stop)) results in the following trap properties (note that brake is replaced by false due to its polarity in the implication):

� (false → ©velocity = stop)

� (brake→ © true)

3.6.2 Unique First Cause Coverage

Whalen et al. (2006) adapt the MC/DC criterion to apply to LTL requirement properties as a metric called Unique-First-Cause Coverage. While MC/DC only applies to states that fulfill certain requirements regarding the valuation of conditions in control flow branches, LTL properties define paths rather than states. Whalen et al. define MC/DC via sets of Boolean expressions for decision assignments, and then refine these sets with temporal operators.

Given a decision A, A+ denotes the set of expressions necessary to show that all conditions in A positively affect the outcome of A; that is, where A evaluates to true as a consequence of a considered condition. A− denotes the set of expressions necessary to show that all conditions in A negatively affect the outcome of A. In the following definition, x denotes a basic condition.


Definition 18 (Expressions for MC/DC).

x+ = {x}

x− = {¬x}

(A ∧ B)+ = {a ∧ B | a ∈ A+} ∪ {A ∧ b | b ∈ B+}

(A ∧ B)− = {a ∧ B | a ∈ A−} ∪ {A ∧ b | b ∈ B−}

(A ∨ B)+ = {a ∧ ¬B | a ∈ A+} ∪ {¬A ∧ b | b ∈ B+}

(A ∨ B)− = {a ∧ ¬B | a ∈ A−} ∪ {¬A ∧ b | b ∈ B−}

(¬A)+ = A−

(¬A)− = A+

The set of expressions necessary to cover a decision is determined by recursively applying the above rules. For example, the expression x ∨ (y ∧ z) results in the following set to show positive affect: {(x ∧ ¬(y ∧ z)), (¬x ∧ (y ∧ z))}. The set to show negative affect is: {(¬x ∧ ¬(y ∧ z)), (¬x ∧ (¬y ∧ z)), (¬x ∧ (y ∧ ¬z))}. A requirement for a test suite to satisfy MC/DC of a decision x ∨ (y ∧ z) is that each constraint in these two sets is satisfied by a test case.
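As a worked expansion of Definition 18 for this example: (y ∧ z)+ = {y ∧ z} (both generated terms coincide) and (y ∧ z)− = {(¬y ∧ z), (y ∧ ¬z)}; applying the rules for disjunction then yields (x ∨ (y ∧ z))+ = {x ∧ ¬(y ∧ z)} ∪ {¬x ∧ (y ∧ z)} and (x ∨ (y ∧ z))− = {¬x ∧ ¬(y ∧ z)} ∪ {¬x ∧ (¬y ∧ z), ¬x ∧ (y ∧ ¬z)}, which are exactly the two sets listed above.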

As LTL formulas are defined on paths and not states, the above rules need to be extended to take temporal operators into consideration, resulting in the unique-first-cause coverage (UFC) criterion. A test suite satisfies UFC, if it achieves that every basic condition in a formula takes on all possible outcomes at least once, and each basic condition is shown to independently affect the formula's outcome. Assuming a formula A and a path π, a condition c is the unique first cause of A, if in the first state along π where A is satisfied, it is satisfied because of c. The following rules are defined in (Whalen et al., 2006):

Definition 19 (Expressions for UFC).

� (A)+ = {A U (a ∧ � (A)) | a ∈ A+}

� (A)− = {A U a | a ∈ A−}

^ (A)+ = {¬A U a | a ∈ A+}

^ (A)− = {¬A U (a ∧ � (¬A)) | a ∈ A−}

© (A)+ = {© (a) | a ∈ A+}

© (A)− = {© (a) | a ∈ A−}

(A U B)+ = {(A ∧ ¬B) U ((a ∧ ¬B) ∧ (A U B)) | a ∈ A+} ∪ {(A ∧ ¬B) U b | b ∈ B+}

(A U B)− = {(A ∧ ¬B) U (a ∧ ¬B) | a ∈ A−} ∪ {(A ∧ ¬B) U (b ∧ ¬(A U B)) | b ∈ B−}

For example, the simple property φ = � (x ∧ y) results in the constraints φ+ = {(x ∧ y) U ((x ∧ y) ∧ � (x ∧ y))} and φ− = {((x ∧ y) U (¬x ∧ y)), ((x ∧ y) U (x ∧ ¬y))}. These constraints can be used to create test cases with a model checker. As always, it is necessary to negate the constraints to be valid trap properties. This results in the following type of trap properties, where for each f ∈ φ+ ∪ φ− one trap


property is created:

�¬ f

Test cases can be derived by using the usual trap property based approach shown in Figure 3.9.

As another example, let φ be requirement 1 of the CC example model (Equation 3.13): φ = � (brake → ©velocity = stop). For this property, the following two trap properties result for positive affect:

� (¬(brake → ©velocity = stop) U (¬brake ∧ ¬(©velocity = stop) ∧ � (brake → ©velocity = stop)))

� (¬(brake → ©velocity = stop) U (brake ∧ (©velocity = stop) ∧ � (brake → ©velocity = stop)))

The following two trap properties result for negative affect:

� (¬(brake→ ©velocity = stop) U (brake ∧ ¬(©velocity = stop)))

� (¬(brake→ ©velocity = stop) U (brake ∧ (©¬(velocity = stop))))

LTL semantics are defined for infinite traces, while test cases are finite. Therefore, Whalen et al. (2006) refine the rules to derive expression sets. This is mainly of interest for test suite analysis.

3.6.3 Dangerous Traces

There are several approaches based on mutation, where test cases are created with regard to requirement properties. Although mutation based approaches are considered in Section 3.7, we now consider an approach by Ammann et al. (2001), who introduce the notion of dangerous traces based on mutation. This encompasses the idea of scenarios where a dangerous action is either inevitable or possible as of the next state or at some point in the future.

Test requirements are defined in order to create dangerous traces with regard to safety properties. A trace is dangerous to a safety property, if it can lead to a property violation on a mutant model. The approach taken by Ammann et al. is to combine a model M and a mutant M′, such that transitions from both versions of the model can be taken. A special variable original indicates whether the taken transition originates from M or M′; i.e., it is false as soon as the mutated transition is executed. This can easily be modeled in NuSMV, by using the mutated condition as the only condition that sets original to true, and setting original to false in the default branch. Different types of dangerous traces are distinguished:

• A trace is AX dangerous, if the additional transitions allowed by the mutant M′ violate a property P in all next states after executing the mutated transition.

• A trace is EX dangerous, if there exists an additional transition allowed by the mutant M′ which violates a property P in the next state.

• A trace is AF dangerous, if it can be extended with the next state from M′ and other transitions from the combined model so that in future there always is a violation of P.


• A trace is EF dangerous, if it can be extended with the next state from M′ and other transitions from the combined model so that in future there sometimes is a violation of P.

For each dangerous trace there are two versions: a failing and a passing test. In a failing test, the dangerous trace is extended with transitions from M′, so that P is violated. In a passing test, the dangerous trace is extended with a single transition from M. For example, test requirements for AX dangerous traces for property P are defined in (Ammann et al., 2001) as follows. A test requirement for a failing test is:

EF (original ∧ EX (¬original) ∧ AX (¬original→ ¬P))

The following test requirement results in a passing test:

EF (original ∧ EX (original) ∧ EX (¬original) ∧ AX (¬original→ ¬P))

The test requirement for the passing test relies on the implementation of the model checker to pick the correct trace such that transitions from M are chosen; this is achieved by adding an EX original or EX ¬original expression before the AX requirement. However, transitions from M′ would also result in valid counterexamples to the property, while not representing a valid passing test case.

The test requirements for EX dangerous failing and passing traces are:

EF (original ∧ EX (¬original ∧ ¬P))

EF (original ∧ EX (original) ∧ EX (¬original ∧ ¬P))

The test requirements for AF dangerous failing and passing traces are (partially abbreviated to fit in a line):

EF (original ∧ EX (¬original) ∧ AX (¬original→ (¬P ∨ AF (¬P))))

EF (orig ∧ EX (orig) ∧ EX (¬orig) ∧ AX (¬orig→ (¬P ∨ AF (¬P))))

The test requirements for EF dangerous failing and passing traces are:

EF (original ∧ EX (¬original) ∧ EF (¬P))

EF (original ∧ EX (original) ∧ EX (¬original) ∧ EF (¬P))

Given mutants of the CC example model (Figure 3.5), any of the requirement properties given in Section 3.3 can serve as P in the trap properties listed above.

The overall process for test case generation, depicted in Figure 3.13, consists of deriving a set of trap properties according to the desired dangerous traces for each considered safety property as well as creating a set of mutants. Each mutant is combined with the original model, and then checked against the trap properties.



Figure 3.13: Dangerous traces for safety properties.

3.6.4 Property Relevance

A related approach to dangerous traces was presented by Fraser and Wotawa (2006a). This approach is described in detail in Chapter 6. To keep the survey of available methods coherent, a short overview is now given. Property relevance is introduced as a relationship between test cases and requirement properties. A failing test case is relevant to a property, if the erroneous behavior described by the test case violates the property. In contrast, positive test cases have to satisfy requirement properties as they are created from a correct model. Therefore, a passing test case is relevant to a property, if the test case execution could lead to a property violation on an erroneous implementation. The possible deviation is simulated according to a fault model or simple mutation. Based on the notion of property relevance, it is shown in Chapter 6 how any structural coverage criterion can be combined with property relevance.

For example, Transition Property Relevance Coverage requires that for each transition and each requirement property there is a test case that executes the transition and then proceeds relevant to the property.

The approach taken in Chapter 6 to create property relevance test suites is to first create a complete test suite for the structural coverage criterion using a traditional trap property based approach. This test suite can then be optimized to simplify the second step, which is to extend each test case with a property relevant postamble. For this extension, the original model and a model that can take erroneous transitions are combined, such that they share the identical prefixes. The initial state of these models corresponds to the considered structural coverage test case. The model checker is now used to create a trace that shows how the erroneous model violates the requirement property; this is achieved by simply checking the property using the outputs of the erroneous model. As correct and erroneous model share the same inputs, the trace created by the correct model represents the correct behavior which could lead to the property violation. This trace is used to extend the existing test case. The erroneous model can be a mutant model, according to some fault model. As the number of mutants is potentially very high, Fraser and Wotawa (2006a) suggest a special kind of mutant which can nondeterministically choose exactly one erroneous transition along any execution trace.

3.7 Mutation Based Test Case Generation

In general, mutation describes the modification of a program according to some fault model. Mutation analysis describes the process of evaluating an existing test suite with regard to its ability to identify mutants. Mutation testing is the process of deriving test cases that identify as many mutants as possible. The idea of mutation is based on the coupling effect (DeMillo et al., 1978) and competent programmer hypothesis (Acree et al., 1979). The former states that tests that detect simple faults are likely to also detect complex faults, while the latter states that programs are close to being correct.

Originally, mutation testing was applied to source code (DeMillo et al., 1978; Acree et al., 1979). Specification mutation was initially introduced by Budd and Gopal (1985). In the context of model checker based testing, specification mutation was introduced by Ammann and Black (1999b) for coverage analysis, and the use for test case generation was initially suggested by Ammann et al. (1998). There are related approaches where specifications are mutated; e.g., Srivatanakul et al. (2003) apply mutation together with model checking. Here, however, only approaches where the aim is test case generation are considered.

A competent specifier hypothesis is assumed, which resembles the competent programmer hypothesis and states that specifications are close to what is actually desired. If specifications are interpreted as abstract programs, the coupling effect can be assumed as well.

As an example of a mutation, consider the car controller example model of Figure 3.5. A possible mutant results from changing a logical operator in line 10 (accelerate & !brake & velocity = stop: slow) to the following:

accelerate | !brake & velocity = stop: slow

3.7.1 Mutation Operators for Specifications

Although mutation can be applied to automaton models directly, the prevalent method is to mutate the textual representations of models, for example in the input language of the model checker used for test case generation.

In general, a mutation operator describes a syntactic change according to a fault model. The mutation operator can be applied to different locations in the specification, each application resulting in a specification mutant. Usually only first order mutants are considered, that is, mutants that differ from the original version by only one mutation.

Mutation operators for specifications are analyzed by Black et al. (2000). The examples given in (Black et al., 2000) use the syntax of the model checker SMV (K.L. McMillan, 1992), but can be applied to any language that uses similar logical expressions. For example, the same mutation operators can also be applied to LTL or CTL properties.

The following mutation operators are defined and evaluated with regard to coverage in (Black et al., 2000). The operators used in our implementation are illustrated with an example mutation of the line accelerate & !brake & velocity = stop: slow of the CC example SMV code (Figure 3.5):

Arithmetic Operator Replacement (ARO) : This mutation operator replaces an algebraic operator with another algebraic operator. For the sake of this example, assume that instead of setting it to slow, velocity is an integer variable and is increased by 2:

accelerate & !brake & velocity=stop: velocity+2

A mutant could be:

accelerate & !brake & velocity=stop: velocity-2

Logical Operator Replacement (LRO) : This mutation operator replaces a logical operator with another logical operator.

accelerate | !brake & velocity = stop: slow

Relational Operator Replacement (RRO) : This mutation operator replaces a relational operator with another relational operator.

accelerate & !brake & velocity > stop: slow

Expression Negation Operator (ENO) : This operator negates sub-expressions.

accelerate & !(!brake & velocity = stop ): slow

Simple Expression Negation (SNO) : This operator negates an atomic condition in a decision.

!accelerate & !brake & velocity = stop: slow

Operand Replacement Operator (ORO) : Changes variables or constants with other syntactically valid operands. For example:

Variable Replacement Operator (VRO) : This operator replaces a variable reference with a reference to another variable of the same type.

accelerate & !accelerate & velocity = stop: slow

Constant Replacement Operator (CRO) : This operator replaces a constant with a syntactically valid different constant.

accelerate & !brake & velocity = stop: stop

Missing Condition Operator (MCO) : This operator removes a single condition from a decision.

accelerate & _ & velocity = stop: slow

Stuck At Operator (STO) : This operator replaces a condition with true or false (1 or 0).

1 & !brake & velocity = stop: slow

Associative Shift Operator (ASO) : Changes the association between variables. For example, assume the original line would be: (accelerate | !brake) & velocity = stop: slow. A possible mutant would then be the following: accelerate | (!brake & velocity = stop): slow.

Number Replacement Operator (NRO) : This operator replaces a number with 0 or adds 1 to or subtracts 1 from its value. The following example assumes that velocity = 0 is used instead of velocity = stop:

accelerate & !brake & velocity = 1: slow


3.7.2 Specification Mutation

Ammann et al. (1998) initially proposed specification mutation for test case generation. An SCR specification is converted to an SMV model and a set of temporal logic constraints, both of which represent the mode transitions. Mutation is applied to both the textual description of the model and the requirement properties. Initially, the model satisfies all temporal logic constraints. Mutating either the model or the constraints might lead to property violations.

The first option to derive test cases using mutation is to verify mutant models with regard to the temporal logic constraints. For each mutant model there is one counterexample for every constraint that is not satisfied. A resulting counterexample illustrates how an erroneous implementation would behave, therefore a correct IUT is expected to behave differently when a resulting test case is executed on it. Consequently, such traces can be used as negative test cases, i.e., a fault is detected if an IUT behaves identically.

As a second option to derive test cases, mutation of the temporal logic constraints can result in properties that are not satisfied by the original model, and can also be used to create counterexamples. As traces are created from the original model, resulting test cases are positive test cases.

A mutant is equivalent, if it behaves identically to the original model. Equivalence can be further constrained by requiring the observable behavior to be identical to that of the original, that is, the output values have to be identical at all times. Equivalent mutants do not result in counterexamples.


Figure 3.14: Test cases from mutants violating the specification.

Fraser and Wotawa (2006b,c) take a similar approach to generate test cases, but instead of creating a model and properties that represent the same SCR specification, a behavioral model and a set of requirement properties derived from formalizing user requirements is used. The original model is assumed to satisfy all requirement properties. Figure 3.14 depicts the process of deriving test cases with requirement properties. The same process applies if the properties are derived from an SCR specification. Negative test cases illustrate requirement property violations, therefore test cases are traceable to requirement properties. Traceability with property mutants is not always possible, as a mutation can completely change the meaning of a property. However, with only a certain restricted subset of mutation operators (e.g., RRO, LRO, SNO, ENO), property mutants are related to the original properties. In general, the percentage of property mutants that do not result in counterexamples is higher than for model mutants. With this approach, a model mutant that creates no counterexamples is not necessarily equivalent; the specification might just be too weak to detect the change.

Some mutations on requirement properties might have no effect on the verification, and do not contribute to the test case generation. Nevertheless, valuable insights on the specification can be drawn from such cases, as described in Chapter 7.

The approach presented in (Ammann et al., 2001) can also be seen as related to this approach. Here, the model is also mutated and verified with regard to requirement properties. As described in Section 3.6.3, the objective is to derive dangerous traces with regard to safety properties. For this, the original and mutant model are merged, so that the combined model can take both the original and the mutated transition. The process of merging a model and its mutant is illustrated with the language SMV in (Ammann et al., 2001). A special variable original, which is only false if the mutated transition is taken, is added to the model. This is used to create special trap properties based on the requirement properties, as described in Section 3.6.3.

3.7.3 Reflection

Ammann and Black (1999b) create logical formulas that “reflect” the transition relation of a model; this process is called reflection. These reflected properties resemble the logical properties derived from SCR specifications, as described in (Ammann et al., 1998) and the previous section. With regard to the transition system definition given in Section 3.5.2, there is one such reflected property for each simple transition:

� (α ∧ γ → ©β)

The reflection process is straightforward in principle, but there are several subtle issues when applying it to a concrete modeling language. For example, in the language of the model checker SMV there is an implicit semantics based on the syntactic ordering of case statements, which has to be resolved as there is no ordering for properties. To overcome this problem, it is necessary to make the implicit information contained in the ordering of the transitions explicit. In (Ammann and Black, 1999b), this process is called expoundment. Basically, instead of using each antecedent condition_k (which represents α ∧ γ) as such, it is converted to (∧_{1≤j<k} ¬condition_j) ∧ condition_k.

The CC example NuSMV model (Figure 3.5) results in the following reflected properties (simplified; automatic expoundment might result in more complex but logically identical properties, especially for the default branch):

� ((accelerate ∧ ¬brake ∧ velocity = stop)→ ©velocity = slow)

� ((accelerate ∧ ¬brake ∧ velocity = slow)→ ©velocity = fast)

� ((¬accelerate ∧ ¬brake ∧ velocity = fast)→ ©velocity = slow)

� ((¬accelerate ∧ ¬brake ∧ velocity = slow)→ ©velocity = stop)

� (brake→ ©velocity = stop)

There is one more simple transition that needs to be covered – the default branch. Here, two things have to be considered: First, the antecedent is not explicitly available, but is the conjunction of the negations of all earlier antecedents. Second, the NuSMV model states that velocity does not change. To represent this as a temporal logic property, an auxiliary variable P_velocity is necessary, which is defined as follows:

VAR
  ...
  velocity: {stop, slow, fast};
  P_velocity: {stop, slow, fast};
  ...
ASSIGN
  next(P_velocity) := velocity;

In our example, this results in the following property (simplified):

� (((accelerate ∧ ¬brake ∧ velocity = fast) ∨ (¬accelerate ∧ ¬brake ∧ velocity = stop)) → © (P_velocity = velocity))


Figure 3.15: Mutation based test case generation with reflection.

Once a set of reflected properties is derived, mutation can be applied to these properties. In (Ammann and Black, 1999b), the resulting mutants are used to determine the mutation adequacy of a given test suite. In fact, the mutants of the reflected properties can be used just like coverage related trap properties (Black et al., 2001) in order to generate test cases. Figure 3.15 depicts the process of test case generation with reflection. If the mutant property describes a transition that does not exist in the actual model, then the model checker returns a counterexample that takes the correct transition. The mutant properties can be greatly varied by applying different mutation operators. An invaluable source of information for this approach is (Ammann et al., 2002).
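As a simple illustration (not taken from the cited work), applying the SNO operator to the first reflected property of the CC example yields the mutant property � ((¬accelerate ∧ ¬brake ∧ velocity = stop) → ©velocity = slow). The model violates this property via its default branch, where velocity remains stop if neither accelerate nor brake is active; the resulting counterexample is a test case that exercises exactly this correct behavior.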

Gargantini (2007b) proposed a related approach based on Abstract State Machine specifications (introduced in Section 3.5.4). Guard conditions of update rules are mutated according to a given fault model. From the mutated conditions, detection conditions are derived. The idea is that a fault in a Boolean expression can be discovered if the detection condition evaluates to true. For a given Boolean expression φ and a mutant φ′, the detection condition is φ ⊕ φ′. The operator ⊕ is the xor-operator, which means that the detection condition is only true if φ and φ′ have different values. Test cases are generated by converting an ASM specification to a SPIN or SMV model. The considered guard conditions are mutated, and a set of detection conditions is created by combining each mutant φ′ with its original condition φ as φ ⊕ φ′. Trap properties are created by negating the detection conditions; i.e., claiming that they are never true. As ASM specifications can be hierarchic, additional outer guard conditions have to be included in the trap property:

� (A→ ¬(φ ⊕ φ′))

Here, A denotes the conjunction of the outer guard conditions. The property is equivalent to � (A → (φ ↔ φ′)). Test cases can be derived as usual by checking the trap properties against the model.

Considering the example ASM specification of the CC model, given in Figure 3.12, assume a mutant of the guard of rule 1 from (velocity == stop) && (accelerate == True) && (brake == False) to not(velocity == stop) && (accelerate == True) && (brake == False). This results in the following trap property:

� (¬((velocity = stop ∧ accelerate ∧ ¬brake) ⊕ (velocity ≠ stop ∧ accelerate ∧ ¬brake)))

3.7.4 State Machine Duplication

Okun et al. (2003b) identified the problem that when using mutation in the reflection approach there is no guarantee that a test case propagates a fault to an observable output. As one possible solution, in-line expansion is proposed. In-line expansion considers only reflections of the transition relations of output variables. In these reflections, internal variables are replaced with in-line copies of their transition relations. This replacement is repeated until the formula references no more internal variables. In-line expansion results in very efficient, but also very large test suites, as the number of mutants can increase quite significantly.

As an alternative, an approach called state machine duplication is proposed in (Okun et al., 2003b). This approach is based on model mutation, but uses an equivalence checking method to derive counterexamples. As illustrated in Figure 3.16, for each mutant model, a combined model where mutant and original model are executed in parallel is created. Both original and mutant model share



Figure 3.16: State machine duplication based test case generation.

the identical input values, therefore inequivalence can be shown with a trace where the output values differ. The model checker can easily be used to create such a trace, by verifying a property of the following type for each output variable out, or alternatively creating the conjunction of all output variables:

� (original.out = mutant.out)

In the CC example model, the following property would be used:

� (original.velocity = mutant.velocity)
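The following NuSMV sketch indicates how such a combined model could be structured for the CC example. It assumes that the model of Figure 3.5 corresponds to the reflected properties given in Section 3.7.3, and the module and instance names are purely illustrative:

MODULE CarController(accelerate, brake)
VAR
  velocity : {stop, slow, fast};
ASSIGN
  init(velocity) := stop;
  next(velocity) := case
    accelerate & !brake & velocity = stop : slow;
    accelerate & !brake & velocity = slow : fast;
    !accelerate & !brake & velocity = fast : slow;
    !accelerate & !brake & velocity = slow : stop;
    brake : stop;
    1 : velocity;                                  -- default: velocity unchanged
  esac;

MODULE CarControllerMutant(accelerate, brake)
VAR
  velocity : {stop, slow, fast};
ASSIGN
  init(velocity) := stop;
  next(velocity) := case
    accelerate | !brake & velocity = stop : slow;  -- mutated guard (LRO)
    accelerate & !brake & velocity = slow : fast;
    !accelerate & !brake & velocity = fast : slow;
    !accelerate & !brake & velocity = slow : stop;
    brake : stop;
    1 : velocity;
  esac;

MODULE main
VAR
  accelerate : boolean;
  brake      : boolean;
  original   : CarController(accelerate, brake);
  mutant     : CarControllerMutant(accelerate, brake);
LTLSPEC G (original.velocity = mutant.velocity)

Both instances read the same unconstrained inputs, so any counterexample to the LTL specification is a trace on which the mutation becomes visible at the output velocity.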

Boroday et al. (2007) use this approach in the formal setting of modules as described in Section 3.4. In this setting, the composition of specification module S and mutant M with outputs O results in a counterexample if:

S || M ⊭ � ∧_{p∈O} (p = p′)

If the mutant is equivalent to the original model with regard to the outputs, then the combined model satisfies these properties. If the mutant is not equivalent, then each such property results in a counterexample usable as a test case where the fault is propagated to an output. In practice, creation of a test suite with state machine duplication takes longer than with reflection, because there is the overhead of creating the combined models, and calling the model checker separately on each combined model; mutants of reflected properties can be verified in a single run of the model checker. Experiments (Okun et al., 2003b) have, however, shown that test suites created with state machine duplication are better.


3.8 Test Suite Analysis with Model Checkers

Model checkers are not only useful when it comes to creating test cases. Given an existing set of test cases, a model checker can be used to evaluate their quality, for example with regard to satisfaction of a given coverage criterion. A nice aspect of this approach is that coverage can be measured without actually executing test cases. Different activities during the development process can result in test cases; for example, use cases created during the requirements phase, manually created test cases, or test cases created with any automated method. It is an important task from a practical perspective to evaluate how good these test cases are.

3.8.1 Symbolic Test Case Execution

Analysis of test cases with a model checker builds on the idea of representing test cases as verifiable models, following an approach by Ammann and Black (1999b). Test cases are represented as constrained finite state machines (CFSMs), which have an explicit state counter on which the values of all other variables depend.

Test cases are converted to CFSM models with an additional variable State. It is initialized with 0 and increased until the final state of the test case is reached. The values of all other variables are determined solely by its value.

As an example, consider deriving a test case by checking a mutant model of the CC example given in Section 3.3 (accelerate | !brake & velocity = stop: slow) against the first requirement property specified in Equation 3.13 in Section 3.3. NuSMV returns the counterexample shown in Figure 3.17, which can be used as a negative test case.

-- specification AG (brake -> AX velocity = stop)
-- is false as demonstrated by the following
-- execution sequence
-> State: 1.1 <-
accelerate = 0
brake = 0
velocity = stop
-> State: 1.2 <-
accelerate = 1
brake = 1
velocity = slow
-> State: 1.3 <-
accelerate = 0
brake = 0

Figure 3.17: Counterexample created by NuSMV showing that brakes do not work in the mutant model (edited for brevity).

In the trace in Figure 3.17, at every state only those variables that changed their values are listed. In state 1.2 both accelerate and brake are activated, while due to the mutation at the same time velocity changes to slow. In state 1.3 velocity is still slow, which is a violation of the requirement that it should be stop. When converted to an SMV model, this trace results in the model listed in Figure 3.18.

MODULE main
VAR
  accelerate: boolean;
  brake: boolean;
  velocity: {stop, slow, fast};
  State: 0..2;
ASSIGN
  init(accelerate) := 0;
  next(accelerate) := case
    State = 0: 1;
    State = 1: 0;
    1: accelerate;
  esac;
  init(brake) := 0;
  next(brake) := case
    State = 0: 1;
    State = 1: 0;
    1: brake;
  esac;
  init(velocity) := stop;
  next(velocity) := case
    State = 0: slow;
    State = 1: slow;
    1: velocity;
  esac;
  init(State) := 0;
  next(State) := case
    State < 2: State + 1;
    1: State;
  esac;

Figure 3.18: Test case as verifiable SMV model.

This SMV model is suitable for analysis with a model checker, for example to measure coverage or a mutation score. The latter can only be directly measured in the case of weak mutation, which means that the change caused by the mutation does not have to propagate to an output to be considered as killed. In order to simulate the execution of a test case on a model, further processing is necessary. The test case is combined with the model by moving the model's main module to a sub-module of the test case, and changing all input variables to parameters of that module. This new sub-module is instantiated in the test case model, and the input variables are used as parameters, thus ensuring that the mutant model uses the inputs provided by the test case. The result is shown in Figure 3.19.

Finally, for each output variable a property is added that requires the output variables of the mutant model and of the test case to be equal for the duration of the test case (or alternatively, a conjunction of all these properties). After the last state of the sequence the test case does not specify how the values change. In the test case model, this is modeled by not changing the variables. However, the mutant model might still change as time progresses. Therefore, the assertion is extended to only be valid while the last step of the test case has not been exceeded.

� (State < MAX_STATE→ velocity = model.velocity)
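For the test case of Figure 3.18, whose last state has index 2, this assertion could for instance be written as the following NuSMV specification (where model refers to the sub-module instance of Figure 3.19):

LTLSPEC G (State < 2 -> velocity = model.velocity)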

Calling the model checker on the combined model and these properties, any counterexample illustrates that the test case fails on the model. If the model checker does not return a counterexample,


MODULE Model(accelerate, brake)
VAR
  velocity: {stop, slow, fast};
ASSIGN
  ... As in (mutant) model

MODULE main
VAR
  model: Model(accelerate, brake);
  ... As in test case model

Figure 3.19: Test case model combined with original model to simulate test case execution.

then the test case passes.

3.8.2 Coverage Analysis

Coverage analysis measures how thoroughly a given test suite exercises a system under test. A coverage criterion describes the items that should be executed (covered) by at least one test case; for example, lines of code, or branches in the control flow. Coverage criteria can also be based on specifications or models. Different coverage criteria were presented in Section 3.5 and Section 3.6. Each item described by the coverage criterion is represented as a single trap property, as described in the previous section. For test case generation, each trap property results in a single test case. The same trap properties can be used to determine a coverage value. The test coverage is the percentage of items that are actually covered, i.e., reached during test case execution.

Definition 20 (Test Coverage). The coverage C of a test suite TS with regard to a coverage criterion represented by a set of trap properties P is defined as the ratio of covered properties to the total number of properties:

C = (1 / |P|) · |{x | x ∈ P ∧ covered(x, TS)}|

The predicate covered(a, TS) is true if there exists a test case t ∈ TS such that t covers a, i.e., t ⊭ a.

When checking a test case model (e.g., Figure 3.18) against a trap property, the model checker returns a counterexample if the test case covers the item represented by the trap property. Care has to be taken because a test case is usually only a finite prefix of an execution path, unless it is a lasso-shaped sequence. If it is only a prefix, then the path might be truncated such that a trap property is violated because of the truncation. For example, consider a condition on the next state using the © operator, evaluated on the final state of the test case. The next state is not defined for the final state, or in the case of test cases modeled with SMV it might simply be a copy of the final state, which might cause a property violation.

This problem is related to runtime verification, where finite execution traces are monitored with regard to temporal logic properties. In runtime verification, special finite trace semantics are defined in order to obtain a verdict with a finite trace. As finite trace semantics consider only finite traces, special treatment of the last state of a finite trace is necessary. For example, one possibility is to assume that the last state is repeated after the end of the trace. Another possibility is to define that no proposition holds after the last state. Obviously, this changes the meaning of the □ operator.

In the context of model checker based testing we do not change the infinite trace semantics of LTL, but define a rewriting of LTL properties that avoids problems created by finite traces. A property P is transformed so that it is satisfied if the truncation of the path leads to a property violation. For this, we assume the existence of an atomic proposition that evaluates to true only if the current state of a sequence is any state prior to the final state. For example, this can be implemented as "s < l", where s represents the number of the current state, and l = length(t) is the length of test case t. This proposition has to be included when analyzing a test case. In (Ammann et al., 2002), this proposition is denoted as Sustain. With this we can interpret a test case as a model:

Definition 21 (Test Case Model). The test case model M_t for a test case t := 〈t_0, ..., t_l〉 of length l for the model M = (S, S_0, T, L) is defined as a Kripke structure M_t = (S_t, {t_0}, T_t, L_t), where:

• S_t = {t_i | 0 ≤ i ≤ length(t) : 〈..., t_i, ...〉 = t}

• T_t = {(t_i, t_{i+1}) | 0 ≤ i < length(t) : 〈..., t_i, t_{i+1}, ...〉 = t ∧ (t_i, t_{i+1}) ∈ T}

• ∀ 0 ≤ i < l : L_t(s_i) = L(s_i) ∪ {(s < l)}

• ∀ i ≥ l : L_t(s_i) = L(s_i) ∪ {¬(s < l)}

The labeling function and the set of atomic propositions AP are extended with the proposition "s < l", which is true as long as the current state s of execution has not exceeded the end of the test case t with length l = length(t).

Let ξ denote the proposition that decides whether a formula is valid and should be regularly evaluated, or whether it should evaluate to true because it does not apply. For example, if ξ = s < l then we want every formula to evaluate to true once the final state of a test case has been reached. For this, constraint rewriting rules are defined, which basically rewrite temporal operators such that they evaluate to true when an unsound state is reached. In (Ammann and Black, 1999a) this rewriting is defined for CTL, and a correctness proof is given. The same rules apply to LTL properties. Consequently, the constraint rewriting CR(φ) for an LTL/CTL property φ is recursively defined as follows, where v denotes a Boolean value. OP denotes any of the LTL operators □, ○, ◊, or when considering CTL properties AG, AF, AX, EG, EF, EX. The operator OPU stands for either A or E in the context of the CTL until operator, or is a blank placeholder in the case of LTL. Atomic propositions are denoted by a.

Definition 22 (Constraint Rewriting).

CR(φ) = cr(φ, True)       if φ begins with a temporal operator
CR(φ) = ξ → cr(φ, True)   otherwise

cr(a, v) = a
cr(¬φ, v) = ¬cr(φ, ¬v)
cr(φ1 ∧ φ2, v) = cr(φ1, v) ∧ cr(φ2, v)
cr(φ1 ∨ φ2, v) = cr(φ1, v) ∨ cr(φ2, v)
cr(φ1 → φ2, v) = cr(φ1, ¬v) → cr(φ2, v)
cr(φ1 ≡ φ2, v) = cr(φ1, v) ≡ cr(φ2, v)
cr(OP φ, True) = OP (ξ → cr(φ, True))
cr(OP φ, False) = OP (ξ ∧ cr(φ, False))
cr(OPU φ1 U φ2, True) = OPU cr(φ1, True) U (ξ → cr(φ2, True))
cr(OPU φ1 U φ2, False) = OPU cr(φ1, False) U (ξ ∧ cr(φ2, False))
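To make the recursion concrete, the following Python sketch shows one possible encoding of this rewriting. It is not taken from the cited work: formulas are represented as nested tuples, the proposition ξ is passed in as a plain string, and the operator names are illustrative choices.

TEMPORAL = {'G', 'F', 'X', 'AG', 'AF', 'AX', 'EG', 'EF', 'EX'}

def cr(phi, v, xi):
    """Recursive constraint rewriting following the rules above."""
    if isinstance(phi, str):                       # atomic proposition a
        return phi
    op = phi[0]
    if op == 'not':
        return ('not', cr(phi[1], not v, xi))
    if op in ('and', 'or', 'iff'):
        return (op, cr(phi[1], v, xi), cr(phi[2], v, xi))
    if op == 'implies':                            # antecedent flips the polarity
        return ('implies', cr(phi[1], not v, xi), cr(phi[2], v, xi))
    if op in TEMPORAL:                             # unary temporal operators
        guard = 'implies' if v else 'and'
        return (op, (guard, xi, cr(phi[1], v, xi)))
    if op in ('U', 'AU', 'EU'):                    # until (LTL or CTL)
        guard = 'implies' if v else 'and'
        return (op, cr(phi[1], v, xi), (guard, xi, cr(phi[2], v, xi)))
    raise ValueError('unknown operator: %r' % (op,))

def CR(phi, xi):
    """Top-level rewriting: only guard formulas not starting with a temporal operator."""
    if isinstance(phi, tuple) and phi[0] in TEMPORAL | {'U', 'AU', 'EU'}:
        return cr(phi, True, xi)
    return ('implies', xi, cr(phi, True, xi))

# Example: rewrite G(p -> X q) with the proposition "s < l".
rewritten = CR(('G', ('implies', 'p', ('X', 'q'))), 's < l')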

The test coverage of a given test suite is determined as follows:

1. Each test case is converted to a verifiable model, as described in Section 3.8.1.

2. Each test case model is checked against the prefix-transformed versions of all remaining trap properties.

3. Each trap property that results in a counterexample is covered, and does not need to be checked again.

The overall test coverage is calculated from the number of covered trap properties according to Definition 20.
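The following Python sketch illustrates this procedure; it is not part of the referenced work. The trap properties are assumed to be given as hashable objects (e.g., strings), and check_trap is a hypothetical placeholder for a model checker call on a test case model and a rewritten trap property that reports whether a counterexample was produced.

def measure_coverage(test_case_models, trap_properties, check_trap):
    """Compute the coverage of a test suite according to Definition 20.

    check_trap(tc, trap) -- True if the model checker returns a counterexample,
    i.e. the test case model tc covers the item represented by trap.
    """
    remaining = set(trap_properties)
    covered = set()
    for tc in test_case_models:
        # A trap property that is already covered is not checked again.
        for trap in list(remaining):
            if check_trap(tc, trap):
                covered.add(trap)
                remaining.remove(trap)
    return len(covered) / len(trap_properties)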

3.8.3 Mutation Analysis

As introduced in Chapter 2, another common analysis technique besides coverage analysis is mutation analysis. Here, a given test suite is examined with regard to a given set of mutants, in order to determine how many of the mutants can be distinguished from the original by the test cases. Usually, mutation analysis is applied to the source code, but specification mutation is receiving increasing attention; for example, consider the mutation based test case generation presented earlier.

Definition 23 (Mutation Score). (Ammann and Black, 1999b) The mutation score S for a given method M to create mutants and a test set t for a specification r equals the number of mutants killed by the test set, k, divided by the total number of mutants, N, produced by M on r:

S(M, r, t) = k / N

A special case of mutation analysis is presented by Ammann and Black (1999b). Here, not the model but properties that represent the transition relation are mutated. As described above, these properties can be used like trap properties for test case generation. Similarly, these properties can also be used for analysis of test cases like trap properties. The mutation score is calculated from the number of mutant properties that result in a counterexample when checked against a test case model. This kind of mutation analysis uses weak mutation, which means that a mutant is killed if an erroneous state results immediately after the mutated transition.

In contrast, in strong mutation a mutant is killed if the final output is different from the original version. Strong mutation analysis can be performed by considering model mutants. Model mutants need different treatment in order to determine a mutation score. Again, each test case is converted to a verifiable model. Then, each mutant is successively combined with a test case model as described above, until the verification of such a mutant/test case model combination results in a counterexample. A counterexample indicates that the test case failed, which in turn means that the mutant is killed.
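As an illustration only, this loop can be sketched in Python as follows; check_combination is a hypothetical placeholder for a model checker call on the combined mutant/test case model with the equivalence properties of Section 3.8.1.

def mutation_score(mutant_models, test_case_models, check_combination):
    """Compute a strong mutation score for a test suite.

    check_combination(mutant, tc) -- True if the model checker returns a
    counterexample for the combined model, i.e. the test case fails on the
    mutant and therefore kills it.
    """
    killed = sum(
        1 for mutant in mutant_models
        # A mutant is killed as soon as one test case fails on it.
        if any(check_combination(mutant, tc) for tc in test_case_models))
    return killed / len(mutant_models)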

3.9 Issues in Testing with Model Checkers

Testing with model checkers is an active area of research, and as such there are many issues that still need to be solved. The main showstopper for industry acceptance of model checker based testing is probably the limited performance. A main cause of this problem is the state explosion problem, but there are other issues contributing to a potentially bad performance. Even if the performance is acceptable, the results of the test case generation might not be as good as possible. Some application scenarios, like regression testing, need special treatment. Nondeterministic models or properties that require nonlinear counterexamples are further examples of issues with model checker based testing. This section reviews identified problems and proposed solutions.

3.9.1 Abstraction

The main cause for performance problems with model checkers is the state explosion, which signifies the large or intractable state spaces that can easily result from complex models. Especially software model checking is susceptible to the state explosion problem. Abstraction is a popular method to overcome the state explosion problem. Abstraction is an active area of research, and many abstraction techniques have been presented in recent years. This has made it possible to verify properties on very large models. In general, abstraction methods are tailored towards verification, and therefore are not always useful in the context of testing.

A full survey of available techniques is out of the scope of this document; as an example technique, we mention counterexample guided abstraction refinement (CEGAR) (Clarke et al., 2000), which refines an abstract model until no more spurious counterexamples are generated when verifying a property. This method ensures soundness, which means that a property that holds on the abstract model also holds on the concrete model. In contrast, when generating test cases with a model checker, the objective is different: Properties that are violated by a concrete model should also be violated by the abstract model.

Ammann and Black (1999a) define a notion of soundness in the context of test case generation, which expresses that any counterexample of an abstracted model has to be a valid trace of the original model. A method called finite focus is proposed and shown to be sound with regard to this soundness definition. Finite focus only considers a limited set of states, for example only a fixed subset of variables of large or unbounded domains. An additional state machine is defined, which changes from state sound to unsound whenever a transition is taken that is out of the finite focus. Once the unsound state is reached, this state machine stays in this state.

A special variable s to denote soundness is introduced; s is true if the state is sound, and false otherwise. This variable can be used as ξ = s in the constraint rewriting given in Definition 22.

When creating test cases from a model which is abstracted with the finite focus method, this constraint rewriting has to be applied to all properties involved in the test process, that is, trap properties, reflected properties, etc. Any counterexample created from such a rewritten property is sound with regard to the abstraction. This means that the test case applies to the abstracted and the original model. A property where the constraint rewriting has been applied might be satisfied by the abstract model, while the original model would result in a counterexample. Therefore, the number of test cases on an abstract model is usually smaller. This shows that abstraction can not only be used to increase the performance of the test case generation, but also as a means to control the size of resulting test suites.

3.9.2 Improving the Test Suite Generation Process

One main cause for bad performance during test case generation is the model checker itself. Improvement of model checking techniques is an important area of research. For example, a case study by Heimdahl et al. (2003) showed that bounded model checking can be superior for test case generation, at least for certain models and coverage criteria. As another example, directed model checking (Edelkamp et al., 2001) is a recently proposed technique, which is of interest to testing with model checkers, because its aim is the efficient generation of counterexamples and not exhaustive verification. An overview of current research to improve model checking is out of the scope of this document. As another example of how the performance can be improved, abstraction techniques have been considered in the previous subsection.

Both coverage and mutation based approaches to test case generation call the model checker far more often than really necessary, as identified by Hong and Ural (2005a), Fraser and Wotawa (2007c), and Zeng et al. (2007). For example, consider a coverage criterion that is represented by a set of trap properties T. Traditionally, the model checker is called for each trap property t ∈ T. As a consequence, many duplicate test cases are created, and many test cases are subsumed by other, longer test cases. Black and Ranville (2001) describe winnowing of test cases as a means to remove such redundant test cases once a complete test suite has been generated. As described by Fraser and Wotawa (2007b), even test cases that are not duplicates or subsumed by other test cases can contain a significant amount of redundancy if they contain common prefixes.

In (Fraser and Wotawa, 2007c), it is proposed to monitor trap properties during test case generation. Each time a counterexample is generated, the remaining trap properties are analyzed with regard to this new counterexample. A trap property that is already covered does not need to be considered for test case generation; it is not necessary to call the model checker on it. As a concrete technique to perform this monitoring, LTL rewriting based upon an approach described in (Havelund and Rosu, 2001a) is proposed in (Fraser and Wotawa, 2007c). Chapter 11 describes this approach in detail.
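The overall generation loop with monitoring might look like the following Python sketch. This is a simplification and not the implementation from the cited work: find_counterexample stands for a model checker call, and is_covered would be realized, for example, by the LTL rewriting of Chapter 11.

def generate_with_monitoring(trap_properties, find_counterexample, is_covered):
    """Generate test cases, skipping trap properties that are already covered.

    find_counterexample(trap) -- returns a counterexample trace or None
    is_covered(trap, trace)   -- True if the trace already covers the trap
    """
    test_suite = []
    remaining = list(trap_properties)
    while remaining:
        trap = remaining.pop(0)
        trace = find_counterexample(trap)
        if trace is None:              # trap property holds: item is infeasible
            continue
        test_suite.append(trace)
        # Monitor the remaining trap properties against the new counterexample.
        remaining = [t for t in remaining if not is_covered(t, trace)]
    return test_suite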

In Chapter 12 it is described how mutant models can be represented as temporal logic properties, which allows application of this approach to mutation based approaches. Each mutant model is represented by a unique characteristic property. Characteristic properties are similar to the reflected properties described in Section 3.7, but are extended to cover all possible effects a mutation can have in a transition system. Instead of monitoring trap properties, the characteristic properties can be monitored. Whenever a characteristic property is covered by a counterexample, it is not necessary to include the mutant represented by this characteristic property in the test case generation.

When converting each counterexample to a distinct test case, the resulting test suite contains redundancy. Monitoring avoids duplicate or subsumed test cases, but different test cases might still share identical prefixes. As described in the next section, these common prefixes do not contribute to the overall fault detection ability, but consume time during test case generation and execution. This can be avoided by creating test cases incrementally instead of mapping each counterexample to a test case. This approach was initially proposed by Hamon et al. (2004). After creating a counterexample, the final state of the counterexample is used as the initial state of the model for the next verification step. In (Hamon et al., 2004) this is achieved by directly calling application interface functions of the model checker SAL. Incremental generation of test cases in combination with property monitoring is used in (Fraser and Wotawa, 2007c). The choice of which trap property to verify next influences the length of the test cases that are generated. In (Hamon et al., 2004) the trap properties are chosen randomly (in the order provided). In (Fraser and Wotawa, 2007c) this is done as well, but in many cases the rewriting of trap properties leads to hints of which trap properties can lead to very short test cases, as will be shown in Chapter 11.
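Incremental generation can be sketched as follows. Again, this is only an illustration under simplifying assumptions: find_counterexample_from is a hypothetical model checker call that starts the search from a given state, similar in spirit to what is achieved via the SAL API in (Hamon et al., 2004), and trap properties that cannot be covered from the current state are simply skipped here.

def generate_incrementally(initial_state, trap_properties, find_counterexample_from):
    """Build one long test case instead of many short ones with common prefixes."""
    state = initial_state
    test_case = [initial_state]
    for trap in trap_properties:
        # Search for a counterexample that starts where the previous one ended.
        trace = find_counterexample_from(trap, state)
        if trace is None:
            continue                    # trap not coverable from this state
        test_case.extend(trace[1:])     # append without repeating the start state
        state = test_case[-1]
    return test_case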

Hong and Ural (2005a) use subsumption relations between items described by a coverage criterion to reduce the costs of the test case generation. An entity subsumes another entity if exercising the former guarantees exercising the latter. The time used by the test case generation is reduced by first calculating a minimal spanning set, and then only using coverage entities in this minimal spanning set to derive test cases.

Model checking is used to determine subsumption between two entities. It is assumed that the entities are represented as LTL formulas, such that a path exercises the entity if it fulfills the LTL formula. For entities e1 and e2, represented by LTL formulas ltl(e1) and ltl(e2), e1 subsumes e2 if a model K satisfies the following property:

ltl(e1) → ltl(e2)

For each coverage criterion, a different formula ltl(e) has to be defined for the entities e. Hong and Ural (2005a) define these formulas for control and data flow coverage criteria. In (Hong and Ural, 2005a), the subsumption relation is used to derive minimal spanning sets for coverage criteria. A spanning set for a coverage criterion is a subset of its entities, such that exercising all items in the spanning set covers all entities described by the coverage criterion. A spanning set is minimal if there exists no spanning set with fewer elements.

The minimal spanning set is derived by first creating a subsumption graph, in which vertices represent coverage entities and arcs represent subsumption. Subsumption information is derived by model checking the above property for pairs of coverage entities. Strongly connected components are collapsed into one vertex, which results in a reduced subsumption graph. Let v_1, ..., v_n be the vertices of the reduced subsumption graph which have no incoming arc; that is, they are not subsumed. V_1, ..., V_n are the sets of strongly connected components of the subsumption graph corresponding to v_1, ..., v_n. A minimal spanning set is {v′_1, ..., v′_n}, such that v′_i ∈ V_i for all 1 ≤ i ≤ n. Hong and Ural (2005a) present two different algorithms to derive subsumption graphs: one requires n² calls to the model checker for n coverage entities and identifies all possible minimal spanning sets; the alternative algorithm reduces the complexity by only creating one possible minimal spanning set.
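A rough Python sketch of this construction is shown below; it is illustrative rather than the published algorithm. Coverage entities are assumed to be hashable (e.g., formula strings), subsumes(e1, e2) stands for a model checker call that verifies the implication above, and mutual subsumption is used in place of an explicit strongly connected component computation.

def minimal_spanning_set(entities, subsumes):
    """Pick one representative per non-subsumed group of coverage entities."""
    # Pairwise subsumption information (n^2 model checker calls).
    sub = {(a, b): subsumes(a, b) for a in entities for b in entities if a != b}
    spanning = []
    for e in entities:
        # e is strictly subsumed if some other entity subsumes it but not vice versa;
        # such entities correspond to vertices with incoming arcs and are skipped.
        dominated = any(sub[(other, e)] and not sub[(e, other)]
                        for other in entities if other != e)
        if dominated:
            continue
        # Skip e if an equivalent entity (mutual subsumption) was already chosen.
        if any(sub[(e, chosen)] and sub[(chosen, e)] for chosen in spanning):
            continue
        spanning.append(e)
    return spanning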

Monitoring avoids calling the model checker for trap properties which are covered by the traces selected so far, whilst the subsumption approach avoids model checking of trap properties that are always subsumed by other trap properties. The results can be quite different, and even though monitoring can result in smaller test suites, the actual success depends on the order in which trap properties are chosen. Consequently, a combination of these approaches is conceivable.

Zeng et al. (2007) collect test cases created with a model checker in a structure called test tree. Each counterexample is merged into the existing test tree. Identical prefixes are simply overlaid in the tree, which automatically removes duplicate or subsumed test cases. Once a complete test tree has been produced covering all test requirements, a test suite is derived as the set of paths from the tree root to a leaf. This achieves the coverage criterion used for test case generation with a minimized test suite. It is also suggested that each time a sequence is generated, the remaining test requirements are analyzed to determine whether any of them are fulfilled. A concrete method for this is the rewriting technique presented above, proposed in (Fraser and Wotawa, 2007c).

3.9.3 Improving the Results of the Test Suite Generation Process

Performance is a main concern for test case generation; the generation process itself needs to be sufficiently fast to be applicable to models of realistic size. However, performance is also an important factor during test case execution. If there are too many or too long test cases, execution of a test suite might not be feasible. This is even more the case when considering regression testing, where a test suite is applied repeatedly after changes in an implementation or specification. Some of the approaches described in the previous section improve the test case generation such that smaller test suites result. This section considers optimizations to existing test suites.

With high execution costs in mind, the test case generation process should ideally result in minimal test suites in the first place. A test suite can either be minimal with regard to the number of test cases, or the number of transitions in the test suite. In the context of testing with model checkers, both tasks are NP-hard, as shown in (Hong et al., 2003).

Test suites created with model checkers are not minimal; in addition, they often consist of test cases that do not contribute to the fault sensitivity. For example, different trap properties might result in identical test cases. If a (passing) test case is a prefix of another test case, then it is not necessary to execute the shorter test case if the longer one is also executed; the short test case is subsumed by the longer one. (For failing tests, long test cases are subsumed by shorter prefixes.) Black and Ranville (2001) describe several methods to remove unnecessary test cases: Clearly, duplicates and subsumed test cases can be safely removed. The cross section of a requirement is the ratio of test cases that satisfy a test requirement to test cases in total:

CS(r) = (# satisfying tests) / (# tests)

A test suite can be minimized by selecting those test cases that satisfy test requirements with small cross sections. Such test cases can be identified with their resolution, where the sum ranges over the test requirements satisfied by t:

RES(t) = Σ 1 / CS(r)²


The higher the resolution of a test case is, the more small cross section requirements it fulfills. A test suite is minimized by iteratively selecting the test case with the greatest resolution that fulfills a yet unfulfilled test requirement, until all test requirements are fulfilled.
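This selection strategy could be sketched in Python as follows; the sketch is illustrative, satisfies(t, r) is a hypothetical predicate stating that test case t fulfills requirement r, and requirements are assumed to be hashable (e.g., strings).

def minimize_by_resolution(test_cases, requirements, satisfies):
    """Greedy minimization using cross sections and resolutions."""
    n = len(test_cases)
    # Cross section: fraction of test cases that satisfy a requirement.
    cs = {r: sum(1 for t in test_cases if satisfies(t, r)) / n for r in requirements}

    def resolution(t):
        # Sum of 1/CS(r)^2 over the requirements satisfied by t.
        return sum(1.0 / cs[r] ** 2 for r in requirements if cs[r] > 0 and satisfies(t, r))

    selected = []
    remaining = list(test_cases)
    open_reqs = {r for r in requirements if cs[r] > 0}   # ignore unsatisfiable requirements
    while open_reqs and remaining:
        # Only consider test cases that still fulfill an unfulfilled requirement.
        candidates = [t for t in remaining if any(satisfies(t, r) for r in open_reqs)]
        if not candidates:
            break
        best = max(candidates, key=resolution)
        selected.append(best)
        remaining.remove(best)
        open_reqs -= {r for r in open_reqs if satisfies(best, r)}
    return selected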

Another technique proposed in (Black and Ranville, 2001) is minimization, which selects a subset of a test suite that achieves a given coverage criterion. This is also known as test suite reduction, which is defined by Harrold et al. (1993) as follows:

Given: A test suite TS, a set of requirements r_1, r_2, ..., r_n that must be satisfied to provide the desired test coverage of the program, and subsets of TS, T_1, T_2, ..., T_n, one associated with each of the r_i such that any one of the test cases t_j belonging to T_i can be used to test r_i.

Problem: Find a representative set of test cases from TS that satisfies all r_i.

The requirements r_i can represent any test case requirements, e.g., test coverage. A representative set of test cases must contain at least one test case from each subset T_i. The problem of finding the optimal (minimal) subset is NP-hard. Therefore, several heuristics have been presented (Harrold et al., 1993; Gregg Rothermel, 2002; Zhong et al., 2006).

Test suite reduction results in a new test suite, where only the relevant subset remains and the other test cases are discarded. Intuitively, removing any test case might reduce the overall ability of the test suite to detect faults. In fact, several experiments (Jones and Harrold, 2003; Rothermel et al., 1998; Heimdahl and Devaraj, 2004) have shown that this is indeed the case, although there are other claims (Wong et al., 1995). Note that the reduction of fault sensitivity would also occur when using an optimal instead of a heuristic solution.

Heimdahl and Devaraj (2004) conducted their experiments in the context of model checker based test case generation. These experiments also led to the conclusion that test suite reduction can significantly reduce the size of a test suite, but the fault detection ability suffers from this reduction.

In Chapter 9 it is shown that test suites created with model checkers often contain a significant amount of redundancy, which means that test suites are bigger than would be necessary with regard to their fault detection ability. Common prefixes are identified as a main source of redundancy. Methods to measure and remove redundancy are presented in Chapter 9. In a nutshell, test suites can be improved by splitting test cases with common prefixes and recombining them such that the common prefixes are avoided. Hence, an optimized test suite still fulfills the original test requirements for most conceivable types of test requirements, but the overall test suite size is reduced.

Finally, a technique that is used to improve the speed with which faults are detected is test case prioritization. Test case prioritization is the task of finding an ordering of the test cases of a given test suite such that a given goal is reached faster. Test case prioritization in the context of testing with model checkers is considered in Chapter 10. In general, the first step of prioritization is to analyze each test case with regard to its coverage of a certain criterion or mutation score; test case analysis with model checkers is described in Section 11.3. Then, the test cases simply have to be arranged in descending order according to their coverage values or mutation scores. In general, the prioritization reduces the average number of test cases that need to be executed in order to detect a fault.
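A minimal sketch of this ordering step, assuming each test case has already been analyzed and assigned a score (a coverage value or mutation score), could look as follows; score is a hypothetical accessor for that precomputed value.

def prioritize(test_suite, score):
    """Order test cases by decreasing coverage value or mutation score."""
    return sorted(test_suite, key=score, reverse=True)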


3.9.4 Quality Concerns for Coverage Based Testing

Heimdahl et al. performed several experiments to evaluate model checker based testing. In (Heimdahl et al., 2003), a case study that analyzed the scalability of test case generation with model checkers, the authors observed that several condition based coverage criteria resulted in test cases that are too short and not good at detecting faults. In (Heimdahl et al., 2004), a pilot study was conducted to investigate the suitability of condition based coverage criteria. In this experiment, test suites were generated using different condition based coverage criteria for a close to production model of a flight guidance system from Rockwell Collins Inc. The fault detection ability of the different test suites was measured on mutant versions of the model. The experiment showed that a set of randomly generated test cases created with the same effort was superior to all coverage based test suites.

This result was due in part to a peculiar behavior of the model in use that was not considered by the specifiers, but often exploited by the bounded model checker used by Heimdahl et al., which always returns the shortest possible counterexamples. The solution applied in (Heimdahl et al., 2004) was to define invariants to prohibit the unwanted behavior. The other conclusion drawn by Heimdahl et al. is that lazy evaluation techniques interfere with condition based coverage criteria. In general, this shows that a suitable model has a crucial influence on the result of model checker based test case generation.

In a subsequent experiment, Devaraj et al. (2005) showed that while coverage criteria are suitable for analysis purposes, there are problems when using them for test case generation. The identified problem is that coverage of a trap property does not guarantee that a considered part of the specification is actually executed by the resulting test case. As a solution, auxiliary variables that indicate whether some part of the specification was executed are introduced in the model and the trap properties.

An evaluation of three specification coverage criteria was performed by Abdurazik et al. (2000). Test cases were automatically created for a given example model using full predicate coverage, transition pair coverage and specification mutation coverage, which is the mutation approach based on reflected properties presented in Section 3.7. The resulting test cases of one criterion were evaluated with regard to the other criteria. No subsumption relations could be detected between the considered criteria. The results showed that full predicate and specification mutation test suites are more closely related to each other than to transition pair test suites.

3.9.5 Regression Testing

Regression testing is applied when previously tested code is changed, in order to ensure that no new errors are introduced. A straightforward approach to regression testing is retest all. Here, all available test cases are executed, which might be very time consuming and expensive. Therefore, selective retesting tries to select only a subset of the available test cases that is sufficient to detect faults introduced with the changes. Traditionally, only changes in the source code are considered. Changes in the specification, however, also require regression testing.

Xu et al. (2004) present an approach to regression testing with model checkers, where a special comparator creates properties from two versions of a model, the original version and a changed version. Each such property covers one test path that has been changed; in (Xu et al., 2004) special variables in the model are introduced to identify such paths, and the properties are implemented as assertions on these variables. It is suggested that comparators can act on different levels of abstraction. The resulting properties are verified on the changed model. Only those properties that result in counterexamples need to be considered for regression testing; properties that hold on the changed model represent test paths that do not need to be executed.

Chapter 13 evaluates different techniques to create regression test cases and update existing test suites when a model is changed. In a first step, an existing test suite is analyzed to determine which test cases are still valid for the changed model and which are not. This can be done with a model checker, by either symbolically executing the test case on the new model and comparing outputs of the test case and the model, or by extracting change properties from the two versions of the model and then checking the test case models against these properties. A change property represents the change in the transition system, such that a test case that takes a different transition violates the property.

Once obsolete test cases have been identified, there are different approaches to create new test cases. The first approach is to determine the behavior of the changed model with regard to the input of obsolete test cases; that is, the test cases are adapted to the new model. Alternatively, sets of trap properties are generated from the old and changed model, and then the difference in these sets is calculated and used to generate test cases. Finally, property and model rewriting is proposed in Chapter 13, which lets all test cases focus on the model changes. Every resulting test case contains at least one changed transition. Test cases created with any of these approaches can be used as regression tests, and when combined with those test cases from a previous test suite that are still valid, form a new test suite.

These methods are evaluated in Chapter 13, and it is shown that there is a trade-off between the time consumed for generating and executing new test cases and the overall quality of a test suite after several changes. Consequently, the preferred method depends on the available resources and quality requirements.

3.9.6 Fault Visibility

The state of a model is defined by the values of its variables. These variables can be input or output to the system, but they can also be internal variables. Internal variables might not be directly observable. Therefore, it is important that a test case ends with some observable event or change, such that a verdict is possible. For example, trap properties for structural coverage criteria or trap properties created by the reflection approach explicitly consider the transition relations of internal variables. Such trap properties usually end with a transition where an internal variable takes on an interesting value. If this value cannot be observed, the test case does not fulfill its intended purpose.

Okun et al. (2003b) propose two approaches that explicitly create counterexamples that result in an observable change in an output variable. In-line expansion repeatedly replaces internal variables in the properties used for test case generation with their transition relations until there are no more internal variables left. This process can be applied to any kind of trap property. As an alternative, Okun et al. (2003b) propose state machine duplication, described in Section 3.7.4.

Hong et al. (2002) assume the existence of a special predicate exit, which is true in any exit state (e.g., in the final vertex of a data flow graph). It is also suggested that the initial state can be used as an exit state, such that test cases can be seamlessly executed. To make use of this predicate, trap properties have to include a reference to this predicate, which can for example be done with an implication:

. . . → □ ¬exit

When using a requirement property based approach, it depends on the requirement properties whether test cases are fully observable or not: If the properties include internal variables, then there is a chance that observability is not always achieved.

3.9.7 Nondeterminism

Although a model checker can verify nondeterministic models, trace counterexamples represent only one possible choice for each nondeterministic branch. Consequently, counterexamples can only serve as test cases when using deterministic models. If a trace generated from a nondeterministic model is executed as a test case on an implementation, the test case might falsely detect a fault if the implementation makes different choices at the nondeterministic branching points. The correct verdict in this case would be inconclusive, as neither pass nor fail can be concluded.

A simple solution that is applicable as long as there is not too much nondeterminism is presented in Chapter 8. Here, the model is extended with an indicator variable that shows whether a nondeterministic transition was chosen or not. When interpreting counterexamples as test cases, the execution framework has to check whether this flag is true when the implementation does not behave as expected. If the flag is set, then an inconclusive verdict is given, or else a fault is detected. It is also straightforward to extend test cases to a tree-like structure, where there are different branches for different nondeterministic choices. In Chapter 8, this is done in a lazy fashion; that is, whenever an inconclusive verdict occurs during test case execution, the last known deterministic state is used as the initial state of the system, and a new counterexample is derived. This new counterexample serves as a new branch in the old test case. The applicability of such an approach depends on the amount of nondeterminism. Furthermore, if the implementation is nondeterministic itself, then applicability decreases. This means that nondeterminism as a means of underspecification or implementation choice can be handled to a certain degree, but not asynchronous, distributed systems.

Boroday et al. (2007) distinguish between weak and strong test cases. A test case t for model S and mutant M is weak if M can produce an output sequence in response to t that S cannot produce. A test case t is strong if every output sequence of M in response to t differs from the corresponding sequence of S. Under fairness assumptions, a weak test case can reveal any fault if repeatedly executed; the repeated execution requires a reliable reset transition. The method presented in Chapter 8 could be used to distinguish weak test cases from strong test cases: a (linear) test case is weak if it contains an inconclusive verdict.

Boroday et al. (2007) describe methods to derive test cases for nondeterministic specifications, based on the state machine duplication approach (see Section 3.7.4). For the simpler case when the specification is deterministic and only the mutant is nondeterministic, weak test cases can be derived by the generic state machine duplication approach. For strong test cases, Boroday et al. (2007) describe a method to derive an observer from a mutant specification.


An observer Obs(M) for module M uses all outputs of M as inputs. A hidden variable found is added, and the hidden variables in M are removed. Determinization is performed by powerset construction where necessary. Except in trivial cases the observer is not input-enabled; additional sink states are added to make the module input complete. In these sink states, the variable found is set to true. A strong test case for a nondeterministic mutant is therefore derived if the following property does not hold:

S || Obs(M) |= □ ¬found

If not only the mutant, but also the specification is nondeterministic, then weak test cases can be generated by creating an observer from the specification. A weak test case for a nondeterministic specification is therefore derived if the following property does not hold:

Obs(S) || M |= □ ¬found

Because of its complexity, Boroday et al. (2007) do not consider generation of strong test cases, but describe a method to detect strong test cases.

3.10 Further Uses of Model Checkers in Software Testing

3.10.1 Testing with Software Model Checkers

All techniques presented so far assume the existence of a formal model of the system under test that can be used to generate test cases. In practice, the creation of a sufficient model is one of the most difficult steps in model based testing. Sometimes the development process is supported by tools or specification languages which can serve as a basis for creating a verifiable model. Conversion between different formalisms is usually automatable; for example, Black (2000) considers the generation of models from high level specifications. Often, however, a model has to be generated manually, which is difficult and error prone. Therefore there is interest in applying model checking to source code directly, removing the need for a model. This is commonly referred to as software model checking.

There are two different paths that have been taken to apply model checking to software: Several tools create models in the input languages of popular model checkers from the source code. Other tools implement their own model checking procedures. For example, Bandera (Corbett et al., 2000) creates SMV or Promela models from Java code. The first version of Java PathFinder (Havelund, 1999) also converted Java programs to Promela models. Further tools built on top of existing model checkers include JCAT (DeMartini et al., 1999); Park et al. (2000) convert Java code to SAL models; and Bogor (Robby et al., 2003) tries to provide a language independent software model checking framework.

The second version of Java PathFinder (Visser et al., 2000) includes a specialized virtual machine that interprets bytecode. Verisoft (Godefroid, 1997) executes C program code in order to avoid the need to represent program states and statements. CMC (Musuvathi et al., 2002) additionally stores information about visited states. Bounded model checking is used to verify C code in CBMC (Clarke et al., 2004). SLAM (Ball et al., 2001) converts C code to Boolean abstractions that are model checked. Blast (Henzinger et al., 2003) uses counterexample guided abstraction refinement to verify C code.

Testing with software model checkers has been considered by Beyer et al. (2004), who use the model checker Blast to create test cases from C code. Test cases can be generated with regard to predicates (i.e., safety properties), and locations in the source code. Consequently, it is possible to derive test cases for code-based coverage criteria. Visser et al. (2004) use the Java PathFinder model checker to derive test cases in a similar manner. A source translation for symbolic execution with model checkers is presented by Sarfraz Khurshid and Visser (2003). This has been implemented as an extension to Java PathFinder, and can be used to generate test cases (Visser et al., 2006).

These findings show that test case generation with software model checkers is possible in theory, but scalability is not the only issue in practice. While test case generation with operational specifications creates test sequences that include the expected output, test cases created directly from the source code do not solve the oracle problem. Therefore, this is an area where further research will be needed.

3.10.2 Coverage of Timed Automata

Timed automata are automata that include special variables called clocks, which contain information about time and can be used in guard conditions, etc. Uppaal (Larsen et al., 1997) is a popular model checker based on timed automata. Hessel et al. (2004) proposed test case generation using Uppaal. In this approach, a special timed variant of CTL is used to formalize test purposes or coverage criteria. The generation of test cases with either test purposes or properties created for coverage criteria as described in Section 3.5 is proposed. This method is of particular interest for timed systems, because Uppaal supports generation of not only shortest but also quickest traces.

3.10.3 Combinatorial Testing

A new application of model checkers for test case generation was proposed by Kuhn and Okun (2006). Combinatorial testing tries to provide a high level of coverage of a system's input domain with a small number of test cases. The number of possible input combinations is usually extremely high; for example, a system with 20 inputs with 10 values each allows a total of 10^20 different combinations. If, however, only a limited number of combinations is selected, then this number is reduced significantly. Considering all possible pairs of inputs for the above example results in 190 different input pairs with 100 possible input value combinations for each pair, resulting in 19,000 different test cases, which is substantially smaller than the overall number of different combinations.

The underlying idea of combinatorial testing can be best explained using a small example. Consider a system with 3 Boolean input variables v1, v2, and v3. All 2-way combinations would be v1 v2, v1 v3, and v2 v3. Only for these variable combinations all possible input value combinations have to be tested, leading to 12 test cases instead of 16. In general there are (n choose k) different combinations when we have n input variables and we want to compute all k-way combinations. For each of these combinations all possible input value tuples are generated. Variables which are not in a k-way combination are assigned a value, which can be a random value or a value which allows the program under test to be executed.
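The enumeration of k-way test inputs can be sketched in Python as follows. This is an illustration, not a tool from the literature; the variables outside the selected combination simply receive a fixed default value here instead of a random one.

from itertools import combinations, product

def k_way_tests(domains, k, default=None):
    """Enumerate naive k-way combination tests.

    domains -- mapping from input variable name to its list of possible values
    k       -- strength of the combinations (k=2 gives pairwise testing)
    """
    variables = sorted(domains)
    tests = []
    for combo in combinations(variables, k):          # (n choose k) variable tuples
        for values in product(*(domains[v] for v in combo)):
            # Fill the remaining inputs with a default value, then fix the combination.
            test = {v: (default if default is not None else domains[v][0])
                    for v in variables}
            test.update(dict(zip(combo, values)))
            tests.append(test)
    return tests

# The example from the text: 3 Boolean inputs, 2-way combinations,
# 3 variable pairs with 4 value combinations each = 12 tests.
doms = {'v1': [False, True], 'v2': [False, True], 'v3': [False, True]}
assert len(k_way_tests(doms, 2)) == 12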

In practice, 3 to 6-way combinations are also used in addition to pairs and provide good results. The empirical results in (Kuhn and Okun, 2006) showed a fault detection rate of 100 percent for 5-way combinations. The underlying assumption of combinatorial testing is that only smaller subsets of input variables are responsible for certain outputs. Hence, only those inputs must be considered when testing a specific functionality.

In (Kuhn and Okun, 2006), a model checker is used to derive test cases for t-way coverage. Given assertions of the form AG (P → AX R) and t-way variable combinations v1 ∧ v2 ∧ ... ∧ vt, where each vi is a condition comprising a variable and an assigned value, three different types of trap properties are proposed:

AG (v1 ∧ v2 ∧ ... ∧ vt ∧ P → AX ¬R)

AG (v1 ∧ v2 ∧ ... ∧ vt → AX ¬1)

AG (v1 ∧ v2 ∧ ... ∧ vt → AX ¬R)

The first property might be trivially true if t is large, because P together with v1, . . . , vt evaluates to false, which makes the implication true. For this reason, (Kuhn and Okun, 2006) propose using the second property, which simply forces a single step to be taken (¬1 is always false). Alternatively, the final property removes the condition P to avoid trivially true cases.

3.10.4 Testing Composite Webservices with Model Checkers

Web services are a recently popular mechanism to allow interaction of heterogeneous systems via the internet. A particular strength of such techniques is that different services can be composed to form new, more complex services. There are several different languages that can be used to describe web services and aid the automatic composition.

Composed web services result in complex behaviors, where the components can be distributed across networks and implemented with different tools and systems. Therefore, verification of composed web service models as well as testing of composed web service implementations is very important. The use of model checkers to verify web service composition has been proposed by several researchers. A combined approach of verification and testing based on model checkers has been proposed by Huan et al. (2005). In this approach, OWL-S (Web Ontology Language for Web Services) specifications are translated to a C-like language, which is verified with the model checker Blast. The model checker is also used to create witnesses that can be used as test cases, following the approach presented by Beyer et al. (2004).

Garcia-Fanjul et al. (2006) translate web service compositions specified with BPEL into Promela, the language of the model checker SPIN. Then, trap properties are used to create transition coverage test suites.


3.10.5 Adaptive Model Checking

Adaptive model checking (Groce et al., 2002) is an advanced combination of model checking and testing. Verification is performed on an incomplete model. If a counterexample is found, then the counterexample is executed as a test case on an actual implementation. If the system passes the test case, then a property violation has been found. If the test case does not pass, then the model is refined according to the actual execution result. This is also related to black-box checking (Peled, 2003), where no initial model is assumed at all.

3.10.6 On-the-fly Testing with Model Checkers

All approaches to test case generation presented so far in this survey create test cases offline; that is, the test cases are first generated from a model, and only once this generation process is done are they executed. An alternative approach is to interleave test case generation and execution; this is known as online or on-the-fly testing. On-the-fly testing has several advantages over offline testing: it can be continued for a very long time, it reduces the state explosion problem because only a limited part of the state space needs to be considered at a time, and nondeterminism is handled naturally.

Examples of on-the-fly testing tools based on model checkers are T-Uppaal (Larsen et al., 2004), based on Uppaal, and the work presented by de Vries and Tretmans (2000), who use the model checker SPIN. These tools are not based on model checking algorithms, but rather use the modeling and simulation features of the underlying model checkers.

3.11 Available Tools for Test Case Generation

Although testing with model checkers has been considered by several research groups, much of the work was done on research prototypes that were never released to the public. This section considers the test case generation tools that are publicly available.

There is an online demonstration tool (Black, 1998) for mutation based test case generation with model checkers based on the work by Ammann et al. (1998). While it is only possible to generate test cases for the cruise control example application used in (Ammann et al., 1998), the tool helps to illustrate the steps involved in the process.

Since version 3.0, SAL (de Moura et al., 2004) includes the tool SAL-ATG (Hamon et al., 2005), which allows test case generation with SAL. As SAL provides a Scheme-based environment, this tool offers many possibilities for customization and extension. SAL-ATG does not use trap properties, but requires that the model is extended with trap variables, which are true only when a test goal is reached. This basically allows similar coverage goals as with regular trap properties, although it is slightly more complicated to rewrite the model than to simply provide properties. In general, most coverage criteria that can be expressed as trap properties can also be encoded in the model. Figure 3.20 shows the car controller example model from Figure 3.5 as a SAL model. The variables t0–t5 are not actually part of the specification, but are trap variables for simple transition coverage. SAL-ATG can use these trap variables to create a simple transition coverage test suite. For this, the list of goals has to be specified as listed in Figure 3.21.


car_control: CONTEXT =
BEGIN
  speed: TYPE = {stop, slow, fast};

  main: MODULE =
  BEGIN
    INPUT
      accelerate, brake : BOOLEAN
    OUTPUT
      velocity: speed
    LOCAL
      t0, t1, t2, t3, t4, t5: BOOLEAN
    INITIALIZATION
      velocity = stop;
      t0 = FALSE; t1 = FALSE; t2 = FALSE;
      t3 = FALSE; t4 = FALSE; t5 = FALSE;
    TRANSITION
    [
      accelerate = TRUE AND brake = FALSE AND velocity = stop -->
        velocity' = slow; t0' = TRUE;
      []
      accelerate = TRUE AND brake = FALSE AND velocity = slow -->
        velocity' = fast; t1' = TRUE;
      []
      accelerate = FALSE AND brake = FALSE AND velocity = fast -->
        velocity' = slow; t2' = TRUE;
      []
      accelerate = FALSE AND brake = FALSE AND velocity = slow -->
        velocity' = stop; t3' = TRUE;
      []
      brake = TRUE -->
        velocity' = stop; t4' = TRUE;
      []
      ELSE -->
        t5' = TRUE;
    ]
  END;
END

Figure 3.20: Simple car controller as SAL specification with trap variables for simple transition coverage.


(define goal-list '("t0" "t1" "t2" "t3" "t4" "t5"))

Figure 3.21: Test goal list for SAL-ATG.

Assuming this list of goals is saved in a file called car_control_goals.scm and the model is saved in a file called car_control.sal, SAL-ATG is started with the following command:

sal-atg car_control main car_control_goals.scm

SAL-ATG will try to find test cases such that every trap variable is true at some point. There are several options for the test case generation; for details see (Hamon et al., 2005).

ATGT (ASM Tests Generation Tool) (Gargantini, 2007a) is a Java-based tool that implements the concepts presented in (Gargantini and Riccobene, 2001; Gargantini et al., 2003; Gargantini, 2007b) to automatically create test cases for ASM specifications. It offers a graphical user interface and uses the model checker SPIN (Holzmann, 1997). The tool automatically creates trap properties, and illustrates them graphically. As an example to get started, the ASM model listed in Figure 3.12 can be used with ATGT, and a version of a popular safety injection system model is available on the tool's website (Gargantini, 2007a).

3.12 Formal Testing with Model Checkers

In Section 2.5, a formal framework for conformance testing was introduced. This framework serves to show that a test case generation algorithm is sound (i.e., creates only valid test cases), and theoretically complete, if an exhaustive test suite were generated. Note that an exhaustive test suite may contain unsound test cases, depending on the considered conformance relation.

Tretmans (1996) defines soundness, exhaustiveness, and completeness with regard to some conformance relation imp, which expresses whether a given implementation I conforms to its specification M, denoted as I imp M. It can be assumed that a test case execution function and a verdict function can determine whether a test case passed or failed. Consequently, I passes T expresses that all test cases in test suite T pass on implementation I.

Definition 24. For specification M, implementation I, implementation relation imp and test suite T:

T is complete =def I imp M iff I passes T

T is sound =def I imp M implies I passes T

T is exhaustive =def I imp M if I passes T

To instantiate this framework, it is necessary to decide on model formalisms for the specification model and the assumed implementation model, and a conformance relation between these models. A formal definition of how a pass or fail verdict is derived must also be specified.


The model formalism mostly used for testing with model checkers is Kripke structures; modules provide the partitioning between input and output values. For a test hypothesis, it can be assumed that the implementation can also be modeled as a module. A test case is a linear sequence of states, where each state consists of input and output valuations; this can also be interpreted as a special kind of module. Test case, specification model and implementation model share the same set of variables (i.e., atomic propositions), but the partitioning into output and input variables is exchanged for the test case model. Let M_S = (K_S, I_S, O_S) be the specification module, M_I = (K_I, I_I, O_I) the implementation module, and M_T = (K_T, I_T, O_T) the test case module. The following holds: O_T = I_I = I_S, and I_T = O_I = O_S.

When testing with model checkers, the following test scenario is assumed: Implementation module and test case are executed in parallel. This is modeled as composition of the test case and implementation modules. Composition of modules is defined in (Boroday et al., 2007) as follows: Composition of modules M_I = (K_I, I_I, O_I) and M_T = (K_T, I_T, O_T), where no hidden variable of one module is a variable of the other (AP_I ∩ H_T = AP_T ∩ H_I = ∅), is M_I || M_T = (K_I || K_T, (I_I ∪ I_T) \ (O_I ∪ O_T), (O_I ∪ O_T)). Here, it might be necessary to rename output and hidden variables for the composition. As O_T = I_I and I_T = O_I, the composition of the two modules simplifies to: M_I || M_T = (K_I || K_T, {}, (O_I ∪ I_I)). This means that any execution of the module will use the input values provided by the test case. As the execution after the final state of a test case is not of interest, it is assumed that the test case execution framework stops execution at this point.

The composition of two Kripke structures K_I = (S_I, S_I,0, T_I, L_I) and K_T = (S_T, S_T,0, T_T, L_T) is defined as follows: K_I || K_T = (S, S_0, T, L), where:

• AP = AP_I ∪ AP_T

• S = {(s, s′) | L_I(s) ∩ AP_T = L_T(s′) ∩ AP_I}

• T = {((w, w′), (s, s′)) | (w, s) ∈ T_I ∧ (w′, s′) ∈ T_T} ∩ (S × S)

• S_0 = (S_I,0 × S_T,0) ∩ S

• L(s, s′) = L_I(s) ∪ L_T(s′) for (s, s′) ∈ S

Intuitively, the input values of the implementation are provided by the test case. At each execution cycle, the system under test receives input values for a fixed set of inputs. At the next cycle, the tester compares the output values for a fixed set of outputs with those described by the test case. If there is a deviation, then the test case fails, otherwise the tester continues with the next input values. If there are no more input values left in the test case, then it has passed.

Definition 25 (Observation). An observation obs(t, I) resulting from executing test case t on an implementation I is any trace of length length(t) of M_I || t, where M_I is the model representing the implementation I.

A verdict function verdict(t, obs(t, I)) ∈ {pass, fail} decides whether a test case detected a fault or not. Test case execution fails if the observed behavior does not match the expected behavior. As we are only considering deterministic models, the verdict can be either pass or fail. If nondeterministic models are considered, an inconclusive verdict is also possible.

Definition 26.

verdict(t, obs(t, I)) = pass iff ∀ 0 ≤ i ≤ length(t) : Out(t)_i = Out(obs(t, I))_i

verdict(t, obs(t, I)) = fail iff ∃ i : Out(t)_i ≠ Out(obs(t, I))_i
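A direct Python rendering of this verdict function might look as follows; it is purely illustrative, and assumes that the test case and the observation are lists of states, each mapping variable names to values, with outputs listing the output variables.

def verdict(test_case, observation, outputs):
    """Return 'pass' if the observed outputs match the expected ones in every step."""
    for expected, observed in zip(test_case, observation):
        if any(expected[o] != observed[o] for o in outputs):
            return 'fail'
    return 'pass'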


Consequently we can define what it means for an implementation to pass a test suite:

Definition 27.

I passes T =def ∀ t ∈ T : verdict(t, obs(t, I)) = pass

Conformance testing describes the process of examining an implementation by means of testingwhether it complies with its functional specification. The specification describes what the system isintended to do. The decision of whether an implementation is correct with regard to its specificationis defined by an implementation relation. The test hypothesis allows to define a formal relationbetween implementation and specification. Many different such relations have been defined in thepast. For example, trace equivalence requires that the set of possible traces of implementation andspecification are equal. Trace preorder requires that all possible traces of the implementation areincluded in the set of specification traces. Testing preorder adds the requirement that observeddeadlocks are also part of the specification. The relation conf restricts the considered traces to onlythose of the specification; i.e., it requires that an implementation actually does what it is supposedto do, and not that it does not do what it is not allowed to do (Tretmans, 1996).

We first consider trace preorder, which requires that all possible traces of a model have to be tracesof the implementation:

Definition 28 (Trace preorder). For implementation module Mi and specification module Ms, tracepreorder is defined as:

Mi ≤tr Ms =de f traces(Mi) ⊆ traces(Ms)

Trace preorder requires that all traces of Mi have to be valid traces according to Ms. Research onformal methods and testing theories has resulted in many different conformance relations. Whencomparing the different relations, trace preorder is relatively weak; this means, that an implemen-tation that conforms according to trace preorder might not conform according to another, strongerrelation. In addition, showing exhaustiveness with regard to trace preorder would require the testingalgorithm to generate all traces of the implementation, whereas we focus on traces of the specifica-tion.

The strictest relations are bisimulation and observation preorder, which are too restrictive to be used in practice. Many conformance relations are defined in the domain of labelled transition systems, and require that deadlocks can be observed. In contrast, Kripke structures do not support deadlocks (it is possible to simulate deadlocks with infinite loops). As the implementation is expected to run in an infinite execution cycle, any deadlock is a failure. These restrictions have an effect on normally stricter relations.

Next we consider testing preorder. The usual definition of testing preorder requires two functions obsd(t,M) and obsr(t,M), where obsd(t,M) denotes the set of traces that can be observed when testing model M with test case t, such that the trace leads to a deadlock. obsr(t,M) denotes the set of traces that can be performed by M when executing t without leading to a deadlock. In our test scenario, t itself is an element of traces(Ms), as model checking selects one such trace as a counterexample. A popular relation that uses deadlocks is testing preorder:


Definition 29 (Testing preorder). For systems Mi and Ms, Mi ≤te Ms iff for every test case t:

Mi ≤te Ms =def

obsd(t,Mi) ⊆ obsd(t,Ms)∧

obsr(t,Mi) ⊆ obsr(t,Ms)

The first condition requires that whenever a test case t may deadlock on an implementation, it may also deadlock on the specification. The second condition requires that all traces without deadlock that can be observed when executing test case t also have to be valid traces in the specification.

As no trace may deadlock in the specification, obsd(t,Ms) = {}; this means that any obsd(t,Mi) ≠ {} is erroneous. Furthermore, as we are only considering deterministic models, obsr(t,M) reduces to our observation function in Definition 25, which means that there is only one observation for a model. Consequently, testing preorder basically reduces to trace preorder, as it can only require observed traces to be traces of the specification, in addition to forbidding deadlocks. Therefore we consider another popular relation, conf:

Definition 30 (Relation conf). For systems Mi and Ms, Mi conf Ms iff for every test case t:

Mi conf Ms =def

obsd(t,Mi) ∩ traces(Ms) ⊆ obsd(t,Ms)∧

obsr(t,Mi) ∩ traces(Ms) ⊆ obsr(t,Ms)

The relation conf checks whether the implementation has no unspecified deadlocks for traces in the specification, and whether observed traces without deadlocks are traces of the specification. Again, the first condition results in fail verdicts for any implementation that has a deadlock, because obsd(t,Ms) = {}. We can therefore redefine conf for our scenario:

Definition 31 (Relation conf for deterministic modules).

Mi conf Ms =def ∀t ∈ traces(Ms) : obs(t,Mi) = obs(t,Ms)

Recall that testing is sound if any implementation that is detected as incorrect is really incorrect, and that testing is exhaustive if any incorrect implementation can be detected by testing. It is easy to see that testing with model checkers is sound and complete with regard to conf:

Claim 1. Test case generation with model checkers is sound with regard to conf.

Proof. To prove soundness by contradiction, assume a correct implementation Mi, and a test case t such that implementation Mi fails t. As the implementation is correct, it does not deadlock. According to Definition 30, this means that obsr(t,Mi) ∩ traces(Ms) ⊆ obsr(t,Ms) is not fulfilled. This further means that there exists a trace t′, such that t′ ∈ obsr(t,Mi) ∧ t′ ∈ traces(Ms), but t′ ∉ obsr(t,Ms). As Ms is deterministic, obsr(r,Ms) ⊆ traces(Ms) for any r. Test case t is created by checking Ms against a trap property φ, such that Ms ⊭ φ and t is created as a counterexample. By Definition 9, the counterexample t ∈ traces(Ms). Consequently t′ cannot exist. Therefore, test cases generated with model checkers are sound. □


Claim 2. Test case generation with model checkers is exhaustive with regard to conf.

Proof. Assume an incorrect implementation Mi, such that there exists no test case t which fails on Mi. According to Definition 30, Mi is incorrect if there exists a test case t such that obsr(t,Mi) ∩ traces(Ms) ⊈ obsr(t,Ms), or there exists some obsd(t,Mi) ∩ traces(Ms) ≠ {}. There are no deadlocks in traces(Ms), therefore the latter would be detected. As obsr(t,Ms) ⊆ traces(Ms), traces(Ms) is the possible set of test cases. As there exists no test case which fails on Mi, all test cases in traces(Ms) pass. This, however, is a contradiction, because that means Mi is correct.

Test case generation is exhaustive if all traces in traces(Ms) can be generated. Let T(Ms) denote the infinite set of temporal logic properties that can be formulated over the input and output variables of Ms. A trace t = 〈s0, s1, . . . , sn〉 can be represented as a property

φ = □ (∧L(s0) → ○ (∧LO(s1) ∧ ∧LI(s1) → ○ (∧LI(s2) . . . → ○ ∧LO(sn))))

By negating the final ∧LO(sn) we get a trap property φ′ that is violated only by t or any trace that subsumes t; i.e., Ms, t ⊭ φ′. Both φ and φ′ are contained in T(Ms). If a trap property φ′ is created for every t ∈ traces(Ms) and then used to generate test cases, or if all properties in T(Ms) are model checked, then the resulting test suite contains all traces in traces(Ms). Therefore, testing with model checkers can be exhaustive. □
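To make this construction concrete, consider a purely illustrative sketch for a module with Boolean input x and output y and the two-state trace t = 〈(x = 1, y = 0), (x = 1, y = 1)〉. A corresponding trap property in NuSMV syntax could look roughly as follows; the variable names and the exact bracketing are assumptions for illustration only:

-- Hypothetical trap property for the trace t = <(x=1, y=0), (x=1, y=1)>:
-- the final output valuation of t is negated, so the property is violated
-- by traces that behave like t.
LTLSPEC G ((x = 1 & y = 0) -> X !(y = 1))

A counterexample to this property is a trace that at some point is in a state with x = 1 and y = 0 and produces y = 1 in the next step, i.e., t itself or a trace that subsumes it.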

Claim 3. Test case generation with model checkers is complete with regard to conf.

Proof. According to Definition 24, a test case generation method is complete iff it is sound and exhaustive. According to Claims 1 and 2, testing with model checkers is sound and exhaustive, therefore it is complete. □

The work of Tretmans (Tretmans, 1996) is based on labelled transition systems (LTS), and further partitions actions into inputs and outputs. Several relations on such input-output systems have been defined, ioco being the most popular one. We now show that under the restrictions applied in our testing scenario, ioco gains no advantages compared to conf.

States contain no information in labelled transition systems; all information is represented by transi-tion labels. In contrast, states and not transitions contain information in Kripke structures. The twoformalisms are examples of the two possible dual viewpoints on system behavior: LTSs are eventbased models, while Kripke structures are state based. Tretmans further partitions actions into inputand output actions, and then defines different conformance relations on the resulting Input OutputLabelled Transition Systems (IOLTS).

Definition 32 (Labelled Transition System). A labelled transition system ML is a 4-tuple ML =

(S , s0, L,T ), where S is a non-empty set of states, s0 is the initial state, L is a countable set oflabels, and T ⊆ S × (L ∪ {τ}) × S is the transition relation. τ represents an unobservable, internalaction.

Definition 33 (Input-Output Labelled Transition System). An input-output labelled transition sys-tem MI is a labelled transition system in which the set of actions L is partitioned into input actionsLI and output actions LO such that L = LI ∪ LO.

Claim 4. Every Kripke structure K = (S, s0,T, L) based on a set of variables V has an equivalent LTS representation.


Proof. For Kripke structure K = (S , S 0,T, L) with variables V , the equivalent LTS is ML = (S ′, S ′0, L′,T ′).

Let v be a valuation for the variables in V , andV the set of all possible valuations.

• S ′ = S

• L′ = {v | v ∈ V}

• T ′ = {(s, v, s′) | s, s′ ∈ S ∧ (s, s′) ∈ T ∧ v ∈ V ∧ v ⊆ L(s′)}

• S ′0 = S 0

□

Each valuation represents a state in the Kripke structure; these valuations are simply used as transi-tion labels. As there is an output value at every state of any execution of a Kripke structure, thereare no unobservable τ actions. The LTS is deterministic; that is, for every state there are no twooutgoing transitions with the same label. The input/output distinction of modules allows a mappingof modules to IOLTS.

(a) Example 1 FSM.

MODULE main
VAR
  x, y : boolean;
ASSIGN
  next(y) := case
    x = 0 : 0;
    x = 1 : 1;
  esac;

(b) Example 1 SMV Code.

Figure 3.22: Simple example model as FSM and the corresponding SMV model.

Claim 5. Every module M = (K, I,O), where K is a Kripke structure K = (S, s0,T, L), and I and O are sets of input and output variables, respectively, has an equivalent IOLTS representation.

Proof. For module M = (K, I,O) with K = (S, s0,T, L), the equivalent IOLTS is MI = (S′, s′0, L′,T′). Let vi be a valuation for the input variables I, vo a valuation for the output variables O, and VI and VO the sets of all possible input and output valuations.

If a deterministic model is mapped, the initial states in the Kripke structure differ only in input valuations. Let Next(s) denote the set of states that is reachable from state s in the Kripke structure. vi = In(s) returns the input valuation vi at state s, and vo = Out(s) returns the output valuation vo at state s.

• LI = {vi | vi ∈ VI}

• LO = {vo | vo ∈ VO}

• L′ = LI ∪ LO

• S L = {Next(s) | s ∈ S }


(a) Kripke structure/module. (b) Equivalent IOLTS.

Figure 3.23: Example 1 as Kripke structure and IOLTS.

• S I = {s′ | sL, s′L ∈ S L ∧ ∃(s, s′) ∈ T ∧ s ∈ sL ∧ s′ ∈ s′L}

• S ′ = S L ∪ S I

• TI = {(sL, vi, s′) | sL ∈ S L ∧ s′ ∈ S I ∧ vi = In(s)∀s ∈ sL}

• TO = {(s′, vo, sL) | sL ∈ S L ∧ s′ ∈ S I ∧ vo = Out(s)∀s ∈ sL}

• T ′ = TI ∪ TO

As the model is deterministic, |{sL | sL ∈ S L ∧ s ∈ sL}| = 1, hence each state of the Kripke structure only occurs in one set of directly reachable states. □

Each possible valuation of the input variables of a Kripke structure represents one possible input action, and each possible valuation of the output variables represents an output action. Again there are no unobservable actions. Even if some internal, unobservable values are changed, the unchanged output values will still be used as output action. As the Kripke structure is deterministic, the IOLTS is also deterministic. Tretmans assumes that the implementation can be modeled as an input-output transition system, which is an IOLTS where each state is input enabled; this assumption is called the test hypothesis. Furthermore, test cases are also IOLTS. The conversion of a linear test case to an IOLTS is straightforward: the IOLTS is linear and consists of an alternation of input and output actions. Here, input and output are again viewed from the perspective opposite to that of the implementation. If the final output action of the implementation is correct, a pass state is reached; any deviation from the test case results in a fail state.

Figure 3.22(a) shows a simple example FSM, which consists of input x, which can take on the values 0 and 1, and produces output y, which can also take on the values 0 and 1. Figure 3.22(b) illustrates how this is modeled in SMV. The resulting Kripke structure is shown in Figure 3.23(a), and the IOLTS resulting from the conversion is shown in Figure 3.23(b). A second example with a different transition relation is illustrated in Figures 3.24 and 3.25.


(a) Example 2 FSM.

MODULE main
VAR
  x, y : boolean;
ASSIGN
  next(y) := case
    x = 0 & y = 0 : 0;
    x = 0 & y = 1 : 1;
    x = 1 & y = 0 : 1;
    x = 1 & y = 1 : 0;
  esac;

(b) Example 2 SMV Code.

Figure 3.24: Example model 2 as FSM and the corresponding SMV model.

Definition 34 (Relation ioco).

i ioco s =def ∀σ ∈ Straces(s) : out(i after σ) ⊆ out(s after σ)

• p after σ is the set of states in which transition system p can be after executing trace σ.

• out(p after σ) is the set of output actions which can occur after execution of trace σ on system p.

• Straces(s) denotes the suspension traces of s, which are traces that include the special action δ, which denotes quiescence (i.e., no output action can occur).

The special action δ denotes the absence of an output action. The IOLTS created from the Kripke structure contains no transitions labelled with δ, therefore Straces(s) = traces(s). The use of Straces instead of traces is the difference between ioco and ioconf, consequently this first restriction reduces ioco to ioconf. As the model is deterministic, there is always only one possible output action out(p after σ). Furthermore, i after σ is non-empty iff s after σ is non-empty, which in turn is only the case if σ ∈ traces(s) and σ ∈ traces(i). Consequently, ioco and ioconf reduce to conf.

3.13 Summary

This chapter has given an overview of testing with model checkers, together with all the necessary background information and references. Summarizing,

• A model checker verifies whether a given model fulfills a given property.

• A model checker returns a counterexample upon detecting a property violation.

• Counterexamples can be used as test cases.

• Test cases can be created using coverage criteria and mutation, with or without the use of requirement properties.


(a) Kripke structure/module.

(b) Equivalent IOLTS.

Figure 3.25: Example 2 as Kripke structure and IOLTS.

• Model checkers can not only create test cases, but also analyze test cases with regard to coverage and mutation analysis.

• Testing with model checkers is sound and complete.


Chapter 4

Using CTL for Model Checker Based Test Case Generation

This chapter is based on (Wijesekera et al., 2007).

4.1 Introduction

The previous chapters discussed the details of testing with model checkers. As a property specification language, linear temporal logic (LTL) was used. While many currently available model checkers use LTL, there are also many that use computation tree logic (CTL). This chapter examines the differences that result from the use of CTL instead of LTL, and discusses application areas that require CTL.

Semantically, linear time logics such as LTL assume time to be linear. Counterexamples to trap properties expressed in LTL are linear sequences, and a linear sequence can be directly interpreted as a linear test case.

Many current model checkers such as NuSMV (Cimatti et al., 1999) or SAL (de Moura et al., 2004) support CTL model checking in addition to or instead of LTL model checking. In CTL model checking, special algorithms are applied to construct linear traces from the initial state to the violating state (Clarke et al., 1995). However, only certain restricted subsets of temporal logics such as ACTLdet or LIN always result in linear counterexamples (Clarke and Veith, 2004). When using full CTL, linear counterexamples are not always sufficient as evidence for property violation or satisfaction.

Not all test requirements can be expressed with LTL. Sometimes, branching time logics such as CTL are necessary. For example, even though the test requirements of the modified condition/decision coverage criterion (MC/DC) (Chilenski and Miller, 1994) can be expressed as CTL (Clarke and Emerson, 1982) properties, linear counterexamples are not sufficient to create corresponding test cases, because MC/DC requires pairs of related test cases. In the example of MC/DC there are several approaches to work around this problem, as described in Section 3.5. For example, external


tools are used in addition to the model checker (Rayadurgam and Heimdahl, 2001b), or the model is altered so that linear counterexamples can be used (Rayadurgam and Heimdahl, 2003).

This chapter revisits the idea of how to relate test cases to counterexamples. In general, a counterexample should result in as many test cases as are demanded by the underlying test criterion.

4.2 Evidence Graphs - Counterexamples and Witnesses for CTL

Current model checkers create linear sequences as counterexamples. In explicit model checking,the search path that leads to a violating state is returned as a counterexample. In symbolic modelchecking, special algorithms are applied to construct linear traces from the initial state to the violat-ing state (Clarke et al., 1995). In general, only certain restricted subsets of temporal logics such asACT Ldet or LIN always result in linear counterexamples (Clarke and Veith, 2004). When using fullCTL, linear counterexamples are not always sufficient as evidence for property violation, and linearwitnesses cannot always show satisfaction. For example, consider the formula (EX x ∧ EX ¬x). Awitness to this formula needs a state, where there is a successor state where x is true, and a successorstate where ¬x is true. Clearly, these two states cannot be identical. Therefore, a different approachto the traditional linear counterexample/witness generation is necessary.

A possible solution is to make a clear distinction between the state that violates or satisfies a CTLstate formula, and the evidence that shows why this is so. The model checking problem (Clarkeet al., 2001, p. 35) is to find the set of states that satisfy a given formula in a given Kripke structure.That is, given a formula φ and a Kripke structure K = (S , S 0,R, L), the objective is to determinethe set S φ = {s ∈ S : K, s |= φ}. If the initial state set S 0 is included in this set, then the Kripkestructure satisfies the formula φ. If a state that violates φ is found, then a model checker returns acounterexample. Strictly speaking, a state formula is violated in a state. Consequently, we definecounterexamples and witnesses as follows:

Definition 35 (Counterexamples and Witnesses). Let K = (S, S0,R, L) be a Kripke structure and φ a CTL formula. A state s ∈ S is said to be:

• A counterexample for φ in K if K, s 6|= φ.

• A witness for φ in K if K, s |= φ.

Definition 35 shows that counterexamples and witnesses are complementary concepts: A witnessstate s for formula φ is a counterexample state for ¬φ. Thus, a solution to the model checkingproblem for the formula φ in K finds all witnesses for φ in K and a solution to the model checkingproblem for ¬φ in K finds all counterexamples for φ in K. According to this definition, any solutionto the model checking problem for φ/¬φ only finds the set of states in which φ is true or false, butwithout demonstrating any evidence for the claim.

Reconsidering the example formula (EX x ∧ EX ¬x), a witness to this formula is any state where there exists a successor state where x is true, and a different successor state where ¬x is true. The evidence for the validity of this formula is given as a structure that contains a path from the witness state to a state where x is true and a state where ¬x is true. We refer to this structure as an evidence graph.
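In NuSMV syntax, such a property could be phrased for a hypothetical model with a Boolean variable x as shown below (a sketch for illustration only; x is an assumed variable name). A model checker can decide whether the property holds in the initial states, but no single linear trace can serve as evidence, since the two required successor states necessarily lie on different branches:

-- Hypothetical CTL property requiring branching evidence:
-- some successor satisfies x and another successor satisfies !x.
CTLSPEC (EX x) & (EX !x)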

To simplify the definition of evidence graphs, we assume that formulas are given in negation normal form.


Definition 36 (Negation Normal Form). A CTL formula in which the temporal operators do not appear within the scope of a negation is said to be in negation normal form (NNF).

Lemma 4.2.1. For every CTL formula there exists a provably equivalent formula in negation normal form.

Proof: The following identities can be used to push the negation (¬) inside the scope of the temporal operators of any given CTL formula:

¬EX φ ↔ AX ¬φ
¬EF φ ↔ AG ¬φ
¬EG φ ↔ AF ¬φ
¬E(φ1 U φ2) ↔ A(¬φ1 R ¬φ2)
¬E(φ1 R φ2) ↔ A(¬φ1 U ¬φ2)
¬AX φ ↔ EX ¬φ
¬AF φ ↔ EG ¬φ
¬AG φ ↔ EF ¬φ
¬A(φ1 U φ2) ↔ E(¬φ2 U (¬φ1 ∧ ¬φ2)) ∨ EG ¬φ2
¬A(φ1 R φ2) ↔ E(¬φ1 U ¬φ2)

The truth of the above equivalences is shown in (Clarke et al., 2001). These equivalences, together with the distributivity of negation (¬) over conjunction (∧) and disjunction (∨), can be used to recursively show that every CTL formula is equivalent to one in negation normal form.
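As a small worked example (not taken from the thesis), consider the formula ¬AG (x → AF y). Using the identities above, the negation is pushed inside step by step: ¬AG (x → AF y) ↔ EF ¬(x → AF y) ↔ EF (x ∧ ¬AF y) ↔ EF (x ∧ EG ¬y), which is in negation normal form.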

In order to define evidence graphs, a Kripke structure is interpreted as a rooted directed graph GK = (S,R). The vertices of this graph are labeled by L. Informally, an evidence graph is a subgraph of GK that contains all information necessary as evidence for a given property.

For example, if the formula is of the form ψ = AG AX φ, then the evidence graph is the subgraph G of GK consisting of all paths of length two starting from every state in S ψ = {s ∈ S : K, s |= AG AX φ}. Note that K, π1 |= φ for any path π where π0 ∈ S ψ.

For two graphs G1 = (S1,R1) and G2 = (S2,R2) we use G1 ⊆ G2 as a shorthand for S1 ⊆ S2 ∧ R1 ⊆ R2. The set of paths that start in state s, denoted as Paths(s), always refers to the paths of the Kripke structure K.

Definition 37 (Evidence Graph). The evidence graph of a CTL formula φ given in negation normal form and Kripke structure K = (S, S0,R, L) is a rooted directed graph G = (S′,R′), where S′ ⊆ S, R′ ⊆ R, and vertices are labeled by L. The root node of the evidence graph is referred to as the head (notation Head(G)). The set of all evidence graphs for φ in K is denoted as EVD(φ,K). G is inductively defined on the structure of φ:

1. For any CTL formula φ without temporal operators, EVD(φ,K) is the collection of single-state subgraphs obtained from the set of states {s ∈ S : K, s |= φ} of K. For any such single-state subgraph G ∈ EVD(φ,K) we define the head of G, Head(G), to be the only state in G.


2. An evidence graph G of the formula φ1 ∧ φ2, where φ1 or φ2 contains logical operators, satisfies the following properties:

• G ⊆ GK.
• K, Head(G) |= φ1 ∧ φ2.
• Head(G) = Head(t) = Head(t′) for evidence graphs t ∈ EVD(φ1,K) and t′ ∈ EVD(φ2,K).
• t, t′ ⊆ G.

3. An evidence graph G of the formula φ1 ∨ φ2, where φ1 or φ2 contains logical operators, satisfies the following properties:

• G ⊆ GK.
• K, Head(G) |= φ1 ∨ φ2.
• Head(G) = Head(t) for an evidence graph t ∈ EVD(φ1,K) or t ∈ EVD(φ2,K).
• t ⊆ G.

4. An evidence graph G of the formula AX φ satisfies the following properties:

• G ⊆ GK.
• K, Head(G) |= AX φ.
• For all paths π = 〈π0, π1〉 ∈ Paths(Head(G)) of length 2: π1 = Head(t) for some evidence graph t ⊆ G, and t ∈ EVD(φ,K).

5. An evidence graph G of the formula AF φ satisfies the following properties:

• G ⊆ GK.
• K, Head(G) |= AF φ.
• All paths π = 〈π0, . . . , πn〉 ∈ Paths(Head(G)) satisfy the following properties:
  – π ⊆ G.
  – There is a state πi on π such that πi = Head(ti) for some evidence graph ti ∈ EVD(φ,K), where ti ⊆ G.

6. An evidence graph G of the formula AG φ satisfies the following properties:

• G ⊆ GK.
• K, Head(G) |= AG φ.
• All paths π ∈ Paths(Head(G)) satisfy the following properties:
  – π ⊆ G.
  – Every state πi of π is Head(ti) for some ti ∈ EVD(φ,K) and ti ⊆ G.

7. An evidence graph G of the formula A φ1 U φ2 satisfies the following properties:

• G ⊆ GK.
• K, Head(G) |= A φ1 U φ2.
• ∀π ∈ Paths(Head(G)) : π ⊆ G.
• There is a state πj on π satisfying the following properties:
  – For all i < j, πi is Head(ti) for some ti ∈ EVD(φ1,K) with ti ⊆ G.
  – For all i ≥ j, πi is Head(ti) for some ti ∈ EVD(φ2,K) with ti ⊆ G.

8. An evidence graph G of the formula A φ1 R φ2 satisfies the following properties:

• G ⊆ GK.


• K, Head(G) |= A φ1 R φ2.
• ∀π ∈ Paths(Head(G)) : π ⊆ G.
• In addition, one of the following conditions must hold:
  – There is a state πj on π such that for all i < j, πi is Head(ti) for some ti ∈ EVD(φ1,K) and ti ⊆ G, and for all i ≥ j, πi is Head(ti) for some ti ∈ EVD(φ2,K) and ti ⊆ G, or
  – For all states πi, πi = Head(ti) for some evidence graph ti ∈ EVD(φ2,K) with ti ⊆ G.

9. An evidence graph G of the formula EX φ satisfies the following properties:

• G ⊆ GK.
• K, Head(G) |= EX φ.
• There is a path π = 〈π0, π1〉 ∈ Paths(Head(G)), such that π1 = Head(t) for some evidence graph t ⊆ G, and t ∈ EVD(φ,K). In addition π ⊆ G.

10. An evidence graph G of the formula EF φ satisfies the following properties:

• G ⊆ GK.
• K, Head(G) |= EF φ.
• There is a path π ∈ Paths(Head(G)) and a state πi on π, such that πi is Head(t) for some evidence graph t ∈ EVD(φ,K), and t ⊆ G. In addition π ⊆ G.

11. An evidence graph G of the formula EG φ satisfies the following properties:

• G ⊆ GK.
• K, Head(G) |= EG φ.
• There is a path π ∈ Paths(Head(G)) such that π ⊆ G.
• Every state πi of π is Head(ti) for some ti ∈ EVD(φ,K) with ti ⊆ G.

12. An evidence graph G of the formula E φ1 U φ2 satisfies the following properties:

• G ⊆ GK.
• K, Head(G) |= E φ1 U φ2.
• There is a path π ∈ Paths(Head(G)), π ⊆ G, and a state πj on π satisfying the following properties:
  – For all i < j, πi is Head(ti) for some ti ∈ EVD(φ1,K) with ti ⊆ G.
  – For all i ≥ j, πi is Head(ti) for some ti ∈ EVD(φ2,K) with ti ⊆ G.

13. An evidence graph G of the formula E φ1 R φ2 satisfies the following properties:

• G ⊆ GK.
• K, Head(G) |= E φ1 R φ2.
• There is a path π ∈ Paths(Head(G)), π ⊆ G, where one of the following conditions holds:
  – There is a state πj on π such that (1) for all i < j, πi is Head(ti) for some ti ∈ EVD(φ1,K) and ti ⊆ G, and (2) for all i ≥ j, πi is Head(t) for some t ∈ EVD(φ2,K) with t ⊆ G, or
  – For all states πi, πi = Head(ti) for some evidence graph ti ∈ EVD(φ2,K) with ti ⊆ G.


(a) AX φ (b) EX φ (c) AG φ (d) EG φ

(e) AF φ (f) EF φ

(g) Aφ1U φ2 (h) Eφ1U φ2

(i) Aφ1R φ2 (j) Eφ1R φ2

Figure 4.1: Evidence graphs for CTL connectives.


Figure 4.1 shows example evidence graphs for all basic CTL operators. In the graphs, filled vertices denote states of interest, for example, where φ1 or φ2 hold, and white vertices are arbitrary states, where the propositions of interest might not hold. Only the colored states are part of the evidence graphs. Possible alternative execution paths that are not part of the evidence graphs are indicated in light grey.

We can show that every witness state to a CTL formula is at the head of an evidence graph, and vice versa. Therefore, to create an evidence graph as a counterexample to a CTL formula φ, it is possible to use an evidence graph for the negation normal form of ¬φ.

Theorem 4.2.2. A CTL formula φ is satisfiable in a state s ∈ S of the Kripke structure K = (S, S0,R, L) if and only if there is an evidence graph t ∈ EVD(φ,K) such that s = Head(t).

Proof: This proof is an inductive argument based on the syntax of the negation normal form of the formula φ. Due to the repetitive nature of the proof, we only provide one case.

Suppose the given formula is EF φ, and there is a state s ∈ S of the Kripke structure K = (S, S0,R, L) where K, s |= EF φ. Then there is a path π ⊆ S satisfying the following properties.

1. s is π0.

2. There is some state πi on π such that K, πi |= φ.

Therefore, by the inductive assumption, there is an evidence graph Gφ ∈ EVD(φ,K) where πi is Head(Gφ). Hence, the following graph G now is an evidence graph for EF φ.

1. The states of G are the states of Gφ and the states on π.

2. The accessibility relation of G is the restriction of R to the states of G.

3. The head of G (i.e., Head(G)) is s.

Conversely, suppose that there is an evidence graph GEF φ ∈ EVD(EF φ,K) such that s is Head(GEF φ). Then, by definition, K, s |= EF φ.

According to Theorem 4.2.2, if there is a counterexample to a CTL formula φ, then the evidence graph of ¬φ (i.e., any graph G ∈ EVD(¬φ,K)) provides the evidence supporting the failure of φ. Test cases are usually finite and linear, while there is no guarantee that evidence graphs are either linear or finite. In order to relate evidence graphs and counterexamples we now address these two issues.

4.2.1 Non-linearity of Evidence Graphs

Although evidence graphs can be linear, it is not always possible to obtain linear evidence graphs. For example, consider again the formula (EX φ) ∧ (EX ¬φ). Any state s in a Kripke structure K that satisfies (EX φ) ∧ (EX ¬φ) must have two states s1 and s2 such that the following conditions hold:

• (s, s1) ∈ R and (s, s2) ∈ R.

• K, s1 |= φ and K, s2 |= ¬φ, so that K, s |= (EX φ) ∧ (EX ¬φ). s1 and s2 cannot be the same state, as K, s1 |= φ and K, s2 |= ¬φ.


Figure 4.2: Evidence graph for (EX φ) ∧ (EX ¬φ).

Figure 4.3: A non-linear evidence graph for AF φ.

Figure 4.2 shows the evidence graph of (EX φ) ∧ (EX ¬φ). As a further example, AF φ may result in branching as shown in Figure 4.3.

The following theorem states sufficient, but not necessary, conditions for an evidence graph to be linear.

Theorem 4.2.3 (Linear Evidence Graphs). If a negation normal form of a CTL formula contains no ∧, A, or G, then it has a linear evidence graph.

Proof: This proof is a consequence of Definition 37. Notice that in the recursive procedure for constructing an evidence graph, a branching is possible only when there is room for Head(G) to be connected to more than one other state. This is the case for the connectives ∧, AF, AG and EG.

4.2.2 Finiteness of Paths in Evidence Graphs

It is not always possible to use finite sequences as counterexamples. For example, consider the formula AF x; property violation can be shown with an execution path where x is never true. Execution paths of Kripke structures are usually infinite, because there are no dedicated deadlock or accept states. Consequently, an infinite path is necessary to illustrate that x is never true.

It has been shown (Clarke et al., 2001) that in such cases lasso-shaped sequences can be used. As described in Chapter 3, a sequence p is lasso-shaped if it has the form p := p1(p2)ω, where p1 and p2 are finite sequences. This also applies to evidence graphs.
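As a schematic illustration (not tied to a particular model), a counterexample to AF x in a model where x can remain false forever could take the lasso-shaped form p := 〈s0〉(〈s1, s2〉)ω with x false in s0, s1, and s2: the finite prefix p1 = 〈s0〉 leads into the cycle p2 = 〈s1, s2〉, which is repeated indefinitely and shows that x is never reached.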

Lemma 4.2.4. Suppose π is a path in a Kripke structure K and φ a CTL formula satisfying the condition that K, πi |= φ for any state πi on π. Then, there are finite paths α and β such that π is α : βω.

Proof: The proof of this lemma is contained in the proof of Lemma 1, page 36 of Clarke et al. (2001). It is based on the fact that there are finitely many strongly connected components (in the


sense of graphs) of states; the last such component that the path enters is never left again, and hence constitutes β. The finite path α is the path from the first state of π to the beginning state of β.

Note that β can be an empty sequence. When presenting an evidence graph for a formula, we can eliminate potentially infinite paths by curtailing them after the first cycle. Therefore, using the above lemma, we can obtain the following theorem.

Theorem 4.2.5 (Every infinite path ends in a loop). Every evidence graph G ∈ EVD(φ,K) of a CTL formula φ contains an evidence graph G′ for the same formula φ such that: (1) Head(G) = Head(G′), and (2) every path in G′ is the concatenation of two parts, a finite branch followed by a finite cycle.

Proof: The essence of this proof uses Lemma 4.2.4: any search for an infinite path π on which a formula φ must be true can be curtailed to α; β, where both α and β are finite and β = β0, . . . , βn satisfies β0 = βn (i.e., β is a cycle). This curtailing of repetitive cycles can be done in the recursive steps for EG, EU, ER, AG, AU and AR in the construction of the evidence graph.

Therefore, every evidence graph can be curtailed to one that has only finitely long paths or infinite paths that are the concatenation of a finite part and a finite cycle. This kind of infinite path is, for example, considered by Tan et al. (2004), who also discuss the curtailing. Hence, given a CTL formula, any counterexample to that formula in any given Kripke structure can and must be given by an evidence graph whose paths end up in cycles.

4.3 Relating Test Coverage To Covering Evidence Graphs

A common method to generate test cases with model checkers is to represent test coverage criteria as trap properties, as described in Chapter 3. So far, LTL was used to specify trap properties. The use of CTL instead of LTL has no impact as long as only linear sequences are considered as counterexamples. In this section, we consider test case generation using evidence graphs for CTL trap properties. Every counterexample to a CTL formula can be witnessed by the evidence graph of the negation normal form of its negation. Any set of paths covering an evidence graph of φ satisfies the test requirement.

Definition 38 (Coverage of Evidence Graphs). A set of paths Π is said to cover the evidence graph Gφ of the CTL formula φ, if the following conditions are satisfied:

1. π0 = Head(Gφ) for all paths π ∈ Π.

2. Gφ ⊆ ∪Π.

Here, ∪Π denotes the graph obtained by combining the paths in Π.

Consequently, the following steps are taken in order to generate a test suite (i.e., a set of test cases) satisfying a given coverage criterion:

1. A set of test requirements is generated in CTL.

2. For each test requirement, the negation normal form of the negation is model checked, resulting in an evidence graph.


3. For each test requirement, a set of test cases that covers a corresponding evidence graph is created.

4. The test cases possibly need a prefix that leads from the initial state to the head of the evidence graph. Such a prefix can be calculated using traditional counterexample generation techniques.

4.3.1 MC/DC Coverage

We now illustrate this approach using the well-known test coverage criterion modified condition/decision coverage (MC/DC) (Chilenski and Miller, 1994), which is popular in safety-relevant environments. A test suite satisfies MC/DC if it guarantees that:

• Every entry and exit point of a program has been invoked at least once.

• Every condition in a decision in the program has taken on all possible outcomes at least once.

• Each condition has been shown to independently affect the decision.

The final requirement is problematic for model checker based testing. In order to show that each condition independently affects the decision, the values of all other conditions are fixed, and the considered condition is varied. Consequently, each condition requires a pair of test cases, but traditionally there is no way to create pairs of counterexamples with a model checker. Section 3.5 described workarounds to this problem, which involved model rewriting and the use of external tools, such as constraint solvers.

It is possible to create pairs of test cases for MC/DC using evidence graphs. Suppose that a decision C depends on propositions {a1, . . . , an}. MC/DC seeks only those variations of the parameters {a1, . . . , an} that alter the truth value of C. Let Ai = {b1, . . . , bn} be a valuation of the propositions in C such that C is true if ai = bi for all ai in C. Assume a further valuation Āi, such that C is false, where for all j ≠ i the j-th value in Āi equals the j-th value in Ai, and bi in Ai is replaced with ¬bi in Āi.

To show that a proposition ai independently affects a decision C it is necessary to create a pair of test cases, such that in one test case C evaluates to true using Ai and in the other to false using Āi; that is, all values in Ai except bi are identical in both test cases.

Assuming a valid Ai, such that there is a corresponding Āi, the first case is achieved with φ1 = EF (C ∧ (∧ j≠i aj = bj) ∧ ai = bi), and the second case is achieved with φ2 = EF (¬C ∧ (∧ j≠i aj = bj) ∧ ¬(ai = bi)). Here, the values bi are those contained in Ai. Consequently, a pair of test cases that shows that a proposition independently affects the decision C is represented by the evidence graph of φ1 ∧ φ2:

EF (C ∧ (∧ j≠i aj = bj) ∧ ai = bi) ∧ EF (¬C ∧ (∧ j≠i aj = bj) ∧ ai ≠ bi)


Not all possible variations of Ai can lead to MC/DC pairs. A naive solution to finding a suitable Ai is to simply create all 2^n possible variations of the n elements in Ai, and then create a disjunction of (φ1 ∧ φ2) for each version of Ai. A more sophisticated approach is to use Boolean constraint satisfaction to determine those Ai where ai really determines the value of C, and then to combine only these Ai in a disjunction.

Figure 4.4: Three different possible evidence graphs for MC/DC with regard to x in C = x∧ (y∨ z).

In order to illustrate this, consider an example where the decision C is given as C := x ∧ (y ∨ z). When considering the proposition x in this example, there are three different valid variations of Ai such that MC/DC pairs of test cases for x in C are generated. MC/DC for x can be stated as follows:

EF (C ∧ x ∧ ¬y ∧ z) ∧ EF (¬C ∧ ¬x ∧ ¬y ∧ z)

∨ EF (C ∧ x ∧ y ∧ ¬z) ∧ EF (¬C ∧ ¬x ∧ y ∧ ¬z)

∨ EF (C ∧ x ∧ y ∧ z) ∧ EF (¬C ∧ ¬x ∧ y ∧ z)

Figure 4.4 illustrates the evidence graphs resulting from this example. Any of these evidence graphs provides a pair of test cases that cover x in C according to MC/DC. In order to create a complete test suite, similar test requirements have to be formulated for all propositions in all decisions.
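In NuSMV syntax, the corresponding trap property (the negated test requirement) could be written roughly as follows; this is a sketch for illustration only, assuming a model with Boolean variables x, y, and z. Since C = x ∧ (y ∨ z), the conjuncts C and ¬C are already implied by the listed valuations and are therefore omitted:

-- Hypothetical trap property for MC/DC on x in C = x & (y | z).
-- A violation of this trap property corresponds to satisfaction of the MC/DC
-- requirement for x; the evidence is not a single linear trace but an
-- evidence graph providing the required pair of test cases.
CTLSPEC !( (EF (x & !y &  z) & EF (!x & !y &  z))
         | (EF (x &  y & !z) & EF (!x &  y & !z))
         | (EF (x &  y &  z) & EF (!x &  y &  z)) )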

4.3.2 Using Evidence Graphs to Test Dangerous Traces

As a further example of test requirements that benefit from evidence graphs, we discuss dangerous traces as described in Section 3.6.3. A trace is dangerous to a safety property if it describes a possible property violation. Ammann et al. (2001) distinguish between dangerous traces where the property violation always immediately occurs after the mutation (AX dangerous), sometimes occurs after the mutation (EX dangerous), and always/sometimes eventually occurs (AF/EF dangerous). There are failing and passing dangerous traces, where the former contain the property violation while the latter take a correct transition instead of the mutated transition.

Test requirements for all types of dangerous traces are given in Section 3.6.3. For example, AX dangerous failing/passing traces are created with the following two test requirements (which have to be negated to result in trap properties):


EF (original ∧ EX (¬original) ∧ AX (¬original→ ¬P))

EF (original ∧ EX (original) ∧ EX (¬original) ∧ AX (¬original→ ¬P))

The test requirement for the passing test relies on the implementation of the model checker to pick the correct trace such that transitions from K are chosen. However, transitions from K′ would also result in valid counterexamples to the property, while not representing a valid passing test case. Dangerous traces can therefore benefit from the use of evidence graphs. The following property is sufficient to create an evidence graph that consists of both a passing test and a failing test:

(original ∧ AX (¬original→ ¬P) ∧ AX (original→ P))

A resulting evidence graph is depicted in Figure 4.5.

Figure 4.5: An AX dangerous evidence graph for property P.
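As an illustrative sketch in NuSMV syntax, with original standing for a model-specific expression stating that the original (unmutated) transition is taken and P for the safety property under consideration (both placeholders), this requirement could be negated into a trap property following the pattern of Section 3.6.3:

-- Hypothetical trap property; "original" and "P" are placeholders for
-- model-specific expressions. The evidence for violating this trap property
-- is an evidence graph containing both the passing and the failing branch
-- shown in Figure 4.5.
CTLSPEC !(EF (original & AX (!original -> !P) & AX (original -> P)))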

Similar properties can be defined for the other types of dangerous traces:

• AF dangerous traces:

(original ∧ EX (original) ∧ AX (¬original→ AF (¬P)))

• EX dangerous traces:

(original ∧ EX (original) ∧ EX (¬original ∧ ¬P))

• EF dangerous traces (see Figure 4.6):

(original ∧ EX (original) ∧ EX (¬original→ EF (¬P)))

Figure 4.6: An EF dangerous evidence graph for property P.


4.4 Summary

In this chapter, we have discussed why a one-to-one relationship between test cases and counterexamples is not satisfactory with respect to some test criteria. The issue is that the temporal formulas needed to directly express the test requirements associated with some criteria result in counterexamples that require multiple traces for adequate coverage. We provided the notion of an evidence graph for a temporal formula, as well as mappings from traces, i.e., test cases, to evidence graphs. It turns out that the operators ∧, A and G are responsible for cases where multiple traces are needed. We show that finitely many finite traces can cover any evidence graph. We also describe how to embed test cases in a finite state model so that extant test sets can be evaluated for satisfaction of a given test criterion. These results not only serve the testing community with a more solid foundation for generating test sets with model checkers, but they also suggest to the model checking community how counterexample generators could be made more useful.

In the context of counterexample creation, the issue that not all property violations can be illustrated with linear traces has recently received increased attention. For example, Clarke et al. (2002) describe an algorithm to derive tree-like counterexamples for ACTL formulas (CTL formulas that only contain the A path quantifier). For this subset, a tree-like counterexample can be interpreted as an evidence graph together with a path leading from an initial state to the head of the graph. Meolic et al. (2004) take a different approach and derive counterexample or witness automata for a fragment of action computation tree logic.

This chapter has considered only test case generation using CTL trap properties with nonlinear evidence graphs. Evidence graphs, however, also have an impact on coverage analysis. In general, test cases are converted to verifiable models for analysis with a model checker. A test criterion that requires more than one test case to be covered cannot be analyzed with such an approach. This issue is considered in the following chapter.


Chapter 5

Test Suite Analysis with Test Suite Models

5.1 Introduction

Model checkers, originally intended as formal verification tools, have recently been used to both generate and analyze test cases. Full automation is possible with model checkers, and different concrete techniques allow variations of the results.

When using model checkers for analysis purposes, each test case is converted to a separate model, which can then be verified with regard to certain properties (Ammann and Black, 1999b). This approach is straightforward, but incurs an overhead by calling the model checker on each test case model.

When looking for fulfilled test requirements, verifying each test case separately only works as long as each test requirement can be fulfilled by a single test case. Modified condition/decision coverage (MC/DC), for example, requires pairs of related test cases. In such a case it is not sufficient to view each test case separately.

In this chapter, a technique to analyze extant test suites with a model checker is proposed, which merges a set of test cases into a combined model. This combined model can then be used for efficient test suite analysis.

5.2 Test Suite Analysis with Model Checkers

Model checkers can be used to analyze existing test cases. Ammann and Black (1999b) presented an approach to convert test cases into constrained finite state machines (CFSM). A special state counter variable is introduced and sequentially increased, while the values of all other variables depend only on the value of the state counter. When described in the input language of a model checker, this results in a verifiable model. The details of this approach are given in Section 3.8.


Figure 5.1 shows a NuSMV model representing the test case t := 〈(x = 1, y = 0), (x = 0, y = 1), (x = 1, y = 1)〉. The state counter variable is State. Special care has to be taken because values do not change once the end of a test case is reached. Properties have to be rewritten so that this does not cause unwanted counterexamples (Ammann and Black, 1999b). The test case model can be used for coverage analysis and for mutation analysis.

MODULE main
VAR
  x: boolean;
  y: boolean;
  State: 0..2;

ASSIGN
  init(x) := 1;
  next(x) := case
    State = 0 : 0;
    State = 1 : 1;
    1 : x;
  esac;

  init(y) := 0;
  next(y) := case
    State = 0 : 1;
    State = 1 : 1;
    1 : y;
  esac;

  init(State) := 0;
  next(State) := case
    State < 2 : State + 1;
    1 : State;
  esac;

Figure 5.1: Test case t := 〈(x = 1, y = 0), (x = 0, y = 1), (x = 1, y = 1)〉 as NuSMV model.
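As a small illustration of coverage analysis against this model (a sketch; the specification below is a hypothetical trap property claiming that the valuation x = 1, y = 1 is never reached), the following line could be added to the model of Figure 5.1. Since the test case reaches that valuation in its last step, the model checker reports a counterexample, and the corresponding coverage item counts as covered by this test case:

-- Hypothetical trap property: claims that (x=1, y=1) is never reached.
-- The test case model of Figure 5.1 violates it, so the item is covered.
LTLSPEC G !(x = 1 & y = 1)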

5.3 Merging Test Cases to a Model

The conversion and analysis of each test case separately incurs a certain overhead that is not necessary. In addition, some test requirements cannot be covered by a single test case, although they can be expressed as single branching time temporal logic properties (Wijesekera et al., 2007). Therefore, we propose to merge all test cases of a given test suite into a combined automaton, and then to apply analysis to this automaton.

The combined automaton for T consists of all the transitions that are contained in the test cases. If at any state an input is received for which there exists no transition in the test suite, then the automaton enters a special unresolved state su, which it cannot leave once entered.

Definition 39 (Test Suite Model). A test suite model KT for test suite T for Kripke structure K = (S, S0,T, L) is itself a Kripke structure KT = (SK, S0,K, TK, LK), where:

• SK = {ti | 0 ≤ i ≤ length(t) : 〈. . . , ti, . . . 〉 = t ∧ t ∈ T } ∪ {su}

• S0,K = {t0 | 〈t0, . . . 〉 = t ∧ t ∈ T }

• TK = {(ti, tj) | 〈. . . ti, tj . . . 〉 ∈ T } ∪ (SK × {su})

• LK : SK → 2^AP

Here, su denotes the unresolved state.

We illustrate this concept using the language of the model checker NuSMV. The model creation process is similar to the test case model creation described in Section 5.2. The variable declaration


in the NuSMV model for KT is identical to that for K, except that it includes an additional Boolean variable unresolved. Technically, this means that there is not only one single unresolved state su, but many states where unresolved is true. In practice, this is easier to handle.

Then, the transition relation is defined for each variable. This is done with a separate case entry for each possible transition, where the condition requires that unresolved is not true, and the input and state variables take on the values of the considered transition. If an unknown transition is taken, the variables can either be set nondeterministically, or remain unchanged as in Figure 5.2. As only resolved states are relevant for analysis purposes, the choice does not matter.

The variable unresolved is initialized with false. For each known transition, and if unresolved is not true, unresolved is set to false. In all other cases, unresolved is set to true. An example of the result of this process is illustrated in Figure 5.2.

MODULE main

VAR
  unresolved: boolean;
  y: boolean;
  x: boolean;

ASSIGN
  init(y) := 0;
  next(y) := case
    y = 0 & x = 0 : 0;
    y = 0 & x = 1 : 1;
    y = 1 & x = 0 : 1;
    1 : y;
  esac;

  init(x) := {1, 0};
  next(x) := case
    y = 0 & x = 0 : 1;
    y = 0 & x = 1 : 0;
    y = 1 & x = 0 : 1;
    1 : x;
  esac;

  init(unresolved) := 0;
  next(unresolved) := case
    y = 1 & x = 1 : 1;
    !unresolved & y = 0 & x = 1 : 0;
    !unresolved & y = 0 & x = 0 : 0;
    !unresolved & y = 1 & x = 0 : 0;
    1 : 1;
  esac;

Figure 5.2: Test suite t1 := 〈(x = 1, y = 0), (x = 0, y = 1), (x = 1, y = 1)〉, t2 := 〈(x = 0, y = 0), (x = 1, y = 0), (x = 0, y = 1)〉 as model.

A model checker can be used to verify whether properties hold on the test suite model.

5.4 Test Suite Model Analysis

The test suite model can be used to analyze a complete test suite at once. The common methods of test suite analysis are to determine a coverage value or a mutation score.

Coverage analysis with model checkers uses trap properties. The coverage value is commonly determined by checking test cases represented as models against trap properties. An item is covered if its trap property results in a counterexample. The mutation score can be calculated similarly, by creating special mutant properties, which we will also refer to as trap properties in this section. Coverage analysis and mutation score calculation are also possible using a test suite model.


When checking trap properties against test case models, it is important to make sure that a property violation is not caused by the end of the test case. This is done with the constraint rewriting rules (Definition 22) given in Section 3.8, ensuring that a property can only evaluate to false if the value of the state counter is smaller than the test case length.

In the test suite model, the state number is not explicit as in test case models; instead the model enters an unresolved state whenever an unknown transition is taken. Therefore, trap properties have to be rewritten using an implication on the unresolved variable, such that a formula is true whenever unresolved is true. The variable unresolved can be used as ξ = unresolved in the constraint rewriting described in Definition 22 in Section 3.8. The constraint rewriting ensures that a trap property only results in a counterexample where all transitions are resolved.
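As an illustrative sketch only (the precise rewriting is given by Definition 22 and is not repeated here), a simple reachability trap property for the test suite model of Figure 5.2 could be guarded by the unresolved variable as follows, so that only resolved states can serve as evidence for a violation:

-- Hypothetical rewritten trap property: the valuation (x=0, y=1) is claimed
-- to be unreachable; the guard makes unresolved states satisfy the property
-- trivially, so only resolved transitions can produce a counterexample.
LTLSPEC G (unresolved | !(x = 0 & y = 1))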

Test coverage for a given criterion C represented as a set of trap properties P is calculated by checking all trap properties p ∈ P against the test suite model KT. The model-checking tool only has to go through the model encoding phase once, in contrast to once for each test case with the traditional method. A mutation score can be determined similarly.

In contrast to test case models, test suite models can also be checked against test requirements that require more than one test case. Such requirements are expressed using branching time temporal logics, where counterexamples are not necessarily linear (Wijesekera et al., 2007) but tree-like, such that several test cases are necessary to cover all branches. As an example, consider modified condition/decision coverage (MC/DC), which requires pairs of related test cases.

5.5 Test Suite Optimization

A task that is sometimes necessary in practice is the optimization of a given test suite with regard to certain criteria. For example, when the costs of running a complete test suite against the software repeatedly during regression testing are high, test suite reduction (Harrold et al., 1993) is applied. The aim of test suite reduction is to find a subset of the test cases that still fulfills the test requirements. Another example is the optimization of test suites with regard to redundancy (Fraser and Wotawa, 2007b).

A test suite model can be used to restructure test suites or to create new test suites. For this, normal test case generation techniques based on model checkers can be used, but instead of a complete functional model of the system under test, the test suite model is used. For example, to extract a test suite that satisfies a given coverage criterion, the test suite model is checked against the set of trap properties representing the new coverage criterion. When doing so, it is important to ensure that only resolved transitions are used. Therefore, the constraint rewriting described in Section 11.3 has to be applied to all trap properties.

5.6 Evaluation

As a proof of concept, Table 5.1 shows the results of a simple set of experiments conducted on five different test suites created for a windscreen wiper controller application provided by Magna Steyr. The test suites are derived using a mutation-based approach, each test suite representing a different mutation operator. The creation times for test case and test suite models are omitted,


because there is not much difference. Coverage values for different criteria and mutation scores reported by the two methods are identical (but not listed here), which shows that the proposed method is valid. To illustrate that the test suite model can be used for test case generation, a simple transition coverage (Rayadurgam and Heimdahl, 2001b) test suite is calculated for each example test suite. In Table 5.1, the size of the resulting simple transition coverage test suite is given in the subset column.

Table 5.1: Evaluation of example test suites.

  Suite   Size   Transition Subset   Analysis Time (Test Cases)   Analysis Time (Test Suite)
  1       343    28                  1m23s                        36s
  2       331    28                  1m09s                        32s
  3       202    28                  1m03s                        26s
  4       305    27                  1m06s                        28s
  5       178    27                  1m10s                        29s

5.7 Summary

In this chapter, we have described a new approach to analyze existing test suites with a model checker. Instead of creating a distinct model for each test case and then subsequently applying analysis to the resulting models, all test cases of a test suite are merged into one model. This makes it possible to analyze test suites with regard to test requirements that cannot be fulfilled by single test cases but require more complex structures. Besides, it simplifies the process of analysis. A test suite model is useful beyond coverage and mutation analysis. As an example application, test suite optimization was suggested. Further applications are conceivable, e.g., different analyses, searching for alternative paths to reduce redundancy, or the extension of existing test suites.


Chapter 6

Property Relevance

The contents of this chapter have been published in the papers (Fraser and Wotawa, 2006a) and (Fraser and Wotawa, 2007f).

6.1 Introduction

There is a subtle difference between test cases for model and specification conformance. In order todetect as many faults as possible, model conformance testing aims at maximizing confidence in theassumption that the implementation is a correct refinement of the model. On the other hand, safetyrelated software, for example, requires that safety related requirements are shown to be fulfilledwith the help of test cases. Testing for requirements conformance aims to maximize coverage ofthe requirement properties. However, no matter how high the achieved coverage is, testing canonly increase confidence but not ascertain satisfaction, unless it is done exhaustively. In contrast, asingle trace can be sufficient to show a property violation. If a property is violated by a model, thena model checker returns a counterexample that illustrates the inconsistency. This idea could also beapplied to testing. Ideally, failed test cases should not only indicate non-conformance to a model,but also suggest which properties are violated.

This chapter uses this idea and formally links test cases with requirement properties. The notion of property relevance is introduced. Under certain conditions, property violation by an implementation can be inferred from a failed relevant test case. Property relevance enables automated traceability of requirements to test cases.

Our experience has shown that even when traceability is required, the success of a test suite is often still measured with structural coverage criteria, because this is supported by many tools. We therefore present new coverage criteria that combine structural coverage and property relevance. We show how these criteria can be measured on existing test suites, and present a method to automatically create test suites that satisfy them.


6.2 Property Relevance

In general, software testing can only ascertain that a property is fulfilled by an implementation undertest (IUT) if the test suite is exhaustive. Exhaustive testing, however, is commonly not feasible dueto the potentially huge number of possible test cases. Therefore, testing has to aim at revealing asmany property violations as possible. Usually, a test case that locates a fault is used as an indicationthat the IUT is erroneous without specifying which requirement properties are violated by this fault.We now define property relevance which is a relation between a test case and a property. With thehelp of property relevance it is possible to determine whether an implementation that fails a test casealso violates a property for which the test case is relevant. This information is for example usefulwith regard to traceability, which requires that each test case is related to a requirements property.It is also helpful for the estimation of the costs and severities of potential faults. Furthermore, thecoverage and test case generation techniques presented in later sections are based on this definitionof property relevance.

6.2.1 Relevance of Negative Test Cases

A single trace can show a property violation, while only an exhaustive set of traces can prove property satisfaction. As an example, model checkers verify properties on models and return traces as counterexamples if a property violation is found. Consequently, a single negative test case can be sufficient to show property violation. If a path p violates a property P (i.e., p ⊭ P), then any model M that contains this path also violates P. We call a test case that represents such a path relevant to P.

As test cases can violate properties because they represent finite prefixes of possibly infinite execution paths, the constraint rewriting introduced in Chapter 3, Definition 22, is necessary when evaluating property relevance. We define a variant of constraint rewriting with ξ = (s < l), called the prefix transformation α(P, t), where s is the state counter variable and l is the length of test case t, i.e., l = length(t). With the aid of the prefix transformation, we can define a negative test case t− to be relevant to a property P if P′ = α(P, t−) is violated by t−, as this guarantees that the violation is not caused by the finite truncation:

Definition 40 (Relevant Negative Test Case). A negative test case t− is relevant to property P if t− ⊭ P′, where P′ = α(P, t−). The predicate relevant(t−, P) is true if test case t− is relevant to property P.

For example, in a system with input x and output y, the negative test case t− := 〈(x = 1, y = 0), (x = 1, y = 0)〉 is relevant to a property P1 := □ (x = 1 → ○ y = 1), as t− and P1 are not consistent. On the other hand, t− is not relevant to P2 := □ (¬(y = 0) → ○ y = 0) because t− and P2 are consistent.

Theorem 6.2.1. If an implementation I passes a negative test case t− relevant to property P, then I does not satisfy P. That is, if I passes t− and relevant(t−, P) holds, then I ⊭ P.

Proof by contradiction: Assume that t− is a relevant negative test case for property P and that I is a correct implementation, i.e., I ⊨ P. Assuming that I passes t−, it follows that there exists a path p in I which is equivalent to t−. Because relevant(t−, P) is true, we know that t− ⊭ α(P, t−). Hence, the assumption that I ⊨ P cannot be the case, because this is only true if all paths in I do not violate property P. This in turn cannot be the case since there exists a path p = t− which contradicts the property. ∎

Theorem 6.2.1 states that if the behavior of an implementation I does not deviate from a negative test case t−, it is known that I violates P, if t− is relevant to property P.

6.2.2 Relevance of Positive Test Cases

As shown, a negative test case detects a property violation if it is passed by an IUT. If, however, the IUT shows a different behavior than described by the negative test case, the result is inconclusive. The implementation might behave differently because it is correct. Just as well, a different fault might result in different outputs than described by the test case. As an example, consider a negative test case t− := 〈(x = 1, y = 0), (x = 1, y = 0)〉. Assume that the correct behavior for this test case would be (x = 1, y = 1) in the second state of t−. If an implementation erroneously changes to state (x = 0, y = 0), this fault would not be detected.

Positive test cases, on the other hand, can detect any fault that results in a deviation of the expected outputs. For the previous example, the positive test case t+ := 〈(x = 1, y = 0), (x = 1, y = 1)〉 would identify the fault. However, neither property satisfaction nor violation can be directly concluded from positive test cases. Only exhaustive testing allows such a verdict in general. This section considers property relevance for positive test cases.

In order to resemble the definition of negative property relevance as closely as possible, a positive test case is defined to be relevant if an implementation can fail with the inputs provided by the test case such that a property violation results. Again, care has to be taken because the infinite trace semantics are applied to a finite trace. One possibility is to apply the prefix transformation again. Alternatively, the test case itself has to be consistent with the property. This ensures that the property violation is not caused by the finite truncation.

Definition 41 (Relevant Positive Test Case). A positive test case t+ := 〈t0, ..., tn〉 for module M = (K, I, O) is relevant to property P if K, t+ ⊨ P and there exists a path t′ := 〈t0, ..., t′n〉, such that for any 0 ≤ j < n : Inp(tj) = Inp(t′j), and K, t′ ⊭ P′, where P′ = α(P, t+). The predicate relevant(t+, P) is used to express that test case t+ is relevant to property P.

For example, in a system with input x and output y, the positive test case t+ := 〈(x = 1, y = 0), (x = 0, y = 1)〉 is relevant to property □ (x = 1 → ○ y = 1) because the second state can be changed so that the property is violated. However, t+ is not relevant to □ (x = 0 → ○ x = 0 → ○ y = 1) because no change (except of input variables) in t+ can lead to a violation.

As it is not possible to conclude property violation or satisfaction from a positive test case in general, it is necessary to analyze the execution trace created during the test case execution and then decide about property violations. Of course this is possible in general, and for example performed for runtime verification (e.g., (Havelund and Rosu, 2001b)). However, property relevance narrows down the set of possibly violated properties. If the trace violates the property, then it is known that the IUT also violates the same property. The analysis of a finite execution trace with regard to a property is computationally simple (especially compared to verification of the complete implementation). Markey and Schnoebelen (2003) analyze the problem of model-checking paths and show that there are more efficient solutions than checking Kripke structures. For example, the NASA runtime verification system Java PathExplorer (JPaX) (Havelund and Rosu, 2001b) uses LTL rewriting techniques for monitoring. However, as we assume a scenario of model checker based testing, we also describe the use of a model checker for the analysis of execution traces.

Definition 42 (Execution Trace). The trace t′ of executing test case t := 〈t0, ..., tl〉 on an IUT, conceivable as module I = (KI, I, O), is a finite sequence t′ := 〈t′0, ..., t′l〉 of states. For all 0 ≤ j ≤ l : Inp(tj) = Inp(t′j), i.e., the valuations of the input variables in the execution trace equal those in the test case. The execution trace is a prefix of a path of I: ∀ 0 ≤ j < l : (t′j, t′j+1) ∈ T.

If an implementation I passes a positive test case t+ (I ⊕ t+), then the resulting trace is t′ = t+. If the trace t′ violates the prefix transformed property P′ = α(P, t′), i.e., t′ ⊭ P′, then I ⊭ P.

Theorem 6.2.2. If an implementation I fails a positive test case t+ relevant to property P, where t+ fulfills P, and the resulting trace t′ does not satisfy P′ = α(P, t′), then I does not satisfy P. That is, if I fails t+, relevant(t+, P) holds, and t′ ⊭ P′, then I ⊭ P.

Proof by contradiction: Assume that t+ is a relevant positive test case for property P and that I is a correct implementation, i.e., I ⊨ P. Assume further that executing t+ on I results in an execution trace t′ such that t′ ⊭ α(P, t′). From the definition we know that I ⊨ P is only true if all paths in I do not violate property P. This in turn cannot be the case since there exists a path p = t′. Hence, I ⊭ P is true, which contradicts our initial assumption that I ⊨ P. ∎

While a negative test case only describes certain property violations, a positive test case is relevant to all possible violations that can occur during its execution. Therefore, a test suite of positive test cases is likely to be relevant to more properties than an equally sized test suite of negative test cases.

Intuitively, it is easier to achieve a test suite where there are relevant test cases for all properties with positive test cases than with negative test cases. However, we see the greatest use not in measuring property relevance directly, but in derived coverage criteria, as presented in the next section.

6.3 Property Relevant Coverage Analysis

The quality of a test suite is commonly evaluated by measuring the coverage with respect to certain coverage criteria. These criteria are mostly based on structural items of the source code or the model, e.g., lines of code or branches. Because structural coverage can easily be measured by observing which items are passed during execution, there are many such coverage criteria. However, there are only a few that consider requirement properties. For example, Callahan et al. (1996) define coverage of a partitioning of the execution tree based on properties. Whalen et al. (2006) adapt structural coverage criteria to LTL properties. Ammann et al. (2001) define a notion of dangerous traces that is similar to property relevance. Tan et al. (2004) apply ideas from vacuity analysis to testing.

In this section we introduce a new type of coverage criteria based on property relevance. We also describe how test suites can be evaluated with regard to property relevance and derived criteria using a model checker. The evaluation builds on model checker based analysis techniques introduced in Section 3.8.

6.3.1 Measuring Property Relevance

Property relevance of a test case can be measured using the model representation presented in Section 3.8. To determine whether a negative test case t− is relevant to a property P, t− is represented as a model and checked against P′ = α(P, t−). If the model checker returns a counterexample, this shows that t− is relevant to P.
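To make this concrete, the following sketch outlines how a negative test case could be encoded as a NuSMV test case model (in the style of Figure 6.1) and checked against the prefix-transformed property. It is only an illustration: the test case is assumed to be given as a list of state dictionaries with 0/1 values, the property is assumed to be already rewritten to P′, and run_nusmv is a hypothetical helper that invokes NuSMV on the generated model and reports whether the LTLSPEC is violated.

def test_case_to_smv(states):
    """Encode a linear test case as a NuSMV model with a State counter (cf. Figure 6.1)."""
    n = len(states)
    variables = sorted(states[0].keys())
    lines = ["MODULE main", "VAR", "  State: 0..%d;" % (n - 1)]
    lines += ["  %s: boolean;" % v for v in variables]
    lines.append("ASSIGN")
    lines.append("  init(State) := 0;")
    lines.append("  next(State) := case State < %d: State + 1; 1: State; esac;" % (n - 1))
    for v in variables:
        lines.append("  init(%s) := %d;" % (v, states[0][v]))
        steps = " ".join("State = %d: %d;" % (i - 1, states[i][v]) for i in range(1, n))
        lines.append("  next(%s) := case %s 1: %s; esac;" % (v, steps, v))
    return "\n".join(lines)

def is_relevant_negative(states, prefix_transformed_property, run_nusmv):
    """t- is relevant to P if its test case model violates P' = alpha(P, t-)."""
    model = test_case_to_smv(states) + "\nLTLSPEC %s\n" % prefix_transformed_property
    return run_nusmv(model)   # assumed to return True iff NuSMV reports a counterexample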

In contrast, measuring the relevance of positive test cases is more difficult. It is necessary to simulate the possible deviations from the test case. There are several options of how to do that. A naive approach would be to allow any random output to occur along the test case execution. This, however, would have the effect that test cases would be relevant to too many properties. We also assume that such a random output is not realistic because the implementation is close to being correct. This is also stated in the competent programmer hypothesis (Acree et al., 1979), which has been shown to be reasonable.

This still leaves several possibilities to define how the property violation can be reached. In a related approach, Ammann et al. (2001) describe a concept of dangerous traces as such traces where a mutant model can reach a property violating state. Following this idea, it would be required to evaluate each test case against a complete set of mutant models, for each requirement property. It is conceivable that the performance of such an approach in the context of property relevance would suffer from large test suites or large sets of mutants.

Yet another possibility would be to require a single erroneous transition anywhere along the trace, which directly leads to a state where a property violation can occur. This could be evaluated by a slight modification of the test case model itself and would therefore be efficiently measurable. However, restricting the property violation to only occur immediately after the changed transition makes it impossible for test cases to be relevant to properties that are violated by lasso-shaped sequences.

As a compromise between these different approaches, we choose to allow exactly one erroneous transition, which can occur anywhere along the trace, not only at the end. After this erroneous transition, the trace continues with states provided by the correct model.

In order to detect property relevance, the test case has to be symbolically executed against a version of the model that allows such an erroneous transition, which we refer to as the mutant model. This is illustrated with the example and test case t := 〈(x = 1, y = 0), (x = 0, y = 1), (x = 1, y = 1)〉. In NuSMV, models can be nested hierarchically. A model can be defined and instantiated within another model by defining it as a sub-module. The mutant model is stated as a sub-module of the test case model (MODULE model(x)). All input variables (x) in the sub-module are changed to parameters. The sub-module is instantiated in the test case model with its input variables as parameter (Mutant: model(x)). The variable mutant has a non-deterministic choice between 0 and 1 as long as it equals 0. As soon as mutant equals 1, it is assigned the value 2. Once it has this value, it is not changed anymore. In the mutant model there is a non-deterministic choice for other variables if mutant equals 1. As this is only the case at one state during any execution sequence, there can only be one transition in any trace where this non-deterministic choice is considered.

To determine whether a test case t+ is relevant to a property P, P′ has to be created such that each output variable is replaced by the output variable of the mutant model. In the example NuSMV model this is achieved by replacing a variable y with Mutant.y, which refers to the y contained in the Mutant sub-module.

MODULE model(x)
VAR
  y: boolean;
  mutant: 0..2;
ASSIGN
  init(y) := 0;
  next(y) := case
    mutant=1: {0,1};
    x=1: 1;
    1: 0;
  esac;
  init(mutant) := 0;
  next(mutant) := case
    mutant=0: {0,1};
    mutant=1: 2;
    1: 2;
  esac;

MODULE main
VAR
  x: boolean;
  y: boolean;
  State: 0..2;
  Mutant: model(x);
ASSIGN
  init(x) := 1;
  next(x) := case
    State = 0: 0;
    State = 1: 1;
    1: x;
  esac;
  init(y) := 0;
  next(y) := case
    State = 0: 1;
    State = 1: 1;
    1: y;
  esac;
  init(State) := 0;
  next(State) := case
    State < 2: State+1;
    1: State;
  esac;

Figure 6.1: Combination of test case model and mutant model.

For example, assume a property P := □ (x = 1 → ○ y = 1) stating that in every state following a state where x equals 1, y also equals 1. Accordingly, P′ is □ (x = 1 → ○ Mutant.y = 1). Finally, the model checker is challenged with the combined model and the query whether t+ satisfies P while the mutant model violates P′. If so, a counterexample is returned, showing that t+ is relevant to P.

The relevance of any set of test cases with regard to a set of properties can easily be determined with the presented methods. While property relevance is interesting and helpful if a test suite detects an error, it is not sufficient to assess the overall quality of a test suite in general. If the requirement properties thoroughly describe the system, then a property relevant test suite is likely to cover most behaviors of the system. On the other hand, a test suite could theoretically be relevant to all properties while only covering a small subset of the model. For example, a weak set of requirement properties could cause this. We therefore introduce new coverage criteria based on property relevance.

6.3.2 Relevance Based Coverage Criteria

While it is definitely reasonable to require a test suite to have test cases relevant to all properties, this by itself is not necessarily an adequate coverage criterion. Therefore, we extend structural coverage criteria such that they include property relevance:

Definition 43 (X Property Relevance Coverage). The property relevance coverage CR of a test suite TS with regard to a set of properties P and a structural coverage criterion X represented as a set of trap properties T is defined as the ratio of trap properties that are covered such that the covering test case continues relevantly to a property, to the total number of possible property/trap property combinations:

CR = 1 / (|P| ∗ |T|) · |{(p, tr) | p ∈ P ∧ tr ∈ T ∧ relevant_covered(tr, p, TS)}|

The predicate relevant_covered(a, b, TS) is true if there exists a test case t ∈ TS such that t consists of two sub-sequences t := t1, t2 where t1 covers a, i.e., t1 ⊭ a, and t2 is relevant to b, i.e., relevant(t2, b).

MODULE main
VAR
  x: boolean;
  y: boolean;
ASSIGN
  init(y) := 0;
  next(y) := case
    x = 1 : 1;
    1: 0;
  esac;

Figure 6.2: Example NuSMV model.

For example, Transition Property Relevance Coverage requires that for each transition and each property there is a test case that executes the transition and then proceeds relevant to the property. The example model in Figure 6.2 has one Boolean variable y. Assuming a property P, this would mean that four test cases are required: y = 0 → y = 0 → relevant to P, y = 1 → y = 0 → relevant to P, y = 0 → y = 1 → relevant to P, and y = 1 → y = 1 → relevant to P. This means that in the worst case, for i transitions and j properties there are i ∗ j test cases necessary to satisfy Transition Property Relevance Coverage. However, one test case can cover several transitions and be relevant to several properties. Therefore the number of test cases is significantly smaller than i ∗ j in practice.

The usual method of trap properties cannot be directly applied to this coverage criterion. For such a criterion, a corresponding trap property trP would need to show that a structural state described by another trap property tr is reached. Then, the trace violating tr would need to continue such that property P is violated, which is not always expressible with a single property. Therefore, there are two options: one is to make the evaluation a two-step process, the other is to define a weaker variant of the criterion.

In the two-step approach, first each test case is checked against the structural coverage trap properties, recording for each trap property the test case and state number necessary to violate the trap. For example, the transition 〈(x = 0, y = 1), (x = 1, y = 1)〉 is described by the trap property tr := □ (x = 0 ∧ y = 1 → ○ ¬(x = 1 ∧ y = 1)). A test case model can be directly checked against this trap property. If the test case covers tr, then the model checker returns a trace of length ltr that executes the transition.

The second step involves checking each test case whether a property violation can occur in the postfix sequence after a trap property violation. For a test case t that covers trap property tr at state number ltr, we therefore have to consider the sub-sequence between ltr and the final state of the test case with regard to property P. With P′ as the prefix transformation of P, we thus check:

□ (State > ltr ∧ mutant < 2 → P′)    (6.1)

The proposition mutant < 2 ensures that the erroneous transition is taken after ltr. If t is a positive test case, the model is combined with a modified model that can violate P, and P′ is rewritten to use the mutant model's output variables as P′′, as already presented in Section 6.3.1; the resulting trap property is:

□ (State > ltr ∧ Mutant.mutant < 2 → P′′)    (6.2)
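As an outline of this two-step evaluation, the following sketch iterates over test cases, trap properties, and requirement properties. It is an illustrative sketch only: model_check is a hypothetical helper that returns a counterexample trace or None, and as_model, relevance_trap, and combined_mutant_model stand for the model construction and property rewriting steps described above.

def property_relevance_coverage(test_cases, trap_properties, properties, model_check):
    """Two-step measurement of X Property Relevance Coverage (Definition 43), sketched."""
    covered_pairs = set()
    for t in test_cases:
        # Step 1: determine which trap properties t covers and at which state number l_tr.
        covering = {}
        for tr in trap_properties:
            trace = model_check(t.as_model(), tr)      # counterexample iff t covers tr
            if trace is not None:
                covering[tr] = len(trace)              # trace of length l_tr executes the item
        # Step 2: check whether a property violation can occur in the postfix after l_tr.
        for tr, l_tr in covering.items():
            for p in properties:
                trap = t.relevance_trap(p, l_tr)       # e.g. G (State > l_tr & Mutant.mutant < 2 -> P'')
                if model_check(t.combined_mutant_model(), trap) is not None:
                    covered_pairs.add((p, tr))
    return len(covered_pairs) / (len(properties) * len(trap_properties))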

As a simpler alternative to this approach, we can define a weakened version of the coverage criterion which is always measurable with trap properties:

Definition 44 (Weak X Property Relevance Coverage). The weak property relevance coverage CWR of a test suite TS with regard to a set of properties P and a structural coverage criterion X represented as a set of trap properties T is defined as the ratio of pairs of trap properties and properties where a test case covers the trap property and is relevant to the property, to the total number of possible property/trap property combinations:

CWR = 1 / (|P| ∗ |T|) · |{(p, tr, t) | p ∈ P ∧ tr ∈ T ∧ t ∈ TS ∧ t ⊭ tr ∧ relevant(t, p)}|

The difference between Property Relevance Coverage and Weak Property Relevance Coverage is that the latter makes no assumptions on the order in which the possible property violation and the covering of the structural item have to occur. This simplifies the evaluation significantly, as it can be done in a single step with trap properties.

The structural coverage criterion X results in a set of trap properties T, and the specification consists of a set of properties P. For each tr ∈ T and P ∈ P, a test case model should result in a counterexample if tr is not satisfied and P can be violated. For negative test cases, this can be expressed as:

□ ((mutant < 2 → P) ∨ (mutant = 0 → tr))    (6.3)

This property is only violated if the trap property tr is reached without a mutated transition, and if P is violated by a mutated transition. Positive test cases require a combination of the test case model and a modified (mutant) model that can reach a property violating state, as described in Section 6.3.1. In that case the mutant variable of the mutant sub-model has to be used in the trap property:

□ ((Mutant.mutant < 2 → P′′) ∨ (Mutant.mutant = 0 → tr))    (6.4)
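A sketch of how these combined trap properties could be generated and how the weak coverage value might be computed is given below. The textual NuSMV-style layout of the formulas and the model_check helper are assumptions; in the positive case, p is expected to already be rewritten to P′′ over the mutant's output variables, as described in Section 6.3.1.

def weak_relevance_trap(p, tr, positive):
    """Build the combined trap property of Equation 6.3 (negative) or 6.4 (positive)."""
    if positive:
        return "G ((Mutant.mutant < 2 -> %s) | (Mutant.mutant = 0 -> %s))" % (p, tr)
    return "G ((mutant < 2 -> %s) | (mutant = 0 -> %s))" % (p, tr)

def weak_coverage(test_cases, trap_properties, properties, model_check, positive=True):
    """Compute C_WR: fraction of (property, trap property) pairs covered by some test case."""
    covered = set()
    for p in properties:
        for tr in trap_properties:
            trap = weak_relevance_trap(p, tr, positive)
            if any(model_check(t, trap) is not None for t in test_cases):
                covered.add((p, tr))
    return len(covered) / (len(properties) * len(trap_properties))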

The use of implications on the state of the mutant variable favors such test cases where first the trap property is covered and then the property relevant part follows. For example, if both the trap property and the requirement property only make assumptions on a single transition with a next operator (○), then the correct order is achieved. However, this is not guaranteed for all types of properties. As an example of when this order is not achieved, consider a trap property that requires two subsequent transitions (e.g., transition pair coverage). In this case, the (possible) property violation can occur before the trap property is covered. A possible idea for further research would be to rewrite trap properties and requirement properties using implications on the state of mutant in order to ensure the correct order. In general, the weak property relevance criterion is sufficient in most cases.

6.4 Property Relevant Test Case Generation

The previous section introduced new coverage analysis methods and showed how to evaluate test suites with regard to these criteria. This section presents a method that automatically generates test suites that satisfy a given property relevance criterion.

6.4.1 Property Relevant Test Case Generation

Most approaches to automated test case generation do not systematically create property relevant test cases. Exceptions are model mutation based approaches that create negative test cases by checking the mutants against the specification (Ammann et al., 1998, 2001).

In this section, we present an approach that automatically creates test suites that satisfy a given property relevant coverage criterion. A test case for such a criterion consists of a path that leads to the coverable structural item, and then a postfix sequence that is relevant to a property. The presented approach consists of two steps: first, a test suite that achieves structural coverage is created; it is then extended in a second step such that property relevance is achieved.

The approach is straightforward: first, a test suite TS1 satisfying the coverage criterion X is created by model checking the trap properties T defined by the structural coverage criterion X against the model. The resulting test cases can be optimized by removing duplicates and test cases that are prefixes of other, longer test cases. When creating test cases with trap properties it often happens that two or more trap properties result in the same test case. Such duplicate test cases only need to be saved once. Similarly, test cases can be prefixes of other, longer test cases. In this case, only the longer versions need to be saved; the subsumed test cases can be discarded (this only applies to positive test cases). We refer to such duplicate or subsumed test cases as redundant. By removing redundant test cases, the number of property relevant extensions that have to be calculated is minimized.
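This filtering step can be sketched in a few lines. The sketch assumes each test case is a tuple of states where each state is itself hashable (e.g., a tuple of variable/value pairs); it removes exact duplicates and, for positive test cases, test cases that are prefixes of longer ones:

def remove_redundant(test_cases, positive=True):
    """Drop duplicate test cases and, for positive test suites, prefixes of longer test cases."""
    unique = list(dict.fromkeys(tuple(t) for t in test_cases))   # deduplicate, keep order
    if not positive:
        return unique
    def is_prefix(short, long_):
        return len(short) < len(long_) and long_[:len(short)] == short
    return [t for t in unique if not any(is_prefix(t, other) for other in unique)]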

The second step is to create property relevant extensions for the test cases in TS1. In order to create positive test cases that are property relevant, we again need to model a behavior that can violate a property. At the same time, the corresponding behavior of the correct model has to be determined. Therefore, the model is duplicated such that there is one correct model and one model that can lead to an erroneous state (mutant model). Both models are provided with the same input values. With NuSMV this is achieved by stating the mutant model as a sub-module of the original model, and changing all input variables in the sub-module to parameters (this is similar to the combination of test case and mutant model in Section 6.3.1). The sub-module is instantiated in the original model with its input variables as parameter. As an example, the modifications to the simple model introduced in Figure 6.2 are illustrated in Figure 6.3.

MODULE model(x)
VAR
  y: boolean;
  mutant: 0..2;
ASSIGN
  init(y) := 0;
  next(y) := case
    mutant=1: {0,1};
    x=1: 1;
    1: 0;
  esac;
  init(mutant) := 0;
  next(mutant) := case
    mutant=0: {0,1};
    mutant=1: 2;
    1: 2;
  esac;

MODULE main
VAR
  x: boolean;
  y: boolean;
  Mutant: model(x);
ASSIGN
  init(y) := 0;
  next(y) := case
    x=1 : 1;
    1: 0;
  esac;

Figure 6.3: Combined model consisting of original model and mutant.

For each t ∈ TS1 a new trap property is specified for every property P. The final state of t is represented as a conjunction of value assignments tf := var1 = value1 ∧ var2 = value2. The new trap property has to claim that P is valid after that state. It is also necessary to require the possible faulty transition to occur after that state. This results in the following trap property, where P is rewritten to P′′ using the mutant's output variables (see Section 6.3.1):

□ (tf ∧ Mutant.mutant < 2 → P′′)    (6.5)

The combined model is checked against the new set of trap properties. The actual values of the mutant sub-module in the trace can be discarded; only the fact that a violating state can be reached and the values of the correct model are relevant to a positive test case. In addition, if the initial state of the model cannot be set to tf, or the model cannot be rewritten such that the initial state equals tf (for example to avoid the model encoding phase), then the prefix leading from the initial state to tf is discarded. This results in a set of test cases TS2. Finally, the test cases in TS1 need to be extended by their counterparts in TS2. This means that each test case in TS1 is extended with the corresponding property relevant postfix sequences contained in TS2.
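Sketched in code, this extension step builds one trap property per test case and property from the final state tf (Equation 6.5), model checks the combined model, and splices the resulting postfix onto the structural test case. The representation of states as variable/value dictionaries, the "Mutant." prefix for sub-module variables in counterexamples, and the model_check helper are assumptions made purely for illustration:

def final_state_condition(test_case):
    """Conjunction t_f of the value assignments in the final state of a structural test case."""
    return " & ".join("%s = %s" % (var, val) for var, val in sorted(test_case[-1].items()))

def extend_positive(test_case, prop_rewritten, combined_model, model_check):
    """Extend a structural test case with a property relevant postfix (Equation 6.5), sketched."""
    t_f = test_case[-1]
    trap = "G ((%s & Mutant.mutant < 2) -> %s)" % (final_state_condition(test_case), prop_rewritten)
    trace = model_check(combined_model, trap)
    if trace is None:
        return test_case                       # no relevant extension found for this property
    # Discard the prefix leading from the initial state to t_f and the mutant sub-module values;
    # only the states of the correct model after t_f are appended to the structural test case.
    cut = next((i for i, s in enumerate(trace)
                if all(s.get(k) == v for k, v in t_f.items())), len(trace) - 1)
    postfix = [{k: v for k, v in s.items() if not k.startswith("Mutant.")} for s in trace[cut + 1:]]
    return test_case + postfix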

Negative test cases can be created in a similar way. However, the combination of correct and mutant model is not necessary. Instead, the mutant model can be directly checked against the trap property:

□ (tf ∧ mutant < 2 → P)    (6.6)

These trap properties result in a set of traces TS2. Again, the test cases in TS1 that achieve structural coverage need to be extended by their counterparts in TS2.

6.5 Empirical Results

The techniques presented in this chapter were empirically analyzed with a set of example models. This evaluation tries to show whether property relevant test case generation and analysis are feasible, and how they perform in comparison to other methods.

Table 6.1 lists statistics about the model encoding reported by the model checker, and the numbers of properties used in the evaluation. Example1–5 are models of a car control at varying levels of complexity. The Safety Injection System (SIS) example was introduced by Bharadwaj and Heitmeyer (1999) and has since been used frequently for studying automated test case generation. Cruise Control is based on a version by Kirby (1987) and has also been used for automated test case generation (Ammann et al., 1998, 2001). Finally, Wiper is the software of a windscreen wiper controller provided by Magna Steyr. The properties used in the experiments were manually formalized. The model checker NuSMV (Cimatti et al., 1999) was used for our prototype implementation.

Table 6.1: Models and mutation results.

Example           BDD Size (Vars)   BDD Size (Nodes)   Properties   Mutants
Example1                        9                102            7        98
Example2                       19                714            7        22
Example3                       29               3337           28        91
Example4                       27               3345           22       319
Example5                       53              12969           29       545
SIS                            29               7308           18       357
Cruise Control                 31               3330           30       748
Wiper                          91             153968           25      2828

In order to evaluate the quality of the different test suites, a set of mutants was created for each model. These models are later used to determine the mutant scores. In addition to some model features, Table 6.1 also lists the number of inconsistent mutant models that were created for each example model. Such a mutant results from a simple syntactic change of the NuSMV model.

The following mutation operators were used (see Section 3.7.1 for details): STA (replace atomic propositions with true/false), SNO (negate atomic propositions), MCO (remove atomic propositions), LRO, RRO, ARO (logical, relational and arithmetical operator replacement, respectively), and ORO (replaces operands, i.e., variables and constants). All resulting mutants that violate the specification are used as a reference. The test cases are executed symbolically against these mutants by adding the mutants as sub-modules to the test case models, and replacing input variables in the mutants with the input values provided by the test case. A property that states that all output variables of the test case and the mutant should be equal for the duration of the test case simulates execution with a model checker. If the output values of the mutant differ at some point during the execution, a counterexample is created. If a counterexample is created, the mutant is identified (killed).
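The monitoring property used for this symbolic execution can be generated mechanically. The following short sketch assumes the mutant is instantiated as a sub-module named Mutant of the test case model, that the test case model has a State counter as in Figure 6.1, and that the output variable names are known; all of these names are illustrative only:

def equality_spec(output_vars, test_case_length):
    """LTL property stating that test case and mutant outputs agree for the test case duration."""
    equalities = " & ".join("%s = Mutant.%s" % (v, v) for v in output_vars)
    return "LTLSPEC G (State <= %d -> (%s))" % (test_case_length - 1, equalities)

# For example, equality_spec(["y"], 3) yields:  LTLSPEC G (State <= 2 -> (y = Mutant.y))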

Table 6.2: Numbers of test cases generated.

Example            T    C   TP   TE   CE   TPE
Example1           3    4    7   12   16    29
Example2           2    3    4   12   18    24
Example3           7    8   13  154  176   287
Example4           6    9   18   78  116   230
Example5          19   48   33  226  425   387
SIS                8   16   10   67  158    88
Cruise Control     8   21   23  109  246   284
Wiper             32   40   84  529  642  1351

We considered the structural coverage criteria Transition (T), Condition (C) and Transition Pair (TP) (see Section 3.5.2). Table 6.2 lists the results of the test case generation. First, a test suite was created for each coverage criterion. Then, these test suites were extended (TE, CE, TPE). The numbers given in Table 6.2 represent optimized test suites, where redundant test cases are removed. As can be seen, the number of test cases increases significantly for the extended test suites, while staying within realistic bounds. On average, 52% of all test cases created from structural coverage criteria for the example models are redundant, while only 27% of the test cases created for property related coverage are redundant. This indicates that the test cases explore quite diverse behavior, which of course is good.

To assess the effectiveness at detecting faults, the test suites were executed against a set of mutants known to violate the specification (i.e., only mutants that are inconsistent with the specification properties). The mutant score is calculated as s = k/m, where k is the number of killed mutants, and m is the total number of mutants. The results of this analysis are presented in Table 6.3. For all the example models, Transition Pair coverage results in the best test suite without property relevance. As expected, the number of killed mutants is significantly higher for property relevant test suites. As a threat to the validity of the experiments, it has to be considered that these mutants are automatically generated and might not be representative of real errors.

Finally, we applied several different automated test case generation methods to the Wiper example, and evaluated the resulting test suites with different coverage criteria. Test suites were generated with the transition and condition coverage criteria, and also property coverage (Tan et al., 2004), which requires tests to show the effect of each atomic proposition within a property. In addition, test suites were created by property mutation (checking mutants of the properties against the original model), mutation of reflected transition relations (Black, 2000), and a state-machine duplication (Okun et al., 2003a) approach.

Table 6.3: Mutant scores of test suites.

Example            T     C     TP    TE     CE    TPE
Example1          52%   52%   87%   95%   100%   95%
Example2          18%   19%   59%   77%   100%   77%
Example3          40%   44%   59%   79%    80%   79%
Example4          46%   55%   81%   87%    97%   88%
Example5          72%   79%   80%   85%    93%   94%
SIS               27%   27%   73%   97%    97%   97%
Cruise Control    44%   57%   84%   91%    97%   92%
Wiper             47%   52%   84%   88%    96%   92%

Refer to Section 6.4 for more information about these approaches. For these mutation based approaches, the same set of mutation operators used to create the mutants for the mutant score analysis was used. Table 6.4 lists the results of this experiment. In this table, the test suites created with the method presented in this chapter are referred to as extended. The coverage of these test suites is very good in comparison to the other test suites, which is not surprising considering they are significantly larger. Interestingly, the coverage with regard to the property relevant criteria is comparable for all test suites that are created without property relevant methods. The property relevant extensions of transition and condition test suites achieve significantly better results in this regard.

Table 6.4: Wiper example coverage.

Test Suite              Without Relevance      Relevant Coverage
                          T        C              T        C
Transition               100%      73%            47%      46%
Condition                100%     100%            52%      51%
Property                  93%      91%            49%      49%
Mutation                 100%      99%            55%      53%
Reflection                67%      84%            46%      52%
SM Duplication            92%      86%            53%      53%
Transition Extended      100%      73%           100%      73%
Condition Extended       100%     100%           100%     100%

On the downside, property relevant test case generation and analysis are more complex than generation and analysis for structural coverage. The addition of a mutant model that can take an erroneous transition to the correct model increases the model complexity. Both test case generation and coverage analysis depend on a combination of trap properties and requirement properties. Therefore, the number of computations necessary is potentially high.

For example, the test case generation for transition, condition and transition pair coverage of the biggest example model (Wiper) on a PC with an Intel Core Duo T2400 processor and 1GB RAM takes 9s, 23s and 362s, respectively. Extending these test suites takes 33m43s, 89m54s and 41m3s, respectively. A significant part of the complexity is caused by the model duplication necessary for creating positive traces. Extending the same three test suites with negative test cases does not require model duplication, and therefore only takes 15m41s, 44m7s and 20m34s, respectively.

However, the additional computational effort is still within realistic bounds, especially when considering a main application for safety related systems. As with all model checker based approaches, the applicability generally depends on the model complexity. If a model checker fails to verify a model in realistic time, then no model checker based approach can be used to generate test cases.

6.6 Summary

In this chapter we have introduced the notion of property relevance. We have formally linked test cases and properties via property relevance, and presented a method to evaluate the property relevance of a test case. Based on property relevance, we have introduced a novel combination of structural coverage and property relevance. In addition to an evaluation method, we have presented a method to automatically create test suites that satisfy these criteria and create test cases traceable to requirement properties.

A related approach was taken by Ammann et al. (2001), who describe dangerous traces similar to the property relevant traces in this chapter. However, the test case generation is not based upon structural coverage but uses different mutants. These mutants are merged with the original model in order to improve the performance. Their method does not allow generation of relevant positive test cases for all kinds of properties, such as liveness properties.

The approach presented in this chapter applies to all properties that can be model checked. The combination of structural coverage and property relevance guarantees thorough testing even if the requirement properties are weak. However, the improvement of property relevant test case generation in contrast to structural coverage test case generation is related to the quality of the requirements specification. The approach is specific to neither black-box nor white-box testing. Functional properties that refer only to input and output variables lead to black-box tests, while the internal state of the model can also be used if properties refer to it. On the downside, both the test case generation and the analysis of property relevant coverage are considerably more complex than related coverage approaches that do not use property relevance. The number of created test cases is also significantly higher. While this is good with regard to fault sensitivity, a large number of test cases can incur high costs for test execution.

In general, the experiments show encouraging results. The number of property violating mutants that can be detected with property relevant coverage test suites is significantly higher than with test suites created from purely structural coverage criteria.

Chapter 7
Specification Analysis

Parts of the contents of this chapter have been published in (Fraser and Wotawa, 2006b).

7.1 Introduction

When using a requirements specification to create test cases, the quality of a resulting test suite usually depends on the quality of the specification. For example, if the specification only covers very little of the possible behavior, then a resulting test suite will also achieve little coverage. Errors in the implementation that are outside the scope of the specification might go undetected. If there are errors in the specification, then a derived test suite will reflect these errors. Test case creation with model checkers allows conclusions about the specification quality.

7.2 Specification Coverage

When talking of structural coverage, a state of a model or implementation is covered by a test case if it is passed during execution. Similarly, it is of interest to know how much of a model is covered by its specification. This information would be valuable in order to judge the quality of a specification. However, when verifying a specification a model checker visits all reachable states. Hence, a different notion of "covered" is necessary.

In general, a part of a model is considered to be covered by a specification if it contributes to the result of the verification. Two different approaches to specification coverage based on FSMs were originally introduced: Katz et al. (1999) describe a method that compares the FSM and a reduced tableau for the specification. In contrast, Hoskote et al. (1999) apply mutations to the FSM and analyze the effect these changes have on the satisfaction of the specification. This idea is extended by Chockler et al. (2001) and Jayakumar et al. (2003). Mutation based coverage measurement works by inverting signals in the FSM and observing the effects. If a signal inversion has no effect on the outcome of the verification, then this signal is not covered in the state where it is inverted. The number of signals and states of the FSM can be very large; therefore, efficient algorithms to determine the coverage are required.

The idea of mutation based coverage measurement can be extended from finite state machines to models in general; for example, the SMV source code of a model can be used. Mutation is applied to the SMV model, and then the mutant is model checked against a property. If the property is violated, then the mutant is covered:

Definition 45 (Covered mutant). A mutant M′ of model M is covered by property φ, where M ⊨ φ, if M′ ⊭ φ; that is, if the mutant M′ violates the property φ.

The overall specification coverage results from the number of covered mutants. An ideal specification would cover all inequivalent mutants.

The type and distribution of mutants have an influence on the expressiveness of coverage measurement based on this idea of coverage. If different mutation operators are used, then different mutants can result from the same part of the source code. Obviously, coverage measurement would be quite meaningless if all mutants were created by changing the identical part of the model. Application of one mutation operator is assumed to result in a set of mutants, where there is one mutant for each part of the model to which the operator applies. Consequently, for coverage measurement only mutants resulting from one mutation operator are considered.

A problem inherent to mutation in general is that of equivalent mutants. If a mutant is equivalent to the original model, then no property can possibly cover the mutant. While detection of equivalent mutants is an undecidable problem when considering source code mutants, model checkers can be used to determine equivalence efficiently with models. It is therefore a feasible approach to determine the subset of inequivalent mutants for a given set of mutants, and then use only the inequivalent mutants for coverage measurement.

An equivalent model mutant, however, can also represent a redundant part of the model. If some part is redundant, then no mutant of that part can result in a specification violation. Redundancy in the model is undesirable because it has a negative impact on clarity and efficiency. If a mutant does not violate the specification, it can either be because the relevant part is not covered, or because the mutant is equivalent. Either way, it is beneficial to identify the uncovered part. Therefore, it is advantageous to include equivalent mutants in specification coverage evaluation.

There is a wide range of different mutation operators. Accordingly, specification coverage can be defined with respect to many different operators. For example, specification coverage could be measured as the percentage of relational operators in a model for which there exists a covered mutant. The conditions in the SMV model description are a useful choice. The coverage can be easily illustrated on the SMV model, and offers the specifier an intuitive means to identify the cause of the problem. This is related to an approach by Chockler et al. (2006), who propose negation of expressions or branches. Below, specification coverage is based on conditions.

A simple and effective mutation operator for conditions is simple expression negation (SNO). This operator negates atomic expressions, i.e., conditions. The result of the SNO operator is a set of mutants, where in each mutant there is one condition negated. As an example, the following excerpt from a windscreen wiper application, repeatedly used for experiments in this thesis, is mutated:

next(OUT_speed2) := case
  ...
  Ca1_Chart_ns = Ca6_wait_idle_id & !IN_water & Endswitch: 0;
  ...
esac;

This results in three mutants:

1. !(Ca1_Chart_ns = Ca6_wait_idle_id) & !IN_water & Endswitch: 0;

2. Ca1_Chart_ns = Ca6_wait_idle_id & !(!IN_water) & Endswitch: 0;

3. Ca1_Chart_ns = Ca6_wait_idle_id & !IN_water & !(Endswitch): 0;

Mutant 1 is covered by the following requirement property:

□ ((OUT_pump = 1 ∧ OUT_speed2 = 1 ∧ IN_speed1 = 0 ∧ IN_speed2 = 1) → ○ ((OUT_pump = 0 ∧ IN_speed1 = 0 ∧ IN_speed2 = 1) → OUT_speed2 = 1))

Model checking this property against mutant 1 results in a counterexample. Mutants 2 and 3, however, are consistent with the property. Assuming there are no other properties that are inconsistent with mutants 2 and 3, the specifier sees that the transition relation of the variable OUT_speed2 in the idle state is not covered with regard to IN_water or Endswitch.

The overall approach to measure coverage is to create a set of mutants, model check each mutant model against the specification, and then calculate a coverage value or illustrate the coverage on the model source code. In the context of finite state machine based coverage measurement, where signals are inverted, such a straightforward approach is commonly not feasible due to the large number of mutants. In contrast, the number of mutants that result from mutation of the model source is significantly lower. As long as the model complexity is within bounds, such that model checking is basically feasible, specification coverage can be measured with the described mutation approach.
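The measurement itself is a straightforward loop over the mutants and properties. The following sketch assumes model_check(model, property) is a hypothetical helper returning a counterexample or None, and that each mutant is a complete model produced by a single mutation operator:

def specification_coverage(mutants, properties, model_check):
    """Fraction of mutants covered by at least one property (Definition 45), sketched."""
    covered = [m for m in mutants if any(model_check(m, p) is not None for p in properties)]
    uncovered = [m for m in mutants if m not in covered]
    return len(covered) / len(mutants), uncovered

The uncovered mutants can then be mapped back to the conditions they mutate, which is the information illustrated on the SMV source in the example above.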

Checking mutants against the specification, as used to evaluate the specification coverage, is also used for test case generation, as described in Section 3.7. Negation of conditions is a typical mutation operation, and is therefore often included in the test case generation. Therefore, when using such an approach to test case generation, the specification coverage is implicitly also measured, requiring no additional computation. The coverage can be determined by reviewing which mutants out of a certain subset, e.g., SNO mutants, have resulted in a test case. A mutant that did not result in a test case is consistent with the specification, and therefore not covered.

7.3 Specification Vacuity

If a specification does not cover a model sufficiently, then chances are high that there are simply properties missing. That, however, is not the only possible reason for insufficient coverage. Specification properties can contain a special type of error known as vacuity. Unlike most other specification errors, vacuity does not result in inconsistency; vacuity results in properties that are consistent with a model independently of whether the model really fulfills what the specifier originally had in mind.

As an example of vacuity, consider the following CTL property: AG (c1 → AF c2). This property states that in all paths, at all times, if c1 occurs, c2 must eventually be true. If a model satisfies this property, this can be either because c2 really is always true in some future state after c1, or because c1 never occurs. If c1 is never true, then the property is vacuously satisfied by the model. Even if the model would theoretically react erroneously to c1, this is not detected. A vacuous pass of a property is an indication of a problem in either the model or the property.

Vacuity is a problem when model checking, and it is a problem when creating test cases from the specification. The simple example property above could easily be rewritten to be only valid if c1 really occurs: EF (c1) ∧ AG (c1 → AF c2). Unfortunately, such rewriting is not a solution in general, as properties can be more complex, and vacuity is therefore not as obvious as in the implication in the example. The detection of vacuous passes has recently been considered in several works.

Vacuity detection and a property coverage criterion based upon vacuity were introduced in Section 3.6.1. Property coverage can be measured by creating a set of trap properties from each property f, such that there is one trap property f[φ ← pol(φ)] for every φ in f, where pol(φ) denotes the polarity of φ. The resulting trap properties can be used for coverage measurement and to generate test cases.

Replacement of atomic propositions with true and false, as used for vacuity analysis, is also performed by the common mutation operator StuckAt. The inclusion of this mutation operator in a test case generation approach that uses specification mutants will therefore automatically achieve maximum property coverage.

Property coverage and specification vacuity are complementary. Test case generation with specification mutation using StuckAt creates test cases for all properties that are not vacuously satisfied. Vacuously satisfied propositions can therefore be identified if the relevant mutants result in no test cases.
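As a minimal sketch of this by-product analysis, assume a mapping from each StuckAt specification mutant to the proposition it replaces, and the set of mutants that actually yielded a test case during generation; the remaining propositions are candidates for vacuous satisfaction. The data structures here are purely illustrative:

def vacuity_candidates(mutant_to_proposition, mutants_with_test_cases):
    """Propositions whose StuckAt specification mutants produced no test case."""
    return {prop for mutant, prop in mutant_to_proposition.items()
            if mutant not in mutants_with_test_cases}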

7.4 Summary

Summarizing, when test cases are generated by checking model mutants against the specification and specification mutants against a model, both specification coverage and vacuity can be determined without the necessity of any additional computation.

As the quality of requirements based testing very much depends on the quality of the requirement properties, it is important to ensure that there are no errors in the properties, and that everything that is implemented is covered by requirements. The methods presented in this chapter show how vacuity, a difficult type of error, can be detected, and how the coverage of the behavior by requirement properties can be measured.

Chapter 8
Nondeterministic Testing

Parts of the contents of this chapter have been published in (Fraser and Wotawa, 2007e) and (Fraser and Wotawa, 2007j).

8.1 Introduction

Nondeterminism is often used as a means of abstraction in system specifications. In general, a system is nondeterministic if, given the same inputs at different times, different outputs can be produced. Example reasons for nondeterminism are underspecification early in the development process or implementation choice. Nondeterminism is also necessary if the system or its environment is not fully predictable.

In model-based testing, test cases are derived from an abstract model or specification. Nondeterministic specifications need different treatment than deterministic specifications in order to correctly test nondeterministic behavior. Nondeterminism in the implementation makes testing even more complicated. If the implementation is deterministic, then it is sufficient to run a test suite (i.e., a set of test cases) once in order to derive a verdict about correctness. If, however, the implementation is nondeterministic, this is not sufficient because there is no guarantee that the behavior observed during testing is the same that will occur at runtime.

While nondeterminism is common in protocol testing (Petrenko et al., 1996) with state machines, and also possible in other approaches such as testing of labeled transition systems (e.g., (Jard and Jéron, 2005)), nondeterminism is a problem in the context of testing with model checkers. The idea of testing with model checkers is to use counterexamples, linear traces illustrating property violations, as test cases. If there is nondeterminism, then a linear trace contains commitment to one particular nondeterministic choice. Applying such a test case to an implementation that makes a different but valid choice would falsely report a fault.

This chapter describes a solution that overcomes this problem. Nondeterministic choice in counterexamples is made explicit to prevent false verdicts. Extension of test cases with alternative branches is used to fully cover nondeterministic specifications. The described methods can be applied to any known model checker based testing technique.

8.2 Preliminaries

Chapter 3 described different approaches to generate test cases with a model checker. Regardless of which approach is used, a counterexample is always the result of model checking a pair of a model K and a property φ. Therefore, we define the requirement of a test case as the pair (K, φ):

Definition 46 (Test Requirement). A test requirement is a tuple RT = (K, φ), where K is a Kripke structure, and φ is a property.

For a given test requirement RT = (K, φ), the property φ is supposed to be violated by the Kripke structure K, such that model checking produces a counterexample c (K, c ⊭ φ) that can be used for testing.

Counterexamples resulting from test requirements are used as test cases. Therefore, test cases consist of both the test data and the expected output. The set of test cases resulting from model checking all test requirements is a test suite. Test cases created by model checkers are deterministic; therefore they cannot handle nondeterministic behavior in their basic form.

Definition 47 (Linear Test Case). A linear test case t is a sequence according to Definition 9.

Each state of a linear test case consists of value assignments for all variables in the model. When executing a test case created with a model checker, the test execution framework has to distinguish between input and output variables. Any variable or atomic proposition provided by the environment and not calculated by the system is considered as input. The system response is referred to as its output. The input values of a counterexample are used as test data, and the output values of the counterexample represent the expected values. The system under test is exercised with the test data, and the resulting outputs are compared with the expected values. If the observed outputs equal the expected values, then the implementation passes the test case; otherwise it fails. If we think of the implementation as a Kripke structure Ki (which is usually referred to as the test hypothesis), then the test case has to be a trace of Ki.

There are some basic properties of any testing approach that are important. Ideally, testing is complete, which means that every faulty implementation will be detected and no correct implementation will be falsely identified as erroneous. This is usually not possible in practice because it would require exhaustive testing, which mostly requires a large or infinite number of test cases. On the other hand, it is important that a test case generation approach only results in sound test cases. Testing is sound if a test case can only fail on an implementation that is really erroneous. Testing is complete iff it is sound and exhaustive. Informally, testing with model checkers is sound because every test case t is a trace of model K, and a test case only fails if t is not a trace of the implementation.

While test case creation with model checkers assumes deterministic models, Kripke structures can be used to describe both deterministic and nondeterministic systems. Intuitively, a model is nondeterministic if, presented with the same inputs at different times, different outputs may be generated. There is no distinction between input and output in Kripke structures; therefore nondeterministic behavior is not immediately obvious. To allow such a distinction, modules were defined with a partitioning of input, output, and internal variables. According to Boroday et al. (2007), a module is deterministic if for each state s and each i ⊆ I, s has at most one successor state s′ with the input Inp(s′) = i; moreover, for all s, s′ ∈ S0, Inp(s) = Inp(s′) implies s = s′.

The testing methods presented in Chapter 3 are unsound when testing nondeterministic systems. If the counterexample contains the transition (s, s′) at one step, but the implementation responds with (s, s′′), then the common approach to model checker based testing would declare the test to have failed. If, however, (s, s′′) is a valid transition, then the verdict should in fact be inconclusive.

As a solution, we extend the definition of linear test cases to linear nondeterministic test cases.

Definition 48 (Linear Nondeterministic Test Case). A linear nondeterministic test case tNL for module M = (K, I, O) is a tuple tNL = (t, R), where t is a finite sequence t := 〈s0, s1, ..., sn〉, and R : O × N → {pass, fail, inconclusive} maps observed outputs and positions within the test case to the verdicts pass, fail and inconclusive.

A linear nondeterministic test case extends a regular linear test case with a function that maps observations during execution to the verdicts pass, fail and inconclusive. During test case execution the outputs of the system under test are observed. Let Obs denote the observed values at the i-th state; then R(Obs, i) evaluates to pass iff the observed outputs are those expected by the linear test case. If Obs differs from the expected values because of a nondeterministic choice, then the verdict is inconclusive. In all other cases, the verdict is fail. Note that the execution of a linear nondeterministic test case can be inconclusive even though the implementation is erroneous, if the deviation is only observable in a nondeterministic transition.
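A sketch of such a verdict function is given below. It assumes that observed outputs are dictionaries over output variables, that expected[i] holds the outputs of the i-th state of the test case, and that alternatives[i] contains the outputs of the other valid nondeterministic choices at that position; all of these names are illustrative only:

def verdict(observed, position, expected, alternatives):
    """R(Obs, i): pass on the expected outputs, inconclusive on a valid alternative, fail otherwise."""
    if observed == expected[position]:
        return "pass"
    if observed in alternatives.get(position, []):
        return "inconclusive"
    return "fail"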

The use of linear nondeterministic test cases instead of deterministic test cases ensures that testing is sound: no correct implementation is rejected.

If a test case can possibly result in an inconclusive verdict, it is a weak test case according to the definitions given by Boroday et al. (2007). A weak test case can result in an inconclusive verdict on a nondeterministic implementation. A common solution is to repeatedly apply weak test cases, until either a pass or fail verdict is achieved.

If the execution of a test suite on an implementation results in too many inconclusive verdicts, then this reduces the effectiveness of testing. Moreover, an implementation might not fully implement all possible nondeterministic behaviors specified in the model but always deterministically make the same choice. In this case, a weak test case might never achieve a pass or fail verdict. Consequently, instead of only returning an inconclusive verdict, a test case should also be able to interpret alternative paths that occur as a consequence of nondeterministic decisions. A test case could therefore be seen as a tree-like structure of alternative execution paths instead of a linear trace. We define nondeterministic test cases in the style of Hierons's adaptive test cases (Hierons, 2006). The set T denotes the domain of nondeterministic test cases.

Definition 49 (Nondeterministic Test Case). A nondeterministic test case tND ∈ T is recursively defined as one of the following:

• pass

• fail

• inconclusive

• (s, R), where s ∈ S, and R : S → T is a function that maps states to test cases.


Intuitively, a nondeterministic test case can be interpreted as a tree, where leaf nodes are either pass, fail or inconclusive, and all other nodes contain states. A nondeterministic test case is fully expanded if it contains no inconclusive verdicts. A test case that is not fully expanded can easily be extended by adding a new branch to an inconclusive node. A linear nondeterministic test case is a special case of a nondeterministic test case, where every node except the final state has exactly one child node. Nondeterministic test cases are executed similarly to linear nondeterministic test cases, except that there are alternatives for the next input at nondeterministic branches, depending on the observed outputs. Execution stops when a pass, fail, or inconclusive node is reached.
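A nondeterministic test case can be represented directly as such a tree. The following Python sketch is one possible encoding together with an execution routine; the class name, the use of hashable state values, and the per-node nondeterminism flag are assumptions made for illustration.

# Sketch: a nondeterministic test case (Definition 49) as a tree.
# Observed states are assumed to be hashable (e.g., tuples of values).

PASS, FAIL, INCONCLUSIVE = "pass", "fail", "inconclusive"

class NDTestCase:
    def __init__(self, state, branches=None, nondet=False):
        self.state = state              # expected state at this node
        self.branches = branches or {}  # observed successor -> NDTestCase or verdict
        self.nondet = nondet            # is the outgoing transition nondeterministic?

    def execute(self, run):
        """Execute against 'run', an iterable of observed states."""
        node = self
        for observed in run:
            nxt = node.branches.get(observed)
            if nxt is None:
                # Deviation: inconclusive if it may stem from an unexpanded
                # nondeterministic choice, otherwise a fault.
                return INCONCLUSIVE if node.nondet else FAIL
            if isinstance(nxt, str):
                return nxt              # leaf verdict reached
            node = nxt
        return INCONCLUSIVE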

Nondeterministic test cases can be extended at nondeterministic branching points. Even if there are several branches, a deviation at the state of these branches is still inconclusive, unless the branch is fully expanded; that is, if the branch contains all possible alternatives.

8.3 Deriving Test Cases from Nondeterministic Models

This section describes how model checkers can be used to create nondeterministic test cases. Rather than inventing a completely new approach to test case generation, the presented method is an extension to currently used techniques.

Figure 8.1: Traditional test case generation with model checkers.

The normal approach to test case generation with model checkers is to create a set of test requirements, and then to call the model checker for each of these test requirements. The result is a set of counterexamples, which are used as test cases. This is illustrated in Figure 8.1.

If this approach is applied to a nondeterministic model, then the model checker makes decisions at the nondeterministic transitions. Resulting counterexamples are linear sequences representing these choices, and can as such only be used as deterministic test cases. However, there is no guarantee that an implementation will behave exactly as described by the test case.

Therefore, the standard approach is extended. Figure 8.2 illustrates an approach that can be used to create nondeterministic test cases. In detail, the steps are as follows (a sketch of the overall loop is given after the list):

1. Create counterexamples: The first step is identical to normal test case generation. A set of test requirements is created, each consisting of a model and a property. By calling the model checker on each of the test requirements, a set of linear counterexamples is created.

2. Create linear nondeterministic test cases: Using information about the nondeterministic choices in the model, the linear counterexamples are enhanced to linear nondeterministic test cases.

3. Execute test cases: The resulting test cases are executed. Whenever an inconclusive verdict is encountered, the test requirement and the state causing the verdict are recorded.


Figure 8.2: Creating test cases from a nondeterministic model iteratively.

4. Create new test requirements: For all inconclusive results, new test requirements are created from the original, unfulfilled test requirements. These test requirements consist of a model, where the initial state is set to the state that caused the inconclusive result, and the original property.

5. Create new branches: The model checker is called on the new test requirements. The resulting counterexamples represent linear test cases for the chosen nondeterministic branch.

6. Merge test cases: The new counterexamples are used to extend the corresponding original nondeterministic test case. This is achieved by attaching the new test case at the position of the nondeterministic choice in the nondeterministic test case.

7. Execute extended test cases: The extended test cases are executed. If inconclusive verdicts are encountered, then the method proceeds with step 4, until no more inconclusive verdicts remain.
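The control flow of steps 1 to 7 can be summarized in a short driver loop. In the following Python sketch all called helpers (generate_counterexample, make_linear_nd_test, execute, with_initial_state, extend_at) are hypothetical wrappers around the model checker and the test execution framework; only the overall iteration structure is intended to be illustrative.

# Sketch of the iterative generation/extension loop (steps 1-7).
# All called helper functions are hypothetical and not defined here.

def generate_nd_tests(model, requirements, iut, max_iterations=10):
    tests = []
    for req in requirements:                                  # steps 1 and 2
        cex = generate_counterexample(model, req)
        if cex is not None:
            tests.append(make_linear_nd_test(model, req, cex))

    for _ in range(max_iterations):                           # steps 3 and 7
        inconclusive = []
        for test in tests:
            verdict, state = execute(test, iut)
            if verdict == "inconclusive":
                inconclusive.append((test, state))
        if not inconclusive:
            break
        for test, state in inconclusive:
            new_model = with_initial_state(model, state)      # step 4
            cex = generate_counterexample(new_model, test.requirement)  # step 5
            if cex is not None:
                extend_at(test, state, cex)                   # step 6
    return tests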

The following scenario might cause problems: Consider a model that consists of a nondeterministic choice with regard to the transitions (sl, sm) and (sl, sn), and a corresponding linear nondeterministic test case tNL = (t, R), where the described sequence is t := 〈s0, ..., sl, sm, ...〉. If the test case is executed on an implementation which always chooses (sl, sn) rather than (sl, sm), then the verdict is inconclusive, and the test case has to be extended. While this extension te does begin with (sl, sn) and therefore resolves one inconclusive verdict, it could still look like this: te := 〈sl, sn, ..., sl, sm, ...〉; that is, it could include the same nondeterministic transition again. Assuming an implementation that never chooses (sl, sm), the test requirement cannot be fulfilled on this path.

To overcome this problem, it is necessary to avoid the occurrence of the transition (sl, sm) when extending test cases. If the implementation is deterministic, then this can be achieved by refining the model with regard to the nondeterministic transition. However, a test case extension te made for the nondeterministic choice of (sl, sn) over (sl, sm) might itself contain another nondeterministic choice with regard to, e.g., (sx, sy) and (sx, sz), possibly requiring yet another branch. This branch might contain another nondeterministic choice, and so on. Consequently, many refinement iterations would be necessary. A possible alternative is to force the model checker to create only deterministic traces (e.g., using invariants) as extensions. In the worst case, the process has to be stopped after a certain number of iterations.


8.4 NuSMV and Nondeterministic Models

Model checkers are based on Kripke structures as model formalism. A Kripke structure K is defined as a tuple K = (S, S0, T, L), where S is the set of states, S0 ⊆ S is the set of initial states, T ⊆ S × S is the transition relation, and L : S → 2^AP is the labeling function that maps each state to a set of atomic propositions that hold in this state. AP is the countable set of atomic propositions.

When a nondeterministic model is presented with the same inputs at different times, different outputs may be generated. In automaton-based formalisms nondeterminism is often explicit, for example if there are two transitions with the same label or input action in a finite state machine (FSM). When using Kripke structures this is not the case, because there is no distinction between input and output. Consequently, counterexamples contain no indication about nondeterministic choice.

Kripke structures, however, are usually not specified explicitly but with a more intuitive description method. For example, in many model checkers models are specified by describing the transition relations of the variables that make up a state, using logical expressions over these variables. At this level of abstraction, distinction between input and output is possible.

In NuSMV (Cimatti et al., 1999), the transition relation of a variable is either defined in a TRANS or ASSIGN section. We use the ASSIGN method in this chapter, but as TRANS is only a syntactic variation there are no limitations when applying the presented techniques to it. Figure 8.3 shows what such an ASSIGN section looks like. The first transition described in Figure 8.3 is deterministic, that is, whenever condition1 is encountered, var is assigned next_value1 in the next state. Here, condition1 can be any logical formula on the variables defined in the model.

NuSMV allows nondeterministic assignments for variables, where the assigned value is chosen out of a set expression or a numerical range. The second transition in Figure 8.3 is nondeterministic. Upon condition2, var is nondeterministically assigned either next_valuea or next_valueb. With regard to the Kripke structure, each condition corresponds to a set of transitions T′, where for each (si, sj) ∈ T′ the condition has to be fulfilled by L(si), and the proposition var = next_value is contained in L(sj).

ASSIGN
next(var) := case
  condition1: next_value1;
  condition2: {next_valuea, next_valueb};
  ...
esac;

Figure 8.3: ASSIGN section of an SMV file. The transition relation of a variable var is given as a set of conditions and next values. The first transition is deterministic, while the second is nondeterministic.

8.5 Creating Test Cases with Nondeterministic NuSMV Models

If the model that is used to derive test cases contains nondeterminism, then the direct interpretation of counterexamples as test cases is not always possible. A linear counterexample contains a


commitment to one possibility at each nondeterministic choice. If such a counterexample is used as a test case and is executed on an implementation that makes a different but valid choice, then the execution framework reports a fault, as the expected output is not observed.

To overcome this problem, we describe a solution that consists of several steps: First, an initial set of test cases is produced with any traditional model checker testing technique. Then, the test cases are extended so that nondeterministic choice can be identified. If the observed values do not match the expected values because of nondeterminism, the verdict is inconclusive instead of fail. If test case execution leads to too many inconclusive verdicts, then the result of executing the test suite is less informative. Therefore, we show how test cases can be extended such that alternative execution paths can be considered and inconclusive results resolved.

8.5.1 Identifying Nondeterministic Choice

A counterexample created from a NuSMV model contains no indication about which choices were made deterministically and which were made nondeterministically. While the Kripke structure does not distinguish between input and output values, this distinction is possible in the NuSMV model description.

Nondeterministic choice can be highlighted with a special Boolean variable, which we will call ND_var. For each variable var that has nondeterministic transitions there is a distinct ND_var that indicates whether a nondeterministic choice was made for var. The transition relation of ND_var uses the same conditions as var, and is set to true whenever var has a nondeterministic choice, else it is set to false. The initial value of ND_var depends on whether there are any nondeterministic initializations. If there are, then ND_var is initialized with true/1, else with false/0. For the example transition relation given in Figure 8.3, this is illustrated in Figure 8.4. This annotation is straightforward and can easily be automated. It is conceivable to extend this approach such that each ND_var has one distinct value for each possible nondeterministic choice.

VAR ND_var: boolean;
ASSIGN
init(ND_var) := 0;
next(ND_var) := case
  condition1: 0;
  condition2: 1;
  ...
esac;

Figure 8.4: Nondeterministic choice is marked with a dedicated variable. The conditions equal those of the variable var that is observed with regard to nondeterministic choice.
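Because the annotation mirrors the conditions of the annotated variable, it can be generated mechanically. The following Python sketch emits the ND_var section of Figure 8.4 from a list of case branches; the branch representation (condition string plus either a single value or a set of values) is an assumption made for illustration.

# Sketch: generate the ND_var annotation from the case branches of a variable.
# A branch is a pair (condition, value); a set-valued 'value' marks a
# nondeterministic assignment.

def nd_annotation(var, branches, nondet_init=False):
    lines = ["VAR ND_%s: boolean;" % var,
             "ASSIGN",
             "init(ND_%s) := %d;" % (var, 1 if nondet_init else 0),
             "next(ND_%s) := case" % var]
    for condition, value in branches:
        flag = 1 if isinstance(value, (set, frozenset)) else 0
        lines.append("  %s: %d;" % (condition, flag))
    lines.append("esac;")
    return "\n".join(lines)

print(nd_annotation("var", [("condition1", "next_value1"),
                            ("condition2", {"next_valuea", "next_valueb"})]))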

Any counterexample created from such an annotated model contains indications of which choices were made nondeterministically: A variable var was changed nondeterministically if ND_var is true. Consequently, if during test case execution an observed value deviates from the expected value, the test case execution framework only reports fail if the deviation cannot be caused by nondeterminism; else inconclusive is reported.


This approach guarantees that no valid nondeterministic choice is falsely identified as a fault. However, a test case execution that really fails at such an execution step can also be reported as inconclusive, as the allowed alternative values are not contained in the counterexample. Inconclusive results can be verified using a model checker. For example, assume a test case t := 〈s0, ..., sx, sy, ..., sn〉

executed on an implementation that instead of (sx, sy) takes the transition (sx, sz). Assuming that this leads to an inconclusive verdict, the model checker can be queried whether (sx, sz) is a valid transition by claiming that such a transition does not exist. For example, this could be expressed as follows:

� (sx → ©¬sz)

Here, sx and sz represent the observed states as a logical expression (e.g., the conjunction of all atomic propositions valid in that state). If this property is checked against the model, then a counterexample indicates that the transition is valid (the test case is really inconclusive), else the transition is not valid (a fault was detected).
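Building this verification query from the two observed states is mechanical. The following Python sketch constructs such a property in NuSMV's LTL syntax; the dictionary-based state representation is an assumption, and in practice the formula would be restricted to the relevant observable variables.

# Sketch: property claiming that the observed transition (s_x, s_z) does not
# exist.  A counterexample means the transition is valid (really inconclusive);
# no counterexample means a fault was detected.

def state_formula(state):
    return " & ".join("%s = %s" % (var, val) for var, val in sorted(state.items()))

def transition_absent(s_x, s_z):
    return "LTLSPEC G ((%s) -> X !(%s))" % (state_formula(s_x), state_formula(s_z))

print(transition_absent({"Pressure": "High", "Block": "On"},
                        {"Pressure": "High", "Overridden": 1}))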

Even though a set of linear nondeterministic test cases can directly be executed on an actual implementation, the result might not be satisfactory if there are too many inconclusive verdicts. Therefore, inconclusive verdicts are resolved by extending test cases with alternative branches.

8.5.2 Extending Nondeterministic Test Cases

The chances of reaching a pass or fail verdict instead of an inconclusive verdict with a test case are higher if the test case can handle alternative execution branches caused by nondeterminism. Such a test case can intuitively be interpreted as a tree instead of a linear sequence. As model checkers only create linear sequences, a possibility to create tree-like test cases is to perform iterative extension.

Whenever an inconclusive verdict is observed during test case execution, the corresponding test case is extended with an alternative branch. Consequently, the number of inconclusive results is iteratively reduced by extending and re-executing inconclusive test cases, until the execution is conclusive.

Test case extension with a model checker is quite simple. When a test case is first created, it is the result of checking a model against a property. The same model and property are used to extend the test case, but the initial state of the model is set to the state that caused the inconclusive result. For example, if test case t := 〈s0, ..., sx, sy, ..., sn〉 is inconclusive because the implementation takes (sx, sz) instead of (sx, sy), then the initial state of the model is set to sz. In NuSMV this can be done using init(var):=value expressions for all variables, where value is the value of var in sz.
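For illustration, the following Python sketch emits such init() constraints for a given state sz; the variable names and values in the example call are hypothetical, and in the modified model these lines replace the original initialization of the variables.

# Sketch: emit init() constraints that fix the model's initial state to s_z.
# The example values are hypothetical; existing init() assignments for these
# variables have to be replaced by the generated ones.

def init_constraints(state):
    return "\n".join("init(%s) := %s;" % (var, val)
                     for var, val in sorted(state.items()))

s_z = {"Pressure": "High", "Overridden": 0, "Block": "Off", "Reset": "On"}
print("ASSIGN\n" + init_constraints(s_z))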

A counterexample created from this new model and property fulfills the same purpose as the original test case (i.e., it shows the same property violation), but begins in sz. Consequently, it can be used as an alternative branch of the original test case t.

8.5.3 Improving the Test-Case Extension Process

There is no guarantee that a test case extension calculated upon an inconclusive verdict is free of nondeterministic transitions. As was noted above, this might make it necessary to run through many iterations of calculating test case extensions. If fairness with regard to the nondeterministic


choice is not guaranteed, then the approach might, in theory, never terminate. Therefore, model refinement and avoidance of nondeterministic choice were proposed as improvements.

Refinement of the model can only be done if the implementation is deterministic. Then, if a valid nondeterministic transition (sd, sn) is observed, the model can be refined to also choose this transition. In NuSMV, the conditions within a case-statement are ordered such that, for example, the second condition is only evaluated if the first condition evaluates to false. Therefore, we can add a new transition description at the beginning of the case-statement, where the condition is the conjunction of all atomic propositions in L(sd), and the next state is the value n of var in L(sn), i.e., var = n ∈ L(sn). This is illustrated in Listing 8.1 for the example transition relation first given in Figure 8.3. This refinement can be performed after each inconclusive verdict. The next time the model checker calculates a counterexample it will take the same transition as the implementation does, which increases the chances of a pass or fail verdict.

ASSIGN
init(var) := 0;
next(var) := case
  ∧x∈L(sd) x: n;
  condition1: next_value1;
  condition2: {next_valuea, next_valueb};
  ...
esac;

Listing 8.1: Refinement of a nondeterministic transition relation with regard to a deterministic implementation that chooses next value n after state sd.

If the implementation is not deterministic, then we can only terminate after a certain number of iterations if a looping behavior occurs, or try to avoid nondeterministic transitions in the first place. In NuSMV, the latter can easily be achieved by adding invariants to the model:

INVAR ND_var != 1

This example invariant ensures that NuSMV only considers states where the variable var does not have to make a nondeterministic choice. Of course there is no guarantee that the test requirement can be fulfilled on such a path, so it might be necessary to call the model checker a second time without the invariant.

8.6 Coverage of Nondeterministic Systems

Test cases can easily be extended using the presented techniques, but even if the execution of a test suite results in no inconclusive verdicts this does not guarantee thorough testing. A weak test suite might only explore a small subset of the possible behavior and avoid nondeterministic transitions. Therefore, this section considers coverage measurement for nondeterministic systems.

The use of nondeterminism has an influence not only on test case generation, but also on coverage measurement. In general, coverage criteria are used to evaluate how well certain aspects


of a system are exercised by a test suite. For example, transition coverage measures how many of a system's transitions have been executed.

If a model has nondeterministic transitions, this does not automatically mean that an implementation under test (IUT) has to implement all possible transitions. Therefore, when testing a deterministic implementation, coverage of all possible nondeterministic transitions might not be achievable.

In contrast, if the IUT is nondeterministic there is no guarantee that the behavior observed during testing is the same that will occur at runtime. It is therefore common to execute test cases repeatedly to increase certainty that the majority of possible behavior has been observed. In this case, it is advantageous to include all possible outcomes of a nondeterministic choice in the coverage criterion.

Coverage analysis with model checkers represents test cases as models by adding a special state counter variable, and setting all other variables depending only on the value of this state counter (Ammann and Black, 1999b). Normally, coverage of test suites created with model checkers can be measured without an implementation. Coverage of nondeterministic systems is not measured on the test suite itself but on execution traces created by the test case execution, because even if a test case can cover a nondeterministic transition, there is no guarantee that the IUT also does so. The execution traces can be represented as verifiable models just like test cases, as described by Ammann and Black (1999b); in the model, the values of all variables of the trace are set according to a special state counter variable.
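Following this encoding, an execution trace can be turned into a small NuSMV model whose variables are functions of a state counter. The following Python sketch illustrates the idea; variable names, types, and the exact layout are simplified assumptions, not the encoding used in the experiments.

# Sketch: encode an execution trace as a verifiable NuSMV model with a
# state counter (in the style of Ammann and Black, 1999b).
# trace: list of states (dicts variable -> value); var_types: NuSMV types.

def trace_to_model(trace, var_types):
    n = len(trace)
    lines = ["MODULE main", "VAR", "  State: 0..%d;" % (n - 1)]
    lines += ["  %s: %s;" % (var, typ) for var, typ in var_types.items()]
    lines += ["ASSIGN",
              "  init(State) := 0;",
              "  next(State) := case State < %d: State + 1; 1: State; esac;" % (n - 1)]
    for var in var_types:
        lines.append("  %s := case" % var)
        lines += ["    State = %d: %s;" % (i, s[var]) for i, s in enumerate(trace)]
        lines.append("  esac;")
    return "\n".join(lines)

print(trace_to_model([{"Pressure": "TooLow"},
                      {"Pressure": "Permitted"},
                      {"Pressure": "High"}],
                     {"Pressure": "{TooLow, Permitted, High}"}))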

We consider the coverage criteria defined by Rayadurgam and Heimdahl (2001b). The model is interpreted as a transition system M = (D, ∆, ρ), as introduced in Definition 8. D represents the state space, ∆ represents the transition relation and ρ characterizes the initial system state. A transition is defined as a tuple of logical predicates (α, β, γ), specifying pre-state, post-state, and guard, respectively. A NuSMV model is also a transition system, where conditions in the NuSMV source represent guard predicates. To allow for nondeterministic transitions, we assume that there are multiple different post-states, and extend the definition of a transition to (α, B, γ), where B is a set of predicates describing possible post-states.

Simple Transition Coverage The simple transition coverage criterion requires for each variable that all transitions are taken. In (Rayadurgam and Heimdahl, 2001b), this is defined such that "for any simple transition (α, β, γ) of any variable x there exists a test case s such that for some i, α(si) ∧ β(si, si+1) ∧ γ(si, si+1) holds." Here, si denotes the i-th state of test case s.

When testing with model checkers, a coverage criterion can be represented as a set of trap properties. These trap properties can either be used to create a test suite that satisfies the coverage criterion, or to measure the coverage of a given test suite. For example, consider the NuSMV transition description given in Figure 8.3. For variable var, there is one transition to next_value1 upon condition1. This transition can be represented as a single trap property, which claims that upon condition1, var never equals next_value1 in the next state:

� condition1 → ©¬(var = next_value1)

Checking such a trap property against a model results in a counterexample that takes the transition described by this property, and model checking test cases against this trap property results in a


counterexample if the transition is taken by a test case. The idea of formulating test cases as NuSMV models and then model checking these against trap properties is described in detail in (Ammann and Black, 1999b).

The second condition in Figure 8.3 contains a nondeterministic choice. Upon condition2, var can either be assigned next_valuea or next_valueb. Accordingly, we can define different versions of simple transition coverage:

Deterministic Simple Transition Coverage assumes a deterministic implementation. Therefore, if there is a nondeterministic transition in the model, this transition is covered if any of the possible transitions is taken. With regard to the transition model, this requires for each transition (α, B, γ) that there exists a test case s such that for some i:

∃β ∈ B : α(si) ∧ β(si, si+1) ∧ γ(si, si+1)

si denotes the i-th state of test case s. For Figure 8.3, this results in the following trap properties:

� condition1 → ©¬(var = next_value1)

� condition2 → ©¬(var = next_valuea ∨ var = next_valueb)

Nondeterministic Simple Transition Coverage measures coverage of only the nondeterministic transitions. It considers transitions (α, B, γ) with |B| > 1. For Figure 8.3, this results in the following trap properties:

� condition2 → ©¬(var = next_valuea)

� condition2 → ©¬(var = next_valueb)

If the IUT is nondeterministic, then testing has to be repeated until the tester is confident that all possible behaviors have been observed. Nondeterministic coverage can be used to guide such a decision. For example, as a minimal criterion, testing might be continued at least until all nondeterministic transitions have been fully covered.

Full Simple Transition Coverage is based on all possible transitions; i.e., if a transition is nondeterministic, all possible outcomes are included. It is a combination of deterministic and nondeterministic simple transition coverage. With regard to the transition model, this requires for each transition (α, B, γ) that there exists a test case s for every β ∈ B, such that for some i:

α(si) ∧ β(si, si+1) ∧ γ(si, si+1)

For Figure 8.3, this results in the following trap properties:

� condition1 → ©¬(var = next_value1)

� condition2 → ©¬(var = next_valuea)

� condition2 → ©¬(var = next_valueb)
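The three variants differ only in how the possible next values of a transition are turned into trap properties. The following Python sketch makes this explicit for a single variable; the branch representation and the plain LTL syntax are assumptions made for illustration.

# Sketch: trap properties for the three variants of simple transition coverage.
# A branch is (condition, value); a set-valued 'value' marks nondeterminism.

def trap(var, condition, values):
    target = " | ".join("%s = %s" % (var, v) for v in values)
    return "LTLSPEC G (%s -> X !(%s))" % (condition, target)

def simple_transition_traps(var, branches, variant):
    traps = []
    for condition, value in branches:
        values = sorted(value) if isinstance(value, (set, frozenset)) else [value]
        if variant == "deterministic":
            traps.append(trap(var, condition, values))        # any outcome covers it
        elif variant == "nondeterministic" and len(values) > 1:
            traps += [trap(var, condition, [v]) for v in values]
        elif variant == "full":
            traps += [trap(var, condition, [v]) for v in values]
    return traps

branches = [("condition1", "next_value1"),
            ("condition2", {"next_valuea", "next_valueb"})]
for prop in simple_transition_traps("var", branches, "full"):
    print(prop)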


Simple Guard Coverage Simple guard coverage extends simple transition coverage such that a transition is covered if there is a test case where it is taken, and one where it is not taken. Similarly to the versions of transition coverage, the original definition of simple guard coverage in (Rayadurgam and Heimdahl, 2001b) can be extended to the following:

Deterministic Simple Guard Coverage requires for each transition (α, B, γ) that there exist test cases s and t such that for some i and j:

∃β1 ∈ B : α(si) ∧ β1(si, si+1) ∧ γ(si, si+1)

∃β2 ∈ B : α(tj) ∧ ¬β2(tj, tj+1) ∧ ¬γ(tj, tj+1)

Nondeterministic Simple Guard Coverage requires for each nondeterministic transition (α, B, γ) where |B| > 1 that there exist test cases s and t for every β ∈ B, such that for some i and j:

α(si) ∧ β(si, si+1) ∧ γ(si, si+1)

α(tj) ∧ ¬β(tj, tj+1) ∧ ¬γ(tj, tj+1)

Full simple guard coverage can be defined as a combination of deterministic and nondeterministic simple guard coverage.

Complete Guard Coverage Complete guard coverage is similar to the multiple condition coverage criterion applied to source code. A guard or condition consists of one or more clauses. Complete guard coverage requires all possible combinations of truth values of the guard clauses to be tested.

Deterministic Complete Guard Coverage requires for each transition (α, B, γ), where γ consists of clauses {c1, ..., cl}, that there is a test case t for any given boolean vector u of length l, such that for some i:

∃β ∈ B : ∧_{k=1}^{l} (ck(ti, ti+1) = uk)

Nondeterministic Complete Guard Coverage requires for each transition (α, B, γ), where |B| > 1 and γ = {c1, ..., cl}, that there is a test case t for every β ∈ B and any given boolean vector u of length l, such that for some i:

∧_{k=1}^{l} (ck(ti, ti+1) = uk)

Full complete guard coverage can be defined as a combination of deterministic and nondeterministic complete guard coverage.



Clause-wise Guard Coverage Clause-wise guard coverage is a variant of MC/DC (Chilenski and Miller, 1994). The definition given in (Rayadurgam and Heimdahl, 2001b) can be adapted similarly to the previous criteria.

In general, a test suite achieves the clause-wise guard coverage criterion for a basic transition system M if it covers each clause of each guard γi,j in every simple transition δi,j in ∆. The definition of when a clause is covered differs for deterministic and nondeterministic coverage:

For Deterministic Clause-wise Guard Coverage, a clause c of the guard γ of a simple transition (α, B, γ) is covered if a test suite contains two test cases s and t, such that for some i and j:

1. ∃β ∈ B : α(si) ∧ β(si, si+1) ∧ γ(si, si+1); i.e., the simple transition is taken in the transition from si to si+1 in the test case s,

2. ∃β ∈ B : α(tj) ∧ ¬β(tj, tj+1) ∧ ¬γ(tj, tj+1); i.e., the simple transition is not taken in the transition from tj to tj+1 in the test case t,

3. c(si, si+1) = ¬c(tj, tj+1); i.e., the value of the clause c of the guard γ differs for the two transitions, and

4. ∀d ≠ c in γ, d(si, si+1) = d(tj, tj+1); i.e., all other clauses in the guard γ have the same value in both transitions.

For Nondeterministic Clause-wise Guard Coverage, a clause c of the guard γ of a simple transition (α, B, γ) is covered if a test suite contains two test cases s and t for each β ∈ B, such that for some i and j:

1. α(si) ∧ β(si, si+1) ∧ γ(si, si+1); i.e., the simple transition is taken in the transition from si to si+1 in the test case s,

2. α(tj) ∧ ¬β(tj, tj+1) ∧ ¬γ(tj, tj+1); i.e., the simple transition is not taken in the transition from tj to tj+1 in the test case t,

3. c(si, si+1) = ¬c(tj, tj+1); i.e., the value of the clause c of the guard γ differs for the two transitions, and


next(Overridden) := case
  Pressure = TooLow    & !(Pressure = next(Pressure)): 0;
  Pressure = TooLow    & !(Reset = On) & next(Reset) = On: 0;
  Pressure = TooLow    & !(Block = On) & next(Block) = On & Reset = Off: 1;
  Pressure = Permitted & !(Pressure = next(Pressure)): 0;
  Pressure = Permitted & !(Reset = On) & next(Reset) = On: 0;
  Pressure = Permitted & !(Block = On) & next(Block) = On & Reset = Off: 1;
  Pressure = High      & !(Pressure = next(Pressure)): 0;
  1: Overridden;
esac;

Figure 8.5: Deterministic version of the NuSMV transition relation for variable Overridden.

4. ∀d ≠ c in γ, d(si, si+1) = d(tj, tj+1); i.e., all other clauses in the guard γ have the same value in both transitions.

Again, full clause-wise guard coverage can be defined as a combination of deterministic and nondeterministic clause-wise guard coverage.

8.7 Experimental Results

This section presents the results of an empirical evaluation using two example NuSMV models. The Safety Injection System (SIS) example was introduced by Bharadwaj and Heitmeyer (1999) and has previously been used for testing research (e.g., (Gargantini and Heitmeyer, 1999)). The system is responsible for injecting reserve water in a nuclear reactor safety system if the water pressure is too low. Depending on various conditions, the system is overridden. Cruise Control (CC) is based on a version by Kirby (1987), and has also been used several times for automated test case generation, e.g., (Ammann et al., 1998, 2001).

Details of the models can be found in (Bharadwaj and Heitmeyer, 1999) and (Kirby, 1987). We modified both models to be nondeterministic with regard to overriding; i.e., the output is chosen nondeterministically in most cases when overriding is activated.

Figure 8.5 shows the deterministic version of the transition relation of variable Overridden for the SIS example. As an example, we assume that the exact behavior with regard to overriding is not fully specified (underspecification); e.g., it might not be of interest to the verification process, it could be too early in the development process so that the full behavior is not known, or it might simply be an allowed implementation choice. The specification should only require the safety injection to work correctly. For this, a nondeterministic version of the specification is created.

Figure 8.6 lists the relevant section containing the nondeterministic decisions of the NuSMV model. The NuSMV model is used to derive test cases with different coverage criteria. For each criterion, a set of trap properties is created from the deterministic model. In our experiments, test cases are executed on a simple Python implementation that conforms to the deterministic model.


next(Overridden) := case
  Pressure = TooLow    & Pressure != next(Pressure): 0;
  Pressure = TooLow    & Reset != On & next(Reset) = On: 0;
  Pressure = TooLow    & Block != On & next(Block) = On & Reset = Off: 1;
  Pressure = Permitted & Pressure != next(Pressure): {0, 1};
  Pressure = Permitted & Reset != On & next(Reset) = On: {0, 1};
  Pressure = Permitted & Block != On & next(Block) = On & Reset = Off: 1;
  Pressure = High      & Pressure != next(Pressure): {0, 1};
  1: Overridden;
esac;

Figure 8.6: Nondeterministic transition relation for variable Overridden.

Table 8.1: Test case generation and extension for the SIS example.

Method           Initial Tests   Deterministic IUT          Nondet. IUT
                                 Inconclusive  Iterations   Inconclusive  Iterations
State            211             0%            0            0%            0
Transition       13              7.7%          1            29.2%         1.8
Condition        27              11.1%         1            19.6%         1.9
Transition-Pair  87              24.1%         3            32.4%         4.1
Reflection       289             11.1%         4            14.6%         3.2

The deterministic models are used to create sets of trap properties for test case generation for different common criteria: State coverage requires each variable to take all its values, Transition coverage requires all transitions described in the NuSMV model to be taken, Condition coverage tests the effects of atomic propositions within transition conditions, and Transition Pair coverage requires all possible pairs of transitions described in the NuSMV model to be taken. Finally, Reflection describes a set of trap properties that is created by reflecting the transition relation as properties, and then applying various mutation operators to these properties (Black, 2000).

Figure 8.7 shows an example trace generated for the trap property � (Pressure = High ∧ ¬(WaterPres < Permitted) → © (¬(WaterPres < Permit) → Pressure = Permitted)), edited for brevity. To the right of the counterexample, the corresponding linear nondeterministic test case is depicted. States are abbreviated with their number according to the counterexample. The nondeterministic choice is highlighted with a gray node. This nondeterministic choice differs from how the implementation behaves, therefore the test case execution is inconclusive (Overridden is true in state 34). Figure 8.8 shows the counterexample resulting from the new test requirement, where the initial state equals the choice made by the implementation (state 34). The original test case and the new test case are merged (the new counterexample is appended as an alternative branch to state 33), and now the implementation passes the test case.

Test cases are generated using the trap properties and the nondeterministic models. The models are implemented in Python as both deterministic and nondeterministic versions, in order to experiment with test case extension and execution.


-> State: 1.1 <-
  Reset = On
  Overridden = 0
  Block = Off
  WaterPres = 2
  Pressure = TooLow
  SafetyInjection = On
  ND_Overridden = 0
-> State: 1.2 <-
  WaterPres = 5
-> State: 1.3 <-
  WaterPres = 8
...
-> State: 1.34 <-
  Overridden = 1
  WaterPres = 101
  Pressure = High
  ND_Overridden = 1
-> State: 1.35 <-
  Block = On
  ND_Overridden = 0

Figure 8.7: NuSMV counterexample for the trap property � (Pressure = High ∧ ¬(WaterPres < Permitted) → © (¬(WaterPres < Permit) → Pressure = Permitted)), and corresponding linear test case. States are abbreviated by their numbers in the counterexample. Nondeterministic choice is highlighted with a gray node.

-> State: 1.1 <-
  Reset = On
  Overridden = 0
  Block = Off
  WaterPres = 101
  Pressure = High
  SafetyInjection = Off
  ND_Overridden = 0
-> State: 1.2 <-
  Reset = Off
  WaterPres = 109

Figure 8.8: NuSMV counterexample for the same trap property � (Pressure = High ∧ ¬(WaterPres < Permitted) → © (¬(WaterPres < Permit) → Pressure = Permitted)) but a new initial state. The counterexample is used to extend the nondeterministic test case.

The results of the test case generation are listed in Tables 8.1 and 8.2. Values for nondeterministic


Table 8.2: Test case generation and extension for the CC example.

Method           Initial Tests   Deterministic IUT          Nondet. IUT
                                 Inconclusive  Iterations   Inconclusive  Iterations
State            15              0%            0            0%            0.0
Transition       12              8.3%          4            28.3%         6.2
Condition        41              9.3%          4            34.1%         4.4
Transition-Pair  63              46.0%         2            45.4%         3.6
Reflection       272             23.2%         5            30.5%         10.0

Table 8.3: Simple guard coverage after execution on a nondeterministic IUT without extension.

Method           SIS                            CC
                 Det.    Nondet.  Full          Det.    Nondet.  Full
State            76.9%   58.3%    75.0%         72.2%   62.5%    66.7%
Transition       88.1%   69.2%    85.6%         100%    75.0%    91.7%
Condition        84.6%   74.2%    81.25%        100%    73.8%    91.3%
Transition-Pair  95.4%   82.5%    93.1%         100%    78.8%    92.9%
Reflection       96.3%   90.8%    95.4%         100%    86.3%    95.4%

IUTs are averaged over 10 runs. The number of inconclusive verdicts is given as a percentage of the size of the initial test suite, and the number of iterations necessary to resolve all inconclusive verdicts is also given. State coverage is a weak criterion and leads to very short test cases, which resulted in no inconclusive verdicts. Transition-pair coverage creates the longest test cases of the considered trap properties, and therefore nondeterministic transitions occur more often than with the other criteria. With the exception of transition-pair coverage test suites, the number of inconclusive verdicts is relatively small. The number of iterations necessary to remove all inconclusive verdicts is slightly larger for a nondeterministic implementation, but it is still small enough to make the approach feasible.

Table 8.3 lists the coverage values determined after executing the initial test suites on nondeterministic implementations; the values are again averaged over 10 runs. Simple guard coverage was chosen as an example criterion. If the IUT is deterministic, then deterministic coverage is a more realistic criterion, while full coverage might not be achievable. Deterministic coverage is achieved rather quickly with a nondeterministic IUT, therefore nondeterministic coverage is better suited as an indicator of whether a nondeterministic IUT needs more testing iterations.

8.8 Summary

In this chapter, we have presented an extension to model checker based techniques for test case generation and coverage measurement that allows the use of nondeterministic models and implementations. Current techniques are limited to deterministic models and deterministic systems, because the counterexamples that are used as test cases are deterministic, even if the corresponding model is nondeterministic. Using NuSMV as an example language, a straightforward rewriting to add information about nondeterministic choice in counterexamples was presented. This makes it possible to distinguish


between test cases that fail because of errors, and test cases that are inconclusive because an alternative nondeterministic path was chosen. It was shown how, in case of inconclusive results, linear test cases can be extended to tree-like test cases that can cope with alternative branches. Coverage criteria based on nondeterministic models and coverage measurement of nondeterministic systems were discussed, and several known coverage criteria were adapted to a nondeterministic setting.

The presented methods apply to both deterministic and nondeterministic implementations, and some of the coverage criteria are especially useful when testing nondeterministic implementations. The applicability depends on the amount of nondeterminism in the system that is to be tested. Asynchronous, distributed systems are likely to cause too many inconclusive results for the methods to be feasible. Therefore, the intended application domains include nondeterminism as a means of underspecification or implementation choice, and limited nondeterminism in the IUT.

Current model checkers do not indicate nondeterministic choice in their counterexamples. It is, however, conceivable to extend counterexample generation algorithms such that indicators are included automatically, avoiding the need to rewrite the model.


Chapter 9
Test Suite Minimization

The contents of this chapter have been published in (Fraser and Wotawa, 2007b).

9.1 Introduction

Software testing is a process that consumes a large part of the effort and resources involved in software development. Especially during regression testing, when software is re-tested after some modifications, the size of the test suite has a large impact on the total costs. Therefore, the idea of test suite reduction (also referred to as test suite minimization) is to find a minimal subset of the test suite that is sufficient to achieve the given test requirements.

Various heuristics have been proposed to approximate a minimal subset of the test suite. These techniques can reduce the number of test cases in a test suite significantly. However, experiments have revealed that the quality of the test suite suffers from this minimization. Even though the test requirements with regard to which the minimization was made are still fulfilled by the minimized test suite, it has been shown that the overall ability to detect faults is reduced. In many scenarios, especially in the case of safety-related software, such a degradation is unacceptable.

This chapter introduces a novel approach to test suite reduction. This approach tries to identify those parts of the test cases that are truly redundant. Redundancy in this context means that there are no faults that can be detected with the redundant part of a test case but not without it. Instead of discarding test cases from a test suite, the test cases are transformed such that the redundancy is avoided. That way, the test suite is minimized with regard to the number of test cases and the total number of states, while neither test coverage nor fault detection ability suffer from the degradation experienced in previous approaches.

The approach uses the state information that is included in functional tests created with model checker based test case generation approaches. The model checker is also used within an optimized version of the approach. An empirical evaluation shows that the approach is feasible.


9.2 Traditional Test Suite Reduction

During regression testing the software is re-tested after some modifications. The costs of running a complete test suite against the software repeatedly can be quite high. In general, not all test cases of a test suite are necessary to fulfill some given test requirements. Therefore, the aim of test suite reduction is to find a subset of the test cases that still fulfills the test requirements. The original test suite reduction problem is defined by Harrold et al. (1993) as follows:

Given: A test suite TS, a set of requirements r1, r2, ..., rn that must be satisfied to provide the desired test coverage of the program, and subsets of TS, T1, T2, ..., Tn, one associated with each of the ri such that any one of the test cases tj belonging to Ti can be used to test ri.

Problem: Find a representative set of test cases from TS that satisfies all ri.

The requirements ri can represent any test case requirements, e.g., test coverage. A representative set of test cases must contain at least one test case from each subset Ti. The problem of finding the optimal (minimal) subset is NP-hard. Therefore, several heuristics have been presented (Harrold et al., 1993; Gregg Rothermel, 2002; Zhong et al., 2006).
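As a concrete illustration of this classical, lossy style of reduction, the following Python sketch shows the standard greedy set-cover approximation for the representative-set problem. It is not the specific heuristic of Harrold et al. (1993), and the data layout is an assumption.

# Sketch: greedy approximation of a representative set.
# coverage maps each test case to the set of requirements it satisfies.

def greedy_reduce(coverage):
    uncovered = set().union(*coverage.values())
    selected = []
    while uncovered:
        best = max(coverage, key=lambda t: len(coverage[t] & uncovered))
        if not coverage[best] & uncovered:
            break                      # remaining requirements cannot be covered
        selected.append(best)
        uncovered -= coverage[best]
    return selected

print(greedy_reduce({"t1": {"r1", "r2"}, "t2": {"r2", "r3"}, "t3": {"r3"}}))
# -> ['t1', 't2']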

Test suite reduction results in a new test suite, where only the relevant subset remains and the other test cases are discarded. Intuitively, removing any test case might reduce the overall ability of the test suite to detect faults. In fact, several experiments (Jones and Harrold, 2003; Rothermel et al., 1998; Heimdahl and Devaraj, 2004) have shown that this is indeed the case, although there are other claims (Wong et al., 1995). Note that the reduction of fault sensitivity would also occur when using an optimal instead of a heuristic solution. A different use of minimization is presented by Zeller (2002), where test data is minimized in order to isolate failure causes.

In this chapter we introduce a new approach to test suite minimization which does not have a negative influence on the fault detection ability.

9.3 Test Suite Redundancy

Previously, redundancy was used to describe test cases that are not needed in order to achieve a certain coverage criterion. As the removal of such test cases leads to a reduced fault detection ability, they are not really redundant in a generic way. In contrast, we say a test case contains redundancy if part of the test case does not contribute to the fault detection ability. This section aims to identify such redundancy, and describes possibilities to reduce it.

9.3.1 Identifying Redundancy

Intuitively, identical test cases are redundant. For any two test cases t1, t2 such that t1 = t2, any fault that can be detected by t1 is also identified by t2 and vice versa, assuming the test case execution framework assures identical preconditions for both tests. Similarly, the achieved coverage for any coverage criterion is identical for both t1 and t2. Clearly, a test suite does not need both t1 and t2.

The same consideration applies to two test cases t1 and t2, where t1 is a prefix of t2. t1 is subsumed by t2, therefore any fault that can be detected by t1 is also detected by t2 (but not vice versa). In this


case, t1 is redundant and is not needed in any test suite that contains t2. In model-based testing it is common practice to discard subsumed and identical test cases at test case generation time (Ammann et al., 1998).

This leads to the kind of redundancy which we are interested in: Model checker based test case generation techniques often lead to test suites where all test cases begin with the same initial state. From this state on, different paths are taken, but many of these paths are equal up to a certain state. Any fault that occurs within such a sub-path can be detected by any of the test cases that begins with this sub-path. Within these test cases, the sub-path is redundant.

This kind of redundancy can be illustrated by representing a set of test cases as a tree. The initial state that all test cases share is the root node of this tree. A sub-path is redundant if it occurs in more than one test case. In the tree representation, any node below the root node that has more than one child node contains redundancy. If there are different initial states, then there is one tree for each initial state.

Definition 50 (Test Suite Execution Tree). Test cases ti = {s0, s1, ..., sl} of a test suite TS can be represented as a tree, where the root node equals the initial state common to all test cases: root(TS) = s0. For each successive, distinct state sj a child node is added to the previous node si:

sj : (si, sj) ∈ ti → sj ∈ children(si)

The depth of the tree equals the length of the longest test case in TS. The set of child nodes of node x is denoted as children(x). Consider a test suite consisting of three test cases (letters represent distinct states): "A-B-C", "A-C-B", "A-C-D-E". The execution tree representation of these test cases can be seen in Figure 9.1(a). The rightmost C-state has two children, therefore the sub-path A-C is contained in two test cases; it is redundant. The execution tree can be used to measure redundancy:

Figure 9.1: Simple test suite with redundancy represented as execution tree. (a) 17% redundancy; (b) no redundancy.


Definition 51 (Test Suite Redundancy). The redundancy R of a test suite TS is defined with the help of the execution tree:

R(TS) = 1/(n − 1) · ∑_{x ∈ children(root(TS))} R(x)    (9.1)

The redundancy of the tree is the ratio of the sum of the redundancy values R for the children of the root node and the number of arcs in the tree (n − 1, with n nodes). The redundancy value R is defined recursively as follows:

R(x) = (|children(x)| − 1) + ∑_{c ∈ children(x)} R(c)   if children(x) ≠ {}
R(x) = 0                                                if children(x) = {}    (9.2)

The example test suite depicted as tree in Figure 9.1(a) has a total of 7 nodes, where one node besides the root node has more than one child. Therefore, the redundancy of this tree equals R = 1/(7 − 1) · ∑_{x ∈ children(root(TS))} R(x) = 1/6 · (0 + (1 + 0)) = 1/6 ≈ 17%.

A test suite contains no redundancy if for each initial state there are no test cases with common prefixes, e.g., if there is only one test case per initial state.
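Definitions 50 and 51 translate directly into a few lines of code. The following Python sketch builds the execution tree for test cases sharing a common initial state and computes R(TS); it reproduces the 17% of the example above. The representation of test cases as sequences of hashable states is an assumption.

# Sketch: execution tree (Definition 50) and redundancy (Definition 51)
# for a test suite whose test cases share a common initial state.

def build_tree(test_suite):
    """Map every node (identified by its path prefix) to its set of children."""
    children = {}
    for test in test_suite:
        for i in range(len(test) - 1):
            node = tuple(test[:i + 1])
            children.setdefault(node, set()).add(test[i + 1])
    return children

def redundancy(test_suite):
    children = build_tree(test_suite)
    nodes = {tuple(t[:i + 1]) for t in test_suite for i in range(len(t))}
    # Branching below the root counts as redundancy; the root's own branching
    # does not, since sharing only the initial state is unavoidable.
    extra = sum(len(c) - 1 for node, c in children.items() if len(node) > 1)
    return extra / (len(nodes) - 1)

suite = ["ABC", "ACB", "ACDE"]          # letters represent distinct states
print(round(redundancy(suite), 2))      # -> 0.17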

9.3.2 Removing Redundancy

Having identified redundancy, the question now is how to reduce it. This section introduces an approach to solve this problem. It has already been stated that the removal of test cases from a test suite has a negative impact on the fault detection ability, therefore this is not an option. Instead, the proposed solution is to transform the test cases such that the redundant parts can be omitted.

For each test case ti of test suite TS a common prefix among the test cases is determined. If such a prefix is found, then the test case is redundant for the length of the prefix and only interesting after the prefix. If there is another test case tj that ends with the same state as the prefix does, then the remainder of the test case ti can be appended to tj, and ti can safely be discarded. This algorithm is shown in Figure 9.2. It is of interest to find the longest possible prefixes, therefore the search for prefixes starts with the length of the test case under examination and then iteratively reduces the length down to 1. This also guarantees that duplicate or subsumed test cases are eliminated.

The function find_test searches for a test case that ends with the same state as the currently considered prefix, its worst-case time complexity therefore is O(|TS|). The complexity of has_prefix is O(n) as it depends on the prefix length. Appending and deleting test cases take constant time. These operations are nested in a loop over |TS|, which in turn is called for all possible prefix lengths. Finally, this is done for each test case in TS. Therefore, the worst-case complexity of this algorithm is O(|TS|² · n · (|TS| + n)); with realistic test suite sizes it is still applicable. The algorithm terminates for every finite test suite. In the listing, t[n] denotes the n-th state of test case t, and t[−1] the last state of t.
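For reference, the helper functions used in Figure 9.2 could look as follows when test cases are plain lists of states. This is only a sketch, not the prototype implementation; note that the pseudocode's 1-indexed t[n] corresponds to t[n − 1] in Python.

# Sketch of the helpers used in Figures 9.2 and 9.3 (test cases as lists of states).

def has_prefix(t2, t, n):
    """Do t2 and t share the first n states?"""
    return len(t2) >= n and t2[:n] == t[:n]

def find_test(test_suite, state):
    """Return some test case ending in 'state', or None."""
    for candidate in test_suite:
        if candidate and candidate[-1] == state:
            return candidate
    return None

def append_postfix(t3, t, n):
    """Append the part of t after its first n states to t3 (in place)."""
    t3.extend(t[n:])

t, t3 = ["A", "C", "B"], ["A", "B", "C"]
n = 2                                      # common prefix A-C
assert find_test([t3], t[n - 1]) is t3     # the pseudocode's t[n]
append_postfix(t3, t, n)
print(t3)                                  # -> ['A', 'B', 'C', 'B']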

The algorithm has to make non-deterministic choices when selecting a test case as a source for the prefix, when selecting a test case to look for the common prefix, and when searching for a test case to append to. These choices have an influence on how fast a test suite is processed. In addition, the


for all t ∈ TS do
  for n ← length(t) downto 1 do
    for all t2 ∈ TS do
      if has_prefix(t2, t, n) ∧ t2 ≠ t then
        t3 ← find_test(TS, t[n])
        if t3 ≠ None then
          append_postfix(t3, t, n)
          delete(TS, t)
          break
        end if
      end if
    end for
  end for
end for

Figure 9.2: Test suite transformation.

number of test cases remaining in the final reduced test suite also depends on these choices. The success of the reduction depends on whether there are suitable test cases where parts of other test cases can be appended. A test case that is necessary for removal of a long common prefix might be used to append another test case with a shorter common prefix earlier. In that case, the long prefix could not be removed unless there was another suitable test case. Determination of the optimal order would have to take all permutations of the test suite order into consideration and is therefore not feasible. In practice, the algorithm is implemented such that test cases are selected sequentially in the order in which they are stored in the test suite.

Figure 9.1(b) illustrates the result of this optimization applied to Figure 9.1(a). The test case A-C-B has the common prefix A-C, and there is a test case ending in C, therefore the postfix B of A-C-B is appended to A-B-C, resulting in A-B-C-B.

This algorithm optimizes the total costs of a test suite with respect to two factors: It reduces the total number of test cases (test suite size), and it reduces the overall number of states contained in the test suite (test suite length). In the resulting test suite individual test cases can be longer than in the original test suite. We assume that the costs of executing two test cases of length n are higher than that of executing one test case of length 2 · n because of setup and tear-down overhead. Therefore, it is preferable to have fewer but longer test cases instead of many small ones. This assumption is for example also made in (Hamon et al., 2004), where the test case generation aims to create fewer but longer test cases.

While the computational complexity of the algorithm is high, the success depends on the actual test suite. A test suite might contain significant redundancy but have few test cases that are suitable for appending, in which case not much optimization can be achieved. In addition, the order in which test cases are selected has an influence on the results.

As we assumed a model checker based test case generation approach, we can make use of the model checker for optimization purposes. If appending is not possible, then the model checker can be used to create a 'glue' sequence to append the postfix to an arbitrary test case. Of course the model checker is not strictly necessary to perform this part; there are other possibilities to find a path in


the model. However, the model checker is a convenient tool for this task, especially if it is already used for test case generation in the first place. Figure 9.3 lists the extended algorithm. The function choose_nondeterministic(TS) chooses one test case out of the test suite TS non-deterministically. This choice has an influence on the length of the resulting glue sequence. An optimal algorithm would have to consider the lengths of all such possible glue sequences, and therefore calculate all of them. A distance heuristic is conceivable, which estimates the distance between the final state of a test case and the state the glue sequence should lead to. For reasons of simplicity, the prototype implementation used for experiments in this chapter makes a random choice.

for all t ∈ TS do
  for n ← length(t) downto 1 do
    for all t2 ∈ TS do
      if has_prefix(t2, t, n) ∧ t2 ≠ t then
        t3 ← find_test(TS, t[n])
        if t3 ≠ None then
          append_postfix(t3, t, n)
          delete(TS, t)
          break
        end if
      else
        t3 ← choose_nondeterministic(TS)
        if t3 ≠ t then
          s ← create_sequence(t3[−1], t[n])
          append(t3, s)
          append_postfix(t3, t, n)
          delete(TS, t)
          break
        end if
      end if
    end for
  end for
end for

Figure 9.3: Test suite transformation with glue sequences.

The function create_sequence calls the model checker in order to create a suitable glue sequence. A sequence from state a to state b can be created by verifying a property that claims that such a path does not exist. If such a sequence exists, the counterexample consists of a sequence from the initial state to a, and then a path from a to b. For example, when using computation tree logic (CTL) (Clarke and Emerson, 1982), this query can be stated as: AG (a -> !(EF b)).

The presented algorithms reduce both the number of test cases and the total test suite length, while previous methods selected subsets of the test suite. Therefore, the effects on the quality of the resulting test suite are different.

Each step of a test case adhering to Definition 9 fully describes the system state. A model checker trace consists of the values of all input and output variables as well as internal variables. A fault is


Therefore, any fault that occurs deterministically at a certain state can be detected with a step of a test case, no matter when this step is executed. As the test suite reduction guarantees that only redundant steps that are parts of prefixes are removed, any fault that can be detected by a test suite TS can also be detected by the test suite resulting from the reduction of TS. It is conceivable that there are faults that do not occur deterministically at certain system states. For example, a fault might only occur after a certain sequence has been executed, or if a state is executed a certain number of times. However, we have not found such a fault in our experiments. Furthermore, it is equally possible that the transformation leads to test cases that can detect previously missed non-deterministic faults.

Definition 20 allows arbitrary properties for measuring test coverage. Whether the test suite reduction has an impact on the test coverage depends on the actual properties. If the coverage depends on the order of steps that are not directly adjacent in the test case, then splitting a prefix from a test case and appending the remainder to another test case can reduce the coverage. For example, transition pair coverage (Offutt et al., 1999) requires all pairs of transitions to be covered, and a transition pair can be split during the transformation. However, the appending can also lead to previously uncovered transition pairs. In practice, many coverage properties do not consider the execution order, e.g., transition or full-predicate coverage (Offutt et al., 1999), or coverage criteria based on the model checker source file (Rayadurgam and Heimdahl, 2001b).

9.4 Empirical Evaluation

This section presents the results of an empirical evaluation of the concepts described in the previous sections. The evaluation aims to determine how much reduction can be achieved with the presented algorithms, and how they perform in comparison to other approaches. Furthermore, the effects on coverage and mutant score are analyzed.

9.4.1 Experiment Setup

The experiment uses three examples, each consisting of a model and specification written in the language of the model checker NuSMV (Cimatti et al., 1999). For each model, 23 different test suites are created with different methods as described in Chapters 3 and 6 (various coverage criteria for coverage based methods, different mutation operators for mutation based approaches, and property based methods). In addition, a set of mutant models is created for each model. The use of a model checker allows the detection of equivalent mutants, therefore only non-equivalent mutants are used for the evaluation of the mutant score. Car Control (CA) is a simplified model of a car control. The Safety Injection System (SIS) example was introduced in (Bharadwaj and Heitmeyer, 1999) and has since been used frequently for studying automated test case generation. Cruise Control (CC) is based on (Kirby, 1987); a set of faulty implementations for this example was written by Jeff Offutt. The presented algorithms are implemented in Python, and the symbolic model checker NuSMV is used.
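For readers unfamiliar with the tool chain, the following sketch shows one way such a Python prototype might drive NuSMV to turn a single trap property into a test case: the property is appended to a copy of the model, NuSMV is run in batch mode, and the printed counterexample is parsed into a list of states. The parsing is deliberately simplified (it assumes the model file contains no other specifications, and the output format can vary between NuSMV versions); it is a sketch, not the implementation used in the experiments.

import subprocess, tempfile

def generate_test_case(model_file, trap_property):
    # Append the trap property to a copy of the model and run NuSMV in batch mode.
    with open(model_file) as f:
        model = f.read()
    with tempfile.NamedTemporaryFile("w", suffix=".smv", delete=False) as tmp:
        tmp.write(model + "\nLTLSPEC " + trap_property + "\n")
        path = tmp.name
    output = subprocess.run(["NuSMV", path], capture_output=True, text=True).stdout
    # Parse the counterexample: NuSMV prints "-> State: i.j <-" headers followed by
    # "variable = value" lines; unchanged variables are carried over from the previous state.
    states, current = [], None
    for line in output.splitlines():
        if line.strip().startswith("-> State:"):
            current = {} if current is None else dict(current)
            states.append(current)
        elif current is not None and "=" in line:
            var, val = [part.strip() for part in line.split("=", 1)]
            current[var] = val
    return states  # an empty list means the trap property holds, so no test case is generated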


9.4.2 Lossy Minimization with Model Checkers

For comparison purposes, a traditional minimization approach is applied to the model checker scenario, similarly to Heimdahl and Devaraj (2004). Model-based coverage criteria can be expressed as trap properties (Gargantini and Heitmeyer, 1999). The test cases are converted to models, and the model checker is then challenged with the resulting models and the trap properties. For each trap property that results in a counterexample it is known that the test case covers the corresponding item.

A minimized subset of the test suite achieving a criterion can be determined by calculating the covered properties for each test case, and then iteratively selecting the test case that covers the most yet uncovered properties. We choose transition coverage as the first example coverage criterion. Black (2000) proposed a test case generation approach based on mutation of the reflected transition relation. The mutated, reflected properties can be used similarly to trap properties for test case generation, to determine a kind of mutant score, and also for minimization. In order to distinguish this from the mutant score determined by execution of the test cases against mutant models, we dub the former reflection coverage. Chapter 3 contains all details about coverage and mutation based test case generation and analysis.
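A minimal sketch of this greedy selection, assuming the coverage relation has already been computed as described above (a mapping from each test case to the set of trap properties it covers; the names are illustrative):

def minimize(coverage):
    # coverage: test case id -> set of trap properties covered by that test case
    uncovered = set().union(*coverage.values())
    remaining = set(coverage)
    selected = []
    while uncovered and remaining:
        # pick the test case that covers the most yet uncovered properties
        best = max(remaining, key=lambda t: len(coverage[t] & uncovered))
        if not coverage[best] & uncovered:
            break  # no remaining test case adds coverage
        selected.append(best)
        uncovered -= coverage[best]
        remaining.remove(best)
    return selected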

9.4.3 Results

Tables 9.1, 9.2 and 9.3 list the average values of the minimization of the 23 test suites for the three example models. "Redundancy" denotes the algorithm in Figure 9.2, and "Redundancy+" the extended version of Figure 9.3. In all cases the coverage-based reduction techniques result in smaller test suites than the direct redundancy based approach. The extended redundancy based approach comes close to the coverage based approaches with respect to test suite size. The test suite length is reduced proportionally to the test suite size for coverage based techniques, while, as expected, the redundancy based length savings are not as significant. Again, the extended algorithm achieves better results, showing that the potential saving in redundancy is bigger than what is added by the glue sequences. In general, even though the reduction in the total length is smaller with the redundancy approaches than with the coverage approaches, it is still significant and shows that the approach is feasible.

The test coverage of coverage minimized test suites is not changed for the criterion that is used for minimization, while a degradation with the other criterion is observable. In contrast, the redundancy based approach has no impact on the coverage of either criterion. The extended redundancy approach even leads to a minor increase of the coverage, due to the glue sequences. As for the mutant score, the coverage based approaches lead to a degradation of up to 16%, while the redundancy approach has no impact on the mutant score, and the extended redundancy approach again results in a slight increase. Figure 9.4 sums up the results of the experiments for all models and test suites. As these experiments use only models and mutants of the models, this raises the question whether the results are different with regard to actual implementations. Therefore, the Cruise Control test suites are run against the set of faulty implementations. Table 9.4 lists the results. They are in accordance with those achieved with model mutants, which indicates the validity also for implementations.

Figure 9.5 illustrates the effects of the order in which test cases are selected at several points in the algorithm as box-plots.


Table 9.1: Results in average for Cruise Control example.

Method        Size    Length   Redundancy   Transition Coverage   Mutant Score (Reflection)   Mutant Score
Original      36,55   213,77   44,55%       89,16%                95,93%                      87,31%
Transition     6,23    35,6    32,00%       89,16%                94,46%                      79,70%
Reflection     6,59    37,36   30,30%       78,67%                95,93%                      73,42%
Redundancy    27,91   186,09   36,99%       89,16%                95,93%                      87,31%
Redundancy+    8,95   152,73    4,82%       89,86%                96,44%                      89,13%

Table 9.2: Results in average for SIS example.

Method        Size    Length   Redundancy   Transition Coverage   Mutant Score (Reflection)   Mutant Score
Original      21,87   644,04   10,45%       89,28%                95,15%                      78,29%
Transition     3,26    84,17    2,51%       89,28%                93,42%                      68,72%
Reflection     4,91   126,3     4,53%       87,63%                95,15%                      72,70%
Redundancy    14,48   440,39    6,15%       89,28%                95,15%                      78,29%
Redundancy+    5,52   268,78    0,46%       91,23%                96,19%                      81,42%

The box-plots illustrate minimum, maximum, median and standard deviation for the achieved reduction with the 23 test suites per example, each randomly sorted 5 times. Figure 9.5(a) shows the effects on the test suite sizes. As the use of glue sequences makes it possible to append to any test case, the order has no effect on the resulting test suite size in our experiments; therefore there is no deviation. There is insignificant variation when not using glue sequences, and also only minor variation in the test suite length (Figure 9.5(b)). In contrast, the choice of a test case to append to using a glue sequence has a visible influence on the resulting test suite length. This suggests the use of a distance heuristic instead of the random choice.

Both presented algorithms have a high worst-case complexity. However, many factors contribute to the performance: the test suite size, the lengths of the test cases, the contained redundancy, the suitability of test cases for the transformation, the order in which test cases are selected, the effort of calculating glue sequences, etc. Figure 9.6 depicts the performance of the minimization for the three example models for different test suite sizes, executed on a PC with an Intel Core Duo T2400 processor and 1 GB RAM.

Table 9.3: Results in average for Car Control example.

Method        Size    Length    Redundancy   Transition Coverage   Mutant Score (Reflection)   Mutant Score
Original      54,09   1351,91   10,44%       95,78%                96,07%                      92,84%
Transition     3,36     71,32    3,05%       96,78%                93,76%                      85,09%
Reflection     7,68    152,27    5,30%       95,54%                96,07%                      87,62%
Redundancy    25,68   1182,82    4,69%       95,78%                96,07%                      92,84%
Redundancy+   11,36   1058,05    1,17%       99,03%                97,53%                      95,16%


[Bar chart comparing Size, Length, Redundancy, Transition Coverage, Reflection Coverage, and Mutant Score for the methods Original, Transition, Reflection, Redundancy, and Redundancy+.]

Figure 9.4: Comparison of reduction methods, average percentage over all three example models and 23 test suites each.

Table 9.4: Mutant scores for cruise-control implementation mutants.

Original   Transition   Reflection   Redundancy   Redundancy+
75,8%      39,1%        37,2%        75,8%        76,5%

[Box-plots for CA, CA+, SIS, SIS+, CC, and CC+: (a) Test Suite Size Reduction; (b) Test Suite Length Reduction.]

Figure 9.5: Effects of the test case order, as percentage value of original sizes and lengths. Minimization using glue sequences is denoted by a '+' after the example name.

Notably, the computation time for the car controller example increases more than for the other examples. This example has a bigger state space, therefore appending is not easily possible. Figure 9.6(b) shows that there is less difference in the increase in computation time when using glue sequences. The additional computational effort introduced by the generation of the glue sequences is very small compared to its effect.


However, performance measurement is difficult, as the redundancy is not constant across the test suites used for measurement. In order to examine the scalability of the approach, minimization was also tested on a complex example with a significantly bigger test suite. The example is a windscreen wiper controller provided by Magna Steyr. For a set of 8000 test cases, basic minimization takes 35m22s. This example also shows the effects of the model complexity, as the calculation of glue sequences is costly for this model: minimization with glue sequences takes 2h1m56s. Obviously, the performance is specific to each application and test suite, but it seems to be acceptable in general.

[Plots of minimization time in seconds against the number of test cases for CC, SIS, and CA: (a) Direct approach; (b) Using glue sequences.]

Figure 9.6: Minimization time vs. test suite size.

9.5 Summary

In this chapter we have introduced an approach to minimize the size of a test suite with regard to the number of test cases and the total length of all test cases. The approach detects redundancy within the test suite and transforms test cases in order to avoid the redundancy. In contrast to previous approaches, under certain conditions the quality of the resulting test suites does not suffer from this minimization with regard to test coverage or fault detection ability. In fact, experiments showed that the resulting test suites can even be slightly improved. The experiments also showed that the reduction is significant, although not as large as with approaches that heuristically discard test cases.

One drawback of this approach is the run-time complexity of the algorithm. However, even without further optimizations the approach is applicable to realistic test suites without problems. The transformation relies on information that might not be available in all test suites: complete state information is necessary, as is provided by model checker based test case generation approaches. There are several possibilities to continue work on this approach:

• It would be desirable to optimize the basic algorithm with regard to its worst case execution time.

• The non-deterministic choice might not always lead to the best results. Heuristics for choosing test cases could lead to better reduction.


• The algorithms presented in this chapter sequentially analyze the test cases in a test suite. Therefore, a single run might not immediately eliminate all the redundancy. It is conceivable to iteratively call the algorithm until the redundancy is removed completely. This is likely to lead to test suites of very small size, where each test case is very long.

• In this chapter, a scenario of model checker based testing was assumed. It would be interesting to evaluate the applicability to other settings.

• The presented definition of redundancy only considers common prefixes. However, common path segments might also exist within test cases. Consideration of this kind of redundancy might lead to further optimizations.


Chapter 10
Test Case Prioritization

The contents of this chapter have been published in (Fraser and Wotawa, 2007a).

10.1 Introduction

It has been shown (Rothermel et al., 1999) that the order in which the test cases of a test suite are executed has an influence on the rate at which faults can be detected. For example, test cases can be sorted such that a given coverage criterion like statement coverage is reached as fast as possible. While this prioritization of test cases can increase the rate at which faults are detected in the first run, the idea can be extended to include information about cost factors, e.g., the costs of test case execution or the costs of certain faults that can be detected with certain test cases (Elbaum et al., 2001). Furthermore, when a test suite is reused many times for regression testing, information about the version changes can be incorporated (Elbaum et al., 2000), and histories of detected faults (Kim and Porter, 2002) can be included.

In this chapter we demonstrate how test case prioritization can be performed with the use of model checkers. As common prioritization techniques are based on program source-code, these techniques have to be adapted to the model-based setting. In addition, new property based prioritization techniques made possible by the use of model checkers are introduced.

Obviously, a model checker based method for test case prioritization is a useful addition to model checker based test case generation approaches. We therefore show how prioritization can be done at test case generation time when using model checkers to create test cases. That way, no post-processing of the test suites is necessary while still achieving an improved fault detection rate of the resulting test suite. The ideas described in this chapter are illustrated using several example applications.

10.2 Test Case Prioritization

In this section the ideas of test case prioritization are recalled, and techniques for prioritization are presented.


10.2.1 Preliminaries

Test case prioritization is the task of finding an ordering of the test cases of a given test suite such that a given goal is reached faster. The test case prioritization problem is defined by Rothermel et al. (1999) as follows:

Given: T, a test suite; PT, the set of permutations of T; f, a function from PT to the real numbers.

Problem: Find T′ ∈ PT such that (∀T′′)(T′′ ∈ PT)(T′′ ≠ T′)[f(T′) ≥ f(T′′)].

PT is the set of all possible orderings of T, and f is a function that yields an award value for any given ordering it is applied to. f represents the goal of the prioritization. For example, the goal might be to reach a certain coverage criterion as fast as possible, or to improve the rate at which faults are detected. There are different test case prioritization techniques that can be used to achieve such goals.

10.2.2 Test Case Prioritization Techniques

Several different prioritization methods have been discussed in previous works (Rothermel et al., 1999; Elbaum et al., 2000). These methods are generally based on the source code of a program, e.g., the coverage of statements or functions. In contrast, when using a model checker to determine the prioritization we base the techniques on a functional model of the program under test. This section does not provide a complete overview of all available prioritization techniques but selects a representative subset that can be used to illustrate the usefulness of model checkers in the prioritization process. In addition, the use of a model checker allows new kinds of prioritization techniques, which are introduced in this section.

Total Coverage Prioritization: There are several code-based prioritization methods that sort test cases by the number of statements or functions they cover. Model checker based testing allows the formulation of coverage criteria as properties, as described in the next section. We therefore generalize from different code based methods to a coverage based method which is applicable to any coverage criterion expressible as a set of properties.

For example, the model-based coverage criterion Transition Coverage requires that each transition in an automaton model is executed at least once. Test case prioritization according to transition coverage sorts test cases by the number of different transitions executed.

Additional Coverage Prioritization: Total coverage prioritization achieves that those test cases with the biggest coverage are executed first. This does not necessarily guarantee that the coverage criterion is achieved as fast as possible. Additional coverage prioritization first picks the test case with the greatest coverage, and then successively adds those test cases that cover the most yet uncovered parts.
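The difference between total and additional sorting is easy to state in code. The sketch below assumes the same kind of coverage mapping as in the minimization sketch of Chapter 9 (test case to set of covered items); the names are illustrative only.

def total_prioritization(coverage):
    # sort all test cases by the number of items they cover, largest first
    return sorted(coverage, key=lambda t: len(coverage[t]), reverse=True)

def additional_prioritization(coverage):
    # greedily pick the test case that adds the most yet uncovered items;
    # test cases adding nothing new are ranked by their total coverage
    order, uncovered, remaining = [], set().union(*coverage.values()), set(coverage)
    while remaining:
        best = max(remaining, key=lambda t: (len(coverage[t] & uncovered), len(coverage[t])))
        order.append(best)
        uncovered -= coverage[best]
        remaining.remove(best)
    return order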


Total FEP Prioritization: This technique orders test cases by their ability to expose faults (fault-exposing potential). Mutation analysis (DeMillo et al., 1978) is used to determine these values. For a given program a set of mutants is created by the application of a set of mutation operators. Each application of a mutation operator creates a mutant of the source code that differs from the original by a single valid syntactic change. The mutation score represents the ratio of mutants that a test suite can distinguish from the original program. This mutation score can be calculated for each test case separately, and then used as an award value for test case prioritization. Total FEP prioritization uses the mutation score for a total sorting.

Additional FEP Prioritization: Similarly to additional coverage based prioritization, test cases can be sorted by the number of additional, yet undetected mutants. First the test case with the highest mutation score is chosen, and then successively those test cases are added that maximize the total number of detected mutants. Traditionally, this FEP based prioritization is computationally more complex than coverage based methods.

Total Property Prioritization: This is a new technique made possible by the use of model checkers. It is based on the idea of property relevance (Fraser and Wotawa, 2006a). A test case consists of values that are used as input data for the system under test, i.e., they represent the inputs the system receives from its environment. A test case is said to be relevant to a requirement property if a property violation is possible when the input values are provided to an erroneous implementation. In practice, this can be determined by checking whether there is a mutant that can violate the property. A test case can of course be relevant to more than one property. Total property prioritization sorts test cases by the number of properties they are relevant to.

Additional Property Prioritization: Similarly to the previous techniques, this method begins with the test case that is relevant to the most properties and then successively adds test cases that are relevant to yet uncovered properties.

Hybrid Property and Coverage Prioritization: If the number of properties is significantly smaller than the number of test cases, then a property based prioritization can quickly achieve property coverage. In general, the prioritization of the remaining test cases starts again with the test case with the highest award value. However, it is also conceivable to combine two different award functions. For example, it can be useful to sort test cases totally based on the number of relevant properties, and then use a coverage prioritization as a secondary sorting method within test cases of equal property relevance. We use transition coverage as the secondary award value in our experiments.

Random Prioritization: Random prioritization is interesting for the evaluation of the different techniques. On average, any sorting method should achieve better results than random prioritization in order to be useful. We therefore use random prioritization as a lower bound for our analysis.


Optimal Prioritization: The optimal prioritization sorts test cases such that a given set of faults is detected with the minimum number of test cases. This technique is not applicable in practice as it requires a-priori knowledge about the faults that are to be exposed. However, in experiments with known mutants it serves as an upper bound for the improvements that can be achieved with prioritization.

10.3 Determining Prioritization with Model Checkers

In this section we show how the prioritization methods presented in the previous section can be performed in practice. As mentioned, we use model checkers for prioritization. In order to do so it is necessary to re-formulate test cases as models, which allows analysis with regard to certain properties. Section 3.8.1 illustrated how this is performed.

10.3.1 Coverage Prioritization

Model-based coverage criteria can be expressed as trap properties (Gargantini and Heitmeyer, 1999) (see Chapter 3). For each coverable item one such property is formulated, expressing that the item cannot be reached. For example, a trap property might claim that a certain state is never reached or that a certain transition is never taken. Challenging a model checker with a model and a trap property results in a counterexample, which is a trace illustrating how the item described by the trap property is reached. This principle is used for test case creation, where it automatically results in test suites that achieve a given coverage criterion. It is also used to measure the coverage of test suites. The test cases are converted to models as described above, and then the model checker is challenged with the resulting models and the trap properties. For each trap property that results in a counterexample it is known that the test case covers the corresponding item.
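Section 3.8.1 gives the actual encoding of test cases as models; purely as an illustration, a test case can be turned into an SMV model by introducing a step counter and defining every variable as a function of that counter. The sketch below makes simplifying assumptions (all variables are Boolean, and the trace wraps around at the end) and is not the encoding used in the thesis.

def test_case_to_smv(states):
    # states: list of dicts mapping variable names to SMV values, one dict per step
    n = len(states)
    lines = ["MODULE main", "VAR", "  step : 0..%d;" % (n - 1)]
    for var in states[0]:
        lines.append("  %s : boolean;" % var)
    lines += ["ASSIGN", "  init(step) := 0;",
              "  next(step) := (step + 1) mod %d;" % n]
    for var in states[0]:
        lines.append("  %s := case" % var)
        for i, state in enumerate(states):
            lines.append("    step = %d : %s;" % (i, state[var]))
        lines.append("  esac;")
    return "\n".join(lines)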

While for overall coverage measurement it is sufficient to check how many trap properties are violated, this can easily be extended such that each test case is checked against all trap properties. That way the overall coverage of each test case can be determined. This information can be used in order to sort test cases according to their coverage, either totally or additionally. The prioritization works as follows:

Create models from test cases
Create trap properties TP from coverage criterion
for each test case model t do
  Model-check t against TP
  Each trap resulting in a counterexample is covered
end for
Sort test cases by number of covered traps

10.3.2 FEP Prioritization

Fault exposing prioritization is based on mutation analysis. Model and specification mutation was introduced by Ammann and Black (1999b). The ability to expose faults can be measured as the mutation score of a test case.


Chapter 3 contains all details about mutation based test case generation and analysis.

With model checkers, the FEP can be determined in two ways. One option is to create mutants of a given model, and then symbolically execute the test cases against these models by combining the mutant model and the test case model, using the test case values as input values for the mutant. A mutant is detected if the model checker returns a counterexample when queried whether the output values of mutant and test case are equal along the test case. Unlike coverage based methods, this requires the model checker to use the actual model in addition to the test case model. If the model is complex, then this process is less efficient than the coverage based method.

The alternative is to reflect the transition relation of the model as special properties (Black, 2000). Each reflected property refers to one variable. For each possible transition a variable can take, there is one such property. It consists of the transition condition and makes an assertion about the value of the variable in the next state. These reflected properties can then be mutated instead of the original model. When checked against the original model, the mutated properties result in efficient test suites (Ammann et al., 2002). A mutation score can be efficiently calculated by checking these properties against the test case models. This prioritization is therefore identical to coverage based prioritization apart from the use of mutated reflected properties instead of trap properties.

10.3.3 Property Prioritization

Property prioritization uses the concept of property relevance. A test case is relevant to a property if the execution of the test case can theoretically lead to a violation of the property. As presented in Chapter 6, property relevance can be determined with the aid of a model checker by symbolically executing the test case against a modified model which is allowed to take one single erroneous transition. The model checker then efficiently determines if a single erroneous transition is sufficient in order to reach a property violating state during the test case execution. This process has to be repeated for each test case.

Create modified model M′ from model M
Create models from test cases
for each test case model t do
  Combine t and M′ such that M′ takes input values from t instead of the environment
  Model-check M′ against all requirement properties
  t is relevant to each property causing a counterexample
end for
Sort test cases by relevance

While the complexity of this evaluation process can be higher than for coverage or reflection based methods, it is only necessary to challenge the model checker once with each test case, so this is still significantly more efficient than the determination of the mutation score using symbolic execution would be. Once the property relevance of each test case has been determined, this information can be used in order to calculate a total or additional prioritization for the test cases.


10.3.4 Optimal Prioritization

The optimal execution order of a test suite with regard to a set of mutants is calculated with a greedy algorithm that successively adds next the test case that detects the most yet undetected mutants.

10.4 Prioritizing Test Cases at Creation Time

When creating test cases automatically it is often the case that redundant test cases are created. If a new test case is a prefix of another test case, it is sufficient to retain the subsuming, longer test case. If a new test case subsumes other test cases, it is sufficient to retain the new test case. Redundant test cases are usually discarded. However, this redundancy information can also be used to prioritize test cases. If a test case or part of it is created more than once, this can be seen as an indication that this test case is more important than other test cases. With this information prioritization can be performed without post-processing of the test suite.

Each test case is assigned an importance value, initially 1. If a test case is a prefix of another test case or equal to it, the importance of this other test case is increased. If a test case subsumes other test cases, then its importance is the sum of the importance values of the subsumed test cases plus 1:

while t = create next test case do
  importance of t ← 1
  if ∃t′ ∈ T : t = t′ then
    increase importance of t′ by 1
  else if ∃t′ ∈ T : t ⊂ t′ then
    increase importance of t′ by 1
  else if ∃t′ ∈ T : t ⊃ t′ then
    for all t′ ∈ T : t ⊃ t′ do
      replace t′ with t in T
      increase importance of t by importance of t′
    end for
  else
    insert t in T
  end if
end while
sort test cases by importance

Applied to coverage-based test case generation using a model checker, the resulting sorting represents a total sorting by the number of covered traps. Similarly, when using a reflection-based approach the sorting is based on the number of mutants. However, the sorting is not necessarily identical to that resulting from a dedicated analysis, as test cases can still cover more traps or mutants.
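A sketch of this importance bookkeeping follows; the parallel lists, helper names, and the representation of test cases as lists of states are illustrative assumptions rather than the prototype's actual data structures.

def insert_with_importance(new_trace, suite, importance):
    # suite: list of traces; importance: list of importance values, parallel to suite
    def is_prefix(a, b):
        return len(a) <= len(b) and b[:len(a)] == a

    for i, t in enumerate(suite):
        if is_prefix(new_trace, t):      # new test case equals or is a prefix of an existing one
            importance[i] += 1
            return
    subsumed = [i for i, t in enumerate(suite) if is_prefix(t, new_trace)]
    if subsumed:                         # new test case subsumes existing test cases
        value = 1 + sum(importance[i] for i in subsumed)
        for i in sorted(subsumed, reverse=True):
            del suite[i]
            del importance[i]
        suite.append(new_trace)
        importance.append(value)
    else:
        suite.append(new_trace)
        importance.append(1)

After generation, sorting the indices by importance, e.g. sorted(range(len(suite)), key=importance.__getitem__, reverse=True), yields the presorted execution order.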

10.5 Empirical Results

This section presents the results of an empirical evaluation aiming to show that model-based test case prioritization results in a noticeable performance improvement. We also want to analyze the newly defined property based prioritization techniques in comparison to well-known techniques.


Finally, we want to determine whether prioritization at test case generation time results in a measurable improvement.

10.5.1 Average Percentage of Faults Detected

In order to quantify the efficiency gains achieved with a certain test case prioritization, the metric APFD (Average Percentage of Faults Detected) was introduced by Rothermel et al. (1999). This metric is the weighted average percentage of faults detected over the life of a test suite. The APFD of a test suite T consisting of n test cases and m mutants is defined as:

APFD = 1 − (TF1 + TF2 + ... + TFm) / (n · m) + 1 / (2 · n)
Here, TFi is the position of the first test case in the ordering T′ of T which reveals fault i. We use this metric in order to compare the different prioritization techniques.
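In code, the metric is straightforward to compute from an ordering and the information which test cases reveal which faults; the following sketch assumes that this mapping is already available (for example from running the test suite against the mutants) and uses hypothetical names.

def apfd(order, revealing_tests):
    # order: list of test case ids in execution order
    # revealing_tests: fault id -> set of test case ids that reveal the fault
    n, m = len(order), len(revealing_tests)
    position = {t: i + 1 for i, t in enumerate(order)}    # 1-based positions in the ordering
    tf = [min(position[t] for t in tests) for tests in revealing_tests.values()]
    return 1 - sum(tf) / (n * m) + 1 / (2 * n)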

10.5.2 Experiment Setup

The evaluation is based on a set of three examples. Each example consists of an SMV model and specification. Different model checker based methods (various coverage criteria, different mutation operators, property based methods) are used in order to create 23 different test suites for each model. For each model a set of mutants is created. Unlike for program mutation, a model checker can efficiently determine whether a model mutant is equivalent to the original or not. The APFD values for each of the test suites are calculated using the subset of the inequivalent model mutants that can be detected by the test suite. Table 10.1 sums up the results of the test case generation and the numbers of detected mutants. Only detectable mutants are relevant for the determination of the APFD value, as the test case execution order has no influence on undetectable mutants. Car Control (CA) is a simplified model of a car control. The Safety Injection System (SIS) example was introduced by Bharadwaj and Heitmeyer (1999) and has since been used frequently for studying automated test case generation. Cruise Control (CC) is based on (Kirby, 1987). In order to validate the method we also use a set of 25 erroneous mutant implementations of the Cruise Control example application, written by Jeff Offutt.

Table 10.1: Test Suite statistics.

Example       CA            SIS           CC
              Avg    Max    Avg    Max    Avg    Max
Test Cases     51    243     22     85     35    246
Mutants       264    311    265    339    535    732

10.5.3 Results

Following the tradition of previous research on test case prioritization, we use box-plots to illustrate the results of the APFD analysis. The box-plots illustrate minimum, maximum, median and standard deviation for each of the prioritization methods used.


As can be seen in Figures 10.1, 10.3 and 10.4, there is still a gap between all prioritization techniques and the optimal prioritization. However, there is a clearly visible improvement compared to the random sorting of the test cases and the original sorting, as provided by the test case generation algorithm. Figure 10.6 lists the average APFD values for all examples and methods in a concise manner. The improvement is not always as significant as reported in previous works. This is probably because we used test suites of different sizes, and the improvement is not quite so obvious for large test suites. In general, a large amount of the mutants is detected with the first couple of test cases (Figure 10.5), yet the remaining test cases and mutants can distort the APFD value if there are many test cases. Nevertheless, an improvement is visible. Figure 10.2 illustrates the APFD values for the same test suites (except the optimal one) as in Figure 10.1, executed with the 25 erroneous implementations of Cruise Control. The values are comparable and we conclude that model-based prioritization is also valid for real implementations.

[Box-plots of APFD values for the prioritization methods Presorted, additional and total FEP, additional and total Transition, additional and total Property, Hybrid, Optimal, Original, and Random.]

Figure 10.1: APFD of Cruise Control model.

The prioritization performed at test case generation time (labeled Presorted in the figures) is clearly better than random ordering; however, there is still a gap between presorted test suites and post-processing prioritization. This gap is also visible in Figure 10.5. Interestingly, the presorted test suites performed better than most other prioritization techniques during the evaluation on the cruise control implementations. In general we can conclude that prioritization during test case generation is definitely useful, especially as the additional computational costs are negligible.

There are only minor differences between the various prioritization techniques. In general, those techniques that use additional sorting perform slightly better than those with total sorting. Property prioritization performs well (see also Figure 10.5); in fact it sometimes outperforms coverage based prioritization techniques. However, this case study does not reflect on the quality of the specification. It is conceivable that a specification consisting of more and better properties will result in better property based prioritization.

While model checkers in general are prone to performance problems, this is not a problem for prioritization, as the state space of test case models is usually significantly smaller than that of the related functional models.


[Box-plots of APFD values for the same prioritization methods as in Figure 10.1, evaluated on the implementation mutants.]

Figure 10.2: APFD of Cruise Control implementation.

[Box-plots of APFD values for the same prioritization methods as in Figure 10.1.]

Figure 10.3: APFD of Safety Injection System.


10.6 Summary

In this chapter we have demonstrated how model checkers can be used for test case prioritization. This makes it possible to efficiently apply prioritization when creating test cases with model checkers. We adapted several well-known prioritization methods originally based on source-code to models. In addition we introduced new property-based prioritization methods. Finally, we showed that test case prioritization can be performed automatically during test case generation, without post-processing.


[Box-plots of APFD values for the same prioritization methods as in Figure 10.1.]

Figure 10.4: APFD of Car Model.

[Plot of the percentage of detected faults against the number of executed test cases for the Random, Optimal, Total Property, and Presorted orderings.]

Figure 10.5: Fault detection rates for Cruise Control example.


The experiments described in this chapter showed the applicability of test case prioritization to model checker based testing. However, in these experiments many factors were not yet included. For example, the actual test case execution costs, the costs of potential faults, the fault histories during regression testing, or the importance or criticality of requirement properties could be used to optimize test case prioritization. Weighting factors can easily be included in the prioritization process, as illustrated by Elbaum et al. (2000). The property based prioritization techniques introduced in this chapter also open up new possibilities for combination with approaches that assign different cost measures to properties (Srikanth and Williams, 2005).


[Bar chart of average APFD values per prioritization method for the Car, SIS, and CC examples.]

Figure 10.6: Comparison of average APFD values.



Chapter 11
Improving the Performance with LTL Rewriting

The contents of this chapter have been published in (Fraser and Wotawa, 2007c).

11.1 Introduction

Recently, the use of model checkers for the automated generation of test cases has seen increased attention. Counterexamples produced by model checkers are interpreted as test cases. The approach is straightforward, fully automated, and achieves good results.

The main drawback lies within the performance limitations of model checkers. The state space explosion problem severely limits the applicability of model checker based testing approaches. It is therefore necessary to find ways to improve the performance and applicability. Another disadvantage of model checker based approaches is the structure of the resulting test suites. Often, test suites consisting of large numbers of very short test cases are created. In contrast, if longer test cases are created, they often share long identical prefixes. This adversely affects the time and costs of the execution of a resulting test suite.

While ultimately the size of the model used for test case generation determines the applicability of any model checker based technique, there are other factors that contribute to possible bad performance. For example, sometimes large numbers of duplicate test cases are created. In this chapter we identify sources of redundancy that contribute to possible bad performance during test case creation and execution, and describe an approach based on temporal logic formula rewriting that can be used to reduce the number of model checker queries significantly. Consequently, the overall time it takes to create a complete test suite is reduced. In detail, the contributions of this chapter are as follows:

• We show how temporal logic formula rewriting can be used to efficiently avoid the creation of redundant test cases.


• In addition to improving the performance of test case generation by avoiding unnecessary calls to the model checker, we show that the rewriting can also be used to extend test cases such that the resulting number of test cases and their overall length is reduced. This can be seen as an improvement of the performance of the test case execution.

• We suggest different algorithms to efficiently generate test suites using model checkers and rewriting.

• Finally, we use an example model for a detailed evaluation of the presented ideas in order to show their feasibility.

11.2 Advanced Test Case Generation with LTL Rewriting

The straightforward approach to generating test cases with a model checker and trap properties is to simply model-check all trap properties sequentially. This possibly results in duplicate test cases, or test cases that are subsumed by other test cases. We refer to such test cases as redundant. The creation of redundant test cases unnecessarily consumes time. If model-checking a single property is costly due to the complexity of the model or the property, then the time wasted can be significant. This can be avoided by determining whether a property is already covered by a previous test case, and calling the model checker only if it is not.

Theoretically, a model checker could be used to determine whether a trap property is already covered by a test case. Ammann and Black (1999b) present a straightforward approach to represent test cases as SMV models and then simply model-check the test case model against a trap property in order to determine whether it is covered; this approach is described in Section 3.8.1. Intuitively, the state space of a test case model is significantly smaller than that of the full model. The use of regular model checker techniques and tools, however, is not the optimal solution with regard to performance. For example, consider a complex model with a large number of trap properties and also a large number of test cases created up to a certain point in the test case generation process. During the test case generation, each test case would have to be converted to a model. Then, the model checker would have to be called for this model, in order to check all remaining trap properties. Repeating this each time a test case is created would result in a large number of model checker calls, which would be inefficient. Clearly, a more efficient solution is necessary under such circumstances. Markey and Schnoebelen (2003) analyze the problem of model-checking paths and show that there are more efficient solutions than checking Kripke structures.

Runtime verification is commonly based upon determining whether a finite path satisfies a temporal logic property. In contrast to model-checking it does not use an explicit model, but only execution traces. For example, the NASA runtime verification system Java PathExplorer (JPaX) (Havelund and Rosu, 2001b) uses monitoring algorithms for LTL. Properties are rewritten using the states of a trace. That way, violation of a property can be efficiently detected during runtime or during analysis after the execution. This idea is also useful for test case generation. If the rewriting is applied to the trap properties after creating a test case, then all trap properties that are already covered can be efficiently detected before calling the model checker on them.

This section therefore presents an approach that uses LTL rewriting in order to detect already covered trap properties efficiently, and thus increase the overall performance of the test case generation process.


11.2.1 LTL Rewriting

An efficient method to monitor LTL formulae is to rewrite them using the states of an execution trace. The rewriting approach we present here is based on work by Havelund and Rosu (2001a). Their implementation uses a rewriting engine that is capable of 3 million rewritings per second, which shows that rewriting is an efficient and fast approach. There are approaches that try to further optimize this technique, e.g., (Barringer et al., 2004; Havelund and Rosu, 2004; Rosu and Havelund, 2005).

In the domain of runtime verification, one important aspect is the optimization with regard to space demands. Long execution runs can create very long execution traces and lead to space problems. This problem does not exist in the domain of model checker based test case generation, as the creation of the traces is the overall objective. In order for counterexamples to serve as usable test cases, their size always has to be within bounds. Therefore, space constraints do not have to be considered when choosing an algorithm for LTL monitoring.

In runtime verification, LTL rewriting is used to decide when a fault has occurred. In the context of test case generation, the rewriting can be used to determine whether it is necessary to create a trace from a trap property before actually calling the model checker. If there exists a test case that already covers the trap property, then there is no need to create another test case for this trap. This is achieved by evaluating a formula using the value assignments of a state, and by rewriting temporal operators. If a trace violates a property, then at a violating state the formula is rewritten to a contradiction, i.e., it is false.

Monitoring LTL properties for runtime verification is generally based on finite trace semantics that are different from the infinite trace semantics presented in Section 3.2. Finite trace semantics consider only finite traces, therefore special treatment of the last state of a finite trace is necessary. For example, one possibility is to assume that the last state is repeated after the end of the trace. Another possibility is to define that no proposition holds after the last state. For example, this changes the meaning of the □ operator. In the context of model checker based test case generation we do not need to consider this. It is only of interest whether a trap property is violated somewhere along a test case. If it is not violated at the end of a finite trace, satisfaction is not of interest. The only conclusion that needs to be drawn is that it is not yet covered.

The rewriting of property φ with state s is recursively defined below, where a ∈ AP denotes an atomic proposition, φ denotes a temporal logic formula, and s ∈ S for a Kripke structure K = (S, s0, T, L). φ{s} denotes that state s is applied to the formula φ. Application of a state to a formula determines whether the propositions valid in that state have an effect on the formula. The parts of the formula that refer to the present state are instantiated according to L(s), while affected temporal operators are rewritten according to the rules. The rewriting in Definition 52 differs from that given by Havelund and Rosu (2001a) in order to reflect the different semantics applied; the final state of a trace is not treated specially.

Page 180: Automated Software Testing with Model Checkers

164 Chapter 11. Improving the Performance with LTL Rewriting

Definition 52 (State Rewriting).

(□ φ){s} = φ{s} ∧ □ φ   (11.1)
(○ φ){s} = φ   (11.2)
(◇ φ){s} = φ{s} ∨ ◇ φ   (11.3)
(φ1 U φ2){s} = φ2{s} ∨ (φ1{s} ∧ (φ1 U φ2))   (11.4)
(φ1 ∧ φ2){s} = φ1{s} ∧ φ2{s}   (11.5)
(φ1 ∨ φ2){s} = φ1{s} ∨ φ2{s}   (11.6)
(φ1 → φ2){s} = φ1{s} → φ2{s}   (11.7)
(φ1 ≡ φ2){s} = φ1{s} ≡ φ2{s}   (11.8)
(¬φ){s} = ¬(φ{s})   (11.9)
a{s} = false if a ∉ L(s), true otherwise   (11.10)
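A direct way to implement these rules is over a small formula representation; the sketch below (not the Antlr-based implementation mentioned later in this chapter) encodes formulae as nested tuples and a state as the set of atomic propositions that hold in it. The operator names and the constant-propagation step are choices made for this sketch only.

def rw(f, s):
    # Apply state s (a set of atomic propositions) to formula f according to Definition 52.
    if isinstance(f, bool):
        return f
    if isinstance(f, str):                               # atomic proposition, rule (11.10)
        return f in s
    op = f[0]
    if op == "G":                                        # (11.1)
        return simplify(("and", rw(f[1], s), f))
    if op == "X":                                        # (11.2)
        return f[1]
    if op == "F":                                        # (11.3)
        return simplify(("or", rw(f[1], s), f))
    if op == "U":                                        # (11.4)
        return simplify(("or", rw(f[2], s), ("and", rw(f[1], s), f)))
    if op == "not":                                      # (11.9)
        return simplify(("not", rw(f[1], s)))
    return simplify((op, rw(f[1], s), rw(f[2], s)))      # and, or, implies, equiv: (11.5)-(11.8)

def simplify(f):
    # Constant propagation so that violations show up as the Boolean value False.
    if not isinstance(f, tuple):
        return f
    if f[0] == "not":
        return (not f[1]) if isinstance(f[1], bool) else f
    a, b = f[1], f[2]
    if f[0] == "and":
        if a is False or b is False: return False
        if a is True: return b
        if b is True: return a
    if f[0] == "or":
        if a is True or b is True: return True
        if a is False: return b
        if b is False: return a
    if f[0] == "implies":
        if a is False or b is True: return True
        if a is True: return b
    return f

def covered(trap, trace):
    # A trap property is covered as soon as rewriting along the trace yields False.
    f = trap
    for s in trace:
        f = rw(f, s)
        if f is False:
            return True
    return False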

As a simple example, consider a trap property that forces the creation of a test case which contains a transition from a state where x is true and y is false to any state where x is true. To achieve this, the property claims that such a transition does not exist:

φ := □ ((x ∧ ¬y) → ○ ¬x)

A previously checked trap property might have resulted in the following test case:

t := 〈(x, y), (x,¬y), (x,¬y), (¬x,¬y)〉

Obviously, φ is covered by this test case, as the transition from the second to the third state is the one described by φ. In order to detect this, φ is rewritten using the states of the test case sequentially. Application of the first state (x, y) of t is performed as follows:

φ{x, y} = (□ ((x ∧ ¬y) → ○ ¬x)){x, y}
        = ((x ∧ ¬y) → ○ ¬x){x, y} ∧ □ ((x ∧ ¬y) → ○ ¬x)
        = ((x ∧ ¬y){x, y} → (○ ¬x){x, y}) ∧ φ
        = ((x{x, y} ∧ (¬y){x, y}) → (¬x)) ∧ φ
        = ((true ∧ false) → (¬x)) ∧ φ

Which can be simplified to:

        = (false → (¬x)) ∧ φ
        = true ∧ φ
     φ1 = φ


Rewriting with the first state does not change φ. The second state, however, does affect φ:

φ1{x, ¬y} = (□ ((x ∧ ¬y) → ○ ¬x)){x, ¬y}
          = ((x ∧ ¬y) → ○ ¬x){x, ¬y} ∧ φ
          = ((x ∧ ¬y){x, ¬y} → (○ ¬x){x, ¬y}) ∧ φ
          = ((x{x, ¬y} ∧ (¬y){x, ¬y}) → (¬x)) ∧ φ
          = ((true ∧ true) → (¬x)) ∧ φ
       φ2 = ¬x ∧ φ

The third state (x,¬y) is now applied to φ2:

φ2{x, ¬y} = (¬x ∧ φ){x, ¬y}
          = (¬x){x, ¬y} ∧ φ{x, ¬y}
          = false ∧ φ{x, ¬y}
       φ3 = false

After rewriting with the third state a contradiction results; therefore it can be concluded that φ is covered by t. Hence, there is no need to call the model checker with the trap property φ.

11.2.2 Test Case Generation

The basic approach to automated test case generation with model checkers is to sequentially call the model checker with one trap property after the other. Integrating the formula rewriting is therefore easy. Either all remaining properties are checked after creating a test case, or each property is checked against all previous test cases before calling the model checker. The simple algorithm MON in Figure 11.1 shows the latter possibility. The worst case scenario is that of n trap properties, where each property results in a unique test case that only covers the property used for its creation. For an average test case length of l, this means that the rewriting procedure would be called

∑_{k=1}^{n} (k − 1) · l

times. The maximum number of calls to the model checker is n, with and without the use of rewriting. Obviously, in order to improve the overall performance, rewriting a property has to be significantly faster than model-checking a property.

Due to the use of the rewriting method, the order in which trap properties are selected has an influence on the results with regard to both the performance and the test suite size. Consider two trap properties φ1 and φ2, resulting in test cases t1 and t2. Now assume that t1 only covers φ1, while t2 covers both φ1 and φ2. If φ1 is chosen first, then the model checker is called for both properties, resulting in t1 and t2. Here, t1 can even be a prefix of t2, in which case it would be completely redundant. In contrast, if φ2 were chosen first, then the resulting t2 would cover φ2 and φ1, thus avoiding that the model checker is called with φ1 in the first place. In Figure 11.1, the choice of the next trap property is non-deterministic.

Even if the formula rewriting does not show that a trap property is covered, the result of the rewriting can be useful. If the transformation of a property with a test case results in a formula that is different from the original, this is an indication that the trace affects the property, although it does not yet cover it.


function covered(trap, traces)
  for all trace in traces do
    trap′ ← trap
    for all s in trace do
      trap′ ← trap′{s}
      if trap′ == False then
        return True
      end if
    end for
  end for
  return False
end function

function CTC_MON(Model M, Traps T)
  traces ← []
  for all trap ∈ T do
    if ¬covered(trap, traces) then
      append(traces, createTrace(M, trap))
    end if
  end for
  return traces
end function

Figure 11.1: Algorithm MON: Test case generation with monitoring by rewriting.


For example, assume a trap property that requires a test case such that there is a state where x is true, upon which a state where y is true follows. To achieve this, the trap property expresses that whenever x is true, ¬y follows: φ := □ (x → ○ ¬y). Assume further a test case of the shape t := 〈(¬x, ¬y), (¬x, y), (x, ¬y)〉, i.e., the test case ends with a state where x is true. This test case could simply be extended with one state where y is true in order to also cover φ. Even though t does not cover φ, the transformation changes the property to ¬y ∧ (□ (x → ○ ¬y)). The fact that the property changed can be seen as an indication that the test case can be extended. In the example, only one additional transition is needed to cover φ, while a new test case to cover φ starting in the initial state is likely to be longer. In general, the extension sequence of the existing test case is likely to be shorter than a distinct test case for the property, as there is no prefix necessary to reach a relevant state, and part of the temporal logic formula is already achieved.

In order to use rewritten properties as trap properties, it is necessary to place them within a next operator, such that the model checker creates at least one new transition:

φ′ = ○ (φ{s})

The final state of the trace that is extended serves as the initial state of the new model; the next operator is therefore necessary in order to avoid duplicate evaluation of that state.


The algorithm EXT1 in Figure 11.2 shows how this can be incorporated into the test case generation. Again, a trap property is checked against the previous test cases using rewriting. If the trap property is not covered, then the results of the rewriting process are compared to the original trap property. Any rewritten trap property that differs from the original property suggests that the corresponding test case is related and can be extended. If there are several test cases suitable for extension, then one of the test cases has to be chosen. In Figure 11.2 this is the second non-deterministic choice besides the choice of the next trap property. The function extendTrace calls the model checker to create a new counterexample beginning with the final state of the trace that is to be extended. The actual implementation of this function depends on the model checker that is used. If the model checker does not support setting the initial state explicitly, a possible alternative is to rewrite the initial state in the model source file. After changing the initial state, the model checker is called with the trap property, resulting in a counterexample. This new counterexample is appended to the previous trace.

function CTC_EXT1(Model M, Traps T)
  traces ← []
  for all trap in T do
    if ¬covered(trap, traces) then
      if exists trace t : trap{t} ≠ trap then
        extendTrace(t, M, trap)
      else
        append(traces, createTrace(M, trap))
      end if
    end if
  end for
  return traces
end function

Figure 11.2: Algorithm EXT1: Extending test cases with affected trap properties.
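The implementation of extendTrace depends on the model checker; as noted above, for NuSMV-style models one option is to rewrite the initial state in the model source. The following sketch takes the simpler route of adding an INIT constraint that pins the initial state to the final state of the trace (which only works if it does not contradict existing init assignments), wraps the rewritten trap property in a next operator, and relies on a hypothetical counterexample-generation helper analogous to the generate_test_case sketch given earlier; all names are illustrative, not the prototype's API.

def extend_trace(trace, model_text, rewritten_trap):
    last = trace[-1]  # assumed to assign a value to every variable
    init_constraint = " & ".join("%s = %s" % (var, val) for var, val in last.items())
    modified_model = model_text + "\nINIT " + init_constraint + "\n"
    # X (...) makes the model checker produce at least one new transition
    extension = generate_counterexample(modified_model, "X (" + rewritten_trap + ")")
    # the first state of the extension duplicates the final state of the trace
    return trace + extension[1:]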

Finally, the monitoring idea can also be integrated into a test case generation approach similar to the idea presented by Hamon et al. (2004). In the algorithm EXT2 shown in Figure 11.3, trap properties are used to extend a test case until it reaches a certain maximum length MAX. If MAX is reached, then a new test case is started in the initial state of the model.

As with the other algorithms, the choice of the next trap property has an influence on the results. It is possible to use the rewriting to guide this choice. In contrast to the previous two algorithms, EXT2 applies the transformation to all remaining trap properties after a test case extension. All trap properties that are already covered are removed. If a trap property is changed by the transformation, the changed version is stored. After a test case has been extended, it is only necessary to use the extension for the rewriting instead of the whole test case.

The advantage of this approach is that all trap properties affected by the current test case are identified. By preferring changed trap properties over unchanged ones, the overall test suite length can be reduced. If no trap property is affected, Figure 11.3 prefers those trap properties that were affected earlier during creation of the current test case, or else chooses one of the remaining trap properties. If a new test case is started, then the rewritten trap properties have to be reset to their original versions.


versions.

function CTC_EXT2(Model M, Traps T)
    traces ← []
    current_trace ← []
    while ¬ empty(T) do
        trap ← trap affected by previous rewriting, or random trap
        if length(current_trace) < MAX then
            extendTrace(current_trace, M, trap)
        else
            append(traces, current_trace)
            reset rewritten traps
            current_trace ← createTrace(M, trap)
        end if
        for all trap ∈ T do
            if covered(trap, current_trace) then
                remove trap
            else if trap changed by rewriting then
                save rewritten trap
            end if
        end for
    end while
    return traces
end function

Figure 11.3: Algorithm EXT2: Extending up to maximal depth.
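The preference order for the next trap property described above can be implemented with a few lines of bookkeeping. The sketch below assumes that the sets of traps affected by the latest extension and earlier during the current test case are maintained alongside the remaining traps; the names are illustrative only.

import random

def choose_trap(remaining, changed_now, changed_earlier):
    # Prefer traps rewritten by the latest extension, then traps affected
    # earlier during the current test case, and fall back to a random choice
    # among the remaining traps.
    for candidates in (changed_now, changed_earlier, remaining):
        available = [t for t in candidates if t in remaining]
        if available:
            return random.choice(available)
    return None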

11.3 Empirical Evaluation

This section describes our prototype implementation of the presented techniques as well as the setup, environment and results of a set of experiments conducted with the prototype.

11.3.1 Experiment Setup

Our prototype implementation was written with the programming language Python∗. The LTL rewriting was implemented on top of abstract syntax trees generated by the parser generator Antlr†. Clearly, this is not a high performance solution, and the achieved results should therefore be improvable by using more efficient tools and methods. All non-deterministic choices are implemented such that trap properties are chosen sequentially in the order they are created by or provided to the prototype. Version 2.4.1 of the open source model checker NuSMV (Cimatti et al., 1999) is used in our experiments. NuSMV provides symbolic BDD-based model-checking and SAT-based bounded model-checking.

∗http://www.python.org
†http://www.antlr.org


In our experiments, the symbolic model checker was used. All experiments were conducted on a PC with an Intel Core Duo T2400 processor and 1GB RAM. For the experiments, the two different maximum depth values 20 and 50 were chosen for EXT2. This is supposed to illustrate the effects the choice of the maximum depth has on the performance and quality of the results.

As an example model, a windscreen wiper controller provided by Magna Steyr is used again. The model was created manually from a Matlab Stateflow model. The system has four Boolean and one 16 bit integer input variables, three Boolean and one 8 bit integer output variables, and one Boolean, two enumerated and one 8 bit integer internal variables. The system controls the windscreen heating and the speed of the windscreen wiper, and provides water for cleaning upon user request. NuSMV reports a total of 2^44.8727 states, 93 BDD variables and 174762 BDD nodes after model encoding. The time it takes to check a single property not only depends on the model, but also on the property itself. For the example model and trap properties, checking one property takes between 2 and 3 seconds on average. The size of the model is not yet problematic for a model checker based approach, but it is big enough to make performance changes visible while conveniently allowing an extensive set of experiments to be conducted within realistic time.

Table 11.1: Coverage criteria and resulting trap properties.
Coverage Criterion   Shorthand   Traps
Transition           T           89
Condition            C           320
Transition Pair      TP          6298
Reflection           R           5116
Property             P           345

Trap properties were created automatically for different coverage criteria and mutation of the reflected transition relation. The following mutation operators were used (see (Black et al., 2000) and Section 3.7.1 for details): STA (replace atomic propositions with true/false), SNO (negate atomic propositions), MCO (remove atomic propositions), LRO, RRO, ARO (logical, relational and arithmetical operator replacement, respectively). This mutation approach subsumes the Transition and Condition coverage criteria. In the tables of this thesis, we refer to this kind of trap properties as 'Reflection'.

Finally, a set of trap properties was written for the property coverage criterion introduced by Tan et al. (2004). This coverage criterion creates traps from requirement properties, and results in interesting (i.e., showing non-vacuous satisfaction) test cases for the requirement properties. For this, 30 requirement properties from an informal requirements specification were manually formalized using LTL.

Table 11.1 lists the numbers of trap properties created for the presented criteria. In our experiments only trap properties that result in counterexamples were used. The different algorithms were executed using these sets of trap properties. The time the creation takes is measured as well as aspects of the resulting test suites. As the order in which trap properties are chosen during the test case creation can influence the results, we repeated the test case creation with ten different random orderings for each set of trap properties, and the results stated in the tables below are averaged.

Besides the performance of the different algorithms, it is of major interest to examine the effects on the quality of the resulting test suites. Therefore, the coverage of each test suite is measured for all the presented coverage criteria. In addition, the mutation score is measured with regard to the model and to an implementation. A mutant results from a single syntactic modification of a model or program. The mutation score of a test suite is the ratio of mutants that can be distinguished from the original to the number of mutants in total. A mutant is detected if the execution leads to different results than expected. For this, a test case can be symbolically executed against a model or a model mutant by converting it to a verifiable model, as described in Section 3.8. This test case model can be combined with a mutant model, where the values of the test case serve as inputs to the mutant model. Symbolic execution is performed by querying the model checker whether the output values of mutant model and test case model differ at some point. It is also conceivable to implement this symbolic execution using rewriting techniques.
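As a minimal illustration of how such a mutation score is computed once a way of deciding whether a test case distinguishes a mutant is available (the hypothetical helper distinguishes(test, mutant) stands for the symbolic execution step described above):

def mutation_score(test_suite, mutants):
    # Ratio of mutants that are distinguished ("killed") by at least one
    # test case to the total number of mutants.
    killed = sum(1 for mutant in mutants
                 if any(distinguishes(test, mutant) for test in test_suite))
    return killed / len(mutants)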

For the model-based mutation score, the original model was mutated using the same mutation operators as described above for the trap property creation. The resulting mutants were analyzed in order to eliminate equivalent mutants. This is done with a variant of the test case generation approach proposed by Okun et al. (2003a) (see Section 3.7.4). The original model and a mutant model are combined so that they share the same input variables, and the model checker is then queried whether there exists a trace such that the output values of model and mutant differ. Therefore, an equivalent mutant is detected if no counterexample is returned. This method produced a total of 3303 syntactically valid, non-equivalent mutants. In addition, a Java implementation of the system was written in order to calculate a mutation score using actual execution. Java was chosen for this in order to make use of MuJava (Ma et al., 2005) for the creation of mutants. MuJava created 218 syntactically valid mutants.

11.3.2 Results

In the tables below we refer to straightforward test case creation by sequentially calling the model checker with all trap properties as Normal. Table 11.2 lists the numbers of test cases created for each set of trap properties and method. The number of unique test cases is determined by removing redundant test cases (i.e., duplicate tests and tests that are prefixes of other, longer test cases and are therefore subsumed). While on average 75% of the test cases created without monitoring (Normal) are redundant, this ratio is significantly improved with all presented methods. MON creates almost no redundant test cases. On average there are 0.3% redundant test cases for all criteria except the property coverage criterion, which results in 14.76% redundant test cases. Redundancy can occur with MON if an existing test case is a prefix of a counterexample for another trap property, but does not fully cover it. Therefore, the order in which trap properties are selected has an influence on the amount of redundant test cases. Theoretically, EXT1 can also create such redundant test cases, as the rewriting cannot detect in all situations that an existing test case is a prefix of another test case. This only occurred a few times, and only for the property coverage set of trap properties, where the maximum number of redundant test cases was 4 out of 89. Except for that, EXT1 and EXT2 created no duplicate or subsumed traces at all.
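Removing duplicate and prefix-subsumed test cases can be done with a simple filter. The sketch below assumes a test case is represented as a list of states and keeps only tests that are not prefixes of a longer (or equal) kept test.

def unique_tests(test_suite):
    # Sort longest first, then keep a test only if it is not a prefix of an
    # already kept test (this also removes exact duplicates).
    kept = []
    for test in sorted(test_suite, key=len, reverse=True):
        if not any(test == other[:len(test)] for other in kept):
            kept.append(test)
    return kept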

All the considered algorithms create smaller test suites than the straightforward (Normal) approach. For EXT1 and EXT2 this was to be expected, because these approaches are intended to create fewer but longer test cases. The fact that MON also creates significantly fewer test cases indicates that the straightforward approach creates test suites that contain a large amount of redundancy, which is discussed below.


Table 11.2: Average number of unique test cases.
Criterion   Normal   MON     EXT1    EXT2-20   EXT2-50
T           31       20.3    16.3    6.3       3.3
C           78       46.5    39.7    15.7      8.0
TP          261      171.5   67.6    53.1      25.2
R           277      187.7   144.3   51.0      22.3
P           197      142.0   81.0    50.0      21.6

Table 11.3: Average test case length.
Criterion   Normal   MON    EXT1   EXT2-20   EXT2-50
T           8        8.8    12.2   26.6      50.9
C           9        10.9   12.7   27.3      53.9
TP          11       11.0   25.2   28.8      57.8
R           8        8.8    11.3   25.0      54.2
P           9        11.0   15.9   25.4      55.5

Table 11.3 lists the average test case lengths, and Table 11.4 lists the overall test suite lengths. The length of a test suite is calculated as the sum of the lengths of its unique test cases. The length of a test case equals the number of its transitions. As expected, the tables show that the test cases created using extension of other test cases are significantly longer than those where test cases are not extended. At the same time, all methods produce test suites with a total length smaller than that of test suites created with the Normal method. Interestingly, the overall lengths of EXT1 test suites are sometimes bigger than those of MON. The reason for this is that symbolic model-checking does not necessarily return the shortest counterexamples. If there are only few trap properties, then this can result in greater overall lengths. This does not seem to be a problem in general, as the effect is only observable for the quite simplistic transition coverage test suites. A possible alternative to overcome this problem would be the use of a bounded model checker, which is guaranteed to find the shortest counterexample. If all trap properties are of a similar structure, it is also conceivable to simplify the rewriting. For example, if all trap properties are of the type � (x → ©¬y), then it would be sufficient to use ¬y as rewritten trap for trace generation instead of ¬y ∧ � (x → ©¬y), after a state where x is true. In order to keep our approach independent of the type of trap properties

Table 11.4: Total test suite length after removing duplicate/redundant test cases.
Criterion   Normal   MON      EXT1     EXT2-20   EXT2-50
T           259      186.9    207.1    170.0     168.5
C           748      522.6    518.6    436.1     433.5
TP          2972     2009.4   1731.2   1556.2    1523.2
R           2469     1717.1   1699.8   1290.2    1226.8
P           1934     1566.4   1332.5   1289.5    1209.9


Table 11.5: Standard deviation of test suite lengths.
Criterion   Normal   MON    EXT1   EXT2-20   EXT2-50
T           0        16.7   22.3   19.0      16.7
C           0        17.6   26.9   38.9      30.4
TP          0        48.4   34.5   54.6      60.4
R           0        40.7   61.7   53.5      73.7
P           0        29.2   79.7   65.4      65.0

Table 11.6: Redundancy.
Criterion   Normal    MON       EXT1     EXT2-20   EXT2-50
T           31.25%    26.91%    11.7%    2.99%     0.94%
C           30.84%    26.89%    16.02%   3.96%     1.48%
TP          33.38%    31.19%    5.68%    4.25%     1.62%
R           49.25%    46.29%    20.94%   4.75%     1.64%
P           21.52%    16.43%    9.42%    4.20%     1.47%

used, we do not consider such a modification to the rewriting technique used in this chapter. This, however, could be an interesting direction for further research.

The influence of the order in which trap properties are selected has been pointed out several times in this chapter. As an example of the effects of these choices, Table 11.5 lists the standard deviation of the total test suite lengths. The total test suite length was chosen because it is representative of the performance and the quality. The number of transitions a test suite consists of reflects the actual savings compared to the original, and is also proportional to the performance improvements. The table shows that the deviation is not significant. In general, the deviation in the total test suite length is significantly smaller than the achieved reduction compared to the normal test suite. Although only a small subset of 10 different orderings out of the set of possible permutations was used, we can safely conclude that simply using trap properties in the order they are generated or passed to the test case generation process is feasible. Still, some kind of heuristic to guide the selection of trap properties could be useful to further improve the performance.

In Chapter 9 we introduced a redundancy measurement for test suites created with model checkers. The redundancy value represents the amount of common prefixes. Test suites with high redundancy values are less efficient at detecting faults as the test cases traverse the same passages repeatedly and unnecessarily. Table 11.6 shows the redundancy values for all test suites. The amount of redundancy saved by MON is proportional to the savings in the test suite size. EXT1 results in significantly less redundancy. In general, the redundancy seems to be correlated to the ratio of the number of test cases to the average test case length. Therefore, test suites created with EXT2 and a maximum length of 20 contain more redundancy than those with a maximum length of 50.
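The exact redundancy measurement is defined in Chapter 9; the following sketch merely illustrates the idea of relating shared prefixes to the total number of states and is not the definition used for Table 11.6.

def redundancy(test_suite):
    def common_prefix(a, b):
        length = 0
        for x, y in zip(a, b):
            if x != y:
                break
            length += 1
        return length

    total = sum(len(t) for t in test_suite)
    # For each test case, count the longest prefix it shares with any other.
    shared = sum(max((common_prefix(t, other)
                      for other in test_suite if other is not t), default=0)
                 for t in test_suite)
    return shared / total if total else 0.0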

Table 11.7 shows the total time consumed for test case generation for each approach and test suite. For all algorithms, the savings are significant. This performance improvement is caused by the reduced number of actual calls to the model checker, as listed in Table 11.8.


Table 11.7: Creation time.
Criterion   Normal    MON      EXT1     EXT2-20   EXT2-50
T           3m35s     59s      55s      49s       49s
C           12m32s    2m01s    2m02s    2m15s     2m12s
TP          247m54s   25m17s   20m00s   41m24s    39m17s
R           218m35s   13m01s   10m46s   12m51s    12m34s
P           13m20s    7m08s    7m11s    10m06s    9m45s

Table 11.8: Average number of model checker calls.
Criterion   Normal   MON     EXT1    EXT2-20   EXT2-50
T           89       20.3    19.4    18.6      18.4
C           314      46.5    44.4    44.3      44.0
TP          6298     171.8   165.2   143.8     152.1
R           5410     189.9   179.1   170.9     168.8
P           342      154.3   126.7   117.6     113.4

The overhead added by the large amount of rewritings is negligible, as long as the model complexity makes the model-checking process costly enough, and the number of trap properties and test cases is within bounds. Our prototype is comparatively slow with regard to rewriting and could be optimized. The average test case length seems to be related to the number of model checker calls; the longer a test case, the more trap properties it covers. Therefore, the EXT2 algorithm with a maximum depth of 50 performs the fewest model checker calls in most cases.

Table 11.9: Coverage: Transition coverage test suites.
Criterion   Normal    MON       EXT1      EXT2-20   EXT2-50
T           100%      100%      100%      100%      100%
C           71.66%    71.66%    72.93%    73.25%    73.25%
TP          18.75%    18.75%    27.42%    34.77%    37.66%
R           88.87%    88.87%    89.45%    89.72%    89.74%
P           30.72%    30.72%    30.14%    31.30%    31.01%

In order for the presented algorithms to be feasible, it is important that the coverage with regard to the criterion used for test case generation is not negatively affected. Therefore, the coverage of all test suites is measured with regard to the criterion used for creation as well as all other criteria. Tables 11.9, 11.10, 11.11, 11.12 and 11.13 list the results of the coverage analysis. For each set of trap properties used for test case creation there is one table. Only trap properties that result in counterexamples are used, therefore a normal test suite achieves 100% coverage of the criterion used for creation. As expected, the tables show that this coverage value is not affected by any of the alternative algorithms. In contrast, there is a slight variation with regard to the coverage of criteria not used for creation. MON has a minimal negative impact on the coverage in some cases. Monitoring avoids the creation of test cases where the corresponding trap property is already


Table 11.10: Coverage: Condition coverage test suites.
Criterion   Normal    MON       EXT1      EXT2-20   EXT2-50
T           100%      100%      100%      100%      100%
C           100%      100%      100%      100%      100%
TP          25.93%    25.64%    34.90%    44.86%    42.66%
R           93.59%    92.07%    92.27%    93.12%    92.66%
P           38.26%    37.39%    37.39%    38.26%    37.68%

Table 11.11: Coverage: Transition-Pair coverage test suites.
Criterion   Normal    MON       EXT1      EXT2-20   EXT2-50
T           100%      100%      100%      100%      100%
C           85.99%    85.99%    85.67%    85.03%    84.39%
TP          100%      100%      100%      100%      100%
R           93.18%    93.18%    93.18%    93.60%    93.31%
P           36.81%    36.81%    36.81%    37.10%    37.10%

covered. However, when called on an already covered trap property, the model checker might return a different trace than the one that already covers the trap property. Such traces are not created when monitoring the trap properties. This has no effect on the coverage criterion used to create test cases, but it explains the small possible degradation of coverage of other criteria in some cases.

While there are still some cases where both EXT1 and EXT2 achieve lower coverage (e.g., coverage of the transition pair traps by the reflection test suites), in the majority of cases the coverage is about the same or higher than that of normal test case generation. The effects on the coverage are generally difficult to predict as they depend very much on the type of trap properties, the redundancy of the test cases, how well the test cases can be extended, and many other factors. From our experiments we conclude that the presented approaches can safely be applied without significant negative effects on the coverage with regard to the model.

Finally, the mutation score is measured as an indicator for the fault sensitivity of the test suites. Table 11.14 shows the results using the model mutants. The results are similar to those with regard to coverage: MON performs marginally worse than Normal test suites, while in most cases EXT1 and EXT2 achieve higher scores. Only the extended transition pair test suites perform slightly worse

Table 11.12: Coverage: Reflection coverage test suites.
Criterion   Normal    MON       EXT1      EXT2-20   EXT2-50
T           100%      100%      100%      100%      100%
C           96.82%    94.59%    94.90%    96.18%    96.82%
TP          60.78%    31.55%    30.41%    50.71%    57.05%
R           100%      100%      100%      100%      100%
P           41.74%    41.16%    42.90%    44.64%    44.06%


Table 11.13: Coverage: Property coverage test suites.
Criterion   Normal    MON       EXT1      EXT2-20   EXT2-50
T           93.26%    93.26%    93.26%    100%      93.26%
C           92.36%    92.36%    91.72%    95.54%    92.68%
TP          40.95%    40.66%    46.52%    51.78%    52.21%
R           94.07%    93.96%    93.64%    96.47%    93.86%
P           100%      100%      100%      100%      100%

Table 11.14: Mutation scores using model mutants.
Criterion   Normal    MON       EXT1      EXT2-20   EXT2-50
T           59.52%    59.52%    64.64%    69.48%    72.69%
C           72.42%    70.30%    74.63%    84.80%    84.50%
TP          90.40%    90.37%    90.31%    88.19%    89.92%
R           78.84%    76.96%    86.92%    92.40%    93.73%
P           79.35%    79.35%    81.83%    86.71%    85.17%

than the original test suite. These results are also reflected in Table 11.15, which lists the mutation scores calculated with the Java mutants.

11.4 Summary

In this chapter, we have presented an approach that integrates LTL rewriting known from runtime verification into model checker based test case generation. If a sufficiently simple model is used for test case generation, then a straightforward approach of model checking all trap properties is applicable without problems. If, however, the model size increases to a point where the verification of a single property takes significant time, then the applicability of a straightforward approach declines. Although a model is usually more abstract than the actual program it represents, the model size can still be significant. For instance, automatic conversion (e.g., Matlab Stateflow to SMV) can result in complex models.

The integration of LTL monitoring techniques results in a significant performance increase in such a

Table 11.15: Mutation scores using implementation mutants.
Criterion   Normal    MON       EXT1      EXT2-20   EXT2-50
T           76.15%    76.15%    78.90%    82.11%    81.19%
C           80.28%    79.36%    83.49%    84.40%    84.86%
TP          86.70%    86.70%    86.70%    86.24%    86.24%
R           82.57%    80.73%    86.24%    87.61%    88.07%
P           77.06%    77.06%    77.06%    85.78%    83.49%


scenario. This is achieved by avoiding unnecessary calls to the model checker. Already covered trap properties are detected using the faster method of LTL rewriting instead of model-checking. The results of this rewriting also help to extend test cases instead of creating only distinct test cases. As a consequence, the overall number of test cases and the total length of a test suite are reduced. At the same time, the quality of the test suites is not adversely affected in general, and is even enhanced in many cases. An increased fault sensitivity with a smaller size is achieved as the test suite redundancy is reduced.

The presented algorithms apply to all approaches where a single model is checked against multiple trap properties in order to create test cases. There are also approaches that use mutation of the model instead of trap properties. The rewriting cannot directly be applied to those methods. This is considered for further research. The presented rewriting is restricted to LTL, which is sufficient for trap properties in most cases. Even though CTL (Clarke and Emerson, 1982) is sometimes used in the literature, only the 'all paths' quantifier (ACTL) is used, restricted to a subset that allows linear counterexamples. The resulting trap properties could therefore also be represented using LTL.

There are several non-deterministic choices in the presented algorithms. The experiments have shown that a random choice achieves very good results, but further optimizations are conceivable. Possible future research therefore includes the search for suitable heuristics to guide these choices.

Model checkers are not originally intended for test case generation. Therefore, they are clearly not optimized for this task. It is necessary to identify areas where drawbacks result from this fact. The introduction of rewriting techniques to model checker based test case generation improves the applicability. Even though the performance increase using LTL rewriting can be significant, this does not enable the use of model checker based test case generation for models of arbitrary complexity. Often, a model can cause the model checker to take too long to check even a single property. In such a case, abstraction seems to be the only possibility to allow test case generation.


Chapter 12
Mutant Minimization

The contents of this chapter have been published in (Fraser and Wotawa, 2007d).

12.1 Introduction

Specification mutation has been suggested as a method to automatically generate test cases with model checkers (Ammann et al., 1998). A model checker can be used to automatically derive traces that illustrate the difference between original and mutated specification. Such traces can be used as test cases. In general, model checker based test case generation is very flexible and fully automated, but the applicability is limited by the performance of the model checker. Although a model checker is very efficient for smaller models, the state explosion problem quickly leads to performance problems.

Abstraction methods and improved model checking techniques are directions of research that will hopefully lead to improvements with regard to the performance of the test case generation. Part of the problem, however, is caused by inefficient and redundant calls to the model checker during the test case generation process.

For example, even though it is possible to detect equivalent mutants using a model checker, this still requires an explicit call to the model checker for each mutant and consumes a similar amount of time as the creation of a test case. Consequently, time is spent on model checking of mutants that do not contribute to the test suite. Furthermore, experience has shown that a large number of mutants result in identical or subsumed test cases.

In our experience, test suites resulting from mutation based approaches are usually far from minimal with regard to the mutation score (i.e., percentage of mutant models that can be distinguished from the original model). A similar observation has been made in coverage oriented test case generation with model checkers (Hong et al., 2003). With increasing model size the number of mutants and therefore test cases increases. Consequently, a necessary countermeasure might be test suite reduction, where a subset of an existing test suite is heuristically selected, such that a certain coverage criterion is fulfilled with as few test cases as possible.


In this chapter, we present a simple approach to reduce the number of calls to the model checker. The proposed solution represents model mutants using characteristic properties. These are used to identify equivalent mutants and to detect already covered mutants during test case generation, thus excluding them from the test case generation. The model checker NuSMV (Cimatti et al., 1999) is used to illustrate the presented ideas. Resulting test suites are smaller than when using a traditional approach, while the mutation score with regard to those mutants used for test case generation is not changed. As the size of a test suite is reduced, the effects on the fault sensitivity are empirically analyzed.

In this chapter, we consider methods that are based on mutation of the model (Ammann et al., 2001, 1998; Okun et al., 2003a). The model checker has to encode the state space of each mutant, e.g., using ordered BDDs. The time this takes depends on the model, the chosen model checker and the techniques it implements. For example, our experience is that the model encoding phase in NuSMV (Cimatti et al., 1999) consumes a large part of the overall model checking time. Consequently, equivalent mutants consume time for encoding and model checking, but result in no test cases. Furthermore, mutants often result in identical test cases, or test cases that are subsumed by other, longer test cases. Such mutants are redundantly checked, because resulting test cases do not contribute to the overall fault sensitivity of a test suite.

Even after removal of subsumed test cases, the size of a resulting test suite is often very large. This can be especially problematic for regression testing, and therefore test suite reduction (Harrold et al., 1993) is sometimes applied to minimize the test suite size. Here, time is consumed to create test cases in the first place, and then more time is consumed trying to heuristically find a proper subset of the resulting test cases that is minimal with regard to a given coverage criterion.

As a countermeasure to these problems, we propose to use dedicated properties to minimize the number of model checker calls; consequently, the performance and the test suite size are improved significantly.

12.2 Characteristic Properties

We use the formal framework for transition systems described by Rayadurgam and Heimdahl (2001b) in order to discuss the ideas presented in this chapter. The framework was described in detail in Section 3.2 and Definition 8. Figure 12.1 shows an example model as an automaton and as a basic transition system.

Mutation is applied to the textual representation of a model, for example in the input language of the model checker NuSMV. Conceptually, we can think of mutation as changes in a simple transition δi,j of a variable xi. We only consider first order mutants, i.e., mutants that differ in a single change from the original model.

Definition 53 (Mutant Transition System). A mutant M′ = (D, ∆′, ρ) of transition system M = (D, ∆, ρ) differs from M in exactly one simple transition. That is, ∃i, j : δi,j ≠ δ′i,j ∧ ∀k ≠ j : δi,k = δ′i,k ∧ ∀l ≠ i : δl = δ′l. Furthermore, for δ′i,j ≠ δi,j only one of the following holds: αi,j ≠ α′i,j, βi,j ≠ β′i,j, or γi,j ≠ γ′i,j.

As an example, Figure 12.2 shows a mutant of the transition system introduced in Figure 12.1. Here, β of the first transition was changed.


(a) Automaton: two states y = 0 and y = 1; x = 0 keeps the current state, x = 1 toggles y.
(b) Simple transitions:
• (y = 0, y = 0, x = 0)
• (y = 0, y = 1, x = 1)
• (y = 1, y = 1, x = 0)
• (y = 1, y = 0, x = 1)

Figure 12.1: Example model.

(a) Automaton: as in Figure 12.1, except that the state y = 0 now changes to y = 1 also when x = 0.
(b) Simple transitions:
• (y = 0, y = 1, x = 0)
• (y = 0, y = 1, x = 1)
• (y = 1, y = 1, x = 0)
• (y = 1, y = 0, x = 1)

Figure 12.2: Mutant of example in Figure 12.1.

Besides a formalism for system modeling, model checkers use different property specification languages. In this chapter, we use Linear Temporal Logic (LTL) (Pnueli, 1977).

In (Rayadurgam and Heimdahl, 2001b), the guard and the post-state condition can refer to the values of variables in the pre- and post-state. For simplicity, we assume that the guard γi,j refers only to the pre-state, and βi,j only refers to the values of variables in the post-state. Without this assumption, γi,j or βi,j can refer to the values of variables at two different states in time. A possible solution to this problem is to use "shadow" variables that track the values from the previous state, as described by Ammann and Black (1999b). A shadow variable for the previous state always has the previous value of the variable it shadows, therefore all variables that refer to the previous state in βi,j can be replaced with their corresponding shadow variables. The same is possible the other way round for γi,j. The disadvantage of such an approach is that the state space is increased by the shadow variables. A different solution would be to access variables in the correct context specific to a © operator. This, however, requires that the constraints γi,j and βi,j can be split into two parts γ^i_{i,j}, γ^j_{i,j} and β^i_{i,j}, β^j_{i,j}, respectively, such that βi,j = β^i_{i,j} ∧ β^j_{i,j} and γi,j = γ^i_{i,j} ∧ γ^j_{i,j}, respectively.

From the definition of a transition system it follows that any M which has the transition δi,j = αi,j ∧ βi,j ∧ γi,j has to satisfy the following property, which expresses that always (�) when pre-state αi,j and guard condition γi,j are true, the post-state condition βi,j has to be true in the next state (©):

ψ1 := � ((αi,j ∧ γi,j) → ©βi,j)    (12.1)

Lemma 12.2.1. For all basic transition systems M and all transitions (αi,j, βi,j, γi,j), M |= ψ1.

The correctness of Lemma 12.2.1 follows directly from Definition 8.

Assume a mutant that contains a simple transition δ′i,j = α′i,j ∧ β′i,j ∧ γ′i,j where the original model takes δi,j = αi,j ∧ βi,j ∧ γi,j; i.e., one of α′i,j, β′i,j, or γ′i,j syntactically differs from the original model. This mutant satisfies the following property, which states that whenever the modified pre-state and guard condition are true, the modified post-state condition has to be true in the next state:

ψ2 := � ((α′i,j ∧ γ′i,j) → ©β′i,j)    (12.2)

A mutation can have different effects. If the mutation has no effect on the transition system, then the mutant is referred to as equivalent.

Definition 54 (Equivalent Mutant). A mutant M′ = (D, ∆′, ρ) of transition system M = (D, ∆, ρ) is equivalent, if ∆ ↔ ∆′.

From Definition 53 it follows that ∃i, j : δi,j ≠ δ′i,j such that δi,j ↔ δ′i,j. This further means that αi,j ↔ α′i,j, βi,j ↔ β′i,j, and γi,j ↔ γ′i,j.

If a mutant is not equivalent, then the mutant is inequivalent.

Definition 55 (Inequivalent Mutant). A mutant M′ = (D, ∆′, ρ) of transition system M = (D, ∆, ρ) is inequivalent, if ¬(∆ ↔ ∆′).

From Definition 53 it follows that ∃i, j : δi,j ≠ δ′i,j such that ¬(δi,j ↔ δ′i,j). This further means that αi,j ≠ α′i,j, βi,j ≠ β′i,j, or γi,j ≠ γ′i,j.

If a mutant is inequivalent, this means that at least one of the following cases applies: (1) The mutant model contains behavior that the original model does not allow. (2) The original model contains behavior that the mutant model does not allow. In the first case the mutant does not satisfy ψ1. In the latter case the original model does not satisfy ψ2.

Note that this definition of mutant equivalence does not take into account that some behaviors of the mutant might not be observable. Theoretically, a model can contain not only input and output, but also hidden variables. A change might not be observable if only hidden variables are affected by the change and the change does not propagate to an observable output.

We use the language of the model checker NuSMV in this chapter to describe transition systems. Figure 12.3 is an example of how a transition relation of a variable can be defined in NuSMV. Each entry of the case statement corresponds to a simple transition δi,j for variable xi. The conditions φj correspond to the conjunctions of αi,j and γi,j, and βi,j corresponds to the atomic proposition x′i = ξj, where x′i denotes xi in the post-state. There is an ordering on the conditions; therefore, strictly speaking, the conditions are to be interpreted as φ1, ¬φ1 ∧ φ2, etc., and in the general case as: (∧_{1≤j<k} ¬φj) ∧ φk.


ASSIGN
  next(xi) := case
    φ1: ξ1;
    φ2: ξ2;
    ...
  esac;

Figure 12.3: ASSIGN section of an SMV file. The transition relation of a variable xi is given as a set of conditions φj and next values ξj.
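The unfolding of the case ordering into effective guards, as used for the instantiations below, can be sketched in a few lines; conditions are plain NuSMV expression strings, and the function itself is only an illustration.

def effective_condition(conditions, k):
    # Effective guard of the k-th case entry (1-based): all earlier
    # conditions negated, conjoined with the k-th condition.
    earlier = [f"!({c})" for c in conditions[:k - 1]]
    return " & ".join(earlier + [f"({conditions[k - 1]})"])

For example, effective_condition(["phi1", "phi2"], 2) yields "!(phi1) & (phi2)", which corresponds to ¬φ1 ∧ φ2.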

The property ψ1 can be instantiated for a model M corresponding to Figure 12.3 and the k-th transition of variable xi as follows:

ψ1^{xi,k} := � (((∧_{1≤j<k} ¬φj) ∧ φk) → © xi = ξk)

The property ψ2 can be instantiated similarly. For example, assuming a mutant of Figure 12.3 that uses φ′2 instead of φ2, ψ2 is instantiated as follows:

ψ2^{xi,2} := � ((¬φ1 ∧ φ′2) → © xi = ξ2)

Ideally, we want a property ψ for M′ which identifies an inequivalent mutant simply if M does not satisfy ψ:

M ⊭ ψ ↔ M ≠ M′    (12.3)

If M′ is equivalent to M, then M |= ψ2. If M ⊭ ψ2, then we know that M and M′ are inequivalent. This corresponds to the left-to-right implication in equation 12.3. The right-to-left implication is not fulfilled by ψ2, because a mutant can be inequivalent although M |= ψ2. This is the case if the mutated simple transition simply restricts the set of possible transitions. Therefore, the right-to-left implication in equation 12.3 requires another property to cover all cases:

ψ3 := � ((αi,j ∧ γi,j ∧ ¬(α′i,j ∧ γ′i,j)) → ©¬β′i,j)    (12.4)

The property ψ3 covers those cases where the mutant removes possible transitions from the model. For our example NuSMV mutant, this can be instantiated as follows:

ψ3^{xi,2} := � ((¬φ1 ∧ φ2 ∧ ¬φ′2) → © xi ≠ ξ2)

Definition 56 (Characteristic Property). The characteristic property ψ of mutant M′ = (D, ∆′, ρ) of transition system M = (D, ∆, ρ) is ψ := ψ2 ∧ ψ3, with ψ2 and ψ3 given in equations 12.2 and 12.4, respectively.


The characteristic property ψ of a mutant is the conjunction of ψ2 and ψ3. A mutant is inequivalent iff M ⊭ ψ, which means that ψ fulfills equation 12.3.
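For the common case that only the condition of a case entry is mutated (so that the next value is unchanged), a characteristic property in NuSMV's LTL syntax could be assembled as in the following sketch. The variable name and the already unfolded condition strings are inputs, and the formula layout is an assumption of this illustration rather than the prototype's output format.

def characteristic_property(var, next_value, orig_cond, mutated_cond):
    # psi2: whenever the mutated condition holds, the mutant forces the
    # (unchanged) next value.
    psi2 = f"G (({mutated_cond}) -> X ({var} = {next_value}))"
    # psi3: whenever the original condition holds but the mutated one does
    # not, the mutant can no longer take this transition.
    psi3 = f"G ((({orig_cond}) & !({mutated_cond})) -> X (!({var} = {next_value})))"
    # The characteristic property is the conjunction of psi2 and psi3.
    return f"({psi2}) & ({psi3})"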

Theorem 12.2.2. A mutant M′ represented by its characteristic property ψ is inequivalent to model M, iff M ⊭ ψ.

Proof: The proof of Theorem 12.2.2 consists of two parts. First, we show that if a model does not satisfy a characteristic property, the mutant is indeed inequivalent. Second, we show that if a mutant is inequivalent, the characteristic property is violated by the model.

(1) M ⊭ ψ → M ≠ M′ (Proof by contradiction): Assume a model M and a mutant M′ such that M = M′, and ψ derived from M′ such that M ⊭ ψ. From Definition 54 we know that M = M′ means that for the mutant transition (α′, β′, γ′) the following holds: α′ = α, β′ = β, γ′ = γ (the suffix i, j will be omitted here). Substituting α for α′, β for β′, and γ for γ′ in Equation 12.2, it follows that:

ψ2 = ψ1

Performing the same substitution in Equation 12.4 results in:

ψ3 = � ((α ∧ γ ∧ ¬(α ∧ γ))→ ©¬β)

This can be simplified to:

ψ3 = � (false → ©¬β)

Because of the implication, this can further be simplified to:

ψ3 = true

Substituting these ψ2 and ψ3 in Definition 56, we get:

ψ = ψ2 ∧ ψ3 = ψ1 ∧ true = ψ1

From Lemma 12.2.1 we know that M |= ψ1, which contradicts our assumption, thus proving M ⊭ ψ → M ≠ M′. □

(2) M ≠ M′ → M ⊭ ψ (Proof by contradiction): Assume a model M and a mutant M′ such that M ≠ M′, and ψ derived from M′ such that M |= ψ. From Definition 55 we know that M ≠ M′ means that for the mutant transition (α′, β′, γ′) the following holds: α′ ≠ α ∨ β′ ≠ β ∨ γ′ ≠ γ (the suffix i, j will be omitted here again). From Definition 53 we know that only one of α′, β′, and γ′ differs from the original version. As α and γ only occur in conjunction in ψ, we have to distinguish two cases:

1. α′ ≠ α ∨ γ′ ≠ γ: As β′ = β, we can substitute β for β′ in Equation 12.2:

ψ2 = � ((α′ ∧ γ′) → ©β)


This is not yet a contradiction, because the mutation of α or γ can result in a stricter logical equation, such that (α′ ∧ γ′) → (α ∧ γ). However, substituting β for β′ in Equation 12.4 results in:

ψ3 = � ((α ∧ γ ∧ ¬(α′ ∧ γ′)) → ©¬β)

According to Lemma 12.2.1, M |= ψ1, which contradicts the assumption M |= ψ3, as (α ∧ γ ∧ ¬(α′ ∧ γ′)) → (α ∧ γ). □

2. β′ ≠ β: We substitute α for α′ and γ for γ′ in Equation 12.2:

ψ2 = � ((α ∧ γ) → ©β′)

As we assumed that M |= ψ, this means that M |= ψ2 and M |= ψ3. This is a contradiction to Lemma 12.2.1, because M cannot satisfy both ψ1 = � ((α ∧ γ) → ©β) and ψ2 = � ((α ∧ γ) → ©β′) if β′ ≠ β. □

The idea of characteristic properties is similar to the reflected properties described by Ammann and Black (1999b). However, not all equivalent or killed mutants can be represented or detected with reflected properties. The conditions of a case statement are unfolded with a process the authors call expoundment, similar to the NuSMV instantiation of ψ1, ψ2, and ψ3. Therefore, reflection together with expoundment results in properties similar to ψ2. As described above, this is not sufficient to identify all possible inequivalent mutants.

Example 1 As a simple example of a characteristic property, consider the first transition in the example model in Figure 12.1. We immediately get:

ψ1 = � ((y = 0 ∧ x = 0)→ © (y = 0))

Assuming the corresponding mutant in Figure 12.2, we get the following for ψ2:

ψ2 = � ((y = 0 ∧ x = 0)→ © (y = 1))

The property ψ3 can be instantiated as follows:

ψ3 = � (((y = 0 ∧ x = 0) ∧ ¬((y = 0) ∧ (x = 0)))→ ©¬(y = 1))

As (y = 0 ∧ x = 0) ∧ ¬((y = 0) ∧ (x = 0)) is false and any implication with false on its left side is true, we can simplify ψ3 to:

ψ3 = true

This means that the characteristic property of the mutant transition system in Figure 12.2 is the following:

ψ = � ((y = 0 ∧ x = 0)→ © (y = 1))


(a) Automaton: two states y = 0 and y = 1; x = 0 keeps the current state, x ≥ 1 toggles y.
(b) Simple transitions:
• (y = 0, y = 0, x = 0)
• (y = 0, y = 1, x ≥ 1)
• (y = 1, y = 1, x = 0)
• (y = 1, y = 0, x ≥ 1)

Figure 12.4: Example model with non-boolean values.

(a) Automaton: as in Figure 12.4, except that the guard of the transition from y = 0 to y = 1 is x = 1 instead of x ≥ 1.
(b) Simple transitions:
• (y = 0, y = 0, x = 0)
• (y = 0, y = 1, x = 1)
• (y = 1, y = 1, x = 0)
• (y = 1, y = 0, x ≥ 1)

Figure 12.5: Mutant of example in Figure 12.4.

Example 2 As an example of when ψ3 is necessary, consider the transition system given in Figure 12.4. In contrast to the previous example, x can now take on positive numerical values, and the guard conditions make use of that by changing the = operator to a ≥. Figure 12.5 shows a simple mutant of the second transition, where the ≥ operator is mutated to =. We get the following for ψ1 and ψ2:

ψ1 = � ((y = 0 ∧ x ≥ 1)→ © (y = 1))

ψ2 = � ((y = 0 ∧ x = 1)→ © (y = 1))

In this example, ψ2 is satisfied by the original transition system. In contrast, ψ3 results in the following:

ψ3 = � (((y = 0 ∧ x ≥ 1) ∧ ¬((y = 0) ∧ (x = 1)))→ ©¬(y = 1))

This can be simplified to:

ψ3 = � ((y = 0 ∧ x > 1)→ ©¬(y = 1))

The property ψ3 is not satisfied by the original transition system, and therefore also ψ = ψ2 ∧ ψ3 is not satisfied; this allows the conclusion that the mutant is not equivalent.


(a) Automaton: as in Figure 12.4, except that the guard of the self-loop in y = 0 is x ≤ 0 instead of x = 0.
(b) Simple transitions:
• (y = 0, y = 0, x ≤ 0)
• (y = 0, y = 1, x ≥ 1)
• (y = 1, y = 1, x = 0)
• (y = 1, y = 0, x ≥ 1)

Figure 12.6: Another mutant of example in Figure 12.4.

Example 3 As a final example, consider another mutant of the same transition system given in Figure 12.4. The new mutant shown in Figure 12.6 changes the guard of the first transition from x = 0 to x ≤ 0. As we defined x to take on only non-negative values, this mutant is obviously equivalent. The characteristic property confirms this. We get the following for ψ2:

ψ2 = � ((y = 0 ∧ x ≤ 0)→ © (y = 0))

The property ψ3 results in the following:

ψ3 = � (((y = 0 ∧ x = 0) ∧ ¬((y = 0) ∧ (x ≤ 0)))→ ©¬(y = 0))

ψ3 can be further simplified:

ψ3 = � ((y = 0 ∧ x < 0)→ ©¬(y = 0))

As x can only take on non-negative values, both ψ2 and ψ3 are satisfied by the original model; consequently, the characteristic property ψ = ψ2 ∧ ψ3 is also satisfied and proves the mutant is equivalent.

12.3 Using Characteristic Properties to Optimize Test Case Generation

Eliminating Equivalent Mutants: Model checkers can be used to perform equivalence checks in order to detect equivalent mutants. For example, the state-machine duplication method presented by Okun et al. (2003a) combines a model and a mutant, and then queries the model checker whether there exists a path such that the outputs of the model and the mutant differ when using the same inputs.

The overall time consumed by this equivalence check can be reduced by using characteristic properties (Definition 56). For each mutant, a characteristic property is created. Then, the original model is checked against these properties. This improves the performance because (1) the model encoding phase only has to be done once and (2) the state space is smaller because the model is not modified. For example, using the method of Okun et al. (2003a), the state space is twice the size of that of the normal model. The performance can be further improved because it is sufficient to detect that a model does not satisfy the property, but counterexamples are not necessary. For example, when using the symbolic model checker NuSMV, counterexample calculation, which takes considerable time, can be deactivated. If counterexamples are created, this process can be considered as an improvement of (Ammann and Black, 1999b) that results in more test cases.
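A sketch of this equivalence check with NuSMV is shown below: all characteristic properties are appended to a single copy of the original model, counterexample computation is disabled with the -dcx option, and every property reported as true identifies an equivalent mutant. The output parsing is deliberately simplified and assumes that NuSMV reports the specifications in the order in which they were given.

import subprocess

def equivalent_mutants(model_file, characteristic_properties):
    # characteristic_properties: mapping from mutant id to its LTL property.
    with open(model_file) as f:
        source = f.read()

    mutant_ids = list(characteristic_properties)
    check_file = model_file + ".char.smv"
    with open(check_file, "w") as f:
        f.write(source)
        for mutant_id in mutant_ids:
            f.write(f"\nLTLSPEC {characteristic_properties[mutant_id]}\n")

    # -dcx disables counterexample computation; only the verdicts are needed.
    output = subprocess.run(["NuSMV", "-dcx", check_file],
                            capture_output=True, text=True).stdout
    verdicts = [line for line in output.splitlines()
                if line.startswith("-- specification")]

    # A satisfied characteristic property means the mutant is equivalent.
    return [mid for mid, line in zip(mutant_ids, verdicts) if "is true" in line]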

As noted earlier, an inequivalent mutant might not be observably different. When using characteristic properties to detect equivalent mutants this is acceptable: Only characteristic properties of really equivalent mutants are satisfied by a model, while there is a small chance that an identified inequivalent mutant is in fact equivalent. The worst case is that not as many mutants are excluded from the test case generation as might be possible.

Detecting Killed Mutants: Even if a mutant is not equivalent to the original model, it might still be unnecessary to use it for test case generation. Often, different mutants result in identical test cases. On the one hand, this can be because mutants can be equivalent not only to the original model but also amongst themselves. On the other hand, different mutants can also lead to identical test cases, and as our experience shows often do so. Another related performance drawback is caused by similar test cases, where one is a prefix of another and can therefore be omitted.

Test suites also tend to get rather large, which might require test suite reduction as a post-processing step. There are approaches for eliminating redundant test cases from a complete test suite (e.g., winnowing (Ammann and Black, 1999b)), but ideally the model checker should not be called on mutants unnecessarily in the first place, before redundant test cases are even created.

The characteristic properties introduced in the previous section can be used to solve these problems. A test case kills a mutant if it can distinguish between the original model and the mutant. According to our definition of inequivalent mutants, a test case can distinguish between original model and mutant only if it covers a modified transition. Consequently, a test case that kills a mutant is also inconsistent with the characteristic property of that mutant.

The time consumed by the test case generation can be reduced by monitoring characteristic properties. Each time a test case is created it is determined which characteristic properties are violated. A violated characteristic property shows that the mutant is already killed, and the mutant and its characteristic property can be removed. Note that this method relies on our definition of equivalence. If a mutant is only internally inequivalent but the difference is not observable, then it might be excluded from test case generation although there is no test case that propagates the effect of the mutation to an observable output. We believe that in scenarios where this is not acceptable, no test-suite minimization can be applied at all.

In the worst case, m mutants result in m unique test cases, and no test case kills more than one mutant. Then, m − 1 properties have to be analyzed after the first test case, m − 2 after the second, etc. Consequently, there is an upper bound of m · (m + 1)/2 such analyses. In order to achieve a performance improvement, monitoring m · (m + 1)/2 properties has to be faster than model checking m mutants.

There are different techniques that can be used for the monitoring. For example, test cases can be formulated as verifiable models and then model checked against the characteristic properties. Such a process is used for coverage analysis with model checkers. Test cases are represented as models by adding a special state counter variable and by setting all other variables depending only on the value of this state counter (Ammann and Black, 1999b). A different approach avoiding the use of a model checker is taken by rewriting techniques used in property monitoring in the field of runtime verification.

We propose to use LTL rewriting, which is an efficient method to monitor LTL formulas. For example, the method described by Havelund and Rosu (2001a) can be adapted to apply to characteristic properties. It defines a set of simple rewriting rules that are based on the states of an execution trace. An LTL formula is rewritten with every state of a trace, and if it results in a contradiction, then a property violation is detected. Such an approach is faster than model checking; e.g., Havelund and Rosu claim their implementation is capable of 3 million rewritings per second. In (Fraser and Wotawa, 2007c) we used such a rewriting approach in the context of coverage based test case generation with model checkers. Details about minimization with LTL rewriting can be found in Chapter 11.
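The following sketch illustrates the kind of rewriting (formula progression) rules that are applied for every state of a trace; formulas are represented as nested tuples, and only the operators needed for typical trap and characteristic properties are shown. It is a simplified illustration of such a rule set, not the prototype implementation.

def progress(formula, state):
    """Rewrite an LTL formula with one state; returns True, False, or the
    formula that remains to be checked on the rest of the trace."""
    op = formula[0]
    if op == 'ap':                              # atomic proposition
        return formula[1] in state
    if op == 'not':
        rest = progress(formula[1], state)
        return (not rest) if isinstance(rest, bool) else ('not', rest)
    if op == 'and':
        left = progress(formula[1], state)
        right = progress(formula[2], state)
        if left is False or right is False:
            return False
        if left is True:
            return right
        if right is True:
            return left
        return ('and', left, right)
    if op == 'X':                               # obligation moves to next state
        return formula[1]
    if op == 'G':                               # G phi = phi and X G phi
        now = progress(formula[1], state)
        if now is False:
            return False                        # violation detected
        if now is True:
            return formula
        return ('and', now, formula)
    raise ValueError(f"unsupported operator: {op}")

A characteristic property is considered violated by a trace, and the corresponding mutant killed, as soon as successive applications of progress reduce the formula to False.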

Optimized Test Case Generation: The normal approach to test case generation with model checkers and mutation is to create a set of mutants, and then call the model checker on the mutants with either requirement or dedicated properties. Figure 12.7 extends this scheme by creating characteristic properties together with the mutant models. The characteristic properties can be used to eliminate equivalent mutants before starting the test case generation, or before starting a new model checker call on the mutant. Counterexamples are not only stored as test cases but also applied to the characteristic properties with rewriting techniques. Any characteristic property that is not satisfied by a counterexample represents a killed mutant, therefore the corresponding mutant does not need to be included in the test case generation.
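Put together, the optimized generation loop can be sketched as follows. create_test stands for the model checker call on a mutant (returning a counterexample or None), and violates for the rewriting-based monitoring of a characteristic property against a test case; both are placeholders of this illustration.

def optimized_generation(mutants, characteristic_props, create_test, violates):
    tests = []
    killed = set()
    for mutant in mutants:
        if mutant in killed:
            continue                      # already covered: no checker call
        test = create_test(mutant)
        if test is None:
            continue                      # no counterexample for this mutant
        tests.append(test)
        # Monitor all remaining characteristic properties against the new test.
        for other, prop in characteristic_props.items():
            if other not in killed and violates(prop, test):
                killed.add(other)         # mutant killed, exclude it later
    return tests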

As model checking the mutants is generally slower than monitoring properties, this procedure will improve the performance for all but very small models. Fewer mutants are model checked because redundant test cases are avoided. At the same time, the size of resulting test-suites is reduced. The mutation score with regard to the mutants used for test case generation is not changed.

The order in which mutants are selected for test case generation has an influence on the number of necessary test cases, as different test cases kill different sets of mutants. As shown in (Hong et al., 2003), the minimal test-suite problem is NP-hard. Therefore, creation of a minimal test-suite is generally not feasible. In practice, random selection of mutants works very well. Heuristic selection might be considered in future research.

The performed minimization is related to the test-suite reduction problem (Harrold et al., 1993), where a suitable subset of a given test-suite is computed. Consequently, the drawbacks of test-suite reduction also apply to this approach: While the maximal mutation score with regard to the mutants used for test case generation is still achieved, every avoided test case might reduce the overall fault sensitivity.

12.4 Empirical Results

To evaluate the described ideas, a simple prototype implementation was applied to a manually created model of a windscreen wiper controller, provided by Magna Steyr. NuSMV reports a total of 2^44.8727 states, 93 BDD variables and 174762 BDD nodes after model encoding. All experiments were conducted on a PC with an Intel Core Duo T2400 processor and 1GB RAM.


[Diagram: the model is mutated into mutant models and characteristic properties; the model checker checks the mutants against the test specification and produces counterexamples; the counterexamples are monitored against the characteristic properties in order to remove redundant or equivalent mutants before further model checker calls.]

Figure 12.7: Optimized test case generation.

As an example test case generation method we used the state-machine duplication approach by Okun et al. (2003a) described earlier, as it does not depend on requirement properties. The following mutation operators were used (see Section 3.7.1 for details): STA, SNO, MCO, LRO, RRO.

Table 12.1: Results of the test case generation.
                      Normal             Optimized
Operator   Mutants    Size    Time       Size   Time
STA        768        343     29m48s     70     9m33s
SNO        630        331     20m3s      41     5m51s
MCO        274        202     10m26s     82     5m56s
LRO        584        305     43m40s     80     12m15s
RRO        510        178     14m16s     38     9m20s

Table 12.2: Coverage and mutation scores.
           Normal                  Optimized
Operator   Transition   Score      Transition   Score
STA        100%         80.26%     100%         65.13%
SNO        100%         79.47%     100%         60.92%
MCO        100%         87.64%     96.34%       70.23%
LRO        96.34%       87.64%     91.46%       61.29%
RRO        96.34%       72.76%     79.27%       54.58%

Table 12.1 lists the results with regard to test-suite size and creation time for regular test case generation (Normal) and when applying the optimizations described in the previous section (Optimized). The test-suite size is given as the number of unique test cases. On average, 62% of all test cases created with the normal approach are redundant, but only 9% in the case of the optimized version.

Table 12.2 shows the effects on the test-suite quality. The mutation score is measured using a set of 4038 model mutants (including equivalent mutants and also different mutants than those used for test case generation). While the mutation score with regard to the mutation operator used for test case generation is not adversely affected, the evaluation on the complete set of mutants shows the decrease of fault sensitivity. The coverage values for simple transition coverage (Rayadurgam and Heimdahl, 2001b) are similar. This result is in accordance with results from test-suite minimization research (e.g., (Rothermel et al., 1998)).

12.5 Summary

In this chapter, we have considered the sources of bad performance in test case generation using model checkers and mutation, and proposed a solution that avoids some of the identified problems. The idea is to represent model mutants as characteristic properties, which can then be used to identify equivalent mutants more quickly, and to avoid creation of redundant test cases. In addition to the improvement of the test case generation performance, the resulting test-suites are minimized while still achieving the highest possible model based mutation scores. This reduction in the test-suite size is often important when test case execution is costly. Traditionally, it is done using heuristic approaches as a post-processing step.

A further optimization of the test-suite size is conceivable by fully exploiting the information produced by the rewriting. Currently, rewriting is only used to detect when a property is not satisfied. If a formula is changed but not shown to be false, this suggests that the considered test case affects the property and thus the mutant, but does not yet kill the mutant. It is likely that the test case can be extended with a short sequence, rather than requiring a new, possibly longer test case. The consequence would be that the number of redundant test cases would be further decreased.


Chapter 13
Regression Testing

13.1 Introduction

The need for efficient methods to ensure software correctness has resulted in many different approaches to testing. Recently, model checkers have been considered for test case generation in several works. In general, the counterexample mechanism of model checkers is exploited in order to create traces that can be used as test cases.

If the model used for test case generation is changed, this has several effects. Test cases created with a model checker are finite execution paths of the model; therefore, a test suite created before the model change is likely to be invalid. As test case generation with model checkers is fully automated, the obvious solution would be to create a new test suite with the changed model. This is a feasible approach as long as the model complexity is small. If the model complexity is significant, the use of a model checker can lead to high computational costs for the test case generation process. However, not all of the test cases might be invalidated by the model change. Many test cases can be valid for both the original and the changed model. In that case, the effort spent to recreate these test cases would be wasted. There are potential savings when identifying invalid test cases and only creating as many new test cases as necessary. If a model is changed in a regression testing scenario, where a test suite derived from the model before the change fails to detect any faults, running a complete test suite might not be necessary. Here it would be sufficient to run those tests created with regard to the model change.

In this chapter, we present different approaches to handle model changes. Which approach is preferable depends upon the overall objectives in a concrete scenario: should the costs be minimized with regard to test case generation or test case execution, or is it more important to ensure that the changes are correctly implemented? The contributions of this chapter are as follows:

• We present methods to decide whether a test case is made obsolete by a model change or if it remains valid. The availability of such a method allows the reuse of test cases of an older test suite and is a necessary prerequisite to reduce the costs of the test case generation process for the new model.


• We present different methods to create new test cases after a model change. These test cases can be used as regression tests if the number of test cases executed after a model change should be minimized. They are also used in order to update test suites created with older versions of the model.

• An empirical evaluation tries to answer two research questions: (1) What is the impact of test suite update on the overall quality compared to newly created test suites? (2) Is there a performance gain compared to completely re-creating a test suite after a model change?

13.2 Handling Model Changes

Definition 57 (Invalid Test Case). A test case t for model M = (S, s₀, T, L) is invalid for the altered model M′ = (S′, s′₀, T′, L′) if any of the following conditions is true:

∃ i : ⟨. . . , sᵢ, sᵢ₊₁, . . .⟩ = t ∧ (sᵢ, sᵢ₊₁) ∉ T′    (13.1)

∃ i : ⟨. . . , sᵢ, . . .⟩ = t ∧ L(sᵢ) ≠ L′(sᵢ)    (13.2)

∃ i : ⟨. . . , sᵢ, . . .⟩ = t ∧ sᵢ ∉ S′    (13.3)

In practice, the Kripke structure is described with the input language of the model checker in use. Such input languages usually describe the transition relation by defining conditions on AP, and setting the values of variables according to these conditions. A transition condition C describes a set of states Sᵢ where C is fulfilled. In all successor states of these states the variable v has to have the next value n: ∀s ∈ Sᵢ : L(s) ⊨ C ∧ ∀s′ : (s, s′) ∈ T → "v = n" ∈ L(s′). A change in the Kripke structure is represented by a syntactical change in the model source. Such changes can be automatically detected, e.g., by a comparison of the syntax trees. We are only interested in changes that do not invalidate a complete test suite. Traces created by a model checker consist only of states s such that for each variable v defined in the model source there exists the proposition "v = n" ∈ L(s), where n is the current value of variable v in state s. For example, addition or removal of a variable in the model source would result in a change of L for every state in S, and would therefore invalidate any test suite created before the change. Consequently, the interesting types of changes are those applied to the transition conditions or the values for the next states of variables in the model description.

13.2.1 Identifying Obsolete Test Cases

In order to use a model checker to decide if a test case is valid for a given model, the test case is converted to a verifiable model. The transition relations of all variables are given such that they depend on a special state counting variable, as suggested by Ammann and Black (1999b). Section 3.8.1 illustrates this process for the model checker NuSMV. There are two methods to decide whether a test case is still valid after a model change. One is based on an execution of the test case on the model, and the other verifies change related properties on the test case model.
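As an illustration, the following is a minimal Python sketch of how a test case could be encoded as an SMV module whose variables are driven by such a state counting variable. The function name and the test case representation (a list of dictionaries mapping variable names to values, one per step) are assumptions made here for illustration, not the thesis implementation.

def test_case_to_smv(test_case, module_name="main"):
    # Encode a test case as an SMV module in the style suggested by
    # Ammann and Black (1999b): all variables are assigned via a State counter.
    max_state = len(test_case) - 1
    variables = sorted(test_case[0].keys())
    lines = [f"MODULE {module_name}", "VAR", f"  State : 0..{max_state};"]
    for v in variables:
        values = sorted({str(step[v]) for step in test_case})
        lines.append(f"  {v} : {{{', '.join(values)}}};")
    lines += ["ASSIGN", "  init(State) := 0;", "  next(State) := case",
              f"    State < {max_state} : State + 1;", "    TRUE : State;", "  esac;"]
    for v in variables:
        lines.append(f"  {v} := case")
        for i, step in enumerate(test_case):
            lines.append(f"    State = {i} : {step[v]};")
        lines.append("  esac;")
    return "\n".join(lines)

# Example: a three-step test case over two variables.
print(test_case_to_smv([{"speed": 0, "cruise": "off"},
                        {"speed": 1, "cruise": "off"},
                        {"speed": 1, "cruise": "on"}]))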


Symbolic Test Case Execution: A model checker is not strictly necessary for symbolic execution. In a scenario of model checker based test case generation, however, the possibility to use a model checker is convenient, as it avoids the need for an executable model. Symbolic execution of a test case with a model checker is done by adding the actual model as a sub-model instantiated in the test case model. The values of the test case are used as the input variables of the sub-model. Finally, by verifying a property that claims that the output values of the test case and the sub-model are equal for the length of the test case, the model checker determines whether this is indeed the case (Figure 13.1).

MODULE changed_model(input variables)
  -- transition relation of changed model

MODULE main
  -- test case model
  VAR
    SubModel : changed_model(input vars);
  ...
  LTLSPEC G (State < max_state -> output = SubModel.output)

Figure 13.1: Symbolic execution of a test case.

Now the problem of checking the validity of a test suite with regard to a changed model reduces to model-checking each of the test cases combined with the new model. Each test case that results in a counterexample is obsolete. The test cases that do not result in a counterexample are still valid, and thus are not affected by the model change. A drawback of this approach is that the actual model is involved in model-checking. If the model is complex, this can have a severe impact on the performance.

Change Properties: In many cases, test case models can simply be checked against certain properties in order to determine whether a change has an influence on a test case's validity. This avoids the inclusion of the new model in the model-checking process. If a transition condition or target is changed, then the changed transition can be represented as a temporal logic property, such that any test case model that is valid for the new model has to fulfill the property:

□ (changed_condition → ◯ (variable = changed_value))

Such change properties can be created automatically from the model checker model source file. The concrete method depends on the syntax used by the model checker.
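For illustration, the following minimal Python sketch emits such change properties in NuSMV's LTL syntax for a single detected change of an assignment branch. The helper name and the change representation are assumptions made here, not the thesis tool.

def change_properties(variable, condition, new_value=None, old_value=None):
    props = []
    if new_value is not None:
        # Changed or added transition: any test case that is still valid
        # for the new model has to satisfy this property.
        props.append(f"LTLSPEC G (({condition}) -> X ({variable} = {new_value}))")
    if old_value is not None:
        # Removed transition: a counterexample means the old transition is taken.
        props.append(f"LTLSPEC G (({condition}) -> X !({variable} = {old_value}))")
    return props

# Example: the value assigned to 'cruise' under 'brake = on' changed from on to off.
for prop in change_properties("cruise", "brake = on", new_value="off", old_value="on"):
    print(prop)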

If a variable transition is removed, it can only be determined whether a test case takes the old transition using a negated property:

□ (old_condition → ◯ ¬(variable = old_value))

Any test case that takes the old transition results in a counterexample.

Theoretically, the latter case can report false positives, if the removed transition is subsumed or replaced by another transition that behaves identically. This is conceivable as a result of manual model editing. Such false positives can be avoided by checking the new model against this change property. Only if this results in a counterexample does the removal have an effect and really need to be checked on test cases. Although verification using the full model is necessary, it only has to be done once, in contrast to the symbolic execution method.

Test cases that are invalidated by a model change can be useful when testing an implementation with regard to the model change. Obsolete positive test cases can be used as negative regression test cases. A negative test case is a test case that may not be passed by a correct implementation. Therefore, an implementation that passes a negative regression test case adheres to the behavior described by the old model.

13.2.2 Creating New Test Cases

Once the obsolete test cases after a model change have been identified and discarded, the test cases that remain are those that exercise only unchanged behavior. This means that any new behavior added through the change is not tested. Therefore, new test cases have to be created.

Adapting Obsolete Test Cases: Analysis of the old test suite identifies test cases that contain behavior that has been changed. New test cases can be created by executing these test cases on the changed model, recording the new behavior. This is done with a model checker by combining the test case model and the changed model as described in Section 13.2.1. The test case model contains a state counter State and a maximum value MAX. The model checker is queried with the property □ (State ≠ MAX). This yields a trace in which the value of State is increased up to MAX. The adapted test case simply consists of the value assignments of the changed model in that trace.
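A minimal sketch of this adaptation step is given below. It assumes that NuSMV is available on the path and that the combined test-case/changed-model file already contains the State counter; the function name and the omitted parsing of the counterexample trace are assumptions for illustration only.

import subprocess

def adapt_test_case(combined_model_path, max_state):
    # Append the trap property; its counterexample forces State up to MAX.
    with open(combined_model_path, "a") as model_file:
        model_file.write(f"\nLTLSPEC G (State != {max_state})\n")
    # NuSMV prints the counterexample trace on stdout; turning it into
    # per-step value assignments of the changed model is omitted here.
    result = subprocess.run(["NuSMV", combined_model_path],
                            capture_output=True, text=True)
    return result.stdout

# Example usage (assumes the combined model file exists):
print(adapt_test_case("combined_model.smv", max_state=5))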

Alternatively, when checking test cases using the symbolic execution method, the counterexamples produced in this process can directly be used as test cases. In contrast to the method just described, the resulting test cases can potentially be shorter, depending on the change. This can theoretically have a negative influence on the overall coverage of the new test suite.

The drawback of this approach is that the changed model might contain new behavior which cannot be covered if there are no related obsolete test cases. In the evaluation we refer to this method as Adaptation.

Selectively Creating Test Cases: Xu et al. (2004) presented an approach to regression testing with model checkers, where a special comparator creates trap properties from two versions of a model. In general, trap property based approaches to test case generation express the items that make up a coverage criterion as properties that claim the items cannot be reached (Gargantini and Heitmeyer, 1999). For example, a trap property might claim that a certain state or transition is never reached. When checking a model against a trap property, the model checker returns a counterexample that can be used as a test case. We generalize the approach of Xu et al. in order to be applicable to a broader range of test case generation techniques. The majority of approaches works by either creating a set of trap properties or by creating mutants of the model.

For all approaches using trap properties we simply calculate the difference of the sets of trap properties, as an alternative to requiring a special comparator for a specific specification language and coverage criterion. The original model results in a set of properties P, and the changed model results in P′. New test cases are created by model-checking the changed model against all properties in P′ − P. The calculation of the set difference does not require any adaptation of given test case generation frameworks. In addition, it also applies to methods that are not based on coverage criteria, e.g., the approach proposed by Black (2000). Here, properties are generated by "reflecting" the transition relation of the SMV source file as properties similar to the change properties presented in Section 13.2.1. The resulting properties are mutated, and the mutants serve as trap properties.

It is conceivable that this approach might not guarantee achievement of a certain coverage criterion, because for some coverable items the related test cases are invalidated, even though the item itself is not affected by the change. If maximum coverage of some criterion is required, then an alternative solution would be to model-check the test case models against the set of trap properties for the new model instead of selecting the set difference. For reasons of simplicity, we consider the straightforward approach of using set differences in this chapter. In the evaluation we refer to this method as Update.
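A minimal sketch of this Update step is shown below. The trap property generator and the model checking call are passed in as assumed helper functions; they are not part of the thesis tool and only illustrate the set-difference idea.

def update_test_suite(old_model, new_model, generate_trap_properties, check_model):
    old_props = set(generate_trap_properties(old_model))
    new_props = set(generate_trap_properties(new_model))
    new_tests = []
    # Only trap properties in P' - P are checked, so test cases are created
    # solely for items that did not already exist for the old model.
    for prop in sorted(new_props - old_props):
        counterexample = check_model(new_model, prop)  # returns a trace or None
        if counterexample is not None:
            new_tests.append(counterexample)
    return new_tests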

The second category of test case generation approaches uses mutants of the model to create test cases (e.g., Ammann et al., 1998, 2001; Okun et al., 2003a; Fraser and Wotawa, 2006a). For example, state machine duplication (Okun et al., 2003a) combines original and mutant model so that they share the same input variables. The model checker is then queried whether there exists a state where the output values of model and mutant differ. Here, the solution is to use only those mutants that are related to the model change. For this, the locations of the changes are determined (e.g., in the syntax tree created by parsing the models) and then the full set of mutants for the changed model is filtered such that only mutants of changed statements in the NuSMV source remain. Test case generation is then performed only using the remaining mutants.

Testing with Focus on Model Changes: As a third method to create change related test cases we propose a generic extension applicable to any test case generation method. It rewrites both the model (or mutants thereof) and the properties involved in the test case generation just before the model checker is called. This rewriting is fully automated. The model is extended by a new Boolean variable changed. If there is more than one change, then there is one variable for each change: changeᵢ. These variables are initialized with the value false. A change variable is set to true when a state is reached where a changed transition is taken. Once a change variable is true, it keeps that value. The transition relation of the change variable, which consists of the transition condition of the changed variable, is shown in Figure 13.2.

The properties involved in the test case generation approach are rewritten in order to create test cases with a focus on the model change. As an example we use LTL, although the transformation can also be applied to computation tree logic (CTL) (Clarke and Emerson, 1982).

MODULE main
VAR
  changed : boolean;
  ...
ASSIGN
  init(changed) := FALSE;
  next(changed) := case
    changed_condition : TRUE;
    1 : changed; -- default branch
  esac;
  next(changed_var) := case
    changed_condition : changed_value;
    ...

Figure 13.2: Transition relation with special variable indicating changes.

Definition 58 (Change Transformation). The change transformation φ′ = α(φ, c) for an LTL property φ with respect to the change identified with the Boolean variable c, with a ∈ AP being a propositional formula, is recursively defined as:

α(a, c) = a    (13.4)

α(¬φ, c) = ¬α(φ, c)    (13.5)

α(φ₁ ∧ φ₂, c) = α(φ₁, c) ∧ α(φ₂, c)    (13.6)

α(◯ φ, c) = ◯ (c → α(φ, c))    (13.7)

α(φ₁ U φ₂, c) = α(φ₁, c) U (c → α(φ₂, c))    (13.8)

Basically, all temporal operators are rewritten to include an implication on the change variable. This ensures that only counterexamples that include the changed transition are created. For multiple changes, there has to be one modified version of each property for each change in order to make sure that all changes are equally tested. In the evaluation we refer to this method as Focus.
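A minimal Python sketch of this transformation on a small tuple-based LTL representation is given below. The AST encoding, with nodes ('ap', name), ('not', f), ('and', f, g), ('implies', f, g), ('next', f), and ('until', f, g), is an assumption made for illustration; it is not the thesis implementation.

def change_transform(formula, c):
    op = formula[0]
    if op == 'ap':                                   # (13.4)
        return formula
    if op == 'not':                                  # (13.5)
        return ('not', change_transform(formula[1], c))
    if op == 'and':                                  # (13.6)
        return ('and', change_transform(formula[1], c),
                       change_transform(formula[2], c))
    if op == 'next':                                 # (13.7): X f  becomes  X (c -> f')
        return ('next', ('implies', ('ap', c),
                         change_transform(formula[1], c)))
    if op == 'until':                                # (13.8): f U g  becomes  f' U (c -> g')
        return ('until', change_transform(formula[1], c),
                ('implies', ('ap', c), change_transform(formula[2], c)))
    raise ValueError(f"unsupported operator: {op}")

# Example: rewrite  (speed = 1) U X (cruise = on)  for the change variable 'changed'.
phi = ('until', ('ap', 'speed = 1'), ('next', ('ap', 'cruise = on')))
print(change_transform(phi, 'changed'))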

13.3 Empirical Results

The previous section presented different possibilities, for different aims, to cope with model changes in a scenario of model checker based test case generation. This section tries to evaluate the feasibility of these ideas. First, the experiments conducted are described, and then the results are presented and discussed.

13.3.1 Experiment Setup

The methods described in this chapter have been implemented using the programming language Python and the model checker NuSMV (Cimatti et al., 1999). All experiments have been run on a PC with an Intel Core Duo T2400 processor and 1GB RAM, running GNU/Linux. We automatically identify changes between two versions of a model by an analysis of the abstract syntax trees created from parsing the models. We use a simple example model of a cruise control application based on a version by Kirby (1987). In order to evaluate the presented methods, the mutation score and creation time of new and updated test suites were tracked over several changes. There is a threat to the validity of the experiments by choosing changes that are not representative of real changes. Therefore, the experiments were run several times with different changes and the resulting values were averaged.

In the first step, mutants were created from the supposedly correct model. The following mutation operators were used (see Section 3.7.1 for details): STA (replace atomic propositions with true/false), SNO (negate atomic propositions), MCO (remove atomic propositions), LRO, RRO, ARO (logical, relational and arithmetical operator replacement, respectively). The resulting mutants were analyzed in order to eliminate equivalent mutants. This is done with a variant of the state machine duplication approach (Okun et al., 2003a), where the model checker is queried whether there exists a state where the output values of a model and its mutant differ. An equivalent mutant is detected if no counterexample is returned. The set of mutants was further reduced by checking each mutant against a set of basic properties that require some elementary behavior, e.g., reachability of some important states.

Out of the resulting set of inequivalent mutants one mutant is chosen randomly and used as the new model. With this mutant, the procedure is repeated until a sequence of 20 visible model changes is achieved. The experiment was run on 20 such sequences and the results were averaged.

For each of the sequences of model versions the following is performed: Beginning with the model containing all 20 changes, test suites are created using the methods transition coverage criterion (one test case for each transition condition of the NuSMV model), mutation of the reflected transition relation (Black, 2000), and state machine duplication (Okun et al., 2003a). These three methods were chosen as they should be representative for most types of conceivable approaches. Then, the next version of the model is chosen, the test suites of the previous model are analyzed for obsolete test cases, and new and updated test suites are created. Then the mutation scores of all of these test suites are calculated. The mutation score is the ratio of identified mutants to mutants in total. It is calculated by symbolically executing the test case models against the mutant models. This procedure is repeated for each of the model versions up to the original model.

13.3.2 Results

Table 13.1: Average number of unique new test cases.

Test Suite Type     Full     Adaptation     Update     Focus
Transition          5.75     1              1.4        6.6
Reflection          19.35    2.5            7.05       24.4
SM Duplication      33.35    2.85           6.45       29.4

Figures 13.3-13.5(a) show the mutation scores of the different methods along the course of the different model versions. There is a degradation of the mutation score for the adaptation and update methods. The degradation increases with each model change; therefore, it could be advisable to create new test suites after a certain number of changes when using such a method. In contrast, the change focus method achieves a mutation score that is sometimes even higher than that of a completely new test suite. This is because the test suites created with the focus method are bigger than new test suites for the transition coverage criterion. Adaptation generally achieves the lowest mutation scores. However, the mutation score is only slightly smaller than for the update method, so a significant performance gain could justify this degradation.

Figures 13.3-13.5(b) show the computation times for creating and for updating test suites. All methods are faster than a complete test case generation process, and test suite adaptation performs significantly faster than all other methods in most cases. The performance of the adaptation is determined by the model complexity and the number of invalid test cases, and is therefore similar for all test suites in the experiment. In contrast, the performance of the update and focus methods depends on the test case generation approach they are based upon. For simple methods like transition coverage, focus is very efficient, while there is less performance gain as test case generation complexity increases. For the most complex approach used in the experiment (based on state machine duplication) the performance gain in comparison to creating a new test suite is minimal.

Finally, Table 13.1 compares the numbers of new test cases created on average after each model change. This table reveals why the change focus method achieves such high mutation scores: the number of test cases generated is significantly higher than for any other approach. Interestingly, it is even higher than the number of test cases generated for new test suites with the transition and reflection approaches, although the test case generation is still faster on average.

13.4 Summary

In this chapter, we have shown how to decide whether a test case is still valid after the model it was created from is changed. That way, it is possible to reuse some of the test cases after a model change and reduce the test suite generation effort. Different methods to create test cases specific to the model change were presented. We used the model checker NuSMV for our experiments and as an example model syntax. However, there is no reason why the approach should not be applicable to other model checkers. Experiments have shown that the presented methods can be used to update test suites after a model change, although there is a trade-off between performance improvement and quality loss.

The main problem of model checker based approaches in general is the performance. If the model is too complex, then test case generation will take very long or might even be impossible. Therefore, it is important to find ways of optimizing the approach. The potential savings when recreating test suites after a model change are significant. Even for the small model used in our evaluation, a large performance gain is observable when only selectively creating test cases for the model changes. Although a model is usually more abstract than the program it represents, the model size can still be significant. For instance, automatic conversion (e.g., Matlab Stateflow to SMV) can result in complex models.

The methods presented to create new test cases with minimal computational effort achieve good results. We cannot conclude that one method is superior, because the preferable method depends on the concrete scenario. If the main goal is to minimize the costs of retesting, then adaptation of old test cases is effective as long as there are not too many and too significant changes. If it is more important to maximize the likelihood of detecting faults related to the change, then the presented method to create test cases focusing on a change is preferable. For example, in safety related scenarios a decrease of the test suite quality is unacceptable. Finally, the update method that creates test cases only for changed parts seems like a good compromise; it reduces the costs while the quality decrease is not too drastic. Test cases created with any of the presented methods can be used as regression test suites, following the ideas of Xu et al. (2004).

There are some approaches that explicitly use specification properties for test case generation (Ammann et al., 1998; Okun et al., 2003a; Ammann et al., 2001). This chapter did not explicitly cover the aspects of test suite update with regard to specification properties. However, the idea of test suite focus directly applies to such approaches, as do the presented test suite update techniques. The cruise control example is only a small model, and the changes involved in our experiments were generated automatically. This is sufficient to show the feasibility of the approach. However, actual performance measurements on more complex models and realistic changes would be desirable.


Figure 13.3: Transition coverage test case generation. (a) Mutation score (%) and (b) creation time (s), plotted over the sequence of 20 model changes for the Full, Adaptation, Update, and Focus methods.


Figure 13.4: Mutation of the reflected transition relation. (a) Mutation score (%) and (b) creation time (s), plotted over the sequence of 20 model changes for the Full, Adaptation, Update, and Focus methods.


Figure 13.5: Test case generation via state machine duplication. (a) Mutation score (%) and (b) creation time (s), plotted over the sequence of 20 model changes for the Full, Adaptation, Update, and Focus methods.


Chapter 14

Conclusions

This chapter concludes the thesis by discussing the limits of the results and possible ways to continue work on this topic. Part of this chapter is based on (Fraser and Wotawa, 2007i).

14.1 Improving Model Checkers for Test Case Generation

While model checker based methods for testing are convenient, there are some drawbacks that result from the fact that model checkers were not originally intended for such an application. For example, many model checker based test case generation techniques extend or duplicate the model, which increases the complexity. It is conceivable that the model checker could fulfill these tasks more efficiently, and without a significant increase in complexity. As another example, when using a model checker for verification purposes, any counterexample showing a property violation is appropriate. When creating test cases, this has the effect that a significant number of the test cases are identical or subsumed. This wastes time during creation and reduces the overall test suite quality. Unfortunately, this is not the only drawback of currently used counterexample generation techniques.

In this section, a closer look is taken at the demands of test case generation, deficiencies of current model checkers are identified, and ten concrete ways are suggested in which model checkers could be made better suited for the task of test case generation. Some of the discussed techniques have already been experimented with, and can be directly integrated into model checkers. Others can be seen as possible directions for further research.

Research work on testing with model checkers commonly builds on readily available model checking tools. Such tools were not originally intended for test case generation. They have several drawbacks, which might be negligible within a limited research focus, but do have a negative influence on the overall applicability of model checker based testing in practice. This section identifies ten ways in which model checker tools can be improved for test case generation use.


14.1.1 Alternative Witnesses and Counterexamples

To prove that a model does not satisfy a property, it is sufficient to use a single counterexample. In the context of verification, a counterexample should offer the analyzer insight into why the property is not satisfied. The shorter the counterexample, the easier it is to understand how or why the property violation occurs. If several properties are violated such that one counterexample illustrates all these violations, then a single counterexample is sufficient in the context of verification.

When using model checkers for software testing, counterexamples are used differently. The objective of testing is to cover as much of the system as possible. The more different behaviors the test cases exercise, the better. Obviously there is a discrepancy between the objectives of testing and verification, but the same techniques are applied in both scenarios.

Testing could greatly benefit from model checkers that take into account the objectives of testing. For example, test requirements are commonly formulated in a negated way, such that a counterexample is generated. But why stop after a single counterexample? Testing would benefit from the possibility to generate several different counterexamples for the same property. While some model checkers (for example, SPIN (Holzmann, 1997)) allow the creation of multiple counterexamples, most model checkers create only one counterexample per property. Multiple counterexamples would make it possible to choose the counterexample that explores the most yet uncovered parts of the system.

A further step would be to calculate the superset of all possible witnesses or counterexamples. Such an approach is taken by Meolic et al. (2004), who derive counterexample or witness automata for a fragment of action computation tree logic. This is also related to the tool TGV (Jard and Jéron, 2005), which returns a graph structure (complete test graph) for a manually specified test purpose. Such a structure might itself result in a large or even infinite number of linear test cases, so here further processing is necessary to select a reasonable set of test cases.

14.1.2 Minimization at Creation Time

While alternative witnesses can be useful to increase a test suite's fault sensitivity, the size of a test suite can be important when test case execution is expensive. If test case execution is costly, then it is of interest to execute only as many test cases as are really needed in order to fulfill a given test objective. This is related to the problem of test suite reduction (Harrold et al., 1993), also known as test suite minimization. The objective of test suite reduction is to find a suitable subset of a given test suite, such that the test objectives are achieved. Several heuristics have been presented (Harrold et al., 1993; Gregg Rothermel, 2002; Zhong et al., 2006) that can be used for test suite reduction. While there are other claims (Wong et al., 1995), several experiments (Jones and Harrold, 2003; Rothermel et al., 1998; Heimdahl and Devaraj, 2004) have shown that test suite reduction leads to a degradation of fault sensitivity. If execution is costly, then minimization is often necessary nevertheless.

Obviously, it is not advantageous with regard to performance to first create a large test suite with a model checker and then apply reduction. We have presented techniques that allow the creation of minimized test suites in the first place (Fraser and Wotawa, 2007c,d). The general idea is to exclude already covered trap properties or mutants from the test case generation. Model checking can be used for this, but is not necessary. For example, we suggested the use of temporal logic rewriting rules (Havelund and Rosu, 2001a) as an efficient method.
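As an illustration of the idea, the following is a minimal Python sketch of such rewriting for a small fragment of LTL, in the spirit of Havelund and Rosu (2001a). The formula encoding and the coverage check are simplified assumptions made here, not the thesis implementation.

def progress(formula, state):
    """Rewrite an LTL formula over one state (a set of true atomic propositions).
    Returns True, False, or the obligation remaining for the rest of the trace."""
    if isinstance(formula, bool):
        return formula
    op = formula[0]
    if op == 'ap':                      # evaluate the proposition in this state
        return formula[1] in state
    if op == 'next':                    # X f: f becomes the obligation for the next state
        return formula[1]
    if op == 'and':
        left = progress(formula[1], state)
        right = progress(formula[2], state)
        if left is False or right is False:
            return False
        if left is True:
            return right
        if right is True:
            return left
        return ('and', left, right)
    if op == 'globally':                # G f: f must hold now, and G f afterwards
        now = progress(formula[1], state)
        if now is False:
            return False
        return formula if now is True else ('and', now, formula)
    raise ValueError(f"unsupported operator: {op}")

# A trap property is covered by a test case once rewriting yields False,
# i.e. the negated requirement is violated and the coverage item is reached.
obligation = ('globally', ('next', ('ap', 'cruise_off')))
for step in [{'cruise_off'}, {'cruise_on'}]:
    obligation = progress(obligation, step)
    if obligation is False:
        print("trap property violated: item already covered, can be skipped")
        break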

The integration of these techniques into model checkers would not require much effort, but would bring advantages: post-processing would not be necessary, and the performance of the test case generation would be improved.

14.1.3 Nondeterministic Counterexamples

In the context of verification with model checkers, nondeterminism is not problematic. In general, a system is nondeterministic if, given the same inputs at different times, different outputs can result. Nondeterminism can easily be modeled with many formalisms, including automaton-based ones. Verification of nondeterministic systems is simply done by exploration of all possible behaviors.

Current model checkers create only linear sequences as counterexamples. If the model used to derive a counterexample is nondeterministic, then such a linear sequence represents choices at nondeterministic decision points. If the counterexample is applied to an implementation as a test case, this implementation might make different choices that are valid nevertheless. The use of a linear sequence could lead to a false verdict in this case (see Chapter 8).

A possible remedy to this problem is to make nondeterministic choices explicit along counterexamples. This would make it possible to apply the counterexamples as test cases, and if an implementation behaves differently than the trace at a nondeterministic decision point, the test case execution framework could report an inconclusive verdict. Such an approach is proposed in (Fraser and Wotawa, 2007e).

A further improvement would be to create non-linear test cases. For example, a counterexample could be seen as a tree-like structure, which includes all possible alternative branches caused by nondeterminism.

14.1.4 Tree-like Counterexamples

Nondeterministic testing is not the only conceivable application of tree-like counterexamples. It has been shown that linear counterexamples are only capable of showing property violations for a certain subset of temporal logics (Clarke and Veith, 2004). Some properties can only be witnessed by tree-like structures, such as those created by an algorithm presented by Clarke et al. (2002).

One application of tree-like counterexamples is in methods where mutants are model checked against the specification (Ammann et al., 1998; Fraser and Wotawa, 2006a). Requirement properties that require tree-like counterexamples also require more than one linear test case. For example, each counterexample tree could result in a set of test cases such that the tree is covered.

The ability to create tree-like counterexamples also opens up new possibilities for the definition of trap properties. For example, trap properties could be formulated such that pairs of test cases representing alternative branches are generated, as described in Chapter 4.


14.1.5 Abstraction for Testing

Performance of model checker based test case generation is a problem mentioned in several papers. In fact, the performance limitations can be seen as the main show-stopper for the industry acceptance of model checker based testing.

This problem is caused by the state explosion resulting from the large state spaces, especially in complex models. While the current state of the art in model checking already allows verification of realistic hardware designs, model checking of software is extremely susceptible to the state explosion problem.

Abstraction is generally the solution to overcome the state explosion problem. Several abstraction techniques have been presented in recent years, and many of them make it possible to verify properties on very large models. These abstraction methods are tailored towards verification, and therefore are not always useful in the context of testing.

For example, a popular technique is CEGAR (Clarke et al., 2000), where an abstract model is refined until no more spurious counterexamples are generated when verifying a property. Such abstraction techniques guarantee that a property that holds on the abstract model also holds on the concrete model. This is contrary to the task of test case generation: here we want properties violated on the concrete model to also be violated on the abstract model.

Another common abstraction technique is the cone of influence reduction (COI) (Berezin et al., 1998). Here, slicing techniques are applied to consider only those parts of a model necessary to determine satisfaction of a property. When creating test cases, the objective is to cover as much of the model as possible, so here again there is a certain discrepancy.

Abstraction is not only necessary with regard to performance, but also with regard to the size of resulting test suites. The more complex a model is, the more mutants or trap properties can be defined, and the more test cases can potentially be created. A possibility to scale the complexity or abstraction level of a model would be an ideal method to scale the size of a resulting test suite.

Unfortunately, research on abstraction techniques specifically tailored towards the needs of testing is sparse. Ammann and Black (1999a) suggested a method called finite focus that puts a limited focus on the definition range of variables for the purpose of test case generation. Ammann and Black further define a notion of soundness in the context of test case generation, which expresses that any counterexample of an abstracted model has to be a valid trace of the original model. In general, this is an area where further research is necessary.

14.1.6 Model Combination

There are some problems related to techniques to derive test cases that would probably never occur in the context of verification. For example, several different techniques to derive test cases inject faults into a given model and then observe the resulting behavior (Ammann et al., 1998, 2001; Fraser and Wotawa, 2006a; Okun et al., 2003a). In some cases it might be of interest how the faulty model behaves (for example, for the purpose of failing tests), but in many cases the test case should represent the behavior of the correct model.


Several different solutions to this problem have been presented. For example, Okun et al. (2003a) create a combination of the original model and the mutant, such that the two models share the same input variables. Ammann et al. (2001) create a "merged" model, which includes both the original and the mutated transitions. Another possibility is to symbolically execute failing test cases on a correct model with a model checker, thus creating a new, passing test case.

All of these approaches can be seen as workarounds to a problem that could more easily be solved by an appropriate model checker. For example, the verification process might only consider the mutant model, while during counterexample creation the original model is used.

A related approach was presented in (Fraser and Wotawa, 2006a). Here, instead of using concrete fault models and mutation operators, the model is modified such that each execution run of the model can contain a fixed, limited number of faulty behaviors. The advantage gained from this is that it is possible not only to create traces that show property violation or satisfaction, but also traces where a property violation might occur in an error case, without the need to examine large sets of mutants. Again, it should be possible to integrate this idea directly into the model checker. It should even be possible to implement such a technique without requiring two different versions of models in memory simultaneously.

14.1.7 Explicit Setting of the Initial State

When creating test cases with a model checker, it might be necessary to explicitly set the initial state of a model, without having to rewrite the textual model representation and re-initialize the model. Example applications requiring such a method are described in (Fraser and Wotawa, 2006a, 2007b,c). For example, it is necessary to set the initial state explicitly when extending existing counterexamples or deriving replacement sub-sequences. In other scenarios it might be necessary that a set of test cases all share the identical initial state.

In general, the task of setting an initial state explicitly is easily fulfilled, even if the model checker does not support it directly. For example, the source code of the model can be rewritten such that the desired initial state is the only allowed initial state. The drawback of this solution is that the model checker has to re-encode the entire state space of the model. Alternatively, properties can be rewritten such that all statements are represented as implications of the desired initial state. Resulting counterexamples consist of a prefix to this state as well as the regular counterexample part. Again there are performance concerns, as the complexity of counterexample generation increases with the length of the counterexamples.
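As a rough illustration of the first workaround, the following minimal Python sketch appends an INIT constraint to an SMV model text. It assumes that main is the last module in the file and that the initial values of the constrained variables are not already fixed via ASSIGN init(...), since conflicting constraints would leave no legal initial states; the function name is an assumption for illustration only.

def restrict_initial_state(model_text, state):
    # Build a conjunction such as "x = 2 & mode = off" from the desired state
    # and append it as an INIT constraint of the last module in the file.
    constraint = " & ".join(f"{var} = {value}" for var, value in state.items())
    return model_text + f"\nINIT {constraint}\n"

print(restrict_initial_state("MODULE main\nVAR\n  x : 0..3;\n", {"x": 2}))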

A model checker that explicitly supports setting a specific single initial state for counterexample generation would be a very practical extension. It is conceivable that such an option would also increase the speed of model checking in some cases, because the state space is reduced if the set of initial states is restricted.

14.1.8 Extensible Counterexamples

When using a model checker for verification, each property is interpreted as a distinct entity, and therefore each property that is not satisfied by a model results in a distinct counterexample. In the context of testing, this might be disadvantageous.


For example, many test cases will share identical prefixes. These common prefixes do not contribute to the overall fault sensitivity, because all identical prefixes are only capable of identifying identical faults. This problem was discussed in Chapter 9, where we presented an approach to post-process test suites to remove this redundancy. Such post-processing could be avoided if the counterexample generation took the testing objectives into account in the first place.

A different effect of the same problem has been observed by Heimdahl et al. (2003), who noticed in a case study that some coverage criteria result in large test suites where each test case is very short.

This situation could be improved if not every property violation led to a distinct counterexample. For example, each counterexample could be seen as an extension of an existing counterexample, up to a given upper bound on the test case length. Such an approach was implemented by Hamon et al. (2004) in the context of the verification suite SAL (de Moura et al., 2004).

Further improvement could be achieved with regard to the size of a resulting test suite. A new counterexample would only be created if it is shorter than any possible extension of existing counterexamples. If counterexamples can be extended, then the counterexample that allows the shortest extension is chosen. This enables the number of test cases to be reduced and their overall length to be minimized. At the same time, the quality of the test suite would not be adversely affected.

14.1.9 Constraints on Counterexamples

Constraints on the counterexample creation are another conceivable possibility to improve test case generation with model checkers. For example, a possible constraint would be to require that each counterexample has to pass a certain state. An application that requires this, and a possible solution, has been presented by the authors in (Fraser et al., 2007). In this work, counterexample creation is constrained after a model change, in order to create only such counterexamples that are related to the changes in the model.

A different useful application of constraints would be to require a certain minimum length for counterexamples. Often, trap properties are already violated in an initial state of the model. While a resulting counterexample consisting of only one state might be useful for verification purposes, it is not usable as a test case.

Test purposes, as used in the test tool TGV (Jard and Jéron, 2005), are a different application of constraints. Test purposes are a method to describe parts of the model that are interesting for testing. In TGV, test purposes are represented as transition systems with special pass/fail states, and for test case generation the synchronous product of the model and the test purpose is calculated and pruned. The resulting transition system can be used to create test cases more efficiently than using the whole model.

14.1.10 Organization of Counterexamples

When using a model checker for verification, the task of the model checker is fulfilled once a counterexample is created, or a property is proved to be true. In software testing, the creation of counterexamples is only the first step. Once the counterexamples have been transformed into test cases, these have to be executed on an implementation under test. Here, further issues need to be tackled.

The performance of the test case execution, for example, is often important. Test suite minimization (Section 14.1.2) and abstraction (Section 14.1.5) were already proposed as important methods to reduce the size of test suites and therefore improve the speed of test suite execution, and extensible counterexamples (Section 14.1.8) can also be used to reduce the size of test cases and test suites.

There are further conceivable improvements. For example, the order in which test cases are executed has an influence on the overall speed of fault detection. In (Fraser and Wotawa, 2007a), we have shown a method to prioritize test cases already during the test case creation. This method is based on the idea that counterexamples that are produced or subsumed more often are likely to be more important. Inclusion of this technique into model checkers would be straightforward. Prioritization based on the coverage of test cases would also be simple, by either using model checking techniques or formula rewriting.

A different scenario where testing can profit from proper organization of counterexamples is after a model is changed. When the model used to generate test cases changes, it is not necessary to recreate the complete test suite or to re-execute all test cases. In Chapter 13 we have proposed a method to identify obsolete test cases after a model change and have also presented different methods to generate test cases tailored for regression testing. Again, these methods could be integrated directly into a model checking tool for testing.

14.2 Summary

This thesis started with a presentation of the background and state of the art in testing with model checkers. A survey of the field of research was provided, and the basic definitions were given. Subsequently, several novel contributions were described, many of which are related to optimizing the performance and the quality of testing with model checkers. The thesis concluded with a set of suggestions of how model checkers could be improved to be better suited for test case generation.

The techniques reviewed and introduced in this thesis extend the applicability of model checker based testing. The main remaining barrier is still the performance of model checkers. Major efforts are being made to improve model checkers; testing based on model checkers will naturally benefit from these improvements. At the same time, future research will have to focus on abstraction techniques that enable automated testing for models that are too complex.

In some scenarios the full set of features and possibilities a model checker offers might not be necessary. There are approaches (Jéron and Morel, 1999) that re-use some ideas of model checking techniques in dedicated test tools, such as TGV (Jard and Jéron, 2005). Such tools are very successful in some areas where model checking suffers from problems; for example, the performance of TGV is significantly better than that of regular model checking. At the same time, such tools lack the flexibility of model checkers. For example, TGV requires suitable test purposes for testing; these test purposes currently have to be specified manually, although automatic creation of test purposes is an active area of research. Furthermore, TGV does not return linear test cases but a graph structure from which a possibly infinite number of test cases can be generated. Nevertheless, the lessons learned from creating such tools should ideally be integrated into model checkers.


Even if all performance problems were resolved, there would still be one problem intrinsic to all model based testing approaches: where does the model come from? In this thesis, and in most papers on model based testing, the existence of a suitable formal model is assumed. The model creation, however, is one of the most difficult parts of the whole development process. Creating models manually is a complicated task, and two specifiers writing a model for an application will probably come up with different models. Different models, however, will most likely result in different test suites. Alternative approaches try to extract models from source code, and sometimes model based development tools are used, which means that a verifiable model naturally results from the development process. Such approaches introduce new problems, for example, what exactly is tested by test cases resulting from the model: the implementation, or just the tools that created the model from the source code or vice versa?

Due to the context in which the work underlying this thesis was conducted, the main application area is reactive systems. In fact, such systems are the considered application type for most other work on testing with model checkers. A reactive system reads input values and sets output values according to calculations performed on the input values. Usually, this is repeated cyclically, and each cycle is represented as one logical execution step in the model.

In general, model checker based testing is not limited to reactive systems. Any system that can be modeled with an automaton-based formalism is suitable for model checker based test case generation. For example, interactions with a graphical user interface are often modeled as automata. The main problem that needs to be solved for any application type is a mapping between steps in a model checker counterexample and the execution on the real system.


Bibliography

Aynur Abdurazik, Paul Ammann, Wei Ding, and Jeff Offutt. Evaluation of three specification-based coverage testing criteria. In Proceedings ICECCS 2000: 6th IEEE International Conference on Engineering of Complex Computer Systems, pages 179–187, Tokyo, Japan, September 2000.

A. T. Acree, T. A. Budd, R. A. DeMillo, R. J. Lipton, and F. G. Sayward. Mutation analysis. Technical report, School of Information and Computer Science, Georgia Inst. of Technology, Atlanta, Ga., Sept. 1979.

Paul Ammann and Paul E. Black. Abstracting Formal Specifications to Generate Software Tests via Model Checking. In Proceedings of the 18th Digital Avionics Systems Conference, volume 2, 1999a.

Paul Ammann and Paul E. Black. A Specification-Based Coverage Metric to Evaluate Test Sets. In HASE '99: The 4th IEEE International Symposium on High-Assurance Systems Engineering, pages 239–248, Washington, DC, USA, 1999b. IEEE Computer Society. ISBN 0-7695-0418-3.

Paul Ammann, Wei Ding, and Daling Xu. Using a Model Checker to Test Safety Properties. In Proceedings of the 7th International Conference on Engineering of Complex Computer Systems (ICECCS 2001), pages 212–221, Skovde, Sweden, 2001. IEEE.

Paul Ammann, Paul E. Black, and Wei Ding. Model Checkers in Software Testing. Technical Report NIST-IR 6777, National Institute of Standards and Technology, 2002.

Paul Ammann, Jeff Offutt, and Hong Huang. Coverage criteria for logical expressions. In ISSRE '03: Proceedings of the 14th International Symposium on Software Reliability Engineering, page 99, Washington, DC, USA, 2003. IEEE Computer Society. ISBN 0-7695-2007-3.

Paul E. Ammann, Paul E. Black, and William Majurski. Using Model Checking to Generate Tests from Specifications. In Proceedings of the Second IEEE International Conference on Formal Engineering Methods (ICFEM'98), pages 46–54. IEEE Computer Society, 1998.

Thomas Ball, Rupak Majumdar, Todd Millstein, and Sriram K. Rajamani. Automatic predicate abstraction of C programs. SIGPLAN Not., 36(5):203–213, 2001. ISSN 0362-1340. doi: 10.1145/381694.378846.

H. Barringer, A. Goldberg, K. Havelund, and K. Sen. Program Monitoring with LTL in Eagle. In PADTAD'04, Parallel and Distributed Systems: Testing and Debugging, 2004.


Ilan Beer, Shoham Ben-David, Cindy Eisner, and Yoav Rodeh. Efficient detection of vacuity in ACTL formulas. In CAV '97: Proceedings of the 9th International Conference on Computer Aided Verification, pages 279–290, London, UK, 1997. Springer-Verlag. ISBN 3-540-63166-6.

Boris Beizer. Software testing techniques (2nd ed.). Van Nostrand Reinhold Co., New York, NY, USA, 1990. ISBN 0442206720.

Albert Benveniste, Paul Le Guernic, and Christian Jacquemot. Synchronous Programming with Events and Relations: the SIGNAL Language and Its Semantics. Science of Computer Programming, 16(2):103–149, 1991.

Albert Benveniste, Paul Caspi, Stephen A. Edwards, Nicolas Halbwachs, Paul Le Guernic, and Robert de Simone. The Synchronous Languages 12 Years Later. In Proceedings of the IEEE, volume 91, pages 64–83, 2003.

Sergey Berezin, Sérgio Vale Aguiar Campos, and Edmund M. Clarke. Compositional reasoning in model checking. In COMPOS'97: Revised Lectures from the International Symposium on Compositionality: The Significant Difference, pages 81–102, London, UK, 1998. Springer-Verlag. ISBN 3-540-65493-3.

Gérard Berry and Georges Gonthier. The Esterel Synchronous Programming Language: Design, Semantics, Implementation. Science of Computer Programming, 19(2):87–152, 1992.

Dirk Beyer, Adam J. Chlipala, Thomas A. Henzinger, Ranjit Jhala, and Rupak Majumdar. Generating Tests from Counterexamples. In Proceedings of the 26th International Conference on Software Engineering (ICSE'04, Edinburgh), pages 326–335. IEEE Computer Society Press, 2004.

Ramesh Bharadwaj and Constance L. Heitmeyer. Model Checking Complete Requirements Specifications Using Abstraction. Automated Software Engineering, 6(1):37–68, 1999.

Armin Biere, Alessandro Cimatti, Edmund M. Clarke, and Yunshan Zhu. Symbolic model checking without BDDs. In TACAS '99: Proceedings of the 5th International Conference on Tools and Algorithms for Construction and Analysis of Systems, pages 193–207, London, UK, 1999. Springer-Verlag. ISBN 3-540-65703-7.

P. E. Black and S. Ranville. Winnowing tests: Getting quality coverage from a model checker without quantity. In Digital Avionics Systems, 2001. DASC. The 20th Conference, volume 2, pages 9B6/1–9B6/4 vol.2, 2001.

Paul E. Black. Demonstration of Generating Tests from Formal Specifications [web page] http://hissa.nist.gov/~black/AFTG/, 1998. [Accessed October 24th, 2007].

Paul E. Black. Modeling and Marshaling: Making Tests From Model Checker Counterexamples. In Proc. of the 19th Digital Avionics Systems Conference, pages 1.B.3–1–1.B.3–6 vol.1, 2000.

Paul E. Black, Vadim Okun, and Yaacov Yesha. Mutation Operators for Specifications. In Proceedings of the Fifteenth IEEE International Conference on Automated Software Engineering (ASE'00), 2000.

Paul E. Black, Vadim Okun, and Yaacov Yesha. Mutation of Model Checker Specifications for Test Generation and Evaluation. Mutation testing for the new century, pages 14–20, 2001.

Sergiy Boroday, Alexandre Petrenko, and Roland Groz. Can a model checker generate tests for non-deterministic systems? Electronic Notes in Theoretical Computer Science, 190:3–19, 2007.


Randal E. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Trans. Comput., 35(8):677–691, 1986. ISSN 0018-9340.

Randal E. Bryant. Symbolic boolean manipulation with ordered binary-decision diagrams. ACM Comput. Surv., 24(3):293–318, 1992. ISSN 0360-0300. doi: 10.1145/136035.136043.

Timothy A. Budd and Dana Angluin. Two notions of correctness and their relation to testing. Acta Inf., 18:31–45, 1982.

Timothy A. Budd and Ajei S. Gopal. Program testing by specification mutation. Comput. Lang., 10(1):63–73, 1985. ISSN 0096-0551. doi: 10.1016/0096-0551(85)90011-6.

John Callahan, Francis Schneider, and Steve Easterbrook. Automated Software Testing Using Model-Checking. In Proceedings 1996 SPIN Workshop, August 1996. Also WVU Technical Report NASA-IVV-96-022.

John R. Callahan, Stephen M. Easterbrook, and Todd L. Montgomery. Generating Test Oracles Via Model Checking. Technical report, NASA/WVU Software Research Lab, 1998.

J. J. Chilenski and S. P. Miller. Applicability of modified condition/decision coverage to software testing. Software Engineering Journal, pages 193–200, September 1994.

Hana Chockler, Orna Kupferman, and Moshe Y. Vardi. Coverage metrics for temporal logic model checking. In TACAS 2001: Proceedings of the 7th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 528–542, London, UK, 2001. Springer-Verlag. ISBN 3-540-41865-2.

Hana Chockler, Orna Kupferman, and Moshe Vardi. Coverage metrics for formal verification. International Journal on Software Tools for Technology Transfer (STTT), 8(4):373–386, 2006. ISSN 1433-2779. doi: 10.1007/s10009-004-0175-4.

Alessandro Cimatti, Edmund M. Clarke, Fausto Giunchiglia, and Marco Roveri. NUSMV: A New Symbolic Model Verifier. In CAV '99: Proceedings of the 11th International Conference on Computer Aided Verification, pages 495–499, London, UK, 1999. Springer-Verlag. ISBN 3-540-66202-2.

E. M. Clarke, E. A. Emerson, and A. P. Sistla. Automatic verification of finite state concurrent system using temporal logic specifications: a practical approach. In POPL '83: Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages, pages 117–126, New York, NY, USA, 1983. ACM Press. ISBN 0-89791-090-7. doi: 10.1145/567067.567080.

Edmund Clarke and Helmut Veith. Counterexamples revisited: Principles, algorithms, applications. In Verification: Theory and Practice, volume 2772 of Lecture Notes in Computer Science, pages 208–224, 2004.

Edmund Clarke, Daniel Kroening, and Flavio Lerda. A tool for checking ANSI-C programs. In Kurt Jensen and Andreas Podelski, editors, Tools and Algorithms for the Construction and Analysis of Systems (TACAS 2004), volume 2988 of Lecture Notes in Computer Science, pages 168–176. Springer, 2004. ISBN 3-540-21299-X.

Edmund M. Clarke and E. Allen Emerson. Design and synthesis of synchronization skeletons using branching-time temporal logic. In Logic of Programs, Workshop, pages 52–71, London, UK, 1982. Springer-Verlag. ISBN 3-540-11212-X.


Edmund M. Clarke, Orna Grumberg, and David E. Long. Verification Tools for Finite State Con-current Systems. In J.W. de Bakker, W.-P. de Roever, and G. Rozenberg, editors, A Decade ofConcurrency-Reflections and Perspectives, volume 803, pages 124–175. Springer-Verlag, 1993.

Edmund M. Clarke, Orna Grumberg, Kenneth L. McMillan, and Xudong Zhao. Efficient genera-tion of counterexamples and witnesses in symbolic model checking. In Proceedings of the 32stConference on Design Automation (DAC), pages 427–432. ACM Press, 1995.

Edmund M. Clarke, Orna Grumberg, Somesh Jha, Yuan Lu, and Helmut Veith. Counterexample-guided abstraction refinement. In CAV ’00: Proceedings of the 12th International Conference on Computer Aided Verification, pages 154–169, London, UK, 2000. Springer-Verlag. ISBN 3-540-67770-4.

Edmund M. Clarke, Orna Grumberg, and Doron A. Peled. Model Checking. MIT Press, Cambridge, MA, 1st edition, 2001. 3rd printing.

Edmund M. Clarke, Somesh Jha, Yuan Lu, and Helmut Veith. Tree-like counterexamples in model checking. In LICS ’02: Proceedings of the 17th Annual IEEE Symposium on Logic in Computer Science, pages 19–29, Washington, DC, USA, 2002. IEEE Computer Society. ISBN 0-7695-1483-9.

James C. Corbett, Matthew B. Dwyer, John Hatcliff, Shawn Laubach, Corina S. Pasareanu, Robby, and Hongjun Zheng. Bandera: extracting finite-state models from java source code. In ICSE ’00: Proceedings of the 22nd international conference on Software engineering, pages 439–448, New York, NY, USA, 2000. ACM Press. ISBN 1-58113-206-9. doi: 10.1145/337180.337234.

Leonardo de Moura, Sam Owre, Harald Rueß, John Rushby, N. Shankar, Maria Sorea, and Ashish Tiwari. SAL 2. In Rajeev Alur and Doron Peled, editors, Computer-Aided Verification, CAV 2004, volume 3114 of Lecture Notes in Computer Science, pages 496–500, Boston, MA, July 2004. Springer-Verlag.

René G. de Vries and Jan Tretmans. On-the-fly conformance testing using SPIN. International Journal on Software Tools for Technology Transfer (STTT), 2(4):382–393, March 2000. doi: 10.1007/s100090050044.

Claudio DeMartini, Radu Iosif, and Riccardo Sisto. A deadlock detection tool for concurrent java programs. Softw. Pract. Exper., 29(7):577–603, 1999. ISSN 0038-0644. doi: 10.1002/(SICI)1097-024X(199906)29:7<577::AID-SPE246>3.0.CO;2-V.

Richard A. DeMillo, Richard J. Lipton, and Frederick G. Sayward. Hints on Test Data Selection: Help for the Practicing Programmer. Computer, 11:34–41, 1978.

George Devaraj, Mats P. E. Heimdahl, and Donglin Liang. Coverage-directed test generation with model checkers: Challenges and opportunities. In COMPSAC ’05: Proceedings of the 29th Annual International Computer Software and Applications Conference (COMPSAC’05) Volume 1, pages 455–462, Washington, DC, USA, 2005. IEEE Computer Society. doi: 10.1109/COMPSAC.2005.66.

David L. Dill. The murphi verification system. In CAV ’96: Proceedings of the 8th International Conference on Computer Aided Verification, pages 390–393, London, UK, 1996. Springer-Verlag. ISBN 3-540-61474-5.

Stefan Edelkamp, Alberto Lluch Lafuente, and Stefan Leue. Directed explicit model checking with HSF-SPIN. In SPIN ’01: Proceedings of the 8th international SPIN workshop on Model checking of software, pages 57–79, New York, NY, USA, 2001. Springer-Verlag New York, Inc. ISBN 3-540-42124-6.

Sebastian Elbaum, Alexey G. Malishevsky, and Gregg Rothermel. Prioritizing test cases for regression testing. In ISSTA ’00: Proceedings of the 2000 ACM SIGSOFT international symposium on Software testing and analysis, pages 102–112, New York, NY, USA, 2000. ACM Press. ISBN 1-58113-266-2. doi: 10.1145/347324.348910.

Sebastian Elbaum, Alexey Malishevsky, and Gregg Rothermel. Incorporating varying test costs and fault severities into test case prioritization. In ICSE ’01: Proceedings of the 23rd International Conference on Software Engineering, pages 329–338, Washington, DC, USA, 2001. IEEE Computer Society. ISBN 0-7695-1050-7.

E. Allen Emerson and Joseph Y. Halpern. Decision procedures and expressiveness in the temporal logic of branching time. In STOC ’82: Proceedings of the fourteenth annual ACM symposium on Theory of computing, pages 169–180, New York, NY, USA, 1982. ACM Press. ISBN 0-89791-070-2. doi: 10.1145/800070.802190.

André Engels, Loe Feijs, and Sjouke Mauw. Test generation for intelligent networks using model checking. In Ed Brinksma, editor, Proceedings of the Third International Workshop on Tools and Algorithms for the Construction and Analysis of Systems (TACAS’97), volume 1217 of Lecture Notes in Computer Science, Enschede, the Netherlands, April 1997. Springer-Verlag.

Formal Systems (Europe) Ltd. Failures-Divergence Refinement: FDR2 User Manual, Oct 1997.

Gordon Fraser and Franz Wotawa. Property relevant software testing with model-checkers. SIGSOFT Softw. Eng. Notes, 31(6):1–10, 2006a. ISSN 0163-5948. doi: 10.1145/1218776.1218787.

Gordon Fraser and Franz Wotawa. Using model-checkers for mutation-based test-case generation, coverage analysis and specification analysis. In Proceedings of the International Conference on Software Engineering Advances (ICSEA 2006), pages 16–22, Los Alamitos, CA, USA, 2006b. IEEE Computer Society. ISBN 0-7695-2703-5. doi: 10.1109/ICSEA.2006.75.

Gordon Fraser and Franz Wotawa. Using and improving requirement properties for mutation based test-case generation. 22. WI-MAW Rundbrief, Jahrgang 12(2):5–23, 2006c.

Gordon Fraser and Franz Wotawa. Test-case prioritization with model-checkers. In Proceedings of the IASTED International Conference on Software Engineering (SE 2007), Innsbruck, 2007a.

Gordon Fraser and Franz Wotawa. Redundancy based test-suite reduction. In Proceedings of the 10th International Conference on Fundamental Approaches to Software Engineering (FASE 2007), volume 4422 of Lecture Notes in Computer Science, pages 291–305. Springer, 2007b.

Gordon Fraser and Franz Wotawa. Using LTL rewriting to improve the performance of model-checker based test-case generation. In A-MOST ’07: Proceedings of the 3rd international workshop on Advances in model-based testing, pages 64–74, New York, NY, USA, 2007c. ACM Press. ISBN 978-1-59593-850-3. doi: 10.1145/1291535.1291542.

Gordon Fraser and Franz Wotawa. Mutant minimization for model-checker based test-case generation. In Proceedings of the Third Workshop on Mutation Analysis (Mutation 2007), pages 161–166. IEEE Computer Society, 2007d.

Gordon Fraser and Franz Wotawa. Test-case generation and coverage analysis for nondeterministic systems using model-checkers. In Proceedings of the International Conference on Software Engineering Advances (ICSEA 2007), page 45, Los Alamitos, CA, USA, 2007e. IEEE Computer Society. ISBN 0-7695-2937-2. doi: 10.1109/ICSEA.2007.71.

Gordon Fraser and Franz Wotawa. Using Model-Checkers to Generate and Analyze Property Relevant Test-Cases. Software Quality Journal, 15(3), 2007f. To appear.

Gordon Fraser and Franz Wotawa. Using formal methods for ensuring quality requirements of systems. ÖVE Verbandszeitschrift Elektrotechnik und Informationstechnik (e&i), 124(1):13–16, 2007g. doi: 10.1007/s00502-006-0411-6.

Gordon Fraser and Franz Wotawa. Creating test-cases incrementally with model-checkers. In Proceedings of the Workshop on Model-Based Testing (MOTES 2007) held in conjunction with the 37th Annual Congress of the Gesellschaft fuer Informatik, pages 415–420, 2007h.

Gordon Fraser and Franz Wotawa. Improving model-checkers for software testing. In Proceedings of the International Conference on Software Quality (QSIC’07), 2007i. To appear.

Gordon Fraser and Franz Wotawa. Nondeterministic testing with linear model-checker counterexamples. In Proceedings of the International Conference on Software Quality (QSIC’07), 2007j. To appear.

Gordon Fraser, Bernhard Aichernig, and Franz Wotawa. Handling model changes: Regression testing and test-suite update with model-checkers. Electronic Notes in Theoretical Computer Science, 190:33–46, 2007.

Jose Garcia-Fanjul, Javier Tuya, and Claudio de la Riva. Generating Test Cases Specifications for BPEL Compositions of Web Services Using SPIN. In Proceedings of the International Workshop on Web Services Modeling and Testing (WS-MaTe 2006), pages 83–94, 2006.

A. Gargantini and E. Riccobene. ASM-based testing: Coverage criteria and automatic test sequence. Journal of Universal Computer Science, 7(11):1050–1067, 2001.

Angelo Gargantini. ATGT: ASM Tests Generation Tool [web page] http://cs.unibg.it/gargantini/projects/atgt/, 2007a. [Accessed October 24th, 2007].

Angelo Gargantini. Using Model Checking to Generate Fault Detecting Tests. In Proceedings ofthe International Conference on Tests And Proofs (TAP), Zurich, Switzerland, 2007b.

Angelo Gargantini and Constance Heitmeyer. Using Model Checking to Generate Tests From Requirements Specifications. In ESEC/FSE’99: 7th European Software Engineering Conference, Held Jointly with the 7th ACM SIGSOFT Symposium on the Foundations of Software Engineering, volume 1687 of Lecture Notes in Computer Science, pages 146–162, London, UK, September 1999. Springer. ISBN 3-540-66538-2.

Angelo Gargantini, Elvinia Riccobene, and Salvatore Rinzivillo. Using Spin to Generate Tests from ASM Specifications. In Abstract State Machines 2003. Advances in Theory and Practice: 10th International Workshop, ASM 2003, Taormina, Italy, March 3-7, 2003. Proceedings, volume 2589 of Lecture Notes in Computer Science, pages 263+. Springer Verlag Gmbh, 2003.

Marie-Claude Gaudel. Testing Can Be Formal, Too. In Proceedings of the 6th International Joint Conference CAAP/FASE on Theory and Practice of Software Development, pages 82–96, 1995.

Patrice Godefroid. Model checking for programming languages using verisoft. In POPL ’97: Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 174–186, New York, NY, USA, 1997. ACM Press. ISBN 0-89791-853-3. doi: 10.1145/263699.263717.

John B. Goodenough and Susan L. Gerhart. Toward a theory of test data selection. In Proceedings of the international conference on Reliable software, pages 493–510, 1975.

Gregg Rothermel, Mary Jean Harrold, Jeffery von Ronne, and Christie Hong. Empirical studies of test-suite reduction. Software Testing, Verification and Reliability, 12(4):219–249, 2002.

Mats Grindal and Brigitta Lindström. Challenges in Testing Real-Time Systems. In Proceedings of the 10th International Conference on Software Testing Analysis and Review (EuroSTAR’02), 2002.

Alex Groce, Doron Peled, and Mihalis Yannakakis. Adaptive model checking. In TACAS ’02: Proceedings of the 8th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 357–370, London, UK, 2002. Springer-Verlag. ISBN 3-540-43419-4.

Yuri Gurevich. Sequential abstract-state machines capture sequential algorithms. ACM Trans. Comput. Logic, 1(1):77–111, 2000. ISSN 1529-3785. doi: 10.1145/343369.343384.

N. Halbwachs, P. Caspi, P. Raymond, and D. Pilaud. The synchronous data-flow programming language LUSTRE. Proceedings of the IEEE, 79:1305–1320, 1991.

Grégoire Hamon, Leonardo de Moura, and John Rushby. Generating Efficient Test Sets with a Model Checker. In Proceedings of the Second International Conference on Software Engineering and Formal Methods (SEFM’04), pages 261–270, 2004.

Grégoire Hamon, Leonardo de Moura, and John Rushby. Automated Test Generation with SAL.Technical report, Computer Science Laboratory, SRI International, 2005.

R. H. Hardin, Z. Har’El, and R. P. Kurshan. COSPAN. In Proceedings of the Eighth Conference on Computer Aided Verification (CAV 1996), volume 1102 of Lecture Notes in Computer Science, pages 423–427. Springer, 1996.

M. Jean Harrold, Rajiv Gupta, and Mary Lou Soffa. A methodology for controlling the size of a test suite. ACM Trans. Softw. Eng. Methodol., 2(3):270–285, 1993. ISSN 1049-331X. doi: 10.1145/152388.152391.

Klaus Havelund. Java pathfinder, a translator from java to promela. In Proceedings of the 5th and 6th International SPIN Workshops on Theoretical and Practical Aspects of SPIN Model Checking, page 152, London, UK, 1999. Springer-Verlag. ISBN 3-540-66499-8.

Klaus Havelund and Grigore Rosu. Monitoring programs using rewriting. In ASE ’01: Proceedings of the 16th IEEE international conference on Automated software engineering, page 135, Washington, DC, USA, 2001a. IEEE Computer Society.

Klaus Havelund and Grigore Rosu. Monitoring java programs with java pathexplorer. Electronic Notes in Theoretical Computer Science, 55(2), 2001b.

Klaus Havelund and Grigore Rosu. Efficient monitoring of safety properties. Int. J. Softw. Tools Technol. Transf., 6(2):158–173, 2004. ISSN 1433-2779. doi: 10.1007/s10009-003-0117-6.

Mats P. E. Heimdahl, Sanjai Rayadurgam, and Willem Visser. Specification Centered Testing. In Proceedings of the Second International Workshop on Automated Program Analysis, Testing and Verification (ICSE 2000), 2000.

Mats P.E. Heimdahl, Sanjai Rayadurgam, Willem Visser, George Devaraj, and Jimin Gao. Auto-Generating Test Sequences using Model Checkers: A Case Study. In Third International Workshop on Formal Approaches to Software Testing, volume 2931 of Lecture Notes in Computer Science, pages 42–59. Springer Verlag, October 2003.

Mats Per Erik Heimdahl and George Devaraj. Test-Suite Reduction for Model Based Tests: Effects on Test Quality and Implications for Testing. In ASE, pages 176–185. IEEE Computer Society, 2004. ISBN 0-7695-2131-2.

Mats Per Erik Heimdahl, George Devaraj, and Robert Weber. Specification Test Coverage Adequacy Criteria = Specification Test Generation Inadequacy Criteria? In HASE, pages 178–186. IEEE Computer Society, 2004. ISBN 0-7695-2094-4.

C. L. Heitmeyer. Encyclopedia of Software Engineering, volume 2, chapter Software Cost Reduction. John Wiley & Sons, 2002.

Matthew Hennessy and Robin Milner. Algebraic laws for nondeterminism and concurrency. J. ACM, 32(1):137–161, 1985. ISSN 0004-5411. doi: 10.1145/2455.2460.

Thomas A. Henzinger, Ranjit Jhala, Rupak Majumdar, and Gregoire Sutre. Software verification with Blast. In Model Checking Software: 10th International SPIN Workshop, Portland, OR, USA, May 9-10, 2003. Proceedings, pages 235–239. Springer-Verlag, 2003.

Anders Hessel, Kim G. Larsen, Brian Nielsen, Paul Pettersson, and Arne Skou. Time-Optimal Real-Time Test Case Generation Using Uppaal. In Alexandre Petrenko and Andreas Ulrich, editors, Proceedings of the Third International Workshop on Formal Approaches to Software Testing (FATES 2003), volume 2931, pages 114–130, 2004.

R. M. Hierons. Applying adaptive test cases to nondeterministic implementations. Inf. Process. Lett., 98(2):56–60, 2006. ISSN 0020-0190. doi: 10.1016/j.ipl.2005.12.001.

Gerard J. Holzmann. The model checker SPIN. IEEE Trans. Softw. Eng., 23(5):279–295, 1997. ISSN 0098-5589. doi: 10.1109/32.588521.

Hyoung S. Hong and Hasan Ural. Using model checking for reducing the cost of test generation. In Formal Approaches to Software Testing, volume 3395 of Lecture Notes in Computer Science, pages 110–124. Springer Verlag Gmbh, 2005a.

Hyoung S. Hong and Hasan Ural. Dependence testing: Extending data flow testing with control dependence. In Testing of Communicating Systems, volume 3502 of Lecture Notes in Computer Science, pages 23–39. Springer Verlag Gmbh, 2005b.

Hyoung S. Hong, Insup Lee, Oleg Sokolsky, and Hasan Ural. A temporal logic based theory of test coverage and generation. In Tools and Algorithms for the Construction and Analysis of Systems: 8th International Conference, TACAS 2002, Held as Part of the Joint European Conference on Theory and Practice of Software, ETAPS 2002, Grenoble, France, April 8-12, 2002. Proceedings, volume 2280 of Lecture Notes in Computer Science, pages 151–161. Springer Verlag Gmbh, 2002.

Hyoung S. Hong, Sung D. Cha, Insup Lee, Oleg Sokolsky, and Hasan Ural. Data flow testing as model checking. In ICSE ’03: Proceedings of the 25th International Conference on Software Engineering, pages 232–242, Washington, DC, USA, 2003. IEEE Computer Society.

Hyoung Seok Hong and Insup Lee. Automatic Test Generation from Specifications for Control-Flow and Data-Flow Coverage Criteria. In Proceedings of the International Conference on Software Engineering (ICSE), 2003.

Hyoung Seok Hong, Insup Lee, Oleg Sokolsky, and Sung Deok Cha. Automatic Test Generation from Statecharts Using Model Checking. Technical report, MS-CIS-01-07, 2001.

Yatin Hoskote, Timothy Kam, Pei-Hsin Ho, and Xudong Zhao. Coverage estimation for symbolic model checking. In DAC ’99: Proceedings of the 36th ACM/IEEE conference on Design automation, pages 300–305, New York, NY, USA, 1999. ACM Press. ISBN 1-58133-109-7. doi: 10.1145/309847.309936.

Hai Huan, Wei-Tek Tsai, Raymond Paul, and Yinong Chen. Automated model checking and testing for composite web services. In Proceedings of the 8th IEEE International Symposium on Object-oriented Real-time Distributed Computing, pages 300–307. IEEE Computer Society, 2005.

Claude Jard and Thierry Jéron. TGV: theory, principles and algorithms. International Journal on Software Tools for Technology Transfer (STTT), 7:297–315, 2005.

Nikhil Jayakumar, Mitra Purandare, and Fabio Somenzi. Dos and don’ts of CTL state coverage estimation. In DAC ’03: Proceedings of the 40th conference on Design automation, pages 292–295, New York, NY, USA, 2003. ACM Press. ISBN 1-58113-688-9. doi: 10.1145/775832.775908.

Thierry Jéron and Pierre Morel. Test generation derived from model-checking. In CAV ’99: Proceedings of the 11th International Conference on Computer Aided Verification, pages 108–121, London, UK, 1999. Springer-Verlag. ISBN 3-540-66202-2.

James A. Jones and Mary Jean Harrold. Test-suite reduction and prioritization for modified condition/decision coverage. IEEE Trans. Softw. Eng., 29(3):195–209, 2003. ISSN 0098-5589. doi: 10.1109/TSE.2003.1183927.

Sagi Katz, Orna Grumberg, and Daniel Geist. "Have I written enough Properties?" - A Method of Comparison between Specification and Implementation. In CHARME ’99: Proceedings of the 10th IFIP WG 10.5 Advanced Research Working Conference on Correct Hardware Design and Verification Methods, pages 280–297, London, UK, 1999. Springer-Verlag. ISBN 3-540-66559-5.

Jung-Min Kim and Adam Porter. A history-based test prioritization technique for regression testing in resource constrained environments. In ICSE ’02: Proceedings of the 24th International Conference on Software Engineering, pages 119–129, New York, NY, USA, 2002. ACM Press. ISBN 1-58113-472-X. doi: 10.1145/581339.581357.

J. Kirby. Example NRL/SCR Software Requirements for an Automobile Cruise Control and Monitoring System. Technical Report TR-87-07, Wang Institute of Graduate Studies, 1987.

K.L. McMillan. The SMV system. Technical Report CMU-CS-92-131, Carnegie-Mellon University, 1992.

Bogdan Korel. Automated Software Test Data Generation. IEEE Trans. Softw. Eng., 16(8):870–879, 1990. doi: 10.1109/32.57624.

Dexter Kozen. Results on the propositional mu-calculus. Theor. Comput. Sci., 27:333–354, 1983.

D. Richard Kuhn and Vadim Okun. Pseudo-exhaustive testing for software. In 30th Annual IEEE / NASA Software Engineering Workshop (SEW-30 2006), 25-28 April 2006, Loyola College Graduate Center, Columbia, MD, USA, pages 153–158. IEEE Computer Society, 2006.

Orna Kupferman and Moshe Y. Vardi. Model checking revisited. In CAV ’97: Proceedings of the 9th International Conference on Computer Aided Verification, pages 36–47, London, UK, 1997. Springer-Verlag. ISBN 3-540-63166-6.

Orna Kupferman and Moshe Y. Vardi. Vacuity detection in temporal model checking. In CHARME ’99: Proceedings of the 10th IFIP WG 10.5 Advanced Research Working Conference on Correct Hardware Design and Verification Methods, pages 82–96, London, UK, 1999. Springer-Verlag. ISBN 3-540-66559-5.

Kim G. Larsen, Paul Pettersson, and Wang Yi. Uppaal in a nutshell. International Journal on Software Tools for Technology Transfer (STTT), 1(1–2):134–152, December 1997.

Kim G. Larsen, Marius Mikucionis, and Brian Nielsen. Online Testing of Real-time Systems Using UPPAAL. In Jens Grabowski and Brian Nielsen, editors, Proceedings of the 4th International Workshop on Formal Approaches to Testing of Software (FATES 2004), volume 3395 of Lecture Notes in Computer Science. Springer-Verlag GmbH, 2004.

Nancy G. Leveson, Mats Per Erik Heimdahl, Holly Hildreth, and Jon D. Reese. Requirements specification for process-control systems. IEEE Trans. Softw. Eng., 20(9):684–707, 1994. ISSN 0098-5589. doi: 10.1109/32.317428.

Orna Lichtenstein and Amir Pnueli. Checking that finite state concurrent programs satisfy their linear specification. In POPL ’85: Proceedings of the 12th ACM SIGACT-SIGPLAN symposium on Principles of programming languages, pages 97–107, New York, NY, USA, 1985. ACM Press. ISBN 0-89791-147-4. doi: 10.1145/318593.318622.

Yu-Seung Ma, Jeff Offutt, and Yong Rae Kwon. Mujava: an automated class mutation system. Softw. Test. Verif. Reliab., 15(2):97–133, 2005. ISSN 0960-0833. doi: 10.1002/stvr.v15:2.

Nicolas Markey and Philippe Schnoebelen. Model checking a path (preliminary report). In Proc. Concurrency Theory (CONCUR’2003), Marseille, France, volume 2761 of Lecture Notes in Computer Science, pages 251–265. Springer, August 2003.

Kenneth L. McMillan. Symbolic Model Checking. Kluwer Academic Publishers, Norwell, MA,USA, 1993. ISBN 0792393805.

Robert Meolic, Alessandro Fantechi, and Stefania Gnesi. Witness and counterexample automata for ACTL. In Formal Techniques for Networked and Distributed Systems - FORTE 2004, volume 3235 of Lecture Notes in Computer Science, pages 259–275, 2004.

Madanlal Musuvathi, David Y. W. Park, Andy Chou, Dawson R. Engler, and David L. Dill. CMC: a pragmatic approach to model checking real code. SIGOPS Oper. Syst. Rev., 36(SI):75–88, 2002. ISSN 0163-5980. doi: 10.1145/844128.844136.

Glenford J. Myers. The Art of Software Testing. John Wiley & Sons, Inc., 1979.

A. Jefferson Offutt, Yiwei Xiong, and Shaoying Liu. Criteria for generating specification-based tests. In ICECCS. IEEE Computer Society, 1999. ISBN 0-7695-0434-5.

Vadim Okun, Paul E. Black, and Yaacov Yesha. Testing with Model Checker: Insuring Fault Visibility. In Nikos E. Mastorakis and Petr Ekel, editors, Proceedings of 2002 WSEAS International Conference on System Science, Applied Mathematics & Computer Science, and Power Engineering Systems, pages 1351–1356, 2003a.

Vadim Okun, Paul E. Black, and Yaacov Yesha. Testing with Model Checker: Insuring Fault Visibility. Technical Report NIST-IR 6929, National Institute of Standards and Technology, 2003b.

David Y. W. Park, Ulrich Stern, Jens U. Skakkebaek, and David L. Dill. Java model checking. In ASE ’00: Proceedings of the 15th IEEE international conference on Automated software engineering, page 253, Washington, DC, USA, 2000. IEEE Computer Society. ISBN 0-7695-0710-7.

Doron Peled. Model checking and testing combined. In Jos C. M. Baeten, Jan Karel Lenstra, Joachim Parrow, and Gerhard J. Woeginger, editors, Automata, Languages and Programming, 30th International Colloquium, ICALP 2003, Eindhoven, The Netherlands, June 30 - July 4, 2003. Proceedings, volume 2719 of Lecture Notes in Computer Science, pages 47–63. Springer, 2003.

A. Petrenko, N. Yevtushenko, and G. v. Bochmann. Testing deterministic implementations from nondeterministic FSM specifications. In Selected proceedings of the IFIP TC6 9th international workshop on Testing of communicating systems, pages 125–140, London, UK, 1996. Chapman & Hall, Ltd. ISBN 0-412-78790-3.

Amir Pnueli. The temporal logic of programs. In 18th Annual Symposium on Foundations of Computer Science, 31 October-2 November, Providence, Rhode Island, USA, pages 46–57. IEEE, 1977.

Mitra Purandare and Fabio Somenzi. Vacuum cleaning CTL formulae. In CAV ’02: Proceedings of the 14th International Conference on Computer Aided Verification, pages 485–499, London, UK, 2002. Springer-Verlag. ISBN 3-540-43997-8.

Jean-Pierre Queille and Joseph Sifakis. Specification and verification of concurrent systems in CESAR. In Proceedings of the 5th Colloquium on International Symposium on Programming, pages 337–351, London, UK, 1982. Springer-Verlag. ISBN 3-540-11494-7.

Sandra Rapps and Elaine J. Weyuker. Selecting software test data using data flow information. IEEE Trans. Softw. Eng., 11(4):367–375, 1985. ISSN 0098-5589. doi: 10.1109/TSE.1985.232226.

S. Rayadurgam and M. P. E. Heimdahl. Test-sequence generation from formal requirement models. In HASE ’01: The 6th IEEE International Symposium on High-Assurance Systems Engineering, Washington, DC, USA, 2001a. IEEE Computer Society. ISBN 0769512755.

Sanjai Rayadurgam and Mats P. E. Heimdahl. Coverage Based Test-Case Generation Using Model Checkers. In Proceedings of the 8th Annual IEEE International Conference and Workshop on the Engineering of Computer Based Systems (ECBS 2001), pages 83–91, Washington, DC, April 2001b. IEEE Computer Society.

Sanjai Rayadurgam and Mats P. E. Heimdahl. Coverage Based Test-Case Generation Using Model Checkers. Technical Report 01-005, University of Minnesota, Minneapolis, January 2001c.

Sanjai Rayadurgam and Mats P.E. Heimdahl. Generating MC/DC Adequate Test Sequences Through Model Checking. In Proceedings of the 28th Annual NASA Goddard Software Engineering Workshop, pages 91–96, 2003.

Robby, Matthew B. Dwyer, and John Hatcliff. Bogor: an extensible and highly-modular software model checking framework. In ESEC/FSE-11: Proceedings of the 9th European software engineering conference held jointly with 11th ACM SIGSOFT international symposium on Foundations of software engineering, pages 267–276, New York, NY, USA, 2003. ACM Press. ISBN 1-58113-743-5. doi: 10.1145/940071.940107.

Grigore Rosu and Klaus Havelund. Rewriting-based techniques for runtime verification. Automated Software Engg., 12(2):151–197, 2005. ISSN 0928-8910. doi: 10.1007/s10515-005-6205-y.

Gregg Rothermel, Mary Jean Harrold, Jeffery Ostrin, and Christie Hong. An empirical study of the effects of minimization on the fault detection capabilities of test suites. In ICSM ’98: Proceedings of the International Conference on Software Maintenance, page 34, Washington, DC, USA, 1998. IEEE Computer Society. ISBN 0-8186-8779-7.

Gregg Rothermel, Roland H. Untch, Chengyun Chu, and Mary Jean Harrold. Test case prioritization: An empirical study. In ICSM ’99: Proceedings of the IEEE International Conference on Software Maintenance, page 179, Washington, DC, USA, 1999. IEEE Computer Society. ISBN 0-7695-0016-1.

Sarfraz Khurshid, Corina S. Pasareanu, and Willem Visser. Generalized symbolic execution for model checking and testing. In TACAS ’03: Proceedings of the 9th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 553–568, Warsaw, Poland, 2003. Springer-Verlag.

Sheldon B. Akers, Jr. On a theory of boolean functions. Journal of the Society for Industrial and Applied Mathematics, 7(4):487–498, 1959. doi: 10.1137/0107041.

Hema Srikanth and Laurie Williams. On the economics of requirements-based test case prioritization. In EDSER ’05: Proceedings of the 7th international workshop on Economics-driven software engineering research, pages 1–3, New York, NY, USA, 2005. ACM Press. ISBN 1-59593-118-X. doi: 10.1145/1083091.1083100.

T. Srivatanakul, J. A. Clark, S. Stepney, and F. Polack. Challenging formal specifications by mutation: a CSP security example. In Tenth Asia-Pacific Software Engineering Conference, pages 340–350, 2003.

Li Tan, Oleg Sokolsky, and Insup Lee. Specification-based testing with linear temporal logic. In Proceedings of IEEE International Conference on Information Reuse and Integration (IRI’04), pages 493–498, 2004.

Jeffrey M. Thompson, Mats P. E. Heimdahl, and Steven P. Miller. Specification-based prototyping for embedded systems. In ESEC/FSE-7: Proceedings of the 7th European software engineering conference held jointly with the 7th ACM SIGSOFT international symposium on Foundations of software engineering, pages 163–179, London, UK, 1999. Springer-Verlag. ISBN 3-540-66538-2. doi: 10.1145/318773.318940.

Jan Tretmans. Conformance Testing with Labelled Transition Systems: Implementation Relations and Test Generation. Computer Networks and ISDN Systems, 29(1):49–79, 1996. doi: 10.1016/S0169-7552(96)00017-7.

Jan Tretmans. Testing Concurrent Systems: A Formal Approach. In J.C.M. Baeten and S. Mauw, editors, CONCUR’99 – 10th Int. Conference on Concurrency Theory, volume 1664 of Lecture Notes in Computer Science, pages 46–65. Springer-Verlag, 1999.

Moshe Y. Vardi and Pierre Wolper. An automata-theoretic approach to automatic program verification (preliminary report). In Proceedings of the 1st IEEE Symposium on Logic in Computer Science (LICS’86), pages 332–344. IEEE Computer Society, June 1986.

Sergiy A. Vilkomir and Jonathan P. Bowen. Reinforced Condition/Decision Coverage (RC/DC): A New Criterion for Software Testing. In ZB ’02: Proceedings of the 2nd International Conference of B and Z Users on Formal Specification and Development in Z and B, pages 291–308, London, UK, 2002. Springer-Verlag. ISBN 3-540-43166-7.

Willem Visser, Klaus Havelund, Guillaume Brat, and SeungJoon Park. Model checking programs. In ASE ’00: Proceedings of the 15th IEEE international conference on Automated software engineering, page 3, Washington, DC, USA, 2000. IEEE Computer Society. ISBN 0-7695-0710-7.

Willem Visser, Corina S. Pasareanu, and Sarfraz Khurshid. Test input generation with Java PathFinder. In ISSTA ’04: Proceedings of the 2004 ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 97–107, New York, NY, USA, 2004. ACM Press. ISBN 1-58113-820-2. doi: 10.1145/1007512.1007526.

Willem Visser, Corina S. Pasareanu, and Radek Pelánek. Test input generation for java containers using state matching. In ISSTA ’06: Proceedings of the 2006 international symposium on Software testing and analysis, pages 37–48, New York, NY, USA, 2006. ACM Press. ISBN 1-59593-263-1. doi: 10.1145/1146238.1146243.

Michael W. Whalen, Ajitha Rajan, Mats P.E. Heimdahl, and Steven P. Miller. Coverage metrics for requirements-based testing. In ISSTA ’06: Proceedings of the 2006 International Symposium on Software Testing and Analysis, pages 25–36, New York, NY, USA, 2006. ACM Press. ISBN 1-59593-263-1.

Duminda Wijesekera, Lingya Sun, Paul Ammann, and Gordon Fraser. Relating counterexamples to test cases in CTL model checking specifications. In A-MOST ’07: Proceedings of the 3rd international workshop on Advances in model-based testing, pages 75–84, New York, NY, USA, 2007. ACM Press. ISBN 978-1-59593-850-3. doi: 10.1145/1291535.1291543.

W. Eric Wong, Joseph R. Horgan, Saul London, and Aditya P. Mathur. Effect of test set minimization on fault detection effectiveness. In ICSE ’95: Proceedings of the 17th international conference on Software engineering, pages 41–50. ACM Press, 1995. ISBN 0-89791-708-1. doi: 10.1145/225014.225018.

Lihua Xu, Marcio Dias, and Debra Richardson. Generating regression tests via model checking. In COMPSAC ’04: Proceedings of the 28th Annual International Computer Software and Applications Conference (COMPSAC’04), pages 336–341, Washington, DC, USA, 2004. IEEE Computer Society. ISBN 0-7695-2209-2-1.

Andreas Zeller. Isolating cause-effect chains from computer programs. In SIGSOFT ’02/FSE-10: Proceedings of the 10th ACM SIGSOFT symposium on Foundations of software engineering, pages 1–10, New York, NY, USA, 2002. ACM Press. ISBN 1-58113-514-9. doi: 10.1145/587051.587053.

Hongwei Zeng, Huaikou Miao, and Jing Liu. Specification-based test generation and optimization using model checking. Proceedings of the First Joint IEEE/IFIP Symposium on Theoretical Aspects of Software Engineering (TASE’07), 0:349–355, 2007. doi: 10.1109/TASE.2007.46.

Hao Zhong, Lu Zhang, and Hong Mei. An experimental comparison of four test suite reduction techniques. In ICSE ’06: Proceedings of the 28th international conference on Software engineering, pages 636–640, New York, NY, USA, 2006. ACM Press. ISBN 1-59593-375-1. doi: 10.1145/1134285.1134380.