On the effectiveness of classification trees for test case construction

T.Y. Chen*, P.L. Poon

Department of Computer Science, University of Melbourne, Parkville 3052, Australia

Received 21 April 1998; accepted 26 August 1998

Abstract

The notion of the classification-hierarchy table and the classification-tree construction algorithm provide a systematic approach to the construction of classification trees from given sets of classifications and their associated classes. Using classification trees, the set of all possible test cases can be constructed from functional specifications. This paper extends their study by introducing a metric to measure the effectiveness of a classification tree with respect to the construction of test cases, and providing ways to improve this effectiveness. © 1998 Elsevier Science B.V. All rights reserved.

Keywords: Classification-hierarchy table; Classification-tree method; Software testing; Specification-based testing; Test case selection

1. Introduction

Various people [1–3] have argued that the expense of software testing is often underestimated although it may account for up to 50% of the project cost. Because testing is an expensive process, its effectiveness should be improved by means of systematic planning, execution and monitoring.

It is well known that the construction of the set of all possible test cases, S, affects the comprehensiveness and, hence, the quality of the test [4–6]. Numerous researchers have therefore developed their own methods for this construction process. Two approaches to this construction are black box and white box testing. For the former approach, most techniques are developed for formal specifications only [7–9], despite the fact that many real-life specifications are written in an informal way.

On the other hand, the few techniques developed for both formal and informal specifications include the classification-tree method [5], the category-partition method [6] and the decision-table method [10]. The classification-tree method is an extension of the category-partition method. Also, the decision-table method can be regarded as a degenerate form of the classification-tree method. With respect to the generation of S, the classification-tree method is more effective than the other two. Readers may refer to [11] for a detailed comparison between these methods.

In general, the classification-tree method developed by Grochtmann and Grimm [5] helps construct S from functional specifications (hereafter referred to as the 'specifications') via the construction of classification trees. However, their tree construction method is rather ad hoc, so the classification trees constructed from the same specification vary with the tester's own expertise and experience.

This problem was subsequently solved by Chen and Poon [12,13] using the notion of the classification-hierarchy table (which captures the hierarchical relation for each pair of distinct classifications), and a method for the construction of classification trees from given sets of classifications and their associated classes. However, their tree construction methodology does not take into account the effectiveness of the classification trees with respect to the construction of test cases.

This paper attempts to solve the above problem by: (i) defining a metric to measure the effectiveness of a classification tree; (ii) identifying some relationships between a classification tree and this effectiveness metric; and (iii) providing ways to improve this effectiveness metric of a classification tree.

The rest of this paper is structured as follows. Section 2 gives an overview of the classification-tree method. Section 3 introduces an effectiveness metric of a classification tree and its underlying rationale. Section 4 describes some structures of a classification tree which could lead to a poor effectiveness metric. Section 5 presents two restructuring algorithms for a classification tree so as to improve its effectiveness metric. Finally, Section 6 concludes the paper.

Information and Software Technology 40 (1998) 765–775. INFSOF 3965

0950-5849/98/$ - see front matter © 1998 Elsevier Science B.V. All rights reserved. PII: S0950-5849(98)00107-4

* Corresponding author. Tel.: 61 3 9287 9101; fax: 61 3 9348 1184; e-mail: [email protected]

This project was partially supported by a research grant from the Australian Research Council.

2. An overview of the classification-tree method

A classification tree, T, is a graphical representation of the hierarchical relation between various classifications. The following describes the major steps of the classification-tree method:

1. From the specification, identify all classifications (defined as the different criteria for partitioning the input domain of the program to be tested) and their associated classes (defined as the disjoint subsets of values for each classification).

2. From the sets of classifications and their associated classes, construct T.

3. From T, construct the corresponding combination table.

4. From the combination table, identify all the feasible combinations of classes. Each combination of classes represents one potential test case.

Example 1 illustrates this method.

Example 1. Suppose we have a program PGM, with its specification as shown below, for testing:

1. PGM has six input variables, namely, A, B, C, D, E and F.

2. Each of the variables A and C has three possible values (e.g. a1, a2 and a3 for A), whereas each remaining variable has two possible values (e.g. b1 and b2 for B).

3. The input domain of PGM may contain any combination of possible values from some of these variables except the following:

   i. (A is a1 or a3) and (E is e1 or e2).
   ii. (A is a2 or a3) and (D is d1 or d2).
   iii. (A is a2) and (C is c3).
   iv. (A is a3) and (B is b2).
   v. (B is b1) and (F is f1 or f2).
   vi. (C is c1 or c2) and (E is e1 or e2).
   vii. (C is c2 or c3) and (D is d1 or d2).
   viii. (D is d1 or d2) and (E is e1 or e2).

Suppose all of the classifications and classes for PGM are simply defined as the input variables and their possible values, respectively. Thus, for example, A is taken as a classification with a1, a2 and a3 as its three associated classes (throughout this paper, classifications and classes are denoted by capital and small letters, respectively). Obviously, with these classifications and their associated classes, one approach to the construction of $S_{PGM}$ (which denotes S for PGM) is to select and combine one class of each classification, so that each combination of classes represents one test case. Using this approach, the size of $S_{PGM}$ will be 144 ($= 3^2 \times 2^4$). However, a drawback of this approach is the possible occurrence of many invalid combinations of classes (e.g. A is a1 and E is e1), which contradict the constraints between variables stated in (3) of the specification of PGM.
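The count of 144 can be checked with a short Python sketch. The dictionary `classes` and the helper `violates_3i` are illustrative names introduced here, not part of the original specification. Only constraint (3.i) is encoded, to show how quickly invalid combinations appear under the naive approach.

```python
from itertools import product

# Classifications of PGM and their classes, as given in Example 1.
classes = {
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2"],
    "C": ["c1", "c2", "c3"],
    "D": ["d1", "d2"],
    "E": ["e1", "e2"],
    "F": ["f1", "f2"],
}

# Naive approach: pick exactly one class of every classification.
naive = list(product(*classes.values()))

def violates_3i(case):
    """Constraint (3.i): a1 or a3 must not be combined with e1 or e2."""
    chosen = set(case)
    return bool(chosen & {"a1", "a3"}) and bool(chosen & {"e1", "e2"})

invalid_3i = [case for case in naive if violates_3i(case)]

print(len(naive))       # 144
print(len(invalid_3i))  # 96
```

Since every naive combination contains a class of E, two thirds of them (those with a1 or a3) already violate (3.i) alone.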

Basically, the classification-tree method aims at reducing the number of invalid combinations of classes, via the enforcement of constraints between classifications/classes in T. Suppose a classification tree $T_{PGM}$ for PGM is constructed as depicted in Fig. 1, either on an ad hoc basis or using Chen and Poon's methodology [12,13].

In Fig. 1, the small circle at the top of $T_{PGM}$ represents the input domain of PGM and is called the general root node. The classifications A, B and C directly under the general root node are called top level classifications. Also, every classification in $T_{PGM}$ is the parent classification of those classes (known as its child classes) which are directly under it. Similarly, every class in $T_{PGM}$ is the parent class of those classifications (known as its child classifications) which are directly under it. For example, A is the parent classification of a1, a2 and a3, which in turn are the child classes of A; b2 is the parent class of F, which in turn is the child classification of b2. In $T_{PGM}$ every parent class has only one child classification. However, in some other T's, a parent class may have two or more child classifications (refer to [5,12,13] for details).

Fig. 1. $T_{PGM}$ and part of its combination table.

In Fig. 1, each column of the combination table corresponds exactly to a terminal node (which is a class) of $T_{PGM}$. With the combination table, the construction rule states that a test case can be formed by selecting a combination of classes in $T_{PGM}$ via the following recursive process:

1. One and only one child class of each top level classification is selected.

2. For every selected class, one and only one child class of each child classification is selected.

A test case constructed in this way is known as a 'potential test case'. For example, row 1 of the combination table in Fig. 1 represents a potential test case where A is a1, B is b1, C is c1 and D is d1. We use $P_i$ ($i \ge 1$) to denote a path in $T_{PGM}$. For example, P9 denotes the path C–c1–D–d1 in $T_{PGM}$. Thus, the potential test case corresponding to row 1 is formed by combining all the classes which lie on P1, P6 and P9. Only part of the combination table is shown in Fig. 1. The complete combination table contains a total of 75 potential test cases. When compared to the 144 test cases which would have been constructed simply by selecting and combining one class of each classification, 69 test cases with invalid combinations of classes are effectively filtered out by $T_{PGM}$. For example, in $T_{PGM}$, it is impossible to combine b1 and f1 [which is invalid according to (3.v) of the specification of PGM] to form part of a potential test case, according to the construction rule of potential test cases.
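The recursive construction rule can be sketched in Python as follows. The nested-tuple encoding of the tree and the function names are assumptions made for illustration. The shape of $T_{PGM}$ is the one implied by the paths P1 to P13 used in this paper's examples.

```python
from itertools import product

# A classification is (name, [classes]); a class is (name, [child classifications]).
D = ("D", [("d1", []), ("d2", [])])
E = ("E", [("e1", []), ("e2", [])])
F = ("F", [("f1", []), ("f2", [])])
T_PGM = [
    ("A", [("a1", [D]), ("a2", [E]), ("a3", [])]),
    ("B", [("b1", []), ("b2", [F])]),
    ("C", [("c1", [D]), ("c2", []), ("c3", [E])]),
]

def selections(classification):
    """Rule 2: yield each way of choosing one child class, recursively below it."""
    _, child_classes = classification
    for class_name, child_classifications in child_classes:
        below = [list(selections(c)) for c in child_classifications]
        for combo in product(*below):
            yield (class_name,) + tuple(x for part in combo for x in part)

def potential_test_cases(tree):
    """Rule 1: combine one selection from every top level classification."""
    per_top = [list(selections(top)) for top in tree]
    return [tuple(x for part in combo for x in part) for combo in product(*per_top)]

cases = potential_test_cases(T_PGM)
print(len(cases))  # 75
print(cases[0])    # ('a1', 'd1', 'b1', 'c1', 'd1')
```

The first tuple corresponds to row 1 of the combination table (paths P1, P6 and P9). The class d1 appears twice in it because D occurs under both a1 and c1.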

3. The effectiveness of a classification tree

Occasionally, not all the constraints between classifications can be enforced in T. Therefore, all the potential test cases constructed from the combination table have to be validated against the specification to ensure their consistency with the specification. Those found to be inconsistent with the specification are referred to as illegitimate test cases, which should be disregarded for testing. The remaining ones are referred to as legitimate test cases. Example 2 illustrates this concept.

Example 2. The following are two potential test cases constructed from $T_{PGM}$:

• a1, b2, c1, d1, d2, f1 (formed by combining P1, P7 and P10).

• a2, b2, c3, e1, f1 (formed by combining P3, P7 and P12).

These two potential test cases are illegitimate, because the first one contains both d1 and d2 (which should be disjoint), and the second one contains both a2 and c3 [which contradicts the constraint (3.iii) of the specification of PGM]. In fact, out of the 75 potential test cases constructed from $T_{PGM}$, 68 are illegitimate. Only the seven potential test cases listed below are legitimate:

• a1, b1, c1, d1 (by combining P1, P6 and P9).
• a1, b2, c1, d1, f1 (by combining P1, P7 and P9).
• a1, b2, c1, d1, f2 (by combining P1, P8 and P9).
• a1, b1, c1, d2 (by combining P2, P6 and P10).
• a1, b2, c1, d2, f1 (by combining P2, P7 and P10).
• a1, b2, c1, d2, f2 (by combining P2, P8 and P10).
• a3, b1, c2 (by combining P5, P6 and P11).

We wish to point out that we are actually interested in the legitimate test cases, and T merely serves as a means to construct them. Based on this rationale, we define a metric $E_T$ (first introduced in [14]) to measure the effectiveness of T. This effectiveness metric is defined as:

$$E_T = \frac{N^l_T}{N^p_T} \qquad (1)$$

where $N^l_T$ and $N^p_T$ denote the total number of legitimate test cases and the total number of potential test cases constructed from T, respectively.
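As a sketch of how Eq. (1) could be evaluated mechanically for $T_{PGM}$, the Python fragment below validates every potential test case against constraints (3.i) to (3.viii). The encoding of paths as sets, the `FORBIDDEN` table and the function name `is_legitimate` are assumptions for illustration. In the paper this validation is a manual step.

```python
from itertools import product

# Paths P1..P13 of T_PGM, grouped by top level classification (classes only).
A_PATHS = [{"a1", "d1"}, {"a1", "d2"}, {"a2", "e1"}, {"a2", "e2"}, {"a3"}]
B_PATHS = [{"b1"}, {"b2", "f1"}, {"b2", "f2"}]
C_PATHS = [{"c1", "d1"}, {"c1", "d2"}, {"c2"}, {"c3", "e1"}, {"c3", "e2"}]

D, E = {"d1", "d2"}, {"e1", "e2"}
# Constraints (3.i)-(3.viii): no class on the left may coexist with one on the right.
FORBIDDEN = [
    ({"a1", "a3"}, E), ({"a2", "a3"}, D), ({"a2"}, {"c3"}), ({"a3"}, {"b2"}),
    ({"b1"}, {"f1", "f2"}), ({"c1", "c2"}, E), ({"c2", "c3"}, D), (D, E),
]

def is_legitimate(case):
    # Classes of one classification are disjoint: at most one per leading letter.
    if len({c[0] for c in case}) != len(case):
        return False
    return not any(case & left and case & right for left, right in FORBIDDEN)

potential = [a | b | c for a, b, c in product(A_PATHS, B_PATHS, C_PATHS)]
legitimate = [case for case in potential if is_legitimate(case)]

n_p, n_l = len(potential), len(legitimate)
e_t = n_l / n_p  # Eq. (1)
print(n_p, n_l, round(e_t, 2))  # 75 7 0.09
```

The seven cases that survive are exactly those listed in Example 2.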

Obviously, we have $0 \le E_T \le 1$ as $0 \le N^l_T \le N^p_T$. The value of $N^l_T$ can only be known after the identification of all illegitimate test cases from the potential test cases. However, $N^p_T$ can be directly computed from T using the following equations.

Equations for computing $N^p_T$:

Suppose T has k ($k \ge 1$) top level classifications denoted as $L_1, \ldots, L_k$. Let $t_i$ denote the subtree (in T) with $L_i$ ($1 \le i \le k$) as its root, and $N(t_i)$ denote the total number of possible combinations of classes for $t_i$. Hence, $N^p_T$ is defined as:

$$N^p_T = \prod_{i=1}^{k} N(t_i) \qquad (2)$$

where $N(t_i)$ is defined by the following recursive definitions. Let $S^X_{t_i}$ and $S^x_{t_i}$ denote the subtrees (in $t_i$) with classification X and class x as their roots, respectively. Also, let $N(S^X_{t_i})$ and $N(S^x_{t_i})$ represent the total number of possible combinations of classes for $S^X_{t_i}$ and $S^x_{t_i}$, respectively. Obviously, $t_i = S^{L_i}_{t_i}$, and $N(t_i) = N(S^{L_i}_{t_i})$.

• For $N(S^X_{t_i})$:

Suppose X has m child classes in which $m_1$ are non-terminal classes (denoted as $x_j$'s, $1 \le j \le m_1$) and $m_2$ are terminal classes (note $m = m_1 + m_2$). Then,

$$N(S^X_{t_i}) = m_2 + \sum_{j=1}^{m_1} N(S^{x_j}_{t_i}) \qquad (3)$$

• For $N(S^x_{t_i})$:

Suppose x has n child classifications denoted as $Y_j$'s ($1 \le j \le n$). Then,

$$N(S^x_{t_i}) = \prod_{j=1}^{n} N(S^{Y_j}_{t_i}) \qquad (4)$$

Eqs. (2)–(4) show that $N^p_T$ depends on the number of terminal nodes and the structure of T. Example 3 illustrates how to calculate $N^p_{T_{PGM}}$ and $E_{T_{PGM}}$.

Example 3. Refer to $T_{PGM}$ in Fig. 1. Let $t_1$, $t_2$ and $t_3$ denote the subtrees with classifications A, B and C as their roots, respectively. Thus,

$$
\begin{aligned}
N(S^D_{t_1}) &= 2 \\
N(S^E_{t_1}) &= 2 \\
N(S^{a_1}_{t_1}) &= N(S^D_{t_1}) = 2 \\
N(S^{a_2}_{t_1}) &= N(S^E_{t_1}) = 2 \\
N(S^A_{t_1}) &= 1 + N(S^{a_1}_{t_1}) + N(S^{a_2}_{t_1}) = 1 + 2 + 2 = 5
\end{aligned}
$$

Similarly, $N(S^B_{t_2})$ and $N(S^C_{t_3})$ are calculated as 3 and 5, respectively. Hence,

$$N^p_{T_{PGM}} = N(t_1) \times N(t_2) \times N(t_3) = N(S^A_{t_1}) \times N(S^B_{t_2}) \times N(S^C_{t_3}) = 5 \times 3 \times 5 = 75$$

From Eq. (1), $E_{T_{PGM}}$ is calculated as 0.09 [$= (75 - 68)/75$].

This small value of $E_{T_{PGM}}$ means that $T_{PGM}$ is very ineffective in generating legitimate test cases. It should be noted that we deliberately chose $T_{PGM}$ (with a small value of $E_{T_{PGM}}$) as an example to illustrate our restructuring algorithms to be presented later.
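The recursion of Eqs. (2)–(4) translates almost directly into code. The following Python sketch reproduces the numbers of Example 3. The nested-tuple tree encoding and the function names are illustrative assumptions. A terminal class is given the value 1 (an empty product), which yields the $m_2$ term of Eq. (3).

```python
from math import prod

# A classification is (name, [classes]); a class is (name, [child classifications]).
D = ("D", [("d1", []), ("d2", [])])
E = ("E", [("e1", []), ("e2", [])])
F = ("F", [("f1", []), ("f2", [])])
T_PGM = [
    ("A", [("a1", [D]), ("a2", [E]), ("a3", [])]),
    ("B", [("b1", []), ("b2", [F])]),
    ("C", [("c1", [D]), ("c2", []), ("c3", [E])]),
]

def n_classification(node):
    """Eq. (3): each terminal child class adds 1, each other child adds N(S^x)."""
    _, child_classes = node
    return sum(n_class(c) for c in child_classes)

def n_class(node):
    """Eq. (4); a terminal class has no child classifications, so the product is 1."""
    _, child_classifications = node
    return prod(n_classification(c) for c in child_classifications)

def n_potential(tree):
    """Eq. (2): product over the top level classifications."""
    return prod(n_classification(top) for top in tree)

print([n_classification(t) for t in T_PGM])  # [5, 3, 5]
print(n_potential(T_PGM))                    # 75
```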

Obviously, a small value of $E_T$ is undesirable as more effort is required in identifying illegitimate test cases from the potential test cases. Since this identification process is still performed manually, it is also more likely that mistakes will be made. This may affect the completeness of the set of legitimate test cases (if some of them are mistakenly classified as illegitimate and are consequently ignored) and it may in turn affect the comprehensiveness of the testing. Therefore, it is highly desirable to develop ways of improving the value of $E_T$.

4. Some characteristics of $E_T$

Obviously, before we can develop ways to improve the effectiveness of T, we need to know those features of T that may result in a small value of $E_T$. An obvious approach is to examine all the 68 illegitimate test cases constructed from $T_{PGM}$. Our examination of them reveals the following types of illegitimacy:

1. Coexistence of d1 and d2.
2. Coexistence of e1 and e2.
3. Coexistence of (d1 or d2) and (e1 or e2).
4. Coexistence of (d1 or d2) and (a2, a3, c2 or c3).
5. Coexistence of (e1 or e2) and (a1, a3, c1 or c2).
6. Coexistence of a2 and c3.
7. Coexistence of a3 and b2.

Each of these results from some constraints mentioned in the specification of PGM which are not enforced in $T_{PGM}$. In fact, all the above types of illegitimacy can be grouped into two categories depending on the causes of their occurrences: (i) existence of duplicated classifications under different $L_i$'s; and (ii) existence of 'partially dependent' classifications. We further explain these two categories in the following subsections.

4.1. Existence of duplicated classifications under different $L_i$'s

4.1.1. Illegitimacy types 1 and 2

Refer to $T_{PGM}$ in Fig. 1. Resulting from the duplication of D under both A and C, it is possible to select d1 and d2 from:

• P1 and P10, respectively; or
• P9 and P2, respectively.

This results in the coexistence of d1 and d2, leading to illegitimacy type 1. Similarly, the duplication of E under both A and C allows the selection of both e1 and e2, leading to illegitimacy type 2.

Intuitively, the number of illegitimate test cases will become larger when the number of duplicated classifications or subtrees (with more than one classification) under different $L_i$'s grows.

4.1.2. Illegitimacy types 3, 4 and 5

With $S^D_{t_1}$, $S^E_{t_1}$, $S^D_{t_3}$ and $S^E_{t_3}$ in $T_{PGM}$, the following invalid combinations of classes would be constructed:

• d1 and e1 [by selecting (P1 and P12) or (P9 and P3)].
• d1 and e2 [by selecting (P1 and P13) or (P9 and P4)].
• d2 and e1 [by selecting (P2 and P12) or (P10 and P3)].
• d2 and e2 [by selecting (P2 and P13) or (P10 and P4)].

This results in illegitimacy type 3, as a violation of the constraint (3.viii) of the specification of PGM. In fact, the duplications of D and E under A and C also lead to the violation of the constraints (3.i), (3.ii), (3.vi) and (3.vii) of the specification. Of these constraints, (3.ii) and (3.vii) correspond to illegitimacy type 4, whereas (3.i) and (3.vi) correspond to illegitimacy type 5.
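To get a feel for how much the duplication of D and E contributes, the following Python sketch counts the potential test cases of $T_{PGM}$ that exhibit illegitimacy types 1, 2 and 3. The path encoding and variable names are assumptions for illustration. A single potential test case may exhibit several types at once, so the counts are not meant to add up to 68.

```python
from itertools import product

# Paths P1..P13 of T_PGM, grouped by top level classification (classes only).
A_PATHS = [{"a1", "d1"}, {"a1", "d2"}, {"a2", "e1"}, {"a2", "e2"}, {"a3"}]
B_PATHS = [{"b1"}, {"b2", "f1"}, {"b2", "f2"}]
C_PATHS = [{"c1", "d1"}, {"c1", "d2"}, {"c2"}, {"c3", "e1"}, {"c3", "e2"}]

D, E = {"d1", "d2"}, {"e1", "e2"}

potential = [a | b | c for a, b, c in product(A_PATHS, B_PATHS, C_PATHS)]

type1 = [case for case in potential if D <= case]                # d1 and d2 together
type2 = [case for case in potential if E <= case]                # e1 and e2 together
type3 = [case for case in potential if case & D and case & E]    # some d with some e

print(len(type1), len(type2), len(type3))  # 6 6 24
```

For instance, the six type 1 cases arise from the two path pairs (P1, P10) and (P2, P9), each combined with one of the three paths under B.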

4.2. Existence of partially dependent classifications

4.2.1. Illegitimacy types 6 and 7

Suppose X and Y are two distinct classifications. Then X is said to be 'partially dependent' on Y if all of the following conditions are met:

1. There exist a class x of X and distinct classes $y_i$ and $y_j$ (where $i, j \ge 1$) of Y such that x and $y_i$ may coexist, and x and $y_j$ cannot coexist.

2. There does not exist a class x′ of X such that x′ cannot coexist with any class of Y.

3. There does not exist a class y of Y such that y cannot coexist with any class of X.

With the above relationship, X will not be placed under Y in T, and vice versa. Also, x and $y_j$ are said to be 'mutually exclusive'. With respect to this hierarchical relation, we have the following proposition.

Proposition 1. If X is partially dependent on Y, then Y must also be partially dependent on X.

Proof. Suppose X is partially dependent on Y. Conditions (1) and (3) imply that there must be a class x″ of X such that x″ may coexist with $y_j$ of Y. Obviously, x and x″ are different. This relationship between x″ and $y_j$, together with Conditions (2) and (3), implies that Y must be partially dependent on X. □

Example 4. Refer to the specification of PGM in Example 1:

• When A is a2, C may be c1 or c2, but not c3.
• There does not exist a class of A such that it cannot coexist with any class of C.
• There does not exist a class of C such that it cannot coexist with any class of A.

Hence, A is partially dependent on C, and classes a2 and c3 are mutually exclusive. By Proposition 1, C is also partially dependent on A. For similar reasons, A and B are partially dependent on each other, and classes a3 and b2 are mutually exclusive.
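The three conditions of partial dependence can be checked mechanically once the pairwise constraints are available. The Python sketch below does this for PGM. The `FORBIDDEN` table and the function names are assumptions for illustration, and "may coexist" is interpreted purely in terms of the pairwise constraints (3.i)–(3.viii).

```python
CLASSES = {
    "A": ["a1", "a2", "a3"], "B": ["b1", "b2"], "C": ["c1", "c2", "c3"],
    "D": ["d1", "d2"], "E": ["e1", "e2"], "F": ["f1", "f2"],
}
D, E = {"d1", "d2"}, {"e1", "e2"}
# Constraints (3.i)-(3.viii): no class on the left may coexist with one on the right.
FORBIDDEN = [
    ({"a1", "a3"}, E), ({"a2", "a3"}, D), ({"a2"}, {"c3"}), ({"a3"}, {"b2"}),
    ({"b1"}, {"f1", "f2"}), ({"c1", "c2"}, E), ({"c2", "c3"}, D), (D, E),
]

def may_coexist(x, y):
    """True unless some constraint forbids the pair of classes x, y."""
    return not any(
        (x in left and y in right) or (y in left and x in right)
        for left, right in FORBIDDEN
    )

def partially_dependent(X, Y):
    xs, ys = CLASSES[X], CLASSES[Y]
    # Condition 1: some x coexists with one class of Y but not with another.
    cond1 = any(
        any(may_coexist(x, y) for y in ys) and any(not may_coexist(x, y) for y in ys)
        for x in xs
    )
    # Condition 2: every class of X coexists with at least one class of Y.
    cond2 = all(any(may_coexist(x, y) for y in ys) for x in xs)
    # Condition 3: every class of Y coexists with at least one class of X.
    cond3 = all(any(may_coexist(x, y) for x in xs) for y in ys)
    return cond1 and cond2 and cond3

print(partially_dependent("A", "C"), partially_dependent("C", "A"))  # True True
print(partially_dependent("A", "B"), partially_dependent("A", "D"))  # True False
```

The symmetric results for (A, C) agree with Proposition 1. A is not partially dependent on D because a2 and a3 cannot coexist with any class of D, which violates Condition 2.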

In fact, illegitimacy types 6 and 7 result from the occurrence of the partially dependent classifications A and C, and classifications A and B, respectively. More specifically, all the illegitimate test cases of type 6 contain the mutually exclusive classes a2 and c3, whereas those of type 7 contain the mutually exclusive classes a3 and b2.

5. Our restructuring techniques

Obviously, simply knowing $E_T$ is not good enough. A more important issue is how to improve $E_T$. The following subsections provide two algorithms for restructuring T in order to improve $E_T$.

5.1. For duplicated classifications under different $L_i$'s

As can be seen in Section 4.1, the duplications of D and E under A as well as under C (in $T_{PGM}$) result in the cases of illegitimacy types 1–5. An obvious approach to this problem is to discard all D's and all E's from $T_{PGM}$. In practice, this approach does not work as it will affect the completeness of the set of legitimate test cases. For example, if $S^D_{t_1}$, $S^E_{t_1}$, $S^D_{t_3}$ and $S^E_{t_3}$ are deleted from $T_{PGM}$ in Fig. 1, then six out of the seven legitimate test cases listed in Section 3 will not be constructed.

To eliminate the occurrence of illegitimate test cases resulting from the duplication of classifications under different $L_i$'s, we propose the following algorithm:

Tree restructuring algorithm for duplicated classifications (remove_duplicate):

Refer to the notation used in Eqs. (2)–(4). Suppose there exists a subtree $S^X_{t_i}$ ($1 \le i \le k$ and $k \ge 2$) of T such that x is a child class of classification X, and x has Y as one of its child classifications. Also, suppose there exists a subtree $S^Y_{t_j}$ ($1 \le j \le k$ and $j \ne i$) of T. Note that $S^Y_{t_i} = S^Y_{t_j}$. If $S^Y_{t_i}$ and $S^Y_{t_j}$ occur only once in $t_i$ and $t_j$, respectively, then T can be restructured as shown below.

1. Initially set the tree $Y_1 = S^X_{t_i}$. Then, prune $S^Y_{Y_1}$ from $Y_1$.

2. Construct the tree $Y_2$ with X as its root, x as the unique child class of X, and $S^Y_{t_i}$ as the unique subtree of x (denoted as X–x–$S^Y_{t_i}$).

3. Replace $S^Y_{t_j}$ with $Y_2$.

4. Replace $S^X_{t_i}$ according to one of the following steps:

   i. If $Y_1$ = X–x, replace $S^X_{t_i}$ with a null tree (equivalent to pruning $S^X_{t_i}$ from $t_i$).
   ii. If $Y_1 \ne$ X–x and x is a terminal node of $Y_1$, delete x from $Y_1$ and then replace $S^X_{t_i}$ with the newly formed $Y_1$.
   iii. If $Y_1 \ne$ X–x and x is not a terminal node of $Y_1$, replace $S^X_{t_i}$ with $Y_1$.

In remove_duplicate, it should be noted that:

• Step (4.i) will be performed if x is the unique child class of X and Y is the unique child classification of x, in $t_i$ of T.

• Step (4.ii) will be performed if x is not the unique child class of X and Y is the unique child classification of x, in $t_i$ of T.

• Step (4.iii) will be performed if Y is not the unique child classification of x, in $t_i$ of T.

Let T′ denote the classification tree after restructuring T using remove_duplicate. The following are the two important properties of remove_duplicate:

1. $N^p_{T'} \le N^p_T$.

2. All legitimate test cases which can be constructed from T can still be constructed from T′, after reformatting all the relevant potential test cases (hence, $N^l_{T'} = N^l_T$).

Now, let us prove the first property and defer the proof of the second property until after the presentation of the reformatting algorithm.

Proposition 2. $N^p_{T'} \le N^p_T$.

Proof. Refer to remove_duplicate. Without loss of generality, let us assume that k = 2 (i.e. T has two top level classifications). Thus, $N^p_T = N(t_i) \times N(t_j)$.

Obviously, $N(Y_2) = N(S^Y_{t_i})$ [hence, $N(Y_2) = N(S^Y_{t_j})$]. Since: (i) $Y_1$ is initially set to $S^X_{t_i}$; and (ii) $S^Y_{Y_1}$ is then pruned from $Y_1$, $Y_1$ is in fact a subtree of $S^X_{t_i}$. Thus, $N(Y_1) \le N(S^X_{t_i})$.

Let $t'_i$ and $t'_j$ be $t_i$ and $t_j$, respectively, after the application of remove_duplicate on T. Since $N(Y_2) = N(S^Y_{t_j})$, therefore, $N(t'_j) = N(t_j)$. Since we have $N(Y_1) \le N(S^X_{t_i})$, therefore, $N(t'_i) \le N(t_i)$. Thus, it follows from Eq. (2) that $N^p_{T'} \le N^p_T$. □

Now, let us illustrate how to apply remove_duplicate.

Example 5. Refer to $T_{PGM}$ in Fig. 1. Again, let $t_1$, $t_2$ and $t_3$ denote the subtrees with A, B and C as their roots, respectively. It should be noted that $S^D_{t_1}$ and $S^D_{t_3}$ occur only once in $t_1$ and $t_3$, respectively. The following is the sequence of restructuring applied to $T_{PGM}$ with the initialization of $Y_1$ to $S^A_{t_1}$ in step (1) of remove_duplicate:

1. Initially set $Y_1 = S^A_{t_1}$. Then, prune $S^D_{Y_1}$ from $Y_1$.
2. Set $Y_2$ to A–a1–$S^D_{t_1}$.
3. Replace $S^D_{t_3}$ with $Y_2$.
4. Delete a1 from $Y_1$ and then replace $S^A_{t_1}$ with the newly formed $Y_1$ (because $Y_1 \ne$ A–a1 and a1 is a terminal node of $Y_1$).

Fig. 2. Construction sequence of $T'_{PGM}$.

The above restructuring steps and the newly formed classification tree $T'_{PGM}$ are depicted in Fig. 2.

However, a different classification tree may be produced if $Y_1$ is initialized as $S^C_{t_3}$ instead of $S^A_{t_1}$ in step (1) of remove_duplicate. In this case, the new sequence of restructuring would then be:

1. Initially set $Y_1 = S^C_{t_3}$. Then, prune $S^D_{Y_1}$ from $Y_1$.
2. Set $Y_2$ to C–c1–$S^D_{t_3}$.
3. Replace $S^D_{t_1}$ with $Y_2$.
4. Delete c1 from $Y_1$ and then replace $S^C_{t_3}$ with the newly formed $Y_1$ (because $Y_1 \ne$ C–c1 and c1 is a terminal node of $Y_1$).

Fig. 3 depicts the classification tree $T''_{PGM}$ formed using the above sequence of restructuring.
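The two restructuring sequences of Example 5 can be reproduced with the Python sketch below. It is a simplified rendering of remove_duplicate that assumes X is itself the top level classification $L_i$ (as in Example 5) and that Y lies strictly below the root of $t_j$. The dictionary-based tree encoding and all function names are hypothetical.

```python
import copy
from math import prod

def clf(name, *classes):
    """Classification node."""
    return {"name": name, "classes": list(classes)}

def cls(name, *children):
    """Class node; `children` are its child classifications."""
    return {"name": name, "children": list(children)}

def d():
    return clf("D", cls("d1"), cls("d2"))

def e():
    return clf("E", cls("e1"), cls("e2"))

T_PGM = [
    clf("A", cls("a1", d()), cls("a2", e()), cls("a3")),
    clf("B", cls("b1"), cls("b2", clf("F", cls("f1"), cls("f2")))),
    clf("C", cls("c1", d()), cls("c2"), cls("c3", e())),
]

def count(node):
    """N(.) of Eqs. (3) and (4) for a classification or a class node."""
    if "classes" in node:
        return sum(count(c) for c in node["classes"])
    return prod(count(ch) for ch in node["children"])

def n_potential(tree):
    """Eq. (2)."""
    return prod(count(top) for top in tree)

def replace_below(classification, target, new):
    """Replace the classification named `target` found below `classification`."""
    for c in classification["classes"]:
        for k, child in enumerate(c["children"]):
            if child["name"] == target:
                c["children"][k] = new
                return True
            if replace_below(child, target, new):
                return True
    return False

def remove_duplicate(tree, i, j, x_name, y_name):
    """Simplified remove_duplicate with X = L_i = tree[i]; returns a new tree."""
    tree = copy.deepcopy(tree)
    y1 = tree[i]                                      # step 1: Y1 = S^X_{t_i}
    x = next(c for c in y1["classes"] if c["name"] == x_name)
    s_y = next(ch for ch in x["children"] if ch["name"] == y_name)
    x["children"] = [ch for ch in x["children"] if ch is not s_y]  # prune S^Y
    y2 = clf(y1["name"], cls(x_name, s_y))            # step 2: Y2 = X-x-S^Y
    replace_below(tree[j], y_name, y2)                # step 3
    if not x["children"]:                             # x is now terminal in Y1
        y1["classes"] = [c for c in y1["classes"] if c is not x]   # step 4.ii
        if not y1["classes"]:                         # step 4.i: Y1 was X-x
            del tree[i]
    return tree                                       # otherwise step 4.iii

T1 = remove_duplicate(T_PGM, 0, 2, "a1", "D")  # Y1 initialized from S^A_{t1}
T2 = remove_duplicate(T_PGM, 2, 0, "c1", "D")  # Y1 initialized from S^C_{t3}
print(n_potential(T_PGM), n_potential(T1), n_potential(T2))  # 75 45 45
```

Both restructured trees have fewer potential test cases than the original, in line with Proposition 2.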

Example 5 shows that remove_duplicate is non-deterministic because of the various ways of initializing $Y_1$ in its step (1). Using Eqs. (2)–(4), $N^p_{T'_{PGM}}$ and $N^p_{T''_{PGM}}$ are both calculated as 45. Thus, either $T'_{PGM}$ or $T''_{PGM}$ can be used for constructing the set of all potential test cases. However, in some situations the restructured classification trees (formed by different initializations of $Y_1$ in remove_duplicate) may have different $N^p$'s. In such cases, the restructured classification tree with the smallest $N^p$ should be chosen.

For Example 5, suppose $T'_{PGM}$ in Fig. 2 is chosen for constructing the set of all potential test cases. Obviously, it is quite straightforward to construct the corresponding combination table from which all potential test cases can be identified.

A close examination of all the 45 potential test cases constructed from $T'_{PGM}$ reveals that some illegitimate test cases occur because of the coexistence of (a1 and a2) or (a1 and a3). These types of illegitimacy are not induced from the combination table of $T_{PGM}$ in Fig. 1 because a1, a2 and a3 are under the same top level classification A before restructuring. Therefore, only one of them can be selected at a time to form part of a potential test case. However, a1 and (a2 or a3) are now separately placed under two different top level classifications C and A of $T'_{PGM}$, respectively. This makes the selection of (a1 and a2) or (a1 and a3) possible. In fact, of the seven legitimate test cases constructed from $T_{PGM}$, only one [i.e. the legitimate test case (a3, b1, c2)] can be constructed from $T'_{PGM}$. The remaining six legitimate test cases are converted into illegitimate ones because of the coexistence of (a1 and a2) or (a1 and a3).

In view of this type of illegitimacy caused by restructuring of T, some potential test cases constructed from T′ should be reformatted using the following algorithm (which is an extended version of the one in [14]) before the validation of their legitimacy against the specification:

Test case reformatting algorithm (reformat_test_case):

Refer to remove_duplicate. Again, let T and T′ denote the classification tree before and after being restructured, respectively. If the condition in step (4.ii) of remove_duplicate is satisfied (thus, $S^X_{t_i}$ in T is replaced with a modified $Y_1$ which does not contain x), then:

1. Let $u_m$ ($m \ge 1$) be a child class of $L_i$ in T such that $u_m$ is not an ancestor class (the ancestor classes of a classification H are defined as all the classes above H in T) of Y in $S^Y_{t_i}$ (of T), and $v_n$ ($n \ge 1$) be a child class of X in $S^X_{t_i}$ (of T) such that $v_n \ne x$. When $L_i = X$, $u_m = v_n$. Construct a set of classes γ which contains all the classes appearing in every $S^{u_m}_{t_i}$ and every $S^{v_n}_{t_i}$, of T.

2. For any potential test case constructed from T′ which contains classes from $Y_2$ in $t'_j$ of T′, reformat this potential test case by deleting all the classes in γ from it.

else

No reformatting is required.

Fig. 3. $T''_{PGM}$ after restructuring.

Let us illustrate how to apply reformat_test_case.

Example 6. Refer to Figs. 1 and 2. Note that $t_i$, $t_j$, X and x in remove_duplicate correspond to $t_1$, $t_3$, A and a1 of $T_{PGM}$ in Fig. 1. It can be seen from Fig. 2 that $S^A_{t_1}$ of $T_{PGM}$ is replaced with a modified $Y_1$ in step (4.ii) of remove_duplicate. Additionally, 18 potential test cases [e.g. the potential test case (a1, a2, b1, c1, d1, e1)] contain one or more classes (a1, d1 and d2) selected from $Y_2$ in $t'_3$ of $T'_{PGM}$. Therefore, these potential test cases should be reformatted as follows:

1. In this case, $u_m = v_n$ as $X = L_i$. The two $u_m$'s (or $v_n$'s) in step (1) of reformat_test_case correspond to a2 and a3 of $T_{PGM}$ in Fig. 1. Thus, γ = {a2, a3, e1, e2}.

2. Delete a2, a3, e1 and e2 (if any) from the 18 potential test cases which contain one or more classes selected from $Y_2$ in $t'_3$ of $T'_{PGM}$. For example, one of these potential test cases (a1, a2, b1, c1, d1, e1) becomes (a1, b1, c1, d1) after the deletion process.
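The reformatting of Example 6, followed by validation against the specification, can be sketched in Python as below. The set-based path encoding of $T'_{PGM}$, the `FORBIDDEN` table and the function names are assumptions for illustration. Reformatted test cases are collected in a set so that identical results are counted once.

```python
from itertools import product

# Path options of T'_PGM (Fig. 2), grouped by top level classification.
A_PATHS = [{"a2", "e1"}, {"a2", "e2"}, {"a3"}]
B_PATHS = [{"b1"}, {"b2", "f1"}, {"b2", "f2"}]
C_PATHS = [{"c1", "a1", "d1"}, {"c1", "a1", "d2"}, {"c2"}, {"c3", "e1"}, {"c3", "e2"}]

Y2_CLASSES = {"a1", "d1", "d2"}      # classes of Y2 = A-a1-S^D
GAMMA = {"a2", "a3", "e1", "e2"}     # the set of classes built in step (1)

D, E = {"d1", "d2"}, {"e1", "e2"}
# Constraints (3.i)-(3.viii): no class on the left may coexist with one on the right.
FORBIDDEN = [
    ({"a1", "a3"}, E), ({"a2", "a3"}, D), ({"a2"}, {"c3"}), ({"a3"}, {"b2"}),
    ({"b1"}, {"f1", "f2"}), ({"c1", "c2"}, E), ({"c2", "c3"}, D), (D, E),
]

def is_legitimate(case):
    # Classes of one classification are disjoint: at most one per leading letter.
    if len({c[0] for c in case}) != len(case):
        return False
    return not any(case & left and case & right for left, right in FORBIDDEN)

def reformat_test_case(case):
    """Step (2): drop the classes of GAMMA when the case uses classes of Y2."""
    return case - GAMMA if case & Y2_CLASSES else case

potential = [frozenset(a | b | c) for a, b, c in product(A_PATHS, B_PATHS, C_PATHS)]
to_reformat = [case for case in potential if case & Y2_CLASSES]
legit_before = {case for case in potential if is_legitimate(case)}
legit_after = {c for c in map(reformat_test_case, potential) if is_legitimate(c)}

print(len(potential), len(to_reformat), len(legit_before), len(legit_after))  # 45 18 1 7
print(round(len(legit_after) / len(potential), 2))  # 0.16
```

Without reformatting, only (a3, b1, c2) is legitimate. After reformatting, the other six legitimate test cases of Section 3 reappear.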

After the application of reformat_test_case, all potential test cases should then be validated against the specification. After the removal of all illegitimate test cases, seven legitimate test cases remain (same as those listed in Section 3). From Eq. (1), $E_{T'_{PGM}}$ is calculated as 0.16 ($= 7/45$). Compared with $E_{T_{PGM}} = 0.09$, the improvement is about 78%.

For each potential test case constructed from $T$, there is a corresponding set of paths in $T$. Let $F_{t_i}$ denote the set of those paths appearing in $t_i$ of $T$. For example, in $T_{PGM}$ of Fig. 1, {$A$–$a_1$–$D$–$d_1$} is a $F_{t_1}$ (denoted as $F_1$). In this example, $F_1$ contains only one path because $D$ is the unique child classification of $a_1$. If, however, $a_1$ had more than one child classification, then $F_1$ would contain more than one path.

Now, we are ready to prove the second property, i.e. with the use of remove_duplicate and reformat_test_case, all legitimate test cases constructed from $T$ are preserved in $T'$.

Proposition 3. All legitimate test cases which can be constructed from $T$ can still be constructed from $T'$, after the application of reformat_test_case (i.e. $N^{l}_{T'} = N^{l}_{T}$).

Proof. Again, let us assume that $k = 2$, and the two subtrees (in $T$) with $L_i$ and $L_j$ ($1 \le i, j \le 2$ and $i \ne j$) as their roots are $t_i$ and $t_j$, respectively. If a $F_{t_i}$ contains any class of $S^{Y}_{t_i}$, then it is denoted as $F^{1}_{t_i}$. Otherwise, it is denoted as $F^{2}_{t_i}$.

Since every potential test case is constructed by combining all the classes which appear in the paths of a $F_{t_i}$ and a $F_{t_j}$, it must belong to one of the following types:

1. One formed by combining all the classes in the paths of a $F^{1}_{t_i}$ and a $F^{1}_{t_j}$.

2. One formed by combining all the classes in the paths of a $F^{1}_{t_i}$ and a $F^{2}_{t_j}$.

3. One formed by combining all the classes in the paths of a $F^{2}_{t_i}$ and a $F^{1}_{t_j}$.

4. One formed by combining all the classes in the paths of a $F^{2}_{t_i}$ and a $F^{2}_{t_j}$.

Potential test cases of type 1 may be legitimate or illegitimate. To analyse the impact of remove_duplicate on the legitimate test cases of this type, we have to consider the following situations:

• The condition of step (4.i) of remove_duplicate is satisfied (thus, $S^{X}_{t_i}$ in $T$ is replaced with a null tree)

  • If ($X$ in $t_i$ of $T$) $\ne L_i$

    Although subpaths (which contain classes in $S^{X}_{t_i}$ of $T$) of some elements of $F^{1}_{t_i}$ are deleted as a result of the replacement of $S^{X}_{t_i}$ in $T$ with a null tree, these deleted subpaths reappear in $Y_2$ which replaces $S^{Y}_{t_j}$ in $T$. Thus, all the potential test cases (and hence the legitimate test cases) of type 1 constructed from $T$ can still be formed from $T'$ by combining all the classes appearing in the paths of $F^{2}_{t_i'}$ and $F^{1}_{t_j'}$.

  • If ($X$ in $t_i$ of $T$) $= L_i$

    It is obvious that $S^{X}_{t_i} = t_i$. Thus, the replacement of $S^{X}_{t_i}$ in $T$ with a null tree is equivalent to the pruning of $t_i$ from $T$. After the pruning process, all $F^{1}_{t_i}$'s are removed and $T' = t_j'$. However, all the potential test cases (and hence, the legitimate test cases) of type 1 constructed from $T$ can still be formed from $T'$, because $x$ and $S^{Y}_{t_j}$ are contained in $Y_2$ which replaces $S^{Y}_{t_j}$ in $T$.

• The condition of step (4.ii) of remove_duplicate is satisfied (thus, $S^{X}_{t_i}$ in $T$ is replaced with a modified $Y_1$ which does not contain $x$)

  As a result of the replacement of $S^{X}_{t_i}$ in $T$ with the modified $Y_1$, subpaths (which contain classes in $S^{x}_{t_i}$ of $T$) of some elements of $F^{1}_{t_i}$ will be deleted. However, these deleted subpaths appear in $Y_2$ which replaces $S^{Y}_{t_j}$ in $T$. Also, since $S^{x}_{t_i}$ (and hence $S^{Y}_{t_i}$) does not appear in $t_i'$, $F^{1}_{t_i'} = \emptyset$. As a result, all classes in the paths of $F^{1}_{t_j'}$ have to be combined with all classes in the paths of $F^{2}_{t_i'}$ to form a potential test case. However, any potential test case containing this combination is illegitimate because all the classes in $S^{Y}_{t_j}$ (some of these classes must exist in certain elements of $F^{1}_{t_j'}$) cannot coexist with any class from any $S^{u_m}_{t_i'}$ and $S^{v_n}_{t_i'}$ of $T'$ (note that some classes of $S^{u_m}_{t_i'}$ or $S^{v_n}_{t_i'}$ must appear in some elements of $F^{2}_{t_i'}$). Obviously, any illegitimate test case of this type can be converted into a legitimate one through the removal of all the classes in every $S^{u_m}_{t_i}$ ($= S^{u_m}_{t_i'}$) and every $S^{v_n}_{t_i}$ ($= S^{v_n}_{t_i'}$) (i.e. steps (1) and (2) of reformat_test_case). Thus, no legitimate test cases constructed from $T$ will be omitted from $T'$.

• The condition of step (4.iii) of remove_duplicate is satisfied (thus, $S^{X}_{t_i}$ in $T$ is replaced with $Y_1$)

  Although subpaths (which contain classes in $S^{Y}_{t_i}$ of $T$) of some elements of $F^{1}_{t_i}$ are deleted because of the replacement of $S^{X}_{t_i}$ in $T$ with $Y_1$, these deleted subpaths reappear in $Y_2$ which replaces $S^{Y}_{t_j}$ in $T$. Therefore, all the potential test cases (and hence, the legitimate test cases) of type 1 constructed from $T$ can still be formed from $T'$ via the combination of all the classes appearing in the paths of $F^{2}_{t_i'}$ and $F^{1}_{t_j'}$.


Potential test cases of type 2 are illegitimate because some classes appearing in the paths of $F^{2}_{t_j}$ cannot coexist with any class appearing in $S^{Y}_{t_j}$. For a similar reason, potential test cases of type 3 are also illegitimate.

Finally, let us look at potential test cases of type 4. They remain unchanged after the restructuring process because $F^{2}_{t_i}$ and $F^{2}_{t_j}$ are left intact, as seen in remove_duplicate.

Therefore, all legitimate test cases constructed from $T$ can still be formed from $T'$ after the application of reformat_test_case. As a result, $N^{l}_{T'} = N^{l}_{T}$. □

5.2. For partially dependent classifications in different $t_i$'s

In general, illegitimate test cases caused by a pair of partially dependent classifications are difficult, if not impossible, to remove through restructuring of $T$ (with the preservation of all legitimate test cases). However, $E_T$ can be improved by restructuring $T$ when the duplicated classifications are under mutually exclusive classes of their respective parent classifications (which are partially dependent on each other). Under this special circumstance, the following restructuring algorithm can be used to improve $E_T$:

Tree restructuring algorithm for partially dependent classifications (remove_partial_dependence):

Suppose $T$ has $k$ ($k \ge 2$) top level classifications denoted as $L_1, \ldots, L_k$; and $t_i$ and $t_j$ ($1 \le i, j \le k$) denote the distinct subtrees (in $T$) with $L_i$ and $L_j$ as their roots, respectively. Also, suppose there exist two identical subtrees $S^{Y}_{t_i}$ and $S^{Y}_{t_j}$ in $T$.

If: (i) $S^{Y}_{t_i}$ and $S^{Y}_{t_j}$ occur only once in $t_i$ and $t_j$, respectively; (ii) some ancestor classifications (the ancestor classifications of a subtree $S$ or a classification $H$ are defined as all the classifications above $S$ or $H$ in $T$, respectively) $M$ and $N$ of $S^{Y}_{t_i}$ and $S^{Y}_{t_j}$, respectively, are partially dependent on each other; and (iii) $S^{Y}_{t_i}$ and $S^{Y}_{t_j}$ have $m$ (a child class of $M$) and $n$ (a child class of $N$) as one of their ancestor classes, respectively, such that $m$ and $n$ cannot coexist (i.e. $m$ and $n$ are mutually exclusive classes), then $T$ can be restructured as shown below:

1. If $m$ is the unique child class of $M$, then prune $S^{M}_{t_i}$ from $T$. Otherwise, prune $S^{m}_{t_i}$ from $T$.

2. If $n$ is the unique child class of $N$, then prune $S^{N}_{t_j}$ from $T$. Otherwise, prune $S^{n}_{t_j}$ from $T$.
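The two pruning steps can be sketched in Python as follows. The nested-dictionary encoding of a classification tree (classification, then class, then child classifications), the function names and the sample tree are all assumptions made for this sketch. The sample tree is hypothetical and is not a reproduction of Fig. 1, although its class names echo Example 7.

```python
from copy import deepcopy

# Assumed encoding: {classification: {class: {child classification: {...}}}}


def prune(classifications, target, cls):
    """Return a copy with the subtree rooted at `target` or at `cls` pruned.

    If `cls` is the unique child class of `target`, the whole subtree rooted
    at `target` is dropped; otherwise only the subtree rooted at `cls` is.
    """
    out = {}
    for name, classes in classifications.items():
        if name == target:
            if set(classes) == {cls}:
                continue  # unique child class: prune the classification itself
            out[name] = {c: deepcopy(s) for c, s in classes.items() if c != cls}
        else:
            out[name] = {c: prune(s, target, cls) for c, s in classes.items()}
    return out


def remove_partial_dependence(tree, top_i, M, m, top_j, N, n):
    """Apply step 1 inside the subtree of top_i and step 2 inside that of top_j."""
    targets = {top_i: (M, m), top_j: (N, n)}
    out = {}
    for top, classes in tree.items():
        if top in targets:
            out.update(prune({top: classes}, *targets[top]))
        else:
            out[top] = deepcopy(classes)
    return out


# Hypothetical tree: E is duplicated under the mutually exclusive a2 and c3.
tree = {
    "A": {"a1": {"D": {"d1": {}, "d2": {}}},
          "a2": {"E": {"e1": {}, "e2": {}}},
          "a3": {}},
    "B": {"b1": {}, "b2": {}, "b3": {}},
    "C": {"c1": {}, "c2": {}, "c3": {"E": {"e1": {}, "e2": {}}}},
}
restructured = remove_partial_dependence(tree, "A", "A", "a2", "C", "C", "c3")
```

The sketch returns a new tree and leaves its input untouched, which makes it easy to compare the tree before and after restructuring. It does not check preconditions (i) to (iii); those are assumed to have been established from the specification.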

It should be noted that $t_i$ and $t_j$ in the restructured classification tree cannot be null. Otherwise, $T$ must be incorrect because no legitimate test cases can be constructed from it.

Let $T'$ and $T'''$ denote the classification trees after restructuring $T$ using remove_duplicate and remove_partial_dependence, respectively. We have the following relationships:

• $N^{p}_{T'''} \le N^{p}_{T'} \le N^{p}_{T}$.

• All legitimate test cases which can be constructed from $T$ can also be constructed from $T'''$ (i.e. $N^{l}_{T'''} = N^{l}_{T}$).

Now, let us prove these relationships.

Proposition 4. $N^{p}_{T'''} \le N^{p}_{T'} \le N^{p}_{T}$.

Proof. Refer to remove_duplicate and remove_partial_dependence. Suppose, without loss of generality, that $k = 2$, and the two top level classifications of $T$ are $L_1$ and $L_2$. Let $t_1$ and $t_2$ correspond to $t_i$ and $t_j$ mentioned in these two algorithms, respectively; $t_1'$ and $t_2'$ be $t_1$ and $t_2$, respectively, after restructuring $T$ with remove_duplicate; and $t_1'''$ and $t_2'''$ be $t_1$ and $t_2$, respectively, after restructuring $T$ with remove_partial_dependence. Also, suppose that the parent classes of $S^{Y}_{t_1}$ and $S^{Y}_{t_2}$ in $T$ are $x$ and $w$, respectively, and that the parent classifications of $x$ and $w$ are $X$ and $W$, respectively. It should be noted that $X$, $x$, $W$ and $w$ may be $M$, $m$, $N$ and $n$, respectively, as mentioned in remove_partial_dependence.

Suppose $T'$ is formed with the initialization of $Y_1$ to $S^{X}_{t_1}$ in step (1) of remove_duplicate. Then $t_1'$ of $T'$ is formed by pruning either $S^{X}_{t_1}$, $S^{x}_{t_1}$ or $S^{Y}_{t_1}$ (corresponding to step (4.i), (4.ii) or (4.iii) of remove_duplicate, respectively) from $t_1$ of $T$.

Now, let us consider $t_1'''$ of $T'''$. The subtree $t_1'''$ is $t_1$ with $S^{M}_{t_1}$ or $S^{m}_{t_1}$ pruned. Since $M = X$ or $M$ is an ancestor classification of $X$, we have either $t_1''' = t_1'$ or $t_1'''$ is a subtree of $t_1'$. Therefore, $N(t_1''') \le N(t_1')$.

Now, consider $t_2'$ and $t_2'''$. From the proof of Proposition 2, we know that $N(t_2') = N(t_2)$. Since $t_2'''$ is $t_2$ with $S^{N}_{t_2}$ or $S^{n}_{t_2}$ pruned, $t_2'''$ is in fact a subtree of $t_2$ (hence, $N(t_2''') \le N(t_2)$). It follows that $N(t_2''') \le N(t_2')$.

Since $N^{p}_{T'} = N(t_1') \times N(t_2')$ and $N^{p}_{T'''} = N(t_1''') \times N(t_2''')$, and we know that $N(t_1''') \le N(t_1')$ and $N(t_2''') \le N(t_2')$, we must have $N^{p}_{T'''} \le N^{p}_{T'}$. From Proposition 2, $N^{p}_{T'''} \le N^{p}_{T'} \le N^{p}_{T}$ immediately follows. □

Proposition 4 implies that, whenever possible, remove_partial_dependence should be used instead of remove_duplicate for improving $E_T$.
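The proof relies on the number of potential test cases being a product over the top level subtrees. A minimal Python sketch of such a count is given below. It assumes, and this is an assumption of the sketch rather than a restatement of Eqs. (2)–(4), that a classification contributes the sum of the counts of its classes and that a class contributes the product of the counts of its child classifications (1 for a leaf class). The trees use the same hypothetical nested-dictionary encoding as before and do not reproduce Fig. 1 or Fig. 4.

```python
from math import prod


def n_classification(classes):
    """Count for a classification: exactly one of its classes is selected."""
    return sum(n_class(children) for children in classes.values())


def n_class(child_classifications):
    """Count for a class: all child classifications are combined (1 if none)."""
    return prod(n_classification(c) for c in child_classifications.values())


def n_potential(tree):
    """Number of potential test cases: product over top level classifications."""
    return prod(n_classification(classes) for classes in tree.values())


# Hypothetical tree before restructuring.
before = {
    "A": {"a1": {"D": {"d1": {}, "d2": {}}},
          "a2": {"E": {"e1": {}, "e2": {}}},
          "a3": {}},
    "B": {"b1": {}, "b2": {}, "b3": {}},
    "C": {"c1": {}, "c2": {}, "c3": {"E": {"e1": {}, "e2": {}}}},
}
# The same tree with the subtrees rooted at a2 and c3 pruned.
after = {
    "A": {"a1": {"D": {"d1": {}, "d2": {}}}, "a3": {}},
    "B": {"b1": {}, "b2": {}, "b3": {}},
    "C": {"c1": {}, "c2": {}},
}

np_before = n_potential(before)  # 5 * 3 * 4
np_after = n_potential(after)    # 3 * 3 * 2
```

Under this counting rule, pruning a subtree can only remove terms from a sum, so each factor can only shrink or stay the same. That is the monotonicity the proof uses.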

Proposition 5. All legitimate test cases which can be constructed from $T$ can also be constructed from $T'''$ (i.e. $N^{l}_{T'''} = N^{l}_{T}$).

Proof. We follow the proof of Proposition 4. For each potential test case constructed from $T$, let $G$ be its corresponding set of paths in $T$. Also, let $G_{t_1}$ and $G_{t_2}$ denote the sets of those paths appearing in $t_1$ and $t_2$ of $T$, respectively. If a $G_{t_1}$ contains $m$, then it is denoted as $G^{1}_{t_1}$. Otherwise, it is denoted as $G^{2}_{t_1}$. Similarly, if a $G_{t_2}$ contains $n$, then it is denoted as $G^{1}_{t_2}$. Otherwise, it is denoted as $G^{2}_{t_2}$.

Since every potential test case is constructed by combining all classes which appear in the paths of a $G_{t_1}$ and a $G_{t_2}$, it must belong to one of the following types:

1. One formed by combining all the classes in the paths of a $G^{1}_{t_1}$ and a $G^{1}_{t_2}$.

2. One formed by combining all the classes in the paths of a $G^{1}_{t_1}$ and a $G^{2}_{t_2}$.

3. One formed by combining all the classes in the paths of a $G^{2}_{t_1}$ and a $G^{1}_{t_2}$.

4. One formed by combining all the classes in the paths of a $G^{2}_{t_1}$ and a $G^{2}_{t_2}$.

All the potential test cases of type 1 are illegitimate because they contain both $m$ and $n$, which are mutually exclusive classes. Since $S^{Y}_{t_1}$ and $S^{Y}_{t_2}$ occur only once in $t_1$ and $t_2$, respectively, some classes in the paths of $G^{1}_{t_1}$ and $G^{2}_{t_1}$ cannot coexist with some classes in the paths of $G^{2}_{t_2}$ and $G^{1}_{t_2}$, respectively. Hence, all potential test cases of types 2 and 3 are also illegitimate.

Potential test cases of type 4 may be legitimate or illegitimate. Since $G^{2}_{t_1}$ and $G^{2}_{t_2}$ are invariant with respect to the application of remove_partial_dependence, all the legitimate test cases constructed from $T$ are retained in $T'''$ (i.e. $N^{l}_{T'''} = N^{l}_{T}$). □

Example 7 illustrates the application of remove_partial_dependence and the resultant set of legitimate test cases.

Example 7. Refer to $T_{PGM}$ in Fig. 1. Let $t_1$, $t_2$ and $t_3$ denote the subtrees with $A$, $B$ and $C$ as their roots, respectively. Since: (i) $S^{E}_{t_1}$ and $S^{E}_{t_3}$ occur only once in $t_1$ and $t_3$, respectively; (ii) the parent classifications $A$ and $C$ of $S^{E}_{t_1}$ and $S^{E}_{t_3}$, respectively, are partially dependent on each other; and (iii) the parent classes $a_2$ and $c_3$ of $S^{E}_{t_1}$ and $S^{E}_{t_3}$, respectively, are mutually exclusive, we should apply remove_partial_dependence to restructure $T_{PGM}$ in order to improve $E_{T_{PGM}}$. The restructuring steps are as follows:

1. $A$ and $a_2$ in $T_{PGM}$ correspond to $M$ and $m$ in remove_partial_dependence, respectively. Since $a_2$ is not the unique child class of $A$, $S^{a_2}_{t_1}$ is pruned from $T_{PGM}$.

2. $C$ and $c_3$ in $T_{PGM}$ correspond to $N$ and $n$ in remove_partial_dependence, respectively. Since $c_3$ is not the unique child class of $C$, $S^{c_3}_{t_3}$ is pruned from $T_{PGM}$.

The resultant classification tree $T'''_{PGM}$ after restructuring is depicted in Fig. 4.

From $T'''_{PGM}$ of Fig. 4, $N^{p}_{T'''_{PGM}}$ is calculated as 27 using Eqs. (2)–(4). Also, the seven legitimate test cases constructed from $T_{PGM}$ can still be formed from $T'''_{PGM}$. For example, the legitimate test case ($a_1$, $b_1$, $c_1$, $d_1$) constructed from $T_{PGM}$ can be formed by combining all the classes which lie on $P_1'''$, $P_4'''$ and $P_7'''$ in Fig. 4.

By Eq. (1), $E_{T'''_{PGM}}$ is calculated as 0.26 ($= 7/27$). When compared with $E_{T_{PGM}} = 0.09$, the improvement is about 189%. It should be noted that, in Fig. 4, $E_{T'''_{PGM}}$ can be further improved by restructuring $T'''_{PGM}$ using remove_duplicate for removing the duplication of the classification $D$. If this is done, $E_{T'''_{PGM}}$ will be further improved to 0.78 (about 767% improvement).
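The effectiveness figures quoted in Examples 6 and 7 can be checked with a short calculation. The sketch below assumes that Eq. (1) is the ratio of legitimate to potential test cases, which is what the quoted expressions 7/45 and 7/27 suggest. The helper names are hypothetical. The percentages are reproduced from the two-decimal values, as appears to have been done in the text.

```python
def effectiveness(n_legitimate: int, n_potential: int) -> float:
    """Ratio of legitimate to potential test cases (assumed form of Eq. (1))."""
    return n_legitimate / n_potential


def improvement_pct(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100


e_original = 0.09                              # quoted value for the original tree
e_duplicate = round(effectiveness(7, 45), 2)   # after remove_duplicate (Example 6)
e_partial = round(effectiveness(7, 27), 2)     # after remove_partial_dependence (Example 7)
e_both = 0.78                                  # quoted value after both restructurings

improvements = [
    round(improvement_pct(e, e_original))
    for e in (e_duplicate, e_partial, e_both)
]
```

With these inputs the three improvements round to 78, 189 and 767 percent, matching the figures in the text.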

6. Conclusion

As testing plays an important role in the assurance of the correctness (and hence the quality) of software, a systematic way of identifying test cases from specifications is essential. This is because the comprehensiveness of the selected test cases will affect the quality of testing. In view of this, the classification-tree method developed by Grochtmann and Grimm [5] provides a useful direction in pursuit of testing. Unfortunately, their tree construction method is rather ad hoc.

This problem has motivated Chen and Poon [12, 13] to develop the classification-hierarchy table and its associated tree construction algorithm, by which classification trees can be constructed systematically from specifications. However, their construction algorithm does not take into consideration the effectiveness of classification trees.

In this paper, we have introduced a metric $E_T$ to measure the effectiveness of a classification tree. Intuitively speaking, $E_T$ is a measure of the cost-effectiveness of constructing the set of legitimate test cases from a classification tree.

Fig. 4. $T'''_{PGM}$ after restructuring.

Simply knowing the value of $E_T$ for a classification tree is insufficient. A more important issue is how to improve $E_T$. To address this issue, this paper has investigated the relationship between the structure of a classification tree and its $E_T$ value, focusing on those features which could result in a poor $E_T$ value. We observe that there are two sources leading to a poor $E_T$ value: the existence of duplicated classifications under different top level classifications, and partially dependent classifications. From this observation, we have proposed two algorithms to improve $E_T$ through restructuring the classification trees, prior to the construction of potential test cases.

References

[1] R. Ferguson, B. Korel, The chaining approach for software test data generation, ACM Transactions on Software Engineering and Methodology 5 (1) (1996) 63–86.

[2] B. Korel, Automated test data generation for programs with procedures, in: ISSTA'96: Proceedings of the 1996 International Symposium on Software Testing and Analysis, January 1996, pp. 209–215.

[3] J. Sanders, E. Curran, Software Quality: A Framework for Success in Software Development and Support, Addison-Wesley, Reading, MA, 1994.

[4] T. Chusho, Test data selection and quality estimation based on the concept of essential branches for path testing, IEEE Transactions on Software Engineering 13 (5) (1987) 509–517.

[5] M. Grochtmann, K. Grimm, Classification trees for partition testing, Software Testing, Verification and Reliability 3 (1993) 63–82.

[6] T.J. Ostrand, M.J. Balcer, The category-partition method for specifying and generating functional tests, Communications of the ACM 31 (6) (1988) 676–686.

[7] T.Y. Chen, M.F. Lau, Two test data selection strategies towards testing of boolean specifications, in: COMPSAC'97: Proceedings of the Twenty-First Annual International Computer Software and Applications Conference, IEEE Computer Society Press, August 1997, pp. 608–611.

[8] J.J. Chilenski, S.P. Miller, Applicability of modified condition/decision coverage to software testing, Software Engineering Journal 9 (5) (1994) 193–200.

[9] E.J. Weyuker, T. Goradia, A. Singh, Automatically generating test data from a boolean specification, IEEE Transactions on Software Engineering 20 (5) (1994) 353–363.

[10] R. Weber, EDP Auditing: Conceptual Foundations and Practice, 2nd ed., McGraw-Hill, New York, 1988.

[11] T.Y. Chen, P.L. Poon, S.F. Tang, A systematic method for auditing user acceptance tests, IS Audit and Control Journal 5 (1998) 31–36.

[12] T.Y. Chen, P.L. Poon, Classification-hierarchy table: a methodology for constructing the classification tree, in: ASWEC'96: Proceedings of the Australian Software Engineering Conference, IEEE Computer Society Press, July 1996, pp. 93–104.

[13] T.Y. Chen, P.L. Poon, Construction of classification trees via the classification-hierarchy table, Information and Software Technology 39 (13) (1997) 889–896.

[14] T.Y. Chen, P.L. Poon, Improving the quality of classification trees via restructuring, in: APSEC'96: Proceedings of the Asia-Pacific Software Engineering Conference, IEEE Computer Society Press, December 1996, pp. 83–92.
