
A formal approach for generating oo specifications from natural language


Natalia Juristo 1, José L. Morant 2, Ana M. Moreno *

Faculdad de Informatica, Universidad Politecnica de Madrid, Campus de Montegancedo, 28660 Boadilla del Monte, Madrid, Spain

Received 24 September 1997; received in revised form 29 November 1997; accepted 25 March 1998

Abstract

The requirements analysis process is essential to software development. The success or failure of a software system can be said to largely depend on the quality of this activity. A formal and disciplined process is therefore necessary for requirements analysis. In this paper, we present an approach that is based on the formal definition of relations between linguistic and OO conceptual structures as a basis for a formal and disciplined problem analysis process. This process is based on two components, conceptual model formalization and OO model construction. The first provides formal rules to identify the key components of conceptual models, and the second provides a set of definite steps to guide the analyst in model construction. We also present some conclusions concerning the application of our approach versus the standard OMT approach by a group of students at our university. © 1999 Elsevier Science Inc. All rights reserved.

1. Introduction

Perhaps the most critical factor in software system development is understanding and adequately representing the requirements to be satisfied by the system. One of the reasons for this is that information generated during this process will serve as a starting point for software system construction. So, wrong decisions, mistaken interpretations, or any other error made during the analysis process have a critical impact on the other development phases and, therefore, on the software under construction.

As mentioned by Brooks (1987), there are several sources of difficulty in the analysis process. First, there are what have been called, quoting Aristotle, essential difficulties, which are difficulties inherent in the idiosyncrasy of this activity and, second, there are what are known as accidental difficulties, caused by incorrect execution of the activities to be performed during this phase. The primary difficulties in the first group are that the customer organization may not be able to clearly define its problem; developers may have to deal with several users, who may have different and even contradictory requirements; or developers may misinterpret the needs of the customer organization. The prime accidental difficulty is that developers fail to adequately depict the problem to be solved in conceptual models.

While a silver bullet (Brooks, 1987) is unlikely to remove the essential difficulties and make the analysis process easy, a systematic and disciplined process could do away with accidental difficulties (Faulk, 1997). At the same time, this process could provide a stable basis from which to address essential difficulties. As suggested by Sutcliffe (1997), a formal method of analysis should supply exhaustive criteria for identifying the components of conceptual models and should specify procedures to guide software engineers during the construction of the above models in order to ensure that they adequately represent the problem and its solution.

However, the present situation is very well summarized by Faulk (1997): "the analysis process is characterized by its immaturity". Indeed, we find that methods fail to agree on terminology, on the approach and on the activities; and, as shown in Moreno (1997), while there are methods that specify a set of definite steps for the requirements process (Coleman et al., 1994; NCC, 1990; Jacobson, 1992; etc.), none provide formal, justified, complete and correct guidelines for identifying the components of a problem that need to be represented in conceptual models. Therefore, according to Sutcliffe, current methods fail to provide the basis for solving the accidental difficulties in the requirements process, as this would require a completely formalized process.

The Journal of Systems and Software 48 (1999) 139-153, www.elsevier.com/locate/jss

* Corresponding author. E-mail: ammoreno@fi.upm.es

1 E-mail: natalia@fi.upm.es

2 E-mail: jlmorant@fi.upm.es

0164-1212/99/ - see front matter © 1999 Elsevier Science Inc. All rights reserved.

PII: S0164-1212(99)00052-7

The immaturity of the requirements process is particularly apparent with OO conceptual modeling, because it is in its infancy. This insufficiency in OO modeling, and the need to remedy it, has already been stressed by several authors, such as Iivari (1995), Basili et al. (1996), Wang (1997) and Northrop (1997). They all emphasize the fact that there are no rigorous criteria for identifying the components of OO conceptual models. They also claim that OO analysis cannot be effectively performed and its immaturity is slowing down the adoption of OO.

The work presented in this paper seeks to formalize the analysis process in order to create conceptual models in a rigorous and precise manner. This would overcome the randomness of the methods used to date and provide support for resolving the accidental difficulties arising in this activity. We have focused on OO modeling, because this is one of the least mature areas.

The proposed approach is based on examining the information most likely to be available at the start of the development process, i.e., natural language sentences that describe the characteristics of the problem to be solved. This description is composed of words that can serve as elements of the conceptual models. We will focus on the formal definition of relations between the words or linguistic structures and the elements of the conceptual models or conceptual structures, that is, the relations between the linguistic world and the conceptual world.

The idea of identifying these relations is not new. As early as 1983, Abbot (1983) explained that nouns could be used to derive classes, adjectives to derive attributes, and verbs to define methods, an idea that was subsequently adopted by Booch (1986). Burg (1997) conducted some of the most rigorous research in this area, seeking to define relations using an intermediate language. However, the above relations are not fully justified and do not account for all the key elements in a conceptual model. Table 1 shows the most important research in this area. For a more exhaustive study of these methods, see Moreno (1997). This study is based on five criteria that permit us to analyze each method with respect to: (1) the rules provided for identifying components of conceptual models (here we apply the criteria called quality of information, justification, completeness and correctness), and (2) the existence of a set of defined steps to guide the analyst in the construction of conceptual models (for this characteristic we apply the criterion referred to as definition). These five criteria can be described as:

· Quality of Information reflects whether the method provides guidelines for identifying the elements of the conceptual models, and whether or not they are formal guidelines.

· Justification specifies whether or not the above guidelines, if any, are formally justified.

· Completeness shows whether these guidelines cover essential components of conceptual models.

· Correction (correctness) reflects whether the guidelines can be applied without exception.

· Definition determines whether the method provides detailed steps to carry out the analysis process.

Our work defines a relation between linguistic structures in natural language and conceptual structures in an OO model. The relations are formal and justified, and cover all key elements of the conceptual models. The approach consists of two different activities: conceptual model formalization and OO model creation (see Fig. 1).

The formalization provides definite rules to identify key elements of conceptual models by defining relations between a subset of structures from the linguistic world and a subset of structures from the conceptual world. The linguistic world is potentially infinite, which led us to work with a subset. The linguistic structures of which this subset is composed are referred to as linguistic patterns. The conceptual world is composed of any conceptual models that represent a problem and its solution. In this case, we work with two OO conceptual models, the Object Model (OM), which will represent the static structure of the problem, and the Behavior Model (BM), which will represent its dynamic aspect. The conceptual structures of these models constitute what are called conceptual patterns.

Table 1

Linguistic approaches to OO conceptual modeling

| Method | Quality of information | Justification | Completeness | Correction | Definition |
|---|---|---|---|---|---|
| Abbot (Abbot, 1983) | Heuristics | Intuitive | Partial | Partial | - |
| Booch (Booch, 1986) | Heuristics | Intuitive | Partial | Partial | - |
| Jalote (Jalote, 1989) | Heuristics | Intuitive | Partial | Partial | - |
| Saeki et al. (Saeki et al., 1989) | Heuristics | Not justified | Partial | Partial | Semidefinite |
| Rolland and Proix (Rolland and Proix, 1992) | Heuristics | Not justified | Partial | Total | Semidefinite |
| Block et al. (Block et al., 1993) | Heuristics | Not justified | Partial | Partial | Indefinite |
| Kristen (Kristen, 1994) | Heuristics | Not justified | Partial | Partial | Semidefinite |
| Frederiks et al. (Frederiks et al., 1995) | Heuristics | Not justified | Partial | Partial | Definite |
| Burg (Burg, 1997) | Rules | Not justified | Partial | Total | Definite |


OO model creation employs the results of the formalization and guides analysts in building conceptual models.

Our approach seeks to give precedence to one of the conceptual representations that can exist empirically to model a given problem. The goal is to ensure that the conceptual models adequately represent the problem under study and its solution. The fitness of these models will be formally justified by employing the results supplied by the formalization during the OO model construction process.

The paper is structured as follows: Section 2 presents the results of the conceptual modeling formalization and its underlying reasoning; Section 3 describes the steps taken to create the OO model; Section 4 presents the results of an experiment with a group of students at our university; and, finally, Section 5 contains the conclusions and a summary of future areas of research.

2. Conceptual modeling formalization

As mentioned above, the conceptual modeling formalization defines a formally justified correspondence between linguistic patterns and conceptual patterns. In order to define this correspondence, we have used an intermediate world, the mathematical world. In this manner, both linguistic patterns and conceptual patterns will be represented by their respective mathematical representations. If the mathematical representations of a linguistic pattern and a conceptual pattern are equivalent, then we can assert, in a justified manner, that the aforesaid conceptual pattern can model the linguistic pattern in question.

More formally, we have defined a set L of linguistic patterns and a set C of conceptual patterns, on which we will define the correspondence. In the mathematical world, we will work with two different, although equivalent, theories: predicate logic and set theory. The requirements in L are represented by means of predicate logic, where the set of logical expressions representing these requirements is a subset of predicate logic, denoted as PL. The translation will be defined as K

\[ K : L \rightarrow PL. \]

The requirements in C are represented by means of set theory, where the set of these mathematical representations is defined as ST. They will be mapped using translation C, which can be defined as

\[ C : C \rightarrow ST. \]

An equivalence, denoted R, can be defined between the elements of the subsets PL and ST, which is represented as

\[ R : PL \leftrightarrow ST. \]

So, if the logical representation of a specific linguistic pattern is equivalent to the set theory representation of a given conceptual pattern, the representation of the linguistic pattern by means of the respective conceptual pattern is valid. Fig. 2 represents this idea graphically.
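The validity condition just stated can be written compactly by composing the three mappings. The following is only a sketch in the notation of this section; the phrase "is modeled by" is shorthand introduced here for illustration, and the letter C names both the set of conceptual patterns and the translation, as in the text.

```latex
\[
  l \ \text{is modeled by}\ c
  \quad\Longleftrightarrow\quad
  K(l) \mathrel{R} C(c),
  \qquad l \in L,\quad c \in C,\quad K(l) \in PL,\quad C(c) \in ST .
\]
```

Read this way, justifying a modeling decision amounts to exhibiting the pair of mathematical representations and showing that they are related by R.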

Conceptual patterns that describe each linguistic pattern can be determined by means of a constructive demonstration applying the above transformations, thus enabling all key parts of the conceptual models of a problem to be identified in a fully justified manner.

The sets L and C, and the reasoning shown in Fig. 2, will now be described.

2.1. Linguistic patterns (L)

In our approach, linguistic patterns have been divided into two types: (1) patterns that represent information with which the system is to work, and (2) patterns that represent system operation. The first group is called static utility language (SUL) and the second group, dynamic utility language (DUL).

The SUL includes the information that describes, organizes and classifies concepts in the universe of discourse related to the problem under examination. Information of particular interest includes: (1) relations among different concepts, (2) classifications of concepts, and (3) relations that express the composition of concepts on the basis of others. This information is represented by the linguistic structure simple clause. The reason is that a simple clause expresses a relation between the subject and the predicate. This relation is determined by the verb of the above clause. Accordingly, the grammar that represents the SUL is as shown in Fig. 3.

Fig. 1. Relation between formalization and OO model creation.

Fig. 2. Relation between the linguistic and the conceptual worlds.

As mentioned above, the DUL will represent system behavior. There are many linguistic structures that could express this type of information, ranging from punctuation marks, such as a comma, or two sentences separated by a full stop, to different types of conjunctions, such as therefore. But not all structures that contain a comma or all sentences separated by a full stop express dynamic behavior. It is therefore difficult to define objective criteria for locating this type of information in a text.

One of the most direct grammatical forms of expressing system behavior is the conditional subordinate clause. This is the type of clause that will be used in our approach. Accordingly, the grammar representing the DUL is as shown in Fig. 4.

2.2. Conceptual patterns (C)

Conceptual patterns are structures used to build conceptual models. As said above, in our approach two conceptual models are used: OM and BM. Fig. 5 shows the conceptual patterns chosen to build these models. For OM (see patterns c1 to c6), we will employ Rumbaugh's OMT object model notation (Rumbaugh, 1991). In the case of binary relations, we will employ an intermediate directional notation (patterns c3 to c5). For BM, we will employ Martin's behavior model (Martin and Odell, 1992), which represents the operations to be performed by the system and the events that will activate those operations. The conceptual pattern used to build this model is pattern c7, where the control condition specifies the logical condition to be satisfied by the events in order to activate the operation in question.

2.3. Getting the correspondence between L and C

Below we present the results obtained after applying the reasoning shown in Fig. 2 to the sets L and C. For a more exhaustive justification of this correspondence, see Moreno (1997).

(a) Get the intermediate representation of linguistic patterns in predicate logic, that is, translation K. The results of this translation are shown in Table 2. The logical formulas shown in this table represent the subset PL. First-order logic was used in the case of SUL linguistic patterns, that is, patterns l1 to l5, and propositional logic for the DUL pattern, pattern l6. In the former case, each nominal group and each complement is equivalent to a unary predicate (P_Nominal Group and P_Complement, respectively), and the verb is equivalent to an n-ary predicate (P_verb). In the second case, each simple clause is represented by means of a proposition (lp_sub_i, lp_main).

(b) Get the intermediate representation of the conceptual patterns in set theory, that is, translation C. Table 3 shows the results of this translation, and the subset ST in the right column. To get the representation of the OM conceptual patterns, patterns c1 to c6, each class has been represented as a set. Pattern c7 is represented by means of a function between two sets.

(c) Describe the equivalence R between PL and ST. This equivalence is shown in Table 4, where the top two rows are used for the static part, and the bottom row for the dynamic part. In the first case, P, Q, ... are sets that represent the predicates p(x), q(x), etc., where the elements of the sets are the tuples that make the above predicates true. In the latter case, the set theory representation represents a function between two sets {0,1}^p and {0,1} as a table, whose elements are the true or false values of the propositions q_1, q_2, ..., and r, respectively.

(d) Apply the equivalence R to get the correspondence between logical representations of the linguistic patterns and set theory representations of the conceptual patterns. This correspondence is as shown in Table 5, where: (1) names of classes are equivalent to the nucleus of the noun structure of the noun groups or complements; (2) names of relations are equivalent to the verb in the third person singular; and (3) subordinate_1, ..., subordinate_n represent any simple clause in a subordinate clause.
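The idea behind equivalence R in step (c) can be illustrated with a minimal Python sketch. It is not a reproduction of Table 4: the universe, the predicates and the choice of implication as the example are assumptions made here for illustration. A unary predicate is identified with the set of elements that make it true, and a propositional formula over p propositions is tabulated as a function from {0,1}^p to {0,1}.

```python
from itertools import product

def extension(predicate, universe):
    """Set of elements of `universe` for which the unary predicate is true."""
    return {x for x in universe if predicate(x)}

def truth_table(formula, arity):
    """Tabulate a propositional formula as a function {0,1}^arity -> {0,1}."""
    return {bits: int(formula(*bits)) for bits in product((0, 1), repeat=arity)}

# Hypothetical universe and predicates, invented for this illustration.
universe = {"emp1", "emp2", "acme"}
is_employee = lambda x: x.startswith("emp")
is_vendor = lambda x: True

employees = extension(is_employee, universe)
vendors = extension(is_vendor, universe)

# Static part: "for all x, employee(x) implies vendor(x)" holds exactly
# when the set Employees is a subset of the set Vendors.
implication_holds = all((not is_employee(x)) or is_vendor(x) for x in universe)
assert implication_holds == (employees <= vendors)

# Dynamic part: two subordinate propositions q1, q2 and a main proposition r,
# with r activated only when both q1 and q2 hold (an assumed control condition).
table = truth_table(lambda q1, q2: q1 and q2, arity=2)
```

The subset test and the quantified implication always agree on a finite universe, which is the kind of agreement between a logical and a set-theoretic representation that R is meant to capture.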

3. OOA model construction

This section presents the steps to be taken by analysts in order to create conceptual models from an arbitrary description of the problem to be solved. These steps are shown in Fig. 6. It also shows the requirements elicitation and utility language validation activities. These processes are not covered by the proposed method, but interact with its steps. All steps are iterative, that is, the analyst can go back to earlier steps. The requirements elicitation process provides the method input, which is a natural language description of the problem to be solved. The first five steps prepare the problem description for application of the formalization output, which will be used during steps 6 and 7. The other tasks in these steps, as well as steps 8 and 9, combine conceptual patterns to form the OM and the BM and will complete and refine the final models. The tasks to be performed in each step are described below.

3.1. Essential information extraction

During this activity, all relevant information will be extracted from the information supplied by the user. This information represents: (1) the explicit requirements, and (2) implicit information which is essential for understanding the explicit information. This information comes from several sources: linguistic structures of the specifications, the application domain or the general field of knowledge.

Fig. 3. Static utility language grammar.

3.2. Synonym and homonym identification

This step removes ambiguities from the specifications. In the case of synonyms, a single noun or noun group is taken. This should be as representative and meaningful as possible. In the case of homonyms, alternative names are taken to distinguish the intended meanings.

3.3. Separation of static and dynamic parts

A description of a problem must contain information concerning the static and dynamic components of the problem. Different procedures will be used to construct the OM and the BM from these two types of information. Therefore, the two types of information are separated. There is no formalized procedure for performing this task, as it will depend in each case on the document analyzed and on the reader's domain knowledge. Some guidelines can be given: descriptive information about the static part represents the structural properties of information to be processed, whereas information about system behavior specifies interactions and events that affect the information described in the static part.

As an example, specifications of a vehicle sales problem might contain the following information for the static part:

Vendors may be employees or companies. Employees receive a basic wage and a commission, whereas companies only receive a commission. Each order corresponds to one vendor only, and each vendor has made at least one order, which is identified by an order number. Several employees may be paid the same basic wage. Several employees and several companies may be paid the same commission.

And the following information for the dynamic part:

A monthly payment is made to all vendors. When a vendor makes a sale, he/she reports the order to the system. The system then confirms the order to the customer, and orders are delivered to customers weekly.

3.4. Static requirements structuring

Fig. 4. Dynamic utility language grammar.

Fig. 5. Conceptual patterns.

Table 2. Logical representation of linguistic patterns (translation K)

Table 3. Set theory representation of conceptual patterns (translation C)

During this stage, descriptive information about the static part is structured according to the SUL patterns shown in Fig. 3. Some guidelines are given for this transformation:

· Clauses of the type "there is X in Y" or "X exists in Y" are replaced with "Y has X".

· Pronouns that are not part of the verb structure are replaced with the noun or noun group that they represent.

· If a modifier refers to two nouns, the sentence is restructured to make the association clearer.

According to these guidelines, the static information presented in the above example would be represented as:

1. l1.1 Vendors may be sales employees or companies.

2. l2 Employees receive a basic wage and a commission.

3. l2 Companies (the adverb only is deleted) receive a commission.

4. l2 Each order corresponds to one vendor (the adverb only is deleted).

5. l2 Each vendor has made at least one order.

6. l4 An order is identified by an order number.

7. l2 One commission may be paid to several employees and several companies (several has been made explicit for company).

8. l2 One basic wage may be paid to several employees.
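The first restructuring guideline lends itself to a small illustration. The sketch below is not part of the method itself: the regular expressions and the function name `restructure` are assumptions introduced here, and they only handle the two literal clause shapes named in the guideline. Pronoun substitution and modifier disambiguation require linguistic analysis and are not attempted.

```python
import re

# Hypothetical patterns for the "there is X in Y" / "X exists in Y" guideline.
# Real specifications would need proper parsing rather than regular expressions.
THERE_IS = re.compile(r"^there (?:is|are) (?P<x>.+?) in (?P<y>.+?)\.?$", re.IGNORECASE)
EXISTS_IN = re.compile(r"^(?P<x>.+?) exists? in (?P<y>.+?)\.?$", re.IGNORECASE)

def restructure(clause: str) -> str:
    """Rewrite an existence clause as 'Y has X'; return other clauses unchanged."""
    text = clause.strip()
    for pattern in (THERE_IS, EXISTS_IN):
        match = pattern.match(text)
        if match:
            x, y = match.group("x"), match.group("y")
            # Capitalize the new subject and lower-case the start of the complement.
            return f"{y[0].upper()}{y[1:]} has {x[0].lower()}{x[1:]}."
    return clause

result_there_is = restructure("There is an order number in each order.")
result_exists = restructure("A commission exists in each sale")
result_other = restructure("Employees receive a basic wage.")
```

Clauses that already follow an SUL pattern, such as the third one, pass through untouched, so the function can be applied to every sentence of the static part without special-casing.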

3.5. Dynamic requirements structuring

A similar process is applied to the information that represents the dynamic part of the problem. In this case, this information is studied independently for each use case (Jacobson, 1992), and the description of each one is structured according to the DUL linguistic pattern shown in Fig. 4. Guidelines for performing this activity include:

· Number the clauses of each use case as: use case number, number of clause that describes use case.

· Substitute pronouns with the noun groups they represent.

· Transform adverbs or adverbial expressions of time, such as "monthly", "daily", etc., using the conjunction if, for example, "if the month ends".

· Switch the verbs in both the subordinate clause and the main clause to the active voice.

Here is an example of one use case generated by applying the above guidelines to the vehicle sales problem:

1. Order Management

1.1. If and only if a vendor makes a sale, then the vendor reports the order to the system.

1.2. If and only if a vendor reports the order to the system, then the system confirms the order to the customer.

1.3. If and only if the system confirms the order to the customer and the week ends, then the company delivers the order to the customer.
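The structured clauses above can be held in a simple data structure. The following Python sketch is an illustration under stated assumptions: the class `DulClause`, the helper `add_time_condition` and the adverb table are names invented here, and only the "monthly" to "the month ends" rewriting is suggested by the guidelines, the other entries being extrapolations.

```python
from dataclasses import dataclass

# Hypothetical mapping from adverbs of time to explicit conditions.
TIME_CONDITIONS = {
    "monthly": "the month ends",
    "weekly": "the week ends",
    "daily": "the day ends",
}

@dataclass(frozen=True)
class DulClause:
    use_case: int       # use case number
    number: int         # number of the clause within the use case
    conditions: tuple   # subordinate clauses, in the active voice
    main: str           # main clause, in the active voice

    def render(self) -> str:
        condition = " and ".join(self.conditions)
        return f"{self.use_case}.{self.number}. If and only if {condition}, then {self.main}."

def add_time_condition(clause: DulClause, adverb: str) -> DulClause:
    """Replace an adverb of time by an explicit subordinate condition."""
    extra = TIME_CONDITIONS[adverb]
    return DulClause(clause.use_case, clause.number, clause.conditions + (extra,), clause.main)

base = DulClause(
    use_case=1,
    number=3,
    conditions=("the system confirms the order to the customer",),
    main="the company delivers the order to the customer",
)
clause_1_3 = add_time_condition(base, "weekly")
rendered = clause_1_3.render()
```

Keeping the subordinate clauses as a tuple mirrors the DUL pattern, in which several conditions jointly activate one main clause.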

The conceptual models are derived from the output of Steps 4 and 5. It would therefore be good practice for users to validate this information to ensure that it represents what they have in mind. In this manner, part of the validation process can be completed before building the conceptual models, thus saving re-work time and effort. This would be done during the utility language validation process.

Table 4. Equivalence between logic and set theory (equivalence R)

Table 5. Correspondence between linguistic and conceptual patterns

3.6. Object model construction

The OM is created using the linguistic structures represented in the SUL. This involves performance of the tasks detailed below.

3.6.1. Identification of classes and relations

Some results of the formalization process will be applied here, in particular the correspondence between SUL linguistic patterns and OM conceptual patterns. For each linguistic structure in this language, the procedure is as follows:

1. Identify the modeling pattern for the linguistic pattern of each structure analyzed, as shown in Table 5.

2. For relations based on linguistic patterns of types l2 and l5, the relation is labeled using two components. The first is the name of the relation. The second is the number of the source clause, followed by the linguistic pattern identifier. If there is another relation in the model with the same name, add a consecutive number to the name of the structure.

3. For relations based on linguistic patterns of types l1, l3 and l4, the label of the relation is composed of the number of the source clause and the pattern identifier only.

4. For relations based on patterns of types l2, l3 and l5, check whether there is another relation of any of these types among the entities concerned which has the same semantics. If so, add the numerical part of the label of the other relation to the second part of the label of each relation, separated by a dot. This new part of the label indicates the original clause of the semantically equivalent relation.

Fig. 7 shows the output of this task, as applied to the vehicle sales problem.
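Labeling rules 2 and 3 above can be sketched as a small function. This is a hedged illustration only: the function name `relation_label`, the bracketed label layout and the treatment of a pattern such as l1.1 as a variant of l1 are assumptions made here, since the exact notation used in Fig. 7 is not reproduced in the text. Rule 4, which depends on a semantic judgment, is not covered.

```python
NAMED = {"l2", "l5"}          # rule 2: the label carries the relation name
UNNAMED = {"l1", "l3", "l4"}  # rule 3: the label carries only clause and pattern

def relation_label(clause_number, pattern, name=None, used_names=()):
    """Build a relation label from its source clause and linguistic pattern."""
    base = pattern.split(".")[0]          # assumed: "l1.1" is a variant of "l1"
    source = f"{clause_number}{pattern}"  # clause number followed by pattern identifier
    if base in UNNAMED:
        return source
    if base not in NAMED or not name:
        raise ValueError(f"cannot label pattern {pattern!r} without a relation name")
    # Rule 2: if the name is already used in the model, add a consecutive number.
    unique, suffix = name, 1
    while unique in used_names:
        suffix += 1
        unique = f"{name}{suffix}"
    return f"{unique} [{source}]"

first = relation_label(2, "l2", "receives")
second = relation_label(3, "l2", "receives", used_names={"receives"})
inheritance = relation_label(1, "l1.1")
```

Carrying the source clause in every label keeps each relation traceable to the sentence that produced it, which is what later tasks rely on when they merge or lift relations.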

3.6.2. Determination of cardinalities

Cardinalities complement the information supplied by classes and relations. However, they are not considered an essential component and are therefore not included in the formalization discussed in Section 2. Cardinalities are taken from the determiners that are part of the noun groups and complements and from the linguistic patterns. The method provides several guidelines for obtaining cardinalities, detailed in Moreno (1997). First, the individual cardinalities of each relation are identified and then any relations with the same semantics are unified in a single non-directional relation.

Fig. 6. OOA method steps.

Fig. 7. Classes and relations.

The output of this task applied to Fig. 7 is shown in Fig. 8.

3.6.3. Reexamination of inheritance

This task increases the sharing of common characteristics. It involves analyzing whether all the subclasses in an inheritance hierarchy have a relation with identical semantics with another class of the model. If so, these relations can be replaced by a single relation between the superclass and that class. This has been applied to the classes "Sales employee", "Company" and "Commission" in Fig. 8 to produce Fig. 9.
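As a rough illustration of this task, the following Python sketch lifts a relation to the superclass when every subclass holds it. The representation is an assumption: relations are simplified to `(class, semantics, other_class)` triples, and `lift_common_relations` and the example hierarchy are hypothetical names, not part of the method or of Fig. 8.

```python
def lift_common_relations(subclasses_of, relations):
    """Replace relations shared by all subclasses with one superclass relation.

    subclasses_of: dict mapping a superclass to the list of its subclasses.
    relations: set of (class, semantics, other_class) triples.
    """
    result = set(relations)
    for superclass, subs in subclasses_of.items():
        if not subs:
            continue
        # Only relations held by the first subclass can be common to all of them.
        candidates = {(sem, other) for (cls, sem, other) in relations if cls == subs[0]}
        for sem, other in candidates:
            if all((sub, sem, other) in relations for sub in subs):
                for sub in subs:
                    result.discard((sub, sem, other))
                result.add((superclass, sem, other))
    return result


# Hypothetical hierarchy and relations, invented for this sketch.
hierarchy = {"Employee": ["Sales employee", "Manager"]}
model = {
    ("Sales employee", "works for", "Company"),
    ("Manager", "works for", "Company"),
    ("Manager", "approves", "Order"),
}
optimized = lift_common_relations(hierarchy, model)
```

Only the "works for" relation is lifted here, because "approves" is held by a single subclass.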

3.6.4. Attribute selection

Thus far, all we have are classes and relations. In this task, we determine attributes. This involves studying the relations of aggregation and association in our model. If any class participates in one relation only, this class is transformed into an attribute of the other class. For more details about n-ary relations, see Moreno (1997). If this task is applied to Fig. 9, it outputs Fig. 10.
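The attribute selection rule for binary relations can be sketched as follows. This is an illustration under stated assumptions: relations are `(kind, class_a, class_b)` triples, a class's participation is counted over relations of every kind, and when both ends of a relation participate only once the second one is absorbed, which is an arbitrary tie-break. The function name and the example model are hypothetical, and n-ary relations are not handled.

```python
from collections import Counter


def select_attributes(classes, relations):
    """Turn classes that take part in a single relation into attributes.

    Returns (attributes, remaining): attributes maps each surviving class to
    the list of classes absorbed as its attributes; remaining is the list of
    relations left in the model.
    """
    participation = Counter()
    for _kind, a, b in relations:
        participation[a] += 1
        participation[b] += 1

    attributes = {cls: [] for cls in classes}
    remaining = []
    for kind, a, b in relations:
        if kind in ("association", "aggregation"):
            if participation[b] == 1:
                attributes[a].append(b)
                continue
            if participation[a] == 1:
                attributes[b].append(a)
                continue
        remaining.append((kind, a, b))

    absorbed = {attr for attrs in attributes.values() for attr in attrs}
    kept = {cls: attrs for cls, attrs in attributes.items() if cls not in absorbed}
    return kept, remaining


# Hypothetical model, invented for this sketch.
model_classes = ["Order", "Customer", "Date", "Vendor"]
model_relations = [
    ("association", "Order", "Customer"),
    ("association", "Order", "Date"),
    ("association", "Customer", "Vendor"),
    ("association", "Vendor", "Order"),
]
kept, remaining = select_attributes(model_classes, model_relations)
```

Here "Date" takes part in one relation only, so it becomes an attribute of "Order" and its relation disappears from the model.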

3.6.5. OM verification

During this task, the model is examined for class, attribute and relation completeness. The model is complete if it does not contain directional relations. If there are directional relations, this means that information is missing, and the requirements elicitation process would have to be repeated to locate this information. In this case, Fig. 10 is the verified model. The next step adds the class operations to the model.
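The completeness criterion builds on the unification of semantically equivalent relations described in Section 3.6.2. A minimal sketch of both checks is given below. The `Directed` record, the reading of `cardinality` as the cardinality at the target end, and the use of an `equivalent_clause` field to pair relations are assumptions made for illustration; the actual guidelines are those detailed in Moreno (1997).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Directed:
    clause: int
    source: str
    target: str
    cardinality: str                         # assumed: cardinality at the target end
    equivalent_clause: Optional[int] = None  # clause of the equivalent relation, if any


def unify(relations):
    """Pair mutually equivalent directional relations into non-directional ones.

    Returns (unified, pending). Each unified entry is a tuple
    (class_a, cardinality_at_a, class_b, cardinality_at_b); pending holds the
    directional relations that found no partner.
    """
    by_clause = {rel.clause: rel for rel in relations}
    unified, pending, seen = [], [], set()
    for rel in relations:
        if rel.clause in seen:
            continue
        partner = by_clause.get(rel.equivalent_clause)
        if (partner is not None
                and partner.equivalent_clause == rel.clause
                and partner.source == rel.target
                and partner.target == rel.source):
            unified.append((rel.source, partner.cardinality, rel.target, rel.cardinality))
            seen.update({rel.clause, partner.clause})
        else:
            pending.append(rel)
    return unified, pending


def is_complete(pending):
    """The model is complete when no directional relation is left."""
    return not pending


# Hypothetical relations, invented for this sketch.
directed = [
    Directed(1, "Vendor", "Order", "0..*", equivalent_clause=3),
    Directed(3, "Order", "Vendor", "1", equivalent_clause=1),
    Directed(5, "Order", "Customer", "1"),
]
unified, pending = unify(directed)
```

The relation from clause 5 stays directional, so under this sketch the model would be reported as incomplete and the missing information would have to be elicited.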

3.7. BM construction

The BM is constructed from the DUL linguistic patterns. The two tasks to be carried out are detailed below.

3.7.1. Identification of events, operations and control conditions

For each DUL linguistic structure, which describes a use case, the following actions are performed:

1. Check that the main clause represents a basic operation (creation, deletion, change, classification, declassification and information) on system objects. If the clause does not represent a basic operation, decompose it into further clauses, according to DUL constraints, until each one generates a basic operation.

2. Represent the model in terms of events and operations, applying the results of the formalization process (Table 5), adding the number of each structure to the name of each source operation.

3. Determine the control conditions for activating each operation. A control condition is the logical expression met by the subordinate clause. It corresponds to the events that fire each operation.

4. Unify all conceptual patterns in a single diagram.

The resultant diagram for the vehicle sales example is shown in Fig. 11.

3.7.2. BM verification

Now we will check that all operations for the same use case (operations starting with the same number) are related by means of an event. If this is not the case, some information was not made explicit in Step 5, and the earlier steps should be repeated before proceeding any further.

Fig. 8. Relation cardinalities.

Fig. 9. Inheritance optimization.

Fig. 10. Object model with attributes.
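One way to read the BM verification check is as a connectivity test: within each use case, every operation should be reachable from any other by following events. The Python sketch below implements that reading. It is an interpretation, not the method's definition; the function name, the data layout, the use case numbers and the events in the example are all invented for illustration, although the operation names echo Table 6.

```python
from collections import defaultdict


def unconnected_use_cases(operations, events):
    """Return the use case numbers whose operations are not all linked by events.

    operations: list of (use_case_number, operation_name) pairs.
    events: list of (operation_name, operation_name) pairs joined by an event.
    """
    neighbours = defaultdict(set)
    for a, b in events:
        neighbours[a].add(b)
        neighbours[b].add(a)

    by_use_case = defaultdict(list)
    for use_case, name in operations:
        by_use_case[use_case].append(name)

    failing = []
    for use_case, names in sorted(by_use_case.items()):
        members = set(names)
        # Traverse events starting from the first operation of the use case.
        reached, frontier = {names[0]}, [names[0]]
        while frontier:
            current = frontier.pop()
            for nxt in neighbours[current] & members:
                if nxt not in reached:
                    reached.add(nxt)
                    frontier.append(nxt)
        if reached != members:
            failing.append(use_case)
    return failing


# Hypothetical use case numbers and events, invented for this sketch.
ops = [
    (1, "report order to system"),
    (1, "confirm order to customer"),
    (2, "send order to customer"),
    (2, "pay vendor"),
]
evs = [("report order to system", "confirm order to customer")]
failing = unconnected_use_cases(ops, evs)
```

Use case 2 is reported because no event links its two operations, which in the method signals that the earlier steps need to be revisited.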

3.8. OM and BM integration

In this stage, the OM will be completed by adding the operations identified in the BM. For this purpose, we propose construction of a matrix as shown in Table 6, whose rows specify the classes of the OM and whose columns indicate BM internal operations. For each operation, we will mark the position corresponding to the class on which the method acts, that is, on which a basic operation is performed. The OM will then be completed by adding the appropriate operations, as shown in Fig. 12.
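The classes and methods matrix can be sketched as a nested dictionary. The function names, the `acts_on` mapping and the example classes and operations below are hypothetical and are not the contents of Table 6.

```python
def build_matrix(classes, acts_on):
    """Build the matrix: rows are OM classes, columns are BM operations.

    acts_on maps each operation name to the set of classes on which it
    performs a basic operation; a True cell corresponds to a mark.
    """
    return {
        cls: {op: cls in targets for op, targets in acts_on.items()}
        for cls in classes
    }


def operations_by_class(matrix):
    """List, for each class, the operations to add to it in the OM."""
    return {
        cls: [op for op, marked in row.items() if marked]
        for cls, row in matrix.items()
    }


# Hypothetical classes and operations, invented for this sketch.
matrix = build_matrix(
    ["Invoice", "Customer"],
    {
        "create invoice": {"Invoice"},
        "notify customer": {"Customer", "Invoice"},
    },
)
assigned = operations_by_class(matrix)
```

Reading the matrix row by row gives the operations that each class receives when the OM is completed.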

3.9. OM and BM verification

This task consists of a joint verification of both conceptual models. This involves checking that there is a relation in the OM between the classes whose operations are interconnected in the BM by means of events.
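The joint check can be sketched as a lookup over the events of the BM. The sketch assumes that each operation belongs to one class, that OM relations are stored as unordered pairs, and that two operations of the same class need no relation; these simplifications, the function name and the example data are all hypothetical.

```python
def events_without_relation(events, owner, related):
    """Return the events whose operations belong to unrelated classes.

    events: list of (operation_a, operation_b) pairs connected in the BM.
    owner: dict mapping each operation to the class that holds it in the OM.
    related: set of frozenset({class_a, class_b}) pairs related in the OM.
    """
    problems = []
    for op_a, op_b in events:
        cls_a, cls_b = owner[op_a], owner[op_b]
        if cls_a != cls_b and frozenset((cls_a, cls_b)) not in related:
            problems.append((op_a, op_b))
    return problems


# Hypothetical models, invented for this sketch.
owner = {
    "create invoice": "Invoice",
    "notify customer": "Customer",
    "archive report": "Report",
}
related = {frozenset(("Invoice", "Customer"))}
bm_events = [
    ("create invoice", "notify customer"),
    ("notify customer", "archive report"),
]
problems = events_without_relation(bm_events, owner, related)
```

An empty result means that every event in the BM is backed by a relation in the OM; any reported event points at a relation that may be missing from the OM.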

4. Results of applying this approach

With a view to refining our approach, we worked with a group of final-year degree students at the School of Computer Science at the Polytechnic University of Madrid. Some of them were taught the proposed approach and others the standard OMT (Rumbaugh, 1991) approach. In this manner, we sought to achieve a second objective: to compare how good the two methods were when applied by people with no experience in OO to build conceptual models.

Some of the conclusions drawn from this experiment are discussed below.

· Analysts working with our approach thought about the problem more instead of directly setting about creating models, a common mistake made by inexpert analysts. This implies better study and understanding of the problem under analysis, a task that is essential in performing a good analysis.

· Analysts working with our approach spent more time on the pre-modeling phase, that is, on creating the utility languages. Analysts working with OMT spent a lot of time discussing which elements should form part of conceptual models, which led to incorrect modeling in some cases. This can be attributed to the absence of strict criteria for identifying elements of the conceptual models. Using our approach, 65% of the time was spent on translating requirements to the utility languages, while 35% was spent on constructing conceptual models. Using OMT, 85% was spent on constructing conceptual models, and 15% was spent on gaining an understanding of the problem before moving on to conceptual modeling.

· Our approach prevented some incorrect modeling constructions, because it encapsulates OO concepts, which novice analysts may not fully understand or may be unfamiliar with, as linguistic concepts that analysts know and generally apply. In this respect, analysts working with OMT developed incorrect conceptual structures because they misunderstood some OO concepts. This did not occur using our approach, as analysts mainly worked with linguistic concepts which they are used to.

· The repeatability of conceptual modeling is higher. In particular, the models developed using our approach were all quite similar, while the models obtained using OMT were quite diverse.

Fig. 11. Behavior model.

Fig. 12. Final object model.

Table 6

Classes and methods matrix

Class Method

Report order to system Confirm order to customer Send order to customer Pay vendor

Vendor x

Sales employee x

Company x

Order x x x

5. Conclusions

This paper presented an approach for formalizing problem analysis. The approach uses relations between components of natural language, representing the characteristics of the problem to be solved, and the elements of the conceptual models, which are an abstraction of the universe of discourse of the above problem.

Formalization and OOA model construction provide the basis for defining a formalized analysis process, as specified by Sutcliffe (1997). In this manner, the formalization provides formal criteria for identifying the elements of conceptual models, and the OOA model construction provides a set of defined activities which guide analysts in conceptual modeling.

Briefly, this paper proposed an original and comprehensive means of conducting formal analysis by:

· Selecting conceptual models that adequately represent the problem to be solved and its solution, capturing the information representative of a problem at the level of conceptual abstraction.

· Defining precise and specific guidelines which enable software engineers to build conceptual models in a rigorous manner, thus eliminating the randomness and imprecision reflected by alternative analysis methods.

· Formally using a linguistic approach to output adequate conceptual models from all possible models that could empirically represent a given problem. The choice of these models was based on the use of the mathematical world as a catalyst for converting the linguistic world into the conceptual world.

· Facilitating the validation process. Users are not usually familiar with conceptual models, whereas developers employ these models to represent the problem to be solved. The use of the linguistic approach enables users to directly validate the utility languages, composed of sentences in natural language with which they are familiar, rather than directly validating the conceptual models with which they are unfamiliar. Correct application of the results of the formalization and of the OOA model construction steps enables conceptual models to be obtained in a satisfactory manner.

This provided a formal, systematic and disciplined analysis process, which laid a solid foundation for solving the accidental difficulties arising in this activity and addressing the management of essential difficulties.

The method's limitations are related to the fact that it is difficult to get a concise and coherent problem description. However, this description is continually refined and completed during method application, in line with one of the principles of Software Engineering proposed by Davis: "Increase, never substitute, natural language" (Davis, 1995).

With regard to future work, we intend to develop a tool to automate the OOA model construction, that is, a tool to guide analysts constructing conceptual models of a problem. This involves inputting the results of the formalization into the tool. The tool would be most useful during Steps 6, 7, 8 and 9 of the method, which is when the results of the formalization process are applied and when conceptual models of the problem are built. These steps are easy to automate. The tool should also provide support for the earlier steps of the method, that is, Steps 1 to 5. These steps require significant intervention on the part of analysts, for which the method provides the necessary criteria. However, they are not automatic processes.

Acknowledgements

This paper was written while A.M. Moreno was working at the Center for Software Systems Engineering at the University of Colorado at Colorado Springs (UCCS). We would like to thank Al Davis, director of the above center, for giving us the opportunity to write this paper and for his comments and suggestions.

References

Abbot, R., 1983. Program design by informal english description. Communications of the ACM 16 (11), 882–894.

Basili, V.R., Briand, L.C., Malo, W.L., 1996. How reuse influences productivity in object oriented systems. Communications of the ACM 39 (10), 104–116.

Block, C.H., MacMillan, M.R., Martin, J.H., Monarchi, D., 1993. A prototype system for extracting objects and relationships for software specifications. The Journal of Knowledge Engineering, pp. 70–78.

Booch, G., 1986. Object oriented development. IEEE Transactions on Software Engineering 12 (2), 211–221.

Brooks, F., 1987. No silver bullet: Essence and accidents of software engineering. Computer, pp. 10–19.

Burg, J.F.M., 1997. Linguistic instruments in requirements engineering, Ph.D Thesis, Vrije Universiteit, Amsterdam.

Coleman, D., Arnold, P., Bodoff, S., Dollin, C., 1994. Object-oriented Development: The Fusion Method, Prentice-Hall, Englewood Cliffs, New Jersey.

Davis, A.M., 1995. 201 Principles of Software Development, McGraw-Hill, New York.

Faulk, S.R., 1997. Software requirements: A tutorial. In: Software Engineering, IEEE Computer Society Press, Los Alamitos, pp. 82–101.

Frederiks, P.J.H., Kister, C.H.A., van de Weide, Th.P., 1995. Object oriented analysis using informal language, Technical Report, Computer Science Institute – R9516, Faculty of Mathematics and Informatics, Catholic University of Nijmegen, Nijmegen.

Iivari, J., 1995. Object-orientation as structural, functional and behavioral modeling: A comparison of six methods for object-oriented analysis. Information and Software Technology 38, 155–163.

Jacobson, I., 1992. Object-oriented Software Engineering: A Use Case Approach, Addison-Wesley, Wokingham.

Jalote, P., 1989. Functional refinement and nested objects for OO design. IEEE Transactions on Software Engineering 15 (3), 264–270.

Kristen, G., 1994. Object orientation: The kiss-method from information architecture to information system, Addison-Wesley, Reading, MA.

Martin, J., Odell, J., 1992. Object-oriented analysis and design, Prentice-Hall, Englewood Cliffs, NJ.

Moreno, A.M., 1997. A conceptual modeling formal method for software systems. Ph.D Thesis, Universidad Politécnica de Madrid, Madrid.

NCC Blackwell Ltd., 1990. SSADM, v. 4, Reference Manual. Oxford.

Northrop, M., 1997. Object-oriented development. In: Software engineering, IEEE Computer Society Press, Los Alamitos, pp. 148–159.

Rolland, C., Proix, C., 1992. A natural language approach for requirements engineering process. In: Proceedings of the 4th International Conference on Advanced Information Systems Engineering, Manchester, pp. 257–277.

Rumbaugh, J., et al., 1991. Object-oriented modeling technique, Prentice-Hall, New Jersey.

Saeki, M., Horai, H., Henomoto, H., 1989. Software development process from natural language specification. In: Proceedings of the 11th International Conference on Software Engineering, New York, pp. 64–73.

Sutcliffe, A.G., 1997. Object-oriented systems development: survey of structure methods. In: Software Engineering, IEEE Computer Society Press, Los Alamitos, pp. 160–169.

Wang, S., 1997. A synthesis of natural language, semantic network and objects for business process modeling. Canadian Journal of Administrative Sciences 14 (1), 79–92.

Dr. Natalia Juristo is a full-time professor of computer science at the Universidad Politecnica de Madrid, Spain. She is coordinator of the Software Engineering Department and director of master's courses in knowledge engineering and software engineering. Juristo has written two books and has published several articles. She is an editorial board member for IEEE Software and the International Journal on Software Engineering and Knowledge Engineering. In 1997 Juristo served as program chair of the Ninth International Conference on Software Engineering and Knowledge Engineering. She was a fellow of CERN in Switzerland, a member of the staff of the European Space Agency in Italy, and resident affiliate of the Software Engineering Institute at the Carnegie Mellon University. She is listed in Who's Who in Science and Engineering. She is a senior member of the IEEE Computer Society and member of the ACM, AAAS and NYAS.

Dr. José L. Morant is a full-time professor of computer science at the Universidad Politecnica de Madrid, Spain. He is the Dean of the school of computer science at the above university. Until 1985 Dr. Morant worked at ITT-Alcatel on the design and installation of communication applications. After joining the university, he headed the teleinformatic and information security laboratories at the above school. He has written a book about information security and has participated in several projects about the definition of security criteria in communications sponsored by the European Community. Dr. Morant is a member of IEEE and head of AEMES (Spanish Society of Software Metrics).

Dr. Ana M. Moreno is assistant professor with the Computer Science School at the Universidad Politecnica de Madrid. She teaches undergraduate and master's courses on Software Engineering. The work reported in this paper is part of Moreno's Ph.D Thesis, which was developed under a research grant of the Spanish Science Foundation. Moreno has been a visiting scholar at the Vrije University (The Netherlands) and at the University of Colorado at Colorado Springs (USA). Dr. Moreno has presented seminars on Object Oriented Development and Project Management sponsored by the European Community and private companies.
