Representing Software Engineering Knowledge


Automated Software Engineering KL426-Mylopoulos April 7, 1997 14:16

Automated Software Engineering 4, 291–317 (1997). © 1997 Kluwer Academic Publishers. Manufactured in The Netherlands.


JOHN MYLOPOULOS ([email protected]), Department of Computer Science, University of Toronto, Toronto, Canada M5S 3H5

ALEX BORGIDA ([email protected]), Department of Computer Science, Rutgers University, New Brunswick, NJ 08903, USA

ERIC YU ([email protected]), Faculty of Information Studies, University of Toronto, Toronto, Canada M5S 3G6

Abstract. We argue that one important role that Artificial Intelligence can play in Software Engineering is to act as a source of ideas about representing knowledge that can improve the state-of-the-art in software information management, rather than just building intelligent computer assistants. Among others, such techniques can lead to new approaches for capturing, recording, organizing, and retrieving knowledge about a software system. Moreover, this knowledge can be stored in a software knowledge base, which serves as “corporate memory”, facilitating the work of developers, maintainers and users alike. We pursue this central theme by focusing on requirements engineering knowledge, illustrating it with ideas originally reported in (Greenspan et al., 1982; Borgida et al., 1993; Yu, 1993) and (Chung, 1993b). The first example concerns the language RML, designed on a foundation of ideas from frame- and logic-based knowledge representation schemes, to offer a novel (at least for its time) formal requirements modeling language. The second contribution adapts solutions of the frame problem originally proposed in the context of AI planning in order to offer a better formulation of the notion of state change caused by an activity, which appears in most formal requirements modeling languages. The final contribution imports ideas from multi-agent planning systems to propose a novel ontology for capturing organizational intentions in requirements modeling. In each case we examine alterations that have been made to knowledge representation ideas in order to adapt them for Software Engineering use.

Keywords: knowledge representation, software knowledge bases, languages

1. Introduction

“...The ultimate goal of artificial intelligence applied to software engineering is automatic programming...” (Rich and Waters, 1986)

The role of Artificial Intelligence (hereafter AI) in Software Engineering (SE) has traditionally been one of offering concepts, tools and techniques for building intelligent systems which can perform, or assist in the performance of, software engineering tasks. Examples of such systems can be found as far back as Cordell Green’s seminal contributions on the application of theorem proving to automatic program generation (Green, 1969), the long-term and influential Programmer’s Apprentice project at MIT (Rich and Waters, 1988), or Douglas Smith’s impressive work on the synthesis of divide-and-conquer algorithms (Smith, 1985). This type of research has seen applications of theorem proving, natural language and knowledge-based systems techniques, among others, to tasks such as programming and program verification, transformations of informal program specifications to formal ones, or the development of “intelligent assistants”. An influential statement on this direction of research can be found in the Report on a Knowledge-Based Software Assistant (Green et al., 1983), and the area has been surveyed several times over the years (notably, in (Green et al., 1983; Mostow, 1985; Rich and Waters, 1986; Barstow, 1987) and (Lowry and Duran, 1989)). In short, this line of research has had impressive successes, has influenced the SE community at large and characterizes much of today’s research on AI&SE, as exemplified by the programmes of KBSE conferences through the years.

This paper focuses on another tangible and important contribution AI can make to SE, even if it promises no “intelligent” systems or tools whatsoever. This contribution lies in the adoption and adaptation of knowledge representation techniques in order to capture, record, organize and retrieve software engineering knowledge. A knowledge base built on that basis can be used as “corporate memory” (Lowry and Duran, 1989) for a software system, facilitating the work of developers, maintainers and users. After all, software engineers spend considerable time trying to understand software systems (Soloway et al., 1988; Corbi, 1989; Devanbu, 1994). Moreover, much of this effort is dedicated to recovering unrecorded knowledge. Such knowledge includes (but is not limited to):

Domain knowledge—e.g., patients, nurses, treatments, admissions for a hospital registration system;

Requirements knowledge—what is the system intended for?... what functions will it perform?... what information will it handle?

Design knowledge—system architecture, conventions, what does each component do?

Implementation knowledge—implementation details;

Programming knowledge—about the programming language, data structures and algorithms used;

Quality factors—what expectations did the customer have with regard to performance, reliability, portability...? How have these influenced the design and implementation of the system?

Design rationale—why were decisions made?... how do they relate implementation with design, design with requirements?

Historical knowledge—who built and who maintained the system?... what were their software engineering habits, strengths and weaknesses?

Of course, software knowledge capture and representation is necessary for building intelligent tools as well. The difference between this “software knowledge management” perspective and the “intelligent assistants” one that prevails in AI&SE research today is that the knowledge captured is to be used primarily by people in the former case, rather than by a (knowledge-based) system in the latter case. This distinction has profound consequences for the nature of the representations used and the scope and coverage of the knowledge bases built. In particular, AI’s contribution to the capture and representation of software knowledge can be in areas such as:




New concepts (ontologies)—for representing some of this knowledge;

New notations/languages—for specifying software at some level of abstraction (e.g., requirements or design), or for representing some of the other types of software knowledge listed above;

Better semantics—for existing requirements or design languages;

Software knowledge repositories—which capture useful knowledge about a particular software system throughout its lifetime, organized for easy human access and supporting basic retrieval mechanisms to facilitate its use.

For the development of intelligent assistants, on the other hand, emphasis rests with the characterization of software engineering tasks (e.g., requirements acquisition, algorithm design), the identification of relevant formal and heuristic knowledge, and the construction of systems that can adequately perform these tasks.

The rest of the paper expands on this central theme of using knowledge representation ideas to improve software knowledge management. The next three sections review three contributions of knowledge representation research to SE originally reported in (Greenspan et al., 1982; Borgida et al., 1993; Yu, 1993) and (Chung, 1993b). The first involves the language RML, designed on a foundation of ideas from frame- and logic-based knowledge representation schemes to offer a novel (at least for its time) formal requirements modeling language. The second contribution adapts solutions of the frame problem originally proposed in the context of AI planning in order to offer a better formulation of the notion of state change caused by an activity, which appears in most formal requirements modeling languages in one form or another. The final contribution imports ideas from multi-agent planning systems to propose a novel ontology for capturing organizational intentions in requirements modeling.

It is important to note that, although this paper describes three lines of research in some detail, its purpose is not to present or review these works per se. Rather, the presentation in Sections 2 to 4 is offered as a set of concrete examples supporting the thesis of this paper, namely, the distinguishing features and potential benefits of the “software knowledge management” approach to bringing AI techniques to bear on software engineering.

2. Requirements modeling in RML

Language design has been a central theme in Software Engineering research throughout its history. Programming languages, for example, and their associated methodologies have contributed greatly to increased programmer productivity. The emergence of formal requirements modeling languages was the logical next step in providing linguistic, methodological and tool support for the early phases of the software lifecycle, for very much the reasons first articulated in (Bell, 1975). However, the subject matter of programming and specification languages is software systems, that is, objects that are man-made, bounded and objectively known; the subject matter of requirements is not. A corollary of this is that designers of requirements modeling languages need to turn to research in areas other than core computer systems and programming languages in search of ideas and research results that offer an intellectual foundation for their designs. To put it another way, it is unwise to try to design requirements modeling languages by merely adopting programming language ideas. This section reviews some of the premises and features of the Requirements Modeling Language (hereafter RML) first proposed in (Greenspan et al., 1982).

While developing the requirements for a software system, the requirements engineer needs to understand the application domain, including the organizational environment within which the proposed system will eventually function. It has been argued as far back as the mid-’70s (Ross and Schoman, 1977b) that it is imperative to capture explicitly as much of this understanding as possible, in order to support communication between the various “stakeholders” (customers, users, developers, testers) of a software development project; this is, after all, the primary function of the requirements document. An explicit model is also useful in supporting continuity in the face of inevitable staff turnover and other organizational change. However, a model of a social organization or of the natural world is not likely to be prescriptive: natural kinds and notions like ‘consumer satisfaction’ have no definitions in terms of necessary and sufficient conditions. Hence we view requirements not as specifications¹ but as models of the application domain. Moreover, these models have to be structured in ways that are consistent with cognitive principles of mental models and memory structure (e.g., (Norman, 1988)), since they are to be used (and hence understood) by people. Requirements engineering activities are defined as model construction, management and analysis tasks. The case for world modeling is articulated eloquently by Jackson (1978, 1983), whose methodology starts with the development of a “model of reality with which [the system] is concerned”, prior to system design.

The logical conclusion of these observations is that we should be developing conceptual models, expressed in terms of symbols which denote (concrete or abstract) entities, activities, and other phenomena in the world. Moreover, symbols are structured and organized according to principles of conceptual organization, such as “classes and instances,” “parts and wholes,” and “specializations and generalizations.” Others, including (Bubenko, 1980) and (Solvberg, 1979), also advocated conceptual modeling for requirements, or built requirements languages on top of knowledge representation substrata (e.g., GIST (Balzer et al., 1982)). The issue of conceptual modeling was considered by a number of participants at the 1980 Pingree Park Workshop (Brodie, 1981, 1984), particularly by researchers working on data modeling for databases. Objects (with intrinsic identity) form the pearl-seeds around which knowledge about the domain is grouped. Of course, the field of Knowledge Representation has a long-standing involvement with this subject matter, and has served for us as a primary source of ideas.

The above principles defined a foundation for the RML proposal presented originally in (Greenspan et al., 1982) and subsequently in more detail in (Greenspan, 1984; Greenspan et al., 1986). Implicit in this was the notion that some kind of formal language would be used to express requirements models. The advantage of any such formalism is that descriptions which adopt it can be assigned a well-defined semantics using formal logic. The advantages of clear semantics include adjudicating among different interpretations of a given model, and offering a basis for various ways of reasoning with models, either through consistency checking (the foundation of useful tools) or by supporting question-answering. Of course, the appeal and usability of some techniques may be largely due to their relative simplicity and flexibility derived from informality. We note that the use of a formal requirements modeling language does not preclude the concurrent use of informal notations. In fact, the original RML proposal envisioned early use of an informal notation, such as SADT, and a transformation process from an informal SADT model into a formal RML one.²

RML views a model as consisting of objects of various kinds: individuals, or tokens, grouped into classes, which are in turn instances of metaclasses. Classes and metaclasses can have definitional properties, which specify what kinds of information can be associated with their instances through factual properties. For example, if the class Person has name as a definitional property, then each instance of Person can have a factual property associating a specific name with it. The requirement that every factual property must be induced by a corresponding definitional property is called the Property Induction Constraint, and offers a form of type checking. A subclass relationship between classes (and between metaclasses) asserts that every instance of the subclass is an instance of the superclass, and moreover, every definitional property of a class is a definitional property of its subclasses (i.e., inheritance).
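A minimal Python sketch of the token/class machinery just described may help fix the ideas. Everything in it is our own illustration rather than part of RML: the names RMLClass, Token and set_fact are hypothetical, metaclasses are omitted (they would be modelled the same way one level up), and the Property Induction Constraint is enforced as a runtime check.

```python
class RMLClass:
    """A class with definitional properties and any number of superclasses."""

    def __init__(self, name, definitional=(), superclasses=()):
        self.name = name
        self.own_definitional = set(definitional)
        self.superclasses = list(superclasses)

    def definitional(self):
        # Inheritance: own definitional properties plus those of every ancestor.
        props = set(self.own_definitional)
        for sup in self.superclasses:
            props |= sup.definitional()
        return props

    def is_subclass_of(self, other):
        # Reflexive, transitive subclass relation.
        return self is other or any(s.is_subclass_of(other) for s in self.superclasses)


class Token:
    """An individual; like RML objects, it may be an instance of several classes."""

    def __init__(self, name, *classes):
        self.name = name
        self.classes = list(classes)
        self.facts = {}

    def is_instance_of(self, cls):
        # Every instance of a subclass is also an instance of its superclasses.
        return any(c.is_subclass_of(cls) for c in self.classes)

    def set_fact(self, prop, value):
        # Property Induction Constraint: a factual property must be induced by a
        # definitional property of (one of) the token's classes.
        allowed = set().union(*(c.definitional() for c in self.classes))
        if prop not in allowed:
            raise ValueError(f"{prop!r} is not induced by any class of {self.name}")
        self.facts[prop] = value


person = RMLClass("Person", {"name"})
patient = RMLClass("Patient", {"record", "physician"}, [person])
mary = Token("mary", patient)
mary.set_fact("name", "Mary")  # allowed: 'name' is inherited from Person
print(mary.is_instance_of(person), sorted(patient.definitional()))
# True ['name', 'physician', 'record']
```

Because definitional() walks all superclasses, a token classified only under Patient may still carry the name fact induced by Person, while an attempt to attach an undeclared property such as salary is rejected.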

The class descriptions of figure 2.1 define the activity class named AdmitPatient and the entity class Patient. The former is intended to convey the idea that the activity of admitting a new patient (to a hospital) involves three sub-activities which, respectively, obtain standard information about the patient, including blood pressure (document), assign her to a bed (AssignBed) and record the admission (recordAdmission) in some computer- or paper-based file. The first three properties of AdmitPatient (the participants properties) identify other objects that must be present for every instance of the class; these may be thought of as analogues of procedural “formal parameters” and “locals”. The next three properties (document, checkIn, record) are classified under parts, and specify sub-activities of AdmitPatient. The following three properties (canAdmit?, isThereRoom?, patientAlready?) define preconditions, which must be true every time AdmitPatient is instantiated. The activity also has two postconditions, which specify respectively that the effects of the activity include making the person pt a Patient, and incrementing the count of how many people reside on the ward.

Figure 2.1.

Likewise, the Patient entity class describes instance entities in terms of a number of properties. First, patients have an associated medical record which is necessary (i.e., must be there for every Patient instance), unique and a part (i.e., if a patient is removed from an RML model, so is her medical record). Second, patients have three association properties, which associate respectively a location, a room and a physician. Moreover, patients are produced by AdmitPatient activities, are modified by AssessPatient activities and are “consumed” by (i.e., cease to be patients because of) the DischargePatient activity. Finally, instantiation of the Patient class is only possible if the patient does not have unpaid bills (startClean?).

According to RML’s view of the world (what we shall call its ontology), there are three types of things to be talked about: entities, activities and assertions (i.e., every individual token in the model is an instance of exactly one of the classes Entity, Activity, or Assertion). The notions of entity and activity were chosen because they are ubiquitous in modeling aspects of a real world, and match corresponding notions in other requirements languages well; assertions were introduced in order to help structure the requirements model itself. Each object category is specified by listing (using metaclasses) the property categories (kinds of definitional properties) that can be associated with those kinds of classes. For example, as we saw in figure 2.1, activity classes have, among others, participants, parts and pre/postcondition properties. Entity classes, on the other hand, admit unique, necessary, parts, associations, and other properties. Note that in the RML framework, an object can be an instance of multiple classes and, likewise, a property can belong to multiple property categories (see the record property).

Each object category is formalized in the semantics of RML in terms of axioms that capture its essence; for example, activities have axioms which state that their start time must precede their end time, or that all precondition properties must be true at the start of a new activity instance, while postconditions will be true at the end. The definition of “instance” is construed so that an activity token’s instancehood in an activity class corresponds to the occurrence of the activity according to the formal properties associated with the class.
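The generic activity axioms can be read as a check on one activity occurrence against a history of model states. The sketch below is a simplification of our own: a history is a hypothetical mapping from integer time points to snapshots, conditions are Python predicates over one snapshot, and the participant (a patient called "ann") is fixed inside the conditions rather than bound per occurrence.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, object]            # snapshot of the model at one time point
Condition = Callable[[State], bool]  # an assertion evaluated in one snapshot


@dataclass
class ActivityClass:
    name: str
    preconditions: List[Condition]
    postconditions: List[Condition]


@dataclass
class Occurrence:
    """An activity token: an instance of an activity class with start/end times."""
    activity: ActivityClass
    start: int
    end: int


def satisfies_axioms(occ: Occurrence, history: Dict[int, State]) -> bool:
    """Check the generic activity axioms for one occurrence against a history."""
    if not occ.start < occ.end:  # start time must precede end time
        return False
    before, after = history[occ.start], history[occ.end]
    return (all(pre(before) for pre in occ.activity.preconditions)        # true at start
            and all(post(after) for post in occ.activity.postconditions))  # true at end


# Hypothetical example: admitting the (fixed) patient "ann".
admit = ActivityClass(
    "AdmitPatient",
    preconditions=[lambda s: s["admitted"] < s["capacity"],
                   lambda s: "ann" not in s["patients"]],
    postconditions=[lambda s: "ann" in s["patients"]],
)
history = {
    0: {"admitted": 3, "capacity": 10, "patients": set()},
    1: {"admitted": 4, "capacity": 10, "patients": {"ann"}},
}
print(satisfies_axioms(Occurrence(admit, 0, 1), history))  # True
print(satisfies_axioms(Occurrence(admit, 1, 0), history))  # False: end precedes start
```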

Just as for entities, RML activity classes are organized into specialization/generalization hierarchies. Organizing activities in this way is a step beyond classical object-oriented software engineering approaches, in which objects (corresponding to RML entities) have attached procedures, but the procedures are themselves not subject to organization by hierarchies of classes.
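To illustrate what organizing activities into specialization hierarchies buys, here is a small Python sketch in which a specialized activity class inherits, and conjoins, the preconditions of its ancestors. The class names mirror the hospital example; the AdmitChildPatient precondition (an age bound of 16) is purely hypothetical, and using Python's method resolution order to collect conditions is our own device, not RML's semantics.

```python
from typing import Callable, Dict, List

State = Dict[str, int]


class Activity:
    """Root of a hypothetical activity hierarchy."""
    preconditions: tuple = ()

    @classmethod
    def all_preconditions(cls) -> List[Callable[[State], bool]]:
        # Collect the preconditions declared on this class and on every ancestor,
        # so a specialization can only add conditions (strict inheritance).
        conds = []
        for klass in reversed(cls.__mro__):
            conds.extend(klass.__dict__.get("preconditions", ()))
        return conds

    @classmethod
    def can_start(cls, state: State) -> bool:
        return all(cond(state) for cond in cls.all_preconditions())


class AdmitPatient(Activity):
    preconditions = (lambda s: s["admitted"] < s["capacity"],)


class AdmitChildPatient(AdmitPatient):
    preconditions = (lambda s: s["age"] < 16,)  # hypothetical refinement


state = {"admitted": 3, "capacity": 10, "age": 30}
print(AdmitPatient.can_start(state), AdmitChildPatient.can_start(state))  # True False
```

Purely additive refinement of this kind works smoothly for preconditions; whether it works for postconditions is a more delicate question.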




Figure 2.2.

Assertion (formula) “objects” are the most novel part of RML. They provide a formal language for specifying otherwise informal information. Among their roles, they are associated as preconditions and postconditions on activities, and as invariants on entities. Treating assertions as objects makes them subject to the same structuring/organizing principles as other objects, but the meaning is specific to the logical nature of the assertions. For example, the assertion class IsTreatedWith has properties which define its arguments (p, a patient, and t, a treatment) and ones which define its component sub-assertions (parts). This assertion class indicates that a patient p receives treatment t if and only if the treatment is available (Available is another assertion class) and has been Recommended. In general, an assertion class’s properties classified under the arguments category are taken to be free variables of an open formula, while the induced factual argument properties are taken to be the values bound to those variables to close the formula. Thus, assertion class instances represent closed formulas, which are also stipulated by the semantics of the language to be true. Other types of properties of assertions allow the structuring of assertions in terms of their parts, as for other object categories, but in this case parts are interpreted as logical conjuncts. The resulting representation is somewhat akin to, but semantically richer than, decision tree representations of complex formulas.
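The reading of assertion classes as open formulas, closed by binding their arguments, can be sketched in Python as follows. This is our own simplification: parts are Python callables over a hypothetical fact base, and whereas RML stipulates that every assertion instance is true, the sketch merely evaluates whether a given instance would hold in that fact base. Treating Recommended as a binary fact about a patient and a treatment is our assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical fact base: predicate name -> set of argument tuples.
World = Dict[str, Set[Tuple[str, ...]]]
Part = Callable[[Dict[str, str], World], bool]


@dataclass
class AssertionClass:
    name: str
    arguments: Tuple[str, ...]                       # free variables of the open formula
    parts: List[Part] = field(default_factory=list)  # sub-assertions, read as conjuncts

    def instantiate(self, **bindings: str) -> "AssertionInstance":
        # Binding every argument closes the formula.
        if set(bindings) != set(self.arguments):
            raise ValueError(f"{self.name} expects arguments {self.arguments}")
        return AssertionInstance(self, bindings)


@dataclass
class AssertionInstance:
    cls: AssertionClass
    bindings: Dict[str, str]

    def holds(self, world: World) -> bool:
        # A closed formula is true iff all of its parts (conjuncts) are true.
        return all(part(self.bindings, world) for part in self.cls.parts)


available = AssertionClass(
    "Available", ("t",), [lambda b, w: (b["t"],) in w["Available"]])

is_treated_with = AssertionClass("IsTreatedWith", ("p", "t"), [
    lambda b, w: available.instantiate(t=b["t"]).holds(w),  # Available(t)
    lambda b, w: (b["p"], b["t"]) in w["Recommended"],      # Recommended(p, t)
])

world = {"Available": {("insulin",)},
         "Recommended": {("ann", "insulin"), ("bob", "penicillin")}}
print(is_treated_with.instantiate(p="ann", t="insulin").holds(world))     # True
print(is_treated_with.instantiate(p="bob", t="penicillin").holds(world))  # False
```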

A formal semantics is given for RML by defining a mapping from RML descriptions into a set of assertions in First Order Predicate Calculus, FOPC (Greenspan et al., 1986). These include all RML framework axioms as well as predicates and axioms associated with the specific classes defined by the modeler. Assertions translate into corresponding expressions in FOPC. However, the notation of FOPC provides no structuring/organization principles or other support for building and maintaining large theories (the essence of Software Engineering), a defect that RML and its data modeling cousins are intended to address.

The representation of time is essential for languages intended to model dynamic applications, if one is to prevent an implementation bias toward an imperative programming style. RML assumes a linear model of time points and encourages history-oriented modeling of an application, which consists of describing possible histories for an entity or activity (or assertion, for that matter). Accordingly, there is a time argument in every predicate appearing in an RML assertion.
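A history-oriented reading, in which every predicate carries a time argument, can be sketched as a fact base that records when each ground fact is true. The class name History, the integer time points and the half-open intervals are assumptions made for illustration; RML itself only commits to a linear model of time points.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Fact = Tuple[str, Tuple[str, ...]]  # (predicate name, arguments)


class History:
    """A history-oriented fact base over a linear sequence of integer time points.

    Every predicate implicitly takes an extra time argument: holds(p, args, t)
    plays the role of p(args..., t).
    """

    def __init__(self) -> None:
        self._intervals: DefaultDict[Fact, List[Tuple[int, int]]] = defaultdict(list)

    def assert_during(self, pred: str, args: Tuple[str, ...], start: int, end: int) -> None:
        # Record that pred(args) is true on the half-open interval [start, end).
        if start >= end:
            raise ValueError("start must precede end")
        self._intervals[(pred, args)].append((start, end))

    def holds(self, pred: str, args: Tuple[str, ...], t: int) -> bool:
        return any(s <= t < e for s, e in self._intervals.get((pred, args), []))


h = History()
h.assert_during("Patient", ("ann",), 3, 9)  # admitted at 3, discharged at 9
print([t for t in range(12) if h.holds("Patient", ("ann",), t)])
# [3, 4, 5, 6, 7, 8]
```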

To summarize, RML is an object-centered modeling language, in the sense that a model is built by repeatedly describing classes and individuals, related by binary relationships. The objects have intrinsic identity and act as anchors around which information is grouped. From its ancestors in knowledge representation, RML preserves the belief in the significance of notions such as “class membership” and “subclass” (with concomitant forms of “inheritance”) in their more general form: an individual may belong to multiple classes, a property may belong to multiple property categories, and a class may have many superclasses. What has been omitted from semantic network and frame representations are ideas such as various kinds of defaults, procedural attachments, and support for representing indefinite/incomplete/partial information about individuals. The result is a greatly simplified and, we believe, more easily learnable and usable language; the price paid is that various kinds of inferences, especially about individuals, cannot be performed automatically. This trade-off seems to be the right one to make, since the requirements model is mostly about the generic concepts in the domain, and much less about the individuals occurring in it at any particular moment or our knowledge of them. Similarly, the time model is quite simple, in contrast with alternatives involving branching time or modal operators. Also from Knowledge Representation, albeit the logicist camp, comes the firm belief in the importance of a well-defined, formal semantics, preferably based on a well-understood formalism such as First Order Predicate Calculus.

The “value-added” to the ideas in knowledge representation consists of the emphasis on the specific notions of activity and assertion, as well as the property categories motivated by their empirically frequent occurrence in requirements and in other requirements languages. The modeling framework described above lends itself to a methodology for building requirements models according to a technique that can be characterized as “stepwise refinement by specialization” (Borgida et al., 1984): the idea of building descriptions by developing class hierarchies in a systematic and incremental manner.

RML should be seen as an early example of languages catering to the requirements engineer. Another early example is the Conceptual Information Model, CIM (Bubenko, 1980), perhaps the first comprehensive proposal for a formal requirements modeling language. Its features include an ontology of entities and events, and an assertional sublanguage for specifying constraints, including complex temporal ones.

The GIST specification language (Balzer et al., 1982), developed at ISI over the same period as RML, was also based on ideas from knowledge representation and supported modeling the environment; it was influenced by the notion of making the specification executable, and by the desire to support transformational implementation. It has formed the basis of an active research group on the problems of requirements description and elicitation (e.g., (Johnson et al., 1992)).

ERAE (Dubois et al., 1986, 1992) is another early effort that explicitly shared with RML the view that requirements modeling is a knowledge representation activity, and had a basis in semantic networks and logic. Finally, the KAOS project constitutes another significant research effort which strives to develop a comprehensive framework for requirements modeling and requirements acquisition methodologies (Dardenne et al., 1993). The language offered for requirements modeling provides facilities for modeling goals, agents, alternatives, events, actions, existence modalities, agent responsibility and other concepts. Moreover, KAOS relies on a meta-model to provide a self-descriptive and extensible modeling framework.

RML was conceived at an early stage in the development of the object-oriented paradigm, and has been recognized as having been influential in the development of other requirements modeling languages. Since their original development more than ten years ago, the ideas in RML have gone through several incarnations, and are still evolving. The Telos language (which replaces the fixed ontology of RML with an extensible one) has been adopted for use by a number of research groups worldwide, and has been implemented by at least three independent groups. The ConceptBase implementation (Jarke et al., 1995) is the most complete and most widely used. Experiences in the use of RML and Telos have been reported in the literature, and are cited in (Greenspan et al., 1994). RML has also served as the starting point for developing more specialized ontologies, such as the i* framework described in Section 4. A new language (to be called Tropos) is being developed by extending Telos to incorporate some of these ontologies (Yu et al., 1996).

3. Semantics of activities and the frame problem

There are two principal arguments in favour of formal requirements modeling languages. Firstly, a formal account of such a language could be used to resolve differences in interpretation of a requirements model, at least sometimes. Secondly, such a formal account could be used as a basis for formally analyzing the model, to establish consistency or otherwise prove interesting properties about it, thereby helping people understand it. We focus in this section on the semantics of activities, and point out that the semantics offered by RML and other languages in the same family have (hidden) difficulties in the way they deal with the frame problem (McCarthy and Hayes, 1969). In addition, we describe an adaptation to formal specifications of a solution to the frame problem proposed in (Reiter, 1991), which has been reported in (Borgida et al., 1993).

To bring this issue into focus, let us consider a particular kind of consistency check that we might wish to perform on requirements models. In RML, and many of the above-mentioned requirements modeling languages, one can associate with classes of entities “integrity constraints”: assertions of invariance that are supposed to hold at all times in (the model of) the world. In the earlier example involving hospitals, typical assertions would be

“The number of patients assigned to a ward cannot exceed the capacity of that ward”

and

“The number of nurses on duty on a ward must be more than one”

In RML, these could be expressed as invariant assertion properties associated with wards, for example. Then one kind of consistency check of our model would be to verify that the descriptions of activities are consonant with these invariants, in the sense that if a condition held at the time the activity started, then it also held at the end of the activity.³

Consider a simplified description of the AdmitPatient activity, using more standard logical formulae for assertions, where properties are treated as functions while classes are treated as unary predicates. (This representation is closer in form to that of many other requirements modeling languages, such as ERAE (Dubois et al., 1986).)




Activity Class AdmitPatient with
    participants
        pt : Person
        wrd : Ward
        d : Doctor
    precondition
        admitted#(wrd) < capacity(wrd) ∧ ¬Patient(pt) ∧ CanAdmit(d, wrd)
    postcondition
        admitted#(wrd) ← admitted#(wrd) + 1 ∧ Patient(pt)

Unfortunately, the above description can be interpreted in at least two ways. The first reading allows exactly two changes from the model state at the time the activity started to its conclusion, namely those necessary to make the postcondition true. (In this case, property admitted# is incremented, and pt becomes an instance of Patient.) The second reading, familiar to those who know formal program proofs, requires at least these changes, but possibly allows more. Thus the first reading includes an unstated assumption that “nothing else changes”, while the second reading also allows changes to some other predicate R that is a component of the model state (e.g., the predicate CanAdmit), or to the predicate Patient for some other argument.
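The difference between the two readings can be made concrete by treating a model state as a finite map from ground terms to values and comparing a pre-state with candidate post-states. This is a sketch under simplifying assumptions of our own: the ground-term strings and the state contents are hypothetical, and both snapshots are assumed to contain the same terms.

```python
from typing import Dict, Set

State = Dict[str, object]  # ground terms such as "admitted#(w1)" -> value


def postcondition(before: State, after: State, pt: str, wrd: str) -> bool:
    return (after[f"admitted#({wrd})"] == before[f"admitted#({wrd})"] + 1
            and after[f"Patient({pt})"] is True)


def changed(before: State, after: State) -> Set[str]:
    # Ground terms whose value differs between the two snapshots.
    return {k for k in before if before[k] != after[k]}


def at_least_reading(before: State, after: State, pt: str, wrd: str) -> bool:
    # Second reading: the postcondition must hold; anything else may change.
    return postcondition(before, after, pt, wrd)


def exactly_reading(before: State, after: State, pt: str, wrd: str) -> bool:
    # First reading: additionally, nothing changes except what the postcondition mentions.
    mentioned = {f"admitted#({wrd})", f"Patient({pt})"}
    return postcondition(before, after, pt, wrd) and changed(before, after) <= mentioned


before = {"admitted#(w1)": 3, "Patient(ann)": False, "CanAdmit(d1,w1)": True}
tidy = {"admitted#(w1)": 4, "Patient(ann)": True, "CanAdmit(d1,w1)": True}
sloppy = {"admitted#(w1)": 4, "Patient(ann)": True, "CanAdmit(d1,w1)": False}
print(exactly_reading(before, tidy, "ann", "w1"), at_least_reading(before, tidy, "ann", "w1"))      # True True
print(exactly_reading(before, sloppy, "ann", "w1"), at_least_reading(before, sloppy, "ann", "w1"))  # False True
```

The "sloppy" post-state, in which CanAdmit also flips, is a legitimate outcome under the second reading but not under the first.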

To understand the difference fully, let us consider the translation of the above into predicate calculus, according to the formal semantics of RML (Greenspan et al., 1986). Recall that all predicates and functions have an additional argument, time, and that all activities have an associated start and stop time. Then, given the semantics of the property categories participants (time-invariant properties of an activity), preconditions and postconditions (which evaluate all predicates/functions in the assertion at the start and stop time, respectively), the above corresponds to the following assertion:

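$$
\begin{aligned}
\forall a,t,t',pt,wrd,d.\;& \mathit{OCCURS}(a,\mathit{AdmitPatient},t,t') \land pt = pt(a,t) \land wrd = wrd(a,t) \land d = d(a,t)\\
\Longrightarrow\;& [\,\mathit{admitted\#}(wrd,t) < \mathit{capacity}(wrd,t) \land \lnot\mathit{Patient}(pt,t) \land \mathit{CanAdmit}(d,wrd,t)\,]\\
\land\;& [\,\mathit{admitted\#}(wrd,t') = \mathit{admitted\#}(wrd,t) + 1 \land \mathit{Patient}(pt,t')\,]
\end{aligned}
$$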

Here, OCCURS(a, AdmitPatient, t, t′) is an assertion indicating that a is an activity token instance of class AdmitPatient, with starting time t and stopping time t′.

Rather than continually deal with such unwieldy formulas, we will abbreviate the above to

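$$
\begin{aligned}
&\mathit{AdmitPatient}(pt, wrd, d)\\
&\quad \text{PRE: } \mathit{admitted\#}(wrd) < \mathit{capacity}(wrd) \land \lnot\mathit{Patient}(pt) \land \mathit{CanAdmit}(d, wrd)\\
&\quad \text{POST: } \mathit{admitted\#}'(wrd) = \mathit{admitted\#}(wrd) + 1 \land \mathit{Patient}'(pt)
\end{aligned}
$$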




where we use the convention that primed predicates and functions are evaluated at t′, the end of the activity, while unprimed ones are evaluated at t, the start of the activity.

Given this notation (which happens to be much like that of standard formal program specification), one of the consistency proof obligations for AdmitPatient mentioned above amounts to demonstrating the validity of the following formula:

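$$
\begin{aligned}
&\forall x.\,(\mathit{Ward}(x) \Rightarrow \mathit{nursesOnDuty\#}(x) > 1)\\
&\land\; [\,\mathit{admitted\#}(wrd) < \mathit{capacity}(wrd) \land \lnot\mathit{Patient}(pt) \land \mathit{CanAdmit}(d, wrd)\,]\\
&\land\; [\,\mathit{admitted\#}'(wrd) = \mathit{admitted\#}(wrd) + 1 \land \mathit{Patient}'(pt)\,]\\
&\Longrightarrow\; \forall x.\,(\mathit{Ward}'(x) \Rightarrow \mathit{nursesOnDuty\#}'(x) > 1)
\end{aligned}
$$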

This cannot be proven because the postcondition of the AdmitPatient activity says nothing about nursesOnDuty#′ or Ward′; i.e., the function nursesOnDuty# (and likewise the predicate Ward) is not constrained at the stopping time of the activity a (which is an instance of AdmitPatient) by the formula describing the activity.
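Why the obligation cannot be discharged can be seen by brute force on a tiny finite model: one ward, one candidate patient and one doctor, with small integer ranges for the counts. This is an illustrative search of our own, not a proof method from the paper; it looks for a pre-state and a post-state that satisfy the invariant, PRE and POST, yet violate the invariant afterwards. No frame axioms are imposed, so everything not mentioned in POST is free.

```python
from itertools import product
from typing import NamedTuple, Optional, Tuple


class S(NamedTuple):
    ward: bool       # Ward(w)
    nurses: int      # nursesOnDuty#(w)
    admitted: int    # admitted#(w)
    capacity: int    # capacity(w)
    patient: bool    # Patient(pt)
    can_admit: bool  # CanAdmit(d, w)


# Every state of the tiny model: 2 * 3 * 3 * 2 * 2 * 2 = 144 states.
STATES = [S(*v) for v in product((False, True), range(3), range(3),
                                  range(1, 3), (False, True), (False, True))]


def invariant(s: S) -> bool:
    return (not s.ward) or s.nurses > 1


def pre(s: S) -> bool:
    return s.admitted < s.capacity and not s.patient and s.can_admit


def post(s: S, s2: S) -> bool:
    # Only what the postcondition mentions is constrained; no frame axioms.
    return s2.admitted == s.admitted + 1 and s2.patient


def counterexample() -> Optional[Tuple[S, S]]:
    for s in STATES:
        if invariant(s) and pre(s):
            for s2 in STATES:
                if post(s, s2) and not invariant(s2):
                    return s, s2
    return None


print(counterexample())
```

The first counterexample found has a post-state in which w has become a Ward with no nurses on duty, exactly the unconstrained Ward′ and nursesOnDuty#′ pointed to above.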

Therefore, the modeler must take the trouble to mention explicitly not just the things that are changed by the activity, but also all those that are not. Thus, one has to explicitly add a conjunction of formulas of the form

$$\forall x.\,(P(x) \Rightarrow P'(x)) \land (\lnot P(x) \Rightarrow \lnot P'(x))$$

for every predicate P other than Patient, including CanAdmit and Ward, and formulas of the form

$$\forall x.\, f(x) = f'(x)$$

for every function f other than admitted#, including nursesOnDuty# and capacity. Even to prove that the first constraint (concerning the capacity of wards) is maintained:

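$$
\begin{aligned}
&\forall x.\,\mathit{capacity}(x) \ge \mathit{admitted\#}(x)\\
&\land\; [\,\mathit{admitted\#}(wrd) < \mathit{capacity}(wrd) \land \lnot\mathit{Patient}(pt) \land \mathit{CanAdmit}(d, wrd)\,]\\
&\land\; [\,\mathit{admitted\#}'(wrd) = \mathit{admitted\#}(wrd) + 1 \land \mathit{Patient}'(pt)\,]\\
&\Longrightarrow\; \forall x.\,\mathit{capacity}'(x) \ge \mathit{admitted\#}'(x)
\end{aligned}
$$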

requires knowing that the admitted# of wards other than wrd is unchanged; for this, we need additional postconditions stating that Patient and admitted# remain unchanged for all arguments other than pt and wrd:

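$$
\begin{aligned}
&\land\ \forall y.\,(y \ne wrd \Rightarrow \mathit{admitted\#}(y) = \mathit{admitted\#}'(y))\\
&\land\ \forall x.\,(x \ne pt \Rightarrow (\mathit{Patient}(x) \Leftrightarrow \mathit{Patient}'(x)))
\end{aligned}
$$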

The extra clauses that were needed to state explicitly that things remain unchanged have been called frame axioms (McCarthy and Hayes, 1969), and the general problem of stating them succinctly is the frame problem.
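The effect of the frame axioms can be checked mechanically on a small finite model. The sketch below is a brute-force check, not a theorem prover. It enumerates every pair of states over a hypothetical domain of two people and two wards, with capacity fixed at 2, nursesOnDuty# restricted to 0 or 2, Ward fixed, and CanAdmit omitted for brevity. It then tests the obligations for both invariants, with and without the frame conditions.

```python
from itertools import product

PEOPLE, WARDS = ("ann", "bob"), ("w1", "w2")
CAPACITY = {"w1": 2, "w2": 2}  # capacity never changes here, i.e. it is already framed

def states():
    # every state: (set of patients, admitted# per ward, nursesOnDuty# per ward)
    for pats in product([False, True], repeat=len(PEOPLE)):
        for adm in product(range(3), repeat=len(WARDS)):
            for nur in product([0, 2], repeat=len(WARDS)):
                yield ({p for p, b in zip(PEOPLE, pats) if b},
                       dict(zip(WARDS, adm)), dict(zip(WARDS, nur)))

def invariant(s):
    _, adm, nur = s
    return all(nur[w] > 1 and CAPACITY[w] >= adm[w] for w in WARDS)

def pre(s, pt, wrd):
    return s[1][wrd] < CAPACITY[wrd] and pt not in s[0]

def post(s, s2, pt, wrd):
    return s2[1][wrd] == s[1][wrd] + 1 and pt in s2[0]

def frame(s, s2, pt, wrd):
    # frame axioms: everything the postcondition does not mention keeps its old value
    return (s2[2] == s[2]
            and all(s2[1][y] == s[1][y] for y in WARDS if y != wrd)
            and all((x in s2[0]) == (x in s[0]) for x in PEOPLE if x != pt))

def obligation_holds(use_frame):
    all_states = list(states())
    for s, s2, pt, wrd in product(all_states, all_states, PEOPLE, WARDS):
        if invariant(s) and pre(s, pt, wrd) and post(s, s2, pt, wrd):
            if use_frame and not frame(s, s2, pt, wrd):
                continue  # this post-state is ruled out by the frame axioms
            if not invariant(s2):
                return False
    return True

print(obligation_holds(use_frame=False))  # False
print(obligation_holds(use_frame=True))   # True
```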




Suppose we had an additional property, SleepingOnWard, which specifies the set of patients with beds on a ward; if we wanted to specify (according to a different version of AdmitPatient) that the patient is assigned a bed, and at the same time “frame” the remaining patients with beds on the ward, we could use in the postcondition the additional conjunct

$$\forall x, y.\ \mathit{SleepingOnWard}'(x, y) \Leftrightarrow \big[(x = pt \land y = wrd) \lor \mathit{SleepingOnWard}(x, y)\big]$$

Suppose that we now wanted to specialize the activity AdmitPatient to AdmitChildPatient, refining the postcondition to add some other values to SleepingOnWard (say, the child’s guardian); then the desired addition

$$\forall x, y.\ \mathit{SleepingOnWard}'(x, y) \Leftrightarrow \big[(x = pt \land y = wrd) \lor (x = \mathit{guardian}(pt) \land y = wrd) \lor \mathit{SleepingOnWard}(x, y)\big]$$

contradicts the earlier one unless guardian(pt) = pt, which is probably prohibited by the domain semantics. Therefore, this refined condition cannot be obtained from the postcondition associated with AdmitPatient through some additive process, like strict inheritance. Note that there would be no problem if the postconditions were stated as follows:

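$$
\begin{aligned}
&\mathit{SleepingOnWard}'(pt, wrd) && \text{(for } \mathit{AdmitPatient}\text{)}\\
&\mathit{SleepingOnWard}'(\mathit{guardian}(pt), wrd) \land \mathit{SleepingOnWard}'(pt, wrd) && \text{(for } \mathit{AdmitChildPatient}\text{)}
\end{aligned}
$$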

but, of course, in this case we would again be in trouble because, without explicit framing constraints, SleepingOnWard' need bear no relationship whatsoever to SleepingOnWard. Note that this is not an isolated problem: a significant portion of models for information systems have just such set-valued properties.
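The clash between the two postconditions is easy to see if SleepingOnWard is treated as a set of (person, ward) pairs. In this hypothetical Python sketch, each iff-style postcondition determines the post-state exactly. The child activity's postcondition is therefore compatible with the inherited one only when both yield the same set.

```python
def parent_post(before, pt, wrd):
    # AdmitPatient, framed form: SleepingOnWard' = SleepingOnWard plus (pt, wrd)
    return before | {(pt, wrd)}

def child_post(before, pt, wrd, guardian):
    # AdmitChildPatient: SleepingOnWard' = SleepingOnWard plus (pt, wrd) and (guardian, wrd)
    return before | {(pt, wrd), (guardian, wrd)}

def compatible(before, pt, wrd, guardian):
    # both postconditions must hold at once, so they must pin down the same post-state
    return parent_post(before, pt, wrd) == child_post(before, pt, wrd, guardian)

before = {("zoe", "w2")}
print(compatible(before, "tim", "w1", guardian="tim"))  # True: guardian(pt) = pt
print(compatible(before, "tim", "w1", guardian="sue"))  # False: the postconditions clash
```

The sets also coincide when the guardian was already sleeping on the ward before admission; in every other case the refined postcondition cannot simply be conjoined with the inherited one.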

The necessity of stating explicitly that an activity leaves a great part of a model state unchanged has been shown to make activity descriptions longer, more difficult to comprehend and change, and more error-prone (what if the modeler overlooks some frame axioms?). Also, as illustrated above with AdmitChildPatient, explicit framing assertions in postconditions make it impossible to use full inheritance/specialization for activities.

Since it was first pointed out by McCarthy and Hayes (1969), the frame problem has been the subject of intense investigation in Artificial Intelligence. In particular, the field of nonmonotonic (defeasible) reasoning (see (Reiter, 1987) for a review) has provided the formal tools of choice to attack it. One approach to the above problem (corresponding to the first reading of the initial specification) is then to embed the frame axiom into the language, by having default rules that state for every predicate P “if P(x) holds, then assume P′(x) holds, unless this leads to a contradiction”. The specification language TaxisDL (Borgida et al., 1990) did in fact adopt this approach from Knowledge Representation. This solution, and others based on non-monotonic logic, are however not entirely felicitous because, among other reasons, non-monotonic reasoning is unfamiliar and notoriously difficult even for experts, so it is likely to be misused in modeling. Another drawback of embedding the frame axioms in the semantics of a requirements modeling language is that, in general, the modeler may want the freedom not to make them in some circumstances.⁴

We review next an alternative approach to the frame problem based on work reported in (Reiter, 1991) and adapted for procedure specifications in (Borgida et al., 1993, 1995). The proposal relies on reifying activities (treating them as objects denotable by expressions in the language) and then providing axioms explaining under what circumstances a predicate or function might change from one model state to the next. Reification of activities is achieved by making activity names function symbols of a special event sort, with the parameters of the activity becoming the arguments of the function. To talk about actions in formulas, we also introduce the predicate Occur(), which takes an action argument, and a variable α ranging over actions. The intended interpretation of Occur() is that the activity which is its argument was completed successfully.⁵

For every activity, we must have a so-called causal or effect axiom stating that the pre- and postconditions must have held if the activity occurred. For example, the effect axiom for the AdmitPatient activity would be

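$$
\begin{aligned}
\forall x, y.\ \mathit{Occur}(\mathit{AdmitPatient}(x, y)) \Rightarrow\ &\big[\mathit{admitted\#}(y) < \mathit{capacity}(y) \land \neg\mathit{Patient}(x) \land \exists d.\,\mathit{CanAdmit}(d, y)\big]\\
\land\ &\big[\mathit{admitted\#}'(y) = \mathit{admitted\#}(y) + 1 \land \mathit{Patient}'(x)\big]
\end{aligned}
$$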

Note that this omits frame assertions, which will be specified separately through explanation closure axioms: for every predicate R, the modeler must write two such axioms describing under what circumstances R might change truth value (from true to false, or from false to true). Likewise, for functions a single axiom describes under what conditions the function changes value from one state to the next. For example, consider the predicate Patient, and suppose that the only activity affecting it is AdmitPatient, though there may be many other activities around, such as GetInfo, ChangeDiagnosis, etc. The explanation closure axioms then are

$$\forall \alpha\,\forall p\,\big[\mathit{Patient}(p) \land \neg\mathit{Patient}'(p) \land \mathit{Occur}(\alpha) \Rightarrow \mathit{false}\big]$$

indicating that no activity “removes patients”, while

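$$\forall \alpha\,\forall p\,\big[\neg\mathit{Patient}(p) \land \mathit{Patient}'(p) \land \mathit{Occur}(\alpha) \Rightarrow \exists w.\,\alpha = \mathit{AdmitPatient}(p, w)\big]$$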

indicating that only AdmitPatient can turn a thing into a patient. Likewise, the axiom for admitted# is

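$$\forall \alpha\,\forall w\,\big[\mathit{admitted\#}(w) \ne \mathit{admitted\#}'(w) \land \mathit{Occur}(\alpha) \Rightarrow \exists p.\,\alpha = \mathit{AdmitPatient}(p, w)\big]$$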

Observe that these axioms capture the fact that the only source of change for the predicate Patient is the activity AdmitPatient, which means that the pre/postconditions of all the other activities need not say anything about how they affect Patient. Note also that the explanation closure axiom for admitted# doesn’t explain the nature of the change to that function. This information is, in fact, provided by the original effect axioms of the activity, and doesn’t have to be restated.
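Explanation closure axioms can be read as checks on a single transition (s, α, s'). This minimal Python sketch assumes a dictionary state and encodes an action as a tuple whose first element is the activity name and whose remaining elements are its arguments (both assumptions of ours). The function closure_ok tests only the three closure axioms above; the effect axioms would be checked separately.

```python
from typing import Any, Dict, Tuple

State = Dict[str, Any]
Action = Tuple[str, ...]  # e.g. ("AdmitPatient", "ann", "w1"): activity name, then arguments

def closure_ok(s: State, a: Action, s2: State) -> bool:
    """Check the explanation closure axioms for Patient and admitted# on one transition."""
    is_admit = a[0] == "AdmitPatient"
    # No activity removes patients.
    if s["patients"] - s2["patients"]:
        return False
    # Only AdmitPatient(p, w), for some w, can turn p into a patient.
    if any(not (is_admit and a[1] == p) for p in s2["patients"] - s["patients"]):
        return False
    # Only AdmitPatient(p, w), for some p, can change admitted#(w).
    return all(s["admitted"][w] == s2["admitted"][w] or (is_admit and a[2] == w)
               for w in s["admitted"])

s = {"patients": {"bob"}, "admitted": {"w1": 1, "w2": 0}}
s_next = {"patients": {"bob", "ann"}, "admitted": {"w1": 2, "w2": 0}}

print(closure_ok(s, ("AdmitPatient", "ann", "w1"), s_next))  # True
print(closure_ok(s, ("GetInfo", "ann"), s_next))             # False: GetInfo cannot create patients
print(closure_ok(s, ("AdmitPatient", "ann", "w2"), s_next))  # False: admitted#(w1) changed
```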

Proof obligations can now be recast to assume as a premise the occurrence of an activity, along with the integrity constraint holding in the state before the activity, and then show that the invariant holds after the activity:

$$\forall pt, wrd.\ \big(\forall x.\,\mathit{nursesOnDuty\#}(x) > 1\big) \land \mathit{Occur}(\mathit{AdmitPatient}(pt, wrd)) \Rightarrow \big(\forall x.\,\mathit{nursesOnDuty\#}'(x) > 1\big)$$

For this proof to succeed, there has to be an additional axiom stating that only one activity occurs at a time, and we need to know that actions are distinct if their names or their arguments differ: for every pair of distinct activity names a_i and a_j (i ≠ j),

$$\forall x, y.\ \big(a_i(x) \ne a_j(y)\big) \land \big(a_i(x) = a_i(y) \Rightarrow x = y\big).$$
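The recast proof obligation can also be checked by exhaustive enumeration on a tiny finite model. This sketch assumes two people, two wards of capacity 2, nursesOnDuty# restricted to 0 or 2, and a hypothetical effect-free GetInfo action; CanAdmit is omitted. Each transition carries exactly one action (the one-activity-at-a-time assumption), and actions are tuples, so tuple equality supplies the distinctness assumptions. We additionally assume a closure axiom stating that no activity changes nursesOnDuty#, since the text gives closure axioms only for Patient and admitted#.

```python
from itertools import product

PEOPLE, WARDS, CAP = ("ann", "bob"), ("w1", "w2"), 2
# Actions are tuples (name, *args): distinct names or arguments give distinct actions.
ACTIONS = ([("AdmitPatient", p, w) for p in PEOPLE for w in WARDS]
           + [("GetInfo", p) for p in PEOPLE])

def states():
    for pats, adm, nur in product(product([0, 1], repeat=2),
                                  product(range(CAP + 1), repeat=2),
                                  product([0, 2], repeat=2)):
        yield (frozenset(p for p, b in zip(PEOPLE, pats) if b),
               dict(zip(WARDS, adm)), dict(zip(WARDS, nur)))

def effect_ok(s, a, s2):
    # effect axiom: Occur(AdmitPatient(x, y)) implies its pre- and postconditions
    if a[0] != "AdmitPatient":
        return True  # GetInfo has no effect axiom in this sketch
    _, x, y = a
    return (s[1][y] < CAP and x not in s[0]
            and s2[1][y] == s[1][y] + 1 and x in s2[0])

def closure_ok(s, a, s2):
    is_admit = a[0] == "AdmitPatient"
    return (s[0] <= s2[0]                                       # no activity removes patients
            and all(is_admit and a[1] == p for p in s2[0] - s[0])
            and all(s[1][w] == s2[1][w] or (is_admit and a[2] == w) for w in WARDS)
            and s[2] == s2[2])                                  # assumed: nothing changes nursesOnDuty#

def invariant(s):
    return all(n > 1 for n in s[2].values())

all_states = list(states())
preserved = all(invariant(s2)
                for s, a, s2 in product(all_states, ACTIONS, all_states)
                if invariant(s) and effect_ok(s, a, s2) and closure_ok(s, a, s2))
print(preserved)  # True
```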

We emphasize that here it is the responsibility of the modeler to write down the change axioms, and that these may need to be modified as new activities are added or old ones are altered. For example, if we now were to add an activity DischargePatient, with its own description

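$$
\begin{aligned}
&\mathit{DischargePatient}(pt, w)\\
&\quad \textbf{PRE: } \mathit{Patient}(pt)\\
&\quad \textbf{POST: } \mathit{admitted\#}'(w) = \mathit{admitted\#}(w) - 1 \land \neg\mathit{Patient}'(pt)
\end{aligned}
$$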

then we would have to add appropriate clauses to the explanation closure axioms of Patient and admitted#. This can be done relatively cleanly by adding disjuncts:

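$$\forall \alpha\,\forall p\,\big[\mathit{Patient}(p) \land \neg\mathit{Patient}'(p) \land \mathit{Occur}(\alpha) \Rightarrow \mathit{false} \lor (\exists w.\,\alpha = \mathit{DischargePatient}(p, w))\big]$$

$$\forall \alpha\,\forall w\,\big[\mathit{admitted\#}(w) \ne \mathit{admitted\#}'(w) \land \mathit{Occur}(\alpha) \Rightarrow (\exists p.\,\alpha = \mathit{AdmitPatient}(p, w)) \lor (\exists p.\,\alpha = \mathit{DischargePatient}(p, w))\big]$$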

Note that although the frame axioms about a single activity are now spread over possibly several explanation closure axioms, the process of thinking about these axioms is systematic. In particular, if there is a change in the postcondition of some activity a, then we must only consider the explanation closure axioms for those predicates which were added, dropped or modified by the change; and even here, we only need to consider the subformula of the explanation axiom which starts with ∃z(α = a...). Furthermore, any time the state is extended with another predicate or function, there is no concern that it might inadvertently be left open to change by some activity specification.
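The bookkeeping involved in maintaining explanation closure axioms can be sketched as a table from each predicate or function (and, for predicates, each direction of change) to the activities allowed to cause that change. This simplified Python sketch ignores the argument conditions, such as α = AdmitPatient(p, w) for the same p. It also assumes that a key missing from the table means no activity may cause that change; the keys "Patient+" and "Patient-" are our own naming.

```python
from typing import Dict, Set

closure: Dict[str, Set[str]] = {
    "Patient+": {"AdmitPatient"},   # only AdmitPatient turns a thing into a patient
    "Patient-": set(),              # no activity removes patients
    "admitted#": {"AdmitPatient"},  # only AdmitPatient changes admitted#
}

def may_change(key: str, activity: str) -> bool:
    # Assumption: a key missing from the table may not be changed by any activity.
    return activity in closure.get(key, set())

def add_activity(name: str, changes: Set[str]) -> None:
    # Only the entries for what the new activity changes are touched (the new disjuncts).
    for key in changes:
        closure.setdefault(key, set()).add(name)

print(may_change("Patient-", "DischargePatient"))  # False
add_activity("DischargePatient", {"Patient-", "admitted#"})
print(may_change("Patient-", "DischargePatient"))  # True
print(may_change("admitted#", "GetInfo"))           # False
```

Adding DischargePatient touches exactly two entries, mirroring the two disjuncts added above; every other entry, and every other activity, is left alone.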

The above followed a line of approach begun in the field of Knowledge Representation by Haas (1987), Pednault (1989) and Schubert (1990), which culminated in Reiter’s (1991) solution to the frame problem. Reiter (1991) is in fact presented in a more general framework, using the so-called “situation calculus”, where all predicates and functions have an additional situation argument, and where actions are explicitly used to make state transitions. Once again, we were able to simplify the original Knowledge Representation formalisms, in this case eliminating explicitly named states and replacing them by the familiar primed/unprimed notation of program specifications. From the Knowledge Representation research area, we were also able to obtain techniques for automatically generating explanation closure axioms in some cases, and a semantic characterization of this syntactic transformation in terms of minimizing the set of changed predicates (see (Borgida et al., 1995) for details).

This approach to dealing with the frame problem has led to the development of the GOLOG language (and its concurrent counterpart CONGOLOG), which can be used in the design and enactment of complex actions (procedures) based on abstractly specified primitive actions. A prototype implementation of a GOLOG interpreter is available (Lesperance et al., 1994). GOLOG is currently being used in a project to develop tools to assist in the analysis and redesign of business processes (Yu et al., 1996), based on the ideas reviewed in the next section.

4. New ontologies for requirements modeling

In his classic account of requirements analysis, Douglas Ross argues that requirements must capture, among other things, “...why a system is needed, based on current and foreseen conditions, which may be internal operations or an external market...” (Ross and Schoman, 1977a). Suppose that a health insurance company is putting together requirements for a new claims processing system. In defining these requirements, the company may want to consider alternative ways of doing business. For instance, under current procedures, an insured patient has a doctor whom he or she goes to first for diagnosis. Once diagnosed, the patient is sent to a hospital for treatment. The patient pays both doctor and hospital and finally gets reimbursed for expenses covered by her policy. In considering a new claims processing system, the insurance company might ask questions such as:

• Why does the doctor have to send the patient to a hospital for treatment?
• Why does the company hire medical assessors to assess and pre-approve treatment plans?
• Why does it take so long to have claims processed?
• What other concerns would arise if medical treatment was done differently?

Traditional modeling techniques such as structured analysis, data flow diagrams, and entity-relationship modeling focus on the modeling of activities and entities. While these are important concepts for systems development, they offer little help in the search for innovative alternative solutions to business problems. More generally, existing models have been designed for describing what a business process is like and how the new system will fit in. They cannot express why the process is the way it is. The motivations, intents and rationales behind the activities (by people or software systems) and entities are missing from these models. Most organizational settings involve many participants or players, with complex relationships among them. These relationships are strategic in the sense that the different stakeholders are concerned about opportunities and vulnerabilities, and seek to protect or further their interests in an attempt to redesign the organization, with the help of software systems. Indeed, a central argument in requirements engineering as well as in business process reengineering is that if one does not understand why things are done the way they are, one is likely to simply automate outdated processes (thereby paving the proverbial ‘cow path’), and miss the opportunity for innovation in redesigning work processes.

In the medical insurance example, the company wants to reduce costs by minimizing the payout to claims and knows that hospitals are generally expensive. Consequently, the company would like to know why patients need to go to a hospital for treatment. At the same time, the company wants to keep customers happy so that they continue to renew their policies. Customers, on the other hand, want to have fast and effective medical treatment and to have their claims approved and reimbursed quickly. Hospitals want to have full-time use of their facilities (or, maximize profits if they are for-profit institutions), but they also want to meet patient needs. What information is collected and used by each stakeholder and the way they set up their business (including existing or future software systems) very much reflects such strategic interests.

AI concepts and techniques provide a useful starting point for developing new ontologies for modeling these relationships. Unlike traditional systems approaches, which characterize systems primarily in terms of relationships among inputs and outputs, AI offers characterizations of behavior that are based on intentional properties, such as goals, means-ends reasoning, and plans. The notion of “agent” provides a higher-level abstraction for analysis and design (Newell, 1982). Agents do not simply send outputs and receive inputs, but cooperate with each other to accomplish shared tasks and joint goals, and may also compete with each other (Castelfranchi et al., 1992).

Requirements modelling, however, is not concerned with creating automated agents (at least not directly). The objective of a richer ontology for requirements modelling is to provide appropriate modeling concepts that can be used by requirements engineers to describe the environments within which software systems will operate. Such environments may include many human as well as non-human agents whose behaviour is not fully predictable or controllable. Their intentional contents, unlike those of artificial agents, are accessible only in very limited ways. AI techniques, therefore, need to be adapted before they can be applied to requirements modelling. The adapted techniques would be used by requirements engineers to express knowledge about systems and their environments, and to reason about them, for example, to assess the implications of alternative ways of using information systems in an organization.

The rest of this section sketches an ontology for capturing and comparing strategic concerns in an organization. The ontology is based partly on a representational framework called i* (for “distributed intentionality”), where business processes are taken to involve social actors who depend on each other for goals to be achieved, tasks to be performed, and resources to be furnished. The i* framework includes a Strategic Dependency model, for describing the network of relationships among actors, and a Strategic Rationale model, for describing and supporting the reasoning that each actor has about its relationships with other actors. These models have been formalized using intentional concepts such as goal, belief, ability, and commitment (e.g., Cohen and Levesque, 1990). The framework has been presented in detail in (Yu, 1995) and has been related to different application areas, including requirements engineering (Yu, 1993), business process reengineering (Yu and Mylopoulos, 1996), and software processes (Yu and Mylopoulos, 1994).

A Strategic Dependency model is a graph, where each node represents an actor, and each link between two actors indicates that one actor depends on the other for something in order that the former may attain some goal. We call the depending actor the depender and the actor who is depended upon the dependee. The object around which the dependency centres is called the dependum. By depending on another actor for a dependum, an actor is able to achieve goals that it is otherwise unable to achieve, or not as easily or as well. At the same time, the depender becomes vulnerable. If the dependee fails to deliver the dependum, the depender would be adversely affected in its ability to achieve its goals.

For example, a patient can achieve the goal of being cured by depending on a doctor. Without the opportunity of using the services of a doctor, the patient may not be able to achieve that goal. On the other hand, the patient is vulnerable to the doctor’s not providing proper care. The model distinguishes among four types of dependencies (goal, task, resource, and softgoal dependencies) based on the type of freedom that is allowed in the relationship between depender and dependee. Three levels of dependency strength are distinguished based on the degree of vulnerability.

Figure 4.1 shows a Strategic Dependency model from the related domain of automobile insurance. A car owner depends on the insurance company to reimburse for repairs from an accident (ClaimsPayout). For this, the car owner pays an insurance premium in order to have coverage (RepairsBeCovered). The insurance company wants to offer good service to the customer in order to keep the business (CustomerBeHappy). To maintain profitability, the company depends on appraisers to appraise damages so that only the minimal necessary repairs are approved.

The car owner depends on the claims appraiser for a fair appraisal. However, the appraiser can be expected to act in the interests of the insurance company because of his dependence on the latter for continued employment. The car owner, in turn, can depend on the body shop to give an estimate that maximizes the car owner’s interest, since the body shop depends on the car owner for repeat business. An analysis of such strategic dependencies at a more detailed level would reveal the role of information and information systems in these relationships.

Figure 4.1. Strategic dependency model of traditional auto insurance (the “as is” arrangement).
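A Strategic Dependency model is naturally encoded as a set of labelled edges. In this hypothetical Python sketch, ClaimsPayout and RepairsBeCovered come from the text, while the other dependum names and the actor names are our own paraphrases of the prose. The dependency types are guesses, since the text does not assign them. The query at_stake_if_fails answers, for one dependee, which dependers are vulnerable and for what.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Dependency:
    depender: str  # the actor who depends
    dependee: str  # the actor depended upon
    dependum: str  # what the dependency centres on
    kind: str      # "goal", "task", "resource" or "softgoal"

as_is: List[Dependency] = [
    Dependency("CarOwner", "InsuranceCo", "ClaimsPayout", "goal"),
    Dependency("CarOwner", "InsuranceCo", "RepairsBeCovered", "goal"),
    Dependency("InsuranceCo", "Appraiser", "AppraiseDamages", "task"),
    Dependency("CarOwner", "Appraiser", "FairAppraisal", "softgoal"),
    Dependency("Appraiser", "InsuranceCo", "ContinuedEmployment", "resource"),
    Dependency("CarOwner", "BodyShop", "RepairEstimate", "resource"),
    Dependency("BodyShop", "CarOwner", "RepeatBusiness", "resource"),
]

def at_stake_if_fails(model: List[Dependency], dependee: str) -> List[Tuple[str, str]]:
    """Who is vulnerable, and for what, if `dependee` fails to deliver."""
    return sorted((d.depender, d.dependum) for d in model if d.dependee == dependee)

print(at_stake_if_fails(as_is, "Appraiser"))
# [('CarOwner', 'FairAppraisal'), ('InsuranceCo', 'AppraiseDamages')]
```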

Information systems that support the claims process have this kind of understanding entrenched as implicit assumptions. These assumptions are often ignored during organizational analysis because existing modeling techniques do not encourage or support the modeling of relationships that involve intentional concepts. Without this deeper understanding, it is difficult to evolve the organization and its information systems to meet changing needs, as evidenced by the problem of “legacy information systems” (and “legacy business processes”). With rapid changes in business environments, and recent management concepts such as business reengineering, well-entrenched relationships are being re-examined and radically reconfigured. Traditional business patterns and assumptions can no longer be taken for granted.

Hammer and Champy (1993, pp. 136–143) describe a hypothetical but plausible scene in which a process redesign team explores new innovative solutions to revitalize an automobile insurance business. Since it costs as much to process a small claim as a large claim, one way to reduce administrative costs is to reduce insurance company involvement in dealing with small claims. “Why not let the insurance agent handle small claims?”, it was suggested. The insurance agent will do all the inquiry and payout, while the insurance company will concentrate on large claims that have more significant impact on profitability. The agent gets to cement his relationship with the customer, while the customer is more likely to get a fair hearing from the agent about a fair payout amount. This keeps the customer happy, which is what the insurance company wants.

Figure 4.2 shows the Strategic Dependency graph for this new business process configuration. Needless to say, shifting the claims handling responsibilities to the insurance agent means that the information needs of the insurance agent are also radically altered. Based on the new configuration of strategic dependencies, one could derive what information needs to be shared or sent among insurance agents and the insurance company, and how accurate and up-to-date it needs to be.

Figure 4.2. Strategic dependency model for alternative 1 (“let the insurance agent handle it”).




Figure 4.3. Strategic dependency model for alternative 2 (“let the body shop handle it”).

Once the traditional wisdom of how an insurance business should be run is no longer regarded as sacred, even more radical solutions could emerge. “Why not let the body shop handle the claims?!”, someone else suggested. Traditionally, body shops are not likely to be on the side of the insurance company. For example, one would not expect an insurance company to be willing to pay according to a body shop’s repair estimates, since the body shop is on the customer’s side, as illustrated by the strategic dependencies in figure 4.1. However, for small claims, it may not be a bad idea to bypass all the paperwork and help the customer get his car fixed as quickly as possible. This meets the customer’s goal to have his car fixed promptly, while reducing costs dramatically for the insurance company. However, this approach raises concerns about possible fraud, which need to be addressed. Figure 4.3 shows the Strategic Dependency model for this proposal.

The Strategic Dependency model encourages a deeper understanding of a business process by focusing on intentional dependencies among actors, beyond the usual understanding based on activities and input-output flows. It helps identify what is at stake, for whom, and what impacts are likely if a dependency fails. The model relies on a characterization of external relationships among actors, thus avoiding the need to get at their internal motivations and intentions.

Although a Strategic Dependency model provides hints about why a process is structured in a certain way, it does not sufficiently support the process of suggesting, exploring, and evaluating alternative solutions. That is the role of the Strategic Rationale model.

A Strategic Rationale model is a graph with four main types of nodes (goal, task, resource, and softgoal) and two main types of links (means-ends links and task decomposition links). A Strategic Rationale graph describes the reasoning behind each actor’s relationships with other actors, thus revealing the internal linkages that connect external strategic dependencies.




Figure 4.4. Strategic rationale model to support reasoning about re-engineering the claims handling process.

A process is often depicted as a collection of activities with entity flows among them (as in a “work flow” analysis). For example, a claims handling process would include such activities as verifying the insurance policy coverage, collecting accident information, determining who is at fault, appraising damages, and making an offer to settle. In the Strategic Rationale model, we arrange these into a hierarchy of means-ends relationships and task decompositions (figure 4.4). When a process element is expressed as a goal, this means that there might be different possible ways of accomplishing it. A task specifies one particular way of doing things (of accomplishing a goal), in terms of a decomposition into subtasks, subgoals, resources, and softgoals. In seeking ways to redesign a business process, goals offer potential places to look for improvement. An ambitious redesign effort needs to discover and rethink high-level goals, rather than be content with solutions for low-level goals; such higher goals are discovered by asking “why” questions. Once sufficiently high-level goals have been identified, alternatives may be sought by asking “how else” the goals can be accomplished.

In the auto insurance example described in (Hammer and Champy, 1993), the reengineering team wanted to consider radical solutions, by identifying a high-level goal: that claims be settled. Unencumbered by current business thinking about how this goal should be accomplished, the team arrived at innovative proposals that involve new strategic business relationships with insurance agents and body shops.
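The “why” and “how else” questions correspond to walking a Strategic Rationale graph up and across means-ends links. This minimal Python sketch uses hypothetical node names paraphrasing the claims-handling activities listed above. Modeling those activities as subgoals of a single in-house task is our own choice.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    name: str
    kind: str                                          # "goal", "task", "resource" or "softgoal"
    means: List["Node"] = field(default_factory=list)  # means-ends links (goal to alternative tasks)
    parts: List["Node"] = field(default_factory=list)  # task decomposition links

def goal(name: str) -> Node:
    return Node(name, "goal")

def task(name: str, *parts: Node) -> Node:
    return Node(name, "task", parts=list(parts))

claims_settled = goal("ClaimsBeSettled")
claims_settled.means = [
    task("HandleClaimInHouse",
         goal("PolicyCoverageVerified"), goal("AccidentInfoCollected"),
         goal("FaultDetermined"), goal("DamagesAppraised"), goal("SettlementOffered")),
    task("LetAgentHandleClaim"),
    task("LetBodyShopHandleClaim"),
]

def how_else(g: Node) -> List[str]:
    """Answer "how else can this goal be accomplished?" by listing its means."""
    return [t.name for t in g.means]

def why(node: Node, target: str) -> List[str]:
    """Answer "why?" for `target`: the chain of ends above it, nearest first."""
    for child in node.means + node.parts:
        if child.name == target:
            return [node.name]
        chain = why(child, target)
        if chain:
            return chain + [node.name]
    return []

print(how_else(claims_settled))
print(why(claims_settled, "FaultDetermined"))  # ['HandleClaimInHouse', 'ClaimsBeSettled']
```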

Each alternative may have different implications for a number of quality goals, or “softgoals”, such as CustomerBeHappy, FastProcessing, and Profitable. A softgoal is one which does not have a priori, clear-cut criteria of satisfaction. Although some of these can be measured and quantified, a qualitative approach can be used at the stage of exploring the space of alternatives. Contributions to softgoals can be positive or negative, and are judged to be adequate or inadequate.
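A qualitative comparison of alternatives against softgoals can be recorded as signed contributions. In this illustrative Python sketch the contribution signs are our reading of the narrative above, not values from the paper. The agent alternative keeps the customer happy and cuts administrative costs; the body-shop alternative gets cars fixed promptly and cuts costs but raises fraud concerns. FraudBeControlled is a hypothetical softgoal standing in for those concerns.

```python
from typing import Dict, List

# "+" = helps the softgoal, "-" = hurts it; absent = no judgement recorded (illustrative values)
contributions: Dict[str, Dict[str, str]] = {
    "LetAgentHandleClaim": {"CustomerBeHappy": "+", "Profitable": "+"},
    "LetBodyShopHandleClaim": {"FastProcessing": "+", "Profitable": "+",
                               "FraudBeControlled": "-"},
}

SOFTGOALS = ["CustomerBeHappy", "FastProcessing", "Profitable", "FraudBeControlled"]

def profile(alternative: str) -> Dict[str, List[str]]:
    """Summarize an alternative qualitatively: softgoals it helps, hurts, or leaves unjudged."""
    c = contributions[alternative]
    return {
        "helps": [g for g in SOFTGOALS if c.get(g) == "+"],
        "hurts": [g for g in SOFTGOALS if c.get(g) == "-"],
        "unknown": [g for g in SOFTGOALS if g not in c],
    }

for alt in contributions:
    print(alt, profile(alt))
```

Keeping the judgements qualitative matches the exploratory stage described above; numeric scores could replace the signs once some softgoals are measured.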

By explicitly representing means-ends relationships, the Strategic Rationale model provides a systematic way for exploring the space of possible new process designs. Generic knowledge in the form of methods and rules can be used to suggest new solutions and to identify related goals.

The models of i* are represented in the conceptual modeling language Telos (Mylopoulos et al., 1990), a descendant of RML, so that knowledge structuring mechanisms such as classification, generalization, aggregation, and time can be used in conjunction with the intentional concepts of i*. While i* takes its intuitive underpinnings from theories of organization, the formal characterization of the framework draws on AI concepts and techniques. Research that focuses on the specification of agents is particularly relevant, e.g., (Cohen and Levesque, 1990; Thomas et al., 1991; Lesperance, 1991; Castelfranchi et al., 1992). Adaptations are needed since i* is intended for descriptive modeling (not prescriptive specification), and deals with intentional relationships (not just intentional properties of agents in isolation). On the other hand, i* does not attempt to address issues of automatic planning, which are prevalent in the Artificial Intelligence work dealing with agents and goals. The i* framework is described in detail in (Yu, 1995a).

The i* framework contains components (e.g., softgoals) that are not readily amenable to formalization using conventional logical techniques. The notion of softgoal was adapted from a framework for representing and reasoning with non-functional requirements (Mylopoulos et al., 1992; Chung, 1993b). The proposed framework includes, in addition to softgoals, methods and softgoal dependencies. Unlike conventional goals, which are either met or not met, a softgoal is satisficed if there is sufficient positive support towards its achievement and little negative support. The formalization of satisficing is based on ideas from AI truth maintenance systems, but also from decision support systems such as (Potts and Bruns, 1988). Examples of softgoal dependencies include an AND dependency between softgoals g₁, g₂, …, gₙ and g, meaning that if all of g₁, g₂, …, gₙ are satisficed, so is g, or a sub dependency between g₀ and g, meaning that satisficing of g₀ contributes to, but does not guarantee, satisficing of g. Methods represent particular ways of decomposing or satisficing softgoals. These methods are meant to be domain-specific in the sense that there will be different methods for decomposing security goals as opposed to user-friendliness ones. The framework is explored in detail for performance and security requirements in (Nixon, 1993) and (Chung, 1993a).
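The AND and sub dependencies suggest a simple label-propagation procedure. This Python sketch is one possible reading, not the formalization cited above. Labels are satisficed, denied or undecided, and an AND dependency behaves as described in the text. For sub dependencies we assume a hypothetical threshold MIN_SUPPORT of satisficed positive contributors, with no satisficed negative contributor, as a stand-in for “sufficient positive support and little negative support”. The softgoal names are invented for illustration.

```python
from typing import Dict, List, Tuple

AND: Dict[str, List[str]] = {"Secure": ["Confidential", "Available"]}
SUB: Dict[str, List[Tuple[str, str]]] = {  # g -> [(g0, "+" or "-"), ...]
    "UserFriendly": [("OnlineHelp", "+"), ("SingleSignOn", "+"),
                     ("FrequentPasswordChange", "-")],
}
MIN_SUPPORT = 2  # hypothetical threshold for "sufficient positive support"

def label(g: str, leaves: Dict[str, str]) -> str:
    if g in leaves:
        return leaves[g]
    if g in AND:
        kids = [label(k, leaves) for k in AND[g]]
        if all(k == "satisficed" for k in kids):
            return "satisficed"  # all of g1..gn satisficed, so g is
        if any(k == "denied" for k in kids):
            return "denied"
    # sub dependencies: each satisficed contributor adds support for (+) or against (-) g
    pro = sum(1 for k, s in SUB.get(g, []) if s == "+" and label(k, leaves) == "satisficed")
    con = sum(1 for k, s in SUB.get(g, []) if s == "-" and label(k, leaves) == "satisficed")
    if pro >= MIN_SUPPORT and con == 0:
        return "satisficed"
    return "denied" if con > pro else "undecided"

leaves = {"Confidential": "satisficed", "Available": "satisficed", "OnlineHelp": "satisficed",
          "SingleSignOn": "satisficed", "FrequentPasswordChange": "denied"}
print(label("Secure", leaves))        # satisficed
print(label("UserFriendly", leaves))  # satisficed
print(label("UserFriendly", dict(leaves, SingleSignOn="undecided")))  # undecided
```

The last call shows the sense in which a single sub dependency contributes to, but does not guarantee, satisficing of its parent.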

The i* framework illustrates how the introduction of a new ontology can enrich the modeling of a certain class of phenomena, in this case, organizational environments. The i* framework thus extends earlier work on goal-oriented and agent-oriented requirements engineering, e.g., (Feather, 1987; Dubois et al., 1994; Dardenne et al., 1993; Bubenko, 1993; Chung, 1993b), by elaborating on the notion of intentional, strategic agents in social, organizational settings. The i* work is still at an early stage of development. Preliminary assessments of its use in real settings (Briand et al., 1995) have been positive.




In a similar vein, Greenspan and Feblowitz (1993) have proposed a specific ontology for a class of models, those capturing requirements information for service-oriented systems. A service-providing enterprise is modeled from four viewpoints:

• services that meet goals or address the needs of the customers;
• work flows or processes performed by the enterprise to provide the services;
• organizational units that serve as loci of responsibility for the work;
• systems that provide the capabilities and resources for performing the work.

As in Yu’s work, this raises modeling and analysis questions in terms of responsibilities, resource dependencies, roles and positions.

A project is underway to bring the i* framework and the GOLOG language (Section 3) together in a tool set to assist business process analysis and redesign (Yu et al., 1996). i* tools are used to model and reason about alternative configurations of strategic actor relationships. Selected configurations are then developed more fully into process descriptions in GOLOG and analyzed using simulation and verification tools.

5. Conclusions

We have presented examples where concepts from research on Knowledge Representation were adapted to support various aspects of requirements modeling languages, including: the basic principles of conceptual modeling for RML, the specification and semantics of activities used by most requirements modeling languages, and the introduction of a new ontology for representing certain types of requirements. These examples were meant to illustrate the important role Knowledge Representation can play in capturing software engineering knowledge.

Requirements capture is not the only area that has taken to heart the importance of Knowledge Representation as a foundation for software knowledge bases. For example, researchers at AT&T Bell Laboratories (Devanbu et al., 1991; Devanbu, 1994; Selfridge, 1991; Selfridge and Terveen, 1996) have shown how KL-ONE-style representations support the capture, organization, integration and consistency checking of knowledge related to software, including architectural design principles and the ubiquitous plans underlying much software. Along the same lines, the DAIDA project (Jarke, 1993) has emphasized the use of a software knowledge base as the centrepiece of a software development environment. (Borgida and Jarke, 1992) presents a collection of papers, some of which describe research on the topic of representing software engineering knowledge.

It may be worthwhile to consider some important differences⁶ that arise if the goal of software knowledge capture and representation is (i) software knowledge management as opposed to (ii) development of intelligent assistants:

Type of representation used: In the former case, the representations used can mix formal, declarative representations with informal ones, such as natural language text, diagrams and the like, as long as the human users of the knowledge base can interpret these. (An example of this is RML’s close connection to SADT diagrams, which provide a “road map” for acquiring and reading the formal model.) For the latter case, the representations used are necessarily formal and often procedural (e.g., production rules, or frames with procedures attached as facets to slots) since these are best suited for the efficient representation of local heuristics used by the intelligent assistant to achieve competence in the performance of some software engineering task.

Type of knowledge captured: In the former case, the knowledge base contains mostly descriptive information about the specific software system being developed, and its relationship to its environment. In the latter case, the relationship to the environment is less significant, but there is frequent representation of heuristics, which are applicable to a family of applications (e.g., divide and conquer techniques), independent of the specific application domain.

Coverage of the software knowledge base: In the former case, the coverage can be quite broad, possibly including all of the types of software knowledge listed in the introduction. In the latter case, coverage is usually narrow, specific to the task the intelligent assistant is intended to perform.

Completeness of the software knowledge base: In the former case, the knowledge base can be useful even if it is incomplete (imagine having a software knowledge base for a legacy system that answers at least half your questions). In the latter case, the knowledge base must be sufficiently complete to support the inferences required for the performance of the task; otherwise the whole system is useless.

Criteria for success of the knowledge capture activity: In the former case, the primary criterion is how much less time the software engineer spends chasing or otherwise discovering knowledge about the system she is working on, and how accurate and complete the acquired knowledge is; in the latter, the ultimate criterion is how well the knowledge-based system performs its intended task.
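The contrast in representation and completeness can be illustrated with a hypothetical sketch of a software knowledge base that stores formal facts alongside free-text notes meant for human readers, and answers queries under an open-world reading: a fact that has not been recorded yields `UNKNOWN` rather than a failure or a false "no". The class, method names and legacy-system facts below are invented for illustration only.

```python
from enum import Enum


class Answer(Enum):
    YES = "yes"
    NO = "no"
    UNKNOWN = "unknown"


class SoftwareKB:
    """Hypothetical software knowledge base mixing formal facts and informal notes."""

    def __init__(self):
        self.facts = {}  # (subject, relation, object) -> True/False
        self.notes = {}  # subject -> list of free-text annotations for human readers

    def tell(self, subject, relation, obj, holds=True):
        self.facts[(subject, relation, obj)] = holds

    def annotate(self, subject, text):
        self.notes.setdefault(subject, []).append(text)

    def ask(self, subject, relation, obj):
        """Open-world query: a missing fact yields UNKNOWN rather than NO."""
        holds = self.facts.get((subject, relation, obj))
        if holds is None:
            return Answer.UNKNOWN
        return Answer.YES if holds else Answer.NO

    def coverage(self, questions):
        """Fraction of questions that receive a definite answer."""
        answered = sum(self.ask(*q) is not Answer.UNKNOWN for q in questions)
        return answered / len(questions)


# Hypothetical, partially documented legacy payroll system.
kb = SoftwareKB()
kb.tell("payroll.calc", "writes", "EMPLOYEE_TABLE")
kb.tell("payroll.calc", "calls", "tax.lookup", holds=False)
kb.annotate("payroll.calc", "Rounding logic is described in the original design memo.")

questions = [
    ("payroll.calc", "writes", "EMPLOYEE_TABLE"),
    ("payroll.calc", "calls", "tax.lookup"),
    ("payroll.report", "reads", "EMPLOYEE_TABLE"),
    ("tax.lookup", "writes", "AUDIT_LOG"),
]
print(kb.coverage(questions))  # 0.5
```

An intelligent assistant whose inferences depended on the two unrecorded facts would be stuck, whereas a maintainer still benefits from the half of the questions that are answered and from the attached notes.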

Software engineering practice worldwide is experiencing a profound shift of focus from software development to software migration (Rubin et al., 1995). Software migration involves legacy systems which need to be moved to new platforms, languages or architectures, endowed with new functionality to meet changing organizational objectives, and made to adhere to ever-stricter quality standards. A very promising (if not the most promising) way to meet the challenges that arise from this shift of focus is to ensure that, in the future, every software system comes with its own knowledge base, containing most of the information needed to support migration and maintenance activities. For this reason, we believe that the role of Knowledge Representation in Software Engineering advocated in this paper will grow in importance and in impact.

Acknowledgments

The ideas presented in this paper are based on collaborative research with many colleagues, including Lawrence Chung (University of Texas at Dallas), Eric Dubois (University of Namur), Sol Greenspan (GTE Laboratories), Matthias Jarke (Technical University of Aachen), Brian Nixon (University of Toronto) and Ray Reiter (University of Toronto).

The research has been supported in part by the Natural Sciences and Engineering Research Council of Canada, the Canadian Institute of Advanced Research, the Information Technology Research Centre of Ontario and the Institute of Robotics and Intelligent Systems, funded in part by the government of Canada. A. Borgida was also supported in part by Grant IRI91-19310 from US NSF.

Notes

1. After all, a specification is by definition prescriptive: it specifies desired properties for a system to be built.

2. For further discussions about formality in Requirements Engineering see (Fickas, 1991).

3. An alternative view, perhaps more appropriate for a requirements description, would be to have the pre- and postconditions of the activities automatically augmented to guarantee that the invariant constraints are indeed not violated.

4. Ryan (1991) and Hagelstein and Roelants (1992) are two proposals which embed some of the frame axioms in the semantics of the specification language by allowing the specifier to differentiate actions or attributes for which the assumptions are to be made from those for which they are not.

5. Note that in RML, activities are objects already, and OCCUR corresponds to checking if some token is an instance of the class at the appropriate time.

6. This comparison suffers from the usual drawbacks associated with stereotyping: there is a wide variety of knowledge-based software assistants, and our characterization obviously applies to some much better than to others.

References

Anderson, J.R. and Durney, B. 1993. Using scenarios in deficiency-driven requirements engineering. Proceedings First IEEE International Symposium on Requirements Engineering, San Jose.

Balzer, R., Goldman, N., and Wile, G. 1982. Operational specifications as a basis for rapid prototyping. Proceedings Symposium on Rapid Prototyping, ACM Software Engineering Notes, Vol. 7, No. 5, pp. 3–16.

Barstow, D. 1987. Artificial intelligence and software engineering. Proc. 9th International Conference on Software Engineering, Monterey, pp. 200–211.

Borgida, A., Mylopoulos, J., and Wong, H.K.T. 1984. Generalization/specialization as a basis for software specification. In M. Brodie, J. Mylopoulos, and J. Schmidt (Eds.), On Conceptual Modeling: Perspectives from Artificial Intelligence, Databases and Programming Languages, Springer-Verlag, pp. 87–114.

Borgida, A., Mylopoulos, J., Schmidt, J., and Wetzel, I. 1990. Support for data-intensive applications: Conceptual design and software development. In R. Hull, R. Morrison, and D. Stemple (Eds.), Database Programming Languages, Morgan Kaufmann Publishers, San Mateo, CA.

Borgida, A. and Jarke, M. (Eds.) 1992. IEEE Transactions on Software Engineering, 18(6) and (10), Special issue on knowledge representation and reasoning in software development.

Borgida, A., Mylopoulos, J., and Reiter, R. 1993. ...And nothing else changes: The frame problem in procedure specifications. Proceedings Fifteenth International Conference on Software Engineering, Baltimore.

Borgida, A., Mylopoulos, J., and Reiter, R. 1995. On the frame problem in procedure specifications. IEEE Transactions on Software Engineering, pp. 785–798.

Briand, L., Melo, W., Seaman, C., and Basili, V. 1995. Characterizing and assessing a large-scale software maintenance organization. Proc. 17th Int. Conf. Software Eng., Seattle.

Brodie, M. and Zilles, S. (Eds.). 1981. Proceedings of Workshop on Data Abstraction, Databases and Conceptual Modeling, Pingree Park, Colorado, Joint SIGART, SIGMOD, SIGPLAN Newsletter.

Brodie, M., Mylopoulos, J., and Schmidt, J. (Eds.). 1984. On Conceptual Modeling: Perspectives from Artificial Intelligence, Databases and Programming Languages, Springer-Verlag.

Bubenko, J. 1980. Information modeling in the context of system development. In Proceedings IFIP Vol. 80, pp. 395–411.

Bubenko, J.A. 1993. Extending the scope of information modelling. Proc. 4th Int. Workshop on the Deductive Approach to Information Systems and Databases, Lloret-Costa Brava, Catalonia, pp. 73–98.

Castelfranchi, C., Miceli, M., and Cesta, A. 1992. Dependence relations among autonomous agents. Proceedings Third European Workshop on Modeling Autonomous Agents in a Multiagent World; published as Decentralized AI, III, Elsevier.

Chung, L. 1993a. Dealing with security requirements during the development of information systems. Proceedings International Conference on Advanced Information Systems Engineering, Paris.

Chung, L. 1993b. Representing and Using Non-Functional Requirements: A Process-Oriented Approach. Ph.D. thesis, Department of Computer Science, University of Toronto.

Cohen, P. and Levesque, H. 1990. Intention is choice with commitment. Artificial Intelligence, 32(3).

Corbi, T.A. 1989. Program understanding: Challenge for the 1990s. IBM Systems Journal, 28(2).

Dardenne, A., van Lamsweerde, A., and Fickas, S. 1993. Goal-directed requirements acquisition. Science of Computer Programming, 20:3–50.

Devanbu, P. 1994. Software Information Systems. Ph.D. thesis, Dept. of Computer Science, Rutgers University.

Devanbu, P., Brachman, R., Selfridge, P., and Ballard, B. 1991. LaSSIE: A knowledge-based software information system. Communications of the ACM.

Dubois, E., Hagelstein, J., Lahou, E., Ponsaert, F., and Rifaut, A. 1986. A knowledge representation language for requirements engineering. Proceedings of the IEEE, Vol. 74, No. 10.

Dubois, E., Du Bois, P., and Rifaut, A. 1992. Elaborating, structuring and expressing formal requirements for composite systems. Proc. International Conference on Advanced Information Systems Engineering, Manchester.

Dubois, E., Du Bois, P., and DuBru, F. 1994. Animating formal requirements specifications of cooperative information systems. Proceedings Second International Conference on Cooperative Information Systems, Toronto.

Feather, M. 1987. Language support for the specification and derivation of concurrent systems. ACM Transactions on Programming Languages and Systems, 9(2):198–234.

Fickas, S. 1993. Position papers for panel on Neats vs. Scruffies. In Proc. of 3rd European Software Engineering Conference, Milan, Springer-Verlag, LNCS.

Green, C. 1969. Application of theorem proving to problem solving. Proceedings First International Joint Conference on Artificial Intelligence, Washington, DC, pp. 219–239.

Green, C., Luckham, D., Balzer, R., Cheatham, T., and Rich, C. 1983. Report on a Knowledge-Based Software Assistant. Technical Report KES.U.83.2, Kestrel Institute.

Greenspan, S., Mylopoulos, J., and Borgida, A. 1982. Capturing more world knowledge in the requirements specification. Proc. 6th Int. Conference on Software Engineering, Tokyo.

Greenspan, S. 1984. Requirements Modeling: A Knowledge Representation Approach to Requirements Definition. Ph.D. thesis, Department of Computer Science, University of Toronto.

Greenspan, S., Borgida, A., and Mylopoulos, J. 1986. A requirements modeling language and its logic. Information Systems, 11(1):9–23.

Greenspan, S. and Feblowitz, M. 1993. Requirements engineering using the SOS paradigm. Proceedings First IEEE International Symposium on Requirements Engineering, San Jose, pp. 260–265.

Greenspan, S., Mylopoulos, J., and Borgida, A. 1994. On formal requirements modeling languages: RML revisited. Proc. 16th International Conference on Software Engineering, Naples.

Haas, A.R. 1987. The case for domain specific frame axioms. In F.M. Brown (Ed.), The Frame Problem in Artificial Intelligence. Proceedings of the 1987 Workshop. Morgan Kaufmann Publishers, Inc., pp. 343–348.

Hagelstein, J. and Roelants, D. 1992. Reconciling operational and declarative specifications. Proceedings International Conference on Advanced Information Systems Engineering (CAiSE), Manchester.

Hammer, M. and Champy, J. 1993. Reengineering the Corporation: A Manifesto for Business Revolution, HarperBusiness.

Jackson, M. 1978. Information systems: Modeling, sequencing and transformation. Proceedings 3rd International Conference on Software Engineering, pp. 72–81.

Jackson, M. 1983. System Development, Prentice-Hall.

Jarke, M. (Ed.) 1993. Database Application Engineering with DAIDA, Research Reports ESPRIT, Springer-Verlag.

Jarke, M., Gallersdörfer, R., Jeusfeld, M.A., Staudt, M., and Eherer, S. 1995. ConceptBase: A deductive object base for meta data management. Journal of Intelligent Information Systems (Special Issue on Advances in Deductive Object-Oriented Databases), 4(2):167–192.

Johnson, W.L., Feather, M., and Harris, D. 1992. Representing and presenting requirements knowledge. IEEE Transactions on Software Engineering, 853–869.

Lesperance, Y. 1991. A Formal Theory of Indexical Knowledge and Action. Ph.D. thesis, Department of Computer Science, University of Toronto.

Lesperance, Y., Levesque, H., Lin, F., Marcu, D., Reiter, R., and Scherl, R. 1994. A logical approach to high-level robot programming: A progress report. Control of the Physical World by Intelligent Systems, Papers from the 1994 AAAI Fall Symposium, New Orleans, LA, 79–85.

Lowry, M. and Duran, R. 1989. Knowledge-based software engineering. In A. Barr and P. Cohen (Eds.), Handbook of Artificial Intelligence IV, Addison-Wesley.

McCarthy, J. and Hayes, P. 1969. Some philosophical problems from the standpoint of artificial intelligence. In B. Meltzer and D. Michie (Eds.), Machine Intelligence, Edinburgh University Press, Vol. 4, pp. 463–502.

Mostow, J. (Ed.). 1985. IEEE Transactions on Software Engineering, Special issue on artificial intelligence and software engineering, 11(11).

Mylopoulos, J., Borgida, A., Jarke, M., and Koubarakis, M. 1990. Telos: Representing knowledge about information systems. ACM Transactions on Information Systems.

Mylopoulos, J., Chung, L., and Nixon, B. 1992. Representing and using non-functional requirements. IEEE Transactions on Software Engineering, 483–497.

Newell, A. 1982. The knowledge level. Artificial Intelligence, 18:87–127.

Nixon, B. 1993. Representing and using performance requirements during the development of information systems. Proceedings First IEEE International Symposium on Requirements Engineering, San Jose, pp. 42–49.

Norman, D. The Psychology of Everyday Things, Basic Books.

Pednault, E.P.D. 1989. ADL: Exploring the middle ground between STRIPS and the situation calculus. Proceedings of the First International Conference on Principles of Knowledge Representation and Reasoning (KR'89), Morgan Kaufmann Publishers Inc., pp. 324–332.

Potts, C. and Bruns, G. 1988. Recording the reasons for design decisions. Proceedings 10th International Conference on Software Engineering, Singapore.

Reiter, R. 1987. Nonmonotonic reasoning. Annual Review of Computer Science, 2:147–186, Annual Reviews Inc.

Reiter, R. 1991. The frame problem in the situation calculus: A simple solution (sometimes) and a completeness result for goal regression. In V. Lifschitz (Ed.), Artificial Intelligence and the Mathematical Theory of Computation: Papers in Honor of John McCarthy, Academic Press, San Diego, CA, pp. 359–380.

Rich, C. and Waters, R. 1986. Readings in Artificial Intelligence and Software Engineering, Morgan Kaufmann.

Rich, C. and Waters, R. 1988. The programmer's apprentice: A research overview. IEEE Computer, 21(11):10–25.

Ross, D.T. and Schoman. 1977a. Structured analysis for requirements definition. IEEE Trans. on Software Engineering, SE-3(1):6–15.

Ross, D.T. and Schoman. 1977b. Structured analysis: A language for communicating ideas. IEEE Trans. on Software Engineering, SE-3(1):16–34.

Rubin, H., Yourdon, E., and Battaglia, H. 1995. Industry Canada Worldwide Benchmark Project, Rubin Systems Inc.

Ryan, M. 1993. Defaults in specifications. Proceedings First IEEE International Symposium on Requirements Engineering, San Jose, pp. 142–151.

Schubert, L.K. 1990. Monotonic solution of the frame problem in the situation calculus: An efficient method for worlds with fully specified actions. In H.E. Kyburg, R.P. Loui, and G.N. Carlson (Eds.), Knowledge Representation and Defeasible Reasoning, Kluwer Academic Press, 23–67.

Selfridge, P. 1991. Knowledge representation support for a software information system. Proc. IEEE Conf. on AI Applications.

Selfridge, P.G. and Terveen, L.G. 1996. Knowledge management tools for business process support and reengineering. Journal of Intelligent Systems in Accounting, Finance, and Management.

Smith, D. 1985. Top-down synthesis of divide-and-conquer algorithms. Artificial Intelligence, 27(1):43–96.

Soloway, E., Pinto, J., Letovsky, S., Littman, D., and Lampert, R. 1988. Designing documentation to compensate for delocalized plans. Communications of the ACM, 31(11).

Solvberg, A. 1979. A contribution to the definition of concepts for expressing users' information system requirements. Proceedings International Conference on E-R Approach to Systems Analysis and Design.

Thomas, B., Shoham, Y., Schwartz, A., and Kraus, S. 1991. Preliminary thoughts on an agent description language. International Journal of Intelligent Systems, 6:498–508.

Yu, E. 1993. Modeling organizations for information systems requirements engineering. Proceedings First IEEE International Symposium on Requirements Engineering, San Jose, pp. 34–41.

Yu, E. and Mylopoulos, J. 1994. Understanding 'why' in software process modeling, analysis and design. Proceedings Sixteenth International Conference on Software Engineering, Sorrento, Italy.

Yu, E. 1995. Modelling Strategic Relationships for Process Reengineering. Ph.D. thesis, Department of Computer Science, University of Toronto.

Yu, E. and Mylopoulos, J. 1995. From E-R to 'A-R': Modelling strategic actor relationships for business process reengineering. International Journal of Cooperative Information Systems, 4(2–3):125–144.

Yu, E. and Mylopoulos, J. 1996. Using goals, rules, and methods to support reasoning in business process reengineering. International Journal of Intelligent Systems in Accounting, Finance and Management, 5(1).

Yu, E., Mylopoulos, J., and Lesperance, Y. 1996. Modelling the organization: New concepts and tools for reengineering. IEEE Expert.
