DESCRIPTION

This report describes a method for discovering and representing commonalities among related software systems. By capturing the knowledge of experts, this domain analysis method attempts to codify the thought processes used to develop software systems in a related class or domain. While domain analysis supports software reuse by capturing domain expertise, domain analysis can also support communication, training, tool development, and software specification and design.


DARE
A Domain Analysis and Reuse Environment

Phase I Final Report
AUGUST 20, 1992

Sponsored by Defense Advanced Research Projects Agency
Defense Small Business Innovation Research Program
DARPA Order No. 5916
Issued by U.S. Army Missile Command Under Contract # DAAH01-92-C-R040

AUTHORS: Reuse, Inc.
Dr. Rubén Prieto-Díaz, Principal Investigator
Dr. Bill Frakes, Consultant
Mr. B.K. Gogia, Project Manager

12365 Washington Brice Rd., Fairfax, VA 22033
Phone: 703-620-5385    FAX: 703-620-5385


    FOREWORD

This Phase I Final Report was prepared by Reuse, Incorporated, 12365 Washington Brice Rd., Fairfax, Virginia, 22033, under DARPA Phase I SBIR Contract No. DAAH01-92-C-R040, issued by U.S. Army Missile Command. The Reuse, Inc. personnel involved in this program and in writing this report are Dr. Rubén Prieto-Díaz, the Principal Investigator, Dr. Bill Frakes, an external consultant, and Mr. B.K. Gogia, the Project Manager. The final report covers the period of performance from February 20, 1992 through August 20, 1992. The final report was submitted by Reuse, Inc. August 20, 1992.

Although this report is unclassified, its distribution is limited to U.S. Government agencies only; pages containing confidential proprietary information have been marked as such. Other requests for this document must be referred to Director, Defense Advanced Research Projects Agency, 3701 North Fairfax Drive, Arlington VA 22203-1714.

THE VIEWS AND CONCLUSIONS CONTAINED IN THIS DOCUMENT ARE THOSE OF THE AUTHORS AND SHOULD NOT BE INTERPRETED AS REPRESENTING THE OFFICIAL POLICIES, EITHER EXPRESS OR IMPLIED, OF THE DEFENSE ADVANCED RESEARCH PROJECTS AGENCY OR THE U.S. GOVERNMENT.


Table of Contents

Executive Summary
    Status of phase I research
    Research results
    Report overview
1- Introduction
    1.1- The software reuse problem
    1.2- Domain analysis
2- Survey of Domain Analysis Methods
    2.1- Historical perspective
    2.2- Domain analysis methods
        Prieto-Díaz Approach
        FODA
        IDeA
    2.3- Summary of main features
    2.4- Discussion of key activities
    2.5- Potential for automation
3- Domain Analysis in the STARS Reuse Library Process Model (SRLPM)
    3.1- Primitive operations
        Prepare domain information
        Classify domain entities
        Derive domain models
        Expand and verify models and classification
    3.2- Overview of domain analysis activities
    3.3- Selected activities in the SRLPM which can be automated
4- Underlying Technologies for Automating Domain Analysis
    4.1- Information retrieval systems
    4.2- Artificial intelligence
    4.3- Code static and dynamic analysis tools
    4.4- Interface environments
    4.5- CASE tools
5- A Domain Analysis and Reuse Environment (DARE)
    5.1- DARE architecture
    5.2- DARE supported domain analysis process
        Acquire domain knowledge (A2)
        Structure domain knowledge (A3)
        Identify commonalities (A4)
        Generate Domain Models (A5)
    5.3- DARE functional model
    5.4- Architecture components
        Document analysis tools
        Domain expert knowledge extraction tools
        Code analysis tools
        Reuse library tools
        Domain analysis interaction tools
    5.5- Architecture integration
6- Conclusion
References


Executive Summary

Status of phase I research

Domain analysis (DA) holds the key to the systematic, formal, and effective practice of software reuse. Proposed approaches and methods for DA assume that domain information exists and is readily usable. Experience indicates, however, that acquiring and structuring domain information is the bottleneck of DA. This Phase I research report presents the architecture and functional analysis of a support environment to automate parts of the acquisition and structuring activities of DA.

This Phase I study assesses the potential for automating DA. Existing techniques and tools, in particular those from information retrieval and expert systems development, provide support for activities in the DA process. Many of these tools can be used immediately, while certain DA activities may require the creation of new tools. There is, therefore, a definite potential for automating parts of DA, provided a basic framework to conduct DA exists.

The framework for conducting DA is provided by a modified RLPM. The RLPM, or Reuse Library Process Model, is a methodology developed by Reuse Inc. for the STARS Program. It emphasizes the early analysis needed for acquiring and structuring domain information. The RLPM converts the ad-hoc nature of DA into a repeatable procedure with well defined, tangible outputs. The modified RLPM presented here organizes the key activities of acquiring and structuring domain information in a way that can be supported by independent but coordinated sets of tools.

The study proposes the Domain Analysis and Reuse Environment (DARE) as a practical and viable support environment for partially automating the early activities of DA. The research reported here demonstrates that, although DA is a difficult and complex process, several of its activities consist of small independent steps that can be automated, thus reducing the complexity of DA to an interactive activity of grouping and organizing the outputs of these small steps. This report shows the main components of a DARE architecture and how these components interact through data and control flows. It also describes the specific tools required to implement DARE.

Research results

The technical objectives of this study have been two-fold: to determine whether, and to what extent, current domain analysis technology supports a reuse-based, domain-specific software development paradigm, and to determine the potential for automating domain analysis activities. Meeting both objectives will accelerate a paradigm shift toward domain-specific, reuse-based development.

    The research objectives focus on providing answers to the following key questions:


    1- Is it Possible and Feasible to Automate Domain Analysis?

An assessment of the domain analysis activities proposed in the STARS Reuse Library Process Model [Prie91a] was conducted. The study analyzed and selected activities from the perspective of potential automation and confirmed that it is possible and feasible to automate DA. Existing software tools and techniques were identified and associated with selected activities in a modified domain analysis model. Candidate tools and techniques were also drawn from areas outside the software engineering domain, such as information retrieval and artificial intelligence.

    2- What Are the Physical, Human, and Technological Limitations to DA?

It was found that certain activities in the domain analysis process are human-intensive and difficult to automate with current technology. These activities were assessed to determine their limitations regarding automation, as well as their human and technical requirements. While some activities may be automated only partially, others may be impossible to automate. Knowledge abstraction, concept association, and concept unification currently fall in the impossible-to-automate category. Model formation, architecture development, and knowledge organization (i.e., classification) can be partially automated. Text processing, vocabulary development, and several intermediate activities can be automated.

    3- What Existing Tools and Techniques Can Be Adapted for Domain Analysis?

Several tools were surveyed and analyzed against the activities defined by a modified domain analysis process derived from the RLPM. The study selected, from the identified tools and techniques, those requiring the least adaptation effort to perform DA activities.

In the IR realm, PLS, a commercial IR vendor, offers a library of reusable components that we might use to construct the text processing parts of DARE. The code components in the book Information Retrieval: Data Structures and Algorithms [FB92] could also be used. The tools in the Unix/C environment [FFN91], described below, could be used for the reverse engineering part of DARE. Other tools, such as those for plagiarism detection, might also be used. AI methods for knowledge acquisition and representation were also evaluated.

    4- What New Tools and Techniques Are Required to Do DA?

While some activities in domain analysis can be related to existing tools, others are unique and require tools or approaches yet to be created. One of the objectives of the study was to identify which DA activities require special tools or techniques not currently available. DA tasks requiring such tools include: bounding and defining the domain; designing interviews and questionnaires for extracting domain knowledge from experts; identifying and abstracting objects and operations from processed text; and some of the activities conducted during domain model development. It was found that, in spite of the absence of such tools, the availability of tools that support several of the other activities in DA, when properly integrated, facilitates the DA process.

    5- Can Tools and Techniques for DA Be Integrated in a Support Environment?


The answer is yes. This study presents a process model and an architecture to support DA. The DARE environment is an integrated collection of tools that support domain knowledge acquisition and structuring, as well as commonality analysis, model development, and reuse. DARE is a highly interactive environment designed to facilitate the intelligence intensive activities typical of DA. DARE supports all parts of the domain analysis process: knowledge acquisition, concept abstraction, classification, library population, and specification of reusable software. Other features include library functions for search and retrieval, capture and analysis of reuse metrics, and interfaces to other software development environments.

Report overview

The Phase I research effort shows that it is possible to partially automate DA and that this can be done through a well orchestrated collection of tools operating under a well defined process model. The following tasks were undertaken to demonstrate that DARE can be developed successfully.

Task 1- Survey and analysis of current domain analysis methods

Existing domain analysis methods and approaches were surveyed. The methods surveyed include SEI's FODA [Kang90], MCC's IDeA heuristics [Luba88], SPC's domain analysis process [Jawo90], IBM's product-oriented paradigm [McCa85], and Arango's learning system approach [Ara88]. The survey includes a listing of main features and a discussion of similarities and differences.

Task 2- Detailed analysis of STARS Reuse Library Process Model

The activities of domain analysis were analyzed to determine their suitability for automation. The analysis included each of the four activities proposed in the SRLPM approach and their derived subactivities. Each activity is decomposed into several levels of detail. The analysis determined to what extent these fine grained activities can be automated. The outcomes of this analysis include:

- A description of each low level activity of domain analysis in the STARS Reuse Library Process Model.
- A discussion of the potential for automation: how these activities relate to existing tool capabilities and how feasible it is to adapt tools for these tasks.

Task 3- Selection of domain analysis activities with potential for automation

The purpose of this task was to select the DA activities that could be automated and to develop a revised DA process model that integrates them into a coherent and rational process implementable by a coordinated collection of existing tools.

Task 4- Evaluation of tools and techniques that meet domain analysis requirements

The objective of this evaluation was to assess the availability of the technology supporting DA activities. We surveyed automated reuse library systems and technology, CASE technology provided by the Unix/C environment, and IR systems such as PLS, which we might use to construct the text analysis portion of DARE. We evaluated the utility of each of these kinds of tools to support the domain analysis processes in our model.

    Task 5- Propose and specify a domain analysis and reuse support environment

Tasks one through four prepared the ground and built the basis for specifying a domain analysis and reuse environment (DARE). Although DARE could have been proposed without going through the first four tasks, a careful assessment of the state of the art and of existing technology was necessary if a realistic and practical environment was to be proposed. Tasks one through four thus comprise a structured research plan to determine the feasibility of DARE and to provide the information necessary to decide whether to pursue such an environment and what level of automation to expect.

The DARE architecture (see figure 3) consists of a user interface, a domain analysis support environment, and a software reuse library. Selected COTS (commercial off-the-shelf) tools and tools specially designed to support reuse-based domain-specific development are shared across these components.

Summarizing the contents of this report: section one defines the software reuse problem and domain analysis. Section two gives a historical overview of domain analysis, including a summary of the major domain analysis approaches and their similarities and differences. Section three presents the role of domain analysis in the STARS reuse library process model. Section four surveys the underlying technologies for automating domain analysis, including information retrieval (IR), artificial intelligence (AI), static and dynamic analysis of software, interface environments, and CASE tools. Section five presents the DARE architecture, providing a model and explanation of the processes and tools in the model. Section six presents our conclusions.


    1- Introduction

Domain analysis has become a topic of significant interest in the reuse community. Domain analysis holds the key to the systematic, formal, and effective practice of software reuse. Unfortunately, domain analysis is still an emerging technology and is practiced informally. There is a definite opportunity, however, for automating parts of the domain analysis process. The domain analysis methodology proposed in the STARS Reuse Library Process Model [Prie91a] can be used as a framework to identify parts of the process that can be automated by adapting existing tools and techniques. The opportunity for automation presented in this study takes the form of a Domain Analysis and Reuse Environment (DARE). This section presents the reuse problem and its relation to domain analysis.

    1.1- The software reuse problem

One of the reasons software reuse has remained an elusive goal is the recurrent emphasis on reusing code. Software reuse is still far from realizing the idea of a software industry based on interchangeable standard parts first proposed by Doug McIlroy over 20 years ago [McIl69]. Reuse involves more than just code. It involves organizing and encapsulating knowledge and experience, and setting up the mechanisms and organizational structures to make them available for reuse.

Software reuse is sensitive to several factors that make the simple hardware analogy of software ICs [Cox86] difficult to apply. The context of the application domain, for example, plays a critical role in the "reusability" of software. Software cannot be successfully reused in all domains. The reality is that narrow, well understood application domains based on stable technologies and standardized architectures, such as compilers (e.g., Lex and YACC) and database systems [Bato88], have demonstrated the significant leverage that can be achieved with high level reuse. It is not simply a matter of going out into the field and gathering up components to populate a repository. Casually assembled libraries are seldom the basis of a high payoff reuse system. A reuse library offers considerably more value when its collections consist of integrated packages of reusable knowledge from a particular domain than when they consist of isolated and relatively independent code components. A domain model in the form of a high level architecture, for example, offers the potential reuser a basic structure from which to start building a new system. Each element of the architecture can be implemented from library components specially designed to meet the architecture requirements.

    There is a need, therefore, to focus reuse on all the products of the software development process such as requirements, specifications, designs, code, and test cases and plans. The highest payoff is achieved by reusing high level representations of software products like requirements and designs [Gilr89]. If we are able to reuse an existing software design then we should be able to reuse its code implementation. We should, therefore, focus on the process of capturing, organizing, and encapsulating such requirements and designs for reuse.


1.2- Domain analysis

Domain analysis is a process by which information used in developing software systems is identified, captured, and organized with the purpose of making it reusable [Prie90]. In simpler terms, domain analysis is similar to systems analysis, but instead of being performed on a single system it is done for multiple related systems. During software development, information of several kinds is generated, from requirements analysis to specific designs to source code. One of the objectives of domain analysis is to make this information readily available. In making a reusability decision, that is, in trying to decide whether or not to reuse a component, a software engineer has to understand the context that prompted the original designer to build the component the way it is. Making this development information available gives a reuser leverage in making reuse more effective.

A definite improvement in the reuse process results when we succeed, through domain analysis, in deriving common architectures, generic models, or specialized languages that substantially leverage the software development process in a specific problem area. How do we find these architectures or languages? By identifying features common to a domain of applications, selecting and abstracting the objects and operations that characterize those features, and creating procedures that automate these operations. These intelligence-intensive activities typically take place after several systems of the "same kind" have been constructed, when it is decided to isolate, encapsulate, and standardize certain recurring operations. This is the very process of domain analysis: identifying and structuring information for reusability.

A formal methodology and environment for domain analysis is needed. Unfortunately, domain analysis is often conducted in an ad-hoc manner, and success stories are more the exception than the rule. Typically, knowledge of a domain evolves naturally over time until enough experience has been accumulated and several systems have been implemented. Only then can generic abstractions be isolated and reused. This natural domain evolution takes a long time, while the demand for software applications increases at a faster rate. There is a need, therefore, to accelerate the domain maturation process.

2- Survey of Domain Analysis Methods

Figure 1 below provides a historical perspective on the main domain analysis developments. It starts in 1980, when Neighbors introduced the concept of domain analysis as a key activity to enable reuse practice. The figure shows that several efforts spawned from his original ideas. These efforts range from the highly practical CAMP experience [Cam87] to the more theoretical work of Arango [Ara88]. The rectangle labeled Raytheon represents the reuse program established by the Raytheon Missile Systems Division [Lane79]. The Raytheon experience is an often quoted success story of institutionalized reuse, and its success is attributed to a thorough analysis of their application domain, business applications. They observed that 60% of all business application designs and code were redundant and could be standardized and reused. A reuse program was then established to analyze their existing software and to exploit reuse.


    Over 5000 production COBOL source programs were examined and classified. Three major module classes were identified: edit, update, and report. They also discovered that most business applications fall into one of three logic structures or design templates (i.e., domain architectures). These logic structures were standardized and a library was created to make all classified components available for reuse. Several modules were also redesigned to fit the standard logic structures. New applications became slight variations of the standard logic structures and were built by assembling modules from the library. Programmers were trained to use the library and to recognize when a logic structure could be reused. The report quotes an average of 60% reused code in their new systems and a net 50% increase in productivity over a period of six years.

    The remaining efforts represented by bubbles in Figure 1 are discussed in detail below.

    Figure 1- Domain Analysis Partial Time Chart

    2.1- Historical perspective

    The term domain analysis was first introduced by Neighbors [Neig80] as "the activity of identifying the objects and operations of a class of similar systems in a particular problem domain." During his research with Draco, a code generator system that works by integrating reusable components, he pointed out that "the key to reusable software is captured in domain analysis in that it stresses the reusability of analysis and design, not code." Neighbors later introduced the concept of "domain analyst" [Neig84] as the person responsible for conducting domain analysis. The domain analyst plays a central role in developing reusable components. The Draco system was the first successful demonstration of the feasibility of domain-specific reuse-based software development.


The Common Ada Missile Packages (CAMP) Project [Cam87] extended Neighbors' ideas into larger systems. The CAMP Project is the first explicitly reported domain analysis experience, and its authors acknowledge that "[domain analysis] is the most difficult part of establishing a software reusability program". Neither Neighbors nor the CAMP project address the issue of how to do domain analysis. Both focus on the outcome, not on the process.

McCain [McCa85], from IBM Federal Systems Division, Houston, TX, made an initial attempt at addressing this issue by integrating the concept of domain analysis into the software development process. He proposed a "conventional product development model" as the basis for a methodology to construct reusable components. The main concern in this approach is how to identify, a priori, the areas of maximum reuse in a software application. McCain developed his model into a standard practice within IBM.

Drawing in part from the above experiences, Prieto-Díaz [Prie87] proposed a more cohesive procedural model for domain analysis. This model is based on a methodology for deriving specialized classification schemes in library science [Prie91b]. In deriving a faceted classification scheme, the objective is to create and structure a controlled vocabulary that is standard not only for classifying but also for describing titles in a domain specific collection. This method was successfully applied at GTE Government Systems. The Prieto-Díaz method was later updated and revised for the STARS Reuse Library Process Model [Prie91a, Prie91c]. This method is a substantial modification of the earlier approach. The emphasis is on the analysis aspect, especially on knowledge acquisition and knowledge structuring. This newer version of the Prieto-Díaz method is presented as a SADT model with potential for partial automation.

Synthesis is a software development method and support environment developed by the Software Productivity Consortium (SPC). Synthesis is based on the concept of program families [Parn76] and proposes the engineering of domains to enable application generators. The Synthesis domain analysis process was first proposed in a report by Jaworski [Jawo90]. It is based mainly on object oriented concepts [Coad89] with emphasis on domain design and implementation. The report includes an example of domain analysis on the SOCC (Satellite Operations Control Center) domain. The example shows the products of domain analysis, such as the SOCC domain definition, the SOCC taxonomy, and the SOCC stabilities and variations, but falls short of explaining the process used to obtain those products.

More recently, the SEI has proposed the FODA (Feature Oriented Domain Analysis) methodology [Kang90]. FODA adopts several concepts and recommendations from the SPS report [Gilr89], and presents a comprehensive approach based on feature analysis. The method is illustrated by a domain analysis of window management systems; it explains what the outputs of domain analysis are but remains vague about the process used to obtain them.

In the SPS (Software Productivity Solutions, Inc.) report, concepts from Prieto-Díaz' model were integrated with object oriented analysis techniques into a more complete approach to domain analysis. The SPS report adds object orientation to the process of creating a domain architecture and to the creation of reusable components. The suggested method remains very general about the analysis aspect, but very specific about the creation of reusable Ada components. Its authors conclude that knowledge acquisition, knowledge-based guidance, data storage, retrieval, and environment integration are the key factors for automating domain analysis.

IDeA, Intelligent Design Aid, is an experimental reuse based design environment developed by MCC [Luba88]. It supports reuse of abstract software designs. IDeA provides mechanisms that help users select and adapt design abstractions to solve their software problems. IDeA and its successor, ROSE-1, were created as proof-of-concept tools to demonstrate reuse of high level software workproducts other than source code.

Arango [Ara88] focuses on the theoretical and methodological aspects of domain analysis. He argues for explicit definitions of the objectives, mechanisms, and performance of a reuse system as a context for comparing and evaluating domain analysis. His view of software reusability is that of a learning system in which domain analysis is an ongoing process of knowledge acquisition, concept formation, and concept validation. The changing-requirements syndrome in software development is seen as a natural learning process, resolved through an evolving infrastructure that receives its input from domain analysis.

    Domain analysis and domain modeling have become topics of significant interest in the software engineering community. A recommendation from the 1987 Minnowbrook Workshop on Software Reuse [AM88] suggested "concentrating on specific application domains (as opposed to developing a general reusability environment)." Soon thereafter, the Rocky Mountain Workshop on Software Reuse [RM87] acknowledged the lack of a theoretical or methodological framework for domain analysis. More recent workshops have addressed domain analysis and domain modeling directly [DA88, RP89, DM89]. The most recent was the Domain Modeling Workshop at the 13th ICSE, Austin, TX [DM91] where several approaches to domain modeling and domain analysis were presented.

    Other related work includes tools that were originally designed for other purposes and turned out to be supportive of domain analysis. In this category are Batory's Genesis system for constructing database management systems [Bato88], CTA's KAPTUR (Knowledge Acquisition for the Preservation of Tradeoffs and Understanding Rationales) system for analyzing software systems [Bail91], AT&T's LaSSIE software information system [Deva91], and MCC's DESIRE (Design Recovery) tool [Bigg89]. These tools present a broad spectrum of techniques and approaches for automating certain aspects of domain analysis.

    2.2- Domain analysis methods

Several approaches to domain analysis have emerged in the last few years. Three have been selected to illustrate the differences in objectives, methods, styles, and products.

Prieto-Díaz Approach

The Prieto-Díaz approach was developed for the STARS S increment as part of a model for reuse libraries [Prie91a]. It is based on methods for deriving classification schemes in library science and on methods for systems analysis. The process is a "sandwich" approach where bottom-up activities are supported by the classification process and top-down activities by systems analysis. The objective is to produce a domain model in the form of a generic architecture or standard design for all systems, or their instantiations, in the domain. Such models provide a common basis for writing requirements for new systems in the domain. In other words, requirements for new systems are based on, or derived from, the domain model, thus ensuring reuse at the design level. To guarantee such reuse, low level components must act as building blocks for composing a skeleton design or architecture. This is accomplished by the bottom-up identification and classification of low level common functions and by standardizing their interfaces.

During the top-down stage, high level designs and requirements of current and new systems are analyzed for commonality. The outcome includes a canonical structure common to all systems in the domain, identification of stable and variable characteristics, a generic functional model, and information on the interrelationships among the structure elements. During the bottom-up stage, low level requirements, source code, and documentation from existing systems are analyzed to produce a preliminary vocabulary, a taxonomy, a classification structure, and standard descriptors.

The outcomes of both approaches are then integrated into reusable structures. This integration process consists of associating the products of the bottom-up analysis with the structures derived by the top-down analysis. Standard descriptors, for example, represent elemental components, either available or specified, using a standard language and vocabulary. Low level components for the generic architecture are defined with these standard descriptors. The result is a natural match between high level generic models and low level components, where the domain models can be used as skeleton guides in the construction of new applications.
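As a rough illustration of the bottom-up step, the following sketch derives a preliminary vocabulary from requirement statements by keyword extraction and synonym folding. It is our own minimal Python rendering, not part of the published method; the stopword list and synonym table are illustrative assumptions.

    import re
    from collections import Counter

    # Illustrative stopword list and synonym table; in practice these
    # come from IR tooling and from the domain analysis team.
    STOPWORDS = {"a", "an", "the", "in", "of", "to", "and", "for", "on"}
    SYNONYMS = {"find": "locate", "search": "locate", "record": "entry"}

    def keywords(statement):
        # Extract lowercase terms, drop stopwords, fold synonyms together.
        terms = re.findall(r"[a-z]+", statement.lower())
        return [SYNONYMS.get(t, t) for t in terms if t not in STOPWORDS]

    def preliminary_vocabulary(statements):
        counts = Counter()
        for s in statements:
            counts.update(keywords(s))
        return counts.most_common()

    reqs = ["Locate line identifiers in data table",
            "Search records in the symbol table",
            "Delete an entry from the table"]
    print(preliminary_vocabulary(reqs))

Frequent terms become candidates for the controlled vocabulary; the analyst then groups them into classes and facets.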

FODA

Feature Oriented Domain Analysis (FODA) is a domain analysis methodology developed by the Software Engineering Institute [Kang90]. The FODA method is based on identifying features common to a class of systems. It is the product of studying and evaluating several DA approaches. Although based mainly on object oriented techniques, it borrows significantly from other approaches such as Prieto-Díaz' faceted approach, SPS' Ada-based approach, and MCC's DESIRE design recovery tool.

The FODA method defines three basic activities: context analysis, domain modeling, and architecture modeling. During context analysis, domain analysts interact with users and domain experts to bound and scope the domain and to select sources of information. Domain modeling produces a domain model in multiple views. The domain analyst proposes the domain model to domain experts, users, and requirements analysts for review. The resulting model includes four views: a features model, an entity-relationship model, a dataflow diagram model, and a state-transition model. A standard vocabulary is also produced during domain modeling. During architecture modeling, the domain analyst produces an architectural model that consists of a process interaction model and a module structure chart. The objective of the architectural model is to guide developers in the construction of applications and to provide mappings from the domain models to the architecture.

The FODA report [Kang90] illustrates the process using the window management systems domain. The example shows in detail how each of the products of domain analysis is derived. It includes textual descriptions of the domain definition and scope, structure and context diagrams, an E-R model, feature models, functional models, and a domain terminology dictionary. The example also explains how to use functional models for system simulation, using the Statemate tool for the functional model.

    IDeA

IDeA, Intelligent Design Aid, is an experimental reuse based design environment developed by MCC [Luba88]. It supports reuse of abstract software designs. IDeA provides mechanisms that help users select and adapt design abstractions to solve their software problems. A domain analysis methodology was developed to reduce the effort required to identify, select, and characterize designs for the IDeA library. The process is divided into domain analysis and domain engineering. Domain analysis, as in Prieto-Díaz and FODA, deals with identification of operations, data objects, properties, and abstractions, but focuses on their application for designing solutions to problems in the domain. A "problem solution" is then used to generate specific software designs by applying domain engineering techniques.

    The IDeA method consists of three major steps: analysis of similar problem solutions, analysis of solutions in an application domain, and analysis of an abstract application domain. There are specific heuristics for conducting each step. The objective is to characterize generic solutions to common problems in a domain and to provide a reasonable mapping between problems and solutions to make reuse practical. It uses design schemes as mechanisms for mapping problems to solutions, and identifies activities in other domains that are common or similar to the ones in the domain of interest. This approach covers vertical (within a domain) and horizontal (across domains) reuse. The first two steps are aimed at identifying vertical reusable components while the goal of the third step is to find horizontal components.

    2.3- Summary of main features

The essential features of the domain analysis methods discussed can be summarized in four basic activities: acquire domain information, analyze and classify domain entities, structure domain knowledge, and generate reusable structures. We analyzed the activities of each method and grouped them into these four categories.

1- Acquire domain information

Information acquisition activities include: study the domain, describe and define the domain, prepare domain information for analysis, and perform a preliminary high-level functional analysis of common domain activities. One objective of the acquisition activities is to bound and scope the domain and to provide specific information for estimating the cost and level of effort required for performing domain analysis.

    11

  • _______________________________________________________________________ reuse inc.

2- Analyze and classify domain entities

The focus of this activity is to identify specific low level functions and objects, or common features, derived from legacy systems, existing documentation, and future requirements. The objective is to classify these entities into a standard framework. The framework may take the form of a taxonomy, a semantic net, or a features model.

3- Structure domain knowledge

The purpose of this activity is to associate common functions with system components. A preliminary domain architecture is proposed to define high level system components. This high level architecture is refined by decomposing system components into more specific functions. The decomposition (i.e., refinement) process is carried out by selecting common functions from the classification or features framework.

4- Generate reusable structures

Generating reusable structures is the process of grouping common functions, attaching them to specific architectural components, and generalizing these specific architectural components. The outcomes are generic reusable structures consisting of standard functions and standard interfaces. These generic structures form a domain architecture in which different implementations of domain features are plug-compatible reusable components.
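Read as a pipeline, the four activities suggest a skeleton like the one below. This is only a hedged sketch of the flow; the function bodies are placeholders, and none of the surveyed methods prescribes this particular decomposition.

    def acquire_domain_information(sources):
        # Bound and scope the domain; gather raw information.
        return {"definition": "...", "information": sources}

    def analyze_and_classify(info):
        # Identify low level functions and objects; fit them to a framework.
        return {"taxonomy": [], "vocabulary": []}

    def structure_domain_knowledge(classification):
        # Map common functions onto a preliminary domain architecture.
        return {"architecture": []}

    def generate_reusable_structures(architecture):
        # Group, attach, and generalize architectural components.
        return {"reusable_structures": []}

    def domain_analysis(sources):
        info = acquire_domain_information(sources)
        classification = analyze_and_classify(info)
        architecture = structure_domain_knowledge(classification)
        return generate_reusable_structures(architecture)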

2.4- Discussion of key activities

Acquiring domain information is the central activity of domain analysis. Success of the remaining activities depends on the quantity, relevance, and quality of the information acquired. Any discussion of automating domain analysis must start with information acquisition.

Analysis and classification of domain entities is usually a bottom-up process of identifying and extracting information about specific functions, mainly from current applications. Classification includes abstraction and clustering to generate classes of functions with common attributes. A top-down approach can also be used: systems specifications and future requirements are analyzed to identify features common to all systems in the domain. Both bottom-up and top-down approaches result in the identification of common basic (i.e., primitive) functions.

Structuring domain knowledge into a domain architecture allows for a mapping of common functions to system components and provides the basis for defining and specifying reusable components. Generating reusable structures is a process of encapsulating architecture components.

For the purpose of automating domain analysis, acquiring domain information and analyzing and classifying that information are essential for developing domain architectures. Current SEE (Software Engineering Environment) technology supports, to some degree, the implementation of an architecture (i.e., requirements) into reusable components (i.e., code), but support for acquiring and structuring domain information is not yet available.

2.5- Potential for automation

There is a definite potential for automating parts of the domain analysis process. An essential prerequisite to automation is a framework of properly structured activities. Such a framework is provided by the STARS Reuse Library Process Model (SRLPM) method for domain analysis.

A key activity in the SRLPM is to prepare domain information, and one of the essential tasks in preparing domain information is knowledge acquisition. It requires "reading" information from several inputs such as technical literature, existing implementations, and current and future requirements. Existing techniques in information retrieval can be used to automatically extract information from these sources. In fact, experience in practicing domain analysis has shown that knowledge extraction is a definite bottleneck in the process. Other proposed domain analysis methods make the unrealistic assumption that knowledge and experience are available and readily usable, giving the impression of a smooth and simple process. Once we get through the knowledge acquisition step, domain analysis is a more tractable problem. Our experience has been, however, that the initial stage of domain analysis (acquiring and structuring knowledge) is the most difficult and time consuming.

To classify domain entities, for example, the SRLPM methodology prescribes keyword extraction, concept grouping, and class definition. Existing tools and techniques from information retrieval and object oriented design can be adopted and integrated to support these steps. There are other very specific subactivities in the methodology, like thesaurus construction, for which automated tools already exist. Extracting knowledge from experts is much more complex and requires human interaction such as interviews and group meetings. There are, however, techniques and support tools for building expert systems that can be adapted for this purpose.

In summary, this study explores and analyzes the feasibility of automating parts of the domain analysis process under the framework of the STARS Reuse Library Process Model, and proposes and specifies a domain analysis support environment that automates parts of the process.
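Thesaurus construction, mentioned above as a subactivity with existing tool support, reduces to a small data structure once the synonym groupings are supplied. A minimal sketch, assuming analyst-supplied entries:

    class Thesaurus:
        """Vocabulary control: synonyms grouped around a single concept."""
        def __init__(self):
            self.preferred = {}   # any term -> its preferred term

        def add_entry(self, preferred_term, synonyms):
            self.preferred[preferred_term] = preferred_term
            for s in synonyms:
                self.preferred[s] = preferred_term

        def normalize(self, term):
            # Unknown terms pass through unchanged, flagging them for review.
            return self.preferred.get(term, term)

    t = Thesaurus()
    t.add_entry("locate", ["find", "search", "look up"])
    print(t.normalize("find"))   # -> "locate"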

3- Domain Analysis in the STARS Reuse Library Process Model (SRLPM)

A Domain Analysis Process Model was developed as part of the SRLPM [Prie91a]. It is based on methods for deriving classification schemes in library science and on methods for systems analysis. The process is a "sandwich" approach where bottom-up activities are supported by the classification process and top-down activities by systems analysis. The domain analysis process is divided into four activities:

1- Prepare domain information (A51)
2- Classify domain entities (A52)
3- Derive domain models (A53)
4- Expand and verify models and classification (A54)

Figure 2 shows a detailed SADT model of how these activities are related, their inputs, controls, and outputs, as well as their respective enabling mechanisms. The domain models produced consist of several partial products, including domain definition, domain architecture, domain classification scheme, vocabulary, functional model, and reusable structures. Inputs are information on recommended and related domains, and existing (i.e., legacy) systems.

[Figure 2- SADT model of the domain analysis process]

    3.1- Primitive operations

Prepare domain information

The tasks in A51 (see figure 2) are to prepare the information needed for domain analysis and to do a preliminary or first-cut analysis. The objectives are to define the domain, to acquire the relevant domain knowledge, and to perform a preliminary high-level functional analysis. The outputs are a definition of the domain, a basic domain architecture, and specific domain knowledge as it relates to building software systems for the domain.

The inputs to A51 include available knowledge from recommended and related domains, and information from existing systems. Domain information includes concepts and theory of domain specific systems normally available in text books, research articles, and company reports. In the domain of flight simulators, for example, concepts and theory include stability and control equations, performance equations, linear algebra transformations, feedback theory and equations, numerical analysis algorithms, and any company specific techniques. Related domains for flight simulators may include, for example, video interfaces, signal processing, and flight control systems. Information from existing flight simulation systems includes requirements documents from previous and current systems, designs, source code, and documentation.

The control inputs for A51 are domain analysis guidelines, company needs, and a statement of purpose. Company needs are stated in an assessment report addressing specific production, budgetary, and market requirements. The statement of purpose states the scope and objective of the domain analysis. A purpose statement should answer questions like the following. Is the purpose limited:

- To help in domain understanding?
- To include development of generic architectures?
- To support building reusable components?
- To support populating a reuse library?
- To support the development of an integrated, reuse-based environment?

The statement of purpose determines the breadth and depth of the domain analysis activity. It guides the domain analysis team in discriminating domain information and in placing the domain in its proper context and relevance.

Classify domain entities

Activity A52, classify domain entities, focuses on the bottom-up analysis. It produces a standard vocabulary, a classification scheme, and a taxonomy of classes. The process is similar to the one used in library science for deriving faceted classification schemes for special collections. Keywords are extracted from input documents, requirement statements, and source code. Classes and facets are identified through a conceptual clustering exercise where common terms are grouped and abstracted. A basic scheme is postulated and then expanded and verified. The final step is the construction of thesauri for vocabulary control. Vocabulary control is achieved by grouping synonyms around a single concept.

The inputs to A52 include specific domain knowledge in the form of functional requirements, documentation, and source code from existing systems, as well as feedback information regarding unclassified entries. Unclassified entries are components that cannot be classified with the current classification scheme. This information is used to update and expand the classification scheme.

The outputs are a faceted classification scheme and a basic taxonomy (a taxonomy can also be seen as an inheritance structure with some entity-relationship model characteristics, like aggregation and generalization). The classification scheme includes a controlled vocabulary and facet definitions. Together, taxonomic classes and facets form a classification structure. The classification scheme generates classification templates in the form of standard descriptors. These standard descriptors are the basic conceptual units that form the interface between domain architectures and reusable components. Standard descriptors are high level mini-specs for a class of components. In the UNIX tools domain, for example, "locate/identifier/table" is a standard descriptor for a component identified by the statement "Locate line identifiers in data table". The terms "locate", "identifier", and "table" represent concepts in a controlled vocabulary. Standard descriptors can also be represented graphically, as E-R models or semantic nets, thus facilitating component encapsulation and parameterization.

The control inputs to A52 are the domain definition and the domain architecture. Both support conceptual analysis. The domain definition, for example, includes global requirements statements used to select keywords for the controlled vocabulary. The mechanism is the domain analysis team. In its minimal form, it consists of a domain analyst, a domain expert, and a librarian.
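The sketch below shows one way a faceted scheme with standard descriptors might be represented, using the locate/identifier/table example above. The facet names (function, object, medium) follow Prieto-Díaz's published faceted scheme; the vocabulary contents and the rejection rule are illustrative assumptions.

    from collections import namedtuple

    Descriptor = namedtuple("Descriptor", ["function", "object", "medium"])

    VOCABULARY = {
        "function": {"locate", "delete", "format"},
        "object": {"identifier", "expression", "line", "entry"},
        "medium": {"table", "file", "buffer"},
    }

    def classify(function, obj, medium):
        # Terms outside the controlled vocabulary become the "unclassified
        # entries" fed back to expand the classification scheme.
        for facet, term in (("function", function), ("object", obj),
                            ("medium", medium)):
            if term not in VOCABULARY[facet]:
                raise ValueError("unclassified term %r in facet %s" % (term, facet))
        return Descriptor(function, obj, medium)

    # "Locate line identifiers in data table" -> locate/identifier/table
    print("/".join(classify("locate", "identifier", "table")))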

Derive domain models

Activity A53, derive domain models, consolidates the top-down analysis with the bottom-up approach. The objective here is to produce a generic functional architecture or model using functional decomposition as practiced in software systems design. The top level in this decomposition is the preliminary architecture derived in A51 above. The resulting functional model serves as a structure to consolidate the standard descriptors from A52. The idea is to describe or specify low level functions using standard descriptors from the controlled vocabulary, and to associate them with architectural components. The results are layers of functional clusters associated with architecture elements. The core activity in A53, thus, is to assign these functional clusters to architecture units and to define their relationships. What results is a model that supports design and development of new systems by composing reusable components. The output is the generic functional model.

The inputs to A53 include:


1) The classification structure from A52, including vocabulary, classes, and standard descriptors;
2) Specific domain knowledge in the form of global requirements and system commonality information (from A51);
3) Requirements from existing systems; and
4) Feedback information to update and refine the model. This last input is in the form of earlier versions of the model (labeled incomplete models in diagram A5).

The control inputs are the domain definition and architecture produced by A51. The basic architecture is used as a reference for the top-down decomposition. The mechanism is the domain analysis team; in this case, the analyst and an expert are the minimum required.
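A minimal sketch of the consolidation at the heart of A53: descriptors are clustered under the architecture unit that covers their function term. The unit names and the matching rule are our own assumptions, chosen only to make the grouping concrete.

    # Hypothetical architecture units and the functions they cover.
    ARCHITECTURE_UNITS = {
        "text processing": {"locate", "format"},
        "storage management": {"insert", "delete"},
    }

    def assign(descriptors):
        # Cluster descriptors under the unit covering their function term.
        model = {unit: [] for unit in ARCHITECTURE_UNITS}
        unassigned = []
        for desc in descriptors:
            function = desc.split("/")[0]
            for unit, functions in ARCHITECTURE_UNITS.items():
                if function in functions:
                    model[unit].append(desc)
                    break
            else:
                unassigned.append(desc)   # feedback: refine model or vocabulary
        return model, unassigned

    model, leftover = assign(["locate/identifier/table", "delete/entry/file"])
    print(model, leftover)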

    Expand and verify models and classification

    Activity A54 expands and verifies domain models and the classification structure. The objective in A54 is to update the products of domain analysis as new information from current and future systems becomes available.

    Activity A54 illustrates the continuing nature of the domain analysis process. All products of domain analysis are reviewed continuously and remain in a permanent state of evolution. The question of when a domain analysis is complete is still a research question and is not discussed here. For the sake of practicality, any outcome of domain analysis, as discussed in the SRLPM document, is considered usable. The library process model assumes an implicit feedback loop for all its activities and a reviewing process for all its outputs.

New requirements, vocabulary, functional components, and limitations and constraints are extracted from existing and new systems. The classification structure and the functional model are updated to accommodate them. The models are then verified against existing systems: specific designs and requirements from existing and future systems are checked to see whether they are represented by the generic model, that is, whether the model includes all expected instances of systems in the domain.
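This verification can be pictured as a coverage check. In the sketch below, with hypothetical feature sets, a system whose features are not all represented in the generic model signals that the model must be updated.

    # Hypothetical generic model and per-system feature sets.
    GENERIC_MODEL = {"locate/identifier/table", "delete/entry/file",
                     "format/line/buffer"}

    SYSTEMS = {
        "editor A": {"locate/identifier/table", "format/line/buffer"},
        "editor B": {"delete/entry/file", "compress/text/file"},
    }

    for name, features in SYSTEMS.items():
        missing = features - GENERIC_MODEL
        if missing:
            print(name, "not covered by model:", sorted(missing))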

The outputs of this activity are reusable structures. Reusable structures are parts of the generic functional model or parts of the classification structure that have been verified and are complete enough to be reusable. These subsets of the domain models are encapsulated and included in the customized library system to drive the construction process. A reusable structure can be as simple as a standard descriptor (i.e., a requirement statement) for a class of functions or as elaborate as an architecture for a class of systems. An example of the latter is the architecture for a general compiler, which includes a scanner (lexical analyzer), a syntax analyzer, a semantic analyzer, a code generator, and a symbol table handler.

    The inputs to A54 are the generic functional model from A53 and any new information from current and future systems. The control inputs include domain definition and architecture from A51, the classification structure from A52, and abstractions of the generic functional model from A53. These abstractions are used to help identify reusable structures. The mechanism is the domain analysis team.


3.2- Overview of domain analysis activities

A detailed overview of domain analysis activities is shown in Table 1 below. Each indentation level corresponds to a decomposition diagram in the SRLPM document. The four activities described in figure 2 above are decomposed to their lowest levels to identify specific tasks.

Table 1- Specific Domain Analysis Tasks in the SRLPM

Analyze domain
  Prepare domain information (A51)
    Define domain
      - Select relevant information
      - Bound domain
      - Establish global requirements
      - Verify and validate definition
    Obtain domain knowledge
      - Select sources of information
      - Extract domain knowledge
        -- Read
        -- Consult
        -- Study
        -- Learn
      - Review acquired domain information
        -- Discuss
        -- Evaluate
        -- Integrate
        -- Consolidate
    Do high level functional analysis (top-down)
      - Identify major functional units
      - Find interrelationships
      - Specify generic subsystems
      - Classify subsystems
        -- Analyze common features
        -- Group and classify
      - Select graphic representation method
  Classify domain entities (bottom-up) (A52)
    Identify objects and operations
      - Analyze concepts
      - Analyze requirements
      - Extract component descriptors
      - Inspect documentation
      - Decompose statements by keywords
    Abstract and classify
      - Group terms
      - Give names to clusters
      - Arrange by facets
      - Arrange by hierarchy
      - Define standard classification templates
        -- Consult STARS standards
        -- Check conflicts/duplication with other libraries
    Expand basic classification
      - Refine meanings
      - Integrate new classes and terms
      - Group unclassified terms
      - Give names to new clusters
      - Define new templates
    Construct thesauri
      - Find internal synonyms
      - Add external synonyms
      - Form thesaurus entries
      - Verify entries
  Derive domain models (consolidate top-down & bottom-up) (A53)
    - Group descriptors/classes under functional units
    - Review domain models (refine initial functional decomposition)
    - Discover/define new functional units
    - Rearrange structure (result: generic functional model)
    - Select model representations
  Expand models and classification (A54)
    - Apply models to application
    - Identify inconsistencies
    - Update models and classification
    - Define reusable structures

3.3- Selected activities in the SRLPM which can be automated

There are several activities in the SRLPM model that can be automated. Most of them fall in the category of basic and indispensable tasks for domain analysis, as follows.


Extract domain knowledge

Knowledge extraction from text documents can be done automatically using off-the-shelf information retrieval tools. Knowledge extraction from experts requires interviewing and questioning, but their written responses can be processed automatically.

Identify major functional units

Reverse engineering tools, specifically code restructuring and requirements analysis tools, can be used to identify major functional units from legacy systems.

Find interrelationships

Relationships among components and major functional units can also be identified by using reverse engineering tools. Tools that produce call structures and cross referencing information are useful for this task.

Specify generic subsystems

The process of identifying generic subsystems within specific system structures or designs can be assisted with program similarity analysis tools.

Classify subsystems

Subsystem classification can be assisted with the same kind of tools used to find interrelationships.

Identify objects and operations

This task can be automated with information retrieval tools.

Abstract and classify

Conceptual clustering tools and AI knowledge representation techniques can be used to assist in this task.
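As a rough illustration of the kind of computation a clustering step performs, the C sketch below scores invented component descriptors by the number of keywords they share; real conceptual clustering tools use far more sophisticated similarity measures.

    /* Sketch: the kernel of a clustering step, scoring two component
     * descriptors by shared keywords. The descriptors are made up. */
    #include <stdio.h>
    #include <string.h>

    static int shared(const char *a[], int na, const char *b[], int nb)
    {
        int i, j, n = 0;
        for (i = 0; i < na; i++)
            for (j = 0; j < nb; j++)
                if (strcmp(a[i], b[j]) == 0)
                    n++;
        return n;
    }

    int main(void)
    {
        const char *d1[] = { "sort", "array", "ascending" };
        const char *d2[] = { "sort", "list",  "ascending" };
        const char *d3[] = { "open", "file",  "read"      };

        /* Group the pair with the most keywords in common. */
        printf("d1-d2 overlap: %d\n", shared(d1, 3, d2, 3)); /* 2 -> same cluster   */
        printf("d1-d3 overlap: %d\n", shared(d1, 3, d3, 3)); /* 0 -> different ones */
        return 0;
    }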

Construct thesauri

There are off-the-shelf tools to help construct thesauri.

Group descriptors/classes under functional units

Conceptual clustering tools can also be used to assist in this task.

Rearrange structure

Architecture revision can be done semiautomatically with reverse engineering tools.

Define reusable structures

Reusable structures are refinements of previous domain analysis models. A combination of reverse engineering, information retrieval, AI, and conceptual clustering tools can be used to assist in this task.

    4- Underlying Technologies for Automating Domain Analysis

The automation of domain analysis will rely on several basic technologies: information storage and retrieval (IR); artificial intelligence (AI), primarily the subfields of AI concerned with knowledge acquisition and representation; and static and dynamic analysis tools for code. In this section, we review these basic technologies.

    4.1- Information retrieval systems

IR systems are used to automatically index and manage large amounts of documentation. An IR system (see [FB92]) matches user queries (formal statements of information


needs) to documents stored in a database. A document is a data object, usually textual, though it may also contain other types of data such as photographs, graphs, etc. An IR system must support certain basic operations: there must be a way to enter documents into a database, change them, and delete them, and there must also be some way to search for documents and present them to a user. IR systems vary greatly in the ways they accomplish these tasks.

Table 2 is a faceted classification of IR systems, containing important IR concepts and vocabulary. The first row of the table specifies the facets, that is, the attributes that IR systems share. Facets represent the parts of IR systems that tend to be constant from system to system. For example, all IR systems must have a database structure; they vary in the database structures they use: some have inverted file structures, some have flat file structures, and so on. Full explanations of these terms can be found in [FB92].

Terms within a facet are not mutually exclusive, and more than one term from a facet can be used for a given system. Some decisions constrain others: if one chooses a Boolean conceptual model, for example, then one must choose a parse method for queries.

Table 2: Faceted Classification of IR Systems

Conceptual Model   File Structure   Query Operations   Term Operations   Document Operations   Hardware
Boolean            Flat File        Feedback           Stem              Parse                 VonNeumann
Extended Boolean   Inverted File    Parse              Weight            Display               Parallel
Probabilistic      Signature        Boolean            Thesaurus         Cluster               IR-Specific
String Search      Pat Trees        Cluster            Stoplist          Rank                  Optical Disk
Vector Space       Graphs                              Truncation        Sort                  Mag. Disk
                   Hashing                                               Field Mask
                                                                         Assign ID's

Viewed another way, each facet is a design decision point in developing the architecture for an IR system. The system designer must choose, for each facet, from the alternative terms for that facet. A given IR system can be classified by the facets and facet values, called terms, that it has. For example, the CATALOG system [Frak84] can be classified as shown in Table 3:

Table 3: Facets and Terms for CATALOG IR System

Facets                Terms
File Structure        Inverted file
Query Operations      Parse, Boolean
Term Operations       Stem, Stoplist, Truncation
Hardware              VonNeumann, Mag. Disk
Document Operations   parse, display, sort, field mask, assign ID's
Conceptual Model      Boolean
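To make the faceted scheme concrete, the following C sketch shows one possible way to record a classification like Table 3 as facet/term pairs. The facet and term names come from the tables above; the data structure itself is invented for illustration.

    /* Sketch: recording a faceted description like Table 3. */
    #include <stdio.h>

    struct classification {
        const char *facet;
        const char *terms;
    };

    int main(void)
    {
        struct classification catalog[] = {
            { "Conceptual Model",    "Boolean"                    },
            { "File Structure",      "Inverted file"              },
            { "Query Operations",    "Parse, Boolean"             },
            { "Term Operations",     "Stem, Stoplist, Truncation" },
            { "Document Operations", "parse, display, sort, field mask, assign ID's" },
            { "Hardware",            "VonNeumann, Mag. Disk"      },
        };
        int i;
        for (i = 0; i < 6; i++)
            printf("%-20s %s\n", catalog[i].facet, catalog[i].terms);
        return 0;
    }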


IR systems are capable of automatically extracting important vocabulary from text and using it to index documents, in this case reusable software components. Frakes and Nejmeh [Frak88] first proposed using IR systems to classify and store reusable software components. They discussed the use of CATALOG for this purpose, and defined the types of indexing fields that might be useful. Since then, several other uses of IR systems as reuse libraries have been reported (see [Frak90] for a review). One such system of special interest is GURU [Maar91]. GURU uses simple phrase extraction techniques to automatically derive two-word phrases from text. Both individual keywords and phrases composed of those keywords may be useful for identifying domain vocabulary and concepts in DARE. In terms of Table 3, the key operations that will be needed for automatic vocabulary and concept identification are text parsing, stoplist operations, stemming, and truncation. Text parsing involves breaking the text into its component keywords. A stoplist removes common words, such as articles and prepositions, that carry little content. Stemming is a process of removing prefixes and suffixes from words so that related words can be grouped together; stemming, for example, is capable of conflating variants such as domain and domains into a single concept. Truncation is essentially manual stemming. Truncation will be a useful feature in the search portion of DARE, since it will help users search using related keywords.
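The C sketch below walks through these operations (parsing, stoplist filtering, and crude suffix-stripping stemming) on text read from standard input. The stoplist and suffix rules are minimal invented stand-ins, not the actual techniques a system such as CATALOG or GURU uses.

    /* Sketch: IR-style term extraction with a small built-in stoplist
     * and a crude suffix stripper (not a full Porter-style stemmer). */
    #include <stdio.h>
    #include <ctype.h>
    #include <string.h>

    static const char *stoplist[] = { "the", "of", "and", "a", "to", "in", "is", NULL };

    static int stopped(const char *w)
    {
        int i;
        for (i = 0; stoplist[i] != NULL; i++)
            if (strcmp(w, stoplist[i]) == 0)
                return 1;
        return 0;
    }

    /* Conflate simple variants: domains -> domain, sorted -> sort */
    static void stem(char *w)
    {
        size_t n = strlen(w);
        if (n > 4 && strcmp(w + n - 3, "ing") == 0)      w[n - 3] = '\0';
        else if (n > 3 && strcmp(w + n - 2, "ed") == 0)  w[n - 2] = '\0';
        else if (n > 3 && w[n - 1] == 's')               w[n - 1] = '\0';
    }

    int main(void)
    {
        char word[128];
        int c, n = 0;

        while ((c = getchar()) != EOF) {
            if (isalpha(c) && n < 127) {
                word[n++] = (char)tolower(c);   /* parse: accumulate keyword */
            } else if (n > 0) {
                word[n] = '\0';
                n = 0;
                if (!stopped(word)) {           /* stoplist: drop common words */
                    stem(word);                 /* stem: conflate variants */
                    printf("%s\n", word);
                }
            }
        }
        if (n > 0) {                            /* flush the final word */
            word[n] = '\0';
            if (!stopped(word)) { stem(word); printf("%s\n", word); }
        }
        return 0;
    }

Run over a requirements document, such a filter yields a first-cut candidate list of domain vocabulary terms for the analyst to review.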

4.2- Artificial intelligence

Artificial intelligence (AI), the use of computers to do tasks that previously required human intelligence, is a broad field with an immense literature. Of special interest to reuse and domain analysis are the AI subfields of knowledge extraction/acquisition and knowledge representation. All AI systems are constrained by the amount and quality of the knowledge they contain. Builders of AI systems have found that the so-called knowledge acquisition barrier is usually the most difficult problem they must solve in building successful AI systems. Most knowledge acquisition techniques are manual and rely on various interviewing techniques; there are also some automatic techniques based on machine learning. [Hart86] and [Kidd87] provide good summaries of knowledge acquisition techniques. One technique for eliciting knowledge, for example, is to ask the same question in different ways. Say, for example, that an expert is asked to identify important subdomains, but is unable to do so. The interviewer might then ask how the organization is structured, recognizing that organizations are often structured along domain specific lines.

Once knowledge has been acquired, it must be represented in a form that the machine can use to do useful work. Many knowledge representation techniques have been proposed. Some of the more popular are production rules, frames, and semantic nets. All of these techniques have been used to represent reusable software components (see [Frak90] for a review). A semantic net is a directed graph whose nodes correspond to conceptual objects and whose arcs correspond to relationships between those objects. Production rules are perhaps the best known of the knowledge representation formalisms because of their use in


many expert system shells. Production rules might be used to classify reusable components based on attribute-value pairs, as follows:

IF algorithm needed IS a sort
AND sort speed required IS fastest
AND implementation language IS C
THEN sort to use IS quicksort.c
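A minimal C sketch of how such a rule might be evaluated against a set of attribute-value facts follows. The attributes and the quicksort.c component come from the example rule above; the fact table and helper names are hypothetical.

    /* Sketch: evaluating an attribute-value production rule. */
    #include <stdio.h>
    #include <string.h>

    struct fact { const char *attribute, *value; };

    static const char *lookup(const struct fact *facts, int n, const char *attr)
    {
        int i;
        for (i = 0; i < n; i++)
            if (strcmp(facts[i].attribute, attr) == 0)
                return facts[i].value;
        return NULL;
    }

    int main(void)
    {
        struct fact facts[] = {
            { "algorithm needed",        "sort"    },
            { "sort speed required",     "fastest" },
            { "implementation language", "C"       },
        };
        const char *a = lookup(facts, 3, "algorithm needed");
        const char *s = lookup(facts, 3, "sort speed required");
        const char *l = lookup(facts, 3, "implementation language");

        /* IF all three conditions hold THEN recommend the component. */
        if (a && s && l && strcmp(a, "sort") == 0 &&
            strcmp(s, "fastest") == 0 && strcmp(l, "C") == 0)
            printf("sort to use IS quicksort.c\n");
        return 0;
    }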

    Frames are data structures, composed of slots and fillers, used for knowledge representation. For example,

Sort
AKO :algorithm
operation :ordering
operands :data objects

The slots here are in the left-hand column, and the fillers on the right following the colons. Sort is a special slot which names the frame. AKO, which stands for "a kind of", is commonly used in frame representations. While the knowledge in frames can be accessed and used in many kinds of inferencing, the inferencing technique usually associated with frames is inheritance. In inheritance, one frame inherits slots, and optionally fillers, from another.

Two useful factors to consider when evaluating knowledge representations are representational adequacy and heuristic power. Representational adequacy refers to how much one can express with the representation. A simple list of keywords, for example, has poor representational adequacy because the syntactic and semantic relationships between the keywords are missing. Heuristic power refers to the kinds of inferencing one can do with the representation. Logical inference, for example, is a powerful type of processing possible only with some representations. One appeal of the knowledge based approach to reuse representation is that the representations offer a powerful way of expressing the relationships between system components. This is probably extremely important for helping a user understand the function of components. It may be, for example, that information of the form "component transforms input A to output B under condition X" will be important for expressing knowledge about a code domain.
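As a sketch of frame inheritance, the C program below represents the Sort frame above with an AKO link to a hypothetical algorithm frame, and resolves missing slots by walking the AKO chain. The slot and frame names beyond the Sort example are invented.

    /* Sketch: frames with slots, fillers, and AKO inheritance. */
    #include <stdio.h>
    #include <string.h>

    #define MAX_SLOTS 8

    struct frame {
        const char *name;
        const struct frame *ako;        /* "a kind of" parent frame */
        const char *slots[MAX_SLOTS];
        const char *fillers[MAX_SLOTS];
        int nslots;
    };

    /* Look up a slot; inherit along the AKO chain when it is missing. */
    static const char *get_slot(const struct frame *f, const char *slot)
    {
        for (; f != NULL; f = f->ako) {
            int i;
            for (i = 0; i < f->nslots; i++)
                if (strcmp(f->slots[i], slot) == 0)
                    return f->fillers[i];
        }
        return NULL;
    }

    int main(void)
    {
        struct frame algorithm = { "algorithm", NULL,
            { "implemented-by" }, { "code component" }, 1 };
        struct frame sort = { "Sort", &algorithm,
            { "operation", "operands" }, { "ordering", "data objects" }, 2 };

        printf("operation: %s\n", get_slot(&sort, "operation"));
        /* inherited from the algorithm frame via the AKO link */
        printf("implemented-by: %s\n", get_slot(&sort, "implemented-by"));
        return 0;
    }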


[Deva91] used frames to represent software components from System 75, a large switching system consisting of about 1 million lines of C code. Their reuse system, called Lassie, attempts to support multiple views of System 75: a functional view which describes what components are doing, an architectural view of the hardware and software configuration, a feature view that relates basic system functions to features such as call forwarding, and a code view which captures how code components relate to each other. Their taxonomy is based on four categories: object, action, doer, and state. For example, a frame using this taxonomy might describe an object called a user-connect-action which is both a network-action and a call-control-action having a generic process as an actor, that attempts to move from a call-state to a talking-state by using a bus-controller. One interesting aspect of such a scheme is the way it allows the relationships among the various conceptual parts of a system to be made explicit. In addition to the domain specific information about System 75, Lassie also stores information about the environment (UNIX and C) used to develop the system. Lassie also uses a natural language interface as part of its query facility.

Systems such as Lassie demonstrated that AI can be used to support reuse and domain analysis, but they also showed again the problems associated with knowledge acquisition for such systems. Lassie's authors managed to represent only a small part of System 75, and most of that had to be done manually. Getting enough of the System 75 engineers' time to capture the knowledge and validate the results was also a practical problem.

4.3- Code static and dynamic analysis tools

One important kind of knowledge about software systems is derived by static and dynamic analysis of code. Static analysis tools analyze code before execution and provide information about program structure and attributes. Dynamic analysis tools are used to monitor the runtime behavior and performance of code. There are many such tools available for various languages and programming environments. We will use the UNIX/C environment for purposes of our discussion. See [FFN91] for a fuller discussion of this topic and of the tools that follow.

Cf and cflow produce C system function hierarchies. Such information takes the form

function1
  function2
    function3
      ...
        function-n

which says that function1 calls function2, which calls function3, and so on. This information can be used for a variety of purposes, including identifying potentially reusable components and calculating reuse metrics [Frak92].

Another important class of static analysis tools computes software metrics, i.e., quantitative indicators of software attributes. Many such metrics have been reported in the literature [CDS86]; many of them measure software complexity. ccount, for example, computes simple metrics such as NCSL and CSL (non-commentary and commentary source lines) and their ratios. Another important source of static information is make: information from the make utility can be used to determine structure at the file level.

Cscope parses C code and builds a cross reference table that allows the following kinds of information to be reported:

List references to this C symbol:
List functions called by this function:
List functions calling this function:
List lines containing this text string:
List file names containing this text string:
List files #including this file:


An even more powerful static analysis tool is CIA [CNR90], a tool that extracts from C source code information about functions, files, data types, macros, and global variables, and stores this information in a relational database. This information can then be accessed via CIA's reporting tools, by awk, or by other database tools. One type of information that CIA captures, for example, is the calling relationship among functions. CIA's reporting tools can then be used, with other graphics tools, to generate a graphical representation of the information in the database. Such graphical representations can be used as preliminary domain architectures during domain analysis. Some of the types of information CIA output might be used to derive are:

Software metrics - CIA can be used to compute many software metrics. Since C program objects are explicitly stored, it is obviously possible to count them. More sophisticated metrics can also be generated. A measure of the coupling between two functions can be calculated, for example, by counting the number of program objects jointly referenced by the two functions.

Program version comparisons - Two versions of a program can be compared by looking at differences in the CIA databases for the versions. This comparison can reveal declarations created, deleted, or modified, and changes in relationships among program objects. This differs from the UNIX diff command, which only compares lines.

Reuse - The information CIA produces about which functions are most used by other functions could be used to identify reusable components.
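As a sketch of this reuse use, the C program below computes a fan-in count for each callee from a hypothetical table of (caller, callee) records of the kind a calling-relationship database might export; functions with high fan-in are flagged as reuse candidates. The data and threshold are invented.

    /* Sketch: flagging heavily used functions as reuse candidates
     * from (caller, callee) records. All data here is made up. */
    #include <stdio.h>
    #include <string.h>

    struct call { const char *caller, *callee; };

    int main(void)
    {
        struct call db[] = {
            { "main",   "parse"  }, { "main",  "report" },
            { "parse",  "stem"   }, { "index", "stem"   },
            { "report", "stem"   },
        };
        int ndb = 5, i, j;

        for (i = 0; i < ndb; i++) {
            int fanin = 0, seen = 0;
            for (j = 0; j < i; j++)          /* report each callee once */
                if (strcmp(db[j].callee, db[i].callee) == 0)
                    seen = 1;
            if (seen)
                continue;
            for (j = 0; j < ndb; j++)
                if (strcmp(db[j].callee, db[i].callee) == 0)
                    fanin++;
            printf("%-8s fan-in %d%s\n", db[i].callee, fanin,
                   fanin >= 3 ? "  <- reuse candidate" : "");
        }
        return 0;
    }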

Dynamic analysis tools are used to investigate the runtime behavior of software. They are used to find and remove bugs, and to measure the execution speed of programs and program components. Such measurement is used to see if a component meets requirements, and/or if the component needs to be optimized. Several tools are available in the UNIX/C environment for measuring execution speed. The time tool reports on program CPU usage. For example, the command

time who

will produce as output:

real 0m2.91s
user 0m0.25s
sys  0m0.30s

This information shows that 2.91 seconds of elapsed clock time passed during execution of the who command, that 0.25 seconds of CPU time were spent in the who program itself, and that the kernel spent 0.30 seconds executing the command. The prof utility can be used to measure the time each function in the system takes to execute. When code is compiled with profiling enabled (the -p option to cc), a file called mon.out is generated during


    execution. This file contains data correlated with the object file and readable by prof to produce a report of the time consumed by individual functions in the program.

    When run on an example program, prof produced the following report:

%Time  Seconds  Cumsecs  #Calls  msec/call  Name
 50.0     0.02     0.02       1        17.  _read
 50.0     0.02     0.03       8         2.  _write
  0.0     0.00     0.03       2         0.  _monitor
  0.0     0.00     0.03       1         0.  _creat

    This report shows that the function read was called once, and this call took 17 milliseconds, about 50% of the total execution time for the program. The functions monitor and creat show zero execution times because they used amounts of time too small for prof to measure. Such information might be used in the analysis of real time domains where execution efficiency is critical. Designers of reusable components have reported that component efficiency is a primary factor in their acceptance by users. Thus, dynamic analysis tools will play a key role in a reuse and domain analysis environment.
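Where profiling instrumentation is unavailable, coarser timing can be done directly in code. The following C sketch, a minimal stand-in rather than a real dynamic analysis tool, times an invented work function using the standard clock() routine.

    /* Sketch: per-function timing in the spirit of prof, using the
     * standard C clock() routine around a function under test. */
    #include <stdio.h>
    #include <time.h>

    static long work(long n)             /* stand-in for a component */
    {
        long i, sum = 0;
        for (i = 0; i < n; i++)
            sum += i % 7;
        return sum;
    }

    int main(void)
    {
        clock_t start = clock();
        work(10000000L);
        printf("work: %.3f seconds of CPU time\n",
               (double)(clock() - start) / CLOCKS_PER_SEC);
        return 0;
    }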

    4.4- Interface environments

Environments for building high quality bit-mapped interfaces have proliferated in the past few years. NeWS and X based environments are the most common, with X based environments becoming the standard. The X environment is, in fact, a good example of successful horizontal reuse. Many powerful tools have been written on top of X, including higher level toolsets such as Motif and interface generators such as TAE from NASA. These tools make it relatively easy to develop high quality window based environments with graphics.

    A high quality interface will be very important for DARE since so much data input and manipulation will be required. One of the key challenges in developing DARE is to identify good interface strategies and data representations. We will probably use an X based interface development environment because of its power and portability across platforms.

    4.5- CASE tools

A CASE (computer aided software engineering) toolset supports the activities of software engineering through analysis, software views, and repositories. The UNIX programmer's workbench (PWB), containing tools such as cflow and CIA, is an example of a set of coordinated tools of this type. There are also many commercial CASE tools on the market. Some that we may consider for DARE are Cadre's Teamwork, Interactive Development Environments' STP (Software through Pictures), and SoftBench from Hewlett-Packard. These tools, like the UNIX PWB, provide support for reverse engineering software, which will be a good source of knowledge about a domain.


    5- A Domain Analysis and Reuse Environment (DARE)

A domain analysis and reuse environment is presented in this section. The objective of DARE is to support partial automation of the domain analysis process. The focus is on knowledge acquisition and knowledge structuring activities, and the environment is based on currently available tools from IR, AI, and reverse engineering.

    5.1- DARE architecture

A complete high level DARE architecture is shown in Figure 3. It consists of a user interface, a domain analysis support environment, a software reuse library, and an environment to support software construction by composition. This Phase I study has focused on the domain analysis support component of the architecture and its interface to the reuse library and other support tools.

    One of the most difficult components is the user interface. The user interface must provide support to a diversity of users: domain analysts, domain experts, systems designers, software engineers, and librarians. Each user should be able to interact with their specific tools through a uniform and standard look and feel. The reuse library is a common repository of domain specific software assets. Assets are produced by the domain analysis support side and consumed by the software construction support side. Software assets include domain specific architectures, generic designs, requirements, code components, test scripts, and any other software work product with potential for reusability. Special effort will be made to use existing STARS reuse libraries as part of the DARE library component.


[Figure 3- High Level DARE Architecture: the domain analysis support and software construction support environments share a common reuse library and a set of COTS & special tools; the user interface provides interfaces and support for the domain analyst, domain expert, systems designer, software engineer, and librarian.]

The domain analysis support part of DARE supports specific domain analysis activities. A common interface is proposed to integrate existing and new tools. The outcomes of these tools are tangible domain analysis products such as domain taxonomies, domain vocabularies, systems architectures, standard designs, software specifications, reusable code components, and specifications for new components.

    The software construction support would enable users to select library assets for building new systems. A domain architecture, for example, which serves as a framework to search the library, could be used to select components explicitly designed to fit parts of the architecture. Similarly, standard designs could be used for selecting the appropriate reusable code components. The interface could also allow other environments to use DARE's facilities.

    DARE is not intended to support tasks already covered by existing software development environments. Such tasks include code development (compiling, editing, debugging), project management support, system maintenance, etc. DARE will provide support to systems and software designers in selecting reusable components. The actual reuse-based construction and development of new systems would be conducted in their respective environments.


    5.2- DARE supported domain analysis process

    We have tailored the domain analysis process of section three (SRLPM) to meet the DARE requirements. This customized process pays special attention to the enabling tools and focuses on the early stages of domain analysis. The objective in this section is to describe through a set of SADT diagrams the key activities that DARE will support and to identify the specific tools that can be used to implement these activities. Figure 4 shows the context SADT model. The viewpoint is that of the DARE architect or developer (i.e., those interested in developing a DARE environment).

There are five inputs to the process. The first two, domain knowledge and domain experience, include general, and sometimes informal and undocumented, knowledge about the domain. This knowledge is used primarily for defining and scoping the domain. Although the DARE environment is not intended to support definition and scoping of a domain directly, it will indirectly assist analysts and experts in refining their initial definition. The other three inputs, existing systems, domain related documents, and expert knowledge, are the core inputs to DARE. DARE will rely on written documentation as the source for knowledge acquisition and will use text analysis techniques for abstracting and structuring domain knowledge.

    The domain analysis process is guided (controlled in SADT terminology) by organization objectives and a reuse strategy. Organization objectives may include understanding the domain, developing a generic domain architecture, creating reusable components, or even developing an application generator for the domain. A reuse strategy determines the road map to follow to accomplish the organizational objectives. One reuse strategy, for example, may be to do domain analysis incrementally and to assess its benefits before advancing to the next step.

    The outputs include tangible products: a domain definition, recorded domain knowledge, domain structures, and domain models. A domain definition is a written document specifying the domain. Recorded domain knowledge consists of domain knowledge that has been captured and registered in a database and made available for analysis. Domain structures are the products of structuring recorded domain knowledge through a process of recurrent abstraction, classification, and clustering. Domain models are common domain structures made reusable through a commonality analysis process.

    Enabling agents (mechanisms, in SADT terminology) are the domain analysts, the domain experts, and the DARE support environment. DARE will consist of an integrated collection of tools providing automated, semi-automated, and interactive support for domain analysis activities.


[Figure 4- Context SADT Model for DARE Supported Domain Analysis. Activity A0, Do DARE Supported Domain Analysis. PURPOSE: To illustrate a practical domain analysis process with potential for automation and to identify the kinds of tools required for each process stage. VIEWPOINT: DARE Architect/Developer. Inputs: domain knowledge, domain experience, existing systems, domain related documents, and expert knowledge. Controls: organization objectives and reuse strategy. Outputs: domain definition, recorded domain knowledge, domain structures, and domain models. Mechanisms: DARE, domain experts, and domain analysts.]

    Figure 5 shows the SADT level A0 decomposition. It consists of five main activities:

A1- Define Domain
A2- Acquire Domain Knowledge
A3- Structure Domain Knowledge
A4- Identify Commonalities
A5- Generate Domain Models

A2 through A5 are the activities that will be supported by DARE. A1, although a necessary step, is assumed to be conducted outside the context of DARE. The output of Define Domain is a domain definition, and it controls, together with a reuse strategy, the knowledge acquisition process.

Acquire Domain Knowledge will be supported by knowledge acquisition tools. Some of these tools are fully automatic, such as scanners, compilers, reuse libraries, and editors or text processors, while others are interactive and semi-automatic, like questionnaire templates and interview guidelines. Knowledge is acquired from the three main inputs:


existing systems (i.e., source code), domain related documents, and experts. Knowledge from documents will be extracted automatically, while knowledge from experts will first be converted semiautomatically to text form using interviews and questionnaires. Source code from existing systems will be selected manually based on quality, documentation, and relevance, and then restructured using reengineering tools; not all source code will be selected. Domain analysts and domain experts are essential support agents. The output is recorded domain knowledge, which includes scanned documents, answered questionnaires, recorded interviews, surveys, and processable source code.

The objective in structuring domain knowledge is to create domain structures suitable for commonality analysis. Such structures include faceted classifications, domain vocabularies, high level functional descriptions, design rationale charts, SADT diagrams, systems code structures, data dictionaries, survey reports, and knowledge structures. Tools that support this process include lexical analyzers, keyword filters, indexing support, numeric and conceptual clustering, thesaurus construction, reverse engineering, statistical analysis, and tools that support semantic net construction and production rule development. Most structuring activities will be done automatically; structuring expert knowledge, however, will be conducted semiautomatically.

Domain structures and recorded domain knowledge are used to identify commonalities using conceptual clustering techniques, reverse engineering, and code similarity detection tools. Commonality analysis is a highly interactive activity that requires easy and effective access to all recorded domain knowledge and all domain structures. The two activities, identify commonalities (A4) and generate domain models (A5), must be conducted concurrently and are connected by a continuous feedback link. Interactive support through a common interface is essential for conducting these activities.

The final outputs are domain models. Domain models provide the basis for designing and implementing reusable components, for providing requirements standards, and for supporting reuse-based development. Domain models are in a continuous process of evolution and are fed back to A2 for refinement. The domain models produced through DARE will be concise, well defined, and practical. They include a common vocabulary, a common architecture, a classification scheme, and functional specifications for reusable components.
