Information Systems Vol. 15, No. 1, pp. 151-160, 1990 0306-4379/90 $3.00 + 0.00

Printed in Great Britain. All rights reserved Copyright © 1990 Pergamon Press plc

EXPERT DATABASE SYSTEMS: KNOWLEDGE/DATA MANAGEMENT ENVIRONMENTS FOR INTELLIGENT

INFORMATION SYSTEMS

LARRY KERSCHBERG

Department of Information Systems and Systems Engineering, George Mason University, 4400 University Drive, Fairfax, VA 22030, U.S.A.

(Received for publication 19 October 1989)

Abstract-Expert database systems (EDS) are database management systems (DBMS) endowed with knowledge and expertise to support knowledge-based applications which access large shared databases. The architectures, tools and techniques needed to build such systems are varied, and draw upon such fields as artificial intelligence, database management and logic programming. It is precisely the confluence of ideas from these fields that provides the synergism for new insights and tools for building intelligent information systems.

Expertise may reside within the system to improve performance by providing intelligent question-answering, using database semantic integrity constraints for query optimization and combining knowledge- and data-driven search techniques in efficient inference schemes. Conversely, expertise may reside outside the system in knowledge-based applications that interpret vast quantities of data and make decision-impelling recommendations to users. Thus the goal of EDS research and development is to provide tools and techniques to make databases "active" agents that can reason, and to allow database systems to support artificial intelligence applications that manage and access large knowledge bases and databases.

Expert database systems allow the specification, prototyping and implementation of knowledge-based information systems that represent a vertical extension beyond well-defined, transaction-oriented systems to those with knowledge-directed reasoning and interpretation.

1. INTRODUCTION

During the past few years, we have seen the emergence of expert database systems (EDS) as a vibrant and productive field for research and development. Expert database systems represent the confluence of concepts, tools and techniques from diverse areas: artificial intelligence (AI), database management (DB) and logic programming (LP). Three international forums have provided researchers and practitioners the opportunity to present and discuss their latest insights and research results: the First International Workshop on Expert Database Systems [1], and the First and Second International Conferences on Expert Database Systems [2, 3]. This article provides an introduction to some of the major issues of the field, and in addition presents some recent research results of the author. In addition, most major conferences have sessions devoted to EDS-related topics.

Basically, an EDS supports applications that require "knowledge-directed processing of shared information" [4]. Expertise may reside within the system to improve performance by providing intelligent question-answering, using database semantic integrity constraints for query optimization and combining knowledge- and data-driven search techniques in efficient inference schemes. Conversely, expertise may reside outside the system in knowledge-based applications that interpret vast quantities of data and make decision-impelling recommendations to users. Thus the goal of EDS research and development is to provide tools and techniques to make databases "active" agents that can reason, and to allow database systems to support artificial intelligence applications that manage and access large knowledge bases and databases.

The special appeal of EDS is that they evoke a variety of ways in which knowledge and expertise can be incorporated into system architectures. One can envision several possible scenarios:

(1) an expert system loosely-coupled with a database system;
(2) a database management system (DBMS) enhanced with reasoning capabilities to perform knowledge-directed problem solving;
(3) an LP system or an AI knowledge representation system, enhanced with database access and manipulation primitives;
(4) an intelligent user interface for query specification, optimization and processing; and
(5) a tightly-coupled EDS "shell" for the specification, management and manipulation of integrated knowledge-databases.

All of the above architectures are meaningful, and there indeed may be many others. The particular one chosen will depend on the application requirements and the availability of tools for their implementation. The terms "loosely-coupled" and "tightly-coupled" have come to characterize two important classes of EDS.

By loose coupling we mean that both the AI system and the DBMS will maintain their own functionality and will communicate through a well-defined interface. For example, an AI system might send SQL queries to the database system, and conversely, the database system might send messages to the AI system, placing data onto its blackboard. In addition, the DBMS could pose questions to the AI system in much the same way a user might consult an expert system. Examples of such architectures include [5-8].
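
To make this concrete, the following Python sketch (mine, not drawn from any of the cited systems) shows the shape of such a loosely-coupled interface: the expert system component sees the DBMS only through SQL strings, and posts the facts it retrieves onto its blackboard. The accounts table, its columns and the authorize step are hypothetical.

    import sqlite3

    class Blackboard:
        """Shared working memory for the expert system's facts."""
        def __init__(self):
            self.facts = []
        def post(self, fact):
            self.facts.append(fact)

    class DatabaseGateway:
        """The sole point of contact between the two systems: SQL in, tuples out."""
        def __init__(self, path):
            self.conn = sqlite3.connect(path)
        def query(self, sql, params=()):
            return self.conn.execute(sql, params).fetchall()

    def authorize(blackboard, gateway, customer_id):
        # The ES asks for exactly the data it needs, then reasons locally.
        rows = gateway.query(
            "SELECT balance, credit_limit FROM accounts WHERE id = ?",
            (customer_id,))
        for balance, limit in rows:
            blackboard.post(("balance", customer_id, balance))
            blackboard.post(("credit-limit", customer_id, limit))
        # ... rule firing over blackboard.facts would follow here ...

    gw = DatabaseGateway(":memory:")
    gw.conn.execute("CREATE TABLE accounts (id INTEGER, balance REAL, credit_limit REAL)")
    gw.conn.execute("INSERT INTO accounts VALUES (1, 250.0, 5000.0)")
    bb = Blackboard()
    authorize(bb, gw, 1)
    print(bb.facts)

Note that each side could be replaced independently, as long as the SQL interface is preserved; that independence is precisely what the loose coupling buys.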

Tight-coupling, on the other hand, implies that at least one system has knowledge of the inner workings of its counterpart, and that special, performance-enhancing access mechanisms are provided. Applications that require tight coupling are those in which both the data and knowledge may be updated and in which the most timely changes must be accessible. Examples of such systems are POSTGRES [9] and the work reported in [10].

2. THE NEED FOR EDS

Let us explore the motivations for EDS architectures in terms of different types of applications. The reality of present-day enterprises indicates that data is considered a corporate-wide resource to be managed by the Data Administration function. Much of that data is managed by one or more DBMSs, and corporations have made major investments in database applications. These applications will continue to evolve to meet changing organizational requirements. The data engineering process refers to database requirements specification, database design, implementation and maintenance.

On the other hand, knowledge-based processing of corporate databases is relatively new, as is the process of acquiring and using knowledge, called knowledge engineering. While data is viewed as static and factual, knowledge is considered to be a dynamic and complex commodity used to solve problems. Thus, data and knowledge go hand-in-hand, and it behoves organizations to manage both.

Organizations are realizing that knowledge-based applications can serve as a mechanism for competitive advantage by providing reasoned advice for decision-making. Most of the applications being developed are well-guarded secrets, because the mere mention of the existence of an EDS project may result in the loss of the perceived competitive advantage. There are, however, several case studies reported in the literature, and they will be addressed through the loosely- and tightly-coupled viewpoints.

2.1. Loosely coupled applications

The most obvious approach to data/knowledge integration is to interface a knowledge-based "shell" or an AI language (e.g. Prolog) with a DBMS. A commercially available architecture that uses this technique is the KEEconnection [8] from IntelliCorp, which allows the frame-based, object-oriented Knowledge Engineering Environment (KEE) to be tailored so that KEE objects, whose data are stored in a DBMS, can be materialized through an interface that generates SQL queries to the DBMS. Loosely-coupled applications are typically those that view the database as a data server, with knowledge-based processing used to interpret data obtained by issuing SQL queries to the database. The amount of expertise and knowledge required to construct the interface will depend on the nature and the amount of data being retrieved, as well as the amount of preprocessing needed to formulate the query.

For example, American Express' Authorizer Assistant [11] uses about 800 rules to summarize company policies and guidelines for credit-worthiness. These rules reside in a knowledge base supported by the expert system shell ART, running on a Symbolics 3645, which can call the Credit Authorization System's database residing on the IBM 3033 processor. The data needed are a customer's current authorization request and past spending history.

In the MEDCLAIM system [12] developed for Blue Cross/Blue Shield of South Carolina, a loosely-coupled EDS architecture was implemented: the M.1 expert system shell from Teknowledge, Inc. contains about 120 claims evaluation and BC/BS policy rules, and a relational database of 6000 records represents medical knowledge and BC/BS policy regulations about the medical necessity and adequacy of the treatments for 1000 well-established diseases. The ES shell was used to codify problem-solving knowledge regarding valid claims and the relationships among claim items. The relational database was used to encode highly-structured and formatted knowledge regarding typical recuperation profiles and treatment plans for the 1000 most common diseases and diagnoses. Thus, both the ES shell and the relational database contain different types of knowledge about the application domain.

The novel aspect of having highly-structured and formatted knowledge (as data) is the simplification and automation of knowledge acquisition from an expert. A graphics-oriented interface was constructed so that the expert could "map" both medical and BC/BS knowledge directly into the database. The data abstraction and organization techniques provided by a semantic data model proved quite useful. In fact, this experience proved to our team that an integrated approach to knowledge and data engineering is essential to modeling EDS applications [12].

Our team came up with an adage: "Never partition a knowledge base before its time." As soon as the knowledge base is partitioned into rule-based and data-based components, one must then also worry about the interfaces between them. Changes to either one will impact the other. The project was so successful that BC/BS has implemented a production version of MEDCLAIM. It is written in COBOL and runs as part of the normal BC/BS claims processing operation.

Both applications cited above require limited amounts of data, and the database queries are well-defined. An application requiring more domain-specific knowledge about the database(s) concerns heterogeneous database query processing. Assume that the goal is to support a uniform user interface to access multiple heterogeneous databases, each having different query formats, data models and organizations. Clearly the knowledge-based front-end should be responsible for query refinement through a knowledge-based thesaurus, query decomposition and reformulation for target databases, and query result assembly for user reports. Extensive knowledge about the database contents and query formats is essential to this approach.
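
The following Python sketch (not from the paper) illustrates the three front-end responsibilities just named: a thesaurus refines the user's term, the query is reformulated for each target database's format, and the results are assembled into one report. The thesaurus, source formats and data are all hypothetical.

    THESAURUS = {"heart attack": ["heart attack", "myocardial infarction"]}

    class Source:
        """One heterogeneous database, with its own query format."""
        def __init__(self, data, fmt):
            self.data, self.fmt = data, fmt     # fmt: how this DB wants terms
        def reformulate(self, term):
            return self.fmt(term)
        def search(self, query):
            return [r for r in self.data if query in r]

    def federated_query(term, sources):
        report = []
        for word in THESAURUS.get(term, [term]):   # query refinement
            for s in sources:                      # decomposition/reformulation
                report.extend(s.search(s.reformulate(word)))
        return report                              # result assembly

    dbs = [Source(["MYOCARDIAL INFARCTION, 1989"], str.upper),
           Source(["claim: heart attack"], str.lower)]
    print(federated_query("heart attack", dbs))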

Thus, in some cases, extensive knowledge about the other environment is needed to support a "narrow communication channel" between environments, e.g. database queries. It appears that many new applications will involve the interaction of heterogeneous, distributed knowledge- and data-based systems [13, 14], so that loosely-coupled functional architectures involving knowledge-based system coordination will be essential. One might argue that the extensive knowledge used to support the interface is a form of tight-coupling.

2.2. Tightly coupled applications

Stonebraker and Hearst [15] analyze the drawbacks of loose coupling between AI systems and DBMSs:

1. The rule base is memory resident while the database is disk resident, and changes to the rule base may not be saved unless the user intervenes to expressly save it. In addition, the rule base is not shared among multiple users.

2. The AI system may request data through a query and store the result in a cache for performance reasons. The database, however, may be updated, in which case the inference engine would be using out-of-date information, unless the data were "locked" by means of a transaction. The performance of the system might be degraded because updates to the selected objects would not be permitted.

3. There are some non-partitionable applications in which the shell must retrieve the entire fact base in order to perform its reasoning. The cache size would have to be very large.

The application domain and its performance requirements will be instrumental in determining the overall EDS architecture. For example, Smith [4] provides two military applications that require a very tight coupling between multiple expert systems and a database system:

1. An automated map production system in which the DBMS maintains a time-varying model of the world's surface. Raw image data is relayed by satellites, and this data must be analyzed by expert systems to extract significant features. Information regarding these features is used to update the map database. The performance bottleneck is in the feature analysis, while the map database may be updated at a later time. The database size is approx. 10**19 bytes.

2. A system for Naval tactical situation assessment for use on board ship. Static information consists of maps, charts, ship characteristics and weapons characteristics. Dynamic data includes contact reports on the position and actions of other ships and aircraft in the vicinity. The database must process hundreds of contact reports per second and alert expert systems which analyze potential threats.

These two applications provide certain requirements for an EDS:

(a) a query language and data model that can express the “semantics” of space and time;

(b) the batched reintegration of updated replicated information in a possibly distributed database;

(c) efficient processing of a large number of situation-action rules (triggers) over the database on the arrival of new information; and

(d) the optimization and processing of recursive logic rules over the database in response to query requests.

Several interesting proposals have been made to include the processing of situation-action rules in DBMS. The approach suggested by Stonebraker and Hearst [15] is a tightly-coupled EDS architecture, with the POSTGRES [9] rule system as an example. Another approach, suggested by Delcambre [16, 17], is to extend the database query language SQL to include the specification of queries containing production rules. Finally, Sellis et al. [18] propose the use of an existing relational DBMS to implement a generic production rule processor. This area of research is extremely important because of the many applications that require near real-time knowledge-based processing of large collections of constantly updated data.
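
A minimal Python sketch (not the cited systems' code) of the situation-action idea, in the spirit of the contact-report example: every arriving tuple is inserted into the database and then matched against a set of trigger rules. The rule, its thresholds and the data are hypothetical.

    from dataclasses import dataclass

    @dataclass
    class Contact:
        ship_id: str
        distance_nm: float
        closing_speed_kn: float

    def threat_rule(db, contact):
        # situation: fast closing contact inside 20 nm; action: raise an alert
        if contact.distance_nm < 20 and contact.closing_speed_kn > 25:
            return f"ALERT: {contact.ship_id} is a potential threat"

    RULES = [threat_rule]

    def on_insert(db, contact):
        """Run every situation-action rule whenever a new tuple arrives."""
        db.append(contact)               # update the database
        for rule in RULES:               # then evaluate the triggers
            action = rule(db, contact)
            if action:
                print(action)

    db = []
    on_insert(db, Contact("unknown-07", 12.0, 30.0))

The hard problem the cited proposals address, which this sketch sidesteps, is evaluating thousands of such rules efficiently inside the DBMS at hundreds of arrivals per second.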

In subsequent sections we discuss how the fields of database management, artificial intelligence, and logic programming are dealing with these and related requirements in conceptualizing EDS architectures.

2.3. The role of database management

Database management systems (DBMS) support the organizational concept that data is a corporate resource that must be managed, refined, protected and made available to multiple users who are authorized to use it. Thus, concurrent access to large shared databases is the raison d'être of DBMSs. The commercial success of DBMS supporting the relational model of data has enabled users to develop information systems that query and update large, structured collections of data that are viewed as "flat files".

However, DBMS are used in increasingly more complex environments, those in which entities are related in very complex ways, and the rules governing their behavior in response to updates need to be made explicit, rather than be hidden in application programs. The above requirements indicate that DBMSs should also be enhanced with reasoning capabilities for performance reasons.

For example, there is a need to manage different types of data: text, graphics, images, computer-aided design (CAD) schemes, etc. Also, DBMS are being used to support complex environments such as software engineering environments (SEE), configuration management (CM), etc. Traditional DBMS are hard-pressed to handle these new duties, and more robust systems, those supporting more semantically meaningful "data models" and new features such as extensibility and long transactions, are being proposed.

This new class of system is called "object-oriented database systems" (O-ODB) [19, 20]. The goal of O-ODB is to provide data models whose structures, operations and constraints can deal with complex objects as they are! This implies that the objects may be hierarchical, and the operations on them need not be decomposed into simpler operations. The modeling paradigm for these systems is based on object-oriented programming as exemplified by the Smalltalk language [21].

In O-ODB the complex objects are organized into typed classes of objects, and each class has associated with it a collection of permissible operations called methods. In order to perform an operation on an instance (read: member) of a class, a message must be sent to the class requesting that the operation be performed. Thus, object classes know what types of operations are admissible, and they are responsible for the execution of those operations.

In addition, the object classes are organized in type hierarchies in the traditional AI sense, with property inheritance. Thus, if an object class does not have the requested method associated with it, the method may be found by moving up the type hierarchy. Such hierarchies are very important for the organization of both data and knowledge; they provide a powerful mechanism for placing data attributes and knowledge predicates (rules, constraints, methods) at the appropriate hierarchy level and object class.
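
A minimal Python sketch of this lookup behavior, with illustrative class names that are not from the paper:

    class Vehicle:
        def identify(self):            # generic method, defined high in the hierarchy
            return f"{type(self).__name__} (a vehicle)"

    class Aircraft(Vehicle):
        pass                           # no identify() here ...

    class Fighter(Aircraft):
        pass                           # ... nor here

    f = Fighter()
    # The request is resolved by walking up the type hierarchy:
    # Fighter -> Aircraft -> Vehicle, where identify() is found.
    print(f.identify())                # prints "Fighter (a vehicle)"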

There are many important issues to be addressed when building O-ODB [19]. They are outlined below:

- Data abstraction and encapsulation: An object class has a set of operators, similar to abstract data types. In terms of the implementation interface, there must be both a public and a private interface.
- Object identity: Every object has a unique identifier which is independent of the particular property values that the object may have.
- Messages: Objects communicate by sending messages to one another, and each message consists of a receiver object-identifier, the message name and message arguments.
- Property inheritance: The class hierarchies provide a degree of economy of specification by allowing generic properties to be defined for higher-level object classes and more specialized properties to be associated with lower-level object classes.
- Graphics: Complex objects and their methods can best be understood and manipulated through object-oriented graphics interfaces.
- Transaction management: In many complex applications transactions may run for long time periods, and effective methods are needed to handle consistency, concurrency control, recovery, etc.
- Protection: Objects must be protected at the instance level. This is particularly important when multiple users are collaborating on the design of these objects, so that version control is an important issue.
- Access management: Specialized access paths, storage structures and main memory management are essential for complex objects.
- Methodologies for object-oriented database design: The object-oriented paradigm requires new approaches to database design. It is important to be able to model not only data but also knowledge about objects.

Notable projects currently underway to construct the next generation of object-oriented DBMS are POSTGRES at UC-Berkeley [9], PROBE at Computer Corporation of America [22], EXODUS at the University of Wisconsin [23], GEMSTONE at the Oregon Graduate Center and Servio Logic Development Corp. [24], ORION at MCC in Austin, Texas [25] and GENESIS at the University of Texas at Austin [26].

2.4. The role of AI research

Researchers in AI have become increasingly aware of the need for, and the advantages of, EDS architectures. From the AI view, there is a need to have knowledge-based applications access large databases. Reid Smith [27] has pointed out that the reasoning (inference) component of a knowledge-based system is but a small fraction (6%) of the total system code; real systems require the integration of diverse and possibly distributed data and knowledge sources.

As knowledge bases become larger, they pose serious system performance problems. The pattern-matching processes involved in determining which rules are candidates for "firing" are performance bottlenecks, because the rules are not indexed with respect to their component predicates. Most expert systems rely on operating system virtual memory techniques [28], and overall performance degrades under heavy paging requirements. Thus secondary storage access mechanisms are desirable for AI systems. They can be used to index a rule base and to provide fast access to facts stored in database files. The management of a knowledge base is an area of open research, although some results have been obtained [29, 30].

Another important avenue of AI research that impacts EDS is work concerning the Knowledge Level, and the insights gained by asking modal questions regarding the knowledge base and knowledge derived through inference [31, 13]. By taking a functional view of a knowledge base as a knowledge server, one can "tell" and "ask" the knowledge base what it knows. One fundamental result of this work is that AI knowledge representation systems require complicated processing and interpretation of symbolic information and axioms associated with the "world" being modeled. The processing involved is quite complex, exceeding the capabilities of current database systems; but it may be possible to characterize different types of knowledge "engines" that are amenable to support by EDS. For an excellent review of the International Conference on Expert Database Systems, especially the Keynote Address and the Panels, the reader is referred to [32].

2.5. The role of logic and logic programming

Logic and logic programming play an important role in EDS architectures. Logic provides a formal basis for both relational databases and database theory. For example, logic may be used to extend the expressive power of query languages to include recursive queries that handle the transitive closure operation found in applications such as the inventory parts-explosion problem, CAD/CAM and routing problems. In addition, logic and logic programming provide efficient mechanisms to integrate data, meta-data (that is, data about data, or schema information), domain-specific knowledge and control knowledge.
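
As an illustration of the parts-explosion problem that motivates such recursive queries, here is a minimal Python sketch (the parts data are invented) that computes the transitive closure of a component-of relation iteratively, something a non-recursive relational query language cannot express for hierarchies of unknown depth:

    def explode(parts, assembly):
        """Return every part reachable from `assembly` in the parts hierarchy."""
        result, frontier = set(), [assembly]
        while frontier:
            part = frontier.pop()
            for child in parts.get(part, []):
                if child not in result:
                    result.add(child)
                    frontier.append(child)
        return result

    parts = {"engine": ["piston", "valve"], "piston": ["ring", "pin"]}
    print(explode(parts, "engine"))    # {'piston', 'valve', 'ring', 'pin'}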

Logic views databases from two points of view: (1) the model theoretic approach; and (2) the proof theoretic approach. In the model theoretic approach the database definition, or schema, is viewed as a time-variant definition, a theory, specified by means of data structure definitions and integrity constraints. The database state, that is, the collection of data instances at any time, is an interpretation of that theory. Integrity constraints are proper axioms that the database state must satisfy at all times. Queries on the database are expressed as well-formed formulae to be translated into relational operations. Traditional DBMS such as relational systems adhere to this viewpoint.

In the proof theoretic or deductive database approach there is no separation of the schema and the data. The database is represented as a first-order theory with equality. Facts (data instances) are represented as ground well-formed formulae (wffs) of the first-order theory. The set of proper axioms provides the wffs for deduction and integrity constraints. The set of theorems constitutes the implicitly derivable information. Queries are considered theorems to be proved from the axioms.

Logic programming provides a computational language, Prolog, for the proof theoretic approach. Prolog processes Horn clauses, which are a subset of first-order predicate calculus. However, Prolog is not pure logic programming, because it has several additional features not found in logic programming:

- Built-in I/O predicates to read and write to and from terminals and databases.
- Control of search through depth-first search and backtracking.
- Built-in predicates such as Cut and Fail, which can control the inference process.
- Performance sensitivity to the order of predicates in the knowledge base, and the order of terms within predicates.

A well-known phenomenon is the "impedance mismatch" between Prolog and relational databases: Prolog evaluation is tuple-at-a-time, due to the unification process, while relational databases perform associative retrieval on large collections of data. This presents performance problems in a loosely-coupled architecture with a Prolog-based knowledge system and a relational DBMS.

Several excellent articles on the LP view of EDS are found in [33-35]. Recent research in LP has focused on more tightly-coupled Prolog-DBMS architectures [10], which take advantage of the DBMS data dictionary information regarding file index and storage structures to control access to data, and pre-fetch collections of data into Prolog's fact base for more efficient processing.

Logic is a natural way to specify complex queries to a DBMS, but systems using Prolog as the query language require the user to be aware of Prolog's inference strategy and the order of evaluation. This results in the user's having to specify the order of query predicates so as to avoid performance problems. The Logic-Based Data Language [36] alleviates the requirement that the user specify predicate ordering and, in addition, it eliminates the "impedance mismatch" by compiling logic queries into an extended relational algebra.
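
The pre-fetching idea can be sketched as follows (a Python illustration, not the cited systems' code): rather than paying one DBMS round-trip per unification step, the whole extension of a predicate is loaded once into an in-memory fact base, and goals are then resolved locally. The table, predicate and data are hypothetical.

    import sqlite3

    def prefetch(conn, table):
        """Load an entire relation into a set of Prolog-style ground facts."""
        return set(conn.execute(f"SELECT * FROM {table}"))

    def solve(fact_base, goal):
        """Match a goal tuple (None = unbound variable) against cached facts."""
        return [f for f in fact_base
                if all(g is None or g == v for g, v in zip(goal, f))]

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parent (p TEXT, c TEXT)")
    conn.executemany("INSERT INTO parent VALUES (?, ?)",
                     [("ann", "bob"), ("bob", "cal")])
    facts = prefetch(conn, "parent")
    print(solve(facts, ("ann", None)))   # [('ann', 'bob')]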

3. EDS ARCHITECTURES FOR INTELLIGENT INFORMATION SYSTEMS

EDS architectures will have a major impact on the next generation of software and hardware systems that support knowledge-directed processing of large, shared databases. The research and development goals and requirements of EDS affecting intelligent information systems are:

1. A unified and formal knowledge/data model that captures not only data semantics (i.e. objects, properties, associations) but also knowledge semantics (i.e. concepts, rules, heuristics, uncertainty, scripts, etc.) of an application.

2. The EDS specification and manipulation languages should have modeling primitives to express and reason about causal, temporal and spatial relationships. They should also have facilities to allow for user-extensible, object-oriented views of the enterprise or application domain.

3. The EDS architecture will merge pattern-directed search and inference, such as those found in production-rule and logic-based systems, with DB associative retrieval and join processing to provide efficient knowledge-based query processing and semantic integrity maintenance, as well as constraint-directed reasoning and system control. This merging of tools and techniques will require a Knowledge Encyclopedia to manage internal system knowledge as well as domain-specific knowledge and data, and to "package" it for the various tools that will access the knowledge/database.

4. The EDS should provide facilities for inexact reasoning over large databases, and an explanation facility that will justify the reasoning process to application developers and users.

The above-mentioned EDS features will profoundly influence the design and development of intelligent information systems. The traditional phased, linear development lifecycle will be replaced by an iterative, interactive, fast prototyping mode. In addition, the knowledge-based approach will promote new classes of applications in which the knowledge and data engineers work with an expert to elucidate strategic and domain-specific knowledge and data organizations. These new applications represent a vertical extension of applications beyond the well-defined, predictable, transaction-oriented systems to those with knowledge-directed reasoning and interpretation under conditions of uncertainty.

Just as DBMS are used to manage data as a corporate resource, EDS will manage knowledge as a corporate resource. The availability of a unified Knowledge/Data Model will facilitate a specification-based approach for creating knowledge schemes that express the semantics of both knowledge and data. The Knowledge Encyclopedia will represent both system and application knowledge in an object-oriented view in which object behavior is specified by explicit constraints [37, 38]. These constraints will be made available to tools that will help to design and manage the knowledge base. In fact, the Knowledge Encyclopedia will also be used to formulate user-interfaces for knowledge acquisition, and to provide a meta-explanation facility that, by accessing both system and domain-specific knowledge, could document and explain system structure and dynamics to the Knowledge/Data Administrator. The following sections present the major ideas regarding the Knowledge Data Model and a prototype system called KORTEX.

3.1. The Knowledge Data Model

Recently, Potter and Kerschberg have developed the Knowledge Data Model (KDM) [30]. The KDM belongs to the class of semantic data models. It provides an object-oriented view of both data and knowledge. KDM modeling primitives include those found in semantic data models, object-oriented models and AI knowledge representation formalisms.

The KDM represents an evolution of the Functional Data Model [39, 40] and the PRISM system [37, 38]. It contains modeling features that allow the semantics of an enterprise to be captured. These include data semantics, as captured by semantic data models, and knowledge semantics, as captured in knowledge-based systems, such as heuristics, uncertainty and constraints. The KDM modeling primitives are:

Generalization: Generalization provides the facility in the KDM to group similar objects into a more general object. This is done by means of the "is-a" relationship. This generalization hierarchy defines the inheritance structure.

Classification: Classification provides a means whereby specific object instances can be considered as a higher-level object-type (an object-type is a collection of similar objects). This is done through the use of the "is-instance-of" relationship.

Aggregation: Aggregation is an abstraction mechanism in which an object is related to its components via the "is-part-of" relationship.

Membership: Membership is an abstraction mechanism that specifically supports the "is-a-member-of" relationship. The underlying notion of the membership relationship is the collection of objects into a set-type object. For example, an attack-mission may be a set object-type with member object-types land-assault-team and air-assault-team.

Temporal: Temporal relationship primitives relate object-types by means of synchronous and asynchronous relationships.

Constraint: This primitive is used to place a constraint on some aspect of an object, operation or relationship via the "is-constraint-on" relationship. For example, for an air squadron there may be a limit on the number and types of aircraft participating.

Heuristic: A heuristic can be associated with an object via the "is-heuristic-on" relationship. Heuristics allow rules and knowledge to be associated with an object. In this way object properties can be inferred using appropriate heuristics.

These primitives give a designer the abstraction mechanisms requisite for flexible knowledge and data modeling. They extend semantic data models by:

1. Incorporating heuristics to model inferential relationships.

2. Organizing and associating these heuristics with objects.

3. Providing a tight-coupling (at the specification level) between data objects and knowledge concepts.

Architecturally, the KDM integrates various levels of data and meta-data. These ideas are common to the PRISM system [37, 38] and some knowledge-based systems. In particular, the KDM allows a uniform description of, and access to: (1) a populated knowledge/data base organized in terms of a valid KDM schema; (2) the KDM schema (meta-data in the Data Dictionary sense); (3) the KDM meta-knowledge, e.g. the primitives, functions and predicates used to manage the KDM itself; and (4) the system specification knowledge (strategic and methodological knowledge about how the system will allow access to and use of the KDM support tools).

The KDM philosophy allows Knowledge and Data Administrators to tailor user views and even data models (e.g. an Entity/Relationship model) using KDM primitives. The specification language for the KDM is now discussed.

3.2. The Knowledge Data Language

The KDM specification language is the Knowledge Data Language (KDL) [41]. The KDL has two primary constructs, the object-type and the attribute. These are similar to the notion of entity type and function in the Functional Data Model and the DAPLEX language [40]. An object-type is a collection of homogeneous objects having similar properties; the object-type corresponds to real-world objects or concepts.

An attribute corresponds to a property or characteristic of an object. In KDL, an attribute of an object-type relates, associates or maps an object of an object-type to either another object, a set of objects, or a list of objects in a target (or range) object-type. The KDM and KDL support attributes that may be computed or inferred, thereby allowing knowledge-based reasoning processes to deduce attribute values and materialize complex views.

Figure 1 shows a template, written in KDL, that defines a generic object-type specification. An object-type can be defined together with its attributes, value-types, any groupings of attributes, and constraints or heuristics associated with attributes. In addition, any subtype or supertype object-types may be specified, together with appropriate constraints or heuristics.

object-type: OBJECT-TYPE-NAME has
  [attributes:
    {ATTRIBUTE-NAME: [set of | list of] VALUE-TYPE   /* default is single-valued */
      [composed of {ATTRIBUTE-NAME,}]
      [with constraints {predicate,}]
      [with heuristics {RULE,}];}]
  [subtypes:
    {OBJECT-TYPE-NAME,}]
  [constraints:
    {predicate,}]
  [heuristics:
    {RULE,}]
  /* successors, predecessors and concurrents are temporal primitives */
  [successors:
    {OBJECT-TYPE-NAME,}]
  [predecessors:
    {OBJECT-TYPE-NAME,}]
  [concurrents:
    {OBJECT-TYPE-NAME,}]
  [members:
    {MEMBER-NAME: MEMBER-TYPE,}]
  [instances:
    {INSTANCE,}]
end-object-type

Fig. 1. The KDL template for object-type specification.

Finally, temporal relationships among the object-type and other object-types may be given. Instances or members of the object-type can also be specified in the "frame." Thus, meta-data and knowledge, in the form of constraints and heuristics, as well as data instances, may exist in the same object-type specification.

In the same way that templates can be used for object-type specification, they can also be used to define KDM meta-data concerning the model itself; that is, the specification of object-types, attributes, constraints and heuristics is itself defined in KDL.

The KDL supports a query facility similar to DAPLEX and allows updates both to the schema and to database instances. However, no facilities as yet exist to study the impact of changes to the database schema or to the constraints or heuristics.

Figure 2 depicts a hypothetical Air Force example in terms of the KDM and its graph representation formalism which is based on the Functional Data Model. Single-, double- and triple-headed arrows denote single-valued, set-valued and list-valued functions, respectively. The rectangles denote object- types. Here we note that some objects are standard while others are inferred (virtual) (e.g. Aces and the Strength of a Squadron). In this particular example the concept of a squadron’s strength is based on the missions flown, aircraft scheduled, the number of pilots and the ratio of Aces to regular pilots. The English statement for the squadron’s strength is:

A squadron’s strength is “Strong” if the squadron conducts at least 50 missions per month, schedules more than 20 aircraft, has at least 10 pilots, and the ratio of Aces to pilots is at least 0.25.

[Figure omitted: a KDM schema graph for the example, with a panel labeled "KDM Virtual Objects".]

Fig. 2. An Air Force example.

The associated heuristic, SH, for the strength attribute is:

For EACH s in Squadron
  SH-C1: IF COUNT(conducts(s)) > 50 AND
  SH-C2: IF COUNT(has-pilots(s)) > 10 AND
  SH-C3: IF COUNT(schedules(s)) > 20 AND
  SH-C4: IF RATIO(COUNT(aces(s)) TO COUNT(pilots(s))) > 0.25
  THEN R1: strength(s) = "strong"

Note that the virtual type Aces can also be defined in terms of a heuristic that defines those pilots qualifying to be considered Aces. In processing the strength heuristic, the KDM processor would first evaluate the Aces (possibly through inference) and then evaluate the strength heuristic.
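
A minimal Python rendering (not KDL) of how such processing might proceed; the five-kill qualifying rule for Aces is an assumption introduced for this example:

    def aces(s):
        # hypothetical heuristic defining the virtual type Aces
        return [p for p in s["pilots"] if p.get("kills", 0) >= 5]

    def strength(s):
        # the virtual attribute aces() is derived before the conditions are tested
        if (len(s["conducts"]) > 50 and len(s["pilots"]) > 10
                and len(s["schedules"]) > 20
                and len(aces(s)) / len(s["pilots"]) > 0.25):
            return "strong"
        return "unknown"

    squadron = {"conducts": ["m"] * 60, "schedules": ["a"] * 25,
                "pilots": [{"kills": 6}] * 4 + [{"kills": 1}] * 8}
    print(strength(squadron))   # "strong"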

An important issue is that the KDM's constraint and heuristic primitives allow knowledge-based concepts to be specified as easily as database structures. This capability is similar to database views, but the view is very tightly associated with the schema. In addition, views involving views are quite natural, e.g. the Aces virtual type used in the strength heuristic.

3.3. The KORTEX EDS shell

KORTEX [42] is an experimental EDS prototype that embodies many of the concepts of the KDM. A discussion of the major features follows.

3.3.1. The KORTEX conceptual framework. KORTEX is a prototype semantic database system which embodies many of the concepts of the KDM, especially the (Attribute, Object, Value) paradigm, and also supports a limited inferential capability.

KORTEX provides users with an extended Entity/Relationship view of data, together with rule-based knowledge representation for inferring attribute values and materializing user-specified virtual objects.

The user represents all entities (types) (such as aircraft, personnel) directly as objects with their corresponding attributes (such as id-number, name, years-of-service). In a similar fashion, the user represents all relationships among entities (such as Personnel-assigned-to-aircraft) as objects. These relationships may also have attributes (such as date-of-assignment). In addition, one may specify that one entity is a generalization or specialization of another entity (such as Colonel IS-A Officer). Finally, an attribute value may be determined from data values, by means of computed formulas, or through inference rules (such as service-required-of-aircraft based upon miles-flown, age-of-aircraft, etc.). The system functionality is next described, followed by a specification of the system architecture.

3.3.2. The KORTEX user interface and data/knowledge specification. KORTEX has an interactive menu system that guides the user to specify relationships among entities. This contrasts with the usual situation found in relational database systems, in which relations are not explicitly represented by the database system, but instead are buried within the data. By allowing relationships to be treated as objects in their own right, the KORTEX model allows users to specify desired attributes for relationships.

KORTEX uses interactive menus to obtain user specifications of entities and relationships. Entities are characterized by their attributes. Attribute values may be numeric (integer or real) or character strings (such as names). A new feature allows attribute values to also be set-valued (such as one of [low, medium, high]), group items (such as address, comprised of street, city, state and zip code) or Boolean (true or false) (such as for critical-need). Moreover, the user may place constraints upon appropriate attribute values (such as 0 < age < 100).

Attribute values may be specified by the user upon data entry (such as year-of-birth is 1950). Alternately, attribute values may be determined by a user-specified formula, such as age is current-year - year-of-birth. Finally, a collection of rules may be invoked to infer attribute values from other entity attributes (such as critical-shortage of an item inferred from values of quantity-on-hand and degree-of-importance). This last feature is an example of the novel Knowledge Component of the database system.
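
For illustration, such an inference rule might be sketched as follows in Python (the thresholds are hypothetical, not KORTEX's):

    def critical_shortage(item):
        """Infer critical-shortage from quantity-on-hand and degree-of-importance."""
        return item["quantity-on-hand"] < 10 and item["degree-of-importance"] == "high"

    print(critical_shortage({"quantity-on-hand": 3, "degree-of-importance": "high"}))  # True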

A central advantage of this embedded inferential knowledge is that such information need be specified only once by the user, while constructing the database. Hence, such knowledge need not be specified repeatedly in separate application programs using the database. The KORTEX rule-based knowledge system has further system applications, including the specification of virtual entities, which appear as distinct objects to the user, but are actually inferred by the system from a related entity. There is a trade-off between explicitly storing data resulting from an inference process or recomputing it every time it is needed. The trade-off involves the computational costs of inference vs the storage cost of materialized views. Another aspect is the update activity against the database, and the effects it might have on the results of the computed view.

KORTEX also supports the notions of generalization and specialization. That is, when a user adds a new entity type to the database, KORTEX will allow the user to specify whether this new entity type is a generalization or specialization of any other entity within the database. Since a specialized entity inherits all attributes of its generalized entity, this greatly reduces the effort required to describe an entity. It also simplifies the logical interactions of the database entities, aiding the user in "keeping track" of the overall database structure. Further, the system supports the explicit inheritance mechanism associated with generalization hierarchies.

The functionality just described conforms to that of a knowledge-based extended Entity-Relationship model. The Entity-Relationship model has been used in practice to aid in hand-developing database schemas. KORTEX has automated this process to better aid database development. Adding the recent semantics of specialization and generalization further increases the expressive power of this database specification tool. The object-oriented semantic data/knowledge model provides a mechanism to define modular data/knowledge "chunks." Finally, concepts from expert or rule-based systems are incorporated in KORTEX, making this an extremely expressive database system employing tools and techniques from both database and artificial intelligence systems.

3.3.3. The knowledge data kernel. KORTEX is structured as an object-oriented system and is written in both Franz LISP and COMMON LISP. Each prototype currently has about 5000 lines of LISP code. All major database constructs (such as entities and relationships) are modeled as objects. Information pertaining to an object (such as attributes) resides with that object, so that a database construct and its components reside together as a unit. These objects are represented as frames. Individual units of information are represented within a frame using the attribute-object-value paradigm of the Knowledge Data Model (such as "(type (entity (aircraft)))" to represent "aircraft has type entity"). This structure facilitates the management of the knowledge component of KORTEX.
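
The attribute-object-value organization can be sketched in Python as follows (the class and its interface are illustrative; KORTEX's actual LISP representation differs):

    class FrameStore:
        def __init__(self):
            self.triples = []                      # (attribute, object, value)
        def tell(self, attribute, obj, value):
            self.triples.append((attribute, obj, value))
        def ask(self, attribute, obj):
            return [v for a, o, v in self.triples
                    if a == attribute and o == obj]

    kb = FrameStore()
    kb.tell("type", "aircraft", "entity")          # i.e. (type (entity (aircraft)))
    kb.tell("attribute", "aircraft", "id-number")
    print(kb.ask("type", "aircraft"))              # ['entity']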

There is a methodological component in KORTEX. Each object in KORTEX is developed in stages of increasing detail. The first stage of development captures the overall structure of an object and its interrelationships (that is, the database schema). The successive system representations decompose each object into "concepts," with assigned unique concept numbers. Concepts refer to object characteristics such as generalization/specialization, relations with other objects, and attributes. Concepts are used to bring together metadata associated with objects. It is here that attribute details are listed, including user-specified formulas or inference rules (in the form of AND/OR trees). KORTEX uses the concept numbers to index object information as needed for retrieval and manipulation.

After the user has completed the specification of the database schema, KORTEX allows him/her to enter database instances. Each database instance is in the "is-instance-of" relationship with either an entity type or a relationship type. During specification, the system enforces all user-stated constraints (such as value limits and key specifications). At any time, the user may browse the database to retrieve data based on a variety of conditions that may be placed on attribute values (such as age > 30 and years-of-service < 10). There is also a browsing capability to list general database schema information. Database instances are also stored as objects (or frames) by the system. Specific object information is stored with its corresponding concept number, to aid in data retrieval.

4. CONCLUSIONS

The integration of concepts, tools and techniques from DB, AI and LP, as embodied in EDS, will create new and revolutionary environments for the specification, design, prototyping and maintenance of Intelligent Information Systems. Many researchers and practitioners are providing insights and prototypes that lead to new architectures for the software and hardware systems of the 1990s and beyond.

Acknowledgements-The author would like to thank Richard Baum for his help in perfecting the KORTEX system, and J. Hung, who implemented the KORTEX system as part of his M.Sc. degree.

REFERENCES

[1] L. Kerschberg (Ed.). Expert Database Systems: Proceedings from the First International Workshop. Benjamin/Cummings, Menlo Park, Calif. (1986).
[2] L. Kerschberg (Ed.). Expert Database Systems: Proceedings from the First International Conference. Benjamin/Cummings, Menlo Park, Calif. (1987).
[3] L. Kerschberg (Ed.). Expert Database Systems: Proceedings from the Second International Conference (George Mason University, Fairfax, VA, 1988). Benjamin/Cummings, Menlo Park, Calif. (1988).
[4] J. M. Smith. Expert database systems: a database perspective. In [1].
[5] C. L. Chang and A. Walker. PROSQL: a Prolog programming interface with SQL/DS. In [1].
[6] Y. E. Ioannidis, J. Chen, M. A. Friedman and M. M. Tsangaris. BERMUDA: an architectural perspective on interfacing Prolog to a database machine. In [3].
[7] B. Napheys and D. Herkimer. A look at loosely-coupled Prolog/database systems. In [3].
[8] R. M. Abarbanel and M. D. Williams. A relational representation for knowledge bases. In [2].
[9] M. Stonebraker. Object management in POSTGRES using procedures. In [20].
[10] S. Ceri, G. Gottlob and G. Wiederhold. Interfacing relational databases and Prolog efficiently. In [2].
[11] D. Leinweber. Knowledge-based systems for financial applications. IEEE Expert, Fall, 18-31 (1988).
[12] J. Weitzel and L. Kerschberg. Developing knowledge-based systems: reorganizing the systems development life cycle. Commun. ACM 32(4) (1989).
[13] R. J. Brachman and H. J. Levesque. Tales from the far side of KRYPTON. In [2].
[14] M. L. Brodie (Chair), D. Bobrow, V. Lesser, S. Madnick, D. Tsichritzis and C. Hewitt. Future artificial intelligence requirements for intelligent database systems. Panel Report. In [3].
[15] M. Stonebraker and M. Hearst. Future trends in expert data base systems. In [3].
[16] L. M. L. Delcambre and J. N. Etheredge. The relational production language: a production language for relational databases. In [3].
[17] L. M. L. Delcambre. RPL: an expert system language with query power. IEEE Expert, Winter, 51-61 (1988).
[18] T. Sellis, C.-C. Lin and L. Raschid. Implementing large production systems in a DBMS environment: concepts and algorithms. Proc. ACM SIGMOD Conf., May (1988).
[19] C. Zaniolo. Prolog: a database query language for all seasons. In [1].
[20] K. Dittrich and U. Dayal (Eds). Proc. 1986 Int. Workshop on Object-Oriented Database Systems. ACM and IEEE (1986).
[21] A. Goldberg and D. Robson. Smalltalk-80: The Language and Its Implementation. Addison-Wesley, Reading, Mass. (1983).
[22] F. Manola and U. Dayal. PDM: an object-oriented data model. In [20].
[23] M. J. Carey et al. The architecture of the EXODUS extensible DBMS. In [20].
[24] D. Maier and J. Stein. Indexing in an object-oriented DBMS. In [20].
[25] W. Kim. Architectural issues in object-oriented databases. MCC Technical Report No. ACT-OODS-115-89 (1989).
[26] D. Batory. GENESIS: a project to develop an extensible database management system. In [20].
[27] R. Smith. On the development of commercial expert systems. The AI Mag. 5(3), 61-73 (1984).
[28] M. Deering and J. Faletti. Database support for storage of AI reasoning knowledge. In [1].
[29] L. Cholvy and R. Demolombe. Querying a rule base. In [2].
[30] W. D. Potter and L. Kerschberg. The knowledge data model: a unified approach to modeling knowledge and data. Proc. IFIP DS-2 Working Conference on Knowledge and Data (R. Meersman and J. Sowa, Eds), Albufeira, Portugal, November 1986; also in Data and Knowledge (R. Meersman and J. Sowa, Eds). North-Holland, Amsterdam (1988).
[31] R. J. Brachman and H. J. Levesque. What makes a knowledge base knowledge? A view of databases from the knowledge level. In [1].
[32] R. P. van de Riet. Expert database systems, conference report. Future Generations Computer Systems, Vol. 2, No. 3, pp. 191-196. North-Holland, Amsterdam (1986).
[33] D. S. Parker et al. Logic programming and databases, Working Group Report. In [1].
[34] E. Sciore and D. S. Warren. Towards an integrated database-Prolog system. In [1].
[35] C. Zaniolo et al. Object oriented database systems and knowledge systems. In [1].
[36] S. Tsur and C. Zaniolo. LDL: a logic-based data language. Proc. 12th Int. Conf. on Very Large Data Bases, Kyoto, Japan (1986).
[37] A. Shepherd and L. Kerschberg. PRISM: a knowledge-based system for semantic integrity specification and enforcement in database systems. Proc. ACM SIGMOD Int. Conf. on Management of Data, Boston, pp. 307-315 (1984).
[38] A. Shepherd and L. Kerschberg. Constraint management in expert database systems. In [1], pp. 309-331.
[39] E. H. Sibley and L. Kerschberg. Data model and data architecture considerations. Proc. National Computer Conf., pp. 85-96. AFIPS Press, Reston, VA (1977).
[40] D. W. Shipman. The functional data model and the data language DAPLEX. ACM Trans. Database Syst. 6(1), 140-173 (1981).
[41] W. D. Potter, R. P. Trueblood and C. M. Eastman. KDM/KDL: a hyper-semantic data model and specification language. Data & Knowledge Engineering. North-Holland, Amsterdam (1990).
[42] L. Kerschberg, R. Baum and J. Hung. KORTEX: an expert database system shell for a knowledge-based entity-relationship model. Int. Conf. on the Entity/Relationship Approach (1989).