

Pergamon Expert Systems With Applications, Vol. 8, No. 3, pp. 381-389, 1995

Copyright © 1995 Elsevier Science Ltd Printed in the USA. All rights reserved

0957-4174/95 $9.50 + .00

0957-4174(94)E0029-T

Knowledge-Based Systems Verification: A Machine Learning-Based Approach

HAKIM LOUNIS

Laboratoire de Recherche en Informatique, Université de Paris-Sud, Orsay, France

Abstract--This paper addresses the problem of verification of knowledge bases. It presents a knowledge-based system (KBS) verification approach that considers system specifications and, consequently, knowledge bases to be partially described when development starts. This partial description is not necessarily perfect, and our work aims at using machine learning techniques to progressively acquire new knowledge and then improve the quality of expert system knowledge bases (KB) by coping with two major KB anomalies: incompleteness and incorrectness. The KBs considered in our approach are expressed in different formalisms.

1. INTRODUCTION

Nowadays there still are few commercialized knowledge-based systems (KBS). The major reason is the lack of a strict validation step in their life cycle. There is widespread agreement that KBSs cannot be designed in a linear fashion. This is due to the typical problems they have to resolve; these will require them either to adapt traditional techniques of software development or to use new techniques relevant to artificial intelligence (AI) systems. For instance, the life cycle of Figure 1-a does not seem to be advisable for KBS development. The model shown in Figure 1-b allows the developer to partially describe KBS specifications; these specifications will be progressively completed through each new cycle.

To avoid confusion due to the lack of a unified terminology, we adopt in this paper Laurent's terminology (Laurent, 1992) for the validation process. We use different concepts for validation purposes. Some are formalizable (e.g., circularity of a rule base, redundancy of a rule base, etc.), and some are not (e.g., level of performance, explanation capabilities, etc.). From those concepts we may set up specifications, always in a formal way. But we obtain either really formal specifications (with the formalizable concepts) or what Laurent (1992) calls pseudo-formal specifications. For example, to deal with explanation capabilities, we can define a formal process: an ad hoc questionnaire will be filled in by ten users; each answer will give points, and a formal process of aggregation will

Requests for reprints should be sent to Hakim Lounis, Laboratoire de Recherche en Informatique, Université de Paris-Sud, Orsay, France. E-mail: [email protected]


produce a final note N in a given scale (e.g., [0..100]). If we choose a threshold, for example, 80, then we can set up a pseudo-formal specification: N > 80. This leads to the following definitions:
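To make this concrete, here is a minimal sketch in Python of such a pseudo-formal check. The averaging aggregation and the sample scores are our own illustrative assumptions, not prescribed by the paper.

```python
# Minimal sketch of the pseudo-formal specification N > 80 described above.
# The point values and the aggregation by averaging are hypothetical.

def aggregate_note(answers):
    """Aggregate per-user questionnaire scores (each in [0, 100]) into a final note N."""
    return sum(answers) / len(answers)

def satisfies_pseudo_formal_spec(answers, threshold=80.0):
    """Check the pseudo-formal specification N > threshold."""
    return aggregate_note(answers) > threshold

# Ten users fill in the questionnaire; each answer is already mapped to points.
user_scores = [85, 90, 78, 88, 92, 81, 79, 95, 84, 87]
print(satisfies_pseudo_formal_spec(user_scores))  # True if N > 80
```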

Definition 1: A validation process is a process that attempts to determine whether or not a KBS satisfies one of its specifications.

Definition 2: A verification process is a validation process that attempts to determine whether or not a KBS satisfies one of its formal specifications.

Definition 3: An evaluation process is a validation process that attempts to determine whether or not a KBS satisfies one of its pseudo-formal specifications.

The most fundamental difference between a verifi- cation process and an evaluation process concerns the interpretation of the result: it is always possible to con- clude whether or not the KBS fits the formal specifi- cation, as well as the corresponding informal specifi- cation in natural language. However, this is not the case with an evaluation process because we cannot conclude without ambiguity whether or not the KBS satisfies the informal specification that led to the pseudo-formal specification.

This paper presents an expert system verification approach that considers system specifications to be partially described when development starts. This partial description is not necessarily perfect, and our work aims at using machine learning techniques to progressively improve the quality of expert system knowledge bases (KB), by coping with two major KB anomalies: incompleteness and incorrectness. By integrating machine learning techniques in the validation step of a KBS evolutionary life cycle model (e.g., Figure 1-b), we allow experts to propose an initial version of the KB, which will be refined and corrected throughout a


[Figure: two life-cycle models, (a) a linear sequence Specification → Conception → Implementation → Validation, and (b) an evolutionary cycle in which specification is progressively completed across implementation and validation.]

FIGURE 1. Different life cycles.

refinement cycle. This particular point permits us to deal with a drawback of pure inductive learning algorithms (e.g., ID3, Quinlan, 1983; AQ, Michalski, 1983, ...): induced rules are often not directly usable by the current expert system's shell. The reason is that such rules are generally flat and therefore do not allow the KBS to have explanation capabilities. We believe that starting with an initial KB, possibly imperfect, which will be refined until it reaches its final expression, is a better way than directly using pure inductive learning. Figure 2 presents a view similar to that previously presented, except that here we deal with the life cycle of a KB:

In our approach, we consider expert systems that deal with KBs expressed in different formalisms (this is in accordance with the current tendency). Each part of the knowledge is represented in a particular formalism that is considered the most appropriate, relative to the type of knowledge it expresses. The knowledge we consider consists of three parts: shallow knowledge, a deeper kind of knowledge, and a set of examples. The first part is a set of production rules expressed in first order logic (FOL); the second part consists, in turn, of two categories of knowledge: semantic nets representing entities of the application domain and their relationships, and a set of integrity constraints. Last, the set of examples contains observations represented as conjunctions of first order literals. Each observation is classified as belonging to a given concept.
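Before detailing the method, a minimal sketch may help fix ideas about how such a three-part KB could be represented. The Python classes and field names below are our own illustrative assumptions, not the paper's implementation; later sketches in this section reuse them.

```python
# Hypothetical in-memory encoding of the three knowledge parts described
# above. All names are illustrative assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class Literal:
    predicate: str   # e.g., "blood"
    args: tuple      # e.g., ("x", "y"): variables or constants

@dataclass
class Rule:
    head: Literal    # conclusion Q_i
    body: list       # conjunction of literals P_ik

@dataclass
class KnowledgeBase:
    rules: list          # shallow knowledge: FOL production rules
    semantic_net: dict   # deep knowledge: entity -> list of (relation, entity)
    constraints: list    # pairs of incompatible concept names
    examples: list       # (description: list of predicate names, concept name)
```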

In this paper, we first describe our approach and then illustrate it with an example. We also compare our method to previous work, and we highlight the strengths and weaknesses of each approach.

2. DESCRIPTION OF THE APPROACH

Our method starts with the hypothesis that the initial KB provided by an expert is probably incomplete and/or incorrect. The detection of incompleteness and/or incorrectness is centered on the notion of the label (De Kleer, 1986) of a given concept.

Definition 4: The label of a concept is the set of all initial¹ facts (i.e., literals) that allow a KBS to deduce² the concept:

E_concept = ∨_{i=1,n} ∧_{j=1,m_i} P_ij   (Eq. 2.1)

where P_ij is an initial first order literal. The knowledge provided to the system includes:

• Rules of expertise: {R_i / R_i = (∧_{k=1,n_i} P_ik) ⇒ Q_i} where P_ik and Q_i are first order literals.

• A semantic net describing domain entities and relations between entities.

• Integrity constraints: {I_k / I_k = incompatibility(concept-i, concept-j)} where concept-i and concept-j are first order literals.

• A set of examples: {e / e = desc(e) ∧ class(e)} where desc(e) = ∧_{j=1,p} L_j is the description of the example; this description is a conjunction of initial first order literals. class(e) is a first order literal that indicates the concept to which the example belongs.

All literals are typed. This means that arguments of a given predicate take their values in a user-defined type. For instance, we have considered the following types: nominal, linear, integer, real, hierarchical.
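As a rough illustration of how a label (Eq. 2.1) can be computed, the following sketch unfolds rules backwards from a target concept until only initial literals remain. It is a propositional simplification, reusing the hypothetical Rule/Literal classes above and ignoring variable substitution, which a full first-order treatment would have to handle.

```python
# Sketch of label computation (Eq. 2.1): a literal is "initial" if no rule
# concludes it; the label of a concept is the disjunction, over its rules,
# of the cross-product of its body literals' labels.

def label(concept, rules):
    """Return the label of `concept` as a list of conjunctions (lists) of initial predicates."""
    concluding = [r for r in rules if r.head.predicate == concept]
    if not concluding:               # initial literal: it is its own label
        return [[concept]]
    disjuncts = []
    for rule in concluding:
        partial = [[]]               # expand the body literals left to right
        for lit in rule.body:
            partial = [conj + sub for conj in partial
                                  for sub in label(lit.predicate, rules)]
        disjuncts.extend(partial)
    return disjuncts
```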


¹ Initial facts are used to describe examples. This notion is close to the notion of operationality introduced by Mitchell (1986) in explanation-based learning (EBL).

² The considered inference engine strategy is simple: a rule is fireable if its left-hand side is verified. If more than one rule is fireable at a given moment, the inference engine considers the first one in the KB; eventually, all fireable rules are considered.

[Figure: a cycle Acquire Knowledge → Build or Revise Knowledge → Verify Knowledge.]

FIGURE 2. Life cycle of a KB.


In this context, an incorrectness corresponds to the case in which an example belongs to a concept that is incompatible with its real concept. Such an incompatibility is stated by the metapredicate incompatible(concept-i, concept-j). According to our definition, an incorrectness is detected if the following formal specification is verified:

∃e: desc(e) = ∧_{j=1,p} L_j & class(e) = concept-i,
∃ an integrity constraint I = incompatible(concept-i, concept-j),
∃ a substitution σ such that desc(e) ⊨ {E_concept-j}σ   (Eq. 2.2)

On the other hand, an incompleteness corresponds to the case where an example does not belong to its real concept label. The formal specification associated with the latter informal one is the following:

∃e: desc(e) = ∧_{j=1,p} L_j & class(e) = concept-i,
such that there is no substitution σ so that
desc(e) ⊨ {E_concept-i}σ   (Eq. 2.3)
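In the same propositional simplification as the earlier label sketch, the two specifications could be checked as below; coverage is reduced to literal inclusion, standing in for the substitution test, and the function names are ours.

```python
# Sketches of the checks of Eq. 2.2 (incorrectness) and Eq. 2.3
# (incompleteness). An example is a pair (description, concept), where the
# description is a collection of initial predicates.

def covered(description, concept_label):
    """True if some conjunction of the label is contained in the description."""
    return any(all(lit in description for lit in conj) for conj in concept_label)

def incorrect(example, labels, constraints):
    """Eq. 2.2: the example is covered by the label of an incompatible concept."""
    desc, concept = example
    incompatible = [c2 if c1 == concept else c1
                    for (c1, c2) in constraints if concept in (c1, c2)]
    return any(covered(desc, labels[c]) for c in incompatible)

def incomplete(example, labels):
    """Eq. 2.3: the example is not covered by the label of its own concept."""
    desc, concept = example
    return not covered(desc, labels[concept])
```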

As stated by Rajamoney and DeJong (1987), imperfections of the KB are due to many reasons. In our case, when an incompleteness is detected, the revision process has to deal with many cases (summarized in Figure 3 and subsequent figures).

In the case of incorrectness, the revision process aims at specializing the label. This is done first by the localization of the faulty rule(s) and subsequently by specializing their left-hand side. Figure 4 shows the studied cases.

Unlike many other approaches, the revision process concerns a faulty subconcept. A subconcept is defined by a noninitial literal that appears in the definition of the studied concept. Refinements proposed by the learning tool are performed thanks to a learning algorithm, which considers as input data a set of examples depending on the kind of detected anomaly (incorrectness or incompleteness). In this way, the learning process concerns only the label of this subconcept, without modifying the label of other subconcepts. Therefore, it allows us to preserve the structural form of the rule base (i.e., the shallow knowledge). In the case of a detected incorrectness, the revision algorithm works with positive examples of the given concept and negative ones that are subsumed by the faulty concept label. In the case of incompleteness, the considered positive examples are those that verify the formal specification (2.3) and do not verify the actual definition of the faulty subconcept. The negative ones are those subsumed by the current subconcept definition and those imposed by integrity constraints. We must notice that for each example in the learning set, a new description called "contextual description" is proposed. It contains literals that have a semantic closeness with the faulty concept.
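A sketch of how these learning sets could be assembled follows, again at the concept level rather than the subconcept level the paper actually works at; the helper names and the simplifications are ours.

```python
# Hypothetical assembly of the learning sets described above, reusing
# `covered` and the (description, concept) example encoding.

def learning_set_incorrectness(concept, examples, labels):
    """Positives: examples of the concept.
    Negatives: examples of other concepts subsumed by its faulty label."""
    pos = [e for e in examples if e[1] == concept]
    neg = [e for e in examples
           if e[1] != concept and covered(e[0], labels[concept])]
    return pos, neg

def learning_set_incompleteness(concept, examples, labels, constraints):
    """Positives: examples of the concept not covered by its current label
    (Eq. 2.3). Negatives: examples of concepts declared incompatible by the
    integrity constraints."""
    pos = [e for e in examples
           if e[1] == concept and not covered(e[0], labels[concept])]
    incompatible = [c2 if c1 == concept else c1
                    for (c1, c2) in constraints if concept in (c1, c2)]
    neg = [e for e in examples if e[1] in incompatible]
    return pos, neg
```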

[Figure: solutions for incompleteness: learn new rule(s) linked up in an appropriate place within the structural representation of rules, or generalize existing rule(s), either by dropping literal(s) (1) or by learning a more general literal (2).]

FIGURE 3. Solutions for incompleteness.


[Figure: solutions for incorrectness: specialize the faulty rule(s) by learning new literals, specialize the rule(s) by specializing an existing literal, or suppress a rule because it is satisfied exclusively by counter-examples of the studied concept.]

FIGURE 4. Solutions for incorrectness.

This revision process may propose some modifications in the deep knowledge base. Such modifications arise when the initial vocabulary is not sufficient, in which case the semantic net has to be extended with an entity or a relation. These correspond, respectively, to a predicate argument or to a predicate. The growing of the initial vocabulary is made easier by the fact that the semantic net is organized in different pieces, each piece concerning a particular subconcept (e.g., in the mammal example this will correspond to particular functionalities). Figure 5 illustrates this process.

Our verification algorithm is summarized as follows:

a) Determine the label E_c of a target concept.
b) Determine all examples that are subsumed by the current label E_c.
c) Identify faulty concepts.
d) For each faulty concept:
   • Build the learning set;
   • For each example in the learning set, propose a contextual description;
   • Learn a new piece of knowledge;
   • Revise the KB.
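Tying the earlier fragments together, the loop below is one possible rendering of steps a) through d); the FOIL call of Section 3 is abstracted as a `learn` callback, and the subconcept localization and contextual-description steps are deliberately glossed over.

```python
# End-to-end sketch of the verification algorithm (steps a-d). It reuses
# the hypothetical `label`, `incomplete`, `incorrect`, and learning-set
# helpers sketched earlier.

def verify_and_revise(kb, concepts, learn):
    labels = {c: label(c, kb.rules) for c in concepts}            # step a
    for concept in concepts:
        faulty = [e for e in kb.examples if e[1] == concept and   # steps b, c
                  (incomplete(e, labels) or
                   incorrect(e, labels, kb.constraints))]
        if not faulty:
            continue
        pos, neg = learning_set_incompleteness(concept, kb.examples,
                                               labels, kb.constraints)  # step d
        new_rule = learn(concept, pos, neg, kb.semantic_net)  # FOIL-like call
        kb.rules.append(new_rule)                             # revise the KB
```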

Before illustrating our approach on a particular example, we summarize the entire revision process as shown in Figure 6.

3. AN EXAMPLE

The knowledge base presented in this section is drawn from Matwin and Plante (1991). It is made up of different pieces of knowledge, each piece being represented in a particular formalism, which is considered the most appropriate relative to the knowledge it expresses.

[Figure: the initial semantic-net and its relationships are extended with a new entity of the vocabulary.]

FIGURE 5. The semantic-net completion.


[Figure: the revision process. Inputs: the semantic net plus integrity constraints and the rule base. The process computes the label of a given concept, compares the label with the example-set, completes the entities and relationships description, and produces a correct and complete initial formulation of the rule base.]

FIGURE 6. The revision process.

First, a rule base is expressed in first order logic (FOL) that contains an expert's opinion on the mammal theory. The second piece of knowledge is a semantic-net describing domain entities and their relationships. It is a deeper kind of knowledge than the expert's opinion. Figure 7 shows the semantic-net, associated with the following rule base:


mammal(x) ← blood-system(x, mammal) ∧ sexual-life(x, mammal) ∧ locomotion(x, mammal)
blood-system(x, mammal) ← blood(x, y) ∧ temperature(y, hot) ∧ heart(x, z) ∧ chambers(z, 4)
sexual-life(x, mammal) ← fertilization(x, internal) ∧ way-of-develop(x, mammal)
locomotion(x, mammal) ← ante-limbs(x, legs) ∧ post-limbs(x, legs) ∧ move(x, ground)
way-of-develop(x, mammal) ← develop(x, placenta) ∧ reproduction(x, viviparous) ∧ has(x, breasts)
bird(x) ← reproduction(x, oviparous) ∧ blood(x, y) ∧ temperature(y, hot) ∧ mouth(x, bill)

In addition, integrity constraints are expressed in the following way:

incompatibility(mammal(x), bird(x)),
incompatibility(way-of-develop(x, mammal), way-of-develop(x, bird))

Finally, we have a set of classified examples of mammals and birds. Some of them are listed in Table 1.

As mentioned above, our approach starts by determining a concept's label. For instance, if we want to study the initial mammal definition, we obtain the following label:

E_mammal = blood(x, y) ∧ temperature(y, hot) ∧ heart(x, z) ∧ chambers(z, 4) ∧ fertilization(x, internal) ∧ develop(x, placenta) ∧ reproduction(x, viviparous) ∧ has(x, breasts) ∧ ante-limbs(x, legs) ∧ post-limbs(x, legs) ∧ move(x, ground)

We can notice that many mammal examples are not covered by this current label of the mammal concept. For instance, ∃e: class(e) = mammal such that there is no substitution σ so that desc(e) ⊨ {E_mammal}σ. This is the case for e ∈ {whale, dolphin, kangaroo, bat, spiny-anteater, ornithorynchus}. On the other hand, we notice that examples of mammal, which is a concept

[Figure: the semantic-net links mammal to its locomotion (ante-limbs and post-limbs: legs; move; moving-environment: ground), its circulatory-system (blood part-of heart; temperature: hot; chambers), and its reproduction (has: breasts; develop: placenta, linked to a pregnancy of [13 days, 640 days]; reproduction: viviparous).]

FIGURE 7. The associated semantic-net.


TABLE 1 Examples of Mammals

cat: mammal(cat) ∧ blood(cat, b) ∧ temperature(b, hot) ∧ fertilization(cat, internal) ∧ develop(cat, placenta) ∧ reproduction(cat, viviparous) ∧ has(cat, breasts) ∧ ante-limbs(cat, legs) ∧ post-limbs(cat, legs) ∧ move(cat, ground) ∧ heart(cat, h) ∧ chambers(h, 4).

whale: mammal(whale) ∧ blood(whale, b) ∧ temperature(b, hot) ∧ fertilization(whale, internal) ∧ develop(whale, placenta) ∧ reproduction(whale, viviparous) ∧ has(whale, breasts) ∧ ante-limbs(whale, fins) ∧ post-limbs(whale, tail) ∧ move(whale, water) ∧ heart(whale, h) ∧ chambers(h, 4) ∧ size(whale, giant).

dolphin: mammal(dolphin) ∧ blood(dolphin, b) ∧ temperature(b, hot) ∧ fertilization(dolphin, internal) ∧ develop(dolphin, placenta) ∧ reproduction(dolphin, viviparous) ∧ has(dolphin, breasts) ∧ ante-limbs(dolphin, fins) ∧ post-limbs(dolphin, tail) ∧ move(dolphin, water) ∧ heart(dolphin, h) ∧ chambers(h, 4) ∧ behaviour(dolphin, friendly).

kangaroo: mammal(kangaroo) ∧ blood(kangaroo, b) ∧ temperature(b, hot) ∧ fertilization(kangaroo, internal) ∧ develop(kangaroo, marsupium) ∧ reproduction(kangaroo, viviparous) ∧ has(kangaroo, breasts) ∧ ante-limbs(kangaroo, legs) ∧ post-limbs(kangaroo, legs) ∧ move(kangaroo, ground) ∧ heart(kangaroo, h) ∧ chambers(h, 4).

bat: mammal(bat) ∧ blood(bat, b) ∧ temperature(b, hot) ∧ fertilization(bat, internal) ∧ develop(bat, marsupium) ∧ reproduction(bat, viviparous) ∧ has(bat, breasts) ∧ ante-limbs(bat, wings) ∧ post-limbs(bat, legs) ∧ move(bat, air) ∧ heart(bat, h) ∧ chambers(h, 4).

spiny-anteater: mammal(spiny-anteater) ∧ blood(spiny-anteater, b) ∧ temperature(b, hot) ∧ fertilization(spiny-anteater, internal) ∧ mouth(spiny-anteater, bill) ∧ reproduction(spiny-anteater, oviparous) ∧ has(spiny-anteater, breasts) ∧ ante-limbs(spiny-anteater, legs) ∧ post-limbs(spiny-anteater, legs) ∧ move(spiny-anteater, ground) ∧ heart(spiny-anteater, h) ∧ chambers(h, 4).

ornithorynchus: mammal(ornithorynchus) ∧ blood(ornithorynchus, b) ∧ temperature(b, hot) ∧ fertilization(ornithorynchus, internal) ∧ mouth(ornithorynchus, bill) ∧ reproduction(ornithorynchus, oviparous) ∧ has(ornithorynchus, breasts) ∧ ante-limbs(ornithorynchus, legs) ∧ post-limbs(ornithorynchus, legs) ∧ move(ornithorynchus, ground) ∧ heart(ornithorynchus, h) ∧ chambers(h, 4).

incompatible with the bird concept, are covered by the present bird label; that is, ∃e: class(e) = concept-j and incompatible(concept-j, bird), and there is a substitution σ so that desc(e) ⊨ {E_bird}σ. The concerned examples are the following: {spiny-anteater, ornithorynchus}. In such a case, both incompleteness and incorrectness are detected.

To learn new definitions of faulty concepts, we have used the FOIL algorithm (Quinlan, 1990). It learns Horn clauses from examples of relations (in our case, relations are the predicates of the application domain). The target relation is the faulty concept to revise. Like ID3, it employs an information-gain estimate to select the best literal.
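For reference, FOIL's literal-selection criterion can be sketched as follows; the simplification of counting each surviving positive tuple once (rather than counting bindings, as FOIL properly does) is ours.

```python
import math

def foil_gain(p0, n0, p1, n1):
    """FOIL's information gain for a candidate literal (Quinlan, 1990):
    p0/n0 = positive/negative tuples covered before adding the literal,
    p1/n1 = those still covered after. Gain = t * (I_before - I_after)
    with I = -log2(p / (p + n)); here t is approximated by p1."""
    if p1 == 0:
        return 0.0
    info_before = -math.log2(p0 / (p0 + n0))
    info_after = -math.log2(p1 / (p1 + n1))
    return p1 * (info_before - info_after)

# Adding "covered(x, feathers)" to the bird clause keeps both bird examples
# (duck, crow) while excluding spiny-anteater and ornithorynchus:
print(foil_gain(p0=2, n0=2, p1=2, n1=0))  # 2 * (1.0 - 0.0) = 2.0
```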

To solve incompleteness, the learning process induces from the descriptions of dolphin, whale, and bat the expression "ante-limbs(x, y) ∧ y ≠ legs". These new expressions complete the mammal label, and precisely the locomotion label, which is one of its subconcepts. We obtain the following new piece of knowledge:

locomotion(x, mammal) ← [ante-limbs(x, legs) ∧ post-limbs(x, legs) ∧ move(x, ground)] ∨ ante-limbs(x, fins) ∨ ante-limbs(x, wings)

At the same time, the learning algorithm completes the "locomotion" piece in the semantic-net concerned with the previous modifications. It has to extend its initial vocabulary as shown in Figure 8.


TABLE 2 Examples of Birds

duck: bird(duck) ∧ blood(duck, b) ∧ temperature(b, hot) ∧ fertilization(duck, internal) ∧ mouth(duck, bill) ∧ reproduction(duck, oviparous) ∧ covered(duck, feathers) ∧ ante-limbs(duck, wings) ∧ post-limbs(duck, legs) ∧ move(duck, air) ∧ heart(duck, h) ∧ chambers(h, 4).

crow: bird(crow) ∧ blood(crow, b) ∧ temperature(b, hot) ∧ fertilization(crow, internal) ∧ mouth(crow, bill) ∧ reproduction(crow, oviparous) ∧ covered(crow, feathers) ∧ color(feathers, black) ∧ ante-limbs(crow, wings) ∧ post-limbs(crow, legs) ∧ move(crow, air) ∧ heart(crow, h) ∧ chambers(h, 4).


To complete the solution of the incompleteness problem, the learning algorithm considers the following examples: kangaroo, bat, spiny-anteater, and ornithorynchus. The nonverified subconcept is way-of-develop(x, mammal). From the descriptions of these examples, another definition of way-of-develop(x, mammal) is induced: "[reproduction(x, oviparous) ∧ has(x, breasts)] ∨ develop(x, marsupium)".

Finally, we obtain a new definition for way-of-develop(x, mammal):

way-of-develop(x, mammal) ← [develop(x, development-organ) ∧ reproduction(x, viviparous) ∧ has(x, breasts)] ∨ [reproduction(x, oviparous) ∧ has(x, breasts)]

As in the previous case, we complete the semantic net portion concerned with mammal sexual life. We add the facts that both marsupium and placenta are development-organs, and that some mammals are oviparous (see Figure 9).

To solve the incorrectness detected thanks to the bird label, the learning algorithm uses all bird examples and the two mammal examples (i.e., spiny-anteater and ornithorynchus) that have helped us to detect this incorrectness. The literal "covered(x, feathers)" is induced to specialize the initial bird label:

[Figure: the "locomotion" piece of the semantic-net (ante-limbs, post-limbs, move, moving-environment: ground), completed with the new vocabulary.]

FIGURE 8. Completion of the "locomotion" piece in the semantic-net.

bird(x) ← reproduction(x, oviparous) ∧ blood(x, y) ∧ temperature(y, hot) ∧ mouth(x, bill) ∧ covered(x, feathers).

This example shows us how our approach deals with an incomplete and/or incorrect KB. It considers a kind of knowledge that is easier for a domain expert to produce than a set of rules: reliable examples and counter-examples of a given concept.

4. RELATED WORKS

Earlier works in the field of KB verification have not integrated machine learning techniques to revise the initial formulation of KBs. They generally consider rule bases expressed in attribute-value logic or first-order logic. CHECK (Nguyen, 1985) statically verifies consistency and completeness of KBs expressed in FOL. This method is typical of the approaches that are easy to implement. In spite of the simplicity of the basic concepts used in CHECK, a rule base may present incoherences that CHECK cannot detect.

To overcome this weakness, new approaches, referred to as dynamic, have been developed. These take into account the deductive power of knowledge bases. These methods are either exhaustive, in which case they aim at finding the specification of all incoherent situations, or heuristic, in which case they exploit heuristics to select the more "interesting" conjectures of incoherence. The exhaustive approach is used by systems like COVADIS (Rousset, 1988) and COCO (Loiseau, 1990). Starting with incoherence specifications, the issue is to prove that these specifications are reachable from sets of "incoherent facts." If this is the case, the KB is incoherent; otherwise it is coherent. A typical example of systems adopting a heuristic approach is the SACCO (Ayel, 1987) system. This approach brings computation speed-up and avoids proposing to the expert real but improbable conjectures. To do this, SACCO makes use of heuristics that allow it to define a limited set of potential conjectures of incoherence. If the so-defined conjectures are verified by an initial fact base, then the incoherence is detected; otherwise, the KB is assumed coherent.


[Figure: the "reproduction" piece of the semantic-net, completed: develop is linked to pregnancy ([13 days, 640 days]); placenta (linked to uterus) and marsupium are both development-organs (is-a); reproduction may be viviparous or, for some mammals, oviparous.]

FIGURE 9. Completion of the "reproduction" piece in the semantic-net.

On the other hand, functional verification of KBS makes sure that the provided results are in accordance with the domain's semantics. For instance, from expert knowledge and a set of cases, the SEEK system (Politakis, 1984) exploits a rule refinement cycle, which, by performance evaluation of the rules on a library of cases and by analyzing the statistical behaviour of each rule, suggests modifications to be introduced in the expert knowledge.

More recently, several works have integrated machine learning algorithms in order to automatically revise imperfect KBs. They generally treat KBs expressed in the form of production rules. Some systems are only capable of generalizing an overly specific (incomplete) KB (Danyluk, 1989; Whitehall, 1990; Wilkins, 1988), while others are only capable of specializing an overly general KB (Cohen, 1990; Flann & Dietterich, 1989; Mooney & Ourston, 1989).

A number of these systems use the explanation-based learning (EBL) approach to deal with imperfect KBs. For instance, to deal with overly general KBs, EBL/TS (Cohen, 1992) uses the explanation tree of each positive example and then replaces the overly general definition of the given concept with the rule associated with this explanation tree. On the other hand, A-EBL "[20]" treats the problem of multiple example explanations. It throws out inconsistent explanations and retains only a minimal set of "good" explanations. This is done by using heuristics. Such approaches start with an imperfect KB expressed in terms of rules and replace the entire KB with the rule associated with the explanation tree, revising the operational definition of a given concept. These processes do not preserve the structural form of rule bases and generally produce flat rules.

However, the current tendency in knowledge representation aims at integrating different formalisms such as rules, frames, semantic-nets, and so on. The goal pursued by such an approach is to increase the explanation capabilities of KBSs and to acquire different kinds of knowledge, each expressed in a particular formalism, considered the most appropriate relative to the type of knowledge it expresses. In the field of knowledge verification, there still are few systems that cope with imperfect KBs expressed in different formalisms.

5. CONCLUSION

In this research, we address the issue of KB verification. These KBs consist of two levels of knowledge, differently structured. This paper is centered on the following statement: integration of a learning process in the validation step of a KBS's life cycle allows the detection and correction of anomalies present in the initial formulation of the KB.

Our approach exploits a set of examples that experts can provide more easily than knowledge directly expressed in the form of rules. Thanks to the notion of a concept's label, this approach permits us, as a first step, to locate the level at which incompleteness or incorrectness occurs. A subsequent step, which makes use of the concerned examples, calls for learning techniques that perform corrections at the level indicated by the incoherent label. In parallel, the description of the deep KB is completed to take account of the new changes. This process permits us to start with an initial imperfect KB, and then to correct and complete it until it reaches its final formulation.

Currently, we are exploring the possibility of learning integrity constraints. We aim to start with an integrity constraint Ic = incompatibility(concept-i, concept-j) and then to learn from the example set an incompatible environment expressed in terms of initial facts, I′ = incompatibility(∧_{i=1,k} e_i). Such a process could reject the initial example description and then avoid the case where the learning algorithm has to deal with an erroneous example description.

Acknowledgements--I would like to thank Yves Kodratoff, my thesis supervisor, for the support he gave to this work. I would also like to thank all the members of the Inference and Learning group at LRI.

REFERENCES

Ayel, M. (1987). Détection d'incohérences dans les bases de connaissances: SACCO. Thèse d'état, Chambéry, France.

Cohen, W.W. (1990). Learning from textbook knowledge: A case study. Proceedings of the 8th National Conference on Artificial Intelligence (pp. 743-748). Boston, MA.

Cohen, W.W. (1992). Abductive explanation-based learning: A solution to the multiple inconsistent explanation problem. Machine Learning Journal, 8(2), 167-213.

Page 9: Knowledge-based systems verification: A machine learning-based approach

K B S Verification." A Machine Learning-Based Approach 389

Danyluk, A.D. (1989). Finding new rules for incomplete theories: Explicit biases for induction with contextual information. Proceedings of the 6th International Workshop on Machine Learning (pp. 34-36). Ithaca, NY.

De Kleer, J. (1986). An assumption-based TMS. Artificial Intelligence, 28(2), 127-162.

Flann, N.S., & Dietterich, T.G. (1989). A study of explanation-based methods for inductive learning. Machine Learning Journal, 4(2), 187-226.

Laurent, J.P. (1992). Vers une terminologie valide pour le domaine de la validation. Actes des Journées sur l'Acquisition, l'Apprentissage et la Validation, 1-15.

Loiseau, S. (1990). Validation, acquisition et mise au point interactive des BC: Le système COCO-X fondé sur la cohérence. Thèse de doctorat, Université de Paris-Sud, France.

Matwin, S., & Plante, B. (1991). A deductive-inductive method for theory revision. International Workshop on Machine Learning (pp. 160-174).

Michalski, R.S. (1983). A theory and a methodology of inductive learning. In R.S. Michalski, J.G. Carbonell, & T.M. Mitchell (Eds.), Machine learning: An artificial intelligence approach (pp. 83-134). Morgan Kaufmann.

Mitchell, T.M., et al. (1986). Explanation-based generalization: A unifying view. Machine Learning Journal, 1(1), 47-80.

Mooney, R.J., & Ourston, D. (1989). Induction over the unexplained: Integrated learning of concepts with both explainable and conventional aspects. Proceedings of the 6th International Workshop on Machine Learning (pp. 5-7). Ithaca, NY.

Nguyen, T.A., et al. (1985). Checking an expert-system knowledge base for consistency and completeness. International Joint Conference on Artificial Intelligence (pp. 375-379).

Politakis, P., et al. (1984). Using empirical analysis to refine ES knowledge bases. Artificial Intelligence, 22, 23-48.

Quinlan, J. (1983). Learning efficient classification procedures and their application to chess end games. In R.S. Michalski, J.G. Carbonell, & T.M. Mitchell (Eds.), Machine learning: An artificial intelligence approach (pp. 463-482). Morgan Kaufmann.

Quinlan, J. (1990). Learning logical definitions from relations. Machine Learning Journal, 5, 239-266.

Rajamoney, S., & DeJong, G. (1987). The classification, detection and handling of imperfect theory problems. International Joint Conference on Artificial Intelligence (pp. 205-207).

Rousset, M.C. (1988). On the consistency of knowledge bases: The COVADIS system. European Conference on Artificial Intelligence (pp. 79-84).

Whitehall, B.L. (1990). Knowledge-based learning: An integration of deductive and inductive learning for knowledge base completion. PhD thesis, University of Illinois, Urbana, IL.

Wilkins, D. (1988). Knowledge base refinement using apprenticeship learning techniques. Proceedings of the 7th National Conference on Artificial Intelligence (pp. 646-651). St. Paul, MN.

This article is being published without the benefit of the author's review of the proofs, which were not available at press time.