Learning with Abduction

A.C. Kakas
Department of Computer Science, University of Cyprus
75 Kallipoleos str., CY-1678 Nicosia, Cyprus
[email protected]

F. Riguzzi
DEIS, Università di Bologna
Viale Risorgimento 2, 40136 Bologna, Italy
[email protected]

Abstract

We investigate how abduction and induction can be integrated into a common learning framework through the notion of Abductive Concept Learning (ACL). ACL is an extension of Inductive Logic Programming (ILP) to the case in which both the background and the target theory are abductive logic programs and where an abductive notion of entailment is used as the coverage relation. In this framework it is then possible to learn with incomplete information about the examples by exploiting the hypothetical reasoning of abduction.

The paper presents the basic framework of ACL with its main characteristics and illustrates its potential in addressing several problems in ILP, such as learning with incomplete information and multiple predicate learning. An algorithm for ACL is developed by suitably extending the top-down ILP method for concept learning and integrating it with an abductive proof procedure for Abductive Logic Programming (ALP). A prototype system has been developed and applied to learning problems with incomplete information. The particular role of integrity constraints in ACL is investigated, showing ACL to be a hybrid learning framework that integrates the explanatory (discriminant) and descriptive (characteristic) settings of ILP.

1 Introduction

The problem of integrating abduction and induction in Machine Learning systems has recently received renewed attention, with several works on this topic [DK96, AD95, AD94, Coh95, ?].

In [DK96] the notion of Abductive Concept Learning (ACL) was proposed as a learning framework based on an integration of Inductive Logic Programming (ILP) and Abductive Logic Programming (ALP) [KKT93].

Abductive Concept Learning is an extension of ILP that allows us to learn the abductive logic programs of ALP, with abduction playing a central role in the covering relation of the learning problem. The abductive logic programs learned in ACL contain both rules for the concept(s) to be learned and general clausal theories called integrity constraints. These two parts are synthesized together in a non-trivial way via the abductive reasoning of ALP, which is then used as the basic covering relation for learning. In this way, learning in ACL synthesizes discriminant and characteristic induction to learn abductive logic programs.

This paper presents the basic framework of ACL with its main characteristics and demonstrates its potential for addressing several problems in ILP. From a practical point of view, ACL allows us to learn from incomplete information and to later classify new cases that again may be incompletely specified. Particular cases of interest are learning from incomplete background data and multiple predicate learning, where the abductive reasoning is used to suitably connect and integrate the learning of the different predicates.

An algorithm for ACL is presented. This algorithm adapts the basic top-down method of ILP to deal with the incompleteness of information and to take into account the use and learning of integrity constraints. The algorithm also incorporates an abductive proof procedure and other abductive reasoning mechanisms from ALP, suitably adapted for the context of learning. The algorithm is implemented in a new system, also called ACL, with suitably adapted heuristics that take the incompleteness of information into account. Several initial experiments are presented that demonstrate the ability of ACL to learn with incomplete information and its usefulness in multiple predicate learning.

The rest of this paper is organized as follows. Section 2 presents a short review of ALP needed for the formulation and description of the main properties of ACL, which are presented in Section 3. Section 4 presents the basic algorithm for ACL and its implementation in a first system. Section 5 presents our initial experiments with ACL, and Section 6 discusses related work and concludes with a discussion of future work.

2 Abductive Logic Programming

In this section we briefly review some of the elements of Abductive Logic Programming (ALP) needed for the formulation of the learning framework of Abductive Concept Learning (ACL). For a more detailed presentation of ALP the reader is referred to the survey [KKT93] (and its recent update [KKT97]) and the references therein.

Abductive Logic Programming is an extension of Logic Programming to support abductive reasoning with theories (logic programs) that incompletely describe their problem domain. In ALP this incomplete knowledge is captured (represented) by an abductive theory T.

Definition 2.1 An abductive theory T in ALP is a triple ⟨P, A, I⟩, where P is a logic program (for simplicity, we will assume that all logic programs are definite Horn programs), A is a set of predicates called abducible predicates, and I is a set of first-order closed formulae called integrity constraints (in practice, integrity constraints are restricted to be range-restricted clauses).

As a knowledge representation framework, when we represent a problem in ALP via an abductive theory T we generally assume that the abducible predicates in A carry all the incompleteness of the program P in modeling the external problem domain, in the sense that if we (could) complete the information on the abducible predicates in P, then P would completely describe the problem domain. Partial information that may be known about these abducible predicates can be represented either explicitly in the program P or implicitly as integrity constraints in I.

An abductive theory can support abductive (or hypothetical) reasoning for several purposes, such as diagnosis, planning or default reasoning. The central notion used for this is that of an abductive explanation of an observation or a given goal. Informally, an abductive explanation consists of a set of (ground) abducible facts (also called abductive hypotheses) on some of the abducible predicates in A which, when added to the program P, render the observation or goal true. The integrity constraints in I must always be satisfied by any extension of the program P with such abductive hypotheses for this to be an allowed abductive explanation.

To formalize this we first need the notion of a generalized model of an abductive theory.

Definition 2.2 Let T = ⟨P, A, I⟩ be an abductive theory and Δ a set of ground abducible facts from A. M(Δ) is a generalized stable model of T iff

- M(Δ) is the minimal Herbrand model of P ∪ Δ, and
- M(Δ) |= I.

The program P ∪ Δ is called an abductive extension of P (or of T).

Here the semantics of the integrity constraints is defined by the second condition of the definition: their satisfaction requires that they be true statements in the extension of the program with Δ for this extension to be allowed. An abductive theory is thus viewed as representing a collection of different allowed states, given by the set of its generalized models.
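Definition 2.2 is straightforward to operationalize for ground programs. The following is a minimal sketch in Python (ours, not part of any system described here; all names are illustrative), with atoms encoded as tuples, clauses as (head, body) pairs, and each denial constraint ← b1, ..., bn as the list of its body atoms:

# Minimal sketch of Definition 2.2 for ground definite programs.
def least_model(clauses):
    """Minimal Herbrand model of ground definite clauses (head, body),
    computed by naive forward chaining to a fixpoint."""
    model, changed = set(), True
    while changed:
        changed = False
        for head, body in clauses:
            if head not in model and all(b in model for b in body):
                model.add(head)
                changed = True
    return model

def generalized_model(program, constraints, delta):
    """Return M(delta) if it is a generalized stable model of the
    theory, i.e. the least model of P u delta satisfies every denial
    constraint; return None otherwise."""
    m = least_model(program + [(a, []) for a in delta])
    if all(not all(b in m for b in ic) for ic in constraints):
        return m
    return None

# Tiny illustration with the (ground) denial <- male(katy), female(katy):
program = [(('female', 'katy'), [])]
ics = [[('male', 'katy'), ('female', 'katy')]]
print(generalized_model(program, ics, {('male', 'katy')}))  # None: denial violated
print(generalized_model(program, ics, set()))               # {('female', 'katy')}

First-order range-restricted constraints would be grounded over the relevant constants before this check; the later sketches include a small grounding helper for that purpose.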

Definition 2.3 Let T = ⟨P, A, I⟩ be an abductive theory and φ any formula, called an observation (or query). (In general φ can be any formula; in practice it often suffices for φ to be the conjunction of a set of ground facts.) An abductive explanation for φ in T is any set Δ of abducible facts from A such that

- M(Δ) is a generalized model of T, and
- M(Δ) |= φ.

Definition 2.4 Let T = ⟨P, A, I⟩ be an abductive theory and φ any formula. Then φ is abductively entailed by T, denoted T |=_A φ, iff there exists an abductive explanation of φ in T. We then say that φ is a credulous abductive consequence of T. A formula φ is called a skeptical abductive consequence of T iff φ is true in all abductive extensions of T.

Note that although the integrity constraints reduce the number of possible explanations for a set of observations, it is still possible for several explanations that satisfy (do not violate) the integrity constraints to exist. This problem is known as the multiple explanations problem; in fact, abduction is often described as inference to the best explanation. To this end, additional criteria can help to discriminate between different explanations, by rendering some of them more appropriate, plausible or preferred over others. A typical criterion is the requirement of minimality (with respect to set inclusion) of the explanations. Various (domain-specific) criteria of preference can also be specified. Moreover, we may be able to acquire new additional information that refutes or corroborates an explanation.

The following example illustrates the above ideas.

Example 2.5 Consider the following abductive framework ⟨P, A, I⟩ with P the logic program on family relations:

father(X,Y) ← parent(X,Y), male(X)
mother(X,Y) ← parent(X,Y), female(X)
son(X,Y) ← parent(Y,X), male(X)
daughter(X,Y) ← parent(Y,X), female(X)
child(X,Y) ← son(X,Y)
child(X,Y) ← daughter(X,Y)
loves(X,Y) ← parent(X,Y)

the integrity constraints I = {← male(X), female(X)}, and the abducible predicates A = {parent, male, female}.

Consider now the observation O1 = father(Bob, Jane). An abductive explanation for O1 is the set Δ1 = {parent(Bob, Jane), male(Bob)}. This is the unique minimal explanation.

Consider now another observation, O2 = child(John, Mary). This has two possible explanations: Δ2 = {parent(John, Mary), male(John)} and Δ2' = {parent(John, Mary), female(John)}. If we also knew that male(John) holds, then Δ2' would be rejected due to the violation of the integrity constraint; in fact, these two explanations are incompatible with each other. But note that, since parent(John, Mary) belongs to both explanations, parent(John, Mary) and loves(John, Mary) are skeptical consequences of this abductive framework together with the observation O2.
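For theories of this size, the minimal explanations of Definition 2.3 can be found by exhaustive search over sets of ground abducible facts. Below is a toy enumeration in the same encoding as the earlier sketch (ours and purely illustrative; for brevity only the father clause of Example 2.5 is included, and the grounding helper assumes variables are upper-case letters):

# Toy search for subset-minimal abductive explanations (Definition 2.3).
from itertools import combinations, product

def least_model(clauses):
    model, changed = set(), True
    while changed:
        changed = False
        for head, body in clauses:
            if head not in model and all(b in model for b in body):
                model.add(head)
                changed = True
    return model

def ground(rules, consts):
    """Instantiate clauses whose arguments may be variables over consts."""
    out = []
    for head, body in rules:
        vs = sorted({t for a in [head] + body for t in a[1:] if t.isupper()})
        for vals in product(consts, repeat=len(vs)):
            s = dict(zip(vs, vals))
            g = lambda a: (a[0],) + tuple(s.get(t, t) for t in a[1:])
            out.append((g(head), [g(b) for b in body]))
    return out

def explanations(program, ics, abducible_atoms, goal):
    """All subset-minimal Delta whose generalized model entails goal.
    Exponential in the number of abducible atoms: illustration only."""
    found = []
    for r in range(len(abducible_atoms) + 1):
        for delta in combinations(abducible_atoms, r):
            if any(set(f) <= set(delta) for f in found):
                continue  # a smaller explanation already covers this
            m = least_model(program + [(a, []) for a in delta])
            if goal in m and all(not all(b in m for b in ic) for ic in ics):
                found.append(delta)
    return found

consts = ['bob', 'jane']
rules = [(('father', 'X', 'Y'), [('parent', 'X', 'Y'), ('male', 'X')])]
ics = [b for _, b in ground([(('ic',), [('male', 'X'), ('female', 'X')])], consts)]
abducibles = [(p, c) for p in ('male', 'female') for c in consts] + \
             [('parent', c, d) for c in consts for d in consts]
print(explanations(ground(rules, consts), ics, abducibles,
                   ('father', 'bob', 'jane')))
# [(('male', 'bob'), ('parent', 'bob', 'jane'))], i.e. Delta_1 of the example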

In some applications of abduction, e.g. when we apply abductive reasoning in learning (see the next section), a case of particular interest is that of explaining negative observations, in other words explaining the falsity (or absence) of an atomic observation. In this case it is useful to extend the basic framework of ALP described above so that the set of abductive explanations contains explicit assumptions of the absence as well as of the presence of abductive hypotheses.

According to Definitions 2.3 and 2.4 above, a negative fact not A is explained by the theory T when there exists a generalized model M(Δ) such that M(Δ) ⊭ A (also denoted by M(Δ) |= not A). (We denote negative facts using the Negation as Failure symbol not, rather than classical negation, as in many cases what is needed is the failure of the atom A rather than its explicit negation.) For any given such Δ, all abducible assumptions that do not belong to M(Δ) are assumed to be false. It is then useful to extend our abductive explanations for not A to record these negative abducible assumptions explicitly. More specifically, it is useful to record in Δ a set of negative abducible assumptions that is sufficient to ensure that the positive observation cannot be abductively derived.

To illustrate this, consider again the above example and the negative observation not father(Jane, John). An extended abductive explanation for this is Δ1* = {not parent(Jane, John)} or Δ2* = {not male(Jane)}, where, following notation from ALP, we use not ab to denote that the abducible assumption ab must be false or absent. This means that, for example in the case of Δ1*, any set Δ of abducible assumptions that satisfies the conditions of Δ1*, namely that it does not contain the assumption parent(Jane, John), constitutes an abductive explanation for the negative observation not father(Jane, John). Furthermore, if the assumptions of absence of abducible hypotheses can be met, then the negative observation will always hold, and so the corresponding positive observation cannot be abductively derived.

Definition 2.6 Let T = ⟨P, A, I⟩ be an abductive theory and Q = not A a negative observation (or query). An extended abductive explanation for Q in T is any set Δ* of positive and negative abducible facts from A such that

- M(Δ), where Δ is the subset of Δ* containing only the positive assumptions, is a generalized model of T,
- for any negative assumption not ab ∈ Δ*, ab ∉ M(Δ), and
- M(Δ) |= Q.

Hence the extended abductive explanations can be seen as a specification of a set of sufficient conditions which, when met, ensure that the corresponding positive observation cannot be abductively entailed in the credulous sense, i.e. that there is no abductive explanation for the positive observation.

The satisfaction of these requirements is then usually achieved by generating or learning suitable integrity constraints that render the positive assumption ab of a negative assumption not ab in Δ* inconsistent.

3 Learning with Abduction

Abductive concept learning can be defined as a case of discriminant concept learning which, however, also contains within it a characteristic learning problem. It thus combines discriminant and characteristic learning in a non-trivial way, using the abductive entailment relation as the covers relation for the training examples. The language of hypotheses is that of Abductive Logic Programming, as is the language of the background knowledge. The language of the examples is simply that of atomic ground facts.

Definition 3.1 Abductive Concept Learning (ACL)

Given

- a set of positive examples E+ of a concept C,
- a set of negative examples E− of a concept C,
- an abductive theory T = ⟨P, A, I⟩ as background theory,

Find

a new abductive theory T' = ⟨P', A, I'⟩ such that

- T' |=_A E+,
- ∀ e− ∈ E−, T' ⊭_A e−.

Therefore, ACL differs from ILP in that both the background knowledge and the learned program are abductive logic programs. As a consequence, the notion of deductive entailment of ILP is replaced with the notion of abductive entailment (|=_A) in ALP.

The learned hypothesis T' = ⟨P', A, I'⟩ will contain in P', together with the background knowledge P, rules for the concept C, and (possibly) new integrity constraints in I' that restrict the abducibles in A and thus, implicitly, also the definition of the concept in P'. The concept rules in P' are learned using a form of discriminant induction based on abductive entailment. This discriminant induction is integrated with a form of characteristic induction which, from the same data, learns integrity constraints that complement the rules in their solution of the discriminant learning problem.

Abductive concept learning can be illustrated as follows.

Example 3.2 Suppose we want to learn the concept father. Let the background theory be T = ⟨P, A, ∅⟩ where:

P = {parent(john, mary), male(john),
     parent(david, steve),
     parent(katy, ellen), female(katy)}

the incomplete (abducible) predicates are A = {male, female}, and let the training data be:

E+ = {father(john, mary), father(david, steve)}
E− = {father(katy, ellen)}

In this case, a possible hypothesis T' = ⟨P', A, I'⟩ learned by ACL would add to P' the rule

father(X,Y) ← parent(X,Y), male(X).

under the (extended) abductive assumptions Δ* = {male(david), not male(katy)}. The positive assumption comes from the requirement to cover the positive examples, and the negative assumption from the requirement not to cover the negative example. In addition, by considering the background knowledge together with the positive assumption male(david) from Δ*, we could learn (using characteristic induction) the integrity constraint in I':

← male(X), female(X).

Note that this integrity constraint provides independent support for the negative assumption not male(katy) in Δ*, thus ensuring that the negative example cannot be abductively covered by the learned hypothesis.
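On a problem of this size, both conditions of Definition 3.1 can be verified by brute force: enumerate every constraint-satisfying abductive extension, check that some extension covers all positive examples (credulous coverage) and that no extension covers the negative example (skeptical non-coverage). A sketch of ours, with the same toy encoding and helpers as before:

# Brute-force check of the two ACL conditions for Example 3.2.
from itertools import combinations, product

def least_model(clauses):
    model, changed = set(), True
    while changed:
        changed = False
        for head, body in clauses:
            if head not in model and all(b in model for b in body):
                model.add(head)
                changed = True
    return model

def ground(rules, consts):
    out = []
    for head, body in rules:
        vs = sorted({t for a in [head] + body for t in a[1:] if t.isupper()})
        for vals in product(consts, repeat=len(vs)):
            s = dict(zip(vs, vals))
            g = lambda a: (a[0],) + tuple(s.get(t, t) for t in a[1:])
            out.append((g(head), [g(b) for b in body]))
    return out

def generalized_models(program, ics, abducible_atoms):
    """Least models of all constraint-satisfying abductive extensions."""
    for r in range(len(abducible_atoms) + 1):
        for delta in combinations(abducible_atoms, r):
            m = least_model(program + [(a, []) for a in delta])
            if all(not all(b in m for b in ic) for ic in ics):
                yield m

consts = ['john', 'mary', 'david', 'steve', 'katy', 'ellen']
P = [(('parent', 'john', 'mary'), []), (('male', 'john'), []),
     (('parent', 'david', 'steve'), []),
     (('parent', 'katy', 'ellen'), []), (('female', 'katy'), [])]
rule = [(('father', 'X', 'Y'), [('parent', 'X', 'Y'), ('male', 'X')])]
Pp = P + ground(rule, consts)                       # the learned P'
abd = [(p, c) for p in ('male', 'female') for c in consts]
ic = [b for _, b in ground([(('ic',), [('male', 'X'), ('female', 'X')])], consts)]
e_pos = [('father', 'john', 'mary'), ('father', 'david', 'steve')]
e_neg = ('father', 'katy', 'ellen')

models = list(generalized_models(Pp, ic, abd))
print(any(all(e in m for e in e_pos) for m in models))  # True: E+ covered
print(all(e_neg not in m for m in models))              # True: e- skeptically out
print(all(e_neg not in m
          for m in generalized_models(Pp, [], abd)))    # False without I'

The last line shows the role of the learned constraint: without ← male(X), female(X), the extension that assumes male(katy) covers the negative example, so the hypothesis would satisfy only the relaxed, credulous reading of the negative-example condition discussed next.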

It is important to note that the treatment of positive and negative examples by ACL is asymmetric. Whereas for the positive examples it is sufficient for the learned theory to provide some explanation of them, for the negative examples there should not exist a single abductive extension of the learned theory that covers a negative example, i.e. all abductive extensions of the theory should imply the corresponding negative fact. In other words, the corresponding negative facts should be skeptical consequences of the learned theory, whereas for the positive facts it is sufficient for them to be credulous consequences.

It is often useful for practical reasons to initially relax this strong requirement on the negative examples and consider an intermediate problem, denoted I-ACL, where the negative examples are also only required to be credulously uncovered, i.e. there should exist in the learned theory at least one extension where the negative examples cannot be covered (and, of course, in this extension it is also possible to cover the positive examples). To this end we define the intermediate problem where the two conditions in the above definition are replaced by:

- T' |=_A E+, not E−, where not E− = {not e− | e− ∈ E−}.

We can then first solve this simpler problem using the notion of extended abductive explanations (as is done in the example above) and then use these explanations as input for further learning in order to satisfy the full ACL problem. We will also see that in some cases solving this intermediate problem is sufficient to solve the full problem of ACL, particularly when the background theory already contains relevant integrity constraints.

3.1 Properties of Abductive Concept Learning

In ACL we can compare two hypotheses by comparing the abductive explanations for the coverage of the positive examples or, more generally, by comparing their generalized models.

Definition 3.3 Let T = ⟨P, A, I⟩ and E+, E− be given. Let T1 and T2 be two hypotheses for this learning problem. T1 is a more general solution than T2 iff for any e+ ∈ E+ and any abductive explanation Δ2 of e+ in T2 there exists an abductive explanation Δ1 of e+ in T1 such that Δ1 ⊆ Δ2.

Definition 3.4 Let T1 and T2 be two abductive theories such that, for any generalized model M2(Δ) of T2, Δ is an allowed abductive extension of T1 and its generalized model M1(Δ) contains M2(Δ). Then T1 is a more general theory than T2.

It is easy to see that if T1 is a more general theory than T2, then T1 is a more general solution than T2. Requiring that the hypothesis be maximally general then has the effect that the learned concept rules in the logic program P' of the hypothesis T' are minimally specific, avoiding the addition of extra consistent but unnecessary abducible conditions in the bodies of the rules.

In general, the monotonicity property does not hold for ACL. Given two solutions T1 = ⟨P1, A, I1⟩ and T2 = ⟨P2, A, I2⟩ that cover a positive example, their union T = ⟨P1 ∪ P2, A, I1 ∪ I2⟩ does not necessarily cover this example. This can happen because an abductive explanation which satisfies the integrity constraints I1 (or the constraints I2) may violate the constraints I1 ∪ I2. ACL has a restricted form of monotonicity, as stated by the following lemma.

Lemma 3.5

- If T1 = ⟨P1, A, I⟩ covers e+ and T2 = ⟨P2, A, I⟩ covers e+, then T = ⟨P1 ∪ P2, A, I⟩ covers e+.
- Let T1 = ⟨P1, A, I1⟩ cover e+ with abductive explanation Δ, and let T2 = ⟨P1 ∪ Δ, A, I2⟩ cover e+. Then T = ⟨P1, A, I1 ∪ I2⟩ covers e+.

As a consequence of this property, if we remember the abductive assumptions on which the coverage of the positive examples rests, then the generation of the integrity constraints in the learned theory (for uncovering the negative examples) can be monotonic. In Example 3.2 above, in generating the integrity constraint we include in the background knowledge the abducible hypothesis male(david), needed for the coverage of the positive examples, thus ensuring that the generated constraint would not uncover any positive examples.

4 Algorithms

In this section we present an algorithm for ACL. For simplicity, we concentrate on the case of single predicate learning and discuss briefly at the end of the section how this can be extended to multiple predicates. We also do not consider here in full the problem of integrating the learning of integrity constraints into this algorithm; rather, we assume that these exist (or have been learned) in the background theory. Thus the formal specification problem for this algorithm is that of the intermediate problem I-ACL (see Section 3). For more details on these issues the reader is referred to the technical report on ACL [KR96].

The algorithm is based on the generic top-down algorithm (see e.g. [LD94]), suitably adapted to deal with the incompleteness of the abducible predicates and to take into account the integrity constraints in the background theory. It incorporates (and adapts) algorithms for abductive reasoning from ALP [KM90], extending the algorithm in [ELM+96].

The top-level covering and specialization loops of the algorithm are shown in Figure 1 and Figure 2, respectively. The covering loop starts with an empty hypothesis and, at each iteration, a new clause is added to the hypothesis. The new clause to add is generated in the specialization loop. This clause is selected amongst the possible refinements of previous clauses in the current set of clauses (the agenda), using a specially defined heuristic evaluation function (Figure 3). The agenda is initialized to a clause with an empty body for each target predicate and is gradually enlarged at each iteration of the specialization loop.

The evaluation of a clause is done by considering each positive example e+ and each negative example e−, and by starting an abductive derivation for e+ and not e− respectively, using the procedure defined in [KM90]. This procedure takes as input the goal to be derived and the abductive theory and, if it succeeds, produces as output the set of assumptions abduced during the derivation. The sets of assumptions abduced for earlier examples are also given as input, to ensure that the assumptions made during the derivation of the current example are consistent with the ones made before. Thus, we can test each positive and negative example separately and be sure that the clause will derive e+1 ∧ ... ∧ e+n ∧ not e−1 ∧ ... ∧ not e−m.

procedure I-ACL(
    inputs:  E+, E−: training sets,
             T = ⟨P, A, I⟩: background abductive theory;
    outputs: H: learned theory, Δ: abduced literals)

  H := ∅
  Δ := ∅
  Agenda := { ⟨p(X) ← true., Value⟩ : p a target predicate,
              Value the value of the heuristic function for the rule }
  while E+ ≠ ∅ do                                     (covering loop)
    Specialize(input: Agenda, P, H, E+, E−, Δ;
               output: Rule, NewAgenda)
    Evaluate Rule, obtaining:
      E+_Rule, the set of positive examples covered by Rule
      Δ_Rule, the set of literals abduced during
              the derivations of e+ and not e−
    E+ := E+ − E+_Rule
    H := H ∪ {Rule}
    Δ := Δ ∪ Δ_Rule
    Evaluate all the rules in NewAgenda
      according to the new theory and Δ
    Agenda := NewAgenda
  endwhile
  output H, Δ

Figure 1: ACL, the covering loop

procedure Specialize(
    inputs:  Agenda: current agenda, T: background theory,
             H: current hypothesis,
             E+, E−: training sets,
             Δ: current set of abduced literals;
    outputs: Rule: rule, Agenda)

  Select and remove the Best rule from Agenda
  while Best covers some e− do
    BestRefinements := set of refinements of Best allowed
                       by the language bias
    for all Rule ∈ BestRefinements do
      Value := Evaluate(Rule, T, H, E+, E−, Δ)
      if Rule covers at least one positive example then
        add ⟨Rule, Value⟩ to Agenda
      endif
    endfor
    Select and remove the Best rule from Agenda
  endwhile
  let Best be ⟨Rule, Value⟩
  output Rule, Agenda

Figure 2: ACL, the specialization loop

function Evaluate(
    inputs: Rule: rule,
            T = ⟨P, A, I⟩: background theory,
            H: current hypothesis, E+, E−: training sets,
            Δ: current set of abduced literals)
  returns the value of the heuristic function for Rule

  n+  := 0   (positive examples covered by Rule without abduction)
  n+A := 0   (positive examples covered by Rule with abduction)
  n−  := 0   (negative examples covered by Rule, i.e. not e− has failed)
  n−A := 0   (negative examples uncovered with abduction)
  Δin := Δ
  for each e+ ∈ E+ do
    if AbductiveDerivation(e+, ⟨P ∪ H ∪ {Rule}, A, I⟩, Δin, Δout)
       succeeds then
      if no new assumptions have been made then
        increment n+
      else
        increment n+A
      endif
      Δin := Δout
    endif
  endfor
  for each e− ∈ E− do
    if AbductiveDerivation(not e−, ⟨P ∪ H ∪ {Rule}, A, I⟩, Δin, Δout)
       succeeds then
      if new assumptions have been made then
        increment n−A
      endif
      Δin := Δout
    else
      increment n−
    endif
  endfor
  return Heuristic(n+, n+A, n−, n−A)

Figure 3: ACL, the evaluation of the heuristic function

The positive and negative examples covered are counted, distinguishing between examples covered (or uncovered) with or without abduction, and these numbers are used to calculate the heuristic function. The heuristic function of a clause is an expected classification accuracy [LD94]:

A(c) = p(⊕|c)

where p(⊕|c) is the probability that an example covered by the clause c is positive. In defining this we need to take into account the relative strength of covering an example with abduction (i.e. some assumptions are needed, together with the strength of those assumptions) or without abduction (i.e. no assumption is needed), and similarly for the failure to cover negative examples with or without assumptions. The heuristic function used is

A = (n+ + k+ · n+A) / (n+ + n− + k+ · n+A + k− · n−A)

where n+, n+A, n− and n−A are as defined in Figure 3. The coefficients k+ and k− are introduced in order to take into account the uncertainty in the coverage of the positive examples, and in the failure to cover the negative examples, when abduction is necessary for this to occur. Further details on these coefficients and on the heuristic function can be found in [KR96].

We also mention that the integrity constraints given (or learned first) in the background theory are used in several ways during the computation of the final hypothesis. They can significantly affect the coverage numbers of the rules while these are constructed, by rendering assumptions impossible. One interesting case of their use is the confirmation that negative assumptions of the form not ab(e−), generated in the computed extended abductive explanations, are indeed valid, by rendering the assumption ab(e−) inconsistent. If not ab(e−) is the only assumption needed to ensure that the negative example e− is not covered, this example will then not be counted in n−A. Again, details can be found in [KR96].
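To make the computation concrete, here is a sketch of ours of Figure 3's counting and of the heuristic, for ground programs; the function derive is a brute-force stand-in for the abductive proof procedure of [KM90], and the default coefficient values are purely illustrative:

# Sketch of Evaluate (Figure 3) and the expected-accuracy heuristic.
from itertools import combinations

def least_model(clauses):
    model, changed = set(), True
    while changed:
        changed = False
        for head, body in clauses:
            if head not in model and all(b in model for b in body):
                model.add(head)
                changed = True
    return model

def derive(goal, program, abducibles, ics, delta_in):
    """Brute-force stand-in for AbductiveDerivation: look for a
    constraint-satisfying delta_out containing delta_in that makes
    goal true in the least model; None if there is no such set."""
    free = [a for a in abducibles if a not in delta_in]
    for r in range(len(free) + 1):
        for extra in combinations(free, r):
            delta = set(delta_in) | set(extra)
            m = least_model(program + [(a, []) for a in delta])
            if goal in m and all(not all(b in m for b in ic) for ic in ics):
                return delta
    return None

def evaluate(rule, P, H, abducibles, ics, e_pos, e_neg, delta_in):
    """Counts n+, n+A, n-, n-A of Figure 3 for one candidate rule."""
    n_p = n_pA = n_m = n_mA = 0
    delta = set(delta_in)
    prog = P + H + rule                    # all ground clauses
    for e in e_pos:
        d = derive(e, prog, abducibles, ics, delta)
        if d is not None:
            if d <= delta:
                n_p += 1                   # covered, no new assumptions
            else:
                n_pA += 1                  # covered abductively
                delta = d                  # thread assumptions onward
    for e in e_neg:
        if e in least_model(prog + [(a, []) for a in delta]):
            n_m += 1                       # negative example covered
        elif derive(e, prog, abducibles, ics, delta) is not None:
            n_mA += 1                      # kept out only by assuming some
                                           # abducibles absent ([KM90] would
                                           # record these in delta)
    return n_p, n_pA, n_m, n_mA, delta

def heuristic(n_p, n_pA, n_m, n_mA, k_p=0.5, k_m=0.5):
    """A = (n+ + k+.n+A) / (n+ + n- + k+.n+A + k-.n-A)."""
    den = n_p + n_m + k_p * n_pA + k_m * n_mA
    return (n_p + k_p * n_pA) / den if den else 0.0

On the grounded data of Example 3.2, this sketch yields n+ = 1 (father(john, mary) needs no assumptions) and n+A = 1 (male(david) is abduced); with the constraint ← male(X), female(X) present, the negative example is excluded outright and contributes to neither count, whereas without the constraint it counts in n−A. This mirrors the effect of integrity constraints on the coverage numbers just described.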

4.1 Properties of the Algorithm

The algorithm described above is sound but not complete for the intermediate problem I-ACL. Its soundness is ensured by the soundness of the abductive proof procedure and of its extensions developed for the learning algorithm. As the assumptions abduced for the positive and negative examples are globally recorded, the algorithm cannot generate new clauses that make previous clauses inconsistent (i.e. cover negative examples) or that reduce their coverage of positive examples.

The search space is not completely explored. In particular, there are two choice points which are not considered, in order to reduce the computational complexity of the algorithm. The first choice point is related to the greedy search in the space of possible programs. When no new clause can be added by the specialization loop, backtracking can be performed on the previous clause added: the clause is retracted and the previous agenda is considered. The procedure Specialize is called again and the second-best rule is chosen for specialization. The second choice point concerns the abductions that are made in order to cover the examples. In fact, an example can be abductively derived with different sets of assumptions, and depending on which set of assumptions we choose, the outcome of the algorithm can change later on.

We now briefly discuss how the above algorithm can easily be extended to learn multiple predicates correctly, and how this can tackle some of the problems of multiple predicate learning. The changes required essentially amount to two simple extensions. First, the candidate clauses for the different predicates are kept together in the agenda, and the choice of which predicate (in fact, which rule of the predicate) to learn further is left to the heuristic value of the rules. This approach is similar to the one adopted in the system MPL [RLD93] in order to tackle the problem of the order of learning the different predicates (and the different rules, in the case of mutually recursive predicates). Secondly, for each predicate, the other predicates in the set of multiple predicates to be learned are considered as additional abducible predicates of the background theory for this predicate. This means that information from one rule can be passed, via the abductive explanations, to the construction of other rules for other predicates. In this way we can tackle the second major problem of most intensional empirical ILP systems for multiple predicate learning, namely the fact that the addition of a new consistent clause to a theory can make previous clauses inconsistent, i.e. make them cover negative examples. In other words, we can be sure that local consistency of the currently constructed clause also implies global consistency of the whole theory.
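The second extension amounts to a per-target change of the abducible set. A minimal (and hypothetical) configuration sketch of ours:

# When learning several predicates, each target's background theory
# treats the *other* targets as abducible, so assumptions about them
# can flow between the rules being learned.
TARGETS = ['grandfather', 'father']
BASE_ABDUCIBLES = ['male', 'female']     # the incomplete background predicates

def abducibles_for(target):
    """Abducible predicates in force while refining rules for target."""
    return BASE_ABDUCIBLES + [t for t in TARGETS if t != target]

print(abducibles_for('grandfather'))     # ['male', 'female', 'father']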

The next example shows how ACL can be useful for performing multiple predicate learning, exploiting assumptions made while learning a rule for one predicate as additional training data for learning another predicate.

Example 4.1 Suppose we want to learn the predicates grandfather and father. Let the background theory be:

P = {parent(john, mary), male(john),
     parent(mary, ellen), female(mary),
     parent(ellen, sue),
     parent(david, steve),
     parent(steve, jim)}
A = {grandfather, father, male, female}

and let the training data for both concepts be:

E+ = {grandfather(john, ellen), grandfather(david, jim), father(john, mary)}
E− = {grandfather(mary, sue)}

Suppose that the system first generates the following rule for the grandfather predicate:

grandfather(X,Y) ← father(X,Z), parent(Z,Y).

This will be accompanied by the set of assumptions Δ1* = {father(david, steve), not father(mary, ellen)}, which become additional training data for the father predicate, enabling us to learn the rule:

father(X,Y) ← parent(X,Y), male(X)

now additionally abducing Δ2* = {male(david), not male(mary)}. Note that without the new negative example father(mary, ellen) this rule could not be learned.

Conversely, if we first generate in the agenda the rule:

father(X,Y) ← parent(X,Y)

for the father predicate, and then the same rule as above for the grandfather predicate, the extra information on father in Δ1* above will force the further specialization of the rule for father before the algorithm terminates.

Finally, we mention again that the algorithm presented in this section solves the intermediate problem I-ACL, which is a restricted version of ACL, although in some cases this is also sufficient for the full ACL problem. In order to perform full ACL we must incorporate the learning of integrity constraints, to ensure that negative examples are not abductively derivable in the learned theory. One way to address this issue is to integrate into the algorithm a system for characteristic induction, such as Claudien [RB93] (or ICL [RL95]), which will take as input, together with the background theory and the positive (or all) training examples, the abductive assumptions Δ (or Δ*), and generate clauses to be used as integrity constraints.

5 Experiments

The main purpose of the experiments reported in this section is to demonstrate the ability of I-ACL and ACL to learn from incomplete background data and to perform multiple predicate learning. We consider the well-known multiplexer example and problems of learning family relations from an incomplete database.

5.1 Multiplexer

The multiplexer example is a well-known benchmark for inductive systems [dV89]. It has recently been used in [RD96] to show the performance of the system ICL-Sat on incomplete data. We perform experiments on the same data as [RD96] and compare the results of I-ACL with those of ICL-Sat.

The problem consists in learning the definition of a 6-bit multiplexer, starting from a training set where each example consists of 6 bits, the first two of which are interpreted as the address of one of the other four bits. If the bit at the specified address is 1 (regardless of the values of the other three bits), the example is considered positive; otherwise it is considered negative.

For example, consider the example 10 0110. The first two bits specify that the third of the four remaining bits should be 1, so this example is positive. We represent this example by adding the positive example mul(e1) to the training set and by including the following facts in the background knowledge:

bit1at1(e1). bit2at0(e1). bit3at0(e1).
bit4at1(e1). bit5at1(e1). bit6at0(e1).

For the 6-bit multiplexer problem we have 2^6 = 64 examples, 32 positive and 32 negative. We perform three experiments, using for each one the same datasets as in [RD96]: the first on the complete dataset, the second on an incomplete dataset, and the third on the incomplete dataset plus some integrity constraints. The incomplete dataset was obtained by considering 12 examples out of the 64 and by specifying for them only three bits, where both the examples and the bits were selected at random. E.g., the above example 10 0110 could have been replaced by 1? 0?1?. The incomplete example is still in the set of positive examples, and its description in the background knowledge would be:

bit1at1(e1). bit3at0(e1). bit5at1(e1).

Now all the predicates bitNatB are abducible, and integrity constraints of the form below are added to the background theory:

← bitNat0(X), bitNat1(X).

The dataset of the third experiment is obtained by adding further integrity constraints to the incomplete dataset.

I-ACL was run on all three datasets. The measure of performance adopted is accuracy, defined as the number of examples correctly classified over the total number of examples in the testing set, i.e. the number of positive examples covered plus the number of negative examples not covered, over 64. The testing set is the same as the training set of complete examples.

Experiments                              I-ACL    ICL-Sat
Complete background                      100 %    100 %
Incomplete background                    98.4 %   82.8 %
Incomplete background plus constraints   96.9 %   92.2 %

Table 1: Performance on the multiplexer data

The results are reported in Table 1. It is interesting to note that the loss of accuracy of ACL is not due to negative examples covered by the rules but to positive examples not covered (one positive example in the second experiment and two in the third): the learned theories are consistent but not complete.
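The encoding just described is mechanical. A small helper of ours (hypothetical, not from the paper) that turns a possibly incomplete bit string into background facts, together with the label that the visible bits determine:

# Encode a 6-bit multiplexer example, with unknown bits written '?'.
def encode(example_id, bits):
    """bits: string over '0', '1', '?', e.g. '100110' or '1?0?1?'.
    Unknown bits yield no fact and are left to abduction. Note that in
    the experiments an incomplete example keeps its original label;
    the label computed here is only what the visible bits determine."""
    facts = [f"bit{i + 1}at{b}({example_id})."
             for i, b in enumerate(bits) if b != '?']
    if '?' in bits[:2]:
        label = None                    # the address itself is unknown
    else:
        sel = 2 + int(bits[:2], 2)      # the address selects one of bits 3-6
        label = None if bits[sel] == '?' else bits[sel] == '1'
    return label, facts

print(encode('e1', '100110'))  # (True, ['bit1at1(e1).', ..., 'bit6at0(e1).'])
print(encode('e1', '1?0?1?'))  # (None, ['bit1at1(e1).', 'bit3at0(e1).',
                               #         'bit5at1(e1).'])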

The accuracy of I-ACL in the third experiment is lower than in the second: this unexpected result is due to the fact that the testing is done with a background theory different from the one used for learning. We also note that the results shown in Table 1 refer to the accuracy of the learned theory with respect to the full ACL problem, showing that the I-ACL system has in this case successfully solved the full ACL problem.

We can also test the theory with incomplete data, thus showing the ability of the generated theories to classify incomplete examples. We tested the theory on the same incomplete data set used for learning in experiments 2 and 3. The results of this different testing are reported in Table 2, where n+ is the number of covered positive examples, while nACL is defined as the number of negative examples that are (incorrectly) covered according to the full ACL, i.e. for which e− abductively succeeds instead of failing. Similar results are obtained with other randomly generated incomplete subsets of the complete training examples.

Experiments     n+    nACL    Accuracy
Experiment 2    32    5       92.2 %
Experiment 3    32    2       96.9 %

Table 2: Testing with incomplete data

Finally, we performed an experiment where more information was taken out. In this case we completely removed some of the negative examples from the training data. The I-ACL system now learned rules that are incorrect (i.e. cover negative examples), as they are too general, e.g. the rule:

mul(X) ← bit1at1(X), bit2at1(X).

Extending the I-ACL algorithm, we ran Claudien to learn clauses (integrity constraints) that hold on the positive examples. This generated several constraints, amongst which is the constraint

← mul(X), bit1at1(X), bit2at1(X), bit6at0(X)

which effectively corrects the rule above by preventing the negative examples from being abductively classified as positive even when the rule succeeds. This experiment illustrates the potential of ACL to integrate characteristic with discriminant induction.

5.2 Learning Family Relations

We considered the problem of learning family predicates, e.g. father, from a database of family relations [BDR96] containing facts about parent, male and female. We performed several experiments with different degrees of incompleteness of the background knowledge, in order to show the performance of I-ACL as a function of the incompleteness of the data.

The complete background knowledge contains, amongst its 740 facts, 72 facts about parent, 31 facts about male and 24 facts about female. The training set contains 36 positive examples of father, taken from the family database, and 34 negative examples of father that were generated randomly. The concept father was learned from background knowledge containing 100 %, 90 %, 80 %, 70 %, 60 % and 50 % of the facts. The facts were removed from the background randomly, while the training set was kept the same for all experiments. The same experiments were then repeated with the background augmented with the constraint ← male(X), female(X). The results are shown in Table 3: without the integrity constraint, I-ACL was able to learn the correct rule

father(X,Y) ← parent(X,Y), male(X)

for completeness levels down to 80 %, while with the integrity constraint the correct rule is still learned at 60 %. These experiments demonstrate the usefulness of integrity constraints when learning from incomplete background knowledge.

                Learns the correct theory
Completeness    without constraint    with constraint
100 %           yes                   yes
90 %            yes                   yes
80 %            yes                   yes
70 %            no                    yes
60 %            no                    yes
50 %            no                    no

Table 3: Performance on the family data

5.3 Multiple Predicate Learning

We considered the problem of learning the predicates brother and sibling from a background knowledge containing facts about parent, male and female. The bias allowed the body of the rules for brother to be any subset of

{parent(X,Y), parent(Y,X), sibling(X,Y), sibling(Y,X),
 male(X), male(Y), female(X), female(Y)}

while the body of the rules for sibling could be any subset of

{parent(X,Y), parent(Y,X), parent(X,Z),
 parent(Z,X), parent(Z,Y), parent(Y,Z),
 male(X), male(Y), male(Z), female(X), female(Y), female(Z)}

Therefore, the rules we are looking for are

brother(X,Y) ← sibling(X,Y), male(X).
sibling(X,Y) ← parent(Z,X), parent(Z,Y).
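Under such a bias, the refinements explored by the specialization loop of Figure 2 are simply one-literal extensions of the current clause body. A toy illustration of ours, with literals kept as strings:

# Clause refinement under the declared bias (cf. Figure 2).
BROTHER_BIAS = ['parent(X,Y)', 'parent(Y,X)', 'sibling(X,Y)', 'sibling(Y,X)',
                'male(X)', 'male(Y)', 'female(X)', 'female(Y)']

def refinements(body, bias):
    """All one-literal extensions of a clause body allowed by the bias."""
    return [body + [lit] for lit in bias if lit not in body]

for body in refinements(['sibling(X,Y)'], BROTHER_BIAS)[:3]:
    print('brother(X,Y) <-', ', '.join(body))
# brother(X,Y) <- sibling(X,Y), parent(X,Y)
# brother(X,Y) <- sibling(X,Y), parent(Y,X)
# brother(X,Y) <- sibling(X,Y), sibling(Y,X)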

The family database considered, taken from [RLD93], contains 16 facts about brother, 38 about sibling, 22 about parent, 9 about male and 10 about female. The background knowledge was obtained from this database by considering all the facts about male and female and only 50 % of the facts about parent (selected randomly). The training set contains all the facts about brother and 50 % of the facts about sibling (also selected randomly). The negative examples were generated by making the Closed World Assumption and taking a random sample of the false atoms: 36 negative examples for sibling and 37 for brother. Since sibling and parent are incomplete, they are both considered abducible (brother could also be considered abducible, but this, as expected, has no effect).

From this data, the system first constructed the rule

brother(X,Y) ← sibling(Y,X), male(X).

It exploited both the positive examples of sibling, to cover positive examples of brother, and the negative examples of sibling, to avoid covering negative examples of brother. This rule was constructed first because the heuristics preferred it to the rules for sibling, as the information available for brother was more complete than the information for sibling. When learning this rule, I-ACL made a number of assumptions about sibling: it abduced 3 positive facts (which become positive examples for sibling) and 33 negative facts (which become negative examples for sibling). Then I-ACL constructed the rule for sibling

sibling(X,Y) ← parent(Z,X), parent(Z,Y)

using this new training set and making assumptions about parent.

This experiment shows how I-ACL is able to learn multiple predicates, exploiting as much as possible the information available and generating new data for one predicate while learning another.

6 Conclusions and Related Work

We have studied the new learning framework of Abductive Concept Learning, setting up its theoretical formulation and properties and developing a first system for it. Initial experiments with this system have demonstrated ACL to be a suitable framework for learning with incomplete information. We have also illustrated the potential of ACL as a hybrid framework that can integrate discriminant and characteristic learning and the corresponding ILP systems.

Abduction has been incorporated in many learning systems (see e.g. [DK96, AD95, AD94, Coh95, ?]), but in most cases it is seen as a useful mechanism that can support some of the activities of the learning system. For example, in multistrategy learning (or theory revision), abduction is identified as one of the basic computational mechanisms (revision operators) for the overall learning process. A notable exception is [TM94], where a simple form of abduction is used in the learning covering relation, in the context of a particular application of learning theories for diagnosis.

Also recently, the deeper relationship between abduction and induction has been the topic of study of an ECAI96 workshop [MRFK96]. Our work builds on earlier work in [DK96] and [ELM+96, LMMR97] on learning simpler forms of abductive theories. Finally, the issue of integrating discriminant (or explanatory) and characterizing ILP systems has also been put forward in [DDK96].

Recently, there have been several other proposals for learning with incomplete information. The FOIL-I system [IKI+96] learns from incomplete information in the training examples set, but not in the background knowledge, with a particular emphasis on learning recursive predicates. Similarly to our approach, it performs a beam search in the space of clause refinements, keeping a set of clauses of different lengths. In [WD95] the authors propose several frameworks for learning from partial interpretations. A framework that can learn from incomplete information and is closely related to ACL is that of learning from satisfiability [RD96]. This framework is more general than ACL, as both the examples and the hypotheses can be clausal theories (although it is possible to extend ACL to allow examples which are clausal theories). On the other hand, when the training examples for learning from satisfiability are in a suitable form for ACL, then learning from satisfiability can be simulated by ACL. In this case the learned abductive theory will contain a single trivial (default) rule for the concept, with an empty body, and the set of integrity constraints will be the clauses found by learning from satisfiability.

Further development of the ACL system is needed to fully integrate the characteristic induction process of learning the integrity constraints. An interesting approach for this is to use existing ILP systems for characteristic learning, such as Claudien and Claudien-Sat, but also discriminant systems like ICL and ICL-Sat, for learning integrity constraints that discriminate between the positive and negative abductive assumptions generated by ACL. We have also begun to investigate the application of ACL to real-life problems in the area of analyzing market research questionnaires, where incompleteness of information occurs naturally through unanswered questions or "do not know" answers.

References

[AD94] H. Adé and M. Denecker. RUTH: An ILP theory revision system. In Proceedings of ISMIS94, 1994.

[AD95] H. Adé and M. Denecker. AILP: Abductive inductive logic programming. In Proceedings of IJCAI95, 1995.

[BDR96] H. Blockeel and L. De Raedt. Inductive database design. In Proceedings of the 10th International Symposium on Methodologies for Intelligent Systems, volume 1079 of Lecture Notes in Artificial Intelligence, pages 376-385. Springer-Verlag, 1996.

[Coh95] W.W. Cohen. Abductive explanation-based learning: A solution to the multiple inconsistent explanation problem. Machine Learning, 8:167-219, 1995.

[DDK96] Y. Dimopoulos, S. Džeroski, and A.C. Kakas. Integrating explanatory and descriptive learning in ILP. Technical Report TR-96-16, University of Cyprus, Computer Science Department, 1996.

[DK96] Y. Dimopoulos and A. Kakas. Abduction and learning. In Advances in Inductive Logic Programming. IOS Press, 1996.

[dV89] W. Van de Velde. IDL, or taming the multiplexer problem. In K. Morik, editor, Proceedings of the 4th European Working Session on Learning. Pitman, 1989.

[ELM+96] F. Esposito, E. Lamma, D. Malerba, P. Mello, M. Milano, F. Riguzzi, and G. Semeraro. Learning abductive logic programs. In Proceedings of the ECAI96 Workshop on Abductive and Inductive Reasoning, 1996.

[IKI+96] N. Inuzuka, M. Kamo, N. Ishii, H. Seki, and H. Itoh. Top-down induction of logic programs from incomplete samples. In S. Muggleton, editor, Proceedings of the 6th International Workshop on Inductive Logic Programming, pages 119-136. Stockholm University, Royal Institute of Technology, 1996.

[KKT93] A.C. Kakas, R.A. Kowalski, and F. Toni. Abductive logic programming. Journal of Logic and Computation, 2:719-770, 1993.

[KKT97] A.C. Kakas, R.A. Kowalski, and F. Toni. The role of abduction in logic programming. In D. Gabbay et al., editors, Handbook of Logic in AI and Logic Programming. 1997. To appear.

[KM90] A.C. Kakas and P. Mancarella. On the relation between truth maintenance and abduction. In Proceedings of PRICAI90, 1990.

[KR96] A.C. Kakas and F. Riguzzi. Abductive concept learning. Technical Report TR-96-15, University of Cyprus, Computer Science Department, 1996.

[LD94] N. Lavrač and S. Džeroski. Inductive Logic Programming: Techniques and Applications. Ellis Horwood, 1994.

[LMMR97] E. Lamma, P. Mello, M. Milano, and F. Riguzzi. Integrating induction and abduction in logic programming. In P. P. Wang, editor, Proceedings of the Third Joint Conference on Information Sciences, volume 2, pages 203-206. Duke University, 1997.

[MRFK96] M. Denecker, L. De Raedt, P. Flach, and A. Kakas, editors. Proceedings of the ECAI96 Workshop on Abductive and Inductive Reasoning. Catholic University of Leuven, 1996.

[RB93] L. De Raedt and M. Bruynooghe. A theory of clausal discovery. In Proceedings of IJCAI93, 1993.

[RD96] L. De Raedt and L. Dehaspe. Learning from satisfiability. Technical report, Katholieke Universiteit Leuven, 1996.

[RL95] L. De Raedt and W. Van Laer. Inductive constraint logic. In Proceedings of the 5th International Workshop on Algorithmic Learning Theory, 1995.

[RLD93] L. De Raedt, N. Lavrač, and S. Džeroski. Multiple predicate learning. In Proceedings of the 3rd International Workshop on Inductive Logic Programming, 1993.

[TM94] C. Thompson and R. Mooney. Inductive learning for abductive diagnosis. In Proceedings of AAAI-94, 1994.

[WD95] S. Wrobel and S. Džeroski. The ILP description learning problem: Towards a general model-level definition of data mining in ILP. In Proceedings of the Fachgruppentreffen Maschinelles Lernen, 1995.
