

Computational Intelligence, Volume 17, Number 2, 2001

REVIEW AND RESTORE FOR CASE-BASE MAINTENANCE

Thomas Reinartz, Ioannis Iglezakis

DaimlerChrysler AG, Research & Technology, Ulm, Germany

and

Thomas Roth–Berghofer

tec:inno GmbH, Sauerwiesen 2, Kaiserslautern, Germany

Case-base maintenance is one of the most important issues for current research in case-based reasoning (CBR). In this article we propose an extended six-step CBR cycle and discuss its two additional steps as part of the maintenance phase of the CBR process. The review step covers assessment and monitoring of the knowledge containers, whereas the restore step actually modifies the contents of the containers according to recommendations resulting from the review step in order to keep the knowledge containers in a usable state. Here we focus our attention on the case base. For the review step, we define several quality measures based on different case and case-base properties that describe specific characteristics of the case base such as correctness, consistency, uniqueness, minimality, and incoherence. Then we use these measures to realize monitoring capabilities for the case-base container that indicate when the restore step is necessary. Finally, we also describe several methods for modifications of the case base in the restore step and their relation to the review step. An initial experimental evaluation shows the appropriateness of the proposed concepts and methods before we conclude the article with a discussion of related work and an outline of future directions to extend these aspects of maintenance in CBR.

Key words: six-step CBR cycle; case-base maintenance; quality measures; monitor operators; modify operators.

1. INTRODUCTION

During the last decade, case-based reasoning (CBR) has evolved into a well-established intelligent technology suitable to support various applications. One of the consequences is that the focus of current CBR research has moved from issues of case-base modeling and acquisition, retrieval and indexing tasks, and similar basic challenges more toward application-oriented goals. In particular, one of the crucial questions in CBR today deals with issues related to the life cycle of a CBR system. These issues include the central question of what to do with the contents of a case base over time.

First articles on these issues called efforts along these lines case-base maintenance (CBM) (e.g., see Leake and Wilson 1998). Here we focus on CBM as the part of the overall maintenance problem in CBR that specifically deals with the maintenance of the case-base knowledge container. The overall concept of the maintenance problem in CBR also covers all tasks that occur during the lifetime of a CBR system and affect the permanent usability of the system over time.

In practice, a CBR system and its contents change over time. For example, new cases are added, old or invalid cases are deleted, similar cases are combined into more general cases, conflicting cases are corrected, and so on. All these changes only happen if some kind of indicator invokes the respective mechanisms to change the CBR system. Therefore, we have to define means of quality control that enable specific realizations of such indicators. In this article we argue that quality measures based on basic properties of single cases and sets of cases are appropriate to realize strategies for quality control in CBR. Note that in CBR we distinguish between two different kinds of quality: The first type of quality concerns the quality of solutions represented within the cases of the case base, whereas the second type is about the reasoning quality of the CBR system and the relation among cases of the case base. Here we focus on the second type of quality.

Address correspondence to Thomas Reinartz, DaimlerChrysler AG, Research and Technology, FT3/AD, P. O. Box 2360, 89013 Ulm, Germany.

© 2001 Blackwell Publishers, 350 Main Street, Malden, MA 02148, USA, and 108 Cowley Road, Oxford, OX4 1JF, UK.

Once we have detected quality deficiencies of the CBR system, we also have to decide which modifications to the system lead to an increased quality. Therefore, CBM needs modify operators that change the system state and return it to a usable state again. Here we define example operators to modify the case base.

This article is organized as follows: In the next section we propose an extended six-step CBR cycle with two additional steps as part of the maintenance phase of the CBR process: the review and the restore step. Thereafter, we define the representation necessary to specify example operators for both steps. Then we describe various properties of single cases and sets of cases. These properties are then used to define quality measures, monitor operators, and modify operators for the review and restore steps in two separate sections. Before we close the article with concluding remarks and issues for future work, we present an initial experimental evaluation of the proposed concepts and methods in different application domains and a discussion of related work.¹

2. THE SIX-STEP CBR CYCLE

Although early definitions of the CBR process emphasized the application phase of CBR, it is now accepted that an additional maintenance phase is necessary to cope with all issues that arise when CBR is used in practical applications and especially when the CBR system state changes over time (Goker and Roth-Berghofer 1999). From our perspective, the first three steps in the standard four-step CBR cycle (Aamodt and Plaza 1994), namely retrieve, reuse, and revise, build the application phase, whereas the fourth step, retain, is the first step in the maintenance phase of CBR.

The difference between application and maintenance originates from the following view on the separation of tasks in CBR. Application deals with finding previous experiences and reusing them for novel situations without changing the CBR system state. Maintenance tasks cover additional issues to update the CBR system and preserve or return to an appropriate system performance. The current four-step cycle only includes the retain step for such changes, and this step updates the case base only in a single direction by adding cases to the case base. Additional maintenance issues are not yet represented.²

Hence we propose to extend the four-step CBR cycle by two additional steps as part of the maintenance phase of the CBR process: review and restore. The review step covers tasks to judge and monitor the current state of the CBR system and its knowledge containers, whereas the restore step invokes mechanisms to change the system and its knowledge in order to return to a usable state in situations where CBR system performance does not meet desired requirements anymore. Figure 1 shows the resulting six-step CBR cycle.³ The first three steps form the application phase, whereas the last three build the maintenance phase in CBR. Note that it is not necessary to invoke the last three steps after each application of CBR to solve a new problem.

¹ Some parts of this paper are based on descriptions by Reinartz, Iglezakis, and Roth-Berghofer (2000).

² See Section 8 for a discussion of existing approaches to cope with such issues.

³ The original four-step CBR cycle is also called the four REs according to the mnemonic nature of its steps. Hence we call the proposed new six-step process also the six REs.

Figure 1. The Six-Step CBR Cycle. (Diagram labels: Retrieve, Reuse, Revise, Retain, Review, Restore; Knowledge Containers; Application Phase; Maintenance Phase.)

Figure 2. Additional Maintenance Steps in CBR. (Diagram labels: Knowledge Container, Review, ok? with branches y and n, Restore; inputs Quality Measures, Monitor Operators, Modify Operators.)

In Figure 2 we outline the review and restore steps along with the control flow between them and the inputs that are necessary to perform them. The two steps start with one of the knowledge containers as their input. This knowledge container is either the result of an initial knowledge-acquisition step at the beginning of a CBR project or the outcome of modifications to the CBR system at a certain stage after the system is already in use.

The review step considers the current state of the knowledge container and assesses its quality. For this purpose, we have to define quality measures that allow the computation of values that indicate the current quality of the system. Additional monitor operators enable permanent control of these quality values, and specific indicators lead to the initiation of the restore step if the CBR system state is no longer appropriate.

The restore step uses modify operators to change the contents of the CBR system. In an ideal setting, the review step already suggests specific changes to get back to the level of quality desired. If there is no need to go to the restore step because the quality values are still in good shape, we simply return to the knowledge container and keep it unchanged.

In the following we focus our attention on maintenance activities that affect the case base rather than the vocabulary, similarity, or adaptation knowledge container (Richter 1995). We argue that it is important to start with the case base as the central knowledge container in CBR (Iglezakis and Roth-Berghofer 2000). For simplicity, we consider any case base as a set of cases.


3. CASE AND CASE-BASE REPRESENTATION

In this section we define the representation of cases and sets of cases. These definitions are necessary to specify concrete quality measures for CBR later. For all notations, we aim at definitions that are as general as possible, with maximum flexibility to cover cases and case bases in most CBR applications and systems.

We start the representation with the basic components for building cases, namely, attributes, values, problems, and solutions (see Definitions 1 to 3). We choose an attribute-value representation because it ensures high flexibility and generality. We are able to transform other case representations into an attribute-value representation if necessary.

Definition 1 (Attributes and Values). An attribute $a_j$ is a name accompanied by a set $V_j := \{v_{j1}, \ldots, v_{jk}, \ldots, v_{jN_j}\}$ of values. We denote the set of attributes as $A := \{a_1, \ldots, a_j, \ldots, a_N\}$, and the set of values as $V := \bigcup_{j=1}^{N} V_j$.

Each attribute consists of an identifier (the name) and a set of possible values. For simplicity, we assume for all attributes, especially for quantitative attributes, that the set of values contains only “occurring” values. By this definition and the dynamic nature of CBR, the sets of values are dynamic as well.

Definition 2 (Problem). A problem is a set $p_i := \{p_{i1}, \ldots, p_{ij'}, \ldots, p_{iN_i}\}$ with $\forall j' \in [1; N_i]\ \exists a_j \in A$ and $\exists v_{jk} \in V_j : p_{ij'} = v_{jk}$, and $\forall j \in [1; N] : |p_i \cap V_j| \le 1$. We denote the set of problems as $P := \{p_1, \ldots, p_i, \ldots, p_M\}$.

We define problems as sets of attribute values and solutions as any form of information that contributes to the solution of a given problem. The first condition in Definition 2 ensures that for each element in the set of values of a problem, there exists a corresponding attribute and a respective value in its set of values. The second condition makes sure that for each attribute, a problem does not contain more than a single value. The latter assumption is a simplification with respect to situations where it makes sense to allow more than a single value for specific attributes. However, this situation is either emulated by adding extra sets of single values to the set of values or by relaxing the second condition at a later stage of research.

The set-oriented definition of problems enables easy implementation of the quality measures defined below. Note that we do not assume that a single problem specifies values for each available attribute. In contrast, a problem only contains those values for attributes which are relevant to describe the specific situation at hand.
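The two conditions of Definition 2 translate almost directly into set operations. The following is a minimal sketch, not part of the original article; the attribute table `VALUES`, the value names, and the function name `is_valid_problem` are illustrative assumptions.

```python
# Hypothetical attribute table: attribute name -> set of occurring values V_j.
VALUES = {
    "a1": {"v11", "v12", "v13"},
    "a2": {"v21", "v22", "v23", "v24"},
    "a3": {"v31", "v32"},
    "a4": {"v41", "v42", "v43", "v44", "v45"},
}


def is_valid_problem(problem, values=VALUES):
    """Check the two conditions of Definition 2 for a set of values."""
    all_values = set().union(*values.values())
    # Condition 1: every element of the problem is a value of some attribute.
    if not problem <= all_values:
        return False
    # Condition 2: the problem holds at most one value per attribute.
    return all(len(problem & v_j) <= 1 for v_j in values.values())


print(is_valid_problem({"v12", "v23", "v31"}))  # one value each for a1, a2, a3
print(is_valid_problem({"v12", "v13"}))         # two values for a1
```

A problem that leaves some attributes unspecified still passes the check, which matches the remark that a problem only lists the values relevant to the situation at hand.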

Definition 3 (Solution). A solution $s_i$ is any item. We denote the (multi-)set of solutions as $S := \{s_1, \ldots, s_i, \ldots, s_M\}$.

For a solution, we do not define any additional requirements in order to avoid restrictions for solutions in any way (see Definition 3). For example, a solution possibly contains any information which describes how to “solve” the problem. The more structured the domain is, the more structured information or definition we provide for a solution. For example, in classification domains, a solution is simply another attribute with a domain of values containing all occurring class labels. On the restrictive side of this flexibility, we are only able to reason about solutions in terms of match or mismatch but not in terms of any other more fine-grained meanings.


Table 1. An Example Case Base

        p_i                        s_i    q_i
  c1    v12   v23   v31            s1     q1
  c2    v13   v23                  s2     q2
  c3    v13   v23                  s3     q3
  c4    v13   v23   v32   v41      s4     q4
  c5    v13   v23   v32            s5     q5
  c6    v12         v31   v45      s6     q6

For both sets, the set of problems and the set of solutions, we assume that each contains $M$ elements. Since we are aware of the fact that the same solution possibly solves different problems, we allow the set $S$ of solutions to contain the same element more than once. Hence $S$ in Definition 3 is possibly a multi-set. We further assume that the enumeration of problems and solutions in $P$ and $S$ corresponds to each other; i.e., the solution for a given problem has the same index in $S$ as the problem has in $P$.

For the moment, we also presume that $P$ and $S$ contain components of “real” cases, i.e., cases that (currently) exist in the case base and that make sense in terms of the application domain. If the contents of the case base change, both sets $P$ and $S$ change too. In addition to such real cases, we also have potential cases (i.e., cases that can occur by combining any possible combination of attribute values) and meaningful potential cases (i.e., potential cases that in fact lead to a case that corresponds to a possible case in the application domain that makes sense).

Definition 4 (Case and Case Base). A case is a tuple $c_i := (p_i, s_i, q_i)$ with a problem $p_i$, a solution $s_i$, and additional information $q_i$. A case base is a set of cases $C := \{c_1, \ldots, c_i, \ldots, c_M\}$. We denote the set of all potential cases by $C^*$.

Definition 4 states that a case is a tuple containing a problem component $p_i$, a solution component $s_i$, and an additional information component $q_i$. This additional information component comprises any extra information that is necessary for quality control and that is related to the life cycle of this specific case. For example, time stamps for the time the case is added to the case base and the time of its last access or the relative number of successful usages of the case are typical information items. In this article we do not consider this additional information but focus on the problem and solution components of cases.

Example 1 (Set of Cases). Assume a domain with $A := \{a_1, a_2, a_3, a_4\}$, $V_1 := \{v_{11}, v_{12}, v_{13}\}$, $V_2 := \{v_{21}, v_{22}, v_{23}, v_{24}\}$, $V_3 := \{v_{31}, v_{32}\}$, and $V_4 := \{v_{41}, v_{42}, v_{43}, v_{44}, v_{45}\}$. $G := \{c_1, c_2, c_3, c_4, c_5, c_6\}$ in Table 1 is a set of cases in this domain.

4. CASE AND CASE-BASE PROPERTIES

In this section we present several case and case-base properties. Case properties describe characteristics of single cases, whereas case-base properties provide information about sets of cases, i.e., subsets of cases in the case base. We define case properties as the atomic concepts to specify case-base properties, which in turn form the basis for the definition of quality measures for CBR. In particular, atomic properties are independent from each other.

For case properties, we further distinguish between isolated and comparative characteristics. Whereas isolated properties only consider characteristics of a single case, comparative properties also take into account information on other cases and compare them to the single case considered. Note that comparative case properties still define only characteristics of a single case, although they consider more than a single case.

4.1. Isolated Case Properties

The most important isolated case property is correctness. We consider a case as correct if the given solution component really “solves” the problem specified by the problem component (see Definition 5). The crucial bit in this definition is the notion of “solves”. At this point, we basically use a placeholder for any type of relation “solves” denoted by $\leadsto$. Thereby, the definition is open to all application domains. For example, in traditional classification domains, the solution of a case is a class label, and the relation $\leadsto$ between problem and solution holds if this class label corresponds to the true class label of the problem. Note that as soon as we want to compute the correctness of a case, we have to define relation $\leadsto$ precisely. However, since we aim at application-independent concepts, we are not able to specify $\leadsto$ in more detail because this requires consideration of domain-specific aspects.

Definition 5 (Correctness). Assume a binary relation $\leadsto\ \subseteq P \times S$ and a case $c_i \in C$.

$c_i$ correct $:\Longleftrightarrow\ \leadsto(p_i, s_i)$

For the moment, we do not define any other case properties that only consider characteristics of a single case, although we expect that the additional information component $q_i$ of a case naturally leads to more isolated case properties. For example, the information component $q_i$ enables control of time and usage aspects and then is able to indicate the necessity of maintenance based on these aspects.

4.2. Comparative Case Properties

In this subsection we turn to comparative case properties that describe characteristics of single cases in relation to other cases. For all the following definitions in this subsection, we assume that each case is correct.

Definition 6 (Consistency). Assume $G \subseteq C$ and $c_i \in G$.

$c_i$ consistent within $G$ $:\Longleftrightarrow\ \nexists\, c_{i'} \in G : p_{i'} \subseteq p_i \wedge s_i \neq s_{i'}$

The first comparative case property describes consistency within a set of cases. A single case is consistent within a set of cases if and only if there does not exist any other case that solves the same or a more general problem differently. In general, this definition covers the issue of alternatives. We assume that for each possible problem there exists a “best” solution. Naturally, we want the case base to consist only of cases that have “best” solutions and hence do not allow alternative solutions, neither as an alternative mentioned in the same case nor as an extra case that has the same problem component but a different solution component.

However, we are aware that there possibly exist application domains where different, equally good solutions for the same problem make sense. In such domains, the consistency property indicates that such a situation of alternatives appears in the case base, and it is up to the CBR engineer to decide whether these alternatives are allowed in this specific context. Then we mark this decision within the information component $q_i$ of the respective cases and do not consider them as inconsistent anymore.

In addition to alternative solutions, Definition 6 also covers situations where the subset relation between problem components with different solution components indicates that the more general problem definition possibly lacks some detailed information that is necessary to make the two cases distinctive.

Definition 7 (Uniqueness). Assume $G \subseteq C$ and $c_i \in G$.

$c_i$ unique within $G$ $:\Longleftrightarrow\ \nexists\, c_{i'} \in G,\ c_{i'} \neq c_i : p_{i'} = p_i \wedge s_i = s_{i'}$

Definition 7 declares a case as unique if and only if there does not exist another case within the considered set of cases that solves exactly the same problem in exactly the same way.

Definition 8 (Minimality). Assume $G \subseteq C$ and $c_i \in G$.

$c_i$ minimal within $G$ $:\Longleftrightarrow\ \nexists\, c_{i'} \in G : p_{i'} \subset p_i \wedge s_i = s_{i'}$

A more general relation that covers a similar situation is subsumption (see Definition 8). A case subsumes another case if its problem component is a true subset of the problem component of the subsumed one and the solution component remains the same. The more specific problem characterization then possibly contains too many details unnecessary to specify the core problem. We call a case without any different case that subsumes it minimal.

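Definition 9 (Incoherence). Assume $G \subseteq C$, $c_i \in G$, and $1 \le \Delta \in \mathbb{N}$.

$c_i$ incoherent within $G$ $:\Longleftrightarrow\ \nexists\, c_{i'} \in G : |p_i \cap p_{i'}| + \Delta = N_i = N_{i'} \wedge s_i = s_{i'}$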

The most complex comparative case property describes situations with two cases within a set of cases that coincide in most of their components except for a specific number $\Delta$ of values (see Definition 9). We call a case incoherent within a set of cases if and only if there does not exist any other case which overlaps with the case in too many components. We consider incoherent cases as positive because the more cases differ, the broader is the spectrum of problems that they cover.

With parameter $\Delta$ we control the extent of overlapping information within coherent cases. For example, if $\Delta$ is 1, two cases are coherent if all their values are the same except for a single value (in each case). Note that this difference in a single component corresponds to a situation either with two different values for the same attribute or with different values for separate attributes.

Example 2 (Case Properties). Assume $G$ in Table 1 and $\Delta = 1$.

(i) $c_1$ is consistent within $G$; $c_4$ is not consistent within $G$ except if $s_2 \equiv s_3 \equiv s_4 \equiv s_5$.
(ii) $c_1$ is unique within $G$; $c_2$ is not unique within $G$ if $s_2 \equiv s_3$.
(iii) $c_1$ is minimal within $G$; $c_5$ is not minimal within $G$ if $s_2 \equiv s_5$ or $s_3 \equiv s_5$.
(iv) $c_2$ is incoherent within $G$; $c_6$ is not incoherent within $G$ if $s_1 \equiv s_6$.
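The four comparative properties can be checked mechanically once problems are stored as sets. The sketch below is an illustration rather than the authors' implementation: the `Case` type, the helper `case`, the parameter name `delta` (the overlap parameter of Definition 9), and the concrete solution labels `sA`, `sB`, `sC` are assumptions. The labels encode one possible assignment in which the solutions of c1 and c6 coincide, those of c2, c3, and c5 coincide, and that of c4 differs from all others.

```python
from typing import FrozenSet, List, NamedTuple


class Case(NamedTuple):
    name: str
    problem: FrozenSet[str]
    solution: str


def consistent(c: Case, G: List[Case]) -> bool:
    # No case solves the same or a more general problem differently.
    return not any(o.problem <= c.problem and o.solution != c.solution for o in G)


def unique(c: Case, G: List[Case]) -> bool:
    # No other case has the same problem and the same solution.
    return not any(o.name != c.name and o.problem == c.problem
                   and o.solution == c.solution for o in G)


def minimal(c: Case, G: List[Case]) -> bool:
    # No case with a strictly smaller problem has the same solution.
    return not any(o.problem < c.problem and o.solution == c.solution for o in G)


def incoherent(c: Case, G: List[Case], delta: int = 1) -> bool:
    # No equally sized case with the same solution differs in exactly delta values.
    return not any(len(c.problem & o.problem) + delta == len(c.problem) == len(o.problem)
                   and o.solution == c.solution for o in G)


def case(name: str, values: str, solution: str) -> Case:
    return Case(name, frozenset(values.split()), solution)


# Table 1 with the assumed solution labels described above.
G = [
    case("c1", "v12 v23 v31", "sA"),
    case("c2", "v13 v23", "sB"),
    case("c3", "v13 v23", "sB"),
    case("c4", "v13 v23 v32 v41", "sC"),
    case("c5", "v13 v23 v32", "sB"),
    case("c6", "v12 v31 v45", "sA"),
]
by_name = {c.name: c for c in G}

for c in G:
    print(c.name, consistent(c, G), unique(c, G), minimal(c, G), incoherent(c, G))
```

Under this assignment the output agrees with Example 2: c4 fails consistency, c2 fails uniqueness, c5 fails minimality, and c6 is not incoherent.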


4.3. Case-Base Properties

In the previous two subsections we defined several case properties to reason about the quality of cases. Now we build on these characteristics and specify properties for sets of cases and hence for an entire case base that is simply a set of cases containing all cases in the case base.

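Definition 10 (Case-Base Properties). Assume $G \subseteq C$.

(i) $G$ is correct $:\Longleftrightarrow\ \forall c_i \in G : c_i$ correct.
(ii) $G$ is consistent $:\Longleftrightarrow\ \forall c_i \in G : c_i$ consistent within $G$.
(iii) $G$ is unique $:\Longleftrightarrow\ \forall c_i \in G : c_i$ unique within $G$.
(iv) $G$ is minimal $:\Longleftrightarrow\ \forall c_i \in G : c_i$ minimal within $G$.
(v) $G$ is incoherent $:\Longleftrightarrow\ \forall c_i \in G : c_i$ incoherent within $G$.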

Since case properties started with a notion of correctness for single cases, we also define a notion of correctness for a set of cases (see Definition 10). This notion of correctness uses the correctness of single cases: a set of cases is correct if and only if all its cases are correct.

In a similar vein, we also adopt the definition of consistent, unique, minimal, and incoherent. Definition 10 summarizes the extensions of these definitions for single cases to sets of cases in the same way as for correctness.

5. THE REVIEW STEP FOR CASE-BASE MAINTENANCE

In this section we use the previously defined case and case-base properties to specify initial quality measures for the review step in CBR and describe example operators for monitoring the quality of a CBR system. First, the quality measures and their values have to reflect user requirements that usually ask for a small, comprehensive, and consistent case base. Second, these measures should be simple in the sense that their computation is possible in practice. Third, we also expect the measures to be application-independent and as system-independent as possible. Due to this third requirement, we focus on more syntactical measures rather than semantic ones that additionally use background knowledge from the specific application domain.

5.1. Degrees of Case-Base Properties

We start the set of initial quality measures with several degrees of quality. Definition 11 summarizes degrees of correctness, consistency, uniqueness, minimality, and incoherence. Each of these degrees computes the number of “good” cases in terms of the defined case properties and divides this number by the total number of cases within the considered set of cases. Hence each degree counts the relative number of “good” cases within a given set of cases.

Definition 11 (Degrees of Case-Base Properties). Assume $C_\subseteq$ is the set of all subsets of $C$.

(i) $D_1 : C_\subseteq \to [0;1]$, $D_1(G) := |\{c_i \in G \mid c_i \text{ correct}\}| \cdot |G|^{-1}$
(ii) $D_2 : C_\subseteq \to [0;1]$, $D_2(G) := |\{c_i \in G \mid c_i \text{ consistent within } G\}| \cdot |G|^{-1}$
(iii) $D_3 : C_\subseteq \to [0;1]$, $D_3(G) := |\{c_i \in G \mid c_i \text{ unique within } G\}| \cdot |G|^{-1}$
(iv) $D_4 : C_\subseteq \to [0;1]$, $D_4(G) := |\{c_i \in G \mid c_i \text{ minimal within } G\}| \cdot |G|^{-1}$
(v) $D_5 : C_\subseteq \to [0;1]$, $D_5(G) := |\{c_i \in G \mid c_i \text{ incoherent within } G\}| \cdot |G|^{-1}$


The purpose of the different degrees in terms of the maintenance steps in CBR is to get values that provide an indicator for the review step and the decision whether the case-base state is still good enough or not. In comparison with the case-base properties, these degrees enable a more fine-grained consideration of the characteristics of the case base. The previously defined case-base properties are essentially special cases for which each of the degrees yields value 1.

5.2. Case-Base Quality Measures

Definition 12 shows three examples for quality measures in CBR that aggregate the various degrees specified above into a single overall value of case-base quality. The first variant considers the minimum value of degrees as the crucial number for quality control. For example, if we assume that we compare quality values with a given threshold to trigger the restore step, this means that quality control based on this measure invokes modify operators as soon as a single degree, regardless of which one, becomes inappropriate. The second example in Definition 12 does exactly the contrary. Because it takes the maximum, it starts to recommend changes to the case base only if all degrees indicate inappropriate quality.

Definition 12 (Quality Measures). Assume $C_\subseteq$ is the set of all subsets of $C$, and $w_1, w_2, w_3, w_4, w_5 \in [0;1]$ with $\sum_{x \in \{1,2,3,4,5\}} w_x = 1$.

(i) $Q_{\min} : C_\subseteq \to [0;1]$, $Q_{\min}(G) := \min_{x \in \{1,2,3,4,5\}} \{D_x(G)\}$
(ii) $Q_{\max} : C_\subseteq \to [0;1]$, $Q_{\max}(G) := \max_{x \in \{1,2,3,4,5\}} \{D_x(G)\}$
(iii) $Q_w : C_\subseteq \to [0;1]$, $Q_w(G) := \sum_{x \in \{1,2,3,4,5\}} w_x \cdot D_x(G)$

The third measure is a compromise between these two extremes and considers the average degree value if we set all weights to the same value. On the other hand, this measure also allows the user to set preferences on specific aspects. For example, the user does not care about redundant cases at all and sets $w_3$ to zero, but he or she does not want any inconsistencies within any subset of cases in the case base and consequently initializes $w_2$ to a relatively high value.

Example 3 (Quality Measures). Assume $G$ in Table 1, $G$ is correct, $s_1 \equiv s_6$, $s_2 \equiv s_3 \equiv s_5$, $\Delta = 1$, and $w_1 := w_2 := w_3 := w_4 := w_5 := 1/5$.

Then, $D_1(G) = 1$, $D_2(G) = 5/6$, $D_3(G) = 2/3$, $D_4(G) = 5/6$, and $D_5(G) = 2/3$. Moreover, $Q_{\min}(G) = 2/3$, $Q_{\max}(G) = 1$, and $Q_w(G) = 4/5$.
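The numbers in Example 3 can be reproduced with a few lines of code. This is a hedged sketch under stated assumptions: cases are reduced to (problem, solution) pairs, the labels `sA`, `sB`, `sC` stand in for the solution equivalences assumed in the example (with the solution of c4 taken to differ from all others), every case is assumed correct, and all function names as well as the parameter name `delta` are illustrative.

```python
from fractions import Fraction

# Table 1 as (problem, solution) pairs; solution labels are assumptions.
G = [
    (frozenset({"v12", "v23", "v31"}), "sA"),         # c1
    (frozenset({"v13", "v23"}), "sB"),                # c2
    (frozenset({"v13", "v23"}), "sB"),                # c3
    (frozenset({"v13", "v23", "v32", "v41"}), "sC"),  # c4
    (frozenset({"v13", "v23", "v32"}), "sB"),         # c5
    (frozenset({"v12", "v31", "v45"}), "sA"),         # c6
]


def correct(i, G):
    return True  # assumption: G is correct, as stated in Example 3


def consistent(i, G):
    p, s = G[i]
    return not any(q <= p and t != s for q, t in G)


def unique(i, G):
    return not any(j != i and G[j] == G[i] for j in range(len(G)))


def minimal(i, G):
    p, s = G[i]
    return not any(q < p and t == s for q, t in G)


def incoherent(i, G, delta=1):
    p, s = G[i]
    return not any(len(p & q) + delta == len(p) == len(q) and t == s for q, t in G)


def degree(prop, G):
    """Relative number of cases in G that satisfy the given case property."""
    return Fraction(sum(1 for i in range(len(G)) if prop(i, G)), len(G))


degrees = [degree(f, G) for f in (correct, consistent, unique, minimal, incoherent)]
weights = [Fraction(1, 5)] * 5

q_min = min(degrees)
q_max = max(degrees)
q_w = sum(w * d for w, d in zip(weights, degrees))

print(degrees, q_min, q_max, q_w)
```

Exact fractions are used instead of floats so that the results compare directly with the values 5/6, 2/3, and 4/5 given in the example.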

We get more alternative quality measures if we allow the weights $w_x$ to be any function of $G$, or by adding aspects of time, or by taking into account measures of usage information, and so on. It is also possible to define quality measures that consider more than a single degree at a time. For example, we are able to compare two (or more) degree values and consider the absolute difference or their relation to each other. At this point, the concept of specifying quality measures is sufficiently general to enable appropriate definitions of respective quality-control mechanisms in almost any domain.

5.3. Monitoring the Case-Base Quality

As the second main activity in the review step of the maintenance phase in CBR, we have to monitor the different quality measures and decide when we must invoke the restore step according to the quality values. For this purpose, we define several example monitor operators below.

Figure 3. Example Monitor Operators. (Plot labels: $D_x$ on the vertical axis with maximum value 1; time stamps $t_1$ to $t_7$ on the horizontal axis $t$; annotations $\delta$, $\tau$, $\delta_D$, and $\delta_t$.)

Figure 3 presents an overview of three different strategies to monitor degrees $D_x$ of different properties (or quality values $Q_x$). On the x axis we see different time stamps, and on the y axis we have the specific value for $D_x$ (or $Q_x$) at a specific time stamp, respectively.

Note that although $D_x$ in Figure 3 is monotonically decreasing over time, this is not necessarily true on all occasions. If we only apply the first three steps in the CBR cycle, values of $D_x$ (or $Q_x$) do not change at all. As soon as we start the maintenance phase with the retain step, the values for $D_x$ (and $Q_x$) can either increase or decrease depending on the impact of the new cases. Since larger values of $D_x$ refer to better case-base quality, it is the ultimate goal of the review and restore steps to increase values of $D_x$. Situations with increasing values of $D_x$ are not shown in Figure 3; i.e., the curve in Figure 3 does not represent results of maintenance in CBR.

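Definition 13 (Monitor Operators). Assume $C_{\subseteq}$ is the set of all subsets of $C$, $G \subseteq C$, $G_t$ denotes $G$ at time stamp $t$, $x \in \{1, 2, 3, 4, 5\}$, $t < t'$, $\delta := \delta_D := D_x(G_{t'}) - D_x(G_t)$, and $\delta_t := t' - t$.

(i) $M_1 : C_{\subseteq} \times [0; 1] \to \{0, 1\}$,

$$M_1(G_t, \tau) := \begin{cases} 0, & \text{if } D_x(G_t) > \tau \\ 1, & \text{if } D_x(G_t) \le \tau \end{cases}$$

(ii) $M_2 : C_{\subseteq} \times C_{\subseteq} \times [0; 1] \to \{0, 1\}$,

$$M_2(G_t, G_{t'}, \tau) := \begin{cases} 0, & \text{if } \delta < \tau \\ 1, & \text{if } \delta \ge \tau \end{cases}$$

(iii) $M_3 : C_{\subseteq} \times C_{\subseteq} \times [0; \infty) \to \{0, 1\}$,

$$M_3(G_t, G_{t'}, \tau) := \begin{cases} 0, & \text{if } \delta_D / \delta_t < \tau \\ 1, & \text{if } \delta_D / \delta_t \ge \tau \end{cases}$$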

Definition 13 specifies the three illustrated example monitor operators as indicator functions that return 0 if no restore step is necessary and 1 if the review step recommends performance of the restore step now. All definitions use a user-defined threshold $\tau$ that sets the tolerance allowed for quality deficiencies.

The first example monitor operator only considers a single value of one of the degrees $D_x$ at a single time stamp $t$. If this value is above threshold $\tau$, the relative frequency of positive cases with respect to one of the case properties within a set of cases $G$ is still appropriate, and no actions to change the case base are necessary. On the contrary, if the degree value is equal to or below $\tau$, we assume that the quality of the case base is not suitable anymore.


The second and third monitor operators specify more fine-grained indicator functions because they consider two different states of the case base at two different time stamps; i.e., they judge the quality of the case base over time. $M_2$ takes into account the absolute difference between two degree values regardless of the amount of time that lies between the two states, whereas $M_3$ also considers the speed of decreasing quality. The faster the quality of the case base decreases, the sooner it is advisable to restore the case base. For $t' - t \to 0$, the quotient $\delta_D / \delta_t$ used by $M_3$ approaches the derivative of $D_x$. If we explicitly know the function $D_x$, we are able to directly compute the derivative at each time stamp.

For both operators, it is generally useful to consider $M_1$ in parallel because otherwise the quality may decrease to zero slowly or in small steps without triggering any action to increase the case-base quality at all. Note that as we have monitor operators $M_1$ to $M_3$ for $D_x$, we also have corresponding operators for $Q_x$ not shown in this article.
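The three monitor operators of Definition 13 can be sketched as plain indicator functions. The sketch below makes one assumption explicit: the decline between two time stamps is counted as a positive number, $D_x(G_t) - D_x(G_{t'})$, following the reading of $M_2$ as an absolute difference and of $M_3$ as the speed of decreasing quality. The series of degree values is hypothetical and only illustrates why $M_1$ should run in parallel.

```python
from fractions import Fraction


def m1(d_t, tau):
    """Return 1 when the degree at a single time stamp is at or below tau."""
    return 1 if d_t <= tau else 0


def m2(d_t, d_t_prime, tau):
    """Return 1 when the decline between two time stamps reaches tau."""
    decline = d_t - d_t_prime  # assumption: a decline is counted as positive
    return 1 if decline >= tau else 0


def m3(d_t, d_t_prime, t, t_prime, tau):
    """Return 1 when the decline per unit of time reaches tau."""
    rate = (d_t - d_t_prime) / (t_prime - t)
    return 1 if rate >= tau else 0


# Hypothetical series: the degree drops by 1/20 at every time stamp.
series = [Fraction(1) - k * Fraction(1, 20) for k in range(9)]

# M2 and M3 compare consecutive states with threshold 1/10.
m2_alarms = [m2(series[k], series[k + 1], Fraction(1, 10)) for k in range(8)]
m3_alarms = [m3(series[k], series[k + 1], k, k + 1, Fraction(1, 10)) for k in range(8)]

# M1 looks at each state on its own with threshold 7/10.
m1_alarms = [m1(d, Fraction(7, 10)) for d in series]
first_m1_alarm = m1_alarms.index(1)

print(m2_alarms, m3_alarms, first_m1_alarm)
```

In this series every single step stays below the thresholds of the two difference-based operators, so only the absolute check of $M_1$ eventually recommends the restore step.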

For more advanced monitor operators, we expect to use more than a single curve of degree values in combination within the same operator definition. Thus we are able to unify information from different types of properties and to use their relations to each other.

It is also possible to use visualization techniques as well as data mining methods. Both approaches can help to learn appropriate values for $\tau$, or they enable strategies to predict future values of $D_x$ in order to recognize problems in the case base even before they really occur.

6. THE RESTORE STEP FOR CBM

If during the review step monitoring indicates problems with the case base, the restore step invokes specific modify operators to change the case base such that the quality of the case base increases to an acceptable level again. Usually, the restore step suggests modifications to the case base, but the final decision whether the suggested change is made or not is still the task of a human CBR engineer. We also imagine that the restore step suggests modifications plus an indication of the new quality value after the change is actually applied to the case base. Then it is easier for the CBR engineer to decide whether a specific modification actually leads to an improved case base or not.

In this section we consider several classes of modify operators and analyze which case or case-base properties correspond to which operator.

6.1. Basic Modify Operators

Basic modify operators work on single cases. They either add or remove a complete case or change (part of) one of the components of a single case.

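Definition 14 (Add Case). Assume $C_{\subseteq}$ is the set of all subsets of $C$, $C^*_{\subseteq}$ is the set of all subsets of $C^*$, $G \subseteq C$, and $c_i \notin G$.

$$\mathrm{add} : C_{\subseteq} \times C^* \to C^*_{\subseteq}, \quad \mathrm{add}(G, c_i) := G \cup \{c_i\}$$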

The first simple modify operator adds a new case to a set of cases (see Definition 14). We assume that this new case is a meaningful potential case, i.e., a case that really makes sense in the application domain of the CBR system. We further assume, though not explicitly mentioned, that the case is correct.


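Definition 15 (Remove Case). Assume $C_{\subseteq}$ is the set of all subsets of $C$, $G \subseteq C$, and $c_i \in G$.

$$\mathrm{remove} : C_{\subseteq} \times C \to C_{\subseteq}, \quad \mathrm{remove}(G, c_i) := G \setminus \{c_i\}$$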

The second basic modify operator does exactly the opposite of the first one. It removes a single case from a set of cases (see Definition 15). Although we define this operator by set deletion, we expect that the remove operator is implemented by an additional flag within the information component $q_i$ of a case that indicates that this case is no longer available for solving novel problems. In this way, we are able to easily add this case again by changing the flag value if for some reason this is valuable at a later stage of the CBR system.

Modifications to the case base within the restore step are all realized by using the add and remove operators. If we apply one of the operators below that modifies a single case or combines two cases to a single new one, the modification of the case base refers to removing the old version of the case(s) and adding the new one.
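A flag-based realization of the two operators might look as follows. This is a minimal sketch under stated assumptions: the `Case` class, the dictionary keyed by a case identifier, and the `active` flag name are all hypothetical choices for illustration and are not prescribed by the definitions.

```python
from dataclasses import dataclass, field


@dataclass
class Case:
    problem: dict   # p_i: attribute -> value
    solution: str   # s_i
    # q_i: information component; the "active" flag is an assumed name.
    info: dict = field(default_factory=lambda: {"active": True})


def add(case_base, case_id, case):
    """add(G, c_i): return a new case base that also contains the case."""
    if case_id in case_base:
        raise ValueError(f"case {case_id!r} is already in the case base")
    return {**case_base, case_id: case}


def remove(case_base, case_id):
    """remove(G, c_i) as a soft delete: only the flag in q_i changes."""
    case_base[case_id].info["active"] = False


def undo_remove(case_base, case_id):
    """Make a previously removed case available again."""
    case_base[case_id].info["active"] = True


def active(case_base):
    """Cases that are still available for solving novel problems."""
    return {cid: c for cid, c in case_base.items() if c.info.get("active", True)}


g = add({}, "c1", Case({"os": "linux"}, "restart service"))
g = add(g, "c2", Case({"os": "windows"}, "reinstall driver"))
remove(g, "c1")
print(sorted(active(g)))
```

Because the removed case stays in storage, a later modification that replaces a case (remove the old version, add the new one) keeps the old version available for inspection or rollback.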

Definition 16 (Specialize Case). Assume $c_i = (p_i, s_i, q_i)$, $p_i \cap V_j = \emptyset$, $v_{jk} \in V_j$, and $q_i^*$ is an adapted information component.

$$\mathrm{specialize} : C \times V \to C^*, \quad \mathrm{specialize}(c_i, v_{jk}) := (p_i \cup \{v_{jk}\}, s_i, q_i^*)$$

The third modify operator to support the restore step in CBM specializes a single case by adding a single value to the problem component for an attribute that is not yet represented in this case (see Definition 16). It is certainly also possible to specialize a single case by adding more than a single value. This operation simply refers to multiple applications of the specialize operator in Definition 16. As in Definition 14, we assume that the more specific problem component still results in a meaningful correct case.

Definition 17 (Generalize Case). Assume $c_i = (p_i, s_i, q_i)$, $p_i \cap V_j = \{v_{jk}\}$, and $q_i^*$ is an adapted information component.

$$\mathrm{generalize} : C \times V \to C^*, \quad \mathrm{generalize}(c_i, v_{jk}) := (p_i \setminus \{v_{jk}\}, s_i, q_i^*)$$

The next definition is again the inverse to the previous one. The generalize operator removes a single value from the problem component of a single case (see Definition 17). Again, it is also possible to apply this operator multiple times and hence remove more than a single value of a case.

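Definition 18 (Adjust Case). Assume $c_i = (p_i, s_i, q_i)$, $p_i \cap V_j = \{v_{jk}\}$, $v_{jk} \neq v_{jk'} \in V_j$, and $q_i^*$ is an adapted information component.

$$\mathrm{adjust} : C \times V \times V \to C^*, \quad \mathrm{adjust}(c_i, v_{jk}, v_{jk'}) := ((p_i \setminus \{v_{jk}\}) \cup \{v_{jk'}\}, s_i, q_i^*)$$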

As an alternative to the previous two operators, we also define a modify operator that combines specialization and generalization. The adjust operator changes a single value in the problem component of a single case to a different value for the same attribute (see Definition 18).

A more general change of attribute values is the modification of a single case by removing a specific value and adding another value but for a different attribute. This idea leads to the definition of an alter operator (see Definition 19).


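Definition 19 (Alter Case). Assume $c_i = (p_i, s_i, q_i)$, $j \neq j'$, $p_i \cap V_j = \{v_{jk}\}$, $p_i \cap V_{j'} = \emptyset$, $v_{j'k'} \in V_{j'}$, and $q_i^*$ is an adapted information component.

$$\mathrm{alter} : C \times V \times V \to C^*, \quad \mathrm{alter}(c_i, v_{jk}, v_{j'k'}) := ((p_i \setminus \{v_{jk}\}) \cup \{v_{j'k'}\}, s_i, q_i^*)$$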

For all modify operators that change the problem component, we imagine similar operators that modify the solution component. However, in the current representation of solutions as any item, we are not able to specify such operators, and since we do not want to become application-dependent, we cannot specify the solution component in more detail across different domains.
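The four problem-component operators of Definitions 16 to 19 can be sketched on a simple representation. The sketch assumes that a problem component is a Python dictionary from attribute name to value, which enforces at most one value per attribute; the adaptation of the information component $q_i^*$ is left out. Since a value always belongs to an attribute, the functions take the attribute name explicitly, which is a deviation from the signatures in the definitions.

```python
def specialize(problem, attribute, value):
    """Add a value for an attribute that is not yet represented."""
    if attribute in problem:
        raise ValueError(f"{attribute!r} already has a value")
    return {**problem, attribute: value}


def generalize(problem, attribute):
    """Remove the value of an attribute that is represented."""
    if attribute not in problem:
        raise ValueError(f"{attribute!r} has no value to remove")
    return {a: v for a, v in problem.items() if a != attribute}


def adjust(problem, attribute, new_value):
    """Replace the value of an attribute by a different value."""
    if attribute not in problem:
        raise ValueError(f"{attribute!r} has no value to adjust")
    if problem[attribute] == new_value:
        raise ValueError("the new value must differ from the old one")
    return {**problem, attribute: new_value}


def alter(problem, old_attribute, new_attribute, new_value):
    """Drop the value of one attribute and add a value for another one."""
    if old_attribute == new_attribute:
        raise ValueError("alter needs two different attributes")
    return specialize(generalize(problem, old_attribute), new_attribute, new_value)


p = {"os": "linux", "ram": "8GB"}
print(specialize(p, "disk", "ssd"))
print(generalize(p, "ram"))
print(adjust(p, "ram", "16GB"))
print(alter(p, "ram", "disk", "ssd"))
```

All functions return a new dictionary and leave the input untouched, which matches the view that a modification of the case base removes the old version of a case and adds the new one.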

6.2. More Complex Modify Operators

In the previous subsection we specified various modify operators that add or remove complete cases or that change the problem component of a single case. Now we turn to more complex modify operators that work on more than a single case. For simplicity, we only define complex operators that have influence on two cases rather than on more than two cases. However, it is also possible to specify corresponding operators that manipulate three or even more cases in the same way.

The idea of all operators below is to merge two (or more) cases to a new single one. The respective modification of the case base is then to remove the two (or more) original cases and to add the new case. In all situations, it is also necessary to generate an adapted information component $q_i^*$.

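Definition 20 (Cross Cases). Assume $c_i = (p_i, s_i, q_i)$, $c_{i'} = (p_{i'}, s_{i'}, q_{i'})$, $1 \le \delta \in \mathbb{N}$, $|p_i \cap p_{i'}| + \delta = N_i = N_{i'}$ or $p_i \subset p_{i'}$ or $p_{i'} \subset p_i$, $s_i = s_{i'}$, and $q_i^*$ is an adapted information component.

$$\mathrm{cross} : C \times C \to C^*, \quad \mathrm{cross}(c_i, c_{i'}) := (p_i \cap p_{i'}, s_i, q_i^*)$$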

The first two more complex modify operators deal with situations where two cases are coherent or one of them is not minimal. The cross operator suggests the intersection of the two problem components and hence the reduction of both cases to their common values (see Definition 20). In the case of incoherence, as in Definition 9, parameter $\delta$ controls the amount of overlapping information required. We also do not distinguish situations where the two cases have different values for the same attribute or have different values for different attributes.

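Definition 21 (Join Cases). Assume $c_i = (p_i, s_i, q_i)$, $c_{i'} = (p_{i'}, s_{i'}, q_{i'})$, $1 \le \delta \in \mathbb{N}$, $|p_i \cap p_{i'}| + \delta = N_i = N_{i'}$ and $\forall a_j \in A : |((p_i \cup p_{i'}) \setminus (p_i \cap p_{i'})) \cap V_j| \le 1$ or $p_i \subset p_{i'}$ or $p_{i'} \subset p_i$, $s_i = s_{i'}$, and $q_i^*$ is an adapted information component.

$$\mathrm{join} : C \times C \to C^*, \quad \mathrm{join}(c_i, c_{i'}) := (p_i \cup p_{i'}, s_i, q_i^*)$$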

The join operator is similar to the cross operator, but it brings together all the values of both problem components (see Definition 21). Moreover, we additionally require that the distinctive values in both cases refer to different attributes in order to avoid violation of the condition in Definition 2 that allows only a single value for an attribute.
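The difference between the two merge operators can be illustrated with a short sketch. It assumes problem components are dictionaries from attribute to hashable value, that both cases already share the same solution, and that the overlap parameter of the precondition is passed as `delta`; the helper name `mergeable` is hypothetical.

```python
def mergeable(p, q, delta=1):
    """Shared precondition: one problem subsumes the other, or both have
    the same size and differ in exactly `delta` values."""
    a, b = set(p.items()), set(q.items())
    subsumes = a < b or b < a
    overlaps = len(a) == len(b) == len(a & b) + delta
    return subsumes or overlaps


def cross(p, q):
    """Keep only the attribute values that both problems have in common."""
    return dict(set(p.items()) & set(q.items()))


def join(p, q):
    """Bring together all values; refuse two values for one attribute."""
    conflicts = sorted(a for a in p.keys() & q.keys() if p[a] != q[a])
    if conflicts:
        raise ValueError(f"conflicting values for attributes: {conflicts}")
    return {**p, **q}


p = {"os": "linux", "ram": "8GB"}
q = {"os": "linux", "disk": "ssd"}   # differs from p in another attribute
r = {"os": "linux", "ram": "16GB"}   # differs from p in the same attribute

print(mergeable(p, q), cross(p, q), join(p, q))

try:
    join(p, r)
    conflict_detected = False
except ValueError:
    conflict_detected = True
print(mergeable(p, r), cross(p, r), conflict_detected)
```

The pair `p`, `r` shows why join carries the additional condition: cross is still applicable, whereas a union would assign two values to the same attribute.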


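Definition 22 (Combine Cases). Assume $c_i = (p_i, s_i, q_i)$, $c_{i'} = (p_{i'}, s_{i'}, q_{i'})$, $|p_i \cap p_{i'}| + 1 = N_i = N_{i'}$, $\exists a_j \in A \; \exists v_{jk}, v_{jk'} \in V_j : (p_i \cup p_{i'}) \setminus (p_i \cap p_{i'}) = \{v_{jk}, v_{jk'}\}$, $s_i = s_{i'}$, and $q_i^*$ is an adapted information component.

$$\mathrm{combine} : C \times C \times V \times V \to C^*$$

$$\mathrm{combine}(c_i, c_{i'}, v_{jk}, v_{jk'}) := ((p_i \cap p_{i'}) \cup \{v_{jk} \vee v_{jk'}\}, s_i, q_i^*)$$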

The third way to merge two cases into a single new case is to combine two different values for the same attribute into a new value that consists of the disjunction of the two original values (see Definition 22). The new value then represents two alternative values for the same attribute that can both occur while still having the same solution. We only allow this type of combination if all other values of the two cases coincide and if both cases have the same solution component. Note that this modification requires changing the respective attribute domain, and the implementation of the retrieve step must be able to handle this representation of disjunctions properly.
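A disjunctive value and a retrieve-side match that understands it can be sketched as follows. The representation is an assumption made for illustration: problem components are dictionaries, a disjunction is stored as a `frozenset` of the alternative values, and the hypothetical `matches` function stands in for the part of the retrieve step that has to handle disjunctions.

```python
def combine(p, q):
    """Merge two problems that differ in exactly one attribute value
    into one problem with a disjunctive value for that attribute."""
    if p.keys() != q.keys():
        raise ValueError("both problems must use the same attributes")
    differing = [a for a in p if p[a] != q[a]]
    if len(differing) != 1:
        raise ValueError("the problems must differ in exactly one value")
    attribute = differing[0]
    merged = dict(p)
    merged[attribute] = frozenset({p[attribute], q[attribute]})
    return merged


def matches(problem, query):
    """Exact match that accepts any alternative of a disjunctive value."""
    for attribute, value in problem.items():
        allowed = value if isinstance(value, frozenset) else {value}
        if query.get(attribute) not in allowed:
            return False
    return True


p = {"os": "linux", "browser": "firefox"}
q = {"os": "linux", "browser": "chrome"}
c = combine(p, q)

print(c["browser"] == frozenset({"firefox", "chrome"}))
print(matches(c, {"os": "linux", "browser": "chrome"}))
print(matches(c, {"os": "linux", "browser": "safari"}))
```

The check that both cases share the same solution component is assumed to happen before `combine` is called.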

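Definition 23 (Abstract Cases). Assume $c_i = (p_i, s_i, q_i)$, $c_{i'} = (p_{i'}, s_{i'}, q_{i'})$, $|p_i \cap p_{i'}| + 1 = N_i = N_{i'}$, $\exists a_j \in A \; \exists v_{jk}, v_{jk'} \in V_j : (p_i \cup p_{i'}) \setminus (p_i \cap p_{i'}) = \{v_{jk}, v_{jk'}\}$, a binary relation $\succ \; \subseteq V_j \times V_j$, $\exists v_{jk^*} \in V_j : v_{jk^*} \succ v_{jk} \wedge v_{jk^*} \succ v_{jk'}$, $s_i = s_{i'}$, and $q_i^*$ is an adapted information component.

$$\mathrm{abstract} : C \times C \times V \times V \to C^*$$

$$\mathrm{abstract}(c_i, c_{i'}, v_{jk}, v_{jk'}) := ((p_i \cap p_{i'}) \cup \{v_{jk^*}\}, s_i, q_i^*)$$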

The last way to merge cases requires the same assumptions as the combine operator plus a binary relation $\succ$ on the attribute domain that is considered (see Definition 23). For example, such a relation corresponds to a hierarchy of values where the successor relation holds if the first value is more general (at a higher level in the hierarchy of values) than the second. If two cases are identical except for a single value within each problem component, and these two values have a common predecessor, we then can abstract both values to that predecessor and replace the two original values by the more abstract new value.
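With a value hierarchy given as a child-to-parent map, the abstraction step can be sketched as below. The map, the helper names, and the choice of the most specific common predecessor are assumptions for illustration; the definition itself only requires that some common predecessor exists. The hierarchy is assumed to be a tree without cycles.

```python
def ancestors(value, parent):
    """All more general values of `value`, from most to least specific."""
    chain = []
    while value in parent:
        value = parent[value]
        chain.append(value)
    return chain


def common_predecessor(v1, v2, parent):
    """Most specific value that is more general than both inputs."""
    candidates = set(ancestors(v2, parent))
    for a in ancestors(v1, parent):
        if a in candidates:
            return a
    return None


def abstract(p, q, parent):
    """Replace the single differing value by a common predecessor."""
    if p.keys() != q.keys():
        raise ValueError("both problems must use the same attributes")
    differing = [a for a in p if p[a] != q[a]]
    if len(differing) != 1:
        raise ValueError("the problems must differ in exactly one value")
    attribute = differing[0]
    v_star = common_predecessor(p[attribute], q[attribute], parent)
    if v_star is None:
        raise ValueError("the two values have no common predecessor")
    return {**p, attribute: v_star}


# Hypothetical hierarchy for an operating-system attribute.
parent = {"debian": "linux", "ubuntu": "linux", "linux": "unix", "macos": "unix"}

p = {"os": "debian", "ram": "8GB"}
q = {"os": "ubuntu", "ram": "8GB"}
print(abstract(p, q, parent))
print(abstract(p, {"os": "macos", "ram": "8GB"}, parent))
```

Choosing the most specific predecessor keeps the merged case as close as possible to the two originals; a more general predecessor would also satisfy the definition.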

6.3. Relations Between Review and Restore for CBM

In order to use the defined quality measures, monitor operators, and modify operators in a practical setting, it is important to know the relations between the review step and the restore step. The crucial question is which modify operator is useful to resolve which quality problem. Since the monitor operators point to cases with negative properties, we therefore have to consider the relation between the different case properties and the various modify operators.

Table 2. Relations Between Review and Restore for CBM

Review            Restore
Not correct       Remove, adjust, alter
Not consistent    Remove, specialize, generalize, adjust, alter
Not unique        Remove
Not minimal       Remove, specialize, generalize, adjust, alter, cross, join
Not incoherent    Remove, specialize, generalize, adjust, alter, combine, abstract, cross, join

Table 2 shows initial examples of such relations between the five different case properties and the ten modify operators. We certainly never resolve a quality problem by adding additional cases because additional cases do not change the existing negative case properties; they possibly only result in new quality problems. Likewise, we are always able to improve the case-base quality by an application of the remove operator that deletes one of the cases with quality problems and hence yields an improvement of the overall case-base quality.

However, it is also possible to optimize the case-base quality by changing or merging existing cases. For these situations, Table 2 suggests applicable and helpful modify operators for each case property. For example, if the review step detects an incorrect case, it is only possible to make the case correct by an application of adjust or alter. Other operators do not help here. Note that the relations shown in Table 2 assume that we do not want to resolve one quality problem by adding another. For example, if we apply specialize or generalize to make two redundant cases distinctive, we effectively construct a case that is not minimal.
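The relations of Table 2 lend themselves to a simple lookup that a restore step could use to propose operators for a violated property. The dictionary below copies the table; the function name `suggest` and the `keep_case` switch are hypothetical additions for illustration.

```python
# Violated property -> applicable modify operators, copied from Table 2.
SUGGESTED_OPERATORS = {
    "correct": ["remove", "adjust", "alter"],
    "consistent": ["remove", "specialize", "generalize", "adjust", "alter"],
    "unique": ["remove"],
    "minimal": ["remove", "specialize", "generalize", "adjust", "alter",
                "cross", "join"],
    "incoherent": ["remove", "specialize", "generalize", "adjust", "alter",
                   "combine", "abstract", "cross", "join"],
}


def suggest(violated_property, keep_case=False):
    """Operators to propose to the CBR engineer for a violated property.

    With keep_case=True only operators that change or merge cases are
    returned, i.e., plain removal is left out.
    """
    operators = SUGGESTED_OPERATORS[violated_property]
    if keep_case:
        return [op for op in operators if op != "remove"]
    return list(operators)


print(suggest("correct", keep_case=True))
print(suggest("unique"))
```

Note that add never appears as a suggestion and remove appears for every property, which mirrors the two general observations made about Table 2.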

7. INITIAL EVALUATION IN REAL-WORLD DOMAINS

In this section we present several experimental results across various types of domains to test the appropriateness of the proposed concepts and methods for review and restore in CBM.

7.1. Three Types of Application Domains

In order to back up the claim that the proposed concepts and methods for review and restore in CBM are suitable across different domains and applicable independent of the specific CBR system used, we selected three different types of application domains. The Siemens SIMATIC Knowledge Manager (Lenz et al. 1998) represents an application of textual CBR deploying tec:inno's CBR system orenge. The central IT user help-desk at debis SH (Roth-Berghofer and Iglezakis 2000) is an example of conversational CBR and uses Inference's k-Commerce Support Enterprise as the CBR system of choice. Finally, the selection of nine UCI data sets stands for structural CBR and applies basic nearest-neighbor methods as the representative of basic CBR techniques.

The Siemens SIMATIC Knowledge Manager. The Siemens SIMATIC Knowledge Manager supports people within the customer support group of the department Automation and Drives. This knowledge manager uses textual CBR to answer questions posed by customers. The CBR approach reuses knowledge compiled in frequently asked questions as well as other types of documents. The current knowledge base that we assessed here contains about 25,000 documents. For the experiments, each document forms a case as a set of information entities constructed during the knowledge-acquisition process.


The Central IT User Help-Desk at debis SH. The central IT user help-desk at debis SH is one of the typical customer support sites where IT end users call the help-desk staff in case of any IT-related problem. Such problems cover all aspects of IT applications across hardware and software issues as well as other challenges that occur within that context. The current case base that we used for the experiments reported here contains approximately 1100 cases represented as dialogues of questions and answers.

A Selection of UCI Data Sets. In addition to the two industrial domains, we also selected nine different data sets from the UCI machine-learning repository with different data characteristics. For simplicity, we preprocessed numeric attributes by a standard equal-width discretization approach.

7.2. Experimental Design

In this article, we report on two classes of experiments. Within the first set of experiments with the two real-world domains at Siemens and debis SH, we took the case base as it is; computed the different degrees of consistency, uniqueness, minimality, and incoherence; and finally presented these values along with conflicting cases to domain experts. Note that we did not perform any experiments with correctness because correctness automatically leads to the question of solution quality, which we did not consider in this article.

Within the second set of experiments using the UCI data sets, we conducted more complex tests simulating review and restore for CBM as part of classification tasks (Iglezakis and Anderson 2000). To this end, we initially split the original data sets into five folds to perform five-fold cross-validation. In each run, four of the five folds form the case base, whereas the remaining one represents the test cases. As the classification algorithm, we applied a basic nearest-neighbor algorithm. The benchmark result was then the classification accuracy of the entire case base averaged with five-fold cross-validation.
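The benchmark procedure can be sketched in a few lines. The sketch rests on several assumptions that the description above leaves open: folds are assigned deterministically by case index, similarity is the number of attributes on which two problem descriptions agree, and ties are resolved in favor of the first case found. The toy data set is hypothetical.

```python
def matching_similarity(p, q):
    """Number of attributes on which two problem descriptions agree."""
    return sum(1 for attribute, value in p.items() if q.get(attribute) == value)


def classify(case_base, problem):
    """Basic nearest neighbor: reuse the solution of the most similar case."""
    best = max(case_base, key=lambda case: matching_similarity(case[0], problem))
    return best[1]


def cross_validate(cases, folds=5):
    """Average accuracy when each fold serves once as the set of test cases."""
    accuracies = []
    for f in range(folds):
        test = [c for i, c in enumerate(cases) if i % folds == f]
        case_base = [c for i, c in enumerate(cases) if i % folds != f]
        hits = sum(1 for problem, solution in test
                   if classify(case_base, problem) == solution)
        accuracies.append(hits / len(test))
    return sum(accuracies) / folds


# Hypothetical toy data: the class depends on the color attribute only.
cases = [
    ({"color": "red" if i % 2 == 0 else "blue", "serial": i},
     "A" if i % 2 == 0 else "B")
    for i in range(10)
]

print(cross_validate(cases))
```

The same function can be called once on the raw case base and once on an optimized one to compare both accuracies.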

In order to test the effects of the review and restore steps, we first computed the quality of the different case bases according to the specified degrees of case properties. Then we performed the following optimization process using the remove operator to restore the case base. We built a new case base for each property separately by starting with an empty case base, adding the first case to it without any check, and then, for each subsequent case, checking the entire current case base to see whether it contains any case that conflicts with the new case according to the property in question. If conflicting cases existed, we removed them from the case base, whereas the new case was added to the case base. We decided to focus evaluation of the restore step on the modify operator remove because this operator is the only operator that is helpful in each situation of conflicting cases (see Table 2). In the end, the remaining optimized case base was used to classify the test set again, and we compared accuracies of the raw case base and the optimized one.
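The optimization process described above can be sketched as a single pass over the cases. The sketch reads the description such that the new case is always added after conflicting older cases have been removed. The conflict test is passed in as a function; the `inconsistent` predicate shown here (same problem, different solution) is only a hypothetical stand-in for the property definitions given earlier in the article.

```python
def restore_with_remove(cases, conflicts):
    """Build an optimized case base for one property using remove only."""
    optimized = []
    for new_case in cases:
        # Remove every stored case that conflicts with the new case ...
        optimized = [c for c in optimized if not conflicts(c, new_case)]
        # ... and then add the new case itself.
        optimized.append(new_case)
    return optimized


def inconsistent(c1, c2):
    """Hypothetical conflict test: same problem but different solutions."""
    return c1[0] == c2[0] and c1[1] != c2[1]


# Cases as (problem, solution) pairs; the first and third one conflict.
cases = [
    ({"os": "linux"}, "A"),
    ({"os": "mac"}, "B"),
    ({"os": "linux"}, "C"),
]

optimized = restore_with_remove(cases, inconsistent)
print(len(cases), len(optimized))
```

Because later cases win over earlier ones, the result depends on the order in which the cases are processed.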

7.3. Experimental Results

Table 3 summarizes experimental results of the different types of experiments. The columns show the case-base name ($C$), the values for the degrees of consistency ($D_2(C)$), uniqueness ($D_3(C)$), minimality ($D_4(C)$), and incoherence with $\delta = 1$ ($D_5(C)$), as well as the case-base size without ($|C|$) and with optimization ($|C^*_x|$) and classification accuracies ($A(C)$ and $A(C^*_x)$, respectively) if applicable.


Table 3. Experimental Results in Different Real-World Domains

Rows per case base: Size gives $|C|$ and $|C^*_x|$, Degree gives $D_x(C)$, and Accuracy gives $A(C)$ and $A(C^*_x)$. Column C refers to the original case base; columns x = 2 to x = 5 refer to consistency, uniqueness, minimality, and incoherence.

Case base        Row        C        x = 2    x = 3    x = 4    x = 5
Siemens          Size       25551    -        -        -        -
                 Degree              0.54     1.00     1.00     1.00
                 Accuracy   -        -        -        -        -
debis SH         Size       1129     -        -        -        -
                 Degree              0.82     0.99     0.99     0.59
                 Accuracy   -        -        -        -        -
Annealing        Size       637.6    625.2    574.8    551.2    323.4
                 Degree              0.95     0.82     0.78     0.23
                 Accuracy   0.97     0.97     0.97     0.97     0.97
Audiology        Size       160.0    160.0    145.2    156.6    138.2
                 Degree              1.00     0.83     0.98     0.77
                 Accuracy   0.70     0.70     0.69     0.70     0.71
Australian       Size       552.0    550.6    548.6    552.0    503.4
                 Degree              0.99     0.99     1.00     0.83
                 Accuracy   0.81     0.81     0.81     0.81     0.79
Credit scoring   Size       552.0    550.6    549.0    551.4    498.8
                 Degree              0.99     0.99     0.99     0.83
                 Accuracy   0.79     0.80     0.79     0.79     0.79
Housing          Size       404.8    404.2    401.4    404.8    390.2
                 Degree              0.99     0.98     1.00     0.93
                 Accuracy   0.31     0.31     0.30     0.31     0.30
Pima             Size       614.4    614.4    612.0    614.4    569.8
                 Degree              1.00     0.99     1.00     0.86
                 Accuracy   0.65     0.65     0.65     0.65     0.65
Soybean          Size       245.6    244.4    243.2    245.0    212.8
                 Degree              0.99     0.98     0.99     0.74
                 Accuracy   0.92     0.92     0.92     0.92     0.91
Voting records   Size       348.0    216.4    280.6    119.2    191.2
                 Degree              0.45     0.71     0.23     0.45
                 Accuracy   0.93     0.91     0.93     0.91     0.92
Zoo              Size       80.8     80.8     50.4     80.8     30.4
                 Degree              1.00     0.44     1.00     0.17
                 Accuracy   0.95     0.95     0.95     0.95     0.92

Within the Siemens and debis SH domains, the degrees of consistency, uniqueness, minimality, and incoherence proved their purpose in that there are in fact cases that violate the desired properties. In both domains, it is interesting to see that the main quality problems in the case base result from conflicts according to consistency or incoherence. Moreover, when we presented these values along with the conflicting cases to the domain experts, they saw the responsible problems in the case base and agreed to the need for modifications to improve the quality of the case base. For example, in the debis SH domain, we detected several pairs of cases with very similar problem descriptions but with different solutions that normally resulted from identical cases for different customers in different parts of the case base. Only in a few exceptional situations were such cases customer-dependent; in most of these situations it was possible to define more general cases that abstract from the specific customer and that hold across all customers with this concrete problem.

However, we identified potential for improvement of the proposed concepts in the case of textual CBR because we expect more appropriate results if the definitions of the case properties were less syntactic but more semantic and additionally took text analysis capabilities into account. In the specific Siemens domain presented here, we also encountered a specific challenge of how to set the solution component that results from the fact that the case base in this domain contains different document types. Therefore, we decided to take the solution identifier as a representation of the solution. Since all identifiers are unique, this means that for all pairs of cases, the solution components differ. This is the reason why $D_3(C)$, $D_4(C)$, and $D_5(C)$ are all equal to 1.00.

For all the nine UCI data sets, Table 3 again shows that the proposed quality measures are able to point to existing problems in the case base. Specific values vary across the different data characteristics but usually indicate quality deficiencies for at least two of the case properties. Usually, we observe minimum values for the incoherence property except for the Voting Records domain, where the minimality property results in the lowest degree. In this specific domain, we had to deal with many unknown or missing values. Hence it is likely that there exist many pairs of cases where one case subsumes the other according to some unspecified values in the case representation.

For the restore step, we see that the optimization with operator remove usually results in smaller case bases while mostly preserving classification accuracies. On average, we also recognize that the higher the potential for optimization is (indicated by lower degree values), the smaller is the resulting case base after optimization. Hence the order of magnitude of the proposed quality measures is also appropriate to predict the number of cases that we can save without losing classification quality.
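The relation between degree values and the size of the optimized case base can be inspected directly from the reported numbers. The short sketch below uses the incoherence degree $D_5(C)$ and the sizes $|C|$ and $|C^*_5|$ of the nine UCI data sets as given in Table 3; the retained fraction $|C^*_5| / |C|$ is a derived quantity introduced here for illustration only.

```python
# (D_5(C), |C|, |C*_5|) for the nine UCI data sets, copied from Table 3.
table3 = {
    "Annealing": (0.23, 637.6, 323.4),
    "Audiology": (0.77, 160.0, 138.2),
    "Australian": (0.83, 552.0, 503.4),
    "Credit scoring": (0.83, 552.0, 498.8),
    "Housing": (0.93, 404.8, 390.2),
    "Pima": (0.86, 614.4, 569.8),
    "Soybean": (0.74, 245.6, 212.8),
    "Voting records": (0.45, 348.0, 191.2),
    "Zoo": (0.17, 80.8, 30.4),
}

# Fraction of cases that remain after optimization for incoherence.
retained = {name: optimized / raw for name, (_, raw, optimized) in table3.items()}

# List the data sets from lowest to highest incoherence degree.
for name in sorted(table3, key=lambda n: table3[n][0]):
    print(f"{name:15s} D5 = {table3[name][0]:.2f}   retained = {retained[name]:.2f}")
```

The data set with the lowest degree (Zoo) keeps the smallest share of its cases and the one with the highest degree (Housing) keeps the largest, in line with the trend described above.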

In conclusion, the experiments and their results show that the proposed concepts and methods for CBM are able to point to quality problems in different types of real-world case bases independent of the CBR system used. Moreover, the experiments also indicate that the modify operators are an instrument to optimize the quality of case bases, again applicable in different types of domains and for varying sorts of CBR systems.

8. RELATED WORK

The definition of the two additional steps review and restore in the maintenance phase of the CBR cycle presented in this article is the consequence of consolidating previous discussions and efforts to enhance the existing standard CBR process toward maintenance capabilities. All previous efforts recognize that the first three steps in the current CBR cycle sufficiently cover tasks for using a CBR system to solve problems in novel situations but that the retain step is not adequate to keep the CBR system in a usable state over time.

For example, in terms of the framework for describing CBM policies (Leake and Wilson 1998), the two novel steps review and restore roughly refer to the triggering and execution component of the framework, respectively. The defined quality measures, along with the exemplified monitor operators, serve as instances of the timing dimension, whereas the described modify operators are specific examples of maintenance operations that are executed during the maintenance phase in CBR.


The conclusion of discussions in the workshop "Automating the Construction of Case-Based Reasoners" at IJCAI in 1999 (Watson 1999) also was to propose two new steps: review and reflect. Their review step is inserted before the retain step and covers tasks to assess the quality of a single case before adding the case to the case base. Thus this review step ensures that only interesting and valuable cases are stored. The second additional step in their proposal reflects on feature weights, similarity metrics, case coverage, etc. In comparison with our approach, their review step is much more restrictive because it only considers single cases without their relation to other cases in the case base, whereas their reflect step corresponds more to our review step. However, we are not aware of any research that follows up these ideas and develops methods to implement these two steps.

Another example to extend the CBR cycle to deal with maintenance issues is presented by Goker and Roth-Berghofer (1999). They propose to add a second cycle to the existing four steps. They define an application cycle and a maintenance cycle similar to the two phases that we separate in this article. The application cycle contains the steps retrieve, reuse, revise, and recycle, whereas the maintenance cycle consists of the steps retain and refine. In their work, they argue that the retain task covers review activities, but again only for single cases. Their refine step is similar to the reflect step above plus activities that we describe in the restore step. Again, measures to assess the case-base quality and operations to refine it are not specified in detail.

Beyond related work on extensions of the standard four-step CBR cycle, there also exist approaches to come up with specific concepts to support the review and restore steps for CBM. Most of them rely on specific measures to evaluate characteristics of the case base and ad hoc methods to change the case-base contents if necessary.

Racine and Yang (1996, 1997) define inconsistency and redundancy measures for maintenance of large and unstructured case bases. They distinguish intra-case and inter-case measures that depend on the availability and use of background knowledge. In their 1997 article, they suggest several criteria such as revision effort, retrieval cost, relevancy, and abstractness to evaluate case bases. Focusing on redundancy and inconsistency detection, Racine and Yang (2000) extend their research to large and semistructured case bases.

Aha (1991) uses performance measures to drive different case-addition strategies (CBL1–4). Furthermore, Aha and Breslow (1997) use guidelines from CBR software vendors to optimize conversational CBR systems.

Smyth and Keane (1995) optimize the contents of the case base to preserve its competence by deleting cases based on the measures coverage and reachability. They define coverage of a case as the set of all problems in the problem space that can be solved by this case through adaptation and specify reachability of a case as the set of all cases that are used to solve this case through adaptation. Because coverage and reachability are not computable in general, they suggest using heuristics based on the assumption that the problem distribution in the case base is representative.

Smyth and McKenna (1998) suggest additional measures called case density and group density to support the modeling of competent CBR systems. They also describe a case-authoring system that provides visual feedback on the competence of individual cases and groups of cases.

Another strategy of competence preserving is presented by Zhu and Yang (1999). The advantage of their algorithm is that they are able to place a lower bound on the competence of the resulting case base. Their strategy is a case-addition rather than a case-deletion policy as defined by Smyth and Keane.


Finally, one of the few references that describe concrete operations to deal with the maintenance problem in CBR presents potential modify operators for the vocabulary knowledge container (Heister and Wilke 1998) rather than for the case-base knowledge container as proposed in this article.

All in all, this related work on concepts for review and restore mainly aims at much more semantic approaches that involve use of background knowledge, whereas our approach attempts to define more syntactical and domain-independent measures and operations. In this respect, we view existing work toward review and restore for CBM as domain-dependent example specializations of the general concepts defined in this article, and we argue that the proposed six-step CBR cycle and its methods open a broader perspective for CBM in general.

9. CONCLUDING REMARKS

In summary, we considered CBM as one of the most important issues in current CBR research and suggested two additional steps, review and restore, within the maintenance phase of the CBR process. For the review step, we proposed several quality measures based on case and case-base properties as well as a set of example monitor operators to control the quality of the case base. For the restore step, we defined different modify operators and discussed their relation to the review step. The initial experimental evaluation shows that the proposed concepts and methods are promising for different types of application domains using different sorts of CBR systems.

Our suggestions for future work point in several directions. First, we plan to further analyze the specified mechanisms, their relation to each other, and how they relate to case-base characteristics. Second, we plan to enhance the current mechanisms for monitoring and modifying the case-base quality in several respects. For example, the current definition of quality measures considers local similarity only as a matching function; for this aspect we also plan to extend our definitions to other, more fine-grained local similarity measures. Another example is enhancements that specifically take into account the nature of textual CBR and that define specialized concepts and methods for this purpose. Moreover, extensions to other knowledge containers beyond the case base are an important issue for future work, too. Here we also think of more general concepts using visualization and data-mining techniques.

Finally, another important aspect is continuation of the evaluation and further analyses of the experimental results. In particular, we have to conduct more experiments to test the effectiveness of all modify operators not yet analyzed within the studies presented here. All these studies and analyses will help to make the proposed framework and its methods a powerful tool for CBR engineers in practice to maintain their case bases and to improve their performance daily.

REFERENCES

Aamodt, A., and E. Plaza. 1994. Case-based reasoning: Foundational issues, methodological varia-tions, and system approaches. AI communications 7(1):39–59.

Aha, D. W. 1991. Case-based learning algorithms. In Proceedings of the DARPA Case-Based Reasoning Workshop, Morgan Kaufmann, San Mateo, CA, pp. 147–158.

Aha, D. W., and L. A. Breslow. 1997. Refining conversational case libraries. In Proceedings of the Second International Conference on Case-Based Reasoning. Springer-Verlag, Berlin, pp. 267–278.

Goker, M., and T. Roth-Berghofer. 1999. The development and utilization of the case-based help-desk support system HOMER. Engineering Applications of Artificial Intelligence 12(6):665–680.

Heister, F., and W. Wilke. 1998. An architecture for maintaining case-based reasoning systems. In Proceedings of the European Workshop on Case-Based Reasoning (EWCBR). Springer-Verlag, Berlin, pp. 221–232.

Iglezakis, I., and C. E. Anderson. 2000. Towards the use of case properties for maintaining case-based reasoning systems. In Proceedings of the Pacific Rim Knowledge Acquisition Workshop (PKAW). University of New South Wales, pp. 135–146.

Iglezakis, I., and T. Roth-Berghofer. 2000. A survey regarding the central role of the case base for maintenance in case-based reasoning. In Proceedings of the ECAI Workshop on Flexible Strategies for Maintaining Knowledge Containers. Humboldt University, Berlin, pp. 22–28.

Leake, D. B., and D. C. Wilson. 1998. Categorizing case-base maintenance: Dimensions and directions. In Proceedings of EWCBR-98, Advances in Case-Based Reasoning, Springer-Verlag, Berlin, pp. 196–207.

Lenz, M., B. Bartsch-Sporl, H.-D. Burkhard, and S. Wess. 1998. Case-Based Reasoning Technology: From Foundations to Applications, Springer-Verlag, Berlin.

Racine, K., and Q. Yang. 1996. On the consistency management of large case bases: the case for validation. In Proceedings of the AAAI-96 Workshop on Knowledge Base Validation, American Association for Artificial Intelligence (AAAI), New York, pp. 84–90.

Racine, K., and Q. Yang. 1997. Maintaining unstructured case bases. In Proceedings of the 2nd International Conference on Case-Based Reasoning (ICCBR), Springer-Verlag, Berlin, pp. 553–564.

Racine, K., and Q. Yang. 2000. Redundancy and inconsistency detection in large and semistructured case bases. IEEE Transactions on Knowledge and Data Engineering (in press).

Reinartz, T., I. Iglezakis, and T. Roth-Berghofer. 2000. On quality measures for case base maintenance. In Proceedings of the 5th German Workshop on Case-Based Reasoning, Springer-Verlag, Berlin, pp. 247–259.

Richter, M. M. 1995. The Knowledge Contained in Similarity Measures. Invited talk at the 1st International Conference on Case-Based Reasoning (ICCBR), Sesimbra, Portugal.

Roth-Berghofer, T., and I. Iglezakis. 2000. Developing an integrated multilevel help-desk support system. In Proceedings of the 8th German Workshop on Case-Based Reasoning (GWCBR). Daimler-Chrysler, Ulm, Germany, pp. 145–155.

Smyth, B., and M. T. Keane. 1995. Remembering to forget: A competence-preserving deletion policy for case-based reasoning systems. In Proceedings of the 14th International Joint Conference on Artificial Intelligence. Morgan Kaufmann, San Mateo, CA, pp. 377–382.

Smyth, B., and E. McKenna. 1998. A portrait of case competence: Modelling the competence of case-based reasoning systems. In Proceedings of the 4th European Workshop on Case-Based Reasoning, Springer-Verlag, Berlin, pp. 208–220.

Watson, I. 1999. Report of the workshop on automating the construction of case-based reasoners at the 16th International Joint Conference on Artificial Intelligence (IJCAI 1999). http://www.ai-cbr.org/ijcai99/workshop.html.

Zhu, J., and Q. Yang. 1999. Remembering to add: Competence-preserving case addition policies for case base maintenance. In Proceedings of the International Joint Conference in Artificial Intelligence (IJCAI). Morgan Kaufmann, San Mateo, CA, pp. 234–239.