Information Systems 29 (2004) 47–58

Ontology-based distributed autonomous knowledge systems

Zbigniew W. Raś a,b,*, Agnieszka Dardzińska c

a Department of Computer Science, University of North Carolina, Charlotte, NC 28223, USA
b Polish Academy of Sciences, Institute of Computer Science, Ordona 21, 01-237 Warsaw, Poland
c Department of Mathematics, Bialystok University of Technology, Wiejska 45 A, 15-351 Bialystok, Poland

* Corresponding author. Department of Computer Science, University of North Carolina, 9201 University City Blvd., Charlotte, NC 28223, USA. Tel.: +1-704-687-4567.
E-mail addresses: [email protected] (Z.W. Raś), [email protected] (A. Dardzińska).
URLs: http://www.cs.uncc.edu/ras, http://www.coe.uncc.edu/adardzin.
0306-4379/04/$ - see front matter © 2003 Elsevier Ltd. All rights reserved. doi:10.1016/S0306-4379(03)00033-4

Abstract

Traditional query processing usually requires that users fully understand the database structure and content to issue a query. Due to the complexity of database applications and the variety of user needs, so-called global queries are introduced, which traditional query answering systems cannot handle. A query posed to a database D is global if at least one of its attributes is missing in D while it occurs in other databases. Definitions of a missing attribute in D can be extracted from other databases and shared with D. To handle semantic inconsistencies between the same attributes used at different sites, task ontologies are used as a communication bridge between them. These inconsistencies can be caused either by different granularity levels or by different interpretations of the same attribute. As the final outcome of this research, a rough query answering system based on distributed data mining is presented.
© 2003 Elsevier Ltd. All rights reserved.

Keywords: Distributed autonomous knowledge systems; Query processing; Knowledge mining; Ontologies; Semantics inconsistencies

1. Introduction

In many fields, such as medicine, banking and education, similar databases are kept at many sites. An attribute may be missing in one database while it occurs in many others. Databases are often incomplete, which means that queries can either be answered approximately (for instance in a rough sets framework) or weights can be assigned to the retrieved objects. Each query can be answered in many different ways, depending on the interpretation of incomplete values. In the most general scenario, an answer to a query submitted to an information system (see [1,2]) is a set of objects certainly satisfying the query, and a set of objects possibly satisfying the query. Different interpretations may assign different values to possible objects, showing the confidence of the query answering system in each object returned as an answer to the query. These values can be calculated either using information on the frequency of appearance of some attribute values in the information system or using local and global KDD methods which can predict what attribute values should replace the missing values of incomplete attributes and what weights should be assigned to them.

Missing or incomplete attributes lead to problems when answering queries. For example, a user may issue a query to a local database S in
search for its objects that match a desired description, only to realize that attribute $a_1$ used in that description is either missing or the vast majority of its values in S are incomplete, so that the query cannot be answered. But definitions of $a_1$ may be extracted from databases at remote sites and used to identify objects in S having properties related to $a_1$, as proposed in [14–16]. The simplicity of this approach is lost when the semantics of terms used to describe objects at the client and remote sites differ. Sometimes such a difference in semantics can be repaired quite easily. For instance, if "Temperature in Celsius" is used at one site and "Temperature in Fahrenheit" at the other, a simple mapping will fix the problem. If databases are complete and two attributes have the same name and differ only in their granularity level, a new hierarchical attribute can be formed to fix the problem. If databases are incomplete, the problem is more complex because of the number of options available to interpret incomplete values (including null values). The problem is especially difficult in a distributed framework when rule-based chase techniques, driven by rules extracted at remote sites, are used by a client site to replace null values by values which are less incomplete.

The notion of an intermediate model, proposed by Maluf and Wiederhold [3], is very useful for dealing with the heterogeneity problem, because it describes the database content at a relatively high abstract level, sufficient to guarantee homogeneous representation of all databases. Knowledgebases built jointly with task ontologies, proposed in our paper, can be used for a similar purpose. They contain rules extracted from databases at remote sites. These rules are seen as definitions of their decision values. Sometimes these definitions are inconsistent, so some form of consensus has to be reached. An algorithm addressing this issue was proposed by Raś [4].

In this paper, the heterogeneity problem [13,17] is introduced from the query answering point of view. The query answering system linked with a client site transforms so-called global queries into local queries for the client site using task ontologies and definitions extracted at remote sites. These definitions may have as many different interpretations as the number of remote sites used to extract them.

This paper will mainly focus on different interpretations of queries due to different interpretations of null values and also due to granularity levels of a given attribute, which may easily vary from site to site. Information stored in task ontologies is used to find so-called rough interpretations representing the consensus of all involved sites.

2. Distributed information systems

In this section, we recall the notion of a distributed information system and a knowledgebase for a client site formed from rules extracted at remote sites. We introduce the notion of local queries and give an example of their local semantics.

By an information system we mean $S = (X, A, V)$, where $X$ is a finite set of objects, $A$ is a finite set of attributes, and $V = \bigcup \{V_a : a \in A\}$ is a set of their values. The set $V_a$ is called the domain of attribute $a$. We assume that

* $V_a$, $V_b$ are disjoint for any $a, b \in A$ such that $a \neq b$;
* $a : X \to 2^{V_a} - \{\emptyset\}$ is a function for every $a \in A$.

Instead of $a$, to simplify the notation, we may use $a_{[S]}$ to denote that $a$ is an attribute in $S$. Saying that object $x$ has a property $a(x) = \{w_1, w_2, \ldots, w_k\}$, we mean that only one of the values in $a(x)$ is a property of $x$ but we do not know which one. Also, the empty set is removed from the codomain of $a$ since the null value is represented here by $V_a$.

By a distributed information system we mean a pair $DS = (\{S_i\}_{i \in I}, L)$, where

* $I$ is a set of sites;
* $S_i = (X_i, A_i, V_i)$ is an information system for any $i \in I$;
* $L$ is a symmetric, binary relation on the set $I$ showing which systems can directly communicate with each other.

A distributed information system $DS = (\{S_i\}_{i \in I}, L)$ is consistent if the following condition holds:

$$(\forall i)(\forall j)(\forall x \in X_i \cap X_j)(\forall a \in A_i \cap A_j)\ [(a_{[S_i]}(x) \subseteq a_{[S_j]}(x)) \text{ or } (a_{[S_j]}(x) \subseteq a_{[S_i]}(x))].$$
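The consistency condition can be checked pairwise between sites. The following is a minimal Python sketch under stated assumptions: the dictionary encoding of an information system, the function name `is_consistent`, and the three sample sites are hypothetical and chosen only for illustration. A null value would be stored as the full attribute domain, as in the definition above.

```python
from typing import Dict, FrozenSet

# One information system: object id -> attribute name -> set of possible values.
# (Hypothetical encoding; a null value would be the whole attribute domain.)
System = Dict[str, Dict[str, FrozenSet[str]]]


def is_consistent(s_i: System, s_j: System) -> bool:
    """Check the consistency condition for one pair of sites."""
    for x in s_i.keys() & s_j.keys():                # objects stored at both sites
        for a in s_i[x].keys() & s_j[x].keys():      # attributes shared by both sites
            vi, vj = s_i[x][a], s_j[x][a]
            # One value set must be contained in the other.
            if not (vi <= vj or vj <= vi):
                return False
    return True


# Illustrative data only.
site1 = {"x1": {"a": frozenset({"a1"}), "b": frozenset({"b1", "b2"})}}
site2 = {"x1": {"a": frozenset({"a1", "a2"}), "b": frozenset({"b1"})}}
site3 = {"x1": {"a": frozenset({"a2"})}}

print(is_consistent(site1, site2))  # True: every shared value set is nested
print(is_consistent(site1, site3))  # False: {a1} and {a2} conflict for x1
```

A whole distributed system would be consistent in this sense when the check succeeds for every pair of sites.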


Consistency basically means that information about objects in one of the systems is more general than in the other. In other words, two consistent systems cannot have conflicting information about any object $x$ which is stored in both of them.

Our next step is to introduce the notion of a knowledgebase containing definitions of attribute values in terms of other attribute values. These definitions have the form of association rules with only one decision value. Each information system $S_i$, $i \in I$, is coupled with a knowledgebase containing definitions of some foreign and some local attribute values for $S_i$. These definitions are used by a query answering system to replace global queries by local queries, so the range of queries the system can successfully handle increases significantly.

Let $S_j = (X_j, A_j, V_j)$ be an information system at site $j$, for any $j \in I$. In the remainder of this paper we assume that $V_j = \bigcup \{V_{ja} : a \in A_j\}$. From now on, in this section, we use $A$ to denote the set of all attributes in $DS$, $A = \bigcup \{A_j : j \in I\}$. Also, by $V$ we mean $\bigcup \{V_j : j \in I\}$.

We begin with a definition of $s(i)$-terms and their standard interpretation $M_i$ in $DS = (\{S_j\}_{j \in I}, L)$, where $S_j = (X_j, A_j, V_j)$ and $V_j = \bigcup \{V_{ja} : a \in A_j\}$, for any $j \in I$.

By a set of $s(i)$-terms (also called a set of local queries for site $i$) we mean a least set $T_i$ such that

* $0, 1 \in T_i$;
* $w \in T_i$ and $\sim w \in T_i$ for any $w \in V_i$;
* if $t_1, t_2 \in T_i$ then $(t_1 + t_2), (t_1 * t_2) \in T_i$.

The negation is used only at the level of atomic terms in the definition of $s(i)$-terms, which is due to the incompleteness of information systems. If we allowed queries (terms) to be built without this restriction, then the negation would have to be moved to the level of atomic terms by the query answering system before queries are processed. In order to do that, De Morgan's laws have to be used, but they do not hold for incomplete information systems for most of the semantics, including the optimistic semantics which is used in this paper as the standard one. Also, it has to be noticed that attributes in the definition of $S_i$, $i \in I$, are not hierarchical. The definition of formulas, given below, is reduced to atomic formulas since, in this paper, only such formulas need to be investigated.

By a set of $s(i)$-formulas we mean a least set $F_i$ such that

* if $t_1, t_2 \in T_i$, then $(t_1 = t_2) \in F_i$ and $(t_1 \leq t_2) \in F_i$.

Definition of $DS$-terms (called global queries) and $DS$-formulas is quite similar to the definition of $s(i)$-terms and $s(i)$-formulas (we only replace $T_i$ by $\bigcup \{T_i : i \in I\}$ and $F_i$ by $F$).

We say that

* $s(i)$-term $t$ is primitive if it is of the form $\prod \{t : ((t = w) \vee (t = \sim w)) \wedge (w \in U_i)\}$ for any $U_i \subseteq V_i$;
* $s(i)$-term is in disjunctive normal form (DNF) if $t = \sum \{t_j : j \in J\}$ where each $t_j$ is primitive.

Similar definitions can be given for $DS$-terms.

There is no need to give an example of a local query. The expression

    select * from Flights
    where airline = 'LOT'
    and departure_time = 'morning'
    and departure_airport = 'Warsaw'
    and aircraft = 'Boeing'

is an example of a non-local query (because of attribute aircraft) in a database Flights(airline, departure_time, arrival_time, departure_airport, arrival_airport).

Semantics of $s(i)$-terms and $s(i)$-formulas (local queries for the site $i$) can be defined as the interpretation $M_i$, called standard, in a distributed information system $DS = (\{S_j\}_{j \in I}, L)$ as follows:

* $M_i(0) = \emptyset$, $M_i(1) = X_i$;
* $M_i(w) = \{x \in X_i : w \in a(x)\}$ for any $w \in V_{ia}$;
* $M_i(\sim w) = \{x \in X_i : w \notin a(x)\}$ for any $w \in V_{ia}$;
* if $t_1, t_2$ are $s(i)$-terms, then
  $M_i(t_1 + t_2) = M_i(t_1) \cup M_i(t_2)$,
  $M_i(t_1 * t_2) = M_i(t_1) \cap M_i(t_2)$,
  $M_i(t_1 = t_2) = (\text{if } M_i(t_1) = M_i(t_2) \text{ then } T \text{ else } F)$,
  $M_i(t_1 \leq t_2) = (\text{if } M_i(t_1) \subseteq M_i(t_2) \text{ then } T \text{ else } F)$,

where $T$ stands for True and $F$ for False.
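The standard interpretation can be sketched in Python as a small recursive evaluator. This is an illustrative sketch rather than the authors' implementation: the table `S_i`, the `DOMAIN` map, the tuple encoding of terms, and all function names are assumptions made for the example. A null value is stored as the full attribute domain, as described earlier in this section.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical attribute domains (disjoint, as the definition requires).
DOMAIN: Dict[str, Set[str]] = {"a": {"a1", "a2"}, "b": {"b1", "b2"}}

# Hypothetical local system S_i: object -> attribute -> set of possible values.
S_i: Dict[str, Dict[str, Set[str]]] = {
    "x1": {"a": {"a1"}, "b": DOMAIN["b"]},   # b is null
    "x2": {"a": {"a2"}, "b": {"b1"}},
    "x3": {"a": DOMAIN["a"], "b": {"b2"}},   # a is null
}

# Terms: "0", "1", a value such as "a1", ("not", w), ("+", t1, t2), ("*", t1, t2).
Term = Union[str, Tuple]


def attribute_of(w: str) -> str:
    """Return the attribute whose domain contains value w."""
    return next(a for a, dom in DOMAIN.items() if w in dom)


def M(t: Term) -> Set[str]:
    """Standard (optimistic) interpretation of an s(i)-term."""
    if t == "0":
        return set()
    if t == "1":
        return set(S_i)
    if isinstance(t, str):                       # atomic value w
        a = attribute_of(t)
        return {x for x, row in S_i.items() if t in row[a]}
    if t[0] == "not":                            # negated atomic value ~w
        a = attribute_of(t[1])
        return {x for x, row in S_i.items() if t[1] not in row[a]}
    left, right = M(t[1]), M(t[2])
    return left | right if t[0] == "+" else left & right


def possibly_included(t1: Term, t2: Term) -> bool:
    """Formula (t1 <= t2) under the standard interpretation."""
    return M(t1) <= M(t2)


print(sorted(M("a1")))                  # ['x1', 'x3']
print(sorted(M(("not", "a1"))))         # ['x2']
print(sorted(M(("*", "a1", "b1"))))     # ['x1']
print(possibly_included(("*", "a1", "b1"), "a1"))  # True
```

Objects with a null value are returned for every value of that attribute, which is why the interpretation is called optimistic.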

Clearly, $M_i$ is an optimistic interpretation of $s(i)$-terms, which means that the predicates $\{=, \leq\}$ should be read as possibly equal and possibly included, correspondingly. If the set of $s(i)$-formulas is reduced only to equality formulas, the sound and complete axiomatization of the above semantics for a complete distributed information system was given by Raś [5]. This semantics was extended to handle global queries in complete distributed information systems (see [5]). For incomplete distributed information systems, a different interpretation of $s(i)$-terms and $s(i)$-formulas was proposed by Raś and Joshi [6]. Its sound and complete axiomatization was also given by Raś and Joshi [6].

We can propose a number of different interpretations for $s(i)$-terms, and the same for $s(i)$-formulas, when information systems are incomplete. For example, let us assume that information system $S_i$ is represented by Table 1.

To propose an interpretation of $s(i)$-terms, we have to decide first how to interpret the attribute values in $S_i$. For instance, let us take the value $a_1 \in V_a$. Three possible interpretations $J_1, J_2, J_3$ of $a_1$ are given below:

* $J_1(a_1) = \{(x_1, 1), (x_6, 1), (x_2, 1/3), (x_3, 1/3)\}$;
* $J_2(a_1) = \{(x_1, 1), (x_6, 1)\}$;
* $J_3(a_1) = \{(x_1, 1), (x_6, 1), (x_2, 1/2), (x_3, 1/2)\}$.

The confidence 1/3 assigned to objects $x_2, x_3$ in the interpretation $J_1$ is justified by the fact that 1/3 of the objects in $S_i$ for which the value of the attribute $a$ is known have property $a_1$.

Interpretation $J_2$ is a pessimistic one, so it assumes that the missing values in the column of $S_i$ corresponding to attribute $a$ are not equal to $a_1$.

Table 1
Information system S_i (a dash marks a missing value)

X_i   a    b    c    d    e    f
x1    a1   -    c2   d1   e1   f1
x2    -    b1   c1   d1   e1   f1
x3    -    b1   c2   d2   e1   f1
x4    a2   b1   -    -    -    f2
x5    a2   b2   -    d2   -    f2
x6    a1   -    c1   d1   e1   f2
x7    a2   b2   c2   d2   -    f2
x8    a2   b2   c2   -    e1   f1

Interpretation $J_3$ checks the values of the objects from $J_2(a_1)$ in the column of $S_i$ corresponding to attribute $f$. Since 1/2 of the objects in $J_2(a_1)$ have the property $f_1$, which is a property of both objects $x_2, x_3$, the confidence 1/2 is assigned to $x_2, x_3$.

After the interpretation of values of attributes in $S_i$ is given, we have to decide what the meaning of the functors $+, *, \sim$ is in $S_i$. Again, we can propose a number of interpretations for $+, *, \sim$. For instance, the interpretation $J_1$, defined already for atomic terms, can be extended as follows (we give only an example):

* $J_1(a_1 * b_1) = J_1(a_1) \otimes J_1(b_1) = \{(x_1, 1), (x_2, 1/3), (x_3, 1/3), (x_6, 1)\} \otimes \{(x_1, 1/2), (x_2, 1), (x_3, 1), (x_4, 1), (x_6, 1/2)\} = \{(x_1, 1 \cdot 1/2), (x_2, 1/3 \cdot 1), (x_3, 1/3 \cdot 1), (x_6, 1 \cdot 1/2)\} = \{(x_1, 1/2), (x_2, 1/3), (x_3, 1/3), (x_6, 1/2)\}$.
* $J_1(a_1 + b_1) = J_1(a_1) \oplus J_1(b_1) = \{(x_1, 1), (x_2, 1/3), (x_3, 1/3), (x_6, 1)\} \oplus \{(x_1, 1/2), (x_2, 1), (x_3, 1), (x_4, 1), (x_6, 1/2)\} = \{(x_1, \max(1, 1/2)), (x_2, \max(1/3, 1)), (x_3, \max(1/3, 1)), (x_6, \max(1, 1/2))\} = \{(x_1, 1), (x_2, 1), (x_3, 1), (x_6, 1)\}$.
* $J_1(\sim a_1) = \{(x_2, 2/3), (x_3, 2/3), (x_4, 1), (x_5, 1), (x_7, 1), (x_8, 1)\}$.
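The weights of interpretation $J_1$ for atomic terms and for a product can be reproduced mechanically from Table 1. The Python sketch below is illustrative only: the function names `j1` and `otimes` and the encoding of missing values as `None` are assumptions, not part of the paper. Exact fractions are used so the results can be compared directly with the values above.

```python
from fractions import Fraction
from typing import Dict, Optional

# Table 1 from the text; None marks a missing value.
COLUMNS = ("a", "b", "c", "d", "e", "f")
ROWS = {
    "x1": ("a1", None, "c2", "d1", "e1", "f1"),
    "x2": (None, "b1", "c1", "d1", "e1", "f1"),
    "x3": (None, "b1", "c2", "d2", "e1", "f1"),
    "x4": ("a2", "b1", None, None, None, "f2"),
    "x5": ("a2", "b2", None, "d2", None, "f2"),
    "x6": ("a1", None, "c1", "d1", "e1", "f2"),
    "x7": ("a2", "b2", "c2", "d2", None, "f2"),
    "x8": ("a2", "b2", "c2", None, "e1", "f1"),
}
TABLE: Dict[str, Dict[str, Optional[str]]] = {
    x: dict(zip(COLUMNS, values)) for x, values in ROWS.items()
}

Weighted = Dict[str, Fraction]


def j1(value: str, attr: str) -> Weighted:
    """Weight 1 where the value is stored; where the attribute is missing,
    the relative frequency of the value among the known entries."""
    known = [row[attr] for row in TABLE.values() if row[attr] is not None]
    frequency = Fraction(known.count(value), len(known))
    weights: Weighted = {}
    for x, row in TABLE.items():
        if row[attr] == value:
            weights[x] = Fraction(1)
        elif row[attr] is None:
            weights[x] = frequency
    return weights


def otimes(p: Weighted, q: Weighted) -> Weighted:
    """Product of weights; an object absent from either side has weight 0
    and is left out of the result."""
    return {x: p[x] * q[x] for x in p if x in q}


a1 = j1("a1", "a")
b1 = j1("b1", "b")
print(a1)              # x1 and x6 get 1, x2 and x3 get 1/3
print(otimes(a1, b1))  # x1: 1/2, x2: 1/3, x3: 1/3, x6: 1/2
```

Object $x_4$ appears in $J_1(b_1)$ but not in $J_1(a_1)$, so it drops out of the product, matching the first bullet above.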

Clearly, this example shows that $s(i)$-terms may have many different interpretations at site $i \in I$. In a distributed scenario, if we want to talk about knowledge exchange between sites, we have to build a bridge between them so that information about what semantics they use and what the relationship between these semantics is can be stored. In this paper, we suggest that task ontologies can be used to build such a bridge between sites of a distributed information system.

[Fig. 1. Sample model of DAKS. Two sites are shown, each with an information system, a knowledge base, a query answering system (client and server) and a list of servers, linked through shared ontologies. Site 1 has attributes A, B, C, D and a knowledge base with rules a1*b2 --> f2, a2*d1 --> f1, d2 --> f2, each with confidence 1. Site 2 has attributes A, B, E, G. User: find objects at Site 1 satisfying q = (a1*c2) + (f2*g1). Server at Site 2: extracts rules describing G in terms of A, B ([a2 --> g1]) and sends the result back to the requesting site.]

Now, we introduce the notion of $(k, i)$-rules, for any $i \in I$. The proposed definition is general enough to cover rules extracted from an information system at one of the sites of $DS$ and then used by its other sites either to build their knowledgebases or just as definitions to handle global queries submitted to them.

By a $(k, i)$-rule in $DS = (\{S_j\}_{j \in I}, L)$, $k, i \in I$, we mean a triple $(c, t, s)$ such that

* $c \in V_k - V_i$;
* $t, s$ are $s(k)$-terms in DNF and they both belong to $T_k \cap T_i$;
* $M_k(t) \subseteq M_k(c) \subseteq M_k(t + s)$.
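The third condition of this definition is a pair of set inclusions and is easy to test once the interpretations of $c$, $t$ and $s$ are known. The sketch below is a hedged illustration: the system `S_k`, the helper `M_k` for atomic values, and the function `is_rule` are hypothetical names, and the data are invented for the example.

```python
from typing import Dict, Set

# Hypothetical system S_k: object -> attribute -> set of possible values.
S_k: Dict[str, Dict[str, Set[str]]] = {
    "x1": {"a": {"a1"}, "b": {"b2"}, "f": {"f2"}},
    "x2": {"a": {"a1"}, "b": {"b1"}, "f": {"f2"}},
    "x3": {"a": {"a2"}, "b": {"b1"}, "f": {"f1"}},
}


def M_k(attr: str, value: str) -> Set[str]:
    """Objects of S_k that may have the given attribute value."""
    return {x for x, row in S_k.items() if value in row[attr]}


def is_rule(c: Set[str], t: Set[str], s: Set[str]) -> bool:
    """Check M_k(t) <= M_k(c) <= M_k(t + s) on already evaluated terms."""
    return t <= c <= (t | s)


c = M_k("f", "f2")                     # decision value f2
t = M_k("a", "a1") & M_k("b", "b2")    # term a1*b2
s = M_k("a", "a1") & M_k("b", "b1")    # term a1*b1

print(is_rule(c, t, s))                    # True
print(is_rule(c, s | M_k("a", "a2"), t))   # False: x3 has a2 but not f2
```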

Any $(k, i)$-rule $(c, t, s)$ in $DS$ can be seen as a definition of $c$ which is extracted from $S_k$ but can be used in $S_i$ because all attribute values occurring in the definition of $c$ are local for $S_i$.

For any $(k, i)$-rule $(c, t, s)$ in $DS = (\{S_j\}_{j \in I}, L)$, we say that

* $(t \to c)$ is a weakly $k$-certain rule in $DS$;
* $(t + s \to c)$ is a weakly $k$-possible rule in $DS$.

The term weakly is used here because of the interpretation $M_i$, which assigns to any term $t$ the set of objects which might have property $t$. Clearly, as the system $S_i$ becomes less incomplete, the term weakly becomes less suitable. For a complete system, the rule $(t \to c)$ is a $k$-certain rule.

Now, we are ready to define a knowledgebase $D_{ki}$. Its elements are called definitions of values of attributes from $V_k - V_i$ in terms of values of attributes from $V_k \cap V_i$. Namely, $D_{ki}$ is defined as a set of $(k, i)$-rules such that if $(c, t, s) \in D_{ki}$, then $(\exists t_1)[(\sim c, t_1, s) \in D_{ki}]$. The idea here is to have a definition of $\sim c$ in $D_{ki}$ if a definition of $c$ is already in $D_{ki}$. It will allow us to approximate (learn) concept $c$ from both sites, if needed.

By a knowledgebase for site $i$, denoted by $D_i$, we mean any subset of $\bigcup \{D_{ki} : (k, i) \in L\}$. If definitions are not extracted at a remote site, partial definitions $(c, t)$ corresponding to $(t \to c)$, if available, can be stored in a knowledgebase at a client site.

For simplicity, we did not talk here about possible differences in semantics between sites $k, i \in I$. If these semantics differ, then rules in the knowledgebase $D_{ki}$ have to be modified using a semantical graph, in a task ontology for $DS$, which shows all relationships between semantics used in $DS$.

3. Semantic inconsistencies and distributed autonomous knowledge systems

In this section, we introduce the notion of a distributed autonomous knowledge system (DAKS) and next present problems related to the construction of its query answering system QAS. We discuss the process of handling semantic inconsistencies in knowledge extracted at different sites of DAKS. Next, we outline the transformation steps for global queries so that some of their atomic terms can be replaced by subterms extracted at different sites of DAKS by knowledge discovery techniques. The goal of the query transformation process is to replace a global query by a local one for a site specified by a user. This way, the global query can be processed locally. Also, we show how global queries can be processed if two sites involved in a query resolution use different granularity levels for the same attribute.

By DAKS we mean $DS = (\{(S_i, D_i)\}_{i \in I}, L)$ where $(\{S_i\}_{i \in I}, L)$ is a distributed information system and $D_i = \bigcup \{D_{ki} : (k, i) \in L\}$ is a knowledgebase for $i \in I$.

Fig. 1 shows an example of DAKS and its query answering system QAS that handles global queries.
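The core step of the transformation, replacing non-local attribute values by the condition parts of their definitions, can be sketched as a substitution over a term tree. The sketch uses the query and rules shown in Fig. 1; the tuple encoding of terms and the functions `substitute` and `show` are assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, Tuple, Union

# Terms: an attribute value such as "a1", or ("+", t1, t2) / ("*", t1, t2).
Term = Union[str, Tuple]


def substitute(term: Term, definitions: Dict[str, Term]) -> Term:
    """Replace every atomic value that has a definition by that definition."""
    if isinstance(term, str):
        return definitions.get(term, term)
    op, left, right = term
    return (op, substitute(left, definitions), substitute(right, definitions))


def show(term: Term) -> str:
    """Fully parenthesised rendering of a term."""
    if isinstance(term, str):
        return term
    op, left, right = term
    return f"({show(left)} {op} {show(right)})"


# Global query from Fig. 1: q = (a1*c2) + (f2*g1)
q = ("+", ("*", "a1", "c2"), ("*", "f2", "g1"))

# Partial definitions built from the rules in Fig. 1:
# a1*b2 --> f2 and d2 --> f2 at Site 1, a2 --> g1 extracted at Site 2.
partial_definitions = {
    "f2": ("+", ("*", "a1", "b2"), "d2"),
    "g1": "a2",
}

local_q = substitute(q, partial_definitions)
print(show(local_q))
# ((a1 * c2) + (((a1 * b2) + d2) * a2))
```

The resulting term mentions only values of attributes A, B, C, D, so it is local for Site 1. The sketch ignores the question of which semantics each definition was extracted under, which is the subject of the rest of this section.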


For simplicity, only two sites of DAKS are considered in our example. A user queries site 1 asking for all its objects satisfying $q = (a_1 * c_2) + (f_2 * g_1)$. The user is interested only in objects at site 1. The first part of the query, which is $(a_1 * c_2)$, can be handled by the local QAS because both its attribute values are within the domain of the information system at site 1. For instance, taking the optimistic interpretation of all attribute values at site 1 (null values are treated as supporting values), objects $x_1$ and $x_2$ will satisfy query $(a_1 * c_2)$.

Let us consider the second part of the query $q$, which is $(f_2 * g_1)$. Attribute value $f_2$ is not in the domain $V_1$ of the information system $S_1$ at site 1, but its definition is in knowledgebase $D_1$ at site 1. This definition can be used to replace the value $f_2$ in $q$ by a term which defines $f_2$. In our example, either a partial definition $(f_2, a_1 * b_2 + d_2)$ or a definition $(f_2, a_1 * b_2 + d_2, \sim(a_1 * b_2 + d_2) * \sim d_2)$ of $f_2$ can be generated from $D_1$. The attribute value $g_1$ used in query $q$ is neither in the domain $V_1$ nor is its definition in $D_1$. In this case we have to search for a remote site where a definition of $g_1$ can be extracted from its information system. In our example, site 2 satisfies this requirement. Expression $(g_1, a_2)$ can be seen as a partial definition of $g_1$ which can be extracted from $S_2$. Alternatively, expression $(g_1, a_2, \sim a_1 * \sim a_2)$ can be used as a definition of $g_1$ found at site 2.

To simplify the problem further, assume that partial definitions of $f_2$ and $g_1$ are used to replace query $q$ by a new query which can be handled locally by QAS at site 1. This new query approximating $q$, described by the term $(a_1 * c_2) + (a_1 * b_2 + d_2) * a_2$, can be seen as a lower approximation of $q$ in rough sets terminology. If we use the definition of $f_2$ and the definition of $g_1$ instead of partial definitions, query $q$ can be replaced by a rough query. Rough queries are especially useful when the boundary area in a rough query representation is small. In a distributed scenario similar to the one presented in this paper, this boundary area gets smaller and smaller as more and more sites are used to search for definitions of non-local attributes ($f_2$ and $g_1$ in our example).

Now, let us go back to the term $(a_1 * c_2) + (a_1 * b_2 + d_2) * a_2$. If the distribution law can be applied, then our term would be transformed to $(a_1 * c_2) + (a_1 * b_2 * a_2 + d_2 * a_2)$ and next, assuming that properties $a_1 * a_2 = 0$ and $0 * t = 0$ hold, we would get its final equivalent form, which is $a_1 * c_2 + d_2 * a_2$. However, if each of our three terms $a_1 * b_2$, $d_2$, $a_2$ is computed under three different semantics, then there is a problem with the above transformation process and with the final meaning of $(a_1 * b_2 * a_2 + d_2 * a_2)$.

For instance, let us assume the scenario where partial definition $(f_2, a_1 * b_2)$ was extracted under semantics $M_1$, partial definition $(f_2, d_2)$ under semantics $M_2$, and $(g_1, a_2)$ under semantics $M_3$. A null value is interpreted below as the set of all possible values for a given attribute. Also, $x$ has the same meaning as the pair $(x, 1)$.

Semantics $M_1$ (see [6]) is defined as

* $M_1(v) = \{(x, k) : v \in a(x) \ \&\ k = \mathrm{card}(a(x))\}$ if $v \in V_a$;
* $M_1(t_1 * t_2) = M_1(t_1) \otimes M_1(t_2)$;
* $M_1(t_1 + t_2) = M_1(t_1) \oplus M_1(t_2)$.

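To define $\otimes$, $\oplus$, let us assume that $P_i = \{(x, p_{\langle x,i \rangle}) : p_{\langle x,i \rangle} \in [0, 1] \ \&\ x \in X\}$ where $X$ is a set of objects. Then, we have

* $P_i \otimes P_j = \{(x, p_{\langle x,i \rangle} \cdot p_{\langle x,j \rangle}) : x \in X\}$;
* $P_i \oplus P_j = \{(x, \max(p_{\langle x,i \rangle}, p_{\langle x,j \rangle})) : x \in X\}$.

Semantics $M_2$ is defined as

* $M_2(v) = \{x : v \in a(x) \ \&\ \mathrm{card}(a(x)) = 1\}$ if $v \in V_a$;
* $M_2(t_1 * t_2) = M_2(t_1) \cap M_2(t_2)$;
* $M_2(t_1 + t_2) = M_2(t_1) \cup M_2(t_2)$.

Finally, semantics $M_3$ is defined as

* $M_3(v) = \{x : v \in a(x)\}$ if $v \in V_a$;
* $M_3(t_1 * t_2) = M_3(t_1) \cap M_3(t_2)$;
* $M_3(t_1 + t_2) = M_3(t_1) \cup M_3(t_2)$.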

Assume now the following relationship between semantics $M_i$ and $M_j$:

$$M_i \preceq M_j \ \text{ iff } \ [(x, k) \in M_i(a) \to (\exists k_1 \geq k)[(x, k_1) \in M_j(a)]].$$

It can be easily proved that $\preceq$ is a partial order relation. In our example, for any $v \in V_a$, we have $M_2(v) \preceq M_1(v) \preceq M_3(v)$.

[Fig. 2. Query and rough query processing by DAKS. Flowchart with the elements: User Query; Client Application; Local Query (Yes/No); Search local Knowledgebase; Rule found (Yes/No); Remote Server application; Generate Rules; Send rules to remote client; Remote client saves rules in a local knowledgebase; Substitute non-local attribute values in a query by condition parts of applicable rules; Local Server Application; Server queries local DB; Results sent to the user.]

We say that functors $+$ and $*$ preserve the monotonicity property for semantics $N_1$, $N_2$ if the following two conditions hold:

* $[N_1(t_1) \preceq N_2(t_1) \ \&\ N_1(t_2) \preceq N_2(t_2)]$ implies $N_1(t_1 + t_2) \preceq N_2(t_1 + t_2)$;
* $[N_1(t_1) \preceq N_2(t_1) \ \&\ N_1(t_2) \preceq N_2(t_2)]$ implies $N_1(t_1 * t_2) \preceq N_2(t_1 * t_2)$.
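The ordering of the three semantics on an atomic value can be checked on a small example. The Python sketch below is illustrative and rests on two labeled assumptions: the column `a` is hypothetical data, and the weight in $M_1$ is taken to be 1/card(a(x)) so that weights stay in [0, 1] as the definitions of $\otimes$ and $\oplus$ require (the formula for $M_1$ as printed reads $k = \mathrm{card}(a(x))$). The function names are also invented for the example.

```python
from fractions import Fraction
from typing import Dict, Set

# Hypothetical column of attribute a: object -> set of possible values.
# A null value is the whole domain {a1, a2, a3}.
a: Dict[str, Set[str]] = {
    "x1": {"a1"},
    "x2": {"a1", "a2"},
    "x3": {"a1", "a2", "a3"},   # null
    "x4": {"a2"},
}

Weighted = Dict[str, Fraction]


def m1(v: str) -> Weighted:
    # ASSUMPTION: weight 1/card(a(x)), see the lead-in above.
    return {x: Fraction(1, len(vals)) for x, vals in a.items() if v in vals}


def m2(v: str) -> Weighted:
    # Only objects whose value is known exactly; x stands for the pair (x, 1).
    return {x: Fraction(1) for x, vals in a.items() if v in vals and len(vals) == 1}


def m3(v: str) -> Weighted:
    # Every object that may have the value; x stands for the pair (x, 1).
    return {x: Fraction(1) for x, vals in a.items() if v in vals}


def preceq(p: Weighted, q: Weighted) -> bool:
    """(x, k) in p implies there is (x, k1) in q with k1 >= k."""
    return all(x in q and q[x] >= k for x, k in p.items())


print(preceq(m2("a1"), m1("a1")), preceq(m1("a1"), m3("a1")))  # True True
print(preceq(m3("a1"), m1("a1")))                              # False
```

Under this assumption the check agrees with the chain stated for the example, with $M_2$ the most pessimistic and $M_3$ the most optimistic of the three.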

Let $(\Omega, \preceq)$ be a partially ordered set of semantics. We say that it preserves the monotonicity property for $+$ and $*$ if $+$ and $*$ preserve the monotonicity property for any $N_1$, $N_2$ in $\Omega$. It can be easily checked that $(\{M_1, M_2, M_3\}, \preceq)$ preserves the monotonicity property for $+$ and $*$.

We adopt, in this paper, the definition of ontology proposed by Mizoguchi [7]. He claims that an ontology [9–12,18] should consist of a task ontology, which characterizes the computational architecture of a (distributed) knowledge system which performs a task, and a domain ontology, which characterizes the domain knowledge where the task is performed.

To answer a query at a client site, many remote sites may have to be accessed. At each of these sites, the same attributes can be used for knowledge (definitions) extraction. Semantics assigned to them may differ at all or some of these sites. In a query transformation process new terms are formed to replace the current subterms. These new terms are constructed from the above definitions. Since they are not necessarily extracted from the same site of DAKS, such a query transformation process is not legal unless a common, possibly optimal, semantics for all these terms is found.

We claim that one way to solve this problem is to assume that a partially ordered set of semantics $(\Omega, \preceq)$, preserving the monotonicity property for $+$ and $*$, is a part of the global ontology (or task ontology in Mizoguchi's [7] definition).

In our example, we evaluate query $q$ taking first $M_2$ and next $M_3$ as two common semantics for all subterms obtained during the transformation process of $q$ because $M_2(v) \preceq M_1(v) \preceq M_3(v)$. In general, if $(\Omega, \preceq)$ is a lattice and $\{M_i\}_{i \in I}$ is a set of semantics involved in transforming query $q$, then we should take $M_{min} = \bigcap \{M_i\}_{i \in I}$ (the greatest lower bound) and $M_{max} = \bigcup \{M_i\}_{i \in I}$ (the least upper bound) as two common semantics for processing query $q$.
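Evaluating one query under two common semantics yields a pair of answers, one contained in the other. The following Python sketch illustrates this with $M_2$ and $M_3$ as defined above; the table `S`, the `DOMAIN` map, the tuple encoding of terms and the function `evaluate` are hypothetical and serve only as an illustration.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical attribute domains and local table; a null is the whole domain.
DOMAIN: Dict[str, Set[str]] = {"a": {"a1", "a2"}, "c": {"c1", "c2"}}
S: Dict[str, Dict[str, Set[str]]] = {
    "x1": {"a": {"a1"}, "c": DOMAIN["c"]},   # c is null
    "x2": {"a": DOMAIN["a"], "c": {"c2"}},   # a is null
    "x3": {"a": {"a1"}, "c": {"c2"}},
    "x4": {"a": {"a2"}, "c": {"c2"}},
}

# Terms: a value such as "a1", or ("+", t1, t2) / ("*", t1, t2).
Term = Union[str, Tuple]


def evaluate(term: Term, certain_only: bool) -> Set[str]:
    """M2 when certain_only is True (value known exactly), M3 otherwise."""
    if isinstance(term, str):
        attr = next(a for a, dom in DOMAIN.items() if term in dom)
        return {
            x for x, row in S.items()
            if term in row[attr] and (len(row[attr]) == 1 or not certain_only)
        }
    op, left, right = term
    l, r = evaluate(left, certain_only), evaluate(right, certain_only)
    return l | r if op == "+" else l & r


q = ("*", "a1", "c2")
lower = evaluate(q, certain_only=True)    # answer under M2
upper = evaluate(q, certain_only=False)   # answer under M3

print(sorted(lower))        # ['x3']
print(sorted(upper))        # ['x1', 'x2', 'x3']
print(lower <= upper)       # True
```

Objects in `upper` but not in `lower` form the boundary area of the answer in the rough sets sense.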

Common semantics is also needed because ofthe necessity to prove soundness and possiblycompleteness of axioms to be used in a querytransformation process.Fig. 2 gives a flowchart of a query transforma-

tion process in QAS assuming that local querysemantics at all contacted sites is the same andDAKS is consistent (granularity levels of the sameattributes at remote sites and the client site are thesame). This flowchart will be replaced by twosimilar flowcharts (corresponding to Mmin andMmax) if semantics at contacted remote sites and aclient site differ. Semantics Mmin and Mmax can beseen jointly as a rough semantics represented in avery general way by Fig. 3.Saying another words, an optimal, non-rough

semantics (if it only exists) suitable for the wholequery transformation process is somewhere be-tween two semantics Mmin and Mmax:The word rough is used here in the sense of

rough sets theory (see [2]).If we increase the number of sites from which

definitions of non-local attributes are collected andthen resolve inconsistencies among them (see [4]),

Page 8: Ontology-based distributed autonomous knowledge systems

ARTICLE IN PRESS

QRAS for Site i

Rough transformationengine based on logicalaxioms and operationalsemantics Ni and Ji

Global querysubmittedto Site i

local query at Site i(upper approximation)

local query at Site i(lower approximation)

Fig. 3. Query rough answering system (QRAS).

Z.W. Ra!s, A. Dardzi !nska / Information Systems 29 (2004) 47–5854

the local confidence in the resulting definitions is expected to be higher, since they represent a consensus of more sites. At the same time, if the number of remote sites involved in a query transformation process is increased, the number of different semantics may increase as well, which in turn may also increase the roughness of the answer to the query.

Assume now that one of the users needs to retrieve objects from the system at Site 1 satisfying a global query $q$, and Site 2 of DAKS is contacted to find a definition of an attribute value $w$ listed in $q$. Attributes in the overlap between the systems at Site 1 and Site 2 have different granularity levels. Let us say that attributes $a, b$ from the overlap are used in the definition of $w$. Assume that the values of attribute $a$ at Site 1 are more general than its values at Site 2. Knowing the semantics relationship between these two granularity levels, we can replace the values of $a$ at Site 2 by values from Site 1 to learn $w$. Definitions of $w$ extracted at Site 2 can then be directly applied at Site 1 to clarify the meaning of $q$ for Site 1. Assume now that the values of attribute $b$ at Site 1 are more specific than at Site 2. Again, knowing the semantics relationship between these two granularity levels, we learn definitions of values of $b$ at Site 2 which are generalizations of the values requested by Site 1. This way, a global query submitted to Site 1 is replaced by a more general query which is local, so it can be handled locally at Site 1.

We claim that the semantics relationships between different granularity levels of the same attribute, due to its presence at a minimum of two sites of DAKS, should be represented by a hierarchical weighted graph as a part of a global (task) ontology for DAKS. The need for weighted graphs instead of hierarchical attributes is justified by the fact that some sites in DAKS may not agree on the same relationships between attribute values of multiple granularity.
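The replacement of finer values by more general ones can be illustrated with a minimal sketch. The value names (`a11`, `a12`, `a21`, `a1`, `a2`), the edges and the weights are invented for illustration, and picking the highest-weight parent is only one possible way to use the weights; the text does not prescribe how they are assigned or combined.

```python
# Hypothetical weighted hierarchy for attribute a: each specific (Site 2) value
# points to candidate general (Site 1) values. The weights are invented and
# stand for the degree to which sites agree on that parent.
HIERARCHY_A = {
    "a11": [("a1", 1.0)],
    "a12": [("a1", 0.8), ("a2", 0.2)],
    "a21": [("a2", 0.6), ("a1", 0.4)],
}


def generalize(value, hierarchy):
    """Return the highest-weight parent of value, or value itself if it has none."""
    parents = hierarchy.get(value, [])
    if not parents:
        return value
    best_parent, _weight = max(parents, key=lambda pair: pair[1])
    return best_parent


def generalize_column(values, hierarchy):
    """Rewrite a column of Site 2 values at the Site 1 granularity level."""
    return [generalize(v, hierarchy) for v in values]


site2_column = ["a11", "a12", "a21", "a12"]
print(generalize_column(site2_column, HIERARCHY_A))  # ['a1', 'a1', 'a2', 'a1']
```

Once the column is rewritten this way, definitions of $w$ learned at Site 2 are expressed in the vocabulary of Site 1. A site that disagrees on a relationship would simply carry different weights on the same edges.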

4. Query processing based on reducts

In this section we recall the notion of a reduct (see [2]) and show how the partially ordered set of semantics $(\Omega, \preceq)$ can be used to improve the query answering process in DAKS as introduced in [8].

Let us assume that $S = (X, A, V)$ is an information system and $V = \bigcup \{V_a : a \in A\}$. Let $B \subseteq A$. We say that $x, y \in X$ are indiscernible by $B$, denoted $[x \approx_B y]$, if $(\forall a \in B)[a(x) = a(y)]$.

Now, assume that both $B_1, B_2$ are subsets of $A$. We say that $B_1$ depends on $B_2$ if $\approx_{B_2} \subseteq \approx_{B_1}$. Also, we say that $B_1$ is a covering of $B_2$ if $B_2$ depends on $B_1$ and $B_1$ is minimal.

By a reduct of $A$ in $S$ (for simplicity we say $A$-reduct of $S$) we mean any covering of $A$.
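These definitions can be checked mechanically on a small table. The sketch below is a brute-force illustration, not the authors' algorithm: it uses the eight-object table of system $S_2$ that appears in Fig. 4, and the helper names `table`, `depends_on` and `coverings` are assumptions introduced here.

```python
from itertools import combinations


def table(columns):
    """Build a list of row dictionaries from column-wise value strings."""
    names = list(columns)
    return [dict(zip(names, vals))
            for vals in zip(*(columns[n].split() for n in names))]


# System S2 as shown in Fig. 4 (objects x1..x8, listed column by column).
S2 = table({
    "a": "a1 a1 a1 a1 a2 a2 a3 a3",
    "b": "b1 b2 b1 b2 b2 b2 b1 b1",
    "c": "c1 c2 c1 c2 c1 c1 c3 c3",
    "d": "d1 d1 d1 d1 d1 d1 d2 d2",
    "f": "f1 f2 f1 f2 f3 f3 f4 f4",
})


def depends_on(rows, target, attrs):
    """True if objects indiscernible by attrs are also indiscernible by target."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        val = tuple(row[t] for t in target)
        if seen.setdefault(key, val) != val:
            return False
    return True


def coverings(rows, target, candidates):
    """All minimal subsets B of candidates such that target depends on B."""
    found = []
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            if any(set(c) <= set(subset) for c in found):
                continue  # contains a smaller covering, so it is not minimal
            if depends_on(rows, target, subset):
                found.append(subset)
    return found


print(coverings(S2, ["f"], ["a", "b", "c", "d"]))
# [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

The result agrees with the three coverings of $f$ listed in Fig. 4. The search is exponential in the number of candidate attributes, which is acceptable only for tables of this size.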

Example. Assume the following scenario:

* $S_1 = (X_1, \{c, d, e, g\}, V_1)$, $S_2 = (X_2, \{a, b, c, d, f\}, V_2)$, $S_3 = (X_3, \{b, e, g, h\}, V_3)$ are information systems,

* A user submits a query $q = q(c, e, f)$ to the query answering system QAS associated with system $S_1$,

* Systems $S_1, S_2, S_3$ are parts of DAKS.

Attribute $f$ is non-local for system $S_1$, so the query answering system associated with $S_1$ has to contact other sites of DAKS, requesting a definition of $f$ in terms of $\{d, c, e, g\}$. Such a request is denoted by $\langle f : d, c, e, g \rangle$. Assume that system $S_2$ is contacted. The definition of $f$, extracted from $S_2$, involves only attributes $\{d, c, e, g\} \cap \{a, b, c, d, f\} = \{c, d\}$. There are three $f$-reducts (coverings of $f$) in $S_2$: $\{a, b\}$, $\{a, c\}$, $\{b, c\}$. The optimal $f$-reduct is the one which has the minimal number of elements outside $\{c, d\}$. Let us assume that $\{b, c\}$ is chosen as an optimal $f$-reduct in $S_2$.

Then, the definition of $f$ in terms of attributes $\{b, c\}$ will be extracted from $S_2$, and the query answering system of $S_2$ will contact other sites of DAKS requesting a definition of $b$ (which is non-local for $S_1$) in terms of attributes $\{d, c, e, g\}$. If a definition of $b$ is found, it is sent to the QAS of Site 1. Fig. 4 shows the process of resolving query $q$ in the example above. Since the set $\{e\}$ is one of the coverings of $b$ in the system contacted by $S_2$, the search for a definition of $f$ is completed with the help of two systems.

Site 1 (QAS of $S_1$, objects X1): query q = q(c, e, f); c, e are local attributes, f is non-local.

Step 1: request < f : d, c, e, g > is sent to Site 2 (system S2, objects X2).

X2   a    b    c    d    f
x1   a1   b1   c1   d1   f1
x2   a1   b2   c2   d1   f2
x3   a1   b1   c1   d1   f1
x4   a1   b2   c2   d1   f2
x5   a2   b2   c1   d1   f3
x6   a2   b2   c1   d1   f3
x7   a3   b1   c3   d2   f4
x8   a3   b1   c3   d2   f4

Coverings of f: {a, b}, {a, c}, {b, c}. Covering {b, c} is chosen as the optimal one.

Step 2: request < b : d, c, e, g > is sent to system S3. Rules extracted at Site 2: b1*c1 → f1, b2*c2 → f2, b2*c1 → f3, c3 → f4.

     b    e    g    h
y1   b1   e1   g1   h1
y2   b1   e1   g2   h2
y3   b1   e1   g1   h1
y4   b1   e1   g2   h2
y5   b2   e2   g2   h1
y6   b2   e2   g2   h1
y7   b2   e3   g3   h3
y8   b2   e3   g3   h3

Coverings of b: {e}, {g, h}. Covering {e} is chosen as the optimal one.

Step 3: rules e1 → b1, e2 → b2, e3 → b2.

Fig. 4. Process of resolving a query by QAS in DAKS.

Now, let us go back to the process of resolving the global query $q = q(c, e, f)$ shown in Fig. 4. Clearly, the resulting query depends only on attributes $c$ and $e$, so it can be processed by the local QAS for Site 1. If the semantics of queries differ at our three involved sites, then the previously presented transformation steps for query $q$ are not legal, because different semantics cannot be mixed. One way to handle this problem is to construct a rough semantics common for all three sites.

To consider the most general scenario, assume now that $\{S_1, S_2, \ldots, S_{k_1}\}$ is a group of systems in DAKS which are contacted by $S$ in order to help $S$ to understand a global query $q$. Also, let us assume that $\{S_1, S_2, \ldots, S_{k_1}\}$ is one of the possible groups of systems which can be contacted by $S$ in the process of solving $q$. Now, each system $S_i$ $(1 \le i \le k_1)$ may still have to contact recursively a new group of systems $S_{(i,1)}, S_{(i,2)}, \ldots, S_{(i,n_i)}$ in DAKS to look for the best definitions of subterms of $q$ which are not understood by $S$. This process will continue until all systems contacted in the last recursive calls are able to provide optimal definitions of values of attributes in terms of attributes local for $S$.

Fig. 5 shows the tree of recursive calls constructed when solving query $q$.

[Tree diagram: the root S has children S1, ..., Si, ..., Sk1; node Si has children S(i,1), S(i,2), ..., S(i,ni); these in turn have children S(i,1,1), S(i,1,2), ..., S(i,ni,1), S(i,ni,2), ...]

Fig. 5. Recursive calls tree driven by query q submitted to S.

Clearly, each node of this tree represents an autonomous information system with its own interpretation of queries. So, definitions of values of attributes have the same interpretations as those assigned to the systems from which these definitions have been extracted. Rules discovered from the system $S_{(i,k,j)}$ have the same interpretation as the one assigned to queries for $S_{(i,k,j)}$. To solve a query $q$ submitted to $S$ (from Fig. 5), definitions received from all nodes of the tree have to be used, which means that the problem of mixing their corresponding local semantics requires our attention.

Assume that query answering systems linked with sites of DAKS use only semantics from $\Omega$, where $(\Omega, \preceq)$ is a partially ordered set of local semantics introduced in the previous section. Now, any global query submitted to $S$ may generate many trees similar to the one in Fig. 5. Clearly, the best tree for $q$ will be among trees of smallest height and with nodes having local semantics assigned to them which are maximally close in $(\Omega, \preceq)$ to the local semantics assigned to $S$. If $\{M_i\}_{i \in I}$ is the set of all local semantics assigned to nodes of the tree $T$ used to process query $q$, then we may process the query by extracting definitions of global attributes at level $i$ of tree $T$ and next using them at level $i - 1$ of $T$. This process has to be done recursively until the root of $T$ is reached. During this recursive procedure we do not have to deal with inconsistencies between local semantics, since we already know that the final meaning of the resulting query will be between the boundaries $M_{\min} = \bigcap \{M_i\}_{i \in I}$ and $M_{\max} = \bigcup \{M_i\}_{i \in I}$. Clearly, to have these two boundaries uniquely defined we need to assume that $(\Omega, \preceq)$ is a lattice.

These two boundaries are generated by the query rough answering system (QRAS) for DAKS based on $(\Omega, \preceq)$. QRAS is represented by Fig. 6, where M(min), M(max) are the two boundary semantics generated for the global query $q$ submitted to $S$. Clearly, M(min) and M(max) stand for $M_{\min}$ and $M_{\max}$.

[Diagram boxes: Global query submitted to S; Rough transformation engine based on logical axioms, ontologies and semantics M(min), M(max); local query at site of S (semantics M(max)); local query at site of S (semantics M(min)); QRAS for the site of S.]

Fig. 6. Query rough answering system (QRAS).

Let us assume that $(\Omega, \preceq)$ presented in Fig. 7 is a part of a global (task) ontology for DAKS and that query $q = q(a)$ is submitted to the query answering system of $S$, which has local semantics $M_8$. Attribute $a$ listed in $q$ is not local for $S$. If we do not introduce the notion of a distance between nodes in $(\Omega, \preceq)$ reflecting definitions of $M_i$, $i \in I$, then $M_2$, $M_3$, and $M_9$ are the closest local semantics to $M_8$. So, all sites with local semantics $M_8$ assigned to their queries have the first preference to be asked for help by $S$ in solving $q$. The next preference should be given to all sites with local semantics $M_2$, $M_3$, and $M_9$ assigned to them.
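The preference order described above can be sketched as a breadth-first walk over the diagram of the partially ordered set, counting edges as unit distances. The edge list below is hypothetical: only the adjacency of $M_2$, $M_3$ and $M_9$ to $M_8$ follows from the text, and the remaining edges are invented to make the example non-trivial.

```python
from collections import deque

# Hypothetical fragment of the diagram of the poset of local semantics.
# Only the three edges touching M8 are implied by the text; the rest are invented.
EDGES = [
    ("M2", "M8"), ("M3", "M8"), ("M8", "M9"),
    ("M1", "M2"), ("M1", "M3"), ("M9", "M12"),
]


def distances_from(start, edges):
    """Edge-count distance from start to every reachable node (edges undirected)."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def preference_order(start, edges):
    """Group semantics into rings of equal distance, closest ring first."""
    rings = {}
    for node, d in distances_from(start, edges).items():
        rings.setdefault(d, []).append(node)
    return [sorted(rings[d]) for d in sorted(rings)]


print(preference_order("M8", EDGES))
# [['M8'], ['M2', 'M3', 'M9'], ['M1', 'M12']]
```

Sites whose semantics fall in the first ring are asked first, then those in the second ring, and so on. A weighted notion of distance, which the text mentions as an option, would replace the unit edge cost used here.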


[Diagram of the partially ordered set with nodes M1, M2, ..., M13.]

Fig. 7. Sample task ontology $(\Omega, \preceq)$.


Assuming that systems $S_i(A_i)$, $i \in I$, can be asked by $S(A)$ for help in solving $q(a)$, the second parameter worth considering here is the number of attributes in an $a$-reduct of $S_i(A_i)$ which are outside the overlap $A \cap A_i$. The most preferable sites to be asked for help are those for which this number is the smallest. This way we may minimize the number of sites involved in processing query $q(a)$, which in turn may give us a much smaller number of semantics used to process the initial query. Clearly, we assume here that this strategy is applied recursively until no attributes are outside the corresponding overlap.

The query answering system for handling global queries in $S$ can be implemented as a heuristic algorithm with a preference to contact first all sites in DAKS with local semantics which are maximally close in $(\Omega, \preceq)$ to the semantics assigned to $S$. This strategy seems to be the most reasonable because, assuming that the number of sites in DAKS is rather large, we should not have any problem identifying a sufficient number of sites worth contacting. Also, by getting help only from sites with local semantics which are either equal or close to the semantics assigned to $S$, the roughness of the resulting local semantics of global queries at $S$ is guaranteed to be rather small.
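A minimal sketch of such a heuristic follows, under stated assumptions. Ordering sites first by semantic distance and then by the number of reduct attributes outside the overlap is one possible reading of the two preferences. The site records, the `distance` field and the sites `S4` and `S5` are hypothetical; only the attribute sets of $S_1$ and $S_2$ and the coverings of $f$ come from the example of Fig. 4.

```python
def outside_overlap(reduct, local_attrs, remote_attrs):
    """Number of reduct attributes that lie outside the overlap of the two sites."""
    overlap = set(local_attrs) & set(remote_attrs)
    return len(set(reduct) - overlap)


def best_reduct(reducts, local_attrs, remote_attrs):
    """Reduct with the fewest attributes outside the overlap (first one wins ties)."""
    return min(reducts, key=lambda r: outside_overlap(r, local_attrs, remote_attrs))


def rank_sites(local_attrs, sites):
    """Order candidate sites by (semantic distance, attributes outside the overlap)."""
    def key(site):
        reduct = best_reduct(site["reducts"], local_attrs, site["attrs"])
        return (site["distance"], outside_overlap(reduct, local_attrs, site["attrs"]))
    return [site["name"] for site in sorted(sites, key=key)]


A1 = {"c", "d", "e", "g"}                          # attributes of S1 (from the example)
S2_ATTRS = {"a", "b", "c", "d", "f"}               # attributes of S2 (from the example)
F_REDUCTS = [("a", "b"), ("a", "c"), ("b", "c")]   # coverings of f listed in Fig. 4

print([outside_overlap(r, A1, S2_ATTRS) for r in F_REDUCTS])  # [2, 1, 1]

sites = [
    {"name": "S2", "attrs": S2_ATTRS, "reducts": F_REDUCTS, "distance": 1},
    # The two sites below are hypothetical and exist only to exercise the ranking.
    {"name": "S4", "attrs": {"c", "e", "f"}, "reducts": [("c", "e")], "distance": 1},
    {"name": "S5", "attrs": {"f", "k", "m"}, "reducts": [("k",)], "distance": 0},
]
print(rank_sites(A1, sites))  # ['S5', 'S4', 'S2']
```

In the example, $\{a, c\}$ and $\{b, c\}$ tie with one attribute outside the overlap $\{c, d\}$; the text simply assumes that $\{b, c\}$ is chosen, whereas `best_reduct` returns the first of the tied reducts.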

5. Conclusion

Clearly, the easiest way to solve the semantics inconsistencies problem is to apply the same local semantics at all remote sites. However, when databases are incomplete and we replace their null values using rule-based chase algorithms based on locally extracted rules, we are already committed to the semantics used by these algorithms. If we do not keep track of which null values have been replaced by rule-based chase algorithms and how, there is no way back for us. Also, it seems rather unrealistic that one local semantics for incomplete databases can be chosen as a standard. We claim that in such cases a partially ordered set of local semantics $(\Omega, \preceq)$ or an equivalent structure should be a part of task ontologies to provide the new meta-data information needed to solve the semantics inconsistencies problem.

References

[1] Z. Ras, A. Dardzinska, Handling semantics inconsistencies in query answering based on distributed knowledge mining, in: Foundations of Intelligent Systems, Proceedings of ISMIS’02 Symposium, Lecture Notes in Computer Science/Lecture Notes in Artificial Intelligence, Vol. 2366, Springer, Berlin, 2002, pp. 66–74.

[2] Z. Pawlak, Rough classification, Int. J. Man–Machine Studies 20 (1984) 469–483.

[3] D. Maluf, G. Wiederhold, Abstraction of representation for interoperation, in: Foundations of Intelligent Systems, Proceedings of the 10th International Symposium, Lecture Notes in Computer Science/Lecture Notes in Artificial Intelligence, Vol. 1325, Springer, Berlin, 1997, pp. 441–455.

[4] Z. Ras, Dictionaries in a distributed knowledge-based system, in: Concurrent Engineering: Research and Applications, Conference Proceedings, Pittsburgh, PA, Concurrent Technologies Corporation, 1994, pp. 383–390.

[5] Z. Ras, Resolving queries through cooperation in multi-agent systems, in: T.Y. Lin, N. Cercone (Eds.), Rough Sets and Data Mining, Kluwer Academic Publishers, Dordrecht, 1997, pp. 239–258.

[6] Z. Ras, S. Joshi, Query approximate answering system for an incomplete DKBS, Fund. Inform. 30 (3/4) (1997) 313–324.

[7] R. Mizoguchi, Ontological engineering: foundation of the next generation knowledge processing, in: Proceedings of Web Intelligence: Research and Development, Lecture Notes in Computer Science/Lecture Notes in Artificial Intelligence, Vol. 2198, Springer, Berlin, 2001, pp. 44–57.

[8] Z. Ras, Query answering based on distributed knowledge mining, in: N. Zhong, J. Lin, S. Ohsuga, J. Bradshaw (Eds.), Intelligent Agent Technology, Research and Development, Proceedings of IAT’01, World Scientific, Singapore, 2001, pp. 17–27.

[9] T. Andreasen, J.F. Nilsson, H.E. Thomsen, Ontology-based querying, in: Proceedings of the Flexible Query Answering Systems, Advances in Soft Computing, Physica-Verlag, Wurzburg, 2000, pp. 15–26.

[10] J. Cardoso, A. Sheth, Semantic e-workflow composition, Int. J. Intelligent Inform. Systems, Vol. 20, No. 3, Kluwer, 2003.

[11] R.A. Meersman, Semantic ontology tools in IS design, in: Foundations of Intelligent Systems, Proceedings of 11th International Symposium, Lecture Notes in Computer Science/Lecture Notes in Artificial Intelligence, Vol. 1609, Springer, Berlin, 1999, pp. 30–45.

[12] R.A. Meersman, Ontologies and databases: more than a fleeting resemblance, in: Proceedings of OES/SEO Workshop, Rome, Italy, September 14–15, 2001.

[13] S. Navathe, M. Donahoo, Towards intelligent integration of heterogeneous information sources, in: Proceedings of the Sixth International Workshop on Database Re-engineering and Interoperability, 1995.

[14] A.L. Prodromidis, S. Stolfo, Mining databases with different schemas: integrating incompatible classifiers, in: Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, AAAI Press, Menlo Park, CA, 1998, pp. 314–318.

[15] Z. Ras, J. Żytkow, Mining for attribute definitions in a distributed two-layered DB system, J. Intelligent Inform. Systems 14 (2/3) (2000) 115–130.

[16] Z. Ras, J. Żytkow, Discovery of equations to augment the shared operational semantics in distributed autonomous BD System, in: PAKDD’99 Proceedings, Lecture Notes in Computer Science/Lecture Notes in Artificial Intelligence, Vol. 1574, Springer, Berlin, 1999, pp. 453–463.

[17] V.S. Subrahmanian, Heterogeneous agent systems, in: Foundations of Intelligent Systems, Proceedings of 11th International Symposium, Lecture Notes in Computer Science/Lecture Notes in Artificial Intelligence, Vol. 1609, Springer, Berlin, 1999, pp. 46–55.

[18] K. Sumi, R. Mizoguchi, Consensus formation support system via ontology generation, in: Proceedings of the PRICAI’02 International Workshop on Intelligent Media Technology for Communicative Reality, 2002.