
Ontologies for Geographic Information Processing

U. VISSER, H. STUCKENSCHMIDT, G. SCHUSTER & T. VÖGELE

TZI - Center for Computing Technologies, University of Bremen, Universitätsallee 21-23,

D-28359 Bremen, Germany

[email protected]

Received (12/01/1999; Revised 03/20/2000)

Abstract. The development towards open geographical information systems (GIS) and the interoperability between these systems place new requirements on the description of the underlying data. The exchange of data between GIS is problematic and often fails due to confusion about the meaning of concepts. Semantic translators, i.e. translators between GIS and/or catalogue systems that give the user the option to map data between the systems, are a current research topic. This paper gives an overview of formal ontologies and how they can be used for geographical information integration. It introduces an intelligent architecture for semantic-based information retrieval and shows in a case study how this approach can be used for general purposes.

Keywords: Ontology, Semantic Translation, Intelligent Information Integration, Information Retrieval,

Geo-processing

1. Introduction

Information processing in geographical applications is a complex task. The following subsections give a brief overview of the problem domain and of recent activities of various organizations.

1.1. The Problem Domain

Solving a problem in an environmental domain (e. g. where does the high sulfate concentration in the river come from?) involves various data from different areas (e. g. data regarding the river, adjacent waste dumps, ground water flow etc.). Frequently, not all of the data is available from one database; it is distributed and comes in different formats (e. g. single data, time series, spatial data with different resolutions).

This requires thorough data preparation before the actual analysis can be carried out. Recent studies in areas such as data warehousing (Wiener et al., 1996), information integration (Galhardas et al., 1998; Bergamashi et al., 1999) and interoperability between GIS (Vckovski et al., 1999) have addressed these problems.

The problem domain is complex, not completely

understood and dynamic. Inventing and develop-

ing methods for environmental information sys-

tems is a challenging task for both computer and

geo-scientists.

How, then, do we integrate the intelligent methods and algorithms that computer scientists offer, and how do we merge these with the knowledge of environmental experts?

1.2. Spatio-Temporal Information

Spatio-temporal information can be described as

a window at a certain place and time. This

window gives insight about the data but is only

one window in the space-time continuum. The OpenGIS™ Consortium (OGC) defines place as a measurable piece of the real world. Time is a point, an interval or a collection of points and intervals in what we perceive as the time continuum. Time and place can be measured and surveyed, and their coordinates in a particular spatial, temporal reference system can be derived. The consortium uses the term location to cover both place and time (OGC, 1999b). Günther (1998) describes properties of environmental data as follows (extract):

1. Complex: spatial data objects often have a

complex structure. An object could be repre-

sented by a single point or thousands of poly-

gons.

2. Dynamic: spatial data are dynamic. Inser-

tions and deletions interact with updates.

3. Large: spatial data tend to be large (in

terms of the amount of data, e. g. geographical

maps).

4. No standard algebra defined: basically this means that there is no standard set of operators.

1.3. Geographic Information Systems

Geographical information systems are essential tools for analysing and visualizing spatio-temporal information. Originally developed for the creation of thematic maps, GIS support data capture (e. g. digitizing), data storage (DBMS, spatial DBMS), and data analysis (e. g. combination of spatial and non-spatial data).

Recently, the OGC has formulated new requirements for GIS. Its objective is the full integration of geo-spatial data and geo-processing resources into mainstream computing. Open and interoperable geo-processing, i.e. the ability to share heterogeneous geo-data and geo-processing resources transparently in a networked environment, is the main aim of this organization. Interoperability of GIS can be approached in two ways. Firstly, the developers of GIS have to come together and define de facto standards. The OGC's abstract specification models are a first (and big!) step in this direction. Secondly, semantic translators (see section 1.4.2) can be developed that define the meaning of concepts. For example, the concept forest in the ATKIS catalogue (AdV, 1998) has different semantics than the concept forest in the CORINE land cover catalogue (EEA, 1997-1999). We will discuss this example later in this paper.

Figure 1 gives an overview of the OGC's view of the future of GIS. The idea is to define 'simple features' and to compose these into a customized GIS.

Fig. 1. Development of GIS. [Figure labels: GIS I, GIS II, GIS III; GIS as autonomous systems on various platforms; GIS uses components; simple features as basic functions of various GIS as components; towards a fully componentized GIS.]

1.4. Current Activities

As mentioned previously, the activities related to interoperability between GIS fall into two main streams. Firstly, the OGC is working on the standardization of components and has published its first frozen abstract specification (OGC, 1999a). Secondly, the activities around semantic translators are worth mentioning. The OGC's Topic 14, Semantics and Information Communities, has not yet been considered in depth; however, a core task force will be working on this topic in the future.

1.4.1. Standardization: The OGC is working

on an evolution from traditional GIS solutions,

in which proprietary data models and monolithic

software functions are made interoperable and ex-

tensible. Applications which adhere to the ob-

jectives of Open GIS are free to access and use

various types of distributed data, and to utilize

multiple geo-processing tools and services. A formal specification, the Open Geodata Interoperability Specification (OGIS), is currently under development and will define the types and methods necessary to build interoperable systems. At the 2nd International Conference on Interoperating Geographic Information Systems (Vckovski et al., 1999) almost 50% of the contributions were related to the OpenGIS ideas. We can assume that the OGC and its ideas and visions will remain relevant in the near future. The relation between the IT industry with its standards approach and the GIS interoperability approach was one of the topics at the above-mentioned conference (Berre, 1999). The discussion covered general problems and solutions for syntactic and semantic interoperability in the context of IT standards such as ISO RMODP, ISO CSMF, CORBA/EJB/COM+, UML and XML, as well as the European DISGIS Esprit IV project (DISGIS-Project, 1999), which deals with practical experiences regarding the use of ISO/TC211 and OpenGIS interoperability approaches.

In summary, standardization is one way to overcome interoperability deficiencies between GIS. However, it is hard to convince vendors to support standardization, and it is a time-consuming task.

1.4.2. Semantic Translation: In order to achieve interoperability between GIS, the data from one system have to be found and integrated into another system. However, this integration process is not always supported by the user's system. Therefore, tools are required to achieve interoperability of data sources. Problems that might arise due to heterogeneity of the data are already well-known within the distributed database systems community (e. g. Kim and Seo, 1991; Saltor and Rodriguez, 1997). In general, these problems can be divided into three categories:

• syntax (e. g. data format heterogeneity),
• structure (e. g. homonyms, synonyms or different attributes in database tables), and
• semantics (e. g. intended meaning of terms in a special context or application).

Syntactic approaches can help to overcome problems that belong to the first two categories. Unfortunately, they do not adequately support the reconciliation of semantic heterogeneity. As environmental science has an interdisciplinary character, environmental information often faces semantic problems. They arise, among other things, from the use of different terminologies established for certain purposes. This type of semantic heterogeneity is already a problem for human experts communicating with each other. It becomes even more challenging when attempting to integrate these terminologies automatically. Approaches to semantic integration must address two main problems:

• attaching semantics to information sources and entities, and
• drawing conclusions from the semantic annotations available.

Early approaches for semantic integration were mainly based on the use of thesauri to translate between specific vocabularies. Fulton (1996) defined the term semantic plug and play, an architecture in which the relationships among the data are managed through the models that define that data and the operations performed on it. More recently, semantic enrichment of information has been identified as a promising application area for well-known AI techniques and methods (Fensel, 1999). In the course of this research, more powerful concepts for semantic annotation have been developed. Semantic annotation facilities should become more widely applicable with the further development of the Resource Description Framework RDF (W3C 1999c webref.), which is intended to become a standard annotation language for semantic information on the World Wide Web.

With semantic translation, data translation can go beyond the traditional mapping and conversion of geometric primitives. When we use the term 'semantic' w.r.t. geographical data we are referring to the meaning of a concept (e. g. the concept forest in a geographical sense). This is quite different from the term 'semantic' as it is used in programming languages, where semantics determines the exact function of a language.

Commercial and Non-Commercial Tools: Cur-

rently, there are a few commercial and non-

commercial systems on the market that make use

of semantic translation. An example: The Feature

Manipulation Engine (FME) (FME, 1999), origi-

nally developed for the Canadian Government, is

'emerging as a de facto standard in the industry

for sharing geospatial data between diverse appli-

cations' (Michael Cosentino, Geospatial Market

Development Manager, Sun Systems Inc.). Un-

derlying the engine is a rich data model which

is internally consistent and inherently extensible.

Constructs within the models of the input, or out-

put formats, or systems are mapped to constructs

in the engine's model. The engine provides a se-

ries of methods to carry out model-to-model trans-

formations, applicable to data either on input or

output. Cosentino argues that this functionality

ensures that neither the data provider nor the data

consumer feels constrained; they can use their respective systems any way they wish. FME provides a translation tool through which sophisticated spatial translation operations between various standard GIS data formats can be performed.

FME is the core of a number of applications, such

as the Geo-Task Server (Huber, 1998).

Other Activities: The German Federal/States working group 'environmental information systems' (BLAK UIS) stated at its workshop this year (Bock et al., 1999) that semantic interoperability is required for open environmental systems. It is anticipated that the authorities will perform further work on this topic. One idea for overcoming the deficiencies of exchanging or comparing data between GIS and/or between catalogues is to use ontologies. The advantage of ontologies is the existence of formal semantics. This allows us to define ontologies for concepts (such as forest) in different catalogue systems and to define axioms for the 'translation' between those ontologies.

2. Ontologies and their Application

The term 'Ontology' has been used in many ways and across different communities (Guarino and Giaretta, 1995). If we want to motivate the use of ontologies for geographic information processing we have to make clear what we have in mind when we refer to ontologies. In doing so, we mainly follow the description given in (Uschold and Gruninger, 1996). In the following sections we will introduce ontologies as an explication of some shared vocabulary or conceptualization of a specific subject matter. We will briefly describe the way an ontology explicates concepts and their properties and argue for the benefit of this explication in different typical application scenarios.

2.1. Shared Vocabularies and Conceptualizations

In general, each person has an individual view of

the world. However, there is a common basis of

understanding in terms of the language we use to

communicate with each other. Terms from nat-

ural language can therefore be assumed to be a

shared vocabulary relying on a (mostly) common

understanding of certain concepts with only little

variety. This common understanding relies on the

idea of how the world is organized. We often call

this idea a 'conceptualization' of the world. Such

conceptualizations provide a terminology that can

be used for communication.

The example of natural language already shows that a conceptualization is never universally valid, but only for a limited number of persons committing to that conceptualization. This fact is reflected in the existence of different languages which differ more (English and Japanese) or less (German and Dutch). Things get even worse when we are dealing not with everyday language but with terminologies developed for special scientific or economic areas. In these cases we often find situations where the same term refers to different phenomena. The use of the term 'ontology' in philosophy and its use in computer science may serve as an example. The consequence is a separation into different groups that share a terminology and its conceptualization. These groups are also called information communities.

The main problem with the use of a shared terminology according to a specific conceptualization of the world is that much information remains implicit. When a mathematician talks about the binomial $\binom{n}{k}$ he has much more in mind than just the formula itself. He will also think about its interpretation (the number of subsets of a certain size) and its potential uses (e. g. estimating the chance of winning in a lottery). Ontologies have set out to overcome the problem of implicit and hidden knowledge by making the conceptualization of a domain (e. g. mathematics) explicit. This corresponds to a popular definition of the term ontology in computer science (Gruber, 1993):

"An ontology is an explicit specification of a conceptualization."
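The lottery reading of the binomial mentioned above can be made explicit in a few lines. The following is only an illustrative sketch: the 6-out-of-49 lottery and the function name `lottery_odds` are assumptions chosen for the example and do not come from the paper.

```python
import math


def lottery_odds(n: int, k: int) -> float:
    """Chance of guessing all k numbers drawn from n (order ignored).

    Interpretation made explicit: math.comb(n, k) counts the k-element
    subsets of an n-element set, and exactly one of them wins.
    """
    return 1 / math.comb(n, k)


# Hypothetical lottery: 6 numbers drawn out of 49.
subsets = math.comb(49, 6)
print(subsets, lottery_odds(49, 6))
```

The formula alone is the "term"; the docstring carries the interpretation and intended use that normally remain implicit.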

An ontology is used to make assumptions about the meaning of a term available. It can also be seen as an explication of the context a term is normally used in. Lenat (1998), for example, describes context in terms of twelve independent dimensions that have to be known in order to completely understand a piece of knowledge and shows how these dimensions can be explicated using the 'Cyc' ontology.

2.2. Specification of Context Knowledge

There are many different ways in which an ontology may explicate a conceptualization and the corresponding context knowledge. The possibilities range from a purely informal natural language description of a term, corresponding to a glossary, up to strictly formal approaches with the expressive power of full first order predicate logic or even beyond (e. g. Ontolingua (Gruber, 1991)). Jasper and Uschold (1999) distinguish two ways in which the mechanisms for the specification of context knowledge by an ontology can be compared:

Level of Formality: The specification of a conceptualization and its implicit context knowledge can be done at different levels of formality. As already mentioned above, a glossary of terms can also be seen as an ontology despite its purely informal character. A first step to gain more formality is to prescribe a structure to be used for the description. A good example of this approach is the new standard web annotation language XML (Bray et al., 1998). XML offers the possibility to define terms and organize them in a simple hierarchy according to the expected structure of the web document. The organization of the terms is called a 'Document Type Definition' (DTD). This DTD is an ontology describing the terminology of a web page on a low level of formality. However, the rather informal character of XML encourages its misuse. While the hierarchy of an XML specification was originally designed to describe layout, it can also be exploited to represent sub-type hierarchies (van Harmelen and Fensel, 1999), which may lead to confusion. This problem can be solved by assigning formal semantics to the structures used for the description of the ontology. An example is the conceptual modeling language CML (Schreiber et al., 1994). CML offers primitives to describe a domain that can be given a formal semantics in terms of first order logic (Aben, 1993). However, a formalization is only available for the structural part of a specification. Assertions about terms and the description of dynamic knowledge are not formalized, offering total freedom for the description. At the other extreme there are also specification languages which are completely formal. A prominent example is the Knowledge Interchange Format KIF (Genesereth and Fikes, 1992), which was designed to enable different knowledge-based systems to exchange knowledge. KIF has been used as a basis for the Ontolingua language (see above), thus giving a formal semantics to that language as well.
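The confusion that an XML hierarchy without formal semantics can cause may be illustrated with a small sketch. The element names below are invented for this example and are not taken from any catalogue discussed in the paper; the code only uses Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: all tag names are made up for illustration.
doc = """
<landcover>
  <forest>
    <coniferous/>
    <clearing/>
  </forest>
</landcover>
"""

root = ET.fromstring(doc)


def child_pairs(elem):
    """Yield (parent_tag, child_tag) for every nesting in the tree."""
    for child in elem:
        yield (elem.tag, child.tag)
        yield from child_pairs(child)


pairs = list(child_pairs(root))
print(pairs)
```

The nesting records only that `coniferous` and `clearing` occur inside `forest`. Whether a given nesting means "is a kind of" (a sub-type) or "is a part of" is not stated anywhere in the document, which is the ambiguity a formal semantics would remove.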

Extent of Explication: The other comparison criterion is the extent of explication that is reached by the ontology. This criterion is strongly connected with the expressive power of the specification language used. We already mentioned DTDs, which are mainly a simple hierarchy of terms. We can generalize this by saying that the least expressive specification of an ontology consists of an organization of terms in a network using two-place relations. This idea goes back to the use of semantic networks in the seventies. Many extensions of the basic idea have been proposed. One of the most influential was the use of roles that could be filled by entities of a certain type (Brachman, 1977). This kind of value restriction can still be found in recent approaches. RDF-schema descriptions (Brickley et al., 1998), which might become a new standard for the description of web pages, are an example. An RDF-schema contains class definitions with associated properties that can be restricted by so-called constraint-properties. However, default values and value range descriptions are not expressive enough to cover all possible conceptualizations. More expressive power can be provided by allowing classes to be specified by logical formulas. These formulas can be restricted to a decidable subset of first order logic. This is the approach of so-called description logics (Borgida and Patel-Schneider, 1994). Nevertheless, there are also approaches allowing for more expressive descriptions. In Ontolingua, for example, classes can be defined by arbitrary KIF-expressions. Beyond the expressiveness of full first-order predicate logic there are also special purpose languages that have an extended expressiveness to cover specific needs of their application area. Examples are specification languages for knowledge-based systems, which often include variants of dynamic logic to describe system dynamics (compare Fensel and van Harmelen, 1994).
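The idea of roles whose fillers are restricted to a certain type can be sketched as follows. This is a minimal illustration under stated assumptions, not RDF-schema syntax: the class `ConceptDef`, the role names and the example values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptDef:
    """A concept with roles; each role restricts the type of its filler."""
    name: str
    roles: dict = field(default_factory=dict)  # role name -> required type

    def violations(self, entity: dict) -> list:
        """Return the roles whose filler is missing or has the wrong type."""
        return [role for role, required in self.roles.items()
                if not isinstance(entity.get(role), required)]


# Hypothetical concept definition and two candidate entities.
forest = ConceptDef("forest", {"dominant_species": str, "area_ha": float})
ok = {"dominant_species": "spruce", "area_ha": 12.5}
bad = {"dominant_species": "spruce", "area_ha": "large"}

print(forest.violations(ok))   # no role is violated
print(forest.violations(bad))  # the area filler has the wrong type
```

Such type checks express value restrictions only. Conditions that relate several properties to each other need the logical formulas discussed in the text.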

2.3. Benefits for Applications

Ontologies are useful for many different applications that can be classified into several areas (Jasper and Uschold, 1999). Each of these areas has different requirements on the level of formality and the extent of explication provided by the ontology. We will briefly review common application areas, namely the support of communication processes, the specification of systems and information entities, and the interoperability of computer systems.

Communication: Information communities are useful because they facilitate communication and cooperation among their members by the use of a shared terminology with a well-defined meaning. On the other hand, the formation of information communities makes communication between members of different information communities very difficult because they do not agree on a common conceptualization. They may use the shared vocabulary of natural language. However, most of the vocabulary used in their information communities is highly specialized and not shared with other communities. This situation calls for an explication and explanation of the terminology used. Informal ontologies with a large extent of explication are a good choice to overcome these problems. While definitions have always played an important role in scientific literature, conceptual models of certain domains are rather new. Nowadays, however, systems analysis and related fields such as software engineering rely on conceptual modeling to communicate structure and details of a problem domain as well as the proposed solution between domain experts and engineers. Prominent examples of ontologies used for communication are Entity-Relationship (ER) diagrams (Chen, 1976) and object-oriented modeling languages such as UML (OMG, 1999; Rumbaugh et al., 1991).

Systems Engineering: ER-diagrams and UML are not only used for communication; they also serve as construction plans for data and systems, guiding the process of developing the system. The use of ontologies for the description of information and systems has many benefits. The ontology can be used to identify requirements as well as inconsistencies in a chosen design. It can help to acquire or search for available information. Once a system component has been implemented, its specification can be used for maintenance and extension purposes. Another very challenging application of ontology-based specification is the reuse of existing software. In this case the specifying ontology serves as a basis to decide if an existing component matches the requirements of a given task.

Depending on the purpose of the specification, ontologies of different formal strength and expressiveness are to be used. While the process of communicating design decisions and the acquisition of additional information normally benefit from rather informal and expressive ontology representations (often graphical), the directed search for information needs a rather strict specification with a limited vocabulary to limit the computational effort. At the moment, the support of semi-automatic software reuse seems to be one of the most challenging applications of ontologies because it requires expressive ontologies with a high level of formal strength (see for example van Heijst et al., 1997).

Interoperability: The above considerations might give the impression that the benefits of ontologies are limited to systems analysis and design. However, an important application area of ontologies is the integration of existing systems. The ability to exchange information at runtime, also known as interoperability, is an important topic. The attempt to provide interoperability suffers from problems similar to those associated with the communication amongst different information communities. The important difference is that the actors are not persons able to perform abstraction and common sense reasoning about the meaning of terms, but machines. In order to enable machines to understand each other we also have to explicate the context of each system, but on a much higher level of formality in order to make it machine-understandable (the KIF language was originally defined for the purpose of exchanging knowledge models between different knowledge-based systems). Ontologies are often used as inter-linguas in order to provide interoperability (Uschold and Gruninger, 1996): they serve as a common format for data interchange. Each system that inter-operates with other systems has to transfer its information into this common framework. Interoperability is achieved by explicitly considering contextual knowledge in the translation process.
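The use of an ontology as an inter-lingua can be sketched as follows: every system translates into and out of one common vocabulary instead of maintaining a translator for every other system. The catalogue names, terms and codes below are hypothetical, and the plain one-to-one term tables deliberately ignore the contextual knowledge that the text identifies as essential; the sketch only shows the architectural idea.

```python
# One table per system: local term -> term of the common vocabulary.
# All names and codes are invented for illustration.
TO_COMMON = {
    "catalogue_a": {"Wald": "forest", "Gewaesser": "water"},
    "catalogue_b": {"B-1": "forest", "B-2": "water"},
}

# Reverse tables: common term -> local term.
FROM_COMMON = {
    system: {common: local for local, common in table.items()}
    for system, table in TO_COMMON.items()
}


def translate(term: str, source: str, target: str) -> str:
    """Translate a term via the common vocabulary."""
    common = TO_COMMON[source][term]
    return FROM_COMMON[target][common]


def translators_needed(n: int, via_interlingua: bool) -> int:
    """Directed translators for n systems: 2n via a hub, n(n-1) pairwise."""
    return 2 * n if via_interlingua else n * (n - 1)


print(translate("Wald", "catalogue_a", "catalogue_b"))
print(translators_needed(5, True), translators_needed(5, False))
```

With five systems the hub needs ten directed translators where pairwise translation needs twenty, and the gap widens as systems are added.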

3. Interoperability and Integration of Information Sources

The interoperability of geographical information systems is an important topic in geoinformatics research (compare Vckovski et al., 1999). Many problems have to be solved in order to provide complete interoperability between heterogeneous systems. One of the most basic problems is the integration of the information used by different systems. This information often shows significant differences in terms of representation and structuring, thus making integration a challenging task. We distinguish four levels of integration we have to cover in order to provide interoperability:

Technical integration: The World Wide Web provides a well-established infrastructure to exchange large amounts of information from all over the world. Information from web pages and web databases can be accessed in principle.

Syntactic integration: Many standards have evolved that can be used to integrate different information sources. Besides the classical database interfaces such as ODBC, web-oriented standards such as HTML and XML are gaining importance (see www.w3c.org).

Structural integration: The first problem that goes beyond the purely syntactic level is the integration of heterogeneous structures. This problem is solved by mediator systems defining mapping rules between different information structures (Chawathe et al., 1994).

Semantic integration: In the following, we use the term semantic integration or semantic translation, respectively, to denote the resolution of semantic conflicts that prevent a one-to-one mapping between concepts or terms. Throughout this paper we will focus on this problem.
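Structural integration by mapping rules, as described in the list above, might look like the following sketch. It does not reproduce the mediator system cited in the text; the rule format, attribute names and the sample record are assumptions made for illustration.

```python
# Mapping rules: target attribute -> (source attribute, conversion function).
# All attribute names are hypothetical.
RULES = {
    "parcel_id": ("id", int),
    "area_ha": ("area_m2", lambda m2: float(m2) / 10_000),
    "cover": ("type", str.lower),
}


def mediate(record: dict, rules: dict) -> dict:
    """Restructure a source record according to the mapping rules."""
    return {target: convert(record[source])
            for target, (source, convert) in rules.items()}


source_record = {"id": "17", "area_m2": "125000", "type": "FOREST"}
print(mediate(source_record, RULES))
```

The rules rename attributes and convert units, but they cannot tell whether "forest" means the same thing in both systems. That remaining gap is the semantic integration problem the paper focuses on.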

We have argued that ontologies can be useful for solving the semantic integration problem (Stuckenschmidt et al., 1999). In the following we present a general approach to semantic translation and discuss the role of ontologies in this approach.

3.1. Semantic Translation as Context Transformation

Our approach to the semantic integration problem is based on the view that each information source serves as a context for the interpretation of the information contained therein. This view implies that an information entity can only be completely understood within its source unless we find ways to preserve the contextual information in the translation process. This claim has two implications:

1. We have to represent the context of an information entity given by its source.
2. We have to use this contextual information to integrate an entity into the new context given by the target of the translation.

We have shown that contextual knowledge of an information entity can be represented by necessary and sufficient conditions for deciding whether an entity belongs to a certain class of objects. Using these conditions, the integration of an entity into a new context is equivalent to a classification that is based on its contextual knowledge (Stuckenschmidt and Visser, 2000). Details of this approach are given below.

3.1.1. Contextual Knowledge: In information sources contextual knowledge is often hidden in type information. Most information sources are based on a data model describing classes, attributes, and relations. Each entity within the information source is assigned to one of these categories. We will refer to them as 'concepts' in the sequel. Depending on the intended use of the information source each concept is assumed to serve a special function and to show special properties necessary for that function. Some of these properties will explicitly be contained in the information source while other properties remain implicit because there is a silent agreement that a property always holds. In order to support semantic translation we have to explicate these hidden assumptions by defining the necessary and sufficient conditions an information entity has to fulfill in order to belong to that concept.

Necessary Conditions: Concepts are described by a set of necessary conditions in terms of values $v_i$ for some properties $p_i$. We write $p_i^X$ to denote that the entity $X$ shows property $p_i$. We claim that there are properties that are characteristic for a concept and can therefore always be observed for instances of that class. We write $N_c = \{p_1, \dots, p_m\}$ to denote that the concept $c$ has necessary conditions $p_1, \dots, p_m$. Assuming that class and property definitions always refer to the same entity $X$ we get the following equation:

$$N_c \equiv c(X) \Rightarrow p_1^X \wedge \dots \wedge p_m^X \qquad (1)$$

Sufficient Conditions: On the other hand, we assume that an entity automatically belongs to the concept $c$ if it shows sufficient characteristic properties. We write $S_c = \{p_1, \dots, p_n\}$ to denote that $p_1, \dots, p_n$ are sufficient conditions indicating that $X$ belongs to the concept $c$. We characterize the class $c$ by the following equation:

$$S_c \equiv p_1^X \wedge \dots \wedge p_n^X \Rightarrow c(X) \tag{2}$$

The distinction between necessary and sufficient conditions for concept membership enables us to identify entities that definitely belong to a concept because they show all sufficient conditions. On the other hand, we can identify entities that clearly do not belong to the concept because they do not fulfill the necessary conditions.
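The three possible outcomes of such a membership test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: properties are modelled as plain strings, the observed property set is assumed to be complete (closed world), and the names `Concept`, `membership` and the example properties are hypothetical, not part of the approach itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    name: str
    necessary: frozenset   # N_c: properties every member must show
    sufficient: frozenset  # S_c: properties that together imply membership


def membership(concept, observed):
    """Classify an entity by its observed properties.

    Assumes `observed` lists every property of the entity (closed world).
    """
    if concept.sufficient <= observed:
        return "member"        # all sufficient conditions hold
    if not concept.necessary <= observed:
        return "non-member"    # at least one necessary condition is violated
    return "undecided"         # necessary conditions hold, sufficient ones do not


# Hypothetical concept definition, for illustration only.
forest = Concept(
    name="forest",
    necessary=frozenset({"size_over_10_ha"}),
    sufficient=frozenset({"size_over_10_ha", "has_forest_plants"}),
)

print(membership(forest, {"size_over_10_ha", "has_forest_plants"}))  # member
print(membership(forest, {"has_forest_plants"}))                     # non-member
print(membership(forest, {"size_over_10_ha"}))                       # undecided
```

Entities in the "undecided" band are exactly those for which the two kinds of conditions alone do not settle membership.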

3.1.2. Context Transformation: Concepts identify common properties of their members by defining necessary conditions for membership. A classification problem is characterized by the determination of membership relations between an object and a set of predefined concepts. The identification process starts with data about the object that has to be classified. This data is provided by so-called observations. During the classification process the observed data is matched against the necessary conditions provided by the class definitions, leading to one or more classes. The match between observations and membership conditions is performed using knowledge that associates properties of objects with their class. This view of classification can be formalized in the following way (Stefik, 1995):

• Let $C$ be a set of solution classes (in our case concept predicates $\{c_1, \dots, c_m\}$)
• Let $O$ be a set of observations (in our case the necessary conditions for concept membership $\{N_c \mid c \in C\}$)
• Let $R$ be a set of classification rules (in our case sufficient conditions for class membership $\{S_c \mid c \in C\}$)

Then in principle a classification task is to find a solution class $c_i \in C$ such that

$$O \wedge R \Rightarrow c_i(X) \tag{3}$$

In terms of the definitions given above, semantic translation is equivalent to a re-classification of entities already classified in a semantic structure $C^S = \{c_1^S, \dots, c_n^S\}$ using another semantic structure $C^T = \{c_1^T, \dots, c_m^T\}$. The process of re-classification can be based upon the semantic characterizations given by both structures. The source structure provides the observations ($O = \{N_c \mid c \in C^S\}$), while solution classes and classification rules are provided by the target structure ($C = C^T$, $R = \{S_c \mid c \in C^T\}$). Using these definitions, a single information entity can be translated from one context into the other by finding a concept definition $c_i^T$ in the target structure satisfying equation 3.
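A minimal Python sketch of this re-classification step follows. It assumes that properties are simple propositional atoms taken from a shared vocabulary, so that entailment in equation 3 reduces to a subset test. The dictionaries, the property strings and the function name `translate` are hypothetical; a real classifier would need proper subsumption reasoning.

```python
# Necessary conditions N_c for concepts of a (hypothetical) source structure C^S.
NECESSARY_SOURCE = {
    "mixed_forest": {"vegetation:forest_plants", "size:over_10_ha"},
    "urban_fabric": {"cover:buildings"},
}

# Sufficient conditions S_c for concepts of a (hypothetical) target structure C^T.
SUFFICIENT_TARGET = {
    "forest": {"vegetation:forest_plants", "size:over_10_ha"},
    "lake": {"cover:water"},
}


def translate(source_concept, necessary_source, sufficient_target):
    """Return all target concepts whose sufficient conditions follow from
    the necessary conditions of the source concept."""
    observations = necessary_source[source_concept]   # O = N_c for c in C^S
    return sorted(
        target
        for target, conditions in sufficient_target.items()
        if conditions <= observations                 # O and R entail c^T(X)
    )


print(translate("mixed_forest", NECESSARY_SOURCE, SUFFICIENT_TARGET))  # ['forest']
print(translate("urban_fabric", NECESSARY_SOURCE, SUFFICIENT_TARGET))  # []
```

An empty result means that the source annotation does not carry enough contextual knowledge to place the entity in the target structure.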

3.2. Support for the Translation Process

The considerations from the last section provide a theoretical foundation for semantic translation. However, there are still many problems that have to be solved to make this approach work in practice. The most important question is what kind of context knowledge has to be considered in the translation process and how it is represented, because the choice of the representation has a major impact on the classification method to choose and on the expected results. Ontologies can play an important role in the translation process because they are able to explicate context knowledge. In the following we analyze the roles different ontologies play in our translation approach and describe how they support the whole process of information integration.

3.2.1. The Role of Ontologies: A closer look at the semantic translation approach described above reveals that different ontologies are used for different purposes within the approach. In order to get clear notions of these different roles we adopt the distinction made in (Jasper and Uschold, 1999). They distinguish three roles an ontology can play in an application scenario, each associated with a level of application:

L0: Operational data

L1: Ontology

L2: Ontology representation language

We will see that each of these roles occurs within our framework. Each role is filled by a different kind of ontology with a different extent of explication according to the specific requirements.

Operational Information that should be translated from one information source to another corresponds to L0. We argued that the real task is to determine the concept an information entity belongs to in a new context. So we translate type annotations rather than the information entity itself. This type information is already an ontology in the sense of an explicit specification of a conceptualization, because we have to describe the concepts we want to translate. As a consequence, we are already concerned with an ontology on the level of operational data. However, this ontology does not show a large extent of explication because it consists of a set of concept terms arranged in a simple taxonomy.

Specification of Contextual Knowledge is the basis for the translation of information entities. We use necessary and sufficient conditions for concept membership to specify contextual knowledge. This kind of context explication is a typical application of an ontology. The description of necessary and sufficient conditions is therefore an ontology corresponding to level L1. It shows a larger extent of explication than the pure taxonomy of concept terms because it explicates the intended meaning of these terms. Each information source to be integrated is supposed to be specified by such an ontology to enable us to use its contextual knowledge in the translation process.

Properties of Concepts defining necessary and sufficient conditions serve as a common vocabulary used to build the ontologies of different information sources to be integrated. As such they can be seen as an ontology representation language corresponding to level L2. They have to be shared across all information sources to enable a classifier to check whether conditions are fulfilled. They explicate a common understanding of a basic vocabulary that is necessary to explain and exchange specialized vocabulary from different information sources. The extent of explication required from an ontology specifying properties largely depends on the complexity of the information to be translated and on the requirements on the efficiency of the translation. If complex information has to be translated only once, more complex property definitions may be used than in the case of simple information that has to be translated in real time.

3.2.2. Process and Supporting Technologies: In order to clarify the use of different ontologies we will now discuss the process of intelligent information integration that is implied by our approach. The process sketched below describes actors, supporting tools, and knowledge items (i. e. ontologies) involved. Notice that although the approach described above translates only between two sources at a time, it is not limited to bilateral integration, because we do not use a hard-coded translator but a general classifier that will be able to integrate every information source that has a suitable semantic annotation.

[Fig. 2. Authoring of Shared Properties (Step 1). Diagram elements: Independent Domain Expert, Modelling Tool, Shared Properties.]

Authoring of Shared Terminology: Our approach relies on the use of a shared terminology in terms of properties used to define different concepts. This shared terminology has to be general enough to be used across all information sources to be integrated but specific enough to make meaningful definitions possible. Therefore, the shared terminology will normally be built by an independent domain expert who is familiar with typical tasks and problems in a domain, but who is not concerned with a specific information source. As building a domain ontology is a challenging task, sufficient tool support has to be provided to build that ontology. Figure 2 illustrates this process.

A growing number of ontology editors exists

(Duineveld et al., 1999). The choice of a tool has

to be based on the special needs of the domain to

be modeled and the knowledge of the expert.

Annotation of Information Sources: Once a common vocabulary exists, it can be used to annotate different information sources. In this case annotation means that the inherent concept hierarchy of an information source is extracted and each concept is described by necessary and sufficient conditions using the terminology built in step one. The result of this annotation process is an ontology of the information source to be integrated. The annotation will normally be done by the owner of an information source who wants to provide better access to his information. To annotate his information, the information owner has to know the right vocabulary to use. It will therefore be beneficial to provide tool support for this step as well. We need an annotation tool with different repositories of vocabularies according to different domains of interest. Figure 3 illustrates the annotation step.

[Fig. 3. Annotation of Information Sources (Step 2). Diagram elements: Concept Hierarchy, Shared Properties, Information Owner, Annotation Tool, Information Ontology.]

[Fig. 4. Translation by Re-Classification (Step 3). Diagram elements: Source Ontology, Concept Term from Source, Classifier, Concept Term from Target, Target Ontology.]

In the case study described later we used the Ontolingua editor (www.ksl.stanford.edu) in order to build information ontologies from scratch. This is possible as long as the same group of people owns both source and target information. However, in real-life scenarios, information sources are normally completely distributed, which makes annotation support based on property repositories indispensable.

Semantic Translation of Information Entities: The only purpose of the steps described above was to lay a foundation for the actual translation step. The existence of ontologies for all information sources to be integrated enables the translator to work on these ontologies instead of treating real data. This way of using ontologies as surrogates for information sources has already been investigated in the context of information retrieval (Visser and Stuckenschmidt, 1999). In that paper we showed that the search for interesting information can be enhanced by ontologies. Concerning semantic translation, the use of ontologies as surrogates for information sources enables us to restrict the translation to the transformation of type information attached to an information entity by manipulating concept terms indicating the type of the entity.

Figure 4 illustrates this manipulation. The new concept term describing the type of an information entity in the target information source is determined automatically by a classifier that uses ontologies of source and target structures as classification knowledge. This is possible because both ontologies are based on the same basic vocabulary that has been built in the first step of the integration approach.

4. Geographic Information Sources and Ontologies, an Example

This section gives an example of the geographical information sources used and their description with Ontolingua. We use two catalogue systems, namely the German ATKIS-OK-250 (AdV, 1998) and the European CORINE land cover catalogue (EEA, 1997-1999). The vegetation ontology will be used for the definition of primitives (e. g. forest-trees, forest-plants, grass).

4.1. Ontologies for ATKIS, CORINE and Vegetation

4.1.1. The ATKIS-OK-250 Catalogue: ATKIS (Amtliches Topographisch-Kartographisches Informationssystem) is an official information system in Germany. It is a project of the head surveying offices of all the German states. The working group offers digital landscape models (e. g. DLM 250, 1:250 000) with a detailed documentation in the object catalogue OK-250. This catalogue is the basis for our description. The ontology for our concept forest consists of Classes, Functions and Instances. One class is the following:

;;; ------------------ Classes --------------
;;; Forest
(Define-Okbc-Frame Forest
  :Direct-Superclasses (Vegetation-Area)
  :Direct-Types (Class Primitive)
  :Own-Slots ((Arity 1))
  :Sentences
  ((=> (Forest ?X0)
       (Or (Has-Vegetation ?X0 Forest-Plants)
           (And (Has-Vegetation ?X0 Grass)
                (Is-Cultivated ?X0 1))))
   (=> (Forest ?X0)
       (> (Size-In-Hectares ?X0) 10)))
  :Template-Facets
  ((Size-In-Hectares (Numeric-Minimum 10))))

We see that the class Forest is a subclass of Vegetation-Area. We also see that there is an internal rule which says that a thing is a forest if it has forest-plants or cultivated grass as vegetation. In addition, the size of the area has to be at least 10 hectares.

;;; ------------------ Instance --------------
;;; Stadtwald-1990
(Define-Okbc-Frame Stadtwald-1990
  :Direct-Types (Forest)
  :Own-Slots ((Is-Cultivated 0) (Is-Cultivated)
              (Has-Vegetation Forest-Trees)
              (Size-In-Hectares 25)))

;;; Weidedamm3-1990
(Define-Okbc-Frame Weidedamm3-1990
  :Direct-Types (Forest)
  :Own-Slots ((Is-Cultivated 0) (Is-Cultivated)
              (Has-Vegetation Forest-Plants)
              (Size-In-Hectares 12)))

The instances Stadtwald-1990 and Weidedamm3-1990 show the state of a particular area in 1990. The Stadtwald is not cultivated, has forest-trees, and covers an area of 25 hectares, while the Weidedamm3 is not cultivated, has forest-plants, and measures 12 hectares.

4.1.2. CORINE land cover: From 1985 to 1990, the European Commission carried out the CORINE Programme (Co-ordination of Information on the Environment). The results are essentially of three types, corresponding to the three aims of the Programme: (a) An information system on the state of the environment in the European Community has been created (the CORINE system). It is composed of a series of databases describing the environment in the European Community, as well as of databases with background information. (b) Nomenclatures and methodologies were developed for carrying out the programme, which are now used as the reference in the areas concerned at the Community level. (c) A systematic effort was made to concert activities with all the bodies involved in the production of environmental information, especially at the international level. As a result of this activity, and indeed of the whole programme, several groups of international scientists have been working together towards agreed targets. They now share a pool of expertise on various themes of environmental information.

This nomenclature with its 44 classes is the basis for our description. In order to demonstrate the hierarchy we use a tree to describe parts of the ontology (see fig. 6):

Here we see that a forest has forest-plants and is at least 25 ha in size, because the superclass of forest is Forests-And-Semi-Natural-Areas, whose superclass in turn is Area. The minimum size in hectares of an area is 25 (see the facet in class Area). Please note that according to the CORINE nomenclature sport and leisure facilities are Artificial-Non-Agricultural-Vegetated-Areas, which themselves have sod grass as vegetation. Let us define some instances for this ontology:

;;; ------------------ Instance --------------
;;; Pauliner-Marsch
(Define-Okbc-Frame Pauliner-Marsch
  :Direct-Types (Sport-And-Leisure-Facilities)
  :Own-Slots ((Size-In-Hectares 40)
              (Size-In-Hectares)))

;;; Stadtwald-2000
(Define-Okbc-Frame Stadtwald-2000
  :Direct-Types (Mixed-Forest)
  :Own-Slots ((Size-In-Hectares 25)
              (Is-Cultivated 0)
              (Has-Vegetation Forest-Trees)))

;;; Weidedamm3-2000
(Define-Okbc-Frame Weidedamm3-2000
  :Direct-Types (Discontinuous-Urban-Fabric)
  :Own-Slots ((Size-In-Hectares 30)
              (Is-Cultivated 1)))

This is the information we would get out of a classified satellite image. There is a Stadtwald-2000 instance, which is of type mixed forest with 25 ha, not cultivated, and with forest trees as vegetation. There is also the Pauliner Marsch, a sport and leisure facility according to CORINE land cover, with 40 ha.

4.1.3. Vegetation: If we want to match or process the knowledge from the ontologies described above, we either have to define another ontology which matches the concepts, or we use a domain ontology which acts as a foundation for the two ontologies (see also figure 6). In this ontology the primitives such as plants, soil type, etc. are defined. Please note that we show a part of the ontology as a tree for better understanding (figure 5).

We define forest trees as forest plants and forest plants as plants. Likewise, sod grass is grass and grass is a plant. Special cultures such as vine or hop are also defined as plants.
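The taxonomy just described can be written down as a small lookup table, with a transitive "is a kind of" test on top of it. The following Python sketch is only an illustration: the lowercase identifiers and the assumption of single inheritance are ours, and the helper name `a_kind_of` is borrowed from the PROLOG predicate used in this paper.

```python
# Direct subclass links of the vegetation ontology fragment described in the text.
SUBCLASS_OF = {
    "forest_trees": "forest_plants",
    "forest_plants": "plants",
    "sod_grass": "grass",
    "grass": "plants",
    "special_culture": "plants",
}


def a_kind_of(concept, ancestor):
    """True if `concept` equals `ancestor` or reaches it via subclass links."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUBCLASS_OF.get(concept)  # step to the direct superclass
    return False


print(a_kind_of("forest_trees", "forest_plants"))  # True
print(a_kind_of("sod_grass", "plants"))            # True
print(a_kind_of("grass", "forest_plants"))         # False
```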

4.2. Flexible Retrieval of Geographic Information

In this section we show how the methods mentioned above can be used for flexible retrieval of geographical information. We described in section 3 how information can be retrieved in general and that ontology-based information retrieval offers benefits for this process. In order to show how this works we come back to our concept forest within the ATKIS-OK-250 and CORINE land cover catalogues.

The use of ontologies gives us two options: (a) integrated views and (b) verification. An integrated view from the user's perspective merges the data of the catalogues. This process can be seen as two layers which lie on top of each other. The view needs a third ontology with axioms for the translation process between the concepts. The second option gives users the opportunity to verify ATKIS-OK-250 data with CORINE land cover data or vice versa.

A query interface (this could be an intelligent dialogue within a GIS) sends its request to an inference engine. The inference engine builds up the actual knowledge base by using the ontologies of the concepts. The interesting part of the whole idea is that the inference engine can infer on the actual knowledge base and is therefore able to derive new knowledge which can be used for further questions.

[Fig. 5. Part of the vegetation ontology. Node labels: Plants, Grass, Sod-Grass, Pasture-..., Special-Culture, Forest-Plants, Forest-Trees.]

[Fig. 6. Part of the CORINE land cover ontology. Node labels: Area; Level 1: Artificial-Surfaces, Forests-And-Semi-Natural-Areas; Level 2: Urban-Fabric, Artificial-Non-Agricultural-Vegetated-Area, Forests; Level 3: Discontinuous-Urban-Fabric, Green-Urban-Areas, Sport-And-Leisure-Facilities, Broad-leaved Forest, Mixed-Forest. Attribute annotations: size > 24 ha, is_cultivated = yes, has_vegetation forest_plants, has_vegetation sod_grass.]

A typical problem within the planning process of authorities is the use of heterogeneous data categorized as ATKIS-OK-250 and CORINE land cover satellite pictures (see figure 7). In order to map or exchange data between these two catalogues we first look at the ontologies, which are partly described in section 4.1.1 (ATKIS-OK-250) and in figure 6 (CORINE land cover). As theorem prover we use a PROLOG system such as SWI-Prolog (Wielemaker, 1998).

[Fig. 7. Deductive Integration of Geographic Information. Diagram elements: ATKIS map (object type) and CORINE land cover satellite picture (class), each with its data structure; the ATKIS and CORINE ontologies; domain ontologies such as plants, soil type, etc. (here: vegetation); a theorem prover; the solution.]

A simple query to an inference engine could be: "What is the superclass of 'Sod grass'?" In a PROLOG system we would pose the query subclass_Of(sod_Grass, X) and would get the solution grass. We would get all solutions, and thus the complete class hierarchy, with the query subclass_Of(X, Y). More complex queries require more complex representations. We use the following axioms to describe the concept forest. There are two possibilities:

1. The size in hectares must be greater than 10 and the vegetation has to be forest plants. The concept forest plants is defined in the ontology vegetation.
2. The size in hectares must be greater than 10 and the vegetation has to be grass. The concept grass is also defined in the ontology vegetation. In addition, the area has to be cultivated.

We denoted the rules in PROLOG syntax:

% Rule 1: larger than 10 ha and covered by some kind of forest plants.
forest(X) :-
    size_In_Hectares(X, Y), Y > 10,
    has_Vegetation(X, Z),
    a_kind_of(Z, forest_Plants).

% Rule 2: larger than 10 ha, covered by some kind of grass, and cultivated.
forest(X) :-
    size_In_Hectares(X, Y), Y > 10,
    has_Vegetation(X, Z),
    a_kind_of(Z, grass),
    is_Cultivated(X, true).
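For readers less familiar with PROLOG, the two rules can also be read as an ordinary function over slot values. The Python sketch below is an illustrative transcription, not the implementation used in the case study: the slot values are taken from the CORINE instances Stadtwald-2000 and Weidedamm3-2000 defined in section 4.1.2, while the dictionary layout and the lowercase identifiers are assumptions.

```python
# Vegetation taxonomy fragment (direct subclass links).
SUBCLASS_OF = {
    "forest_trees": "forest_plants",
    "forest_plants": "plants",
    "sod_grass": "grass",
    "grass": "plants",
}

# Slot values of two CORINE instances from section 4.1.2.
INSTANCES = {
    "stadtwald_2000": {"size_in_hectares": 25,
                       "has_vegetation": "forest_trees",
                       "is_cultivated": False},
    "weidedamm3_2000": {"size_in_hectares": 30,
                        "is_cultivated": True},
}


def a_kind_of(concept, ancestor):
    """True if `concept` equals `ancestor` or reaches it via subclass links."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUBCLASS_OF.get(concept)
    return False


def forest(name):
    """Transcription of the two forest rules."""
    slots = INSTANCES[name]
    vegetation = slots.get("has_vegetation")
    if vegetation is None or slots.get("size_in_hectares", 0) <= 10:
        return False                                  # shared preconditions fail
    if a_kind_of(vegetation, "forest_plants"):
        return True                                   # rule 1
    return a_kind_of(vegetation, "grass") and bool(slots.get("is_cultivated"))  # rule 2


print(forest("stadtwald_2000"))   # True
print(forest("weidedamm3_2000"))  # False
```

Unlike the PROLOG version, this sketch only looks at slot values stored with the instance; it does not derive missing values from class-level background knowledge.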

An example for an integrated view would be the following scenario: The user wants to see the development of the forests within a certain area over the last years. He uses ATKIS-OK-250 data within his GIS and wants to verify the data with current satellite images. He gets classified CORINE land cover data and is looking for the equivalent of forest in this catalogue (see fig. 6). The theorem prover derives the answer to this question while building up the KB. The query would be forest(stadtwald_2000). The following shows the traced path through the KB. An Exit marks a return with a Yes.

Call: forest(stadtwald_2000)
Call: size_In_Hectares(stadtwald_2000, _L144)
Exit: size_In_Hectares(stadtwald_2000, 25)
Call: 25>10
Exit: 25>10
Call: has_Vegetation(stadtwald_2000, _L145)
Exit: has_Vegetation(stadtwald_2000, forest_Trees)
Call: a_kind_of(forest_Trees, forest_Plants)
Call: class(forest_Trees)
Exit: class(forest_Trees)
Call: class(forest_Plants)
Exit: class(forest_Plants)
Call: forest_Trees=forest_Plants
Fail: forest_Trees=forest_Plants
Redo: a_kind_of(forest_Trees, forest_Plants)
Call: subclass_Of(forest_Trees, forest_Plants)
Exit: subclass_Of(forest_Trees, forest_Plants)
Exit: a_kind_of(forest_Trees, forest_Plants)
Exit: forest(stadtwald_2000)

We can see that the theorem prover first checks whether the size is greater than 10. It knows (through the instance entered by the user) that stadtwald_2000 has forest trees. The system looks up forest trees and concludes that a forest tree is a forest plant. This matches the first PROLOG rule mentioned above and therefore the answer to the query is Yes. The user now checks whether the area Weidedamm III, which according to the ATKIS-OK-250 catalogue was a forest in 1990, is still a forest. He checks this with the help of current satellite images in CORINE land cover format. The query would be forest(weidedamm3_2000) and the answer can be seen here:

Call: forest(weidedamm3_2000)
Call: size_In_Hectares(weidedamm3_2000, _L144)
Exit: size_In_Hectares(weidedamm3_2000, 30)
Call: 30>10
Exit: 30>10
Call: has_Vegetation(weidedamm3_2000, _L145)
Fail: has_Vegetation(weidedamm3_2000, _L145)
Fail: forest(weidedamm3_2000)

As we see, the query fails because the slot has_Vegetation fails. This is because there is no vegetation on this area anymore; the satellite image was classified as Discontinuous-Urban-Fabric within the CORINE catalogue. The results presented are not very surprising, because most of the conditions for membership in the concept forest were directly met by the instance 'stadtwald' and the absence of vegetation in the instance 'weidedamm' is also a criterion that is easy to check. Nevertheless, we want to show that the ontological foundation enables us to perform reasoning that produces results that are not obvious and require some additional knowledge. The ATKIS-OK-250 ontology gives two possible definitions of the concept forest. We will use the second one to deduce that the so-called Pauliner Marsch is also a forest according to that ontology. The only information we have is that it is a member of the concept Sport-And-Leisure-Facilities taken from the CORINE land cover ontology. To classify this area as a 'forest' we need background knowledge about vegetation and cultivation of sport and leisure facilities. This background knowledge can also be specified using PROLOG clauses:

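% Every instance of a kind of artificial surface is cultivated.
is_Cultivated(X, true) :-
    instance_Of(X, Y),
    a_kind_of(Y, artificial_Surfaces).

% Every instance of a kind of sport and leisure facility has sod grass.
has_Vegetation(X, sod_Grass) :-
    instance_Of(X, Y),
    a_kind_of(Y, sport_And_Leisure_Facilities).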

The clauses attach characterizing properties to the concepts Artificial-Surfaces and Sport-And-Leisure-Facilities. These properties are inherited by the instances of the subconcepts, thereby completing the properties needed to classify the instance under consideration as being a member of the concept forest. The trace of the PROLOG reasoner illustrates this:

Call: forest(pauliner_Marsch)
Call: size_In_Hectares(pauliner_Marsch, _L193)
Exit: size_In_Hectares(pauliner_Marsch, 40)
Call: 40>10
Exit: 40>10
Call: has_Vegetation(pauliner_Marsch, _L194)
Call: instance_Of(pauliner_Marsch, _L206)
Exit: instance_Of(pauliner_Marsch, sport_And_Leisure_Facilities)
Call: a_kind_of(sport_And_Leisure_Facilities, sport_And_Leisure_Facilities)
...
Exit: a_kind_of(sport_And_Leisure_Facilities, sport_And_Leisure_Facilities)
Exit: has_Vegetation(pauliner_Marsch, sod_Grass)
Call: a_kind_of(sod_Grass, grass)
...
Exit: a_kind_of(sod_Grass, grass)

After comparing the size of the instance with the required size, the vegetation is checked. As there is no vegetation defined for the instance, the axiom mentioned above is used to deduce that all instances of that class have sod grass as vegetation. Just as in the examples above, the system is able to identify sod grass as being a kind of grass. The fact that the 'Pauliner Marsch' is cultivated is deduced in a similar way, as shown in the trace below.

Call: is_Cultivated(pauliner_Marsch, true)
Call: instance_Of(pauliner_Marsch, _L252)
Exit: instance_Of(pauliner_Marsch, sport_And_Leisure_Facilities)
Call: a_kind_of(sport_And_Leisure_Facilities, artificial_Surfaces)
...
Redo: a_kind_of(artificial_Non_Agricultural_Vegetated_Area, artificial_Surfaces)
Call: subclass_Of(artificial_Non_Agricultural_Vegetated_Area, _L276)
Exit: subclass_Of(artificial_Non_Agricultural_Vegetated_Area, artificial_Surfaces)
Call: a_kind_of(artificial_Surfaces, artificial_Surfaces)
...
Exit: a_kind_of(artificial_Surfaces, artificial_Surfaces)
Exit: a_kind_of(artificial_Non_Agricultural_Vegetated_Area, artificial_Surfaces)
Exit: a_kind_of(sport_And_Leisure_Facilities, artificial_Surfaces)
Exit: is_Cultivated(pauliner_Marsch, true)
Exit: forest(pauliner_Marsch)
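The whole deduction for the Pauliner Marsch can be replayed in a short Python sketch. It is a simplified illustration under stated assumptions: the subclass links are those named in the text and the traces, the identifiers are lowercased, single inheritance is assumed, and the functions are our own transcription of the PROLOG clauses rather than the system used in the case study.

```python
SUBCLASS_OF = {
    # vegetation ontology (fragment)
    "forest_trees": "forest_plants",
    "forest_plants": "plants",
    "sod_grass": "grass",
    "grass": "plants",
    # CORINE land cover ontology (fragment)
    "sport_and_leisure_facilities": "artificial_non_agricultural_vegetated_area",
    "artificial_non_agricultural_vegetated_area": "artificial_surfaces",
}
INSTANCE_OF = {"pauliner_marsch": "sport_and_leisure_facilities"}
SIZE_IN_HECTARES = {"pauliner_marsch": 40}


def a_kind_of(concept, ancestor):
    """True if `concept` equals `ancestor` or reaches it via subclass links."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUBCLASS_OF.get(concept)
    return False


def has_vegetation(entity):
    """Background rule: sport and leisure facilities have sod grass."""
    if a_kind_of(INSTANCE_OF.get(entity), "sport_and_leisure_facilities"):
        return "sod_grass"
    return None


def is_cultivated(entity):
    """Background rule: every kind of artificial surface is cultivated."""
    return a_kind_of(INSTANCE_OF.get(entity), "artificial_surfaces")


def forest(entity):
    """The two forest rules, using derived instead of stored properties."""
    if SIZE_IN_HECTARES.get(entity, 0) <= 10:
        return False
    vegetation = has_vegetation(entity)
    if vegetation is None:
        return False
    if a_kind_of(vegetation, "forest_plants"):
        return True                                              # rule 1
    return a_kind_of(vegetation, "grass") and is_cultivated(entity)  # rule 2


print(has_vegetation("pauliner_marsch"))  # sod_grass
print(is_cultivated("pauliner_marsch"))   # True
print(forest("pauliner_marsch"))          # True
```

As in the trace, neither the vegetation nor the cultivation status is stored with the instance; both are inherited from the class it belongs to.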

4.3. Results of Example

This example by no means covers all possibilities of ontology-based information integration. We restricted ourselves to simple taxonomic reasoning using only the second-order predicates class, subclass_Of and instance_Of. One could also make use of other concepts such as range restrictions on slots or mathematical properties of relations.

This example shows that mapping between two catalogue systems can be done by using ontologies for the description of the data. We describe concepts with what we call 'primitives', basic items of concepts, e. g. a tree or grass. The only requirement is that there must be a shared vocabulary, meaning that the same primitives have to be used in the catalogue systems. However, the composition of these primitives can be different.

5. Discussion

In this paper we demonstrated how the use of formal ontologies can enhance intelligent information retrieval. We showed that ontologies with formal semantics can help to generate semantic translators between data sources. There are several ways to translate one data source into another, but underlying ontologies combined with an inference engine offer a clear benefit: the ability to derive new knowledge. We outlined the advantages of ontologies and stated that their formal semantics can help to support the semi-automatic translation process between data sources.

We noted that adding new knowledge to an ontology is easier than adding knowledge to semi-structured meta-data. The formal description helps us to find errors (e. g. units, scales). Ontologies therefore help to improve the data quality. Ontologies are more flexible because we can use them not only for static information but also for brokering functional components (Benjamins et al., 1999).

In addition, we can think about the integration of other sources. The satellite picture, for instance, could be pre-processed with advanced image operators (e. g. for texture, edges and colors). This additional knowledge could be transferred into a knowledge base semi-automatically and could act as an additional source for potential queries.

We believe that our approach can be seen as a step towards solving the semantic translation problem. Annotating data with domain knowledge means finding, acquiring and representing that knowledge. As this is a time-consuming task which has to be done by domain experts and knowledge engineers, it is difficult to foresee whether this direction will be followed by organizations. However, if scientists are able to provide adequate tools which are easy to use, this can help to overcome the obstacles.

References

1. M. Aben. Formally specifying re-usable knowledge model components. Knowledge Acquisition Journal, 5:119-141, 1993.

2. AdV. Amtliches Topographisch-Kartographisches Informationssystem ATKIS. Landesvermessungsamt NRW, Bonn, 1998.

3. V.R. Benjamins, B. Wielinga, J. Wielemaker, and D. Fensel. Towards brokering problem-solving knowledge on the internet. In D. Fensel and R. Studer, editors, Knowledge Acquisition, Modeling and Management, volume 1621 of Lecture Notes in Artificial Intelligence. Springer, 1999.

4. Bergamaschi, Castano, Vincini, and Beneventano. Intelligent techniques for the extraction and integration of heterogeneous information. In Workshop Intelligent Information Integration, IJCAI 99, Stockholm, Sweden, 1999.

5. A. Berre. The IT standards approach to GIS interoperability. Tutorial T2 of the 2nd International Conference on Interoperating Geographic Information Systems, 1999.

6. M. Bock, K. Greve, and W. Kuhn, editors. Offene Umweltinformationssysteme - Chancen und Möglichkeiten der OpenGIS-Entwicklung im Umweltbereich, volume 7 of IFGI prints, Münster, Feb. 1999. Institut für Geoinformatik, Universität Münster.

7. A. Borgida and P.F. Patel-Schneider. A semantics and complete algorithm for subsumption in the CLASSIC description logic. JAIR, 1:277-308, 1994.

8. R.J. Brachman. What's in a concept: Structural foundations for semantic nets. International Journal of Man-Machine Studies, 9:127-152, 1977.

9. T. Bray, J. Paoli, and C.M. Sperberg-McQueen. Extensible markup language (XML) 1.0. Technical Report REC-rdf, W3C, 1998.

10. D. Brickley, R. Guha, and A. Layman. Resource description framework schema specification. Technical Report PR-rdf-schema, W3C, 1998.

11. S. Chawathe, H. Garcia-Molina, J. Hammer, K. Ireland, Y. Papakonstantinou, J. Ullman, and J. Widom. The TSIMMIS Project: Integration of Heterogeneous Information Sources. In Proceedings of IPSJ Conference, pages 7-18, 1994.

12. P. P.-S. Chen. The entity relationship model - toward a unified view of data. ACM Transactions on Database Systems, (1):9-36, 1976.

13. DISGIS-Project. Distributed geographical information systems (DISGIS). White paper, http://www.disgis.com/White 1.htm/, Jul. 1999. (ESPRIT IV 22.084).

14. A.J. Duineveld, R. Stoter, M.R. Weiden, B. Kenepa, and V.R. Benjamins. Wondertools? A comparative study of ontological engineering tools. In Proceedings of the 12th Banff Knowledge Acquisition for Knowledge-Based Systems Workshop [19].

15. EEA. Corine land cover. Technical guide, European Environmental Agency, ETC/LC, European Topic Centre on Land Cover, 1997-1999.

16. D. Fensel. Intelligent information integration. In Proceedings of the IJCAI'99 Workshop, Stockholm, Sweden, 1999.

17. D. Fensel and F. van Harmelen. A comparison of languages which operationalise and formalise KADS models of expertise. The Knowledge Engineering Review, 9:105-146, 1994.

18. J. Fulton. Semantic plug and play. In Proceedings of the Joint Workshop on Standards for the Use of Models that Define the Data and Processes of Information Systems, Seattle, WA, 1996.

19. B. Gaines, R. Kremer, and M. Musen. Proceedings of the 12th Banff knowledge acquisition for knowledge-based systems workshop. Technical report, University of Calgary/Stanford University, 1999.

20. H. Galhardas, Eric Simon, and Anthony Tomasic. A framework for classifying environmental metadata. In AAAI, Workshop on AI and Information Integration, Madison, WI, 1998.

21. M.R. Genesereth and R.E. Fikes. Knowledge interchange format version 3.0 reference manual. Report of the Knowledge Systems Laboratory KSL 91-1, Stanford University, 1992.

22. O. Günther. Environmental Information Systems. Springer, Berlin, 1998.

23. Object Management Group. OMG unified modeling language specification UML v1.3. Document ad/99-06-08, Object Management Group (OMG), 1999.

24. T. Gruber. Ontolingua: A mechanism to support portable ontologies. KSL Report KSL-91-66, Stanford University, 1991.

25. T.R. Gruber. A translation approach to portable ontology specifications. Knowledge Acquisition, 5(2), 1993.

26. N. Guarino and P. Giaretta. Ontologies and knowledge bases: Towards a terminological clarification. In N. Mars, editor, Towards Very Large Knowledge Bases: Knowledge Building and Knowledge Sharing, pages 25-32. Amsterdam, 1995.

27. M. Huber. High-tech-entscheidungstrends bei geodaten-servern. GeoBit, 98(3):18-20, 1998.

28. R. Jasper and M. Uschold. A framework for understanding and classifying ontology applications. In Proceedings of the 12th Banff Knowledge Acquisition for Knowledge-Based Systems Workshop [19].

29. W. Kim and J. Seo. Classifying schematic and data heterogeneity in multidatabase systems. IEEE Computer, 24(12):12-18, 1991.

30. D.B. Lenat. The dimensions of context space. Available on the web-site of the Cycorp Corporation. (http://www.cyc.com/publications), 1998.

31. FMEr Ltd. Semantic translation. White paper, http://safe.com/whitepaper o.htm, Jul. 1999.


32. Enrico Motta. Reusable Components for Knowledge Models. PhD thesis, KMI, The Open University, United Kingdom, 1997.

33. OGC. The OpenGIS(TM) abstract specification. Technical Report 99-100r1.doc, Open GIS Consortium, 1999. 1999a.

34. OGC. Topic 2: Spatial reference systems. Technical Report 99-100r1.doc, Open GIS Consortium, 1999. 1999b.

35. J. Rumbaugh, M. Blaha, W. Premerlani, F. Eddy, and W. Lorensen. Object-Oriented Modeling and Design. Prentice Hall International, Inc., Englewood Cliffs, New Jersey, 1991.

36. F. Saltor and E. Rodriguez. On intelligent access to heterogeneous information. In Proceedings of the 4th Workshop Knowledge Representation Meets Databases (KRDB '97), Athens, Greece, 1997.

37. A. Th. Schreiber, B. Wielinga, H. Akkermans, W. van de Velde, and A. Anjewierden. CML: the CommonKADS conceptual modeling language. In Steels et al., editor, A Future of Knowledge Acquisition, Proc. 8th European Knowledge Acquisition Workshop (EKAW 94), number 867 in Lecture Notes in Artificial Intelligence. Springer, 1994.

38. M. Stefik. Introduction to Knowledge Systems. Morgan Kaufmann, San Francisco, California, 1995.

39. H. Stuckenschmidt and U. Visser. Semantic translation based on approximate re-classification. In Proceedings of the Workshop on Semantic Approximation, Granularity and Vagueness at KR 2000, 2000. Accepted.

40. H. Stuckenschmidt, U. Visser, G. Schuster, and T. Vögele. Ontologies for geographic information integration. In Visser and Pundt, editors, Proceedings of the Workshop "Intelligent Methods in Environmental Protection: Special Aspects of Processing in Space and Time", 13. International Symposium of Computer Science for Environmental Protection (UI 99), number 5/99 in Research reports of the Department of Mathematics and Computer Science, University of Bremen. University of Bremen, 1999.

41. M. Uschold and M. Gruninger. Ontologies: Principles, methods and applications. Knowledge Engineering Review, 11(2), 1996.

42. F. van Harmelen and D. Fensel. Practical knowledge representation for the web. In D. Fensel, editor, Proceedings of the IJCAI'99 Workshop on Intelligent Information Integration, 1999.

43. G. van Heijst, A.T. Schreiber, and B.J. Wielinga. Using explicit ontologies for KBS development. International Journal of Human-Computer Studies, 46(2/3):183-292, 1997.

44. A. Vckovski, K.E. Brassel, and H.-J. Schek, editors. Proceedings of the 2nd International Conference on Interoperating Geographic Information Systems, volume 1580 of Lecture Notes in Computer Science, Zürich, 1999. Springer.

45. U. Visser and H. Stuckenschmidt. Intelligent, location-dependent acquisition and retrieval of environmental information. In M. Rumor, editor, Information Technology in the Service of Local Government Planning and Management. The Urban Data Management Society, 1999.

46. W3C. Resource description framework (RDF) schema specification. http://www.w3.org/TR/PR-rdf-schema, Mar. 1999. W3C Proposed Recommendation.

47. J. Wielemaker. SWI-Prolog 3.1. Reference manual, Univ. of Amsterdam, Dept. of Social Science Informatics (SWI), 1998.

48. J.L. Wiener, H. Gupta, W.J. Labio, Y. Zhuge, H. Garcia-Molina, and J. Widom. WHIPS: A system prototype for warehouse view maintenance. In Workshop on materialized views, pages 26-33, Montreal, Canada, 1996.