12
60 1094-7167/01/$10.00 © 2001 IEEE IEEE INTELLIGENT SYSTEMS T h e S e m a n t i c W e b Creating Semantic Web Contents with Protégé-2000 Natalya F. Noy, Michael Sintek, Stefan Decker, Monica Crubézy, Ray W. Fergerson, and Mark A. Musen, Stanford University B ecause we can process only a tiny fraction of information available on the Web, we must turn to machines for help in processing and analyzing its contents. With current technology, machines cannot understand and interpret the meaning of the infor- mation in natural-language form, which is how most Web information is represented today. We need a Semantic Web to express informa- tion in a precise, machine-interpretable form, so soft- ware agents processing the same set of data share an understanding of what the terms describing the data mean. 1 Consequently, we’ve recently seen an explosion in the number of Semantic Web languages devel- oped. Because researchers and developers haven’t yet reached a consensus on which language is the most suitable, which features each language must have, or which syntax is the most appropriate, we are likely to see even more languages emerge. We need to develop tools that will let us experiment with these new languages so we can compare their expressive- ness and features, change language specifications, and select a suitable language for a specific task. In this article, we describe Protégé-2000, a graph- ical tool for ontology editing and knowledge acqui- sition that we can adapt to enable conceptual model- ing with new and evolving Semantic Web languages. Protégé-2000 lets us think about domain models at a conceptual level without having to know the syntax of the language ultimately used on the Web. We can concentrate on the concepts and relationships in the domain and the facts about them that we need to express. For example, if we are developing an ontol- ogy of wines, food, and appropriate wine–food com- binations, we can focus on Bordeaux and lamb instead of markup tags and correct syntax. Naturally, designing a new tool specifically for a new language could be better than adapting an exist- ing tool. We can offer several reasons, however, for adapting an existing tool at the stage where no sin- gle language has emerged as the winner. First, we can experiment with emerging languages without committing enormous amounts of resources to cre- ating tools that are custom-tailored for these lan- guages—only to decide later that the languages are not suitable. Second, Protégé-2000 already provides considerable functionality that a new tool will need to replicate, both at the modeling and user-interface levels. Third, using different customizations of the same tool to edit ontologies in different languages gives us most of the translation among the models in the languages “for free.” Translating a model from one language to another becomes as easy as select- ing a “save as…” item from a menu. Semantic Web languages AI researchers have used ontologies for a long time to express formally a shared understanding of information. An ontology is an explicit specification of the concepts in a domain and the relations among them, which provides a formal vocabulary for infor- mation exchange. Specific instances of the concepts defined in the ontologies—instance data—paired with ontologies constitute the basis of the Semantic Web. In recent experiments to prototype the Seman- tic Web, members of different communities with dif- ferent backgrounds and goals in mind have created a multitude of languages for representing ontologies and instance data on the Web (see Table1). Typically, a Semantic Web language for describing ontologies and instance data contains a hierarchical description As researchers continue to create new languages in the hope of developing a Semantic Web, they still lack consensus on a standard. The authors describe how Protégé- 2000—a tool for ontology development and knowledge acquisition—can be adapted for editing models in different Semantic Web languages.

Creating Semantic Web Contents with Protégé-2000...Natalya F. Noy, Michael Sintek, Stefan Decker, Monica Crubézy, Ray .W Fergerson, and Mark A. Musen, Stanford University Because

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Creating Semantic Web Contents with Protégé-2000...Natalya F. Noy, Michael Sintek, Stefan Decker, Monica Crubézy, Ray .W Fergerson, and Mark A. Musen, Stanford University Because

60 1094-7167/01/$10.00 © 2001 IEEE IEEE INTELLIGENT SYSTEMS

T h e S e m a n t i c W e b

Creating Semantic Web Contents with Protégé-2000Natalya F. Noy, Michael Sintek, Stefan Decker, Monica Crubézy, Ray W. Fergerson, andMark A. Musen, Stanford University

B ecause we can process only a tiny fraction of information available on the Web,

we must turn to machines for help in processing and analyzing its contents. With

current technology, machines cannot understand and interpret the meaning of the infor-

mation in natural-language form, which is how most Web information is represented

today. We need a Semantic Web to express informa-tion in a precise, machine-interpretable form, so soft-ware agents processing the same set of data share anunderstanding of what the terms describing the datamean.1

Consequently, we’ve recently seen an explosionin the number of Semantic Web languages devel-oped. Because researchers and developers haven’tyet reached a consensus on which language is themost suitable, which features each language musthave, or which syntax is the most appropriate, we arelikely to see even more languages emerge. We needto develop tools that will let us experiment with thesenew languages so we can compare their expressive-ness and features, change language specifications,and select a suitable language for a specific task.

In this article, we describe Protégé-2000, a graph-ical tool for ontology editing and knowledge acqui-sition that we can adapt to enable conceptual model-ing with new and evolving Semantic Web languages.Protégé-2000 lets us think about domain models at aconceptual level without having to know the syntaxof the language ultimately used on the Web. We canconcentrate on the concepts and relationships in thedomain and the facts about them that we need toexpress. For example, if we are developing an ontol-ogy of wines, food, and appropriate wine–food com-binations, we can focus on Bordeaux and lambinstead of markup tags and correct syntax.

Naturally, designing a new tool specifically for anew language could be better than adapting an exist-ing tool. We can offer several reasons, however, for

adapting an existing tool at the stage where no sin-gle language has emerged as the winner. First, wecan experiment with emerging languages withoutcommitting enormous amounts of resources to cre-ating tools that are custom-tailored for these lan-guages—only to decide later that the languages arenot suitable. Second, Protégé-2000 already providesconsiderable functionality that a new tool will needto replicate, both at the modeling and user-interfacelevels. Third, using different customizations of thesame tool to edit ontologies in different languagesgives us most of the translation among the modelsin the languages “for free.” Translating a model fromone language to another becomes as easy as select-ing a “save as…” item from a menu.

Semantic Web languagesAI researchers have used ontologies for a long

time to express formally a shared understanding ofinformation. An ontology is an explicit specificationof the concepts in a domain and the relations amongthem, which provides a formal vocabulary for infor-mation exchange. Specific instances of the conceptsdefined in the ontologies—instance data—pairedwith ontologies constitute the basis of the SemanticWeb. In recent experiments to prototype the Seman-tic Web, members of different communities with dif-ferent backgrounds and goals in mind have createda multitude of languages for representing ontologiesand instance data on the Web (see Table1). Typically,a Semantic Web language for describing ontologiesand instance data contains a hierarchical description

As researchers continue

to create new languages

in the hope of developing

a Semantic Web, they still

lack consensus on a

standard. The authors

describe how Protégé-

2000—a tool for

ontology development

and knowledge

acquisition—can be

adapted for editing

models in different

Semantic Web

languages.

Page 2: Creating Semantic Web Contents with Protégé-2000...Natalya F. Noy, Michael Sintek, Stefan Decker, Monica Crubézy, Ray .W Fergerson, and Mark A. Musen, Stanford University Because

of important concepts in a domain (classes).Individuals in the domain are instances ofthese classes, and properties (slots) of eachclass describe various features and attributesof the concept. Logical statements describerelations among concepts. For example, con-sider an ontology describing wines, food, andappropriate wine–food combinations. Someof the classes describing this domain are Wine,Wineries, and different types of Food. Someproperties of the Wine class include the wine’sflavor, body, sugar level, and the winery that pro-duced it.

These notions are present in many Seman-tic Web languages existing today includingSHOE, Topic Maps, XOL, RDF and RDFS,and DAML+OIL.

The SHOE (Simple HTML OntologyExtensions) language, developed at the Uni-versity of Maryland, introduces primitives todefine ontology and instance data on Webpages. Classes are called categories in SHOE.Categories constitute a simple is-a hierarchy,and slots are binary relations. SHOE alsoallows relations among instances or instancesand data to have any number of arguments(not just binary relations). Horn clausesexpress intensional definitions in SHOE.

The Hytime community developed TopicMaps, a recent ISO standard (ISO/IEC13250). Topic Maps aim to annotate docu-ments with conceptual information. Topicscorrespond to classes in other ontology lan-guages and can be linked to documents. Top-ics are instances of Topic Types (other topics),which can be related to one another with Asso-ciations. Associations correspond closely toslots in other ontology languages. Associa-tions belong to Association Types, which areagain Topics. Topic Maps do not have a spe-cialized primitive for representing instances.Any instance of a topic type can act as a topictype itself.

The bioinformatics community designedXOL for the exchange of ontologies in thefield of bioinformatics. It evolved to becomea general language for interchange of ontol-ogy and instance data. Being an interchangelanguage, XOL includes primitives found inmany knowledge-representation systems,object databases, and relational databases. Itprovides means to define classes, a class hier-archy, slots, facets, and instances.

RDF (resource description framework)provides a graph-based data model, consist-ing of nodes and edges. Nodes correspond to

objects or resources and the edges to prop-erties of these objects. The labels on thenodes and edges are Uniform Resource Iden-tifiers (URIs). However, RDF itself does notdefine any primitives for creating ontologies.It is the basis for several other ontology-def-inition languages such as RDFS andDAML+OIL.

RDF Schema2 defines the primitives forcreating ontologies. Figure 1 shows an exam-ple of a graph representing our ontology ofwines as an RDFS. In RDFS, there areclasses of concepts, which constitute a hier-archy with multiple inheritance. For exam-ple, the class Wine is a subclass of the classDrink. Classes typically have instances (forexample, a specific red wine is an instanceof the Red Wine class) and a resource can be aninstance of more than one class (for exam-ple, Romariz Port is an instance of both the RedWine and the Dessert Wine classes). Resourceshave properties associated with them (forexample, Wine has flavor). Properties describeattributes of a resource or a relation of aresource to another resource. RDFS definesa property’s domain—resources that can besubjects of the property—and a property’srange—resources that can be objects of the

MARCH/APRIL 2001 computer.org/intelligent 61

Table 1. A selection of Semantic Web languages.

Language Description URL

XOL XML-based ontology-exchange language www.ai.sri.com/~pkarp/xol

Topic Maps ISO standard for describing knowledge structures www.topicmaps.org

SHOE Simple HTML Ontology Extensions www.cs.umd.edu/projects/plus/SHOE

RDF and Resource description framework and www.w3.org/RDFRDFS RDF Schema

DAML+OIL DARPA Agent Markup Language + www.daml.orgOntology Inference Layer

rdf:type

rdfs:subClassOf

rdfs:subClassOf

rdf:type

rdf:typerdf:type rdf:type rdf:type

rdf:type

rdf:type

rdf:type

rdf:type

rdfs:rangerdfs:range

rdfs:subClassOf

rdfs:subClassOf rdfs:subClassOf

rdfs:domain

rdfs:domain

rdf:Property

d:White_wine d:Dessert_wine

d:Drink d:maker

d:grape

rdfs:Class

d:Wine

d:Winery

d:Red_wine d:Rose_wine

d:Wine_grape

Figure 1. An RDF Schema graph representing the Wine ontology.

Page 3: Creating Semantic Web Contents with Protégé-2000...Natalya F. Noy, Michael Sintek, Stefan Decker, Monica Crubézy, Ray .W Fergerson, and Mark A. Musen, Stanford University Because

property. For example, the property makermay have a class Wine as its domain and aclass Winery as its range.

DAML+OIL (DARPA Agent Markup lan-guage + Ontology Inference Layer)3 takes adifferent approach to defining classes andinstances. In addition to defining classes andinstances declaratively, DAML+OIL andother description-logics languages let us cre-ate intensional class definitions using Booleanexpressions and specify necessary, or neces-sary and sufficient, conditions for class mem-bership. These languages rely on inferenceengines (classifiers) to compute a class hier-archy and to determine class membership ofinstances based on the properties of classesand instances. For example, we can define aclass of Bordeaux wines as “a class of winesproduced by a winery in the Bordeaux region.”In DAML+OIL, we can also specify globalproperties of classes and slots. For example,we can say that the location slot is transitive: ifa winery is located in the Bordeaux region andthe Bordeaux region is located in France, thenthe winery is in France. We will describeDAML+OIL in more detail later.

We can see from this discussion thatSemantic Web languages for representingontologies and instance data have many fea-tures in common. At the same time, there aresignificant differences stemming from dif-ferent design goals for the languages. Inadapting Protégé-2000 as an editor for theselanguages, we build on the similaritiesamong them and custom-tailor the tool toaccount for the individual differences.

Protégé-2000For many years now, experts in domains

such as medicine and manufacturing haveused Protégé-2000 for domain modeling. Weshow not only how we adapt Protégé-2000to the new world of the Semantic Web—reusing its user interface, internal represen-tation, and framework—but also how ourchanges enable conceptual modeling with thenew Semantic Web languages.

Protégé-2000 is highly customizable,which makes its adaptation as an editor for anew language faster than creating a new edi-tor from scratch. The following featuresmake this customization possible:

• An extensible knowledge model. We canredefine declaratively the representationalprimitives the system uses.

• A customizable output file format. We canimplement Protégé-2000 components thattranslate from the Protégé-2000 internalrepresentation to a text representation inany formal language.

• A customizable user interface. We canreplace Protégé-2000 user-interface com-ponents for displaying and acquiring datawith new components that fit the new lan-guage better.

• An extensible architecture that enablesintegration with other applications. Wecan connect Protégé-2000 directly toexternal semantic modules, such as spe-cific reasoning engines operating on themodels in the new language.

Protégé-2000 knowledge model Protégé-2000’s representational primi-

tives—the elements of its knowledgemodel4—are very similar to those of theSemantic Web languages that we describedearlier. Protégé-2000 has classes, instances

62 computer.org/intelligent IEEE INTELLIGENT SYSTEMS

T h e S e m a n t i c W e b

The tabs representing differentviews of a knowledge base and the

configuration information

The class hiearchy The slot definition

The facets

Figure 2. A snapshot of the ontology representing wines. The tree on the left represents a class hierarchy. The form on the rightshows the definition of the Wine class.

Page 4: Creating Semantic Web Contents with Protégé-2000...Natalya F. Noy, Michael Sintek, Stefan Decker, Monica Crubézy, Ray .W Fergerson, and Mark A. Musen, Stanford University Because

of these classes, slots representing attributesof classes and instances, and facets express-ing additional information about slots.

Figure 2 shows an example definition of aclass, which is part of an ontology describingwines, food, and desirable wine–food com-binations. In the figure, the tree on the leftrepresents a class hierarchy. The class of Pauil-lac wines, for instance, is a subclass of theclass of Médoc wines. In other words, eachPauillac wine is a Médoc wine. The class of Médocwines is, in turn, a subclass of Red Bordeauxwines and so on. The form on the right in Fig-ure 2 represents the definition of the selectedclass (Wine). There is the name of the class,its documentation, a list of possible con-straints, and definitions of slots that theinstances of this class will have. Instances ofthe class Wine will have slots describing theirflavor, body, sugar level, the winery that producedthe wine, and so on.

The form in Figure 3 displays an instanceof the class Pauillac representing Château LafiteRothschild Pauillac, and the fields display the slotvalues for that instance. Therefore, we knowthat Château Lafite Rothschild Pauillac has afull body and strong flavor among otherproperties. Both the class-definition forms(the right-hand side in Figure 2) and theinstance-definition forms (Figure 3) areknowledge-acquisition forms in Protégé-2000. The fields on the knowledge-acquisi-tion forms correspond to slot values, and wedefine classes and instances by filling in slotvalues in these fields. Protégé-2000 gener-ates knowledge-acquisition forms automati-cally based on the types of the slots andrestrictions on their values.

The Protégé-2000 user interface (Figure2) consists of several tabs for editing differ-ent elements of a knowledge base and cus-tom tailoring the layout of the knowledge-acquisition forms, such as the forms inFigures 2 and 3. The Classes tab definesclasses and slots, and the Instances tabacquires specific instances. The Forms taballows us to change the layout and the con-tents of the knowledge-acquisition forms.

We can customize almost all of the Pro-tégé-2000 features we have described to fita specific domain or task by

• changing declaratively the standard classand slot definitions;

• changing the content and the layout of theknowledge-acquisition forms; and

• developing plug-ins using the Protégé-2000 application-programming interface.5

Let’s look at how we can customize Pro-tégé-2000 and then see how we can use thisflexibility to create Protégé-based editors fornew Semantic Web languages.

Changing the notion of classesand slots

The definition of the Wine class in Figure 2is a standard class definition in Protégé-2000,with a class name, documentation, list ofslots, and so on. What if we need to add moreattributes to a class definition, or change howa class looks, or change the default definitionof a class in the system? For instance, wemight want to add a list of a few best winer-ies for each type of wine in the hierarchy.Such a list is a property of a class (such asPauillac wines) rather than a property of spe-cific instances of the class (such as ChâteauLafite Rothschild Pauillac). The list of the bestwineries for a class is not inherited by its sub-classes: The best wineries producing red Bor-deaux are not necessarily the same as the bestMédoc or Pauillac wineries (although, theymay overlap). Therefore, this list mustbecome a part of a class definition the sameway as documentation is a part of a class def-inition. The Protégé-2000 metaclass archi-tecture lets us do just that.4,5

Metaclasses are templates for classes inthe same way that classes are templates forinstances. We can define our own meta-classes and effectively change a definition ofwhat a class is, in the same way we woulddefine a new class. The default Protégé-2000template (the standard metaclass) defines thefields that we see in Figure 2. We can extenddeclaratively this standard definition of what

a class is with new fields of any type bydefining a new metaclass, which simplybecomes a part of the knowledge base. Fig-ure 4 shows a definition of the Red Bordeauxclass that includes the additional field with alist of the best Bordeaux wineries.

Similarly, we can define new metaslots asuser-defined templates for new slots. If slotdefinitions in our system must have fields inaddition to the ones that Protégé-2000 has,we simply define new templates where wedescribe these new fields.

Custom-tailoring slot widgets forvalue acquisition

The look and behavior of the fields on theknowledge-acquisition forms in Figures 2and 3 depend on the types of the values thatthe fields can take. A field for a string value,such as a class name, has a simple text win-dow. A field that contains a list of complexvalues is a list box with buttons to add andremove values and to create new values.These fields are called slot widgets. They notonly display the values appropriately, but alsohelp to ensure that the values are correctbased on the slot definitions in the ontology.For example, the maker of a wine must be awinery—an instance of the Winery class. Theslot widget for the maker slot in Figure 3 letsus set the value only to a Winery instance.

Developers can extend Protégé-2000 byimplementing their own slot widgets that aretailored to acquire and verify particular kindsof values. Suppose we wanted to be moreprecise about the sugar level in wine and tomark it on a scale rather than simply choos-ing among three values—dry, sweet, or off-dry.

MARCH/APRIL 2001 computer.org/intelligent 63

Figure 3. An instance of the class Pauillac representing the Château Lafite Rothschild Pauillac. Thiswine has a full body, a strong flavor, and a moderate tannin level, among other properties.

Page 5: Creating Semantic Web Contents with Protégé-2000...Natalya F. Noy, Michael Sintek, Stefan Decker, Monica Crubézy, Ray .W Fergerson, and Mark A. Musen, Stanford University Because

We could store the value as a number in thesugar slot. We could use a slot widget thatwould let us select the value on a slider ratherthan enter a number (see Figure 5). When wecustomize knowledge-acquisition forms, wechoose not only the layout of the fields onthe form, but also the slot widgets that mustbe used for different fields. The slot widgetswe choose do not usually affect the contentsof the knowledge base itself, but their use canmake the look and feel of the tool much moresuitable for a particular domain or language.Slot widgets also can help ensure the internalconsistency of a knowledge base by check-ing, for example, that an integer value thatwe enter is between the allowed maximumand minimum for that slot.

Using a back-end plug-in to generate the right output

When we develop a domain model in Pro-tégé-2000, we do not need to think about thespecific file format that Protégé-2000 will use

to save our work. We think about our domainat the conceptual level and create classes, slots,facets, and instances using the graphical userinterface. Protégé-2000 then saves the result-ing domain models in its own file format.Developers can change this file format in thesame way they plug in slot widgets. Back-endplug-ins let developers substitute the Protégé-2000 text file format with any other file for-mat. For example, suppose we wanted to useXML to publish the wine ontology and otherdomain models we create using Protégé-2000.We would then need to create an XML backend that substitutes files in the Protégé-2000format with the files in XML. A back end cre-ates a mapping between the in-memory rep-resentation of a Protégé-2000 knowledge baseand the file output in the required format. Theback end also enables us to import the files inthat format into Protégé-2000. The new fileformat has the same status as the Protégé-2000native file format, and the users can chooseeither format for their files.

Editing Semantic Web languageswith Protégé-2000

Armed with the arsenal of tools to custom-tailor Protégé-2000 quickly and easily, let’slook at what is involved in creating a Protégé-2000 editor for a new Semantic Web language.We will use the Protégé-RDFS editor devel-oped in our laboratory as an example, but theideas are the same for any new language.

We start creating a Protégé-2000 editor forour new language by determining the differ-ences between the knowledge models of thetwo languages: the knowledge model of Pro-tégé-2000 and the knowledge model under-lying our language of choice. We then decidewhich of the available tools—metaclasses,custom user-interface components, or a cus-tom back end—we will use to reconcile thedifferences or to hide them from the user.

In practice, the overlap between the knowl-edge models underlying the Semantic Weblanguages available today is very large. Themodels might use different terminology for

64 computer.org/intelligent IEEE INTELLIGENT SYSTEMS

T h e S e m a n t i c W e b

The field in the new template

Figure 4. A class definition that uses a nonstandard template. We added the best wineries slot to the standard class-definition template.

Page 6: Creating Semantic Web Contents with Protégé-2000...Natalya F. Noy, Michael Sintek, Stefan Decker, Monica Crubézy, Ray .W Fergerson, and Mark A. Musen, Stanford University Because

the same notion (for example, slots in Pro-tégé-2000 and properties in RDFS). How-ever, the structure of the concepts, the under-lying semantics, and the restrictions are oftensimilar.

When we compare the two knowledgemodels, we identify four categories of con-cepts (see Figure 6):

1. Concepts that are exactly the same in thetwo languages (possibly with differentnames). Usually, classes, inheritance,instances, slots as properties of classes andinstances, and many of the slot restrictionsfall into this category.

2. Concepts that are the same but expresseddifferently in the two languages. Forexample, Protégé-2000 associates slotswith classes by attaching a slot to a class.RDFS defines essentially the same rela-tionship by defining a domain of a property.

3. Concepts in our language of choice thatdo not have an equivalent in Protégé-2000.For example, RDFS allows an instance tohave more than one type, whereas in Pro-tégé-2000 each instance has a unique directtype.

4. Concepts that Protégé-2000 supports andour language of choice does not. For exam-ple, Protégé-2000 allows a slot to have

more than one allowed class for its values,whereas the range of a property in RDFSis limited to a single class.

Naturally, we can express all the featuresof our language that fall into the first categorydirectly in Protégé-2000. We deal with the dif-ferences in the other three categories by defin-ing appropriate metaclasses and metaslots andby resolving the remaining changes in theback end. We hide the differences from theuser behind custom-tailored slot widgets.

The second item on the list, the conceptsthat do not have a direct equivalent in Protégé-2000 but that can be mapped to native Pro-tégé-2000 concepts, deserves a special dis-cussion. Consider domains of properties inRDFS (rdfs:domain). A domain specifies a classon which a property might be used. For exam-ple, the domain of the flavor property is the Wineclass. Protégé-2000 slots are similar to prop-erties in RDFS. Attaching a slot to a class inProtégé-2000 also specifies that a slot can beused with that class. For example, the flavor slot

MARCH/APRIL 2001 computer.org/intelligent 65

A slider widget fornumeric values

Figure 5. Changing a slot widget. We use a slider instead of a simple field to acquirenumeric values for the sugar level.

(2) Concepts that can beencoded as native Protégé

concepts

(1) Conceptsthat are identical

semantically

(3) Concepts that do nothave an equivalentin native Protégé

(4) Concepts in Protégé thatdo not have an equivalent

in the language

Use nativeProtégé concepts

Use Protégéconcepts directly

Use metaclasses andmetaslots to encode

the information

Use custom labels andslot widgets to hide

the differences

Back end

Use custom slot widgetsto facilitate

knowledge entry

Use knowledge-acquisitionforms to disable

the features

Map between the model inProtégé and the model

required by the language

Map directly intothe model required

by the language

Modeling

User interface

Map between the model inProtégé and the model

required by the language

Define the means of storingthe information in the

language format

The new language3

1

2

4 Protégé-2000

Figure 6. Comparing the knowledge models of Protégé-2000 and a new Semantic Web language.

Page 7: Creating Semantic Web Contents with Protégé-2000...Natalya F. Noy, Michael Sintek, Stefan Decker, Monica Crubézy, Ray .W Fergerson, and Mark A. Musen, Stanford University Because

is attached to the Wine class. We have two waysto encode the RDFS domain information in aProtégé-RDFS editor. First, we can add adomain slot to a template (metaslot) that we willuse for all our slots. Then, a field for domainwill appear on each form for a slot, and wewill fill it in there. Second, we can simply usethe native Protégé-2000 notion of slot attach-ment and translate the attachments of slots toclasses into domains of properties in the backend. The second solution lets us use the Pro-tégé-2000 user interface directly and hides thefeatures of a specific language used to storethe information.

We find it extremely beneficial to adopt theparadigm of using the native Protégé-2000 fea-tures wherever possible and of resorting toadditional definitions, such as metaclasses andmetaslots, only when absolutely necessary.This approach maximally facilitates theexchange of domain models among differentlanguages, which we edit (or will edit) withProtégé-2000. As new languages emerge andwe experiment with them, the knowledge mod-els underlying these languages will undoubt-edly overlap. By encoding as much as possiblein the native Protégé-2000 structures and leav-ing part of the translation between the Protégé-2000 model and the language to the back end,we maximize the amount of information thatwe will preserve by simply loading a knowl-edge base in one language supported by Pro-tégé-2000 and saving it to another language.Even though there would often be some parts

of these models that will not be part of thisoverlap, we are maximizing the amount ofinformation that gets ported among models indifferent languages for free.

Having generated the four groups of con-cepts after comparing the two knowledgemodels (see Figure 6), we can reconcile thedifferences using

1. modeling—by changing default defini-tions of classes and slots at the modelinglevel;

2. the user interface—by developing spe-cialized user-interface components; and

3. the back end—by implementing the newback end that will translate between thedomain model in Protégé-2000 and thedomain model in our language of choice.

Let’s look at how each of these three lev-els works using the development of a Pro-tégé-based RDFS editor as an example (seeTable 2 for a summary of the entire process).

The modeling levelWe start by determining which concepts in

our language of choice are identical to Protégé-2000 concepts or that can be represented usingthe native Protégé-2000 concepts. We use thenative Protégé-2000 as a means to model thisgroup of concepts, even if it is not how theseelements are directly expressed in our lan-guage. We then define the new templates forclass and slot definitions if necessary.

Consider, for example, the two attrib-utes—rdfs:seeAlso and rdfs:isDefinedBy—that areassociated with each class and each propertyin RDFS. The rdfs:seeAlso property specifiesanother resource containing informationabout the subject resource, and the rdfs:isDe-finedBy property indicates the resource defin-ing the subject resource. The values of theseproperties are other resources or URIs point-ing to other resources. We must add these twofields that the Protégé-2000 itself does nothave to each class and slot form in our knowl-edge base. To add these fields, we define anew metaclass that will serve as a templatefor RDFS classes. This metaclass is, in fact,equivalent to the RDFS class rdfs:Class. Figure7 shows the definition of rdfs:Class with the newtemplate slots that will appear on each classform that uses this template.

The user-interface levelWhen creating a Protégé-based editor for a

new language, we can change both the behav-ior and the look and feel of the knowledge-acquisition forms to reflect the terminologyand the features of the language. First, we canchange the labels on the forms—the simplesttype of customization. For example, we caneasily replace Protégé’s “Template slots”label in a class definition with the RDFS“Properties” label to give the form an RDFSlook. Other elements that we can easily con-figure by manipulating the forms includewhether or not a field should be visible to the

66 computer.org/intelligent IEEE INTELLIGENT SYSTEMS

T h e S e m a n t i c W e b

Table 2. Creating the Protégé-based RDFS editor.

Category (1) Concepts in RDFS (2) Concepts in RDFS (3) Concepts in RDFS (4) Concepts in Protégéthat are (nearly) identical that can be encoded that do not have an that do not haveto Protégé concepts as native Protégé concepts equivalent in native Protégé an equivalent in RDFS

Modeling rdfs:Class = :STANDARD-CLASS Do not use explicit rdfs:domain and rdfs:Class is a default for Cardinality, inverse slot, rdfs:subClassOf = rdfs:range for properties; rdfs: metaclass, and rdf:Property is a and default value facets;

subclass of domain encoded as slot attachment; default metaslot; add properties multiple allowed classesrdf:type = instance of rdfs:range encoded as allowed class rdfs:isDefinedBy, rdfs:seeAlso for a slotrdf:Property = Instance-typed slots as core slots; add

:STANDARD-SLOT rdfs:ConstraintProperty and rdfs:subPropertyOf = rdfs:ConstraintResource as

subslot of core classes; multiple types of rdfs:Resource = :THING an instancerdf:comment =

:DOCUMENTATION

User Custom labels on class Plug-in URI slot widget for Disable default-value andinterface and slots forms (for validating URI-type slots. inverse-slot widgets on

example, “Properties” slot forms.and “Comment”)

Back end Map Protégé concepts Translate slot attachments On import, create new class Write out extra facet informationdirectly to RDFS concepts as rdfs:domain for as a subclass of the multiple as Protégé-specific properties

properties and allowed types. on properties. If a slot has classes as rdfs:range multiple allowed classes, create

a new class for rdfs:range value. On import, do the reverse.

Page 8: Creating Semantic Web Contents with Protégé-2000...Natalya F. Noy, Michael Sintek, Stefan Decker, Monica Crubézy, Ray .W Fergerson, and Mark A. Musen, Stanford University Because

user, the buttons on the fields, the position andsize of the fields, and the slot widgets to beused for each field. We perform this configu-ration entirely in the Protégé-2000 Forms taband not in the programming code.

We could also develop our own slot-wid-get plug-ins to allow editing and verificationof elements that are unique to our language.For example, a URI widget in the Protégé-RDF editor can validate that the user hasentered a correct URI or even take the userto the corresponding Web page.

Disabling fields for some slots on the formprevents the user from exercising Protégé-2000 features that the particular Semantic Weblanguage does not support. For example, wecan disable the field for entering default slotvalues in the Protégé-RDFS editor, becauseRDFS does not support default values.

The back-end levelWhatever the differences between Protégé-

2000 and our language that we could notresolve at the modeling and user-interface lev-els, we will need to reconcile in the modulethat saves the internal Protégé-2000 repre-

sentation in the required output file format—the back-end plug-in. The back-end plug-in

1. saves a Protégé-2000 knowledge base ina file format that conforms to the syntaxof our language of choice;

2. maps the elements of the Protégé-2000knowledge base that do not have a directequivalent in our language to the appro-priate set of elements in this language; and

3. imports domain models in this languagethat were developed elsewhere for edit-ing in Protégé-2000.

Usually, when developers define a lan-guage with a new syntax, they quickly imple-ment a parser that allows developers to readand write files in that language’s syntax.Many of the new languages are extensions ofXML or RDF, and thus we can often use theexisting XML and RDF parsers to take careof the syntactic part of adapting to the newlanguage.

In RDFS, the back end must deal with anumber of issues that we did not resolve atthe modeling level or in the user interface. We

might have resolved some of these issuesthere, but it would have unnecessarily com-plicated the editor for the user. For example,instances in Protégé-2000 are of a singleclass, whereas in RDFS they can be directinstances of several classes (for example, theyhave several direct types). Because the RDFSmodel is more general, we have no problemin saving a Protégé-2000 knowledge base inRDFS. However, when we import RDFSinstance data into Protégé-2000, we must dealwith instances that have several direct types.Suppose we have a class for red wines and aclass for dessert wines. We have Romariz Port asan instance of both classes in RDFS. Whenwe import this RDFS instance data into Pro-tégé-2000, the back end can create a new classthat is a subclass of both classes (for exam-ple, denoting a concept of dessert red wines)and make the Romariz Port instance an instanceof this new class. We can record the two orig-inal classes of Romariz Port as additional slotson the newly created class (as shown in Fig-ure 4). When saving back to RDFS, the backend can extract the information from this slot,thus preserving the original model.

MARCH/APRIL 2001 computer.org/intelligent 67

The additional slotsdefining RFDS-specific

properties

Figure 7. A template definition for classes in RDFS. The class rdfs:Class inherits most of the slots from the standard class template, butthe two slots at the top of the list of properties are the ones that we defined for RDFS.

Page 9: Creating Semantic Web Contents with Protégé-2000...Natalya F. Noy, Michael Sintek, Stefan Decker, Monica Crubézy, Ray .W Fergerson, and Mark A. Musen, Stanford University Because

Any user-defined back end has the samestatus as all the other back ends, includingthe ones that are part of the core Protégé-2000 system: it can be used as a storage for-mat for Protégé-2000 knowledge bases.Therefore, there is another, no less impor-tant, goal of a back-end plug-in: to ensurethat when we create a knowledge base in Pro-tégé-2000, save it using the back-end plug-in,and load it again, we have preserved all theinformation. Hence, we must find a way tostore the elements that Protégé-2000 sup-ports, but that our language of choice doesnot. Most Semantic Web languages are flex-ible enough to easily store this information.For example, in RDFS, we simply add newProtégé-specific properties to slots to recorddefault values, which RDFS does not have.These properties have no meaning to anotherRDFS agent, but if we read the knowledge

base back in Protégé-2000, we will have thedefault values preserved.

Creating new tabs to includeother semantic modules

In addition to creating a Protégé-basededitor for a new Semantic Web language,developers can plug in other applications inthe knowledge-base–editing environment. Inaddition to the standard tabs that constitutethe Protégé-2000 user interface (Figures 2and 4), developers can create tab plug-ins inthe same way they can plug in new slot wid-gets. These tabs can include arbitrary appli-cations that benefit from the live connectionto the knowledge base. These applicationsthen become an integral part of the knowl-edge-base–editing environment.

Consider our wine example again. Havingcreated a knowledge base of wines and food

and the appropriate combinations, we mightwant to build an application that produceswine suggestions for a meal course in a restau-rant. Such an application would actively usethe data in the knowledge base but it wouldalso implement its own reasoning mechanismto analyze suggestions. We can implement thiswine-selection application as a tab plug-in.

In practical terms, a tab plug-in is a sepa-rate application, a developer’s own user-interface space from which the developer canconnect to, query, and update the current Pro-tégé-2000 knowledge base.

In the realm of the Semantic Web, a tabcan include any applications that would helpus acquire or annotate the knowledge base.Such applications can

• enable direct annotation of HTML pageswith semantic elements;

68 computer.org/intelligent IEEE INTELLIGENT SYSTEMS

T h e S e m a n t i c W e b

Auxilliary coreclasses

Figure 8. The definition for the spicy-red-meat-course class in the Protégé-OIL editor. In addition to the standard fields, such as thoseshown in Figure 2 ), we have OIL-specific fields such as hasPropertyRestriction and subClassOf for specifying complex class expressions.These slots use the OIL-specific slot widget to display expressions. The tree on the left contains the auxiliary core classes wedefined for OIL.

Page 10: Creating Semantic Web Contents with Protégé-2000...Natalya F. Noy, Michael Sintek, Stefan Decker, Monica Crubézy, Ray .W Fergerson, and Mark A. Musen, Stanford University Because

• provide connection to external reasoningand inference resources;

• acquire the semantic data automaticallyfrom text; and

• present a graphical view of a set of inter-related resources.

Using Protégé to edit DAML+OILDAML+OIL, the Semantic Web language

that was heavily inspired by research indescription logics (DL), allows more typesof concept definitions in ontologies than Pro-tégé-2000 and RDFS do. The DL-inspiredlanguages usually include the following fea-tures in addition to the ones found in the tra-ditional frame-based languages:

• We can specify not only necessary but alsosufficient conditions for class membership.For example, if a wine is produced by awinery from the Bordeaux region, it is aBordeaux wine.

• We can use arbitrary Boolean expressionsin class and slot definitions to specifysuperclasses of a class, domain and rangeof a slot, and so on. For example, a spicyred-meat course must contain red meat andmust contain food that is itself spicy orfood containing something that is spicy.

• We can specify global slot properties. Forexample, location is a transitive property: ifthe Château Lafite Rothschild winery is in the Bor-deaux region and the Bordeaux region is inFrance, then the Château Lafite Rothschild wineryis in France.

• We can define global axioms that expressadditional properties of classes. For exam-ple, we can say that the classification ofthe class of all wines into the subclassesfor red, white, and rosé wines is disjoint:Each instance of the Wine class belongsonly to one of these subclasses.

We have adapted Protégé-2000 to work asan OIL editor. (The OIL language is a pre-cursor for DAML+OIL.) In doing so, we fol-lowed the same steps we described in creat-ing the Protégé-based RDFS editor. Inaddition, we have integrated external servicesfor OIL ontologies into Protégé-2000. Inte-grating DAML+OIL would require nearlyidentical steps.

The modeling levelWe introduce the new class and slot tem-

plates, OilClass and OilProperty, to specify com-plex class and slot definitions. As a result, a

class template, for example, acquires thesethree new fields (see Figure 8):

1. type—to specify whether the class defin-ition contains only necessary or bothnecessary and sufficient conditions forclass membership;

2. hasPropertyRestriction—to specify complexexpressions for slot restrictions; and

3. subclassOf—to specify complex expres-sions describing the position of the classin the class hierarchy.

To integrate OIL into Protégé-2000, we usedthe names from the RDFS serialization syntaxof (Standard-)OIL and not the plain ASCII ver-sion. See www.ontoknowledge.org/oil/ for thevarious syntaxes and versions.

Just as for RDFS, we use as many nativeProtégé-2000 mechanisms for modeling OILontologies as possible. If a new class is sim-ply a subclass of several existing classes inthe hierarchy, we use Protégé’s own notion ofsubclasses by placing the new class where itbelongs in the hierarchy. However, if a super-class definition requires boolean expres-sions—something Protégé-2000 does notallow—we use the subClassOf field that we seeon the template. Even though Protégé-2000does not understand the semantics of thisfield, we can represent this additional super-class information declaratively, and then passit to a classifier or simply save in OIL.

We use the hasPropertyRestriction field when weneed complex expressions or when we needto specify existential slot constraints: Protégé-2000 allows definition of value-type con-straints on slots (“All values of this slot mustbe instances of this class”). OIL allows exis-tential slot constraints in addition to the value-type constraints (“One value of this slot mustcome from this class and one value mustcome from that class”). We build the complexexpressions declaratively by creatinginstances of core auxiliary classes. We cansee some of these core classes in the tree inFigure 8. In the example, we specify a sub-class of a meal-course, spicy-red-meat-course, whichwe define as “a course that must contain redmeat and must contain food that is itself spicyor food containing something that is spicy.”

Even though Protégé-2000 does not sup-port some of the semantics that OIL has, wecan still encode the additional informationdeclaratively. Protégé-2000 will “ignore” theinformation, but it will be able to pass it onto a classifier or to encode it in OIL so that anOIL agent can understand it.

The user-interface levelApart from changing labels and rearrang-

ing fields on the forms for the OilClass and Oil-Property templates, we created a new slot widgetto allow easier editing of nested expressionssuch as the ones representing “food that isitself spicy or food containing something thatis spicy” in Figure 8. This widget augmentsthe standard Protégé-2000 widget for select-ing and creating values for instance-valuedslots with a display of the nested expressionsin a more practical form. A further extensionof this simple but effective slot widget caninclude a full context-sensitive, validatingexpression editor.

The back-end levelWe describe here an OIL back end that pro-

duces the RDFS output for OIL. Therefore,we can build largely on the existing RDFSback end. In defining the class and slot namesand the structure of the auxiliary core classesin the OIL editor, we have mainly adhered tothe RDFS specification of OIL. As a result,just using the RDFS back end, described ear-lier, gives us an output that is very close butnot identical to the RDFS OIL output that weneed. Thus, to create the OIL back end, westarted with the existing RDFS back end. Weadapted it to add the parts of definitions spec-ified by the native Protégé-2000 means to thecomplex class expressions.

The OIL back end encodes the conceptsthat Protégé-2000 has and OIL does not(global cardinality restrictions on slots, forexample) by defining additional statementsin a Protégé namespace. An OIL agent willnot understand these statements and willignore them, but Protégé-2000 will be able toextract the necessary information from them.

Because many Semantic Web languagesare in their infancy and already come in manydifferent versions, there is an alternativeapproach to developing specific back endsfor each of these versions. We can create ageneral RDF back end for Protégé-2000 andthen use a declarative and easily adaptableRDF transformation language for generatingthe desired outputs. Some research groupsare currently investigating such a back endand the corresponding RDF transformation(and query) language.

Accessing external services througha tab plug-in

The DL languages, such as OIL andDAML+OIL, traditionally rely on an infer-ence component—a classifier—to find the

MARCH/APRIL 2001 computer.org/intelligent 69

Page 11: Creating Semantic Web Contents with Protégé-2000...Natalya F. Noy, Michael Sintek, Stefan Decker, Monica Crubézy, Ray .W Fergerson, and Mark A. Musen, Stanford University Because

right position of a class in the class hierar-chy and to determine which class definitionsare unsatisfiable (cannot have any instances).Therefore, it is crucial to have a connectionto a DL classifier as part of the environmentfor editing OIL and DAML+OIL ontologies.Having created a set of definitions, we caninvoke the classifier to determine how theevolving class hierarchy looks. We can seethe effects that changes in class definitionswill have on the evolving hierarchy. We canimmediately check if logical expressionsdefining a class contradict one another mak-ing the class unsatisfiable.

Therefore, in order to create a full-fledgedProtégé-based OIL editor, we need to con-nect Protégé-2000 to such an inference com-ponent and present the results to the user. Weimplemented this connection as a Protégé-2000 tab plug-in.

Figure 9 shows the OIL tab in action. Ini-tially, the class hierarchy has the various meal-course subclasses as siblings. In addition, wespecify that an oyster-shellfish-course is a meal-course that has OYSTERS as the value for its FOOD slot;

a shellfish-course is a meal-course that has shellfishas its food, and so on. We then use the OIL tabto connect to a DL classifier, FaCT,6 and tohave it rearrange the class hierarchy accord-ing to the class definitions. In the resultinghierarchy, the oyster-shellfish-course class, forexample, is correctly classified as being asubclass of the shellfish-course class.

W ith the advent of the Semantic Web,the current network of online

resources is expanding from a set of staticdocuments designed mainly for human con-sumption to a new Web of dynamic docu-ments, services, and devices, which softwareagents will be able to understand. Develop-ers will likely create many different repre-sentation languages to embrace the hetero-geneous nature of these resources. Somelanguages will be used to describe specificdomains of knowledge; others will modelcapabilities of services and devices. These

languages will have different emphasis,scope, and expressive power.

Protégé-2000 provides full-fledged sup-port for knowledge modeling and acquisi-tion. Developers also can custom-tailor Pro-tégé-2000 quickly and easily to be an editorfor a new Semantic Web language. A Pro-tégé-based editor enables modeling at a con-ceptual level that allows developers to thinkin terms of concepts and relations in thedomain that they are modeling and not interms of the syntax of the final output.

By adapting Protégé-2000 to edit a newSemantic Web language rather than creatinga new editor from scratch or using a text edi-tor to create ontologies in the new language,we obtain a graphical, conceptual-levelontology editor and knowledge-acquisitiontool. We get a new editor to experiment withthe new language without investing manyresources into it. And we can use Protégé-2000 as an interchange module to translatemost of the models in other Semantic-Weblanguages into our new language and viceversa. In our experience, it takes a few days

70 computer.org/intelligent IEEE INTELLIGENT SYSTEMS

T h e S e m a n t i c W e b

OIL tab plugin

Figure 9. Tab plug-in for classification of OIL ontologies. On the right, we see a hierarchy of meal courses before classification. Themiddle pane shows interactions with the FaCT classifier. The hierarchy on the right is the one that the classifier computed.

Page 12: Creating Semantic Web Contents with Protégé-2000...Natalya F. Noy, Michael Sintek, Stefan Decker, Monica Crubézy, Ray .W Fergerson, and Mark A. Musen, Stanford University Because

to adapt Protégé-2000 to a new Semantic-Web language—a lot less time than isrequired to create any sophisticated softwarefrom scratch. We were able to create theseeditors even for a language like OIL, whichtakes a knowledge-modeling approach thatis different from the frame-based approachfor which Protégé originally was designed.The extensible and flexible knowledge modeland the open plug-in architecture of Protégé-2000 constitute the basis for developing asuite of conceptual-level editors for Seman-tic Web languages.

AcknowledgmentsFor more information about the Protégé project,

please visit http://protege.stanford.edu. A grantfrom Spawar, a grant from FastTrack Systems, andthe DARPA DAML program supported this work.

References

1. T. Berners-Lee, M. Fischetti, and M. Der-touzos, Weaving the Web: The OriginalDesign and Ultimate Destiny of the WorldWide Web by its Inventor, Harper, San Fran-cisco, 1999.

2. D. Brickley and R.V. Guha, “ResourceDescription Framework (RDF) Schema Spec-ification,” World Wide Web Consortium, Pro-posed Recommendation 1999, www.w3.org/TR/2000/CR-rdf-schema-20000327 (cur-rent 28 Mar. 2001).

3. J. Hendler and D.L. McGuinness, “TheDARPA Agent Markup Language, ” IEEEIntelligent Systems, vol. 16, no. 6, Jan./Feb.,2000, pp. 67–73.

4. N.F. Noy, R.W. Fergerson, and M.A. Musen,“The Knowledge Model of Protégé-2000:Combining Interoperability and Flexibility,”Proc. Knowledge Engineering and Knowl-edge Management: 12th Int’l Conf. (EKAW-2000), Lecture Notes in Artificial Intelligence,no. 1937, Springer-Verlag, Berlin, 2000,pp.17–32.

5. M.A. Musen et al., “Component-Based Sup-port for Building Knowledge-AcquisitionSystems, ” Proc. Conf. Intelligent Informa-tion Processing (IIP 2000) Int’l Federationfor Information Processing World ComputerCongress (WCC 2000), Beijing, China, 2000,http://smi-web.stanford.edu/pubs/SMI_Abstracts/SMI-2000-0838.html (current 28Mar. 2001).

6. I. Horrocks, “The FaCT system,” Proc. Auto-mated Reasoning with Analytic Tableaux andRelated Methods: Int’l Conf. Tableaux 98, Lec-ture Notes in Artificial Intelligence, no. 1397,Springer-Verlag, Berlin, 1998, pp. 307–312.

MARCH/APRIL 2001 computer.org/intelligent 71

Natalya F. Noy is a research scientist in the Stanford Medical Informaticslaboratory at Stanford University. Her interests include ontology developmentand evaluation, semantic integration of ontologies, and making ontology-development accessible to experts in noncomputer-science domains. She hasa BS in applied mathematics from Moscow State University, Russia, an MAin computer science from Boston University, and a PhD in computer sciencefrom Northeastern University in Boston. Contact her at Stanford MedicalInformatics, 251 Campus Dr., Stanford Univ., Stanford, CA 94305;[email protected].

Michael Sintek is a research scientist at the German Research Center forArtificial Intelligence. Currently, he is project leader of the FRODO projectwhere an RDF-based framework for building distributed organizational mem-ories is developed. He has a Diplom in computer science and economics fromthe University of Kaiserslautern. Contact him at the German Research Cen-ter for Artificial Intelligence (DFKI) GmbH, Knowledge Management Group,Postfach 2080, D-67608 Kaiserslautern, Germany; [email protected].

Stefan Decker is a postdoctoral fellow at the Department of Computer Sci-ence at Stanford University, where he works on Semantic Web Infrastruc-ture in the DARPA DAML program. His research interests include knowledgerepresentation and database systems for the Web, information integration,and ontology articulation and merging. He has a PhD in computer sciencefrom the University of Karlsruhe, Germany, where he worked on ontology-based access to information. Contact him at Stanford University, Gates Hall4A, Room 425, Stanford, CA 94305; [email protected].

Monica Crubézy is a postdoctoral researcher in the Stanford Medical Infor-matics laboratory at Stanford University. Her research focuses on the mod-eling and integration of libraries of problem-solving methods in the Protégéknowledge-based-system development framework. She graduated from theÉcole Polytechnique Féminine, France, in general engineering and computerscience. She has a PhD in computer science from the Institut National deRecherche en Informatique et Automatique in Sophia Antipolis, France. Con-tact her at Stanford Medical Informatics, 251 Campus Drive,Stanford Uni-versity, Stanford, CA 94305; [email protected].

Ray Fergerson is a programmer in the Stanford Medical Informatics labo-ratory at Stanford University. He has a BS in physics from the ColoradoSchool of Mines and a PhD in experimental nuclear physics from the Uni-versity of Texas in Austin. Contact him at Stanford Medical Informatics, 251Campus Drive, Stanford University, Stanford, CA 94305; [email protected].

Mark A. Musen’s biography appears in the Guest Editors’ Introduction on page 25.

T h e A u t h o r s