Agenda
Introduction
Levels of heterogeneity
Previous work in the field
PROMPT Suite of Tools
Prompt on Protégé
The Web of Data
CRS: Managing Co-references
Silk – A link discovery framework
Introduction
Can a single ontology suffice for various applications?
Definition: ontology mapping is the task of relating the vocabularies of two ontologies that share the same domain of discourse
Formally, it is a morphism: a collection of functions that assign the symbols used in one vocabulary to symbols in the other [1]
Such a mapping provides a common layer through which the ontologies can be accessed and can exchange information
Translation is different from mapping
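To make the "collection of functions" definition concrete, here is a minimal Python sketch that models a mapping as one function (a dict) per kind of symbol. The vocabularies, the split into concepts and relations, and the helpers `is_total_mapping` and `translate` are hypothetical illustrations; requiring every source symbol to be mapped is a simplifying assumption of this sketch.

```python
# Hypothetical vocabularies of two ontologies over the same domain,
# split by kind of symbol.
SOURCE = {"concepts": {"Person", "Trial"}, "relations": {"enrolledIn"}}
TARGET = {"concepts": {"Human", "ClinicalTrial", "Site"}, "relations": {"participatesIn"}}

# The mapping: one function (here a dict) per kind of symbol,
# assigning each source symbol to a target symbol.
MAPPING = {
    "concepts": {"Person": "Human", "Trial": "ClinicalTrial"},
    "relations": {"enrolledIn": "participatesIn"},
}


def is_total_mapping(mapping, source, target):
    """True if every source symbol is mapped to a target symbol of the same kind."""
    for kind, symbols in source.items():
        fn = mapping.get(kind, {})
        if set(fn) != symbols:
            return False  # a symbol is unmapped, or an unknown symbol is mapped
        if not set(fn.values()) <= target.get(kind, set()):
            return False  # a symbol is sent outside the target vocabulary
    return True


def translate(mapping, kind, symbol):
    """Apply the mapping function for one kind of symbol."""
    return mapping[kind][symbol]


print(is_total_mapping(MAPPING, SOURCE, TARGET))  # True
print(translate(MAPPING, "concepts", "Person"))   # Human
```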
Introduction
An analogy to the problem: clocks
Levels of Heterogeneity in Ontologies
Syntactic
Structural
Semantic
Mapping discovery
The first approach is to use a shared reference ontology
Example: the upper ontologies SUMO and DOLCE
What if no shared ontology is available?
Structural and definitional information can then be used to discover mappings
Example tools: IF-Map [4], QOM, MAFRA and PROMPT
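With the reference-ontology approach, a direct mapping between two ontologies can be derived by composing one ontology's mapping into the reference with the inverse of the other's. The sketch below assumes each mapping is one-to-one; the term names are hypothetical and are not actual SUMO or DOLCE terms.

```python
def compose_via_reference(o1_to_ref, o2_to_ref):
    """Derive O1 -> O2 correspondences through a shared reference ontology.

    Assumes both mappings are one-to-one, so the O2 mapping can be inverted.
    """
    ref_to_o2 = {ref: term for term, ref in o2_to_ref.items()}
    # Keep only O1 terms whose reference term is also covered by O2.
    return {t1: ref_to_o2[ref] for t1, ref in o1_to_ref.items() if ref in ref_to_o2}


o1_to_ref = {"Physician": "Doctor", "Drug": "Medication", "Ward": "Room"}
o2_to_ref = {"MD": "Doctor", "Medicine": "Medication"}
print(compose_via_reference(o1_to_ref, o2_to_ref))
# {'Physician': 'MD', 'Drug': 'Medicine'}
```

Terms with no counterpart on the other side (here `Ward`) simply stay unmapped, which is where structural and definitional techniques have to take over.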
PROMPT Suite of Tools
Interactive tools for ontology merging and mapping
Ontology: a formal specification of domain information that facilitates knowledge sharing and reuse
Different ontologies may overlap and need to be reconciled
Determine correlations
Find all concepts
Determine similarities
Change the source ontologies or remove the overlap
Record the mapping for future reference
Ontology Management
Tasks:
Finding correlations
Merging ontologies
Version management
Factoring ontologies
Tools benefit from being tightly integrated into a single framework:
Uniform user interface
Same interaction paradigms
Easy access from one tool to another
PROMPT Knowledge Model
Based on the knowledge model of Protégé
Frame-based; types of frames:
Class: a set of entities specifying a concept
Slots: attributes of a class; each slot has a domain and a range, and slot names must be unique
Instances: elements of a class
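A minimal Python sketch of the three frame types, using plain dataclasses. This is not Protégé's API: `KnowledgeBase`, `add_slot`, `add_instance` and the field names are hypothetical, and the domain check ignores inheritance for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Cls:
    """A class: a set of entities specifying a concept."""
    name: str
    parents: list = field(default_factory=list)


@dataclass
class Slot:
    """An attribute of a class, with a domain and a range."""
    name: str
    domain: str       # class the slot is attached to
    value_range: str  # type or class of allowed values


@dataclass
class Instance:
    """An element of a class, holding slot values."""
    name: str
    of_class: str
    values: dict = field(default_factory=dict)


class KnowledgeBase:
    """Hypothetical container for the three kinds of frames."""

    def __init__(self):
        self.classes, self.slots, self.instances = {}, {}, {}

    def add_class(self, cls):
        self.classes[cls.name] = cls

    def add_slot(self, slot):
        # Slot names must be unique.
        if slot.name in self.slots:
            raise ValueError(f"duplicate slot name: {slot.name}")
        self.slots[slot.name] = slot

    def add_instance(self, inst):
        # An instance may only use slots whose domain is its class
        # (inheritance is ignored in this sketch).
        for slot_name in inst.values:
            if self.slots[slot_name].domain != inst.of_class:
                raise ValueError(f"slot {slot_name} not defined for {inst.of_class}")
        self.instances[inst.name] = inst


kb = KnowledgeBase()
kb.add_class(Cls("Person"))
kb.add_slot(Slot("name", domain="Person", value_range="String"))
kb.add_instance(Instance("p1", "Person", {"name": "Alice"}))
```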
PROMPT Framework
Tools for multiple-ontology management
An extension to the Protégé ontology-editing environment
Open architecture allows easy extension with plugins
Tools in PROMPT:
IPROMPT: an interactive ontology-merging tool
ANCHORPROMPT: a graph-based tool for finding similarities between ontologies
PROMPTDIFF: a tool for finding a diff between two versions of the same ontology
PROMPTFACTOR: a tool for extracting a part of an ontology
IPROMPT
Interactive ontology-merging tool
Leads the user through the merging process
Makes suggestions for merging
Identifies inconsistencies and potential problems
Suggests strategies for resolving them
Uses the structure of concepts and their relations, along with user input
Decisions are based on local context
Iterative
IPROMPT Algorithm
Creates initial suggestions based on the lexical similarity of names
Two ontologies O1 and O2 are merged to form Om; the merged ontology contains frames that are similar to frames in the input ontologies
Merging decisions are designer- and task-dependent
A set of knowledge-based operations is defined; for each operation, IPROMPT:
Performs changes automatically
Makes new merging suggestions
Identifies inconsistencies and potential problems
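The initial, name-based suggestions can be approximated in a few lines of Python. The sketch uses the standard library's `difflib.SequenceMatcher` after normalizing case and separators; the normalization and the 0.8 threshold are assumptions, not IPROMPT's actual matching rules.

```python
from difflib import SequenceMatcher


def normalize(name):
    # Compare names case-insensitively, ignoring common separators.
    return name.lower().replace("-", "").replace("_", "").replace(" ", "")


def initial_suggestions(o1_frames, o2_frames, threshold=0.8):
    """Return (frame_in_O1, frame_in_O2, score) merge suggestions, best first."""
    suggestions = []
    for a in o1_frames:
        for b in o2_frames:
            score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
            if score >= threshold:
                suggestions.append((a, b, round(score, 2)))
    return sorted(suggestions, key=lambda s: s[2], reverse=True)


o1 = ["STUDY-SITE", "PERSON", "PROTOCOL"]
o2 = ["StudySite", "Person", "Design"]
print(initial_suggestions(o1, o2))
```

Only the user decides which suggestions to accept; the score just orders them.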
IPROMPT Operations
Merge classes
Merge slots
Merge instances
Shallow copy of a class: copies the class from the source ontology to the merged ontology
Deep copy of a class: also copies all the parents of the class, up to the root of the hierarchy
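A sketch of the two copy operations on a class hierarchy represented as a dict from class name to its list of parents. The representation and function names are hypothetical. Note how a shallow copy can leave a parent reference pointing at a class that was never copied.

```python
def shallow_copy_class(cls, source, merged):
    """Copy only `cls`; parents not already in `merged` become dangling references."""
    entry = merged.setdefault(cls, [])
    for parent in source.get(cls, []):
        if parent not in entry:
            entry.append(parent)
    return merged


def deep_copy_class(cls, source, merged):
    """Copy `cls` and all of its ancestors, up to the root of the hierarchy."""
    visited, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current in visited:
            continue
        visited.add(current)
        shallow_copy_class(current, source, merged)
        stack.extend(source.get(current, []))
    return merged


source = {"Trial": ["Study"], "Study": ["Activity"], "Activity": []}
print(shallow_copy_class("Trial", source, {}))  # {'Trial': ['Study']}
print(deep_copy_class("Trial", source, {}))
```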
Inconsistencies & Potential Problems
Name conflicts
Dangling references
Redundancy in the class hierarchy
Slot values violating slot-value restrictions
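Two of these checks can be sketched over the same dict-of-parents representation (hypothetical, as are the function names): a dangling reference is a parent that does not exist in the merged ontology, and a redundant superclass is a direct parent that is already inherited through another direct parent.

```python
def ancestors(cls, hierarchy):
    """All transitive parents of `cls`."""
    result, stack = set(), list(hierarchy.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in result:
            result.add(parent)
            stack.extend(hierarchy.get(parent, []))
    return result


def find_problems(hierarchy):
    """Report dangling references and redundant superclasses.

    hierarchy maps each class of the merged ontology to its list of parents.
    """
    problems = []
    for cls, parents in hierarchy.items():
        for p in parents:
            # Dangling reference: the parent was never copied into the merged ontology.
            if p not in hierarchy:
                problems.append(("dangling reference", cls, p))
            # Redundancy: p is already inherited through another direct parent.
            elif any(p in ancestors(q, hierarchy) for q in parents if q != p):
                problems.append(("redundant superclass", cls, p))
    return problems


merged = {"Trial": ["Study", "Activity"], "Study": ["Activity"], "Activity": [],
          "Site": ["Location"]}
print(find_problems(merged))
```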
Additional features
Setting a preferred ontology
Maintaining user focus
Providing feedback to user
Logging of ontology merging and editing operations
ANCHORPROMPT
Graph-based tool for finding similarities
Compares larger portions of the ontologies
Goal: augment IPROMPT by determining additional points of similarity
Input: anchors, a set of pairs of related terms
Anchor identification: manual or automatic
Each ontology is viewed as a directed labeled graph
Algorithm
Begins with anchor pairs:
TRIAL, Trial
PERSON, Person
Path 1: TRIAL -> PROTOCOL -> STUDY-SITE -> PERSON
Path 2: Trial -> Design -> Blinding -> Person
Determine a similarity score for pairs of related terms
If two pairs of terms from the source ontologies are similar and there are paths connecting the terms, then the elements in those paths are often similar as well (here, PROTOCOL and Design, and STUDY-SITE and Blinding)
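A simplified sketch of the path-based scoring in Python: for each pair of paths between anchor pairs, terms at the same position get their similarity score incremented, so pairs that recur on many paths accumulate higher scores. Considering only equal-length paths and using a unit increment are simplifying assumptions; the second path pair in the usage is hypothetical.

```python
from collections import Counter


def score_paths(path_pairs):
    """Accumulate similarity scores for terms at the same position in paired paths.

    path_pairs: list of (path_in_O1, path_in_O2); each path runs between anchors.
    """
    scores = Counter()
    for p1, p2 in path_pairs:
        if len(p1) != len(p2):
            continue  # this sketch only compares paths of equal length
        # Skip the endpoints: they are the anchors themselves.
        for a, b in zip(p1[1:-1], p2[1:-1]):
            scores[(a, b)] += 1
    return scores


path1 = ["TRIAL", "PROTOCOL", "STUDY-SITE", "PERSON"]
path2 = ["Trial", "Design", "Blinding", "Person"]
# A second, hypothetical pair of paths between the same anchors.
path3 = ["TRIAL", "PROTOCOL", "PERSON"]
path4 = ["Trial", "Design", "Person"]
print(score_paths([(path1, path2), (path3, path4)]))
```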
PROMPTDIFF
Tool for comparing ontology versions
Version comparison for software code is based on comparing text files
But the same ontology can have different text representations, so a textual diff is not adequate
A heuristic algorithm that produces a structural diff between two versions
Compares the structure of the two ontology versions
Identifies which frames changed and what changes were made
PromptDiff Algorithm
An extensible set of heuristic matchers
A fixed-point algorithm combines the results of the matchers to produce a structural diff between the two versions
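The fixed-point idea can be sketched in Python: each matcher proposes pairs of corresponding frames given the pairs found so far, and the loop stops once a full pass adds nothing new. The two matchers (identical names; a lone unmatched child under matched parents) are hypothetical simplifications, and each version is reduced to a map from class to its single parent.

```python
def name_matcher(old, new, matches):
    """Pair frames whose names are identical in both versions."""
    return {(a, a) for a in old if a in new}


def single_child_matcher(old, new, matches):
    """If matched parents each have exactly one unmatched child, pair those children."""
    matched_old = {a for a, _ in matches}
    matched_new = {b for _, b in matches}
    found = set()
    for a, b in matches:
        kids_a = [c for c, p in old.items() if p == a and c not in matched_old]
        kids_b = [c for c, p in new.items() if p == b and c not in matched_new]
        if len(kids_a) == 1 and len(kids_b) == 1:
            found.add((kids_a[0], kids_b[0]))
    return found


def structural_diff(old, new, matchers):
    """Run the matchers repeatedly until none adds a new pair (a fixed point)."""
    matches = set()
    while True:
        size_before = len(matches)
        for matcher in matchers:
            matches |= matcher(old, new, matches)
        if len(matches) == size_before:
            break
    matched_old = {a for a, _ in matches}
    matched_new = {b for _, b in matches}
    return {
        "renamed": {(a, b) for a, b in matches if a != b},
        "deleted": set(old) - matched_old,
        "added": set(new) - matched_new,
    }


# Each version maps a class to its single parent (None for the root).
old = {"Thing": None, "Trial": "Thing", "Site": "Trial"}
new = {"Thing": None, "ClinicalTrial": "Thing", "StudySite": "ClinicalTrial",
       "Drug": "StudySite"}
print(structural_diff(old, new, [name_matcher, single_child_matcher]))
```

The renaming of `Site` is only found on the second pass, after `Trial` and `ClinicalTrial` have been paired; this is why the matchers are iterated to a fixed point rather than run once.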
PROMPTFACTOR
Tool for factoring out a semantically independent part of a large ontology into a new sub-ontology
Ensures that severed links do not introduce ill-defined concepts in the sub-ontology
The user specifies the concepts of interest
Computes the transitive closure of the superclass relation and of all the relations defined by slots
The target ontology works stand-alone
PromptFactor Algorithm
The user specifies the concept of interest
PromptFactor traverses the ontology starting from that term
Determines the transitive closure of all relations, including the subclass-of relation
Determines all the parents of the selected term in the hierarchy
User-interactive
Determines inconsistencies
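The closure step can be sketched as a breadth-first traversal that treats `subclass_of` like any other relation. The ontology encoding (concept to `{relation: [targets]}`) and the function name `factor` are hypothetical; the user-interactive and consistency-checking parts are left out.

```python
from collections import deque


def factor(ontology, seeds):
    """Extract the sub-ontology reachable from the seed concepts.

    ontology maps each concept to {relation: [target concepts]}; "subclass_of"
    is treated like any other relation, so all superclasses are pulled in.
    """
    selected = set(seeds)
    queue = deque(seeds)
    while queue:  # breadth-first traversal = transitive closure of all relations
        concept = queue.popleft()
        for targets in ontology.get(concept, {}).values():
            for target in targets:
                if target not in selected:
                    selected.add(target)
                    queue.append(target)
    # Every referenced concept is in `selected`, so no link is left dangling.
    return {c: ontology.get(c, {}) for c in selected}


onto = {
    "Trial": {"subclass_of": ["Study"], "has_site": ["Site"]},
    "Study": {"subclass_of": ["Activity"]},
    "Site": {"subclass_of": ["Location"]},
    "Activity": {}, "Location": {},
    "Drug": {"subclass_of": ["Substance"]}, "Substance": {},
}
sub = factor(onto, ["Trial"])
print(sorted(sub))  # ['Activity', 'Location', 'Site', 'Study', 'Trial']
```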
Prompt Demo
PROMPT is available as a plug-in for Protégé 3.4
Uses linguistic similarity matches between concepts
Also matches slot names and slot value types
Where automation is not possible, user intervention is needed; possible actions are suggested
Alignment is followed by merging
Alignment establishes links between the ontologies
Merging creates a single coherent ontology
The Web of Data
Data sources span a large range of domains
The RDF data model is used to publish structured data on the web
Explicit RDF links connect entities in different data sources
However, there is a lack of tools for setting RDF links to other data sources
Silk
Silk is a link discovery framework with its own link specification language
It lets users specify which links should be discovered between data sources, and the conditions entities must fulfil to be linked
Link conditions are built from similarity metrics, whose scores can be combined using aggregation functions
Data is accessed using SPARQL
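The shape of a link condition can be illustrated in Python; this is not Silk-LSL syntax. Two per-property similarity scores are combined with a weighted-average aggregation and compared to a threshold. The property names `label` and `population`, the weights, the threshold, and the records are hypothetical, and `difflib`'s ratio stands in for a real string metric.

```python
from difflib import SequenceMatcher


def string_similarity(a, b):
    # Stand-in string metric in [0, 1] (difflib's ratio, not a Silk metric).
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def numeric_similarity(a, b, max_diff):
    # 1.0 for equal values, falling linearly to 0.0 at max_diff apart.
    return max(0.0, 1.0 - abs(a - b) / max_diff)


def weighted_average(scored):
    """Aggregation function: weighted mean of (score, weight) pairs."""
    total_weight = sum(w for _, w in scored)
    return sum(s * w for s, w in scored) / total_weight


def link_condition(source, target, threshold=0.9):
    """True if the two entities are similar enough to be linked (e.g. by owl:sameAs)."""
    score = weighted_average([
        (string_similarity(source["label"], target["label"]), 2),
        (numeric_similarity(source["population"], target["population"], 100_000), 1),
    ])
    return score >= threshold


# Illustrative, made-up records.
city_a = {"label": "Berlin", "population": 3_400_000}
city_b = {"label": "berlin", "population": 3_410_000}
city_c = {"label": "Bern", "population": 130_000}
print(link_condition(city_a, city_b))  # True
print(link_condition(city_a, city_c))  # False
```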
Silk Features
Support for owl:sameAs links and other types of RDF links
Provides a declarative language to specify link conditions
Datasets need not be replicated locally
Caching, indexing and entity pre-selection are used to enhance performance
Silk similarity metrics
Similarity metrics can be combined using aggregation functions
Sets of resources can be selected using the Silk RDF path selector language
Silk Pre-Matching
Comparing every entity in source S with every entity in target T requires O(|S|*|T|) comparisons
Pre-matching finds, for each source entity, a limited set of target entities that are likely to match it
This is done by indexing the target resources on their property values
As long as each candidate set stays small, this reduces the runtime to O(|S| + |T|)
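A minimal sketch of pre-matching with a token-based inverted index: the index over the targets is built once, and each source entity is then compared only against targets sharing a token with it. The tokenization and the names `build_index` and `candidates` are assumptions for illustration, not Silk's actual indexing scheme.

```python
from collections import defaultdict


def tokens(value):
    """Lower-cased whitespace tokens of a property value."""
    return set(value.lower().split())


def build_index(targets, prop):
    """Inverted index: token -> URIs of target entities whose `prop` contains it."""
    index = defaultdict(set)
    for uri, entity in targets.items():
        for tok in tokens(entity[prop]):
            index[tok].add(uri)
    return index


def candidates(source_entity, index, prop):
    """Target URIs sharing at least one token with the source entity's value."""
    result = set()
    for tok in tokens(source_entity[prop]):
        result |= index.get(tok, set())
    return result


targets = {"t1": {"label": "Berlin"}, "t2": {"label": "Bern"}, "t3": {"label": "New York"}}
idx = build_index(targets, "label")     # built once: O(|T|)
print(candidates({"label": "berlin"}, idx, "label"))  # {'t1'}
```

Only the candidates are then scored with the full link condition. Exact-token matching like this would miss near-duplicates such as spelling variants, which is the trade-off for the speed-up.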
Managing Co-references
Semantic web vision: large quantities of information that are readily available, interlinked and machine-readable
In reality, the web is fragmented, with significant overlap between sources
'Duplicates' need to be identified
Co-reference resolution: determining "equivalent" URIs
Co-reference Resolution Service (CRS)
A systematic analysis and heuristic-based approach to identifying, publishing, managing and using co-reference information
The most prevalent way to state co-reference is owl:sameAs
Equivalence is context-dependent
CRSes
Maintain sets of equivalent URIs
Store co-reference data separately: a URI's definition and its synonyms are kept apart
Management techniques: history, rollback, annotation
Applications can use multiple CRSes
Core functionality is written in PHP for easy integration
Backed by MySQL
Data representation in CRS
Equivalent URIs are stored in bundles
One URI in each bundle is designated the canon, the preferred URI
Formation of bundles:
Check whether the URI already exists in any bundle
If not, create a 'singleton' bundle for the new URI
Perform a merge: the union of the bundles containing the "equivalent" URIs
The constituent bundles that were merged are marked inactive
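The bundle-formation steps can be sketched as an in-memory Python class; the real CRS stores bundles in MySQL. `CoreferenceStore` and its methods are hypothetical, and the rule that the larger bundle's canon survives a merge is an assumed policy, not one stated in the source. The URIs are taken from the bundle example in this section.

```python
class CoreferenceStore:
    """Hypothetical in-memory model of CRS bundles."""

    def __init__(self):
        self.bundles = {}    # bundle id -> {"canon": str, "uris": set, "active": bool}
        self.bundle_of = {}  # uri -> id of the active bundle that contains it
        self._next_id = 0

    def _new_bundle(self, canon, uris):
        bid = self._next_id
        self._next_id += 1
        self.bundles[bid] = {"canon": canon, "uris": set(uris), "active": True}
        for uri in uris:
            self.bundle_of[uri] = bid
        return bid

    def _bundle_for(self, uri):
        # Reuse the URI's bundle if it exists; otherwise create a singleton bundle.
        if uri not in self.bundle_of:
            return self._new_bundle(uri, {uri})
        return self.bundle_of[uri]

    def assert_equivalent(self, uri_a, uri_b):
        a, b = self._bundle_for(uri_a), self._bundle_for(uri_b)
        if a == b:
            return a
        ba, bb = self.bundles[a], self.bundles[b]
        # Assumed policy: the larger bundle's canon stays the preferred URI.
        canon = ba["canon"] if len(ba["uris"]) >= len(bb["uris"]) else bb["canon"]
        merged = self._new_bundle(canon, ba["uris"] | bb["uris"])
        # The constituent bundles are kept but marked inactive.
        ba["active"] = bb["active"] = False
        return merged

    def canon(self, uri):
        return self.bundles[self.bundle_of[uri]]["canon"]

    def duplicates(self, uri):
        return self.bundles[self.bundle_of[uri]]["uris"]


SOTON = "http://southampton.rkbexplorer.com/id/person-00021"
ACM = "http://acm.rkbexplorer.com/id/person-102898"
DBLP = "http://dblp.rkbexplorer.com/id/people-27aedbcb"

store = CoreferenceStore()
store.assert_equivalent(SOTON, ACM)
store.assert_equivalent(DBLP, ACM)
print(store.canon(DBLP))  # http://southampton.rkbexplorer.com/id/person-00021
```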
Data representation
Data storage: indexed tables of hashed URIs
Permits fast lookup to find:
The canon of a given URI
All URIs in a bundle
URIs are deprecated using flags
Finding all equivalences: follow the coref:coreferenceData link to the bundle for a URI, and recursively repeat the process for each URI in that bundle
<rdf:RDF xmlns:coref="http://www.rkbexplorer.com/ontologies/coref#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <coref:Bundle>
    <coref:canon rdf:resource="http://southampton.rkbexplorer.com/id/person-00021"/>
    <coref:duplicate rdf:resource="http://acm.rkbexplorer.com/id/person-102898"/>
    <coref:duplicate rdf:resource="http://citeseer.rkbexplorer.com/id/resource-CSP109002"/>
    <coref:duplicate rdf:resource="http://dblp.rkbexplorer.com/id/people-27aedbcb"/>
    <coref:duplicate rdf:resource="http://eprints.rkbexplorer.com/id/kfupm/person-27aed0c1"/>
    <coref:duplicate rdf:resource="http://southampton.rkbexplorer.com/id/person-00021"/>
    <coref:duplicate rdf:resource="http://wiki.rkbexplorer.com/id/hugh_glaser"/>
    <coref:lastUpdated>2009-01-16 11:11:40</coref:lastUpdated>
  </coref:Bundle>
</rdf:RDF>
RDF description of equivalent URIs in a bundle
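Following coref:coreferenceData links recursively through several CRSes amounts to a transitive closure over their bundles. The sketch below models each CRS as a Python dict from a URI to its bundle, a hypothetical stand-in for a remote lookup, and uses an explicit work list instead of recursion; the `found` set keeps it from revisiting URIs.

```python
def all_equivalents(uri, services):
    """Transitively collect URIs equivalent to `uri` across several CRSes.

    services: list of dicts, each mapping a URI to its bundle (a set of URIs) in one CRS.
    """
    found = {uri}
    pending = [uri]
    while pending:
        current = pending.pop()
        for crs in services:
            for other in crs.get(current, set()):
                if other not in found:
                    found.add(other)
                    pending.append(other)
    return found


crs1 = {"a": {"a", "b"}, "b": {"a", "b"}}
crs2 = {"b": {"b", "c"}, "c": {"b", "c"}}
print(sorted(all_equivalents("a", [crs1, crs2])))  # ['a', 'b', 'c']
```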
Ways to speed up lookup:
Look up only one URI from each CRS
Follow only the coref:canon predicate
A lookup would then need O(log|S| + log|T|)
References
[1] Natalya F. Noy and Mark A. Musen. The PROMPT Suite: Interactive Tools for Ontology Merging and Mapping. Stanford Medical Informatics, Stanford University.
[2] Hugh Glaser, Afraz Jaffri and Ian C. Millard. Managing Co-reference on the Semantic Web. School of Electronics and Computer Science, University of Southampton, Southampton, Hampshire, UK.
[3] Yannis Kalfoglou and Marco Schorlemmer. Ontology Mapping: The State of the Art.
[4] Yannis Kalfoglou and Marco Schorlemmer (2003a). IF-Map: An Ontology Mapping Method Based on Information Flow Theory. Journal on Data Semantics, 1(1):98–127.
[5] Julius Volz, Christian Bizer et al. Silk – A Link Discovery Framework for the Web of Data.