
Ontology Alignment/Matching

Prafulla Palwe

Agenda

► Introduction
  Being serious about the semantic web
  Living with heterogeneity
  Heterogeneity problem
  I have a plan for you

► Matching Problem
  Matching Operation
  Motivation
  Schema Matching Vs Ontology Matching
  Correspondence
  Alignment

► Matching Process
  Sequential composition
  Parallel composition

► Application Domains
  Traditional
  Emergent

► Classification
  Matching Dimensions

► Basic Techniques
  Element Level
  Structure Level

► Summary and Challenges

Introduction

► Being serious about the semantic web:

It is not one guy's ontology. It is not several guys' common ontology. It is many guys' and girls' many ontologies. So it is a mess, but a meaningful mess.

► Living with heterogeneity. The semantic web will be:

  ► Huge
  ► Dynamic
  ► Heterogeneous

These are not bugs; they are features. We must learn to live with them.

► Heterogeneity problem: resources expressed in different ways must be reconciled before they can be used. Mismatches between formalized knowledge can occur when:

  ► different languages are used;
  ► different terminologies are used;
  ► different modeling is used.

► I have a plan for you: reconciliation

Matching Problem

► Matching Operation

Definition: the matching operation takes as input ontologies, each consisting of a set of discrete entities (e.g., tables, XML elements, classes, properties), and determines as output the relationships (e.g., equivalence, subsumption) holding between these entities.
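To make the input and output of the matching operation concrete, here is a minimal Python sketch. It assumes each ontology is reduced to a set of entity names, and the hypothetical `match` function only detects equivalence through case-insensitive name equality; it illustrates the shape of the operation, not a realistic matcher.

```python
from typing import NamedTuple


class Relationship(NamedTuple):
    """One output relationship: (entity of O, entity of O', relation)."""
    source: str
    target: str
    relation: str  # only "=" (equivalence) is produced in this sketch


def match(onto1: set[str], onto2: set[str]) -> set[Relationship]:
    """Toy matching operation over entity names: declares equivalence when
    two names coincide after case-folding."""
    index = {name.casefold(): name for name in onto2}
    return {
        Relationship(e, index[e.casefold()], "=")
        for e in onto1
        if e.casefold() in index
    }


print(sorted(match({"Book", "Author", "Publisher"}, {"book", "Writer", "publisher"})))
```

Note that `Author` and `Writer` are not matched: pure name comparison misses synonyms, which is why the techniques later in these notes are combined.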

► Motivation: 2 XML schemas; 2 ontologies


► Schema mapping Vs ontology mapping

Differences:

  ► Schemas often do not provide explicit semantics for their data.
    Relational schemas provide no generalization.
  ► Ontologies are logical systems that constrain the meaning.
    An ontology is defined as a set of logical axioms.

Commonalities:

  ► Schemas and ontologies provide a vocabulary of terms that describes the domain of interest.
  ► Schemas and ontologies constrain the meaning of terms used in the vocabulary.

► Correspondence

Definition: given 2 ontologies O and O’, a correspondence between O and O’ is a 5-tuple <id, e, e’, R, n> such that:

  ► id is a unique identifier of the correspondence;
  ► e and e’ are entities of O and O’, respectively (e.g., XML elements, classes);
  ► R is a relation (e.g., equivalence (=), disjointness (⊥));
  ► n is a confidence measure in some mathematical structure (typically in the [0, 1] range).
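The 5-tuple can be written down directly as a data type. This sketch rests on assumptions: entities are plain strings (the `o1#Book` style names are hypothetical), relations are symbols such as `"="` or `"⊥"`, and the confidence is restricted to the [0, 1] range, which the definition only names as the typical case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correspondence:
    """A correspondence <id, e, e', R, n> between ontologies O and O'."""
    id: str            # unique identifier of the correspondence
    e: str             # entity of O
    e_prime: str       # entity of O'
    relation: str      # R, e.g. "=" (equivalence) or "⊥" (disjointness)
    confidence: float  # n; this sketch assumes the [0, 1] range

    def __post_init__(self) -> None:
        # Reject confidences outside the assumed [0, 1] range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence {self.confidence} is outside [0, 1]")


c = Correspondence("c1", "o1#Book", "o2#Volume", "=", 0.85)
print(c)
```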

► Alignment

Definition: given 2 ontologies O and O’, an alignment A between O and O’:

  ► is a set of correspondences on O and O’;
  ► has some cardinality: 1-1, 1-*, etc.;
  ► carries some additional metadata (method, date, properties, etc.).
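An alignment then bundles a set of correspondences with metadata. In this sketch, correspondences are simplified to `(e, e', relation, confidence)` tuples, and the hypothetical `cardinality` helper reports the observed cardinality under an assumed convention read like "one-to-many": `1-*` means an entity of O may correspond to several entities of O', while each entity of O' corresponds to at most one entity of O.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Alignment:
    """An alignment A between O and O': correspondences plus metadata."""
    # Each correspondence is simplified to (e, e_prime, relation, confidence).
    correspondences: set[tuple[str, str, str, float]] = field(default_factory=set)
    # Additional metadata, e.g. {"method": "...", "date": "..."}.
    metadata: dict[str, str] = field(default_factory=dict)

    def cardinality(self) -> str:
        """Observed cardinality 'x-y'. x is '1' if every O' entity appears in
        at most one correspondence, y is '1' if every O entity does;
        otherwise the side is '*'."""
        per_o = Counter(e for e, _, _, _ in self.correspondences)
        per_o_prime = Counter(e2 for _, e2, _, _ in self.correspondences)
        x = "1" if all(n == 1 for n in per_o_prime.values()) else "*"
        y = "1" if all(n == 1 for n in per_o.values()) else "*"
        return f"{x}-{y}"


a = Alignment(
    {("Person", "Man", ">", 0.9), ("Person", "Woman", ">", 0.9)},
    {"method": "manual"},
)
print(a.cardinality())  # 1-*
```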

Matching Process


► General Basic Matching Process

► Sequential Composition
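Sequential composition chains matchers so that the alignment produced by one becomes the input alignment of the next. The sketch below assumes ontologies reduced to sets of entity names and alignments stored as `{(e, e'): confidence}` dicts; both matchers (`name_matcher`, `prefix_matcher`) are hypothetical toys, and the parameters and external resources that a full matching process would also take are omitted.

```python
from typing import Callable

# Alignment: {(entity of O, entity of O'): confidence}
Alignment = dict[tuple[str, str], float]
# Matcher: (O, O', input alignment) -> output alignment
Matcher = Callable[[set[str], set[str], Alignment], Alignment]


def name_matcher(o1: set[str], o2: set[str], a_in: Alignment) -> Alignment:
    """Hypothetical first matcher: case-insensitive name equality (confidence 1.0)."""
    out = dict(a_in)
    for e in o1:
        for f in o2:
            if e.lower() == f.lower():
                out[(e, f)] = 1.0
    return out


def prefix_matcher(o1: set[str], o2: set[str], a_in: Alignment) -> Alignment:
    """Hypothetical second matcher: only considers entities the input alignment
    left unmatched, and pairs them when one name is a prefix of the other."""
    done1 = {e for e, _ in a_in}
    done2 = {f for _, f in a_in}
    out = dict(a_in)
    for e in o1 - done1:
        for f in o2 - done2:
            if f.lower().startswith(e.lower()) or e.lower().startswith(f.lower()):
                out[(e, f)] = 0.5
    return out


def sequential(*matchers: Matcher) -> Matcher:
    """Sequential composition: each matcher receives the previous one's output."""
    def composed(o1: set[str], o2: set[str], a_in: Alignment) -> Alignment:
        a = a_in
        for m in matchers:
            a = m(o1, o2, a)
        return a
    return composed


pipeline = sequential(name_matcher, prefix_matcher)
print(pipeline({"Book", "Net"}, {"book", "Network"}, {}))
# {('Book', 'book'): 1.0, ('Net', 'Network'): 0.5}
```

Because `prefix_matcher` skips entities already aligned, the order of composition matters: the cheaper, more reliable matcher runs first.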

► Parallel Composition
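In parallel composition, several matchers score the same pairs independently and their results are aggregated afterwards. This sketch assumes each basic matcher is a name-similarity function returning a value in [0, 1]; `exact` and `common_prefix` are hypothetical examples, and a weighted average is used as the aggregation step.

```python
from typing import Callable

Similarity = Callable[[str, str], float]


def exact(a: str, b: str) -> float:
    """Hypothetical basic matcher 1: 1.0 on case-insensitive equality."""
    return 1.0 if a.lower() == b.lower() else 0.0


def common_prefix(a: str, b: str) -> float:
    """Hypothetical basic matcher 2: shared-prefix length over the longer length."""
    a, b = a.lower(), b.lower()
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n / max(len(a), len(b), 1)


def parallel(matchers: list[Similarity], weights: list[float]) -> Similarity:
    """Parallel composition: run every matcher on the same pair, then
    aggregate the scores with a weighted average."""
    total = sum(weights)

    def combined(a: str, b: str) -> float:
        return sum(w * m(a, b) for m, w in zip(matchers, weights)) / total

    return combined


sim = parallel([exact, common_prefix], [0.5, 0.5])
print(round(sim("Net", "Network"), 3))  # 0.214
print(sim("Book", "book"))              # 1.0
```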

► Similarity filter, alignment extractor and alignment filter
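The slide names these three steps without defining them; the sketch below is one plausible reading, not the deck's definition. The similarity filter discards pairs under a threshold, the extractor greedily turns the remaining similarities into a 1-1 alignment, and the alignment filter drops correspondences whose confidence is below a final cut-off. Function names, data and thresholds are hypothetical.

```python
Pair = tuple[str, str]


def similarity_filter(sim: dict[Pair, float], threshold: float) -> dict[Pair, float]:
    """Keep only the pairs whose similarity reaches the threshold."""
    return {pair: s for pair, s in sim.items() if s >= threshold}


def greedy_extract(sim: dict[Pair, float]) -> list[tuple[str, str, float]]:
    """Alignment extractor: build a 1-1 alignment by taking pairs in decreasing
    order of similarity and skipping entities that are already used."""
    used1: set[str] = set()
    used2: set[str] = set()
    alignment = []
    for (e, f), s in sorted(sim.items(), key=lambda kv: kv[1], reverse=True):
        if e not in used1 and f not in used2:
            alignment.append((e, f, s))
            used1.add(e)
            used2.add(f)
    return alignment


def alignment_filter(alignment: list[tuple[str, str, float]],
                     min_conf: float) -> list[tuple[str, str, float]]:
    """Alignment filter: drop correspondences whose confidence is too low."""
    return [c for c in alignment if c[2] >= min_conf]


sim = {("Book", "Volume"): 0.9, ("Book", "Writer"): 0.3,
       ("Author", "Writer"): 0.8, ("Author", "Volume"): 0.6,
       ("Price", "Cost"): 0.55}
alignment = greedy_extract(similarity_filter(sim, 0.5))
print(alignment_filter(alignment, 0.7))
# [('Book', 'Volume', 0.9), ('Author', 'Writer', 0.8)]
```

Greedy extraction is simple but not optimal in general; an assignment algorithm maximizing total similarity is a common alternative.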

► Aggregation Operations: there are many different ways to aggregate matcher results, usually depending on confidence/similarity:

  ► Triangular norms (min, weighted product): useful for selecting only the best results.
  ► Multidimensional distances (Euclidean distance, weighted sum): useful for taking all dimensions into account.
  ► Fuzzy aggregation (min, weighted average): useful for aggregating competing algorithms and averaging their results.
  ► Other specific measures (e.g., ordered weighted average).
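Each of the aggregation operators above fits in a few lines. This is a sketch under stated assumptions: scores lie in [0, 1], weights are non-negative, the weighted product raises each score to its weight, and the Euclidean variant (distance from the all-zero score vector) is normalized by the number of scores so that it stays in [0, 1].

```python
import math


def t_min(scores: list[float]) -> float:
    """Triangular norm: the aggregate is only as strong as the weakest score."""
    return min(scores)


def weighted_product(scores: list[float], weights: list[float]) -> float:
    """Weighted product: prod(s_i ** w_i)."""
    return math.prod(s ** w for s, w in zip(scores, weights))


def euclidean(scores: list[float]) -> float:
    """Multidimensional distance: Euclidean norm, normalized to [0, 1]."""
    return math.sqrt(sum(s * s for s in scores) / len(scores))


def weighted_sum(scores: list[float], weights: list[float]) -> float:
    return sum(w * s for s, w in zip(scores, weights))


def weighted_average(scores: list[float], weights: list[float]) -> float:
    """Fuzzy aggregation: weighted sum divided by the total weight."""
    return weighted_sum(scores, weights) / sum(weights)


def owa(scores: list[float], weights: list[float]) -> float:
    """Ordered weighted average: weights apply to positions in the scores
    sorted in decreasing order, not to particular matchers."""
    return weighted_sum(sorted(scores, reverse=True), weights)


scores, weights = [0.3, 0.9, 0.6], [0.5, 0.3, 0.2]
print(t_min(scores))                            # 0.3
print(round(weighted_sum(scores, weights), 2))  # 0.54
print(round(owa(scores, weights), 2))           # 0.69
```

The last two lines show the difference OWA makes: with the same weights, the heaviest weight goes to the best score rather than to the first matcher.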

Application Domains

► Traditional:

  Ontology evolution
  Schema integration
  Catalog integration
  Data integration

► Ontology Evolution

► Catalog Integration

► Emergent:

  P2P information sharing
  Agent communication
  Web service composition
  Query answering on the web

► P2P Information Sharing

► Web Service Composition

► Agent Communication

Classifications

► Matching Dimensions

  Input dimensions
    ► Underlying models (e.g., XML, OWL)
    ► Schema level Vs instance level

  Process dimensions
    ► Approximate Vs exact
    ► Interpretation of the input

  Output dimensions
    ► Cardinality
    ► Equivalence Vs diverse relations
    ► Graded Vs absolute confidence

► Three Layers

  Upper layer
    ► Granularity of match
    ► Interpretation of the input information

  Middle layer
    ► Represents classes of elementary (basic) matching techniques

  Lower layer
    ► Based on the kind of input used by elementary matching techniques

► Classification of schema-based techniques

Basic Techniques

► Element Level Techniques

String based:

  Prefix:
    ► Takes as input 2 strings and checks whether the first string starts with the second.
    ► e.g., net = network, but also hot = hotel

  Suffix:
    ► Takes as input 2 strings and checks whether the first string ends with the second.
    ► e.g., ID = PID, but also word = sword
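Both tests map directly onto Python's `str.startswith` and `str.endswith`. The function names are hypothetical, and the argument order follows the slide: the first string is checked against the second.

```python
def prefix(s1: str, s2: str) -> bool:
    """Prefix test: does s1 start with s2?"""
    return s1.startswith(s2)


def suffix(s1: str, s2: str) -> bool:
    """Suffix test: does s1 end with s2?"""
    return s1.endswith(s2)


# Intended matches and false positives look the same to these tests:
print(prefix("network", "net"), prefix("hotel", "hot"))  # True True
print(suffix("PID", "ID"), suffix("sword", "word"))      # True True
```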

Edit Distance:
  ► Takes as input 2 strings and calculates the number of edit operations (insertion, deletion, substitution of characters) required to transform one string into the other, normalized by the length of the longer string.
  ► editDistance(NKN, Nikon) = 0.4 (comparing case-insensitively: 2 insertions / 5 characters)
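A minimal sketch of the normalized edit distance, using the classic dynamic-programming (Levenshtein) recurrence with two rows. Lowercasing both strings is an assumption of this sketch; it is what makes the NKN/Nikon example come out at 0.4 (a case-sensitive comparison would give 0.8).

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def edit_distance(a: str, b: str) -> float:
    """Levenshtein distance, case-insensitive, normalized by the longer length."""
    if not a and not b:
        return 0.0
    return levenshtein(a.lower(), b.lower()) / max(len(a), len(b))


print(edit_distance("NKN", "Nikon"))  # 0.4
```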

Language based:

  Tokenization:
    ► Parses names into tokens by recognizing punctuation and case.
    ► Hands-Free_Kits → <hands, free, kits>

  Lemmatization:
    ► Morphologically analyses tokens in order to find all their possible basic forms.
    ► Kits → Kit

  Elimination:
    ► Discards empty tokens such as articles, prepositions and conjunctions.
    ► a, the, by, type of, their, from
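The three language-based steps can be chained into a small normalization pipeline. The regular expression, the stop list (taken from the slide's examples, with "type of" handled as a two-token phrase) and especially `lemmatize`, which merely strips a plural "s", are simplifications assumed for this sketch; a real system would use a proper lemmatizer, for example NLTK's `WordNetLemmatizer`.

```python
import re

# Stop list built from the slide's examples; "type of" is a two-token phrase.
STOP_WORDS = {"a", "the", "by", "their", "from"}
STOP_PHRASES = {("type", "of")}


def tokenize(name: str) -> list[str]:
    """Split on punctuation/underscores and on case changes, then lowercase."""
    return [t.lower() for t in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)]


def lemmatize(token: str) -> str:
    """Crude stand-in for lemmatization: strip a plural 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def eliminate(tokens: list[str]) -> list[str]:
    """Drop stop phrases and stop words."""
    kept, i = [], 0
    while i < len(tokens):
        if tuple(tokens[i:i + 2]) in STOP_PHRASES:
            i += 2
        elif tokens[i] in STOP_WORDS:
            i += 1
        else:
            kept.append(tokens[i])
            i += 1
    return kept


def normalize(name: str) -> list[str]:
    return [lemmatize(t) for t in eliminate(tokenize(name))]


print(tokenize("Hands-Free_Kits"))  # ['hands', 'free', 'kits']
print(normalize("TypeOfKits"))      # ['kit']
```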

► Structure Level Techniques

Ontologies are viewed as graph-like structures containing terms and their inter-relationships.

Taxonomy based:

  ► Bounded path matching
    These take 2 paths with links between classes defined by the hierarchical relations, compare terms and their positions along these paths, and identify similar terms.

  ► Super(sub)-concept rules
    If the super-concepts are the same, the actual concepts are similar to each other.
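The super-concept rule can be sketched as a propagation step. This sketch assumes each taxonomy is given as a hypothetical child-to-parent dict and reads "the super-concepts are the same" as "the super-concepts are already matched" in a seed alignment. On its own the rule over-generates (every pair of children under matched parents is proposed), so in practice it would be combined with element-level similarity.

```python
def super_concept_rule(parent1: dict[str, str], parent2: dict[str, str],
                       seed: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Propose (x, y) whenever parent(x) in O and parent(y) in O' are a
    matched pair in the seed alignment."""
    return {
        (x, y)
        for x, px in parent1.items()
        for y, py in parent2.items()
        if (px, py) in seed
    }


parent1 = {"Novel": "Book", "Sedan": "Car"}
parent2 = {"Fiction": "Publication", "Truck": "Vehicle"}
print(super_concept_rule(parent1, parent2, {("Book", "Publication")}))
# {('Novel', 'Fiction')}
```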

Tree based:

  Children
    ► 2 non-leaf schema elements are structurally similar if their immediate children sets are highly similar.

  Leaves
    ► 2 non-leaf schema elements are structurally similar if their leaf sets are highly similar, even if their immediate children are not.
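Both rules compare two sets of elements. This sketch scores a pair of sets as the fraction of elements on either side that have a sufficiently similar partner on the other side; that scoring, the 0.8 threshold, the dict-of-children tree encoding and the toy `same` similarity are all assumptions made for illustration. The example shows the case the Leaves rule is meant for: the children differ, but the leaf sets coincide.

```python
from typing import Callable

Tree = dict[str, list[str]]  # node -> list of children (absent or [] = leaf)


def leaves(tree: Tree, node: str) -> set[str]:
    """All leaves reachable from node (a leaf is its own leaf set)."""
    kids = tree.get(node, [])
    if not kids:
        return {node}
    result: set[str] = set()
    for k in kids:
        result |= leaves(tree, k)
    return result


def set_similarity(s1: set[str], s2: set[str],
                   elem_sim: Callable[[str, str], float], threshold: float) -> float:
    """Fraction of elements (on both sides) with a partner scoring >= threshold."""
    if not s1 and not s2:
        return 0.0
    hits1 = sum(any(elem_sim(a, b) >= threshold for b in s2) for a in s1)
    hits2 = sum(any(elem_sim(a, b) >= threshold for a in s1) for b in s2)
    return (hits1 + hits2) / (len(s1) + len(s2))


def children_similarity(t1: Tree, n1: str, t2: Tree, n2: str,
                        elem_sim: Callable[[str, str], float], threshold: float = 0.8) -> float:
    return set_similarity(set(t1.get(n1, [])), set(t2.get(n2, [])), elem_sim, threshold)


def leaf_similarity(t1: Tree, n1: str, t2: Tree, n2: str,
                    elem_sim: Callable[[str, str], float], threshold: float = 0.8) -> float:
    return set_similarity(leaves(t1, n1), leaves(t2, n2), elem_sim, threshold)


def same(a: str, b: str) -> float:
    """Toy element-level similarity: exact name equality."""
    return 1.0 if a == b else 0.0


t1 = {"Person": ["Name", "Street", "City"]}
t2 = {"Customer": ["Name", "Address"], "Address": ["Street", "City"]}
print(children_similarity(t1, "Person", t2, "Customer", same))  # 0.4
print(leaf_similarity(t1, "Person", t2, "Customer", same))      # 1.0
```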


Summary and Challenges

► Summary

Ontology matching and alignment is the process of deriving the common (or most nearly common) structures and semantic terms from 2 or more different ontologies/structures/schemas.

Efficient and more complex algorithms for matching and alignment generation can be developed by combining the basic techniques of the matching process.

► Challenges

Developing generic and highly efficient matching and alignment generation algorithms.

Thank You
