Copyright 2010 Digital Enterprise Research Institute. All rights reserved.
Digital Enterprise Research Institute www.deri.ie
A Similarity Measure Based on Semantic and Linguistic Information
Nitish Aggarwal, DERI, NUI Galway
Wednesday, 15th June 2011, DERI Reading Group
Based On:
“A Feature and Information Theoretic Framework for Semantic Similarity and Relatedness”
Authors: Giuseppe Pirrò and Jérôme Euzenat
Published: International Semantic Web Conference, 2010
“SyMSS: A syntax-based measure for short-text semantic similarity”
Authors: J. Oliva, J. Serrano, M. Castillo, and Ángel Iglesias
Published: Data & Knowledge Engineering, Volume 70, Issue 4, April 2011
Overview
Introduction
Classical Approaches
Ontology-based Similarity
  Set of relations
  Information Content
SyMSS (Syntax-based)
  Deep Parsing
  Influence of adjectives and adverbs
Conclusion
Introduction & Motivation
Short-text similarity: existing measures lack semantic and linguistic information
Applications: semantic annotation, semantic search, information retrieval and extraction
Classical Approaches
String similarity: Levenshtein distance, Dice coefficient
Corpus-based: ESA, Google distance, vector-space model
Ontology-based: path distance, information content
Syntactic similarity: word order, part of speech
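The two string measures listed above can be sketched in a few lines. This is a minimal illustration, not code from either paper; computing the Dice coefficient over sets of character bigrams is an assumption (word tokens are another common choice).

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def bigrams(s: str) -> set:
    """Set of adjacent character pairs in s."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient 2|X & Y| / (|X| + |Y|) over character-bigram sets."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0  # convention: two strings with no bigrams count as identical
    return 2 * len(x & y) / (len(x) + len(y))


print(levenshtein("kitten", "sitting"))  # 3
print(dice("night", "nacht"))            # 0.25
```

Both measures only look at surface form, which is exactly why they miss pairs such as "car" and "automobile".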
First Paper:
“A Feature and Information Theoretic Framework for Semantic Similarity and Relatedness”
Authors: Giuseppe Pirrò and Jérôme Euzenat
Published: International Semantic Web Conference, 2010
Ontology-based - Overview
Features: the whole set of semantic relations defined in an ontology
Resnik’s information content: $IC(c) = -\log p(c)$, where p(c) is the probability of encountering an instance of concept c in a corpus
Intrinsic information content: avoids the analysis of large corpora
Extended information content: maps the feature-based model into the information-theoretic domain
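A minimal sketch of Resnik's information content on a toy taxonomy; the concepts and corpus counts are hypothetical. Every occurrence of a concept also counts toward all of its ancestors, so IC shrinks toward the root (IC(root) = 0). The helper `msca()` returns the Most Specific Common Abstraction, and Resnik's similarity is the IC of that concept.

```python
import math

# Hypothetical toy taxonomy (child -> parent); "entity" is the root.
PARENT = {
    "entity": None,
    "vehicle": "entity",
    "car": "vehicle",
    "bicycle": "vehicle",
    "animal": "entity",
    "dog": "animal",
}

# Hypothetical corpus counts of words that map directly to each concept.
WORD_COUNTS = {"vehicle": 10, "car": 50, "bicycle": 20, "animal": 5, "dog": 15}


def ancestors(c):
    """Return c followed by all of its ancestors up to the root."""
    chain = []
    while c is not None:
        chain.append(c)
        c = PARENT[c]
    return chain


def resnik_ic():
    """IC(c) = -log p(c), where the frequency of c includes everything it subsumes."""
    freq = {c: 0 for c in PARENT}
    for concept, n in WORD_COUNTS.items():
        for a in ancestors(concept):  # an occurrence of c also counts for its ancestors
            freq[a] += n
    total = freq["entity"]  # the root subsumes every occurrence
    return {c: -math.log(f / total) for c, f in freq.items() if f > 0}


def msca(c1, c2):
    """Most specific common abstraction: the deepest ancestor shared by c1 and c2."""
    shared = set(ancestors(c2))
    return next(a for a in ancestors(c1) if a in shared)


ic = resnik_ic()
print(msca("car", "bicycle"))                # vehicle
print(round(ic[msca("car", "bicycle")], 4))  # Resnik similarity = IC(MSCA): 0.2231
```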
Ontology-based - Why whole set?
Eyes Ears
Relation: Part of
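Ontology-based - Model
Tversky’s feature-based similarity model: similarity increases with the features two concepts have in common and decreases with their distinctive (non-shared) features.
Ratio-based formulation of Tversky’s model:
$S(A, B) = \frac{f(A \cap B)}{f(A \cap B) + \alpha f(A \setminus B) + \beta f(B \setminus A)}$
where A and B are the feature sets of the two concepts, f measures the salience of a set of features, and α, β ≥ 0 weight the distinctive features.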
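A small sketch of Tversky's ratio model, assuming the simplest salience function f(X) = |X| (set size). With α = β = 1 the ratio model reduces to the Jaccard index and with α = β = 0.5 to the Dice coefficient; the feature sets below are made up for illustration.

```python
def tversky_ratio(a: set, b: set, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Tversky ratio model with f(X) = |X| as the (assumed) salience function."""
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    # Convention (an assumption): two empty feature sets are treated as identical.
    return common / denom if denom else 1.0


x = {"wheels", "seats", "engine"}
y = {"wheels", "seats", "pedals"}
print(tversky_ratio(x, y))                     # 0.5    (alpha = beta = 1: Jaccard)
print(round(tversky_ratio(x, y, 0.5, 0.5), 4))  # 0.6667 (alpha = beta = 0.5: Dice)
```

Unequal weights (α ≠ β) make the measure asymmetric, which Tversky used to model judgements such as "a is similar to b" differing from "b is similar to a".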
Ontology-based - Mapping
Mapping between feature-based and information-theoretic similarity models¹
¹ MSCA: Most Specific Common Abstraction
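One way to read this mapping (our assumption; the slide's figure was not preserved, and this is the FaITH measure as we understand it): the features two concepts share are measured by the IC of their MSCA, and each concept's distinctive features by the IC it has beyond the MSCA. Substituting these into the ratio model with α = β = 1 gives IC(MSCA) / (IC(c1) + IC(c2) - IC(MSCA)). The IC values in the usage lines are hypothetical.

```python
def faith(ic1: float, ic2: float, ic_msca: float) -> float:
    """Ratio model (alpha = beta = 1) with feature sets replaced by IC values."""
    common = ic_msca          # shared features          -> IC(MSCA)
    only1 = ic1 - ic_msca     # features of c1 only      -> IC(c1) - IC(MSCA)
    only2 = ic2 - ic_msca     # features of c2 only      -> IC(c2) - IC(MSCA)
    denom = common + only1 + only2  # equals IC(c1) + IC(c2) - IC(MSCA)
    return common / denom if denom else 1.0  # all IC values 0: both concepts are the root


print(faith(3.0, 2.0, 1.0))  # 0.25
print(faith(2.0, 2.0, 2.0))  # 1.0 (identical concepts)
```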
Ontology-based - Example
T1: Car
T2: Bicycle
Example of concept features
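The figure for this example is not preserved, but the idea of drawing features from the whole set of relations can be sketched. Here a concept's features are taken to be all (relation, object) pairs it participates in; the triples are hypothetical. With `is_a` alone, Car and Bicycle would share only their parent, whereas part-of and purpose relations expose more common features.

```python
# Hypothetical mini-ontology as (subject, relation, object) triples.
TRIPLES = [
    ("car", "is_a", "vehicle"),
    ("car", "has_part", "wheel"),
    ("car", "has_part", "engine"),
    ("car", "used_for", "transport"),
    ("bicycle", "is_a", "vehicle"),
    ("bicycle", "has_part", "wheel"),
    ("bicycle", "has_part", "pedal"),
    ("bicycle", "used_for", "transport"),
]


def features(concept: str) -> set:
    """A concept's features: every (relation, object) pair it takes part in."""
    return {(r, o) for s, r, o in TRIPLES if s == concept}


car, bike = features("car"), features("bicycle")
print(sorted(car & bike))  # [('has_part', 'wheel'), ('is_a', 'vehicle'), ('used_for', 'transport')]
print(sorted(car - bike))  # [('has_part', 'engine')]
print(sorted(bike - car))  # [('has_part', 'pedal')]
```

These common and distinctive sets are what a feature-based model such as Tversky's compares.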
Ontology-based - Framework
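Intrinsic information content (iIC): $iIC(c) = 1 - \frac{\log(sub(c) + 1)}{\log(max_{con})}$
where sub(c) is the number of sub-concepts of a given concept c and max_con is the total number of concepts in the ontology.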
Extended information content (eIC), where EIC(c) is a relatedness coefficient computed from all kinds of relations
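A sketch of intrinsic IC on a hypothetical toy taxonomy, assuming the Seco-style formula iIC(c) = 1 - log(sub(c) + 1) / log(N), where N is the number of concepts. No corpus is needed: leaves get the maximum value 1 and the root gets 0. The extended eIC term is not reproduced here because its exact combination is not given in the slides.

```python
import math

# Hypothetical toy taxonomy: concept -> direct sub-concepts (assumed to be a tree).
CHILDREN = {
    "entity": ["vehicle", "animal"],
    "vehicle": ["car", "bicycle"],
    "animal": ["dog"],
    "car": [],
    "bicycle": [],
    "dog": [],
}


def sub(c: str) -> int:
    """Number of direct and indirect sub-concepts of c (double-counts in a DAG)."""
    return sum(1 + sub(child) for child in CHILDREN[c])


def iic(c: str) -> float:
    """Intrinsic IC: 1 - log(sub(c) + 1) / log(number of concepts)."""
    return 1 - math.log(sub(c) + 1) / math.log(len(CHILDREN))


for c in CHILDREN:
    print(c, round(iic(c), 3))
```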
Ontology-based – Evaluation of Similarity
DataSet: 65 human-evaluated pairs
Correlation values:
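Both evaluations compare a measure's scores with human judgements through a correlation coefficient. A minimal sketch, assuming Pearson's r (the slides do not say which coefficient was used; rank correlation such as Spearman's is also common); the scores are made up and are not the papers' data.

```python
import math


def pearson(xs, ys):
    """Pearson's r between two equally long lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical scores for five word pairs.
human = [3.9, 3.5, 2.1, 1.0, 0.4]          # e.g. averaged human ratings
measure = [0.95, 0.80, 0.55, 0.30, 0.10]   # similarity measure output
print(round(pearson(human, measure), 3))
```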
Ontology-based – Evaluation of Relatedness
DataSet: WordSim-353
Correlation value:
Ontology-based - Summary
Intrinsic similarity measure
Ontology-based similarity
Outperforms corpus-based measures
Limitations: does not handle short text; model-based
– E.g. only concepts in the ontology are considered (a phrase such as “car accident” is not covered)
Second paper (SyMSS)
“SyMSS: A syntax-based measure for short-text semantic similarity”
Authors: J. Oliva, J. Serrano, M. Castillo, and Ángel Iglesias
Published: Data & Knowledge Engineering, Volume 70, Issue 4, April 2011
SyMSS - Overview
SyMSS = “syntax-based measure for short-text semantic similarity”
Syntactic information: not only word order, but deep parsing and parts of speech
Semantic information: WordNet similarity, using different ontology-based similarity measures
SyMSS - Semantic Information
Path-based measures: shortest path, Hirst and St-Onge (HSO)
Information-content measures: Resnik, Jiang and Conrath, Lin
Gloss-based measures: gloss overlap and gloss vector
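The path and information-content measures can be written directly in terms of a path length or of IC(c1), IC(c2) and the IC of the lowest common subsumer (the MSCA). The functions below are stand-alone re-implementations for illustration, not the APIs of WordNet::Similarity or NLTK (which provide these measures over real WordNet data); the IC values in the usage lines are hypothetical. HSO and the gloss-based measures need WordNet's relation types and glosses and are omitted.

```python
def path_similarity(edges: int) -> float:
    """Shortest-path similarity in NLTK's style: 1 / (path length in edges + 1)."""
    return 1.0 / (edges + 1)


def resnik(ic_lcs: float) -> float:
    """Resnik: IC of the lowest common subsumer."""
    return ic_lcs


def lin(ic1: float, ic2: float, ic_lcs: float) -> float:
    """Lin: 2 * IC(lcs) / (IC(c1) + IC(c2))."""
    total = ic1 + ic2
    return 2 * ic_lcs / total if total else 1.0  # both concepts are the root


def jiang_conrath(ic1: float, ic2: float, ic_lcs: float) -> float:
    """Jiang-Conrath: inverse of the distance IC(c1) + IC(c2) - 2 * IC(lcs)."""
    dist = ic1 + ic2 - 2 * ic_lcs
    return float("inf") if dist == 0 else 1.0 / dist  # identical concepts: unbounded


# Hypothetical values: IC(car) = 3.0, IC(bicycle) = 2.0, IC(vehicle) = 1.0.
print(resnik(1.0), lin(3.0, 2.0, 1.0), jiang_conrath(3.0, 2.0, 1.0), path_similarity(2))
```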
SyMSS - Syntactic Information
Parse tree: phrases and the head of each phrase
Head similarity: between the heads of phrases that have the same syntactic function
Penalization factor: for non-shared phrases
SyMSS - Model
Sentence 1: My brother has a dog with four legs
Sentence 2: My brother has four legs
Sim(has, has) = 1
Sim(brother, brother) = 1
Sim(dog, leg) = 0.1414
PF = 0.03
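The example can be reproduced under an assumption: that SyMSS averages the similarities of heads filling the same syntactic role and subtracts the penalization factor once per non-shared phrase (here “with four legs”). The paper's exact combination may differ; the role names and the lookup table are hypothetical, while Sim(dog, leg) = 0.1414 and PF = 0.03 are taken from the slide.

```python
# Hypothetical word-similarity table standing in for a WordNet-based measure.
WORD_SIM = {("dog", "leg"): 0.1414}


def word_sim(w1: str, w2: str) -> float:
    """Identical words score 1; otherwise look the pair up in either order."""
    if w1 == w2:
        return 1.0
    return WORD_SIM.get((w1, w2), WORD_SIM.get((w2, w1), 0.0))


def symss_like(heads1: dict, heads2: dict, pf: float) -> float:
    """Mean head similarity over shared roles, minus pf per non-shared phrase (assumed)."""
    shared = heads1.keys() & heads2.keys()
    non_shared = len(heads1.keys() ^ heads2.keys())
    if not shared:
        return 0.0
    mean = sum(word_sim(heads1[r], heads2[r]) for r in shared) / len(shared)
    return mean - non_shared * pf


# Phrase heads keyed by hypothetical syntactic roles.
s1 = {"subject": "brother", "verb": "has", "object": "dog", "modifier": "with four legs"}
s2 = {"subject": "brother", "verb": "has", "object": "leg"}
print(round(symss_like(s1, s2, pf=0.03), 4))  # 0.6838
```

Under this reading the score is (1 + 1 + 0.1414) / 3 - 1 × 0.03 ≈ 0.6838.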
SyMSS - Evaluation
DataSet: 30 of the 65 human-evaluated pairs
Correlation values:
SyMSS - Effect of adverbs and adjectives
Sentence 1: “I have a big dog”
Sentence 2: “I have a little dog”
8.68% gain in SyMSS with HSO
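Why modifiers matter: if only nouns and verbs are compared, the two sentences above have identical heads and score 1.0. A hypothetical sketch that also aligns adjective modifiers; the role names and the 0.5 similarity for big/little are made-up values, not results from the paper.

```python
def mean_head_sim(heads1: dict, heads2: dict, word_sim) -> float:
    """Average word similarity over the syntactic roles both sentences fill."""
    shared = heads1.keys() & heads2.keys()
    return sum(word_sim(heads1[r], heads2[r]) for r in shared) / len(shared)


def word_sim(w1: str, w2: str) -> float:
    # Hypothetical scores: 0.5 for (big, little) is invented for illustration.
    if w1 == w2:
        return 1.0
    return 0.5 if {w1, w2} == {"big", "little"} else 0.0


nouns_verbs_1 = {"subject": "I", "verb": "have", "object": "dog"}
nouns_verbs_2 = {"subject": "I", "verb": "have", "object": "dog"}
with_adj_1 = {**nouns_verbs_1, "object_modifier": "big"}
with_adj_2 = {**nouns_verbs_2, "object_modifier": "little"}

print(mean_head_sim(nouns_verbs_1, nouns_verbs_2, word_sim))  # 1.0
print(mean_head_sim(with_adj_1, with_adj_2, word_sim))        # 0.875
```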
SyMSS - Summary
Syntax-based similarity considers nouns and verbs, and the influence of adjectives and adverbs
Limitations: depends on the parse structure
– E.g. sentences that are not grammatically correct
Depends on word similarity
Conclusion
No established method for short-text similarity
Parsing of phrases is difficult
Concept similarity depends on the model, and the model may be weak
– E.g. xebr: Extraordinary Income vs. xebr: Other Operating Income: path-length similarity = 0.2, expert judgement = 0.8
Need a syntactic similarity for concept tags (words or phrases)