04/20/23 Semantic Similarity 1
Semantic Similarity Methods in WordNet and Their Application to Information Retrieval on the Web
Giannis Varelas, Epimenidis Voutsakis, Paraskevi Raftopoulou, Euripides G.M. Petrakis, Evangelos Milios
Semantic Similarity
Semantic similarity is the conceptual similarity between terms that are not lexicographically similar, e.g., "car" and "automobile"
Map two terms to an ontology and compute their relationship in that ontology
Objectives
We investigate several semantic similarity methods and evaluate their performance: http://www.ece.tuc.gr/similarity
We propose the Semantic Similarity Retrieval Model (SSRM) for computing similarity between documents that contain semantically similar, but not necessarily lexicographically similar, terms: http://www.ece.tuc.gr/intellisearch
Ontologies
Tools for representing information on a subject
Hierarchical categorization of terms, from general to most specific, e.g., object, artifact, construction, stadium
Domain ontologies represent knowledge of a domain, e.g., the MeSH medical ontology
General ontologies represent common-sense knowledge about the world, e.g., WordNet
WordNet
A vocabulary and thesaurus offering a hierarchical categorization of natural language terms
More than 100,000 terms
An ontology of natural language terms
Nouns, verbs, adjectives and adverbs are grouped into synonym sets (synsets)
Synsets represent terms or concepts, e.g., stadium, bowl, arena, sports stadium (a large structure for open-air sports or entertainments)
WordNet Hierarchies
The synsets are also organized into senses
Senses: different meanings of the same term
The synsets are related to other synsets higher or lower in the hierarchy by different types of relationships, e.g., Hyponym/Hypernym (Is-A relationships) and Meronym/Holonym (Part-Of relationships)
Nine noun and several verb Is-A hierarchies
A Fragment of the WordNet Is-A Hierarchy
Semantic Similarity Methods
Map terms to an ontology and compute their relationship in that ontology
Four main categories of methods:
Edge counting: path length between terms
Information content: a function of the terms' probability of occurrence in a corpus
Feature based: similarity between the terms' properties (e.g., definitions) or based on their relationships to other similar terms
Hybrid: combine the above ideas
Example
Edge counting: the distance between "conveyance" and "ceramic" is 2
An information content method would associate the two terms with their common subsumer and with their probabilities of occurrence in a corpus
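The edge-counting idea can be illustrated with a minimal Python sketch. The toy Is-A hierarchy below is an assumption made for illustration (it is not the actual WordNet structure, and placing "conveyance" and "ceramic" under a shared parent is a guess chosen so that their distance is 2). The helper names are hypothetical, and the depth-aware measure follows the usual Wu and Palmer form with depth counted in edges from the root.

```python
# Toy Is-A hierarchy (child -> parent). Node names and links are
# illustrative assumptions, not the real WordNet structure.
PARENT = {
    "artifact": "object",
    "instrumentality": "artifact",
    "construction": "artifact",
    "stadium": "construction",
    "conveyance": "instrumentality",
    "ceramic": "instrumentality",
    "vehicle": "conveyance",
}


def path_to_root(term):
    """Return [term, parent, ..., root]."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(term):
    """Number of edges between the root and the term."""
    return len(path_to_root(term)) - 1


def lowest_common_subsumer(a, b):
    """Most specific concept that is an ancestor of (or equal to) both terms."""
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_a:
            return node
    raise ValueError("terms belong to different hierarchies")


def edge_distance(a, b):
    """Edge counting: number of Is-A edges on the path between a and b."""
    lcs = lowest_common_subsumer(a, b)
    return (depth(a) - depth(lcs)) + (depth(b) - depth(lcs))


def wu_palmer(a, b):
    """Depth-aware similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    total = depth(a) + depth(b)
    if total == 0:  # both terms are the root
        return 1.0
    return 2.0 * depth(lowest_common_subsumer(a, b)) / total


print(edge_distance("conveyance", "ceramic"))  # siblings in this toy hierarchy
print(wu_palmer("conveyance", "ceramic"))
```

Plain path length ignores where in the hierarchy the terms sit, whereas the depth-aware variant gives a higher score to equally distant terms that lie deeper in the hierarchy.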
Semantic Similarity on WordNet
The most popular methods are evaluated
All methods applied on a set of 38 term pairs
Their similarity values are correlated with scores obtained by humans
The higher the correlation of a method the better the method is
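A minimal sketch of this evaluation step, assuming the Pearson correlation coefficient is used (the slides do not say which coefficient was computed). The human ratings and method scores below are made-up values for illustration only, not data from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for four term pairs (illustration only).
human_ratings = [3.9, 3.5, 1.2, 0.4]      # e.g. averaged human judgments
method_scores = [0.95, 0.80, 0.30, 0.10]  # output of one similarity method

r = pearson(human_ratings, method_scores)
print(f"correlation with human judgments: {r:.2f}")
```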
Evaluation
Method            Type            Correlation
Rada 1989         Edge Counting   0.59
Wu 1994           Edge Counting   0.74
Li 2003           Edge Counting   0.82
Leacock 1998      Edge Counting   0.82
Richardson 1994   Edge Counting   0.63
Resnik 1999       Info. Content   0.79
Lin 1993          Info. Content   0.82
Lord 2003         Info. Content   0.79
Jiang 1998        Info. Content   0.83
Tversky 1977      Feature Based   0.73
Rodriguez 2003    Hybrid          0.71
Observations
Edge counting and information content methods work by exploiting structure information
Good methods take the position of the terms into account
Higher similarity for terms which are close together but lower in the hierarchy, e.g., [Li et al. 2003]
Information content is measured on WordNet rather than on a corpus [Seco2002]
Similarity only for nouns and verbs
No taxonomic structure for other parts of speech
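A hedged sketch of how information content can be derived from the hierarchy itself rather than from corpus counts. It assumes the commonly cited intrinsic formulation IC(c) = 1 - log(hypo(c) + 1) / log(N), where hypo(c) is the number of hyponyms of c and N is the total number of concepts. The toy hierarchy and all function names are illustrative assumptions, and the two similarity functions follow the usual Resnik and Lin forms.

```python
from math import log

# Toy Is-A hierarchy (child -> parent); an illustrative assumption,
# not the real WordNet structure.
PARENT = {
    "artifact": "object",
    "instrumentality": "artifact",
    "construction": "artifact",
    "stadium": "construction",
    "conveyance": "instrumentality",
    "ceramic": "instrumentality",
    "vehicle": "conveyance",
}
CONCEPTS = set(PARENT) | set(PARENT.values())


def ancestors(term):
    """Return [term, parent, ..., root]."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def hyponym_count(term):
    """Number of concepts strictly below term in the hierarchy."""
    return sum(1 for c in CONCEPTS if c != term and term in ancestors(c))


def intrinsic_ic(term):
    """Assumed intrinsic IC: 1 - log(hypo + 1) / log(N); leaves score 1, the root 0."""
    return 1.0 - log(hyponym_count(term) + 1) / log(len(CONCEPTS))


def lowest_common_subsumer(a, b):
    ancestors_a = set(ancestors(a))
    for node in ancestors(b):
        if node in ancestors_a:
            return node
    raise ValueError("terms belong to different hierarchies")


def resnik(a, b):
    """Resnik-style similarity: IC of the lowest common subsumer."""
    return intrinsic_ic(lowest_common_subsumer(a, b))


def lin(a, b):
    """Lin-style similarity: 2 * IC(lcs) / (IC(a) + IC(b))."""
    denom = intrinsic_ic(a) + intrinsic_ic(b)
    return 0.0 if denom == 0 else 2.0 * resnik(a, b) / denom


print(resnik("conveyance", "ceramic"), lin("conveyance", "ceramic"))
```

Because the measure depends only on the taxonomy, it needs no corpus statistics, but it also applies only where an Is-A structure exists.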
http://www.ece.tuc.gr/similarity
Semantic Similarity Retrieval Model (SSRM)
Classic retrieval models retrieve documents with the same query terms
SSRM will retrieve documents which also contain semantically similar terms
Queries and documents are initially assigned tf·idf weights
q = (q1, q2, …, qN), d = (d1, d2, …, dN)
SSRM
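I. Query term re-weighting: similar terms reinforce each other
$$q_i' = q_i + \sum_{j \neq i,\; sim(i,j) \geq t} q_j \, sim(i,j)$$
II. Query term expansion with synonyms and similar terms
$$q_i' = q_i + \sum_{j \in Q,\; j \neq i,\; sim(i,j) \geq T} \frac{1}{n} \, q_j \, sim(i,j)$$
III. Document similarity
$$Sim(q,d) = \frac{\sum_i \sum_j q_i \, d_j \, sim(i,j)}{\sum_i \sum_j q_i \, d_j}$$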
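The three SSRM steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: term similarities come from a small hand-made table, new expansion terms start with weight 0, and since the slides do not define n it is passed in per query term through the hypothetical `n_for` argument (defaulting to 1).

```python
def make_sim(pairs):
    """Build a symmetric term-similarity function from {(a, b): score}."""
    table = {}
    for (a, b), score in pairs.items():
        table[(a, b)] = score
        table[(b, a)] = score

    def sim(a, b):
        return 1.0 if a == b else table.get((a, b), 0.0)

    return sim


def reweight(query, sim, t):
    """Step I: q_i' = q_i + sum over j != i with sim(i, j) >= t of q_j * sim(i, j)."""
    return {
        i: qi + sum(qj * sim(i, j) for j, qj in query.items()
                    if j != i and sim(i, j) >= t)
        for i, qi in query.items()
    }


def expand(query, candidates, sim, T, n_for=None):
    """Step II: add candidate terms similar to query terms (sim >= T).

    n_for maps a query term j to its n (an assumption; default 1).
    """
    n_for = n_for or {}
    expanded = {}
    for i in set(query) | set(candidates):
        boost = sum(qj * sim(i, j) / n_for.get(j, 1)
                    for j, qj in query.items()
                    if j != i and sim(i, j) >= T)
        weight = query.get(i, 0.0) + boost
        if weight > 0:
            expanded[i] = weight
    return expanded


def document_similarity(query, doc, sim):
    """Step III: sum_i sum_j q_i d_j sim(i, j) / sum_i sum_j q_i d_j."""
    num = sum(qi * dj * sim(i, j)
              for i, qi in query.items() for j, dj in doc.items())
    den = sum(qi * dj for qi in query.values() for dj in doc.values())
    return num / den if den else 0.0


# Hypothetical similarity scores and weights (illustration only).
sim = make_sim({("car", "automobile"): 0.9, ("car", "vehicle"): 0.6})
query = {"car": 1.0}
doc = {"automobile": 1.0}

print(expand(query, ["automobile", "vehicle"], sim, T=0.8))
print(document_similarity(query, doc, sim))
```

In this example the query and the document share no terms, so a model that matches identical terms only would score the pair as unrelated, while the similarity-weighted sum still gives a non-zero score.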
Query Term Expansion
Observations
Specification of T?
Large T may lead to topic drift
Word sense disambiguation for expanding with the correct sense
Expansion with co-occurring terms? SVD, local/global analysis
Semantic similarity between terms of different parts of speech?
Work with compound terms (phrases)
Evaluation of SSRM
SSRM is evaluated through intellisearch, a system for information retrieval on the WWW
1.5 million Web pages with images
Images are described by surrounding text
The problem of image retrieval is transformed into a problem of text retrieval
http://www.ece.tuc.gr/intellisearch
Methods
Vector Space Model (VSM)
SSRM
Each method is represented by a precision/recall plot
Each point is the average precision/recall over 20 queries
20 queries from the list of the most frequent Google image queries
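A minimal sketch of how such a plot could be computed, assuming each point corresponds to a cutoff k in the ranked answer list (the slides do not state how the points were chosen). The function names and the tiny example runs are hypothetical.

```python
def precision_recall_at(ranked, relevant, k):
    """Precision and recall of the top-k results for a single query."""
    retrieved = ranked[:k]
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_curve(runs, cutoffs):
    """Average precision and recall over all queries at each cutoff.

    runs: list of (ranked_results, relevant_set), one entry per query.
    Returns a list of (k, mean_precision, mean_recall) tuples.
    """
    curve = []
    for k in cutoffs:
        points = [precision_recall_at(ranked, relevant, k)
                  for ranked, relevant in runs]
        mean_p = sum(p for p, _ in points) / len(points)
        mean_r = sum(r for _, r in points) / len(points)
        curve.append((k, mean_p, mean_r))
    return curve


# Two made-up queries (illustration only).
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),
    (["x", "y", "z"], {"y"}),
]
for k, p, r in average_curve(runs, cutoffs=[1, 2]):
    print(f"k={k}: precision={p:.2f} recall={r:.2f}")
```

Running the same computation for the VSM and SSRM rankings over the same queries would give two curves that can be compared on one plot.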
Experimental Results
MeSH and MedLine
MeSH: ontology for medical and biological terms by the N.L.M. (22,000 terms)
MedLine: the premier bibliographic medical database of the N.L.M. (13 million references)
Evaluation on MedLine
Conclusions
Semantic similarity methods approximated the human notion of similarity, reaching correlation up to 0.83
SSRM exploits this information for improving the performance of retrieval
SSRM can work with any semantic similarity method and any ontology
Future Work
Experimentation with more data sets (TREC) and ontologies
Extend SSRM to work with:
Compound terms
More parts of speech (e.g., adverbs)
Co-occurring terms
More term relationships in WordNet
More elaborate methods for specification of thresholds
Try our system on the Web
Semantic Similarity System: http://www.ece.tuc.gr/similarity
SSRM: http://www.ece.tuc.gr/intellisearch