Semantic Similarity Methods in WordNet and Their Application to Information Retrieval on the Web
Giannis Varelas, Epimenidis Voutsakis, Paraskevi Raftopoulou, Euripides G.M. Petrakis, Evangelos Milios




Semantic Similarity

Semantic similarity relates to computing the conceptual similarity between terms that are not lexicographically similar, e.g., “car” and “automobile”

Map two terms to an ontology and compute their relationship in that ontology


Objectives

We investigate several semantic similarity methods and evaluate their performance: http://www.ece.tuc.gr/similarity

We propose the Semantic Similarity Retrieval Model (SSRM) for computing the similarity between documents that contain semantically similar, but not necessarily lexicographically similar, terms: http://www.ece.tuc.gr/intellisearch


Ontologies

Tools for representing information on a subject

Hierarchical categorization of terms, from general to most specific, e.g., object → artifact → construction → stadium

Domain ontologies represent the knowledge of a domain, e.g., the MeSH medical ontology

General ontologies represent common-sense knowledge about the world, e.g., WordNet
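A hierarchy like this can be held as a child-to-parent map. The sketch below is a hypothetical toy, not WordNet's data: it encodes only the example chain from this slide, and the helper names `path_to_root` and `depth` are illustrative.

```python
# Hypothetical toy Is-A hierarchy holding only the example chain above.
# Each term maps to its more general parent; None marks the most general term.
PARENT = {
    "object": None,
    "artifact": "object",
    "construction": "artifact",
    "stadium": "construction",
}


def path_to_root(term: str) -> list:
    """Return the terms from `term` up to the most general one."""
    path = []
    node = term
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(term: str) -> int:
    """Number of Is-A edges between `term` and the root (the root has depth 0)."""
    return len(path_to_root(term)) - 1


print(path_to_root("stadium"))  # ['stadium', 'construction', 'artifact', 'object']
print(depth("stadium"))  # 3
```

Storing only parent links keeps the structure compact; every generalization chain is recovered by walking upward.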


WordNet

A vocabulary and a thesaurus offering a hierarchical categorization of natural language terms

More than 100,000 terms

An ontology of natural language terms

Nouns, verbs, adjectives and adverbs are grouped into synonym sets (synsets)

Synsets represent terms or concepts, e.g., {stadium, bowl, arena, sports stadium}: a large structure for open-air sports or entertainments
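Grouping terms into synsets turns synonym lookup into a set intersection. A minimal sketch, assuming a one-entry synset table copied from the example above; the synset id `stadium#1` and the helper names are made up for illustration and are not WordNet identifiers.

```python
from collections import defaultdict

# Hypothetical one-entry synset table; ids are illustrative, not WordNet's.
SYNSETS = {
    "stadium#1": {
        "terms": {"stadium", "bowl", "arena", "sports stadium"},
        "gloss": "a large structure for open-air sports or entertainments",
    },
}

# Inverted index: term -> ids of the synsets it belongs to.
TERM_INDEX = defaultdict(set)
for synset_id, entry in SYNSETS.items():
    for term in entry["terms"]:
        TERM_INDEX[term].add(synset_id)


def synsets_of(term: str) -> set:
    """Ids of all synsets containing `term` (empty set if the term is unknown)."""
    return TERM_INDEX.get(term, set())


def are_synonyms(a: str, b: str) -> bool:
    """Two terms count as synonyms here if they share at least one synset."""
    return bool(synsets_of(a) & synsets_of(b))


print(are_synonyms("stadium", "arena"))  # True
print(are_synonyms("stadium", "car"))  # False
```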


WordNet Hierarchies

The synsets are also organized into senses

Senses: different meanings of the same term

The synsets are related to other synsets higher or lower in the hierarchy by different types of relationships, e.g.:
Hyponym/Hypernym (Is-A relationships)
Meronym/Holonym (Part-Of relationships)

Nine noun and several verb Is-A hierarchies
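Senses and typed relations can be sketched as a term-to-synsets map plus a labelled edge list, where each relation is stored together with its inverse (hypernym/hyponym, meronym/holonym). All synset ids and both edges below are hypothetical; the Part-Of edge in particular is invented for illustration.

```python
from collections import defaultdict

# Hypothetical senses: one term can belong to several synsets (meanings).
SENSES = {
    "bowl": ["bowl#vessel", "stadium#1"],  # a dish and a sports stadium
    "stadium": ["stadium#1"],
}

# Hypothetical typed edges: (source synset, relation, target synset).
EDGES = [
    ("stadium#1", "hypernym", "construction#1"),  # stadium Is-A construction
    ("stadium#1", "meronym", "stand#1"),  # invented: a stand is Part-Of a stadium
]

# Each relation type paired with its inverse.
INVERSE = {"hypernym": "hyponym", "meronym": "holonym"}


def build_graph(edges):
    """Index edges in both directions: graph[synset][relation] -> set of synsets."""
    graph = defaultdict(lambda: defaultdict(set))
    for src, rel, dst in edges:
        graph[src][rel].add(dst)
        graph[dst][INVERSE[rel]].add(src)
    return graph


GRAPH = build_graph(EDGES)
print(len(SENSES["bowl"]))  # 2
print(sorted(GRAPH["construction#1"]["hyponym"]))  # ['stadium#1']
print(sorted(GRAPH["stand#1"]["holonym"]))  # ['stadium#1']
```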


A Fragment of the WordNet Is-A Hierarchy


Semantic Similarity Methods

Map terms to an ontology and compute their relationship in that ontology

Four main categories of methods:
Edge counting: similarity as a function of the path length between the terms
Information content: similarity as a function of the terms' probability of occurrence in a corpus
Feature based: similarity between the terms' properties (e.g., definitions) or based on their relationships to other similar terms
Hybrid: combine the above ideas
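As one example of the feature-based category, Tversky's ratio model (Tversky 1977 in the evaluation table) weighs the features two terms share against the features only one of them has. A minimal sketch, assuming features are plain sets of strings; the feature sets and the default weights alpha = beta = 0.5 are illustrative choices, not taken from the slides.

```python
def tversky(a: set, b: set, alpha: float = 0.5, beta: float = 0.5) -> float:
    """Tversky ratio model: common features weighed against distinctive ones.

    alpha and beta weight the features unique to `a` and unique to `b`;
    alpha = beta makes the measure symmetric.
    """
    common = len(a & b)
    denominator = common + alpha * len(a - b) + beta * len(b - a)
    return common / denominator if denominator else 0.0


# Hypothetical feature sets (e.g. words drawn from definitions).
car = {"wheels", "engine", "seats", "vehicle"}
bicycle = {"wheels", "pedals", "seats", "vehicle"}
print(tversky(car, bicycle))  # 0.75
```

Three shared features against one distinctive feature on each side give 3 / (3 + 0.5 + 0.5) = 0.75.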


Example

Edge counting: the distance between “conveyance” and “ceramic” is 2

An information content method would associate the two terms with their common subsumer and with their probabilities of occurrence in a corpus
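Both ideas can be sketched on a toy hierarchy. The tree and corpus counts below are hypothetical: `conveyance` and `ceramic` are placed as siblings under `artifact` only so that their edge distance is 2, as in the example, and the counts are invented. The information content measures use the usual definitions: IC(c) = -log p(c), Resnik = IC of the lowest common subsumer, Lin = 2·IC(lcs) / (IC(a) + IC(b)).

```python
import math

# Hypothetical toy hierarchy (child -> parent); not WordNet's real structure.
PARENT = {
    "object": None,
    "artifact": "object",
    "conveyance": "artifact",
    "ceramic": "artifact",
    "construction": "artifact",
    "stadium": "construction",
}

# Hypothetical corpus occurrence counts for each term.
COUNTS = {"object": 10, "artifact": 20, "conveyance": 30,
          "ceramic": 10, "construction": 10, "stadium": 20}
TOTAL = sum(COUNTS.values())


def ancestors(term):
    """The term itself followed by all of its subsumers, most specific first."""
    chain = []
    while term is not None:
        chain.append(term)
        term = PARENT[term]
    return chain


def lcs(a, b):
    """Lowest common subsumer: the first ancestor of `a` that also subsumes `b`."""
    b_ancestors = set(ancestors(b))
    for node in ancestors(a):
        if node in b_ancestors:
            return node
    raise ValueError("terms share no common subsumer")


def edge_distance(a, b):
    """Edge counting: number of Is-A edges on the path a -> lcs -> b."""
    common = lcs(a, b)
    return ancestors(a).index(common) + ancestors(b).index(common)


def prob(term):
    """p(c): share of occurrences of `term` or of any term it subsumes."""
    freq = sum(n for t, n in COUNTS.items() if term in ancestors(t))
    return freq / TOTAL


def ic(term):
    """Information content: IC(c) = -log p(c)."""
    return -math.log(prob(term))


def resnik(a, b):
    """Resnik: the information content of the lowest common subsumer."""
    return ic(lcs(a, b))


def lin(a, b):
    """Lin: shared information relative to the information of both terms."""
    denominator = ic(a) + ic(b)
    # Both terms at the root (IC 0): treat them as identical.
    return 1.0 if denominator == 0 else 2 * ic(lcs(a, b)) / denominator


print(edge_distance("conveyance", "ceramic"))  # 2
print(lcs("conveyance", "ceramic"))  # artifact
```

Counting a term's occurrences together with those of everything below it makes p(c) grow toward the root, so general subsumers carry little information.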


Semantic Similarity on WordNet

The most popular methods are evaluated

All methods are applied to a set of 38 term pairs

Their similarity values are correlated with scores obtained from humans

The higher the correlation of a method, the better the method
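The slide does not name the correlation coefficient. Assuming Pearson's r, the evaluation step can be sketched as below. The 38 term pairs and their human scores are not reproduced here, and the four score pairs in the usage lines are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between method scores `xs` and human scores `ys`."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (spread_x * spread_y)


# Invented scores for four term pairs: a method's similarities vs. human ratings.
method_scores = [0.95, 0.80, 0.30, 0.10]
human_scores = [3.9, 3.5, 1.2, 0.4]
print(round(pearson(method_scores, human_scores), 3))
```

Because the coefficient is scale-free, method scores in [0, 1] can be compared directly with human ratings on a different scale.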


Evaluation

Method            Type             Correlation
Rada 1989         Edge Counting    0.59
Wu 1994           Edge Counting    0.74
Li 2003           Edge Counting    0.82
Leacock 1998      Edge Counting    0.82
Richardson 1994   Edge Counting    0.63
Resnik 1999       Info. Content    0.79
Lin 1993          Info. Content    0.82
Lord 2003         Info. Content    0.79
Jiang 1998        Info. Content    0.83
Tversky 1977      Feature Based    0.73
Rodriguez 2003    Hybrid           0.71


Observations

Edge counting and information content methods work by exploiting structure information

Good methods take the position of the terms into account: higher similarity for terms that are close together but lower in the hierarchy, e.g., [Li et al. 2003]

Information content is measured on WordNet rather than on a corpus [Seco 2002]

Similarity only for nouns and verbs: there is no taxonomic structure for other parts of speech
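The two observations can be sketched on another hypothetical toy tree. The depth-aware measure assumes the form reported by Li et al. (2003), sim = e^(-α·l) · tanh(β·h), where l is the path length and h the depth of the common subsumer (counted here in edges from the root); α = 0.2 and β = 0.6 are the values commonly quoted for it and are best treated as tunable. The WordNet-based information content assumes Seco's formula IC(c) = 1 - log(hypo(c) + 1) / log(max_wn), with hypo(c) the number of terms below c and max_wn the number of terms in the hierarchy.

```python
import math

# Hypothetical toy hierarchy (child -> parent); not WordNet's real structure.
PARENT = {
    "object": None,
    "artifact": "object",
    "conveyance": "artifact",
    "ceramic": "artifact",
    "construction": "artifact",
    "stadium": "construction",
    "bridge": "construction",
}


def ancestors(term):
    """The term followed by all of its subsumers, most specific first."""
    chain = []
    while term is not None:
        chain.append(term)
        term = PARENT[term]
    return chain


def lcs(a, b):
    """Lowest common subsumer (the toy tree has a single root, so one exists)."""
    b_ancestors = set(ancestors(b))
    return next(node for node in ancestors(a) if node in b_ancestors)


def li_similarity(a, b, alpha=0.2, beta=0.6):
    """Path length l lowers similarity; a deeper common subsumer (depth h) raises it."""
    common = lcs(a, b)
    path_length = ancestors(a).index(common) + ancestors(b).index(common)
    subsumer_depth = len(ancestors(common)) - 1  # edges from the root
    return math.exp(-alpha * path_length) * math.tanh(beta * subsumer_depth)


def hyponym_count(term):
    """hypo(c): number of terms that `term` subsumes, excluding itself."""
    return sum(1 for t in PARENT if t != term and term in ancestors(t))


def seco_ic(term):
    """Intrinsic IC computed from the hierarchy alone, without corpus counts."""
    return 1 - math.log(hyponym_count(term) + 1) / math.log(len(PARENT))


# Both pairs are 2 edges apart, but stadium/bridge meet lower in the tree.
print(li_similarity("stadium", "bridge") > li_similarity("conveyance", "ceramic"))  # True
print(seco_ic("object"), seco_ic("stadium"))  # 0.0 1.0
```

The intrinsic IC gives the root 0 and every leaf 1, so no corpus is needed to rank terms by specificity.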


http://www.ece.tuc.gr/similarity


Semantic Similarity Retrieval Model (SSRM)

Classic retrieval models retrieve documents that contain the query terms

SSRM also retrieves documents that contain semantically similar terms

Queries and documents are initially assigned tf·idf weights:

$q = (q_1, q_2, \dots, q_N)$, $d = (d_1, d_2, \dots, d_N)$
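tf·idf has several variants, and the slides do not say which one SSRM uses. A minimal sketch, assuming term frequency normalized by the most frequent term in the text and idf = log(N / df) over a tiny invented collection. A query is weighted with the collection's idf, and terms unseen in the collection are dropped.

```python
import math
from collections import Counter


def idf_table(docs):
    """idf(t) = log(N / df(t)) over a collection of tokenized documents."""
    n_docs = len(docs)
    df = Counter(term for tokens in docs for term in set(tokens))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def tfidf(tokens, idf):
    """Weight each known term by (tf / max tf) * idf; unseen terms are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    if not tf:
        return {}
    max_tf = max(tf.values())
    return {term: (count / max_tf) * idf[term] for term, count in tf.items()}


# Invented three-document collection.
docs = [["car", "engine", "car"], ["automobile", "road"], ["stadium", "arena", "road"]]
idf = idf_table(docs)
d = tfidf(docs[0], idf)  # document vector
q = tfidf(["car", "price"], idf)  # query vector; "price" is not in the collection
print(sorted(q))  # ['car']
```

Sparse dictionaries stand in for the N-dimensional vectors q and d: every absent term simply has weight 0.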


SSRM

I. Query term re-weighting: similar terms reinforce each other

$$q_i' = q_i + \sum_{j \ne i,\ sim(i,j) \ge t} q_j \, sim(i,j)$$

II. Query term expansion with synonyms and similar terms

$$q_i' = \sum_{j \in Q,\ sim(i,j) \ge T} \frac{1}{n} \, q_j \, sim(i,j)$$

III. Document similarity

$$Sim(q,d) = \frac{\sum_i \sum_j q_i \, d_j \, sim(i,j)}{\sum_i \sum_j q_i \, d_j}$$
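The three SSRM steps can be sketched over sparse term-weight dictionaries. The pairwise similarity table, the thresholds in the usage lines and the candidate list for expansion are hypothetical; any semantic similarity method could supply `sim`. The slides do not define n, so it is a plain parameter here. Both steps read the original (pre-update) query weights, which is also an assumption.

```python
from itertools import product

# Hypothetical pairwise term similarities; a real system would compute these
# with one of the semantic similarity methods on an ontology.
PAIR_SIM = {
    frozenset({"car", "automobile"}): 1.0,
    frozenset({"car", "vehicle"}): 0.8,
    frozenset({"automobile", "vehicle"}): 0.8,
}


def sim(i, j):
    """Symmetric term similarity in [0, 1]; identical terms score 1."""
    if i == j:
        return 1.0
    return PAIR_SIM.get(frozenset({i, j}), 0.0)


def reweight(q, t):
    """I. q_i' = q_i + sum of q_j * sim(i, j) over other query terms with sim >= t."""
    return {
        i: qi + sum(qj * sim(i, j) for j, qj in q.items() if j != i and sim(i, j) >= t)
        for i, qi in q.items()
    }


def expand(q, candidates, T, n=1):
    """II. New term weight: (1/n) * sum of q_j * sim(i, j) over query terms with sim >= T."""
    added = {}
    for i in candidates:
        if i in q:
            continue
        weight = sum(qj * sim(i, j) for j, qj in q.items() if sim(i, j) >= T) / n
        if weight > 0:
            added[i] = weight
    return added


def ssrm_similarity(q, d):
    """III. Sim(q, d) = sum q_i d_j sim(i, j) / sum q_i d_j over all term pairs."""
    pairs = list(product(q.items(), d.items()))
    numerator = sum(qi * dj * sim(i, j) for (i, qi), (j, dj) in pairs)
    denominator = sum(qi * dj for (_, qi), (_, dj) in pairs)
    return numerator / denominator if denominator else 0.0


query = {"car": 0.5}
expanded = {**reweight(query, t=0.5), **expand(query, ["automobile", "vehicle", "stadium"], T=0.7)}
document = {"automobile": 0.6, "road": 0.3}
print(sorted(expanded))  # ['automobile', 'car', 'vehicle']
print(ssrm_similarity(query, document))
```

The document shares no term with the query, yet its similarity is nonzero because "automobile" matches "car" through `sim`.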


Query Term Expansion


Observations

Specification of T? A large T may lead to topic drift

Word sense disambiguation, for expanding with the correct sense

Expansion with co-occurring terms? (SVD, local/global analysis)

Semantic similarity between terms of different parts of speech?

Work with compound terms (phrases)


Evaluation of SSRM

SSRM is evaluated through intellisearch, a system for information retrieval on the WWW

1.5 million Web pages with images

Images are described by their surrounding text

The problem of image retrieval is thus transformed into a problem of text retrieval


http://www.ece.tuc.gr/intellisearch


Methods

Vector Space Model (VSM)
SSRM

Each method is represented by a precision/recall plot

Each point is the average precision/recall over 20 queries

The 20 queries are taken from the list of the most frequent Google image queries
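The averaging can be sketched as follows, assuming each plotted point corresponds to a rank cutoff k (the slides do not say how the points were obtained). `runs` holds one (ranked result list, set of relevant documents) pair per query; the two queries in the example are invented.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall of the top-k results of one query."""
    top = ranked[:k]
    hits = sum(1 for doc in top if doc in relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def averaged_curve(runs, cutoffs):
    """One (precision, recall) point per cutoff, averaged over all queries."""
    curve = []
    for k in cutoffs:
        points = [precision_recall_at_k(ranked, relevant, k) for ranked, relevant in runs]
        avg_precision = sum(p for p, _ in points) / len(points)
        avg_recall = sum(r for _, r in points) / len(points)
        curve.append((avg_precision, avg_recall))
    return curve


# Two invented queries: ranked results and the documents judged relevant.
runs = [(["a", "b", "c", "d"], {"a", "c"}), (["x", "y"], {"y"})]
print(averaged_curve(runs, [1, 2]))  # [(0.5, 0.25), (0.5, 0.75)]
```

With 20 queries, `runs` would simply hold 20 pairs, and each curve point is the mean over all of them.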


Experimental Results


MeSH and MedLine

MeSH: an ontology of medical and biological terms by the N.L.M. (22,000 terms)

MedLine: the premier bibliographic medical database of the N.L.M. (13 million references)


Evaluation on MedLine


Conclusions

Semantic similarity methods approximated the human notion of similarity, reaching a correlation of up to 83%

SSRM exploits this information to improve retrieval performance

SSRM can work with any semantic similarity method and any ontology


Future Work

Experimentation with more data sets (TREC) and ontologies

Extend SSRM to work with:
  compound terms
  more parts of speech (e.g., adverbs)
  co-occurring terms
  more term relationships in WordNet

More elaborate methods for the specification of thresholds


Try our system on the Web

Semantic Similarity System: http://www.ece.tuc.gr/similarity

SSRM: http://www.ece.tuc.gr/intellisearch