WORD SENSE DISAMBIGUATION STUDY ON WORD NET ONTOLOGY Akilan Velmurugan Computer Networks – CS 790G


WORD SENSE DISAMBIGUATION
STUDY ON WORD NET ONTOLOGY
Akilan Velmurugan
Computer Networks – CS 790G

Overview

What is WSD?
How WordNet is analyzed as a complex network
What are the results
Project methodology
  Area of study
Key findings/results
  New approaches
  Improvement techniques

Conclusion

Project Description

Objective
  Study on WSD
  Effects of WSD in the word sense ontology
  Characteristics of WordNet
Results
  How words are matched with other words
  Parameters taken for the study of word senses
  Improve them by making the necessary changes
  Study network characteristics

WordNet - overview

Machine-readable semantic dictionary interlinked by semantic relations
Developed at Princeton University as a large lexical database for the English language
One of the most widely used linguistic resources
Free for public use (distributed under the WordNet license)
Forms a scale-free network with a small average shortest path, having words as nodes and concepts as links

Easily navigable

WordNet (Structure)

Shows relations among nouns, verbs, adjectives, and adverbs
  Synonym
  Hypernym (is a kind of …)
  Hyponym (… is a kind of)
  Troponym (particular ways to …)
  Meronym (parts of …)
  About 25 relations in total

Also available for online navigation

WordNet online - by Princeton University

WordNet Browser

WordNet (working)

WSD
  Corpus-based approaches
    Set of samples that enables the system
  Knowledge-based approaches
    Machine-readable dictionary with relations
WordNet research
  Open source
  Ranking of synsets derived from word frequencies in the British National Corpus
    Top 1000
  Content manipulation of text
    Dataset I – controlled and calibrated study
    Dataset II – collected using Mechanical Turk, using pairs

Word Sense Disambiguation (WSD)
Task of determining the meaning of an ambiguous word in the given context
Bank
  Edge of a river
  or
  Financial institution that accepts money
Refers to the resolution of lexical semantic ambiguity; its goal is to attribute the correct senses to words (an AI-complete problem)
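The "bank" example can be made concrete with a gloss-overlap heuristic in the spirit of the simplified Lesk algorithm. This is only a minimal sketch and not the method used in this study: the two glosses, the stop-word list, and the names `SENSES`, `tokens`, and `disambiguate` are illustrative assumptions, and a real system would read its glosses from WordNet.

```python
# Hypothetical mini sense inventory for "bank"; the glosses are illustrative
# and are not copied from WordNet.
SENSES = {
    "bank": {
        "river_edge": "sloping land at the edge of a river or lake",
        "financial_institution": "institution that accepts money deposits and makes loans",
    }
}

STOP_WORDS = {"a", "an", "the", "of", "at", "or", "and", "that", "to", "in", "on"}


def tokens(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    words = (w.strip(".,;:!?\"'").lower() for w in text.split())
    return {w for w in words if w and w not in STOP_WORDS}


def disambiguate(word, context):
    """Return the sense whose gloss shares the most words with the context."""
    context_words = tokens(context) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(tokens(gloss) & context_words)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(disambiguate("bank", "They fished from the bank of the river"))
print(disambiguate("bank", "She deposits money at the bank"))
```

Counting shared words is crude, but it shows why context is the deciding signal: the same surface word maps to different senses depending on its neighbours.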

WSD: Area of Research

Assigning the correct sense to words, using an electronic dictionary as the source of word definitions

Open research field in Natural Language Processing (NLP)

A hard problem and a popular area of research

Used in speech synthesis to identify the correct sense of a word

JavaScript Visual WordNet

Visual Thesaurus

WordNet – Theoretical aspects
WordNet – word sense ontology
  Symbols are words
  Synset: list of words and semantic relations between them
Word sense disambiguation
  WordNet structure using latent semantics
  Variable lexical notation for a concept
  Citibase – Thesaurus
  Semantic relatedness
  And a few others…

WSD: using latent semantics
Measures the semantic distance of concepts
Relatedness and betweenness are calculated
Matrix form of the WordNet data structure is used
Can be integrated with other applications
Uses the Singular Value Decomposition (SVD) algorithm
Example: multiple synsets are
  {car, gondola} {car, railway car} {car, automobile}
  {Motor vehicle}, {Coupe}, {Sedan}, {Taxi}

MDS example

Geodesic distance matrix (13 nodes):

     1  2  3  4  5  6  7  8  9 10 11 12 13
 1   0  1  1  1  2  2  3  1  1  2  4  2  2
 2   1  0  2  2  1  2  3  2  2  3  4  3  3
 3   1  2  0  2  3  3  4  2  2  3  5  3  3
 4   1  2  2  0  3  2  3  2  2  1  4  1  3
 5   2  1  3  3  0  1  2  2  2  2  3  3  3
 6   2  2  3  2  1  0  1  1  1  1  2  2  2
 7   3  3  4  3  2  1  0  2  2  2  1  3  3
 8   1  2  2  2  2  1  2  0  2  2  3  3  1
 9   1  2  2  2  2  1  2  2  0  2  3  3  1
10   2  3  3  1  2  1  2  2  2  0  3  1  3
11   4  4  5  4  3  2  1  3  3  3  0  4  4
12   2  3  3  1  3  2  3  3  3  1  4  0  4
13   2  3  3  3  3  2  3  1  1  3  4  4  0

Steps shown: geodesic distance matrix, then MDS, then k-means

Node groups shown: {1, 2, 3, 4, 10, 12} and {5, 6, 7, 8, 9, 11, 13}
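The geodesic distance matrix on this slide can be recomputed from a graph with breadth-first search. The sketch below rests on one assumption: the slide gives only distances, so the edge list is inferred here by treating every pair at distance 1 as an edge. The names `EDGES` and `geodesic_matrix` are illustrative, and the MDS and k-means steps are not reproduced.

```python
from collections import deque

# Undirected edges inferred from the distance-1 entries of the slide's matrix
# (an assumption: the slide shows only the distances, not the graph).
EDGES = [
    (1, 2), (1, 3), (1, 4), (1, 8), (1, 9), (2, 5), (4, 10), (4, 12),
    (5, 6), (6, 7), (6, 8), (6, 9), (6, 10), (7, 11), (8, 13), (9, 13),
    (10, 12),
]


def geodesic_matrix(edges, n):
    """Return an n x n matrix of shortest-path hop counts for nodes 1..n."""
    neighbors = {v: set() for v in range(1, n + 1)}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    matrix = []
    for start in range(1, n + 1):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        # Assumes a connected graph, so every node has a distance.
        matrix.append([dist[v] for v in range(1, n + 1)])
    return matrix


D = geodesic_matrix(EDGES, 13)
for row in D:
    print(" ".join(f"{d:2d}" for d in row))
```

Each row of `D` is one breadth-first search, so the cost is one traversal per node. The result is symmetric because the edges are treated as undirected.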

WSD: using latent semantics

WSD: variable lexical notations for a concept

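Generic concept notation:

\[ D = I \cup J \cup K \]
\[ \therefore\; J = D - (I \cup K) = (D - I) \cap (D - K) = D \cap \overline{(I \cup K)} \]
\[ J = D \cap (\overline{I} \cap \overline{K}) \]

since

\[ B = D \cup E \cup F \]
\[ D = B - (E \cup F) = (B - E) \cap (B - F) = B \cap \overline{(E \cup F)} \]
\[ D = B \cap (\overline{E} \cap \overline{F}) \]

Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications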
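These identities can be checked on small sets. This is a sketch under stated assumptions: the element values are arbitrary, the sets I, J, K (and D, E, F) are taken as pairwise disjoint, which the step J = D − (I ∪ K) requires, and complements are taken relative to a universe `U` that is set to B here.

```python
# Arbitrary illustrative elements; I, J, K and D, E, F are pairwise disjoint.
I, J, K = {1, 2}, {3, 4}, {5}
E, F = {6}, {7, 8}

D = I | J | K
B = D | E | F
U = B  # universe used for complements (an assumption of this sketch)


def complement(s):
    """Complement of s relative to the universe U."""
    return U - s


# J = D - (I ∪ K) = (D - I) ∩ (D - K) = D ∩ complement(I ∪ K)
assert J == D - (I | K)
assert J == (D - I) & (D - K)
assert J == D & complement(I | K)
# De Morgan: complement(I ∪ K) = complement(I) ∩ complement(K)
assert J == D & (complement(I) & complement(K))

# The same steps one level up: D = B ∩ (complement(E) ∩ complement(F))
assert D == B - (E | F)
assert D == B & (complement(E) & complement(F))
print("all identities hold for these sets")
```

If the sets overlapped, for example if J shared an element with I, the first assertion would fail. That is why the disjointness assumption is stated explicitly.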

WSD: variable lexical notations for a concept

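\[ J = D \cap (\overline{I} \cap \overline{K}) = \bigl(B \cap (\overline{E} \cap \overline{F})\bigr) \cap (\overline{I} \cap \overline{K}) \]
\[ J = B \cap \bigl((\overline{E} \cap \overline{F}) \cap (\overline{I} \cap \overline{K})\bigr) \]

when J = fly, D = fish lure, I = spinner, K = troll

And introducing boolean operators: AND for ∩, OR for ∪, NOT for the complement (overbar)

Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications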

WSD: variable lexical notations for a concept

(“fly”) becomes:

(“fisherman's lure” OR “fish lure”) AND ( (NOT “spinner”) AND (NOT “troll”) )

then B = lure, E = ground bait, F = stool pigeon

(“fly”) becomes:

(“bait” OR “decoy” OR “lure”) AND ( ((NOT “ground bait”) AND (NOT “stool pigeon”)) AND ((NOT “spinner”) AND (NOT “troll”)) )

Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
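The two query expansions for "fly" follow one pattern: OR together the labels of the parent concept, then AND the result with the negated sibling concepts at each level. A minimal sketch of that pattern is given below. The function names and the string format are assumptions made for illustration and are not taken from the cited paper.

```python
def quote(term):
    """Wrap a term in double quotes."""
    return f'"{term}"'


def any_of(terms):
    """Build ("a" OR "b" OR ...)."""
    return "(" + " OR ".join(quote(t) for t in terms) + ")"


def none_of(terms):
    """Build ((NOT "a") AND (NOT "b") ...)."""
    return "(" + " AND ".join(f"(NOT {quote(t)})" for t in terms) + ")"


def build_query(parent_labels, excluded_groups):
    """OR the parent labels, then AND with every group of excluded siblings.

    Expects at least one exclusion group.
    """
    groups = [none_of(g) for g in excluded_groups]
    exclusions = groups[0] if len(groups) == 1 else "(" + " AND ".join(groups) + ")"
    return f"{any_of(parent_labels)} AND {exclusions}"


# One level up: D = fish lure, excluding I = spinner and K = troll.
print(build_query(["fisherman's lure", "fish lure"], [["spinner", "troll"]]))

# Two levels up: B = lure, also excluding E = ground bait and F = stool pigeon.
print(build_query(
    ["bait", "decoy", "lure"],
    [["ground bait", "stool pigeon"], ["spinner", "troll"]],
))
```

Each extra level of the hierarchy adds one exclusion group, so the query grows with the depth of the concept rather than needing a dedicated label for it.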

Thesaurus as a complex network

As a directed graph
  sink: composed of the 73,046 terms with kout = 0
  source: the 30,260 terms with at least one outgoing link (kout > 0) – root words
    absolute source: without incoming links (kin = 0)
    normal source: kout > 0 and kin > 0
    bridge source: without outgoing links to root words (kout(source) = 0)

Figure legend: 1 – Normal source, 2 – Bridge source, 3 – Absolute source, 4 – Sink

Source: arXiv:cond-mat/0312586 v1 2003
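The node types above depend only on in-degree and out-degree, so they can be computed directly from an edge list. The sketch below makes two assumptions that the slide does not settle: the order of the checks (absolute source is tested before bridge source), and the reading of "bridge source" as a source whose outgoing links all point to sinks. The four-word edge list is hypothetical and is not taken from the thesaurus data.

```python
def classify_nodes(edges):
    """Label each node of a directed graph as 'sink', 'absolute source',
    'bridge source', or 'normal source'."""
    out_links, in_degree = {}, {}
    for src, dst in edges:
        out_links.setdefault(src, set()).add(dst)
        out_links.setdefault(dst, set())
        in_degree[dst] = in_degree.get(dst, 0) + 1
        in_degree.setdefault(src, 0)

    sources = {n for n, outs in out_links.items() if outs}  # kout > 0
    labels = {}
    for node, outs in out_links.items():
        if not outs:
            labels[node] = "sink"             # kout = 0
        elif in_degree[node] == 0:
            labels[node] = "absolute source"  # kin = 0
        elif not (outs & sources):
            labels[node] = "bridge source"    # no out-links to other sources
        else:
            labels[node] = "normal source"    # kin > 0 and links to a source
    return labels


# Hypothetical chain of entries: big -> large -> huge -> enormous
edges = [("big", "large"), ("large", "huge"), ("huge", "enormous")]
for word, kind in classify_nodes(edges).items():
    print(word, "->", kind)
```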

WSD: Semantic relatedness and word sense disambiguation

Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications

Concepts that occur together more frequently and closer to each other are “more related” than concepts that appear together less frequently and farther apart
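One way to make "more frequent and closer" measurable is to sum an inverse-distance weight over every co-occurrence of two words within a window. The scoring rule below is a hypothetical illustration of the idea and is not the formula from the cited paper. The sample sentence, the window size, and the function name are also assumptions.

```python
def relatedness(tokens, a, b, window=5):
    """Sum 1/distance over all position pairs of a and b that lie at most
    `window` tokens apart, so frequent and close pairs score higher."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    score = 0.0
    for i in pos_a:
        for j in pos_b:
            gap = abs(i - j)
            if 0 < gap <= window:
                score += 1.0 / gap
    return score


text = ("the car has a wheel and the car needs a wheel "
        "but the penguin sat far from the car")
tokens = text.split()

print(relatedness(tokens, "car", "wheel"))
print(relatedness(tokens, "car", "penguin"))
```

In this sample "car" and "wheel" co-occur three times at a gap of three tokens, while "car" and "penguin" co-occur once at a gap of five, so the first pair scores higher.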

WordNet Relationship

Semantic relatedness
  Involves relationships among words
    car-wheel (meronym)
    hot-cold (antonym)
    pencil-paper (functional)
    penguin-Antarctica (association)
    bank-trust company (synonym)
Probability and distance calculation
  Frequency of synsets or words
Performance in NLP applications

WordNet Relationship Browser

WordNet Connect

Program to find all possible connections between two words in WordNet

Used in computing semantic opposition within the word sense ontology

The WordNet lexical database is used to read the semantic relations

Properties such as the number of paths, the shortest path, and the overall network structure are studied
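Finding every connection between two words, counting the paths, and picking the shortest one are standard graph searches. The sketch below is not the WordNet Connect program itself. It runs on a small hypothetical relation graph, and the edge list and function names are assumptions made for illustration.

```python
from collections import deque

# Hypothetical undirected relation graph; the edges are illustrative and are
# not read from the WordNet database.
EDGES = [
    ("car", "automobile"), ("car", "motor vehicle"),
    ("automobile", "motor vehicle"), ("motor vehicle", "taxi"),
    ("car", "wheel"),
]


def build_graph(edges):
    """Adjacency sets for an undirected graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def all_simple_paths(graph, start, goal, path=None):
    """Yield every path from start to goal that visits no node twice."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for nxt in sorted(graph[start]):
        if nxt not in path:
            yield from all_simple_paths(graph, nxt, goal, path)


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(graph[path[-1]]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


graph = build_graph(EDGES)
paths = list(all_simple_paths(graph, "car", "taxi"))
print(len(paths), "paths:", paths)
print("shortest:", shortest_path(graph, "car", "taxi"))
```

Enumerating all simple paths grows quickly with graph size, so on the full WordNet graph a depth limit or a restriction to selected relation types would likely be needed.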


Future work

WordNet structure in terms of complex network

Key assumptions
  WordNet lexical dictionary analyzed under the scope of a source node and a target node, with an additional reference node
  Achieve a cost-effective path which is conditionally related to the mean reference node
  Control the path traversal with a relation of focus
  Include the Common File Number to make it more efficient

Conclusion

A single visualization cannot reveal the entire structure of WordNet

There are different ways of analyzing the effectiveness of the overall system

A new method to evaluate the usefulness of the WordNet network structure

Questions and Comments