Oana Adriana Şoica

Building and Ordering a SenDiS Lexicon Network

Page 2

SenDiS

Outline

o SenDiS operates on a specific lexicon network (LexNet) – “sense tagged glosses” relations
o lexicon networks obtained from other semantic / lexical relations
o obtaining a SenDiS LexNet:
  - build a “sense tagged glosses” LexNet (manually annotate the lexicon with a specific tool)
  - import a “sense tagged glosses” LexNet (WordNet tagged glosses, as of 2008)
o preprocessing (ordering) the SenDiS LexNet (before WSD):
  - truncation of the LexNet
  - leveling the LexNet

Page 3

Semantic/Lexical Relations

o hypernyms
o hyponyms
o similar to
o has part
o synonyms
o antonyms
o holonyms
o meronyms
o coordinate terms
o troponyms
o entailment

Page 4

Semantic/Lexical relations: WordNet

[Figure: an excerpt of the WordNet semantic network*]

* Navigli, R. 2009. Word sense disambiguation: A survey. ACM Comput. Surv. 41, 2, Article 10 (2009).

Page 5

Semantic/Lexical relations: GRAALAN

Tail of relation        Head of relation        Relation type
{synonym}               {synonym}               Bidirectional, symmetric
{antonym}               {antonym}               Bidirectional, symmetric
{paronym}               {paronym}               Bidirectional, symmetric
{hypernym}              {hyponym}               Bidirectional, asymmetric
{connotation}           -                       Unidirectional
{holonym}               {meronym}               Bidirectional, asymmetric
{homonym}               {homonym}               Bidirectional, symmetric
{heteronym}             {heteronym}             Bidirectional, symmetric
{homophone}             {homophone}             Bidirectional, symmetric
{diminutive of}         {diminutive by}         Bidirectional, asymmetric
{augmentative of}       {augmentative by}       Bidirectional, asymmetric
{extension from}        {extension into}        Bidirectional, asymmetric
{reduction from}        {reduction into}        Bidirectional, asymmetric
{generalization from}   {generalization into}   Bidirectional, asymmetric
{specialization from}   {specialization into}   Bidirectional, asymmetric
{figurative of}         {literal for}           Bidirectional, asymmetric
{reference to}          -                       Unidirectional
{derived from}          {derived into}          Bidirectional, asymmetric
{back formatted form}   {back formats}          Bidirectional, asymmetric
{abstract for}          {concretized from}      Bidirectional, asymmetric
{with variant}          {variant for}           Bidirectional, asymmetric
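
The three relation types in the GRAALAN table follow from how the tail and head labels compare: no head means unidirectional, identical labels mean symmetric, and different labels mean asymmetric. The sketch below illustrates that rule on a few rows; the `RELATIONS` dictionary and `relation_type` function are hypothetical names chosen for illustration, not part of GRAALAN or SenDiS.

```python
# Hypothetical encoding of a few rows of the GRAALAN relation table.
# Each tail label maps to its head label; None marks a relation without a head.
RELATIONS = {
    "synonym": "synonym",
    "antonym": "antonym",
    "hypernym": "hyponym",
    "holonym": "meronym",
    "connotation": None,
    "reference to": None,
}


def relation_type(tail: str) -> str:
    """Classify a relation by comparing its tail and head labels."""
    head = RELATIONS[tail]
    if head is None:
        return "Unidirectional"
    if head == tail:
        return "Bidirectional, symmetric"
    return "Bidirectional, asymmetric"


for tail in RELATIONS:
    print(tail, "->", relation_type(tail))
```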

Page 6

Obtaining a SenDiS LexNet

o manually annotating the glosses from a lexicon (using a specific tool that can ease the process)
o importing an existing “gloss tagged” lexicon net (also obtained manually or semi-automatically); this usually translates into a dependency on a specific list of meanings/glosses

Page 7

Creating the SenDiS LexNet

o implied a significant effort, usually measured in months, involving several trained linguists
o using a specialized collaborative tool (BuildLNTool – Build Lexicon Network Tool)
o enriching the “gloss tagged” relation with three relative degrees of importance (in the gloss context): weak, medium, strong; alternatively, the gloss word is ignored
o SenDiS objective, two LexNets:
  - “gloss tagged” LexNet for the Romanian language
  - “gloss tagged” LexNet for the English language

Page 8

BuildLNTool

o BuildLNTool (Build Lexicon Network Tool) provides:
  - a visual and effective mechanism to manually annotate the lexicon glosses
  - a synchronized overview of the already created relations
  - a browsing mechanism for inspecting the already tagged glosses and relations

Page 9

BuildLNTool - Sections

[Figure: the BuildLNTool window, with labeled sections: “Lemmas & MWEs”, “Lemma \ MWE Info”, “Competence & Definition Trees”, “Root & Leaf Meanings”, messages and progress]

Page 10

BuildLNTool – Sections II

o “Lemmas & MWEs”: list of lexicon entries
o “Root & Leaf Meanings”: list of roots and leaves of the lexicon network
o “Lemma/MWE Info”: current lexicon entry being analyzed
o “Competence & Definition Trees”: spanning trees for a given meaning over the current lexicon net
o section for messages and progress
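
The slides do not say how the “Competence & Definition Trees” are computed. As a rough illustration only, the sketch below builds a breadth-first spanning tree from one meaning over a toy directed net. It assumes (this is not stated in the source) that a definition tree follows arcs from a meaning to the senses tagged in its gloss, and that a competence tree follows the same arcs in reverse. The names `spanning_tree`, `reverse` and the toy sense identifiers are hypothetical.

```python
from collections import deque


def spanning_tree(arcs, root):
    """Breadth-first spanning tree: maps each reached vertex to its parent."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w in arcs.get(v, ()):
            if w not in parent:  # first visit fixes the tree parent
                parent[w] = v
                queue.append(w)
    return parent


def reverse(arcs):
    """Return the same net with every arc direction flipped."""
    rev = {}
    for v, targets in arcs.items():
        for w in targets:
            rev.setdefault(w, []).append(v)
    return rev


# Toy net (invented data): meaning -> senses tagged in its gloss.
gloss_arcs = {
    "cat#1": ["feline#1", "mammal#1"],
    "feline#1": ["mammal#1"],
    "mammal#1": ["animal#1"],
}

definition_tree = spanning_tree(gloss_arcs, "cat#1")
competence_tree = spanning_tree(reverse(gloss_arcs), "mammal#1")
```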

Page 11

BuildLNTool – Lemmas & MWEs

[Figure: the “Lemmas & MWEs” section, with callouts:]
o selection of lexicon entry type
o selection of unfinished lexicon entries filter
o selection of viewing interval
o text filter
o lexicon entry text
o lexicon entry status

Page 12

BuildLNTool – Selection of a current lexicon entry

[Figure callout: double click]

Page 13

BuildLNTool – Browsing the meanings of the current lexicon entry

[Figure callouts:]
o lexicon entry text
o morphologic interpretation
o list of meanings filters
o meaning/gloss fully tagged
o meaning/gloss partially tagged
o meaning/gloss not tagged

Page 14

BuildLNTool – Selection of a current meaning for tagging

[Figure callout: double click]

Page 15

BuildLNTool – Gloss constituent without interpretations

[Figure callouts: unrecognized gloss constituent; ‘Enter’]

Page 16

BuildLNTool – Degrees of relevance (in gloss context)

Default setting: Medium

Page 17

BuildLNTool – Degrees of relevance II

[Figure callouts:]
o ‘Strong’ tokens
o ‘Medium’ tokens
o ‘Weak’ tokens
o Ignored (X) tokens
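
One way to picture a tagged gloss is as a list of tokens, each carrying an optional sense and one of the relevance degrees named in the slides (Strong, Medium, Weak, Ignored), with Medium as the default. The sketch below is only an illustration: the class and field names are hypothetical, and the rule used to tell fully, partially and not tagged glosses apart (ignored tokens do not count) is an assumption, not BuildLNTool's documented behaviour.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Relevance(Enum):
    IGNORED = 0
    WEAK = 1
    MEDIUM = 2
    STRONG = 3


@dataclass
class GlossToken:
    text: str
    sense_id: Optional[str] = None            # None until a sense is selected
    relevance: Relevance = Relevance.MEDIUM   # default setting named in the slides


def gloss_status(tokens):
    """Assumed rule: ignored tokens are skipped when judging completeness."""
    relevant = [t for t in tokens if t.relevance is not Relevance.IGNORED]
    tagged = sum(t.sense_id is not None for t in relevant)
    if tagged == 0:
        return "not tagged"
    return "fully tagged" if tagged == len(relevant) else "partially tagged"


# Invented example gloss: "a feline mammal"
gloss = [
    GlossToken("a", relevance=Relevance.IGNORED),
    GlossToken("feline", "feline#1", Relevance.STRONG),
    GlossToken("mammal"),
]
print(gloss_status(gloss))
```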

Page 18

BuildLNTool – Gloss tagging

[Figure callouts: unsaved annotations; saved annotations]

Page 19

BuildLNTool – Gloss tagging protocol

[Figure: diagram of the gloss tagging protocol, with labels:]
o view of meaning tagging tree
o selection of constituent / group of gloss constituents
o set / modify relevance degree
o edit text of gloss constituent
o select / modify the sense for the gloss constituent
o further annotate meaning / save annotations
o choose the next meaning
o further on
o save annotations
o current gloss constituent without sense interpretations

Page 20

Built LexNets for Romanian and English

LexNets             AllTokens   OperatedTokens   OpTokensValid   OpTokensRelated   OpTokens V & R
LL_Romanian - 99%   1,528,819   1,191,942        691,010         720,420           686,210
LL_English - 2%     36,828      30,350           18,523          17,641            17,505

LexNets             Glosses   Tagged Glosses   Targeted Glosses   Tags Density
LL_Romanian - 99%   130,087   118,536          58,976             0.5757
LL_English - 2%     259,651   3,496            7,551              0.5767
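
The slides do not define "Tags Density". The reported values are, however, consistent with dividing "OpTokens V & R" by "OperatedTokens" and truncating to four decimals. The sketch below checks that reading against the two rows above; the formula is an inference from the numbers, and `tags_density` is a hypothetical helper name.

```python
def tags_density(operated_tokens: int, valid_and_related: int) -> float:
    """Assumed definition: (valid and related tokens) / operated tokens,
    truncated (not rounded) to four decimal places."""
    return (valid_and_related * 10_000 // operated_tokens) / 10_000


romanian = tags_density(1_191_942, 686_210)
english = tags_density(30_350, 17_505)
print(romanian, english)
```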

Page 21

Imported WordNet tagged glosses

o WordNet (3.0) is organized in synsets:
  - 117,659 synsets
  - 155,287 words (lexicon entries)
  - 206,941 word-sense pairs (gloss + usage examples)
o the synsets were split and transformed into a classical lexicon format
o the lexicon network imported:

LexNets                   Glosses   Tagged Glosses   Targeted Glosses   Tags Density
WordNet                   206,941   206,938          59,251             0.3486
WordNet_extendedGlosses   206,941   206,941          83,174             0.3006

LexNets                   AllTokens   OperatedTokens   OpTokensValid   OpTokensRelated   OpTokens V & R
WordNet                   2,394,190   2,394,190        2,394,189       834,803           834,803
WordNet_extendedGlosses   3,114,968   3,114,968        3,114,967       936,397           936,397
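
Splitting synsets into a classical lexicon format can be pictured as emitting one entry per word-sense pair, so that each word lists its own numbered meanings. The sketch below shows the idea on invented data; the `Synset` class, the `split_synsets` function and the numbering of senses in encounter order are assumptions for illustration, since the slides do not describe the actual import procedure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Synset:
    synset_id: str
    lemmas: List[str]
    gloss: str


def split_synsets(synsets):
    """Return {lemma: [(sense_number, gloss, synset_id), ...]}.

    Every (lemma, synset) combination becomes one word-sense pair."""
    entries = {}
    for synset in synsets:
        for lemma in synset.lemmas:
            senses = entries.setdefault(lemma, [])
            senses.append((len(senses) + 1, synset.gloss, synset.synset_id))
    return entries


# Invented toy data, not taken from WordNet.
toy = [
    Synset("s1", ["car", "auto"], "a motor vehicle"),
    Synset("s2", ["car"], "a railway wagon"),
]
lexicon = split_synsets(toy)
pair_count = sum(len(senses) for senses in lexicon.values())
```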

Page 22

Ordering a SenDiS LexNet

o “gloss tagged” lexicon nets are large and dense graphs:
  - between 100,000 and 200,000 vertices
  - over 1,000,000 edges / arcs
o to ease the operation with such graphs, “gloss tagged” lexicon nets can be preprocessed and optimized:
  - truncation of a lexicon net
  - leveling of a lexicon net
o aims when optimizing a lexicon net:
  - elimination of loops or strongly connected components
  - a minimum number of removed edges
  - leveling on a minimum number of levels
  - minimization/maximization of root/leaf vertices
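
The first aim, eliminating loops, can be illustrated with a textbook technique: run a depth-first search and drop every arc that points back to a vertex still on the current search path. This is not the algorithm used in SenDiS (the slides do not describe it), and it makes no attempt to minimize the number of removed edges; it only shows how a cyclic net becomes acyclic. The function name and the toy graph are hypothetical, and the recursive form is meant for small examples only.

```python
def remove_back_edges(arcs):
    """Return (acyclic_arcs, removed) by dropping DFS back edges."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}
    kept = {v: [] for v in arcs}
    removed = []

    def visit(v):
        color[v] = GREY
        for w in arcs.get(v, ()):
            state = color.get(w, WHITE)
            if state == GREY:
                # w is on the current DFS path, so (v, w) closes a cycle.
                removed.append((v, w))
                continue
            kept.setdefault(v, []).append(w)
            if state == WHITE:
                visit(w)
        color[v] = BLACK

    for v in list(arcs):
        if color.get(v, WHITE) == WHITE:
            visit(v)
    return kept, removed


# Toy net with one cycle: a -> b -> c -> a
toy_arcs = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}
acyclic, dropped = remove_back_edges(toy_arcs)
```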

Page 23

Unordered LexNet

[Figure: a minimal lexicon net in the original form, with elements labeled e1 to e9]

Page 24

Ordered (leveled) LexNet

[Figure: the same minimal lexicon net leveled, with labels 1 to 11, e1 to e11, V and B]
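
Once a net is acyclic, leveling can be sketched with a standard longest-path layering: each vertex is placed one level below the deepest vertex pointing to it, which uses no more levels than the longest path requires. The code below is a generic illustration on a toy graph, not the leveling procedure of SenDiS; placing roots at level 0 and the name `level_dag` are assumptions.

```python
from collections import deque


def level_dag(arcs):
    """Map each vertex to the length of the longest path reaching it from a root."""
    vertices = set(arcs) | {w for targets in arcs.values() for w in targets}
    indegree = {v: 0 for v in vertices}
    for targets in arcs.values():
        for w in targets:
            indegree[w] += 1

    level = {v: 0 for v in vertices}
    queue = deque(v for v in vertices if indegree[v] == 0)
    processed = 0
    while queue:
        v = queue.popleft()
        processed += 1
        for w in arcs.get(v, ()):
            level[w] = max(level[w], level[v] + 1)
            indegree[w] -= 1
            if indegree[w] == 0:  # all predecessors of w are leveled
                queue.append(w)

    if processed != len(vertices):
        raise ValueError("graph still contains a cycle")
    return level


levels = level_dag({"a": ["b", "c"], "b": ["c"], "c": ["d"]})
```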

Page 25

Results on leveling experimental LexNets

LNs      Vertices   Edges In   OLN Algorithm   Edges Out   Edges Removed   Levels   Time (s)
wn       202,361    834,803    Patentv1        821,048     13,755          192      4.5
wn_ex    205,188    936,397    Patentv1        936,397     74,526          382      5.7
ro_48%   72,067     318,741    Patentv1        308,592     10,149          195      1.6
ro_78%   100,175    523,192    Patentv1        504,210     18,982          244      2.3
ro_99%   120,472    686,784    Patentv1        659,030     27,754          291      2.8
ro_48%   130,407    318,741    NT_eades        308,334     10,407          58       60
ro_99%   130,099    686,784    NT_eades        654,025     32,759          70       330
wn_ex    206,941    936,397    NT_eades        904,992     31,405          46       1,315
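
The rows above can be sanity-checked under the assumption that Edges Out equals Edges In minus Edges Removed, an identity the slide does not state but that seven of the eight rows satisfy. The sketch below copies the edge counts from the table and lists any row where the three figures disagree; as transcribed, only the wn_ex / Patentv1 row does. The `ROWS` list and `inconsistent_rows` function are hypothetical names.

```python
# (LexNet, algorithm, edges_in, edges_out, edges_removed), copied from the table.
ROWS = [
    ("wn", "Patentv1", 834_803, 821_048, 13_755),
    ("wn_ex", "Patentv1", 936_397, 936_397, 74_526),
    ("ro_48%", "Patentv1", 318_741, 308_592, 10_149),
    ("ro_78%", "Patentv1", 523_192, 504_210, 18_982),
    ("ro_99%", "Patentv1", 686_784, 659_030, 27_754),
    ("ro_48%", "NT_eades", 318_741, 308_334, 10_407),
    ("ro_99%", "NT_eades", 686_784, 654_025, 32_759),
    ("wn_ex", "NT_eades", 936_397, 904_992, 31_405),
]


def inconsistent_rows(rows):
    """Rows where edges_in - edges_removed differs from edges_out."""
    return [
        (name, algorithm)
        for name, algorithm, e_in, e_out, e_removed in rows
        if e_in - e_removed != e_out
    ]


print(inconsistent_rows(ROWS))
```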
