CLIN 25

SYNSET EMBEDDINGS FOR NAMED ENTITY DISAMBIGUATION

Minh Ngoc Le (VUA)

SYNSETS = CONCEPTS+ENTITIES


Babelfy: joint disambiguation of word senses and named entities helps improve both.

PLAN

1. Learning synset embeddings from BabelNet+DBpedia

2. Use DBpedia Spotlight to detect mentions and generate candidates

3. Use cosine similarity as edge weight in coherence graph
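Step 3 of the plan can be illustrated with a small sketch. It is not the system from the talk: embeddings are plain Python lists keyed by hypothetical synset IDs, the candidate lists stand in for DBpedia Spotlight output, and the graph step is reduced to a simple heuristic in which each mention keeps the candidate with the largest summed cosine similarity to the candidates of all other mentions.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def disambiguate(candidates, emb):
    """candidates: {mention: [synset id, ...]}; emb: {synset id: vector}.
    Builds a coherence graph whose edges join candidates of *different* mentions,
    weighted by cosine similarity, and keeps each mention's candidate with the
    highest weighted degree (a simple heuristic, assumed for illustration)."""
    degree = {(m, c): 0.0 for m, cs in candidates.items() for c in cs}
    for m1, m2 in combinations(candidates, 2):
        for c1 in candidates[m1]:
            for c2 in candidates[m2]:
                w = cosine(emb[c1], emb[c2])
                degree[(m1, c1)] += w
                degree[(m2, c2)] += w
    return {m: max(cs, key=lambda c: degree[(m, c)]) for m, cs in candidates.items()}


# Hypothetical toy embeddings and candidate lists.
emb = {
    "Paris_(city)": [1.0, 0.0, 0.0],
    "Paris_Hilton": [0.0, 1.0, 0.0],
    "France": [0.9, 0.1, 0.0],
    "Anatole_France": [0.0, 0.0, 1.0],
}
candidates = {"Paris": ["Paris_(city)", "Paris_Hilton"], "France": ["France", "Anatole_France"]}
chosen = disambiguate(candidates, emb)
print(chosen)  # {'Paris': 'Paris_(city)', 'France': 'France'}
```

Summing over all candidates, including wrong ones, is the weakest part of this heuristic; an iterative or subgraph-based selection would be more robust.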

DATASET

Using half of the synsets (for efficiency)

Synsets: 1,435,175 (1.4 million) coming from:

WordNet only: 8k (0.6%)

WordNet+Wikipedia: 32k (2.3%)

Wikipedia only: 1.39m (97%)

Unique relations: 13,146,719 (13 million)

MODEL


NEGATIVE SAMPLING


Positive examples (D):

1. cat has_part tail

2. dog has_part tail

3. car has_part wheel

4. …

Negative examples (N):

1. cat has_part wheel

2. car has_part tail

3. …
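The negative examples look like corrupted positives. A common way to generate them, sketched here as an assumption (the slide does not state the exact scheme), is to replace the head or the tail of a positive triple with a random entity and reject the result if it is itself a positive example. The function name `corrupt` is hypothetical.

```python
import random


def corrupt(triple, entities, positives, rng):
    """Return a negative triple by swapping the head or the tail for a random
    entity; corruptions that happen to be positive examples are rejected."""
    head, rel, tail = triple
    while True:
        e = rng.choice(entities)
        cand = (e, rel, tail) if rng.random() < 0.5 else (head, rel, e)
        if cand not in positives:
            return cand


# Positive examples D from the slide.
D = {("cat", "has_part", "tail"), ("dog", "has_part", "tail"), ("car", "has_part", "wheel")}
entities = sorted({h for h, _, _ in D} | {t for _, _, t in D})
rng = random.Random(0)                          # fixed seed for reproducibility
N = [corrupt(t, entities, D, rng) for t in sorted(D)]
print(N)
```

Uniform corruption can produce type-incompatible triples such as `cat has_part dog`; these are easy negatives, which is one reason some systems sample replacements more carefully.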

ACCURACY

MODEL            WORDNET   FREEBASE15K   BABELNET
MY MODEL         -         -             81.0
BILINEAR         84.1      87.7          -
NEURAL TENSOR    86.2      90.0          -
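The slide does not define this accuracy. If it is triple classification, the setting in which bilinear and neural tensor models are commonly compared, a score threshold is tuned on development data and test triples are classified against it. A minimal sketch, assuming higher scores mean more plausible triples and a single global threshold (per-relation thresholds are also used); all scores below are invented toy values.

```python
def accuracy(pos_scores, neg_scores, threshold):
    """Fraction classified correctly: positives need score >= threshold,
    negatives need score < threshold."""
    correct = sum(s >= threshold for s in pos_scores) + sum(s < threshold for s in neg_scores)
    return correct / (len(pos_scores) + len(neg_scores))


def best_threshold(pos_scores, neg_scores):
    """Try every observed score as a threshold; keep the first with the highest accuracy."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(pos_scores) | set(neg_scores)):
        acc = accuracy(pos_scores, neg_scores, t)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


# Toy plausibility scores (invented).
dev_pos, dev_neg = [0.9, 0.8, 0.4], [0.5, 0.2, 0.1]
test_pos, test_neg = [0.7, 0.3], [0.35, 0.05]

t = best_threshold(dev_pos, dev_neg)        # tuned on development data only
test_acc = accuracy(test_pos, test_neg, t)  # reported on held-out test data
print(t, test_acc)  # 0.4 0.75
```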


VECTOR SPACE

[Figure: vector space plot, with zoomed-in views ZOOM 1, ZOOM 2 and ZOOM 3]

OOPS!

MODEL               CONLL    KORE50
DBPEDIA SPOTLIGHT   54.9%    52.8%
MY MODEL            40.9%    37.8%
BABELFY             82.1%    72.5%

WHY?

1. Learning synset embeddings from BabelNet+DBpedia?

2. Use DBpedia Spotlight to detect mentions and generate candidates

3. Use cosine similarity as edge weight in coherence graph

DEBUG


MODEL      SIMLEX-999   MEN     WORDSIM
MY MODEL   0.06         -0.00   0.11
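SimLex-999, MEN and WordSim are usually scored with Spearman's rank correlation between the model's similarity for each word pair (presumably cosine here) and the human ratings; the slide does not name the coefficient, so this is an assumption. A dependency-free sketch with tie-averaged ranks, on invented ratings; `scipy.stats.spearmanr` computes the same coefficient.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Invented example: human ratings and model cosine similarities for four word pairs.
human = [9.0, 7.5, 3.0, 1.0]
model = [0.8, 0.6, 0.1, 0.2]
print(round(spearman(human, model), 3))  # 0.8
```

A score near 0, as in the row above, means the model's similarity ordering of word pairs is essentially unrelated to the human ordering.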

DEBUG

MODEL              SIMLEX   MEN    WORDSIM   HITS@10
UNSTRUCTURED       0.43     0.55   0.42      35.3
TRANSE             0.42     0.41   0.39      75.4
SME (LINEAR)       0.19     0.13   0.07      65.1
SME (BILINEAR)     0.12     0.12   0.08      54.7
SE                 0.03     0.02   -0.04     68.5
STATE-OF-THE-ART   0.52     0.80   0.81      -
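HITS@10 measures link prediction: for each test triple, every entity is scored as a replacement tail, the true tail is ranked, and the metric is the fraction of triples whose true tail lands in the top 10. A minimal NumPy sketch, assuming the TransE L1 energy, tail replacement only, and the raw setting (other true triples are not filtered out); the standard protocol also replaces heads and averages both. The embeddings are random toy data, except two triples rigged so the result is known.

```python
import numpy as np


def transe_energy(h, r, t):
    """TransE dissimilarity ||h + r - t||_1; lower means more plausible.
    Broadcasts, so t may be one vector or a matrix with one row per entity."""
    return np.abs(h + r - t).sum(axis=-1)


def hits_at_k(test_triples, E, R, k=10):
    """Raw Hits@k for tail prediction over (head, relation, tail) index triples."""
    hits = 0
    for h, r, t in test_triples:
        energies = transe_energy(E[h], R[r], E)          # one energy per candidate tail
        rank = 1 + int((energies < energies[t]).sum())   # 1 = best
        if rank <= k:
            hits += 1
    return hits / len(test_triples)


rng = np.random.default_rng(0)
E = rng.normal(size=(100, 20))   # 100 toy entities, 20-dimensional
R = rng.normal(size=(3, 20))     # 3 toy relations
E[1] = E[0] + R[0]               # (0, 0, 1) fits TransE exactly: rank 1
E[5] = E[4] + R[1] + 100.0       # (4, 1, 5) is badly violated: ranked last
print(hits_at_k([(0, 0, 1), (4, 1, 5)], E, R))  # 0.5
```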

DEBUG

MODEL            SIMLEX   MEN   WORDSIM   HITS@10
UNSTRUCTURED     1        1     1         5
TRANSE           2        2     2         1
SME (LINEAR)     3        3     4         3
SME (BILINEAR)   4        4     3         4
SE               5        5     5         2
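The rank table follows from the score table by sorting each column in descending order (no column contains ties). A short check that reproduces it, using only the numbers on the slides:

```python
# Scores from the previous DEBUG slide (STATE-OF-THE-ART row omitted).
scores = {
    "UNSTRUCTURED":   {"SIMLEX": 0.43, "MEN": 0.55, "WORDSIM": 0.42, "HITS@10": 35.3},
    "TRANSE":         {"SIMLEX": 0.42, "MEN": 0.41, "WORDSIM": 0.39, "HITS@10": 75.4},
    "SME (LINEAR)":   {"SIMLEX": 0.19, "MEN": 0.13, "WORDSIM": 0.07, "HITS@10": 65.1},
    "SME (BILINEAR)": {"SIMLEX": 0.12, "MEN": 0.12, "WORDSIM": 0.08, "HITS@10": 54.7},
    "SE":             {"SIMLEX": 0.03, "MEN": 0.02, "WORDSIM": -0.04, "HITS@10": 68.5},
}
METRICS = ["SIMLEX", "MEN", "WORDSIM", "HITS@10"]


def rank_models(scores, metric):
    """Rank 1 = highest score on the given metric."""
    ordered = sorted(scores, key=lambda m: scores[m][metric], reverse=True)
    return {m: i + 1 for i, m in enumerate(ordered)}


ranks = {metric: rank_models(scores, metric) for metric in METRICS}
for model in scores:
    print(model, [ranks[metric][model] for metric in METRICS])
```

The printout makes the mismatch explicit: UNSTRUCTURED is first on all three similarity sets and last on HITS@10.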

DEBUG

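MODEL            SIMLEX   MEN    WORDSIM   #PARAMETERS
UNSTRUCTURED     0.43     0.55   0.42      $O(n_e k)$
TRANSE           0.42     0.41   0.39      $O(n_e k + n_r k)$
SME (LINEAR)     0.19     0.13   0.07      $O(n_e k + n_r k + 4k^2)$
SME (BILINEAR)   0.12     0.12   0.08      $O(n_e k + n_r k + 2k^3)$
SE               0.03     0.02   -0.04     $O(n_e k + 2 n_r k^2)$

$n_e$: number of entities (synsets); $n_r$: number of relation types; $k$: embedding dimension.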
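To see what the #PARAMETERS column means at this scale, the sketch evaluates the leading terms for $n_e$ = 1,435,175 synsets (the dataset size above). The relation count $n_r = 100$ and dimension $k = 50$ are hypothetical values chosen only for illustration, bias terms are ignored, and `num_params` is a hypothetical helper.

```python
def num_params(model, n_e, n_r, k):
    """Leading-order parameter counts from the table (biases ignored)."""
    entity = n_e * k      # one k-dim vector per entity (synset)
    relation = n_r * k    # one k-dim vector per relation type
    if model == "UNSTRUCTURED":
        return entity                           # no relation parameters at all
    if model == "TRANSE":
        return entity + relation
    if model == "SME (LINEAR)":
        return entity + relation + 4 * k ** 2   # four shared k x k matrices
    if model == "SME (BILINEAR)":
        return entity + relation + 2 * k ** 3   # two shared k x k x k tensors
    if model == "SE":
        return entity + 2 * n_r * k ** 2       # left and right k x k matrix per relation
    raise ValueError(f"unknown model: {model}")


n_e, n_r, k = 1_435_175, 100, 50   # n_r and k are hypothetical
for m in ["UNSTRUCTURED", "TRANSE", "SME (LINEAR)", "SME (BILINEAR)", "SE"]:
    print(f"{m:15s} {num_params(m, n_e, n_r, k):,}")
```

With these values the entity embeddings ($n_e k$ = 71,758,750) make up more than 99% of the parameters in every model, so the models differ mainly in how relations are composed, not in size.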

LESSON?

Be careful with generalization: high accuracy on relation triples did not carry over to disambiguation or similarity

Don’t trust your eyes (do more than one type of evaluation)

We need better synset embedding models that work well for both link prediction and similarity/relatedness assessment

Thank you!

