SYNSET EMBEDDINGS FOR NAMED ENTITY DISAMBIGUATION Minh Ngoc Le (VUA)

CLIN 25



Page 2: CLIN 25

SYNSETS = CONCEPTS+ENTITIES

Babelfy: joint disambiguation of word senses and named entities helps improve both.

Page 3: CLIN 25

PLAN

1. Learn synset embeddings from BabelNet+DBpedia

2. Use DBpedia Spotlight to detect mentions and generate candidates

3. Use cosine similarity as the edge weight in the coherence graph
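Step 3 can be illustrated with a minimal sketch, assuming each candidate synset already has an embedding vector. The function names, the toy vectors, and the rule of connecting only candidates of different mentions are illustrative assumptions, not details taken from the slides.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def coherence_edges(candidates, embeddings):
    """Weight each pair of candidates from different mentions by cosine similarity.

    candidates: dict mapping a mention to its list of candidate synset ids
    embeddings: dict mapping a synset id to its vector
    """
    edges = {}
    for m1, m2 in combinations(sorted(candidates), 2):
        for s1 in candidates[m1]:
            for s2 in candidates[m2]:
                edges[(s1, s2)] = cosine(embeddings[s1], embeddings[s2])
    return edges


# Toy, made-up vectors (hypothetical; real synset embeddings would be learned).
embeddings = {
    "Paris_(city)": [1.0, 0.0],
    "Paris_Hilton": [0.0, 1.0],
    "France": [1.0, 0.0],
}
candidates = {
    "Paris": ["Paris_(city)", "Paris_Hilton"],
    "France": ["France"],
}
edges = coherence_edges(candidates, embeddings)
```

In this toy graph the edge between "France" and "Paris_(city)" gets a higher weight than the edge between "France" and "Paris_Hilton", which is the signal a coherence-based disambiguator would rely on.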

Page 4: CLIN 25

DATASET

Using half of the synsets (for efficiency)

Synsets: 1,435,175 (1.4 million) coming from:

WordNet only: 8k (0.6%)

WordNet+Wikipedia: 32k (2.3%)

Wikipedia only: 1.39m (97%)

Unique relations: 13,146,719 (13 million)

Page 5: CLIN 25

MODEL

Page 6: CLIN 25

NEGATIVE SAMPLING

Positive examples (D):

1. cat has_part tail

2. dog has_part tail

3. car has_part wheel

4. …

Negative examples (N):

1. cat has_part wheel

2. car has_part tail

3. car has_part tail

4. …
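The negative examples above look like positive triples whose tail has been swapped. A minimal sketch of that idea follows, assuming negatives are drawn by replacing the tail with one sampled uniformly from all known tails and rejecting any triple that is already positive. The function name `corrupt_tails` and the uniform sampling are assumptions; the slides do not say how the negatives were actually drawn.

```python
import random


def corrupt_tails(positives, num_negatives, seed=0):
    """Sample negative triples by swapping the tail of random positive triples.

    Candidates that are themselves positive triples are rejected. Duplicates
    among the negatives are allowed, as in plain random sampling.
    """
    rng = random.Random(seed)
    known = set(positives)
    tails = sorted({tail for _, _, tail in positives})
    negatives = []
    attempts = 0
    max_attempts = 100 * num_negatives  # guard against an endless loop
    while len(negatives) < num_negatives and attempts < max_attempts:
        attempts += 1
        head, relation, _ = rng.choice(positives)
        candidate = (head, relation, rng.choice(tails))
        if candidate not in known:
            negatives.append(candidate)
    return negatives


positives = [
    ("cat", "has_part", "tail"),
    ("dog", "has_part", "tail"),
    ("car", "has_part", "wheel"),
]
negatives = corrupt_tails(positives, num_negatives=3)
```

With these three positives, the only triples that can come out are `cat has_part wheel`, `dog has_part wheel`, and `car has_part tail`, possibly repeated.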

Page 7: CLIN 25

ACCURACY

MODEL            WORDNET   FREEBASE 15K   BABELNET
MY MODEL         -         -              81.0
BILINEAR         84.1      87.7           -
NEURAL TENSOR    86.2      90.0           -

Page 8: CLIN 25

VECTOR SPACE

Page 9: CLIN 25

ZOOM 1

Page 10: CLIN 25

ZOOM 2

Page 11: CLIN 25

ZOOM 3

Page 12: CLIN 25

OOPS!

SYSTEM              CONLL    KORE50
DBPEDIA SPOTLIGHT   54.9%    52.8%
MY MODEL            40.9%    37.8%
BABELFY             82.1%    72.5%

Page 13: CLIN 25

WHY?

1. Learn synset embeddings from BabelNet+DBpedia?

2. Use DBpedia Spotlight to detect mentions and generate candidates

3. Use cosine similarity as the edge weight in the coherence graph

Page 14: CLIN 25

DEBUG

MODEL      SIMLEX-999   MEN     WORDSIM
MY MODEL   0.06         -0.00   0.11
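Scores on SimLex-999, MEN and WordSim are conventionally reported as the Spearman rank correlation between model similarities and human ratings. The slides do not name the metric, so that reading is an assumption. Under it, a minimal pure-Python sketch looks like this (the ratings and model scores are made up):

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up human ratings and model cosine similarities for four word pairs.
human = [9.0, 7.5, 3.0, 1.0]
model = [0.8, 0.9, 0.2, 0.1]
rho = spearman(human, model)  # 0.8 for this toy data
```

A value near zero, as in the table above, means the model's similarity ordering is essentially unrelated to the human ordering.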

Page 15: CLIN 25

DEBUG

MODEL              SIMLEX   MEN    WORDSIM   HITS@10
UNSTRUCTURED       0.43     0.55   0.42      35.3
TRANSE             0.42     0.41   0.39      75.4
SME (LINEAR)       0.19     0.13   0.07      65.1
SME (BILINEAR)     0.12     0.12   0.08      54.7
SE                 0.03     0.02   -0.04     68.5
STATE-OF-THE-ART   0.52     0.80   0.81      -
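The HITS@10 column is a link-prediction metric. A minimal sketch follows, assuming the usual definition: the percentage of test triples whose correct entity is ranked in the top 10 candidates, with higher scores meaning more plausible triples. The helper names and toy scores are hypothetical.

```python
def rank_of_true_entity(scores, true_entity):
    """1-based rank of the true entity when candidates are sorted by descending score."""
    true_score = scores[true_entity]
    return 1 + sum(1 for s in scores.values() if s > true_score)


def hits_at_k(ranks, k=10):
    """Percentage of test triples whose true entity is ranked within the top k."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return 100.0 * sum(1 for r in ranks if r <= k) / len(ranks)


# Toy scores for the query (cat, has_part, ?); the true tail is "tail".
toy_scores = {"tail": 0.9, "wheel": 0.4, "wing": 0.95}
toy_rank = rank_of_true_entity(toy_scores, "tail")  # "wing" scores higher, so rank 2

# Made-up ranks of the true entity for five test triples.
example_ranks = [1, 3, 12, 50, 7]
example_hits = hits_at_k(example_ranks, k=10)  # 3 of 5 within the top 10
```

This metric only looks at how well a model ranks the correct entity for a given relation, which may explain why it can disagree with the similarity columns.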

Page 16: CLIN 25

DEBUG

MODEL            SIMLEX   MEN   WORDSIM   HITS@10
UNSTRUCTURED     1        1     1         5
TRANSE           2        2     2         1
SME (LINEAR)     3        3     4         3
SME (BILINEAR)   4        4     3         4
SE               5        5     5         2
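The ranks above can be reproduced from the scores on the previous slide. The sketch below re-derives them by sorting the five models on each metric, highest score first; the dictionary layout and function name are illustrative choices.

```python
# Scores copied from the previous slide (similarity benchmarks and Hits@10).
scores = {
    "Unstructured":   {"SimLex": 0.43, "MEN": 0.55, "WordSim": 0.42,  "Hits@10": 35.3},
    "TransE":         {"SimLex": 0.42, "MEN": 0.41, "WordSim": 0.39,  "Hits@10": 75.4},
    "SME (linear)":   {"SimLex": 0.19, "MEN": 0.13, "WordSim": 0.07,  "Hits@10": 65.1},
    "SME (bilinear)": {"SimLex": 0.12, "MEN": 0.12, "WordSim": 0.08,  "Hits@10": 54.7},
    "SE":             {"SimLex": 0.03, "MEN": 0.02, "WordSim": -0.04, "Hits@10": 68.5},
}


def rank_models(scores, metric):
    """Map each model to its 1-based rank on a metric (higher score ranks first)."""
    ordered = sorted(scores, key=lambda name: scores[name][metric], reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


similarity_ranks = rank_models(scores, "SimLex")
link_prediction_ranks = rank_models(scores, "Hits@10")
```

Comparing the two rankings makes the mismatch concrete: Unstructured is first on SimLex but last on Hits@10, while SE is last on SimLex but second on Hits@10.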

Page 17: CLIN 25

DEBUG

MODEL            SIMLEX   MEN    WORDSIM   #PARAMETERS
UNSTRUCTURED     0.43     0.55   0.42      O(n_e k)
TRANSE           0.42     0.41   0.39      O(n_e k + n_r k)
SME (LINEAR)     0.19     0.13   0.07      O(n_e k + n_r k + 4k^2)
SME (BILINEAR)   0.12     0.12   0.08      O(n_e k + n_r k + 2k^3)
SE               0.03     0.02   -0.04     O(n_e k + 2 n_r k^2)
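To get a feel for the #PARAMETERS column, the expressions can be evaluated for concrete sizes. The sketch below reads n_e as the number of entities, n_r as the number of relation types, and k as the embedding dimension. The slide does not define these symbols, so that reading is an assumption, and the sizes used are made up. The big-O expressions are treated as exact counts purely for illustration.

```python
def parameter_counts(n_e, n_r, k):
    """Evaluate the leading parameter-count terms from the table for given sizes.

    n_e: number of entities, n_r: number of relation types, k: embedding dimension
    (assumed meanings; the slide leaves the symbols undefined).
    """
    return {
        "Unstructured": n_e * k,
        "TransE": n_e * k + n_r * k,
        "SME (linear)": n_e * k + n_r * k + 4 * k**2,
        "SME (bilinear)": n_e * k + n_r * k + 2 * k**3,
        "SE": n_e * k + 2 * n_r * k**2,
    }


# Hypothetical sizes: 1,000 entities, 10 relation types, 50 dimensions.
counts = parameter_counts(n_e=1000, n_r=10, k=50)
```

For these sizes the counts grow in the same order as the similarity scores fall, except that SE has fewer parameters than SME (bilinear).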

Page 18: CLIN 25

LESSON?

Be careful with generalization

Don’t trust your eyes (do more than one type of evaluation)

We need better models of synset embeddings that can work well in link prediction and similarity/relatedness assessment

Page 19: CLIN 25

Thank you!