SYNSET EMBEDDINGS FOR NAMED ENTITY DISAMBIGUATION
Minh Ngoc Le (VUA)
SYNSETS = CONCEPTS+ENTITIES
Babelfy: joint disambiguation of word senses and named entities helps improve both.
PLAN
1. Learn synset embeddings from BabelNet+DBpedia
2. Use DBpedia Spotlight to detect mentions and generate candidates
3. Use cosine similarity as the edge weight in the coherence graph
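Step 3 can be sketched as follows. This is a minimal illustration, not the implementation behind these slides: the `candidates` and `embeddings` formats, the function names, and the toy vectors are all assumptions made up for the example. It only shows how cosine similarity between candidate embeddings could weight edges between candidates of different mentions.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def coherence_edges(candidates, embeddings):
    """Weight every edge between candidates that belong to different mentions.

    candidates: dict mapping a mention to its candidate synset ids (hypothetical format)
    embeddings: dict mapping a synset id to its vector (hypothetical format)
    """
    edges = {}
    for (_, cands_a), (_, cands_b) in combinations(candidates.items(), 2):
        for a in cands_a:
            for b in cands_b:
                edges[(a, b)] = cosine(embeddings[a], embeddings[b])
    return edges


# Toy vectors invented for illustration only.
embeddings = {
    "Paris_(city)": [1.0, 0.0],
    "Paris_Hilton": [0.0, 1.0],
    "France": [1.0, 0.0],
}
candidates = {"Paris": ["Paris_(city)", "Paris_Hilton"], "France": ["France"]}
edges = coherence_edges(candidates, embeddings)
```

In this toy graph the edge between the city sense and "France" gets the highest weight, so a coherence-based disambiguator would prefer that candidate.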
DATASET
Using half of the synsets (for efficiency)
Synsets: 1,435,175 (1.4 million) coming from:
WordNet only: 8k (0.6%)
WordNet+Wikipedia: 32k (2.3%)
Wikipedia only: 1.39m (97%)
Unique relations: 13,146,719 (13 million)
MODEL
NEGATIVE SAMPLING
Positive examples (D):
1. cat has_part tail
2. dog has_part tail
3. car has_part wheel
4. …
Negative examples (N):
5. cat has_part wheel
6. car has_part tail
7. car has_part tail
8. …
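One common way to produce negatives like these is to corrupt the tail of a positive triple and keep the result only if it is not itself a known positive. The sketch below assumes that strategy; the slides do not say how negatives were actually drawn, and `corrupt_tails` is a hypothetical helper name.

```python
import random


def corrupt_tails(positives, num_negatives, seed=0):
    """Sample negatives by replacing the tail of a random positive triple.

    A corrupted triple is rejected if it is a known positive.
    Sampling is with replacement, so the same negative may appear twice.
    """
    rng = random.Random(seed)
    known = set(positives)
    tails = sorted({tail for _, _, tail in positives})
    negatives = []
    attempts = 0
    max_attempts = 100 * num_negatives  # guard against an endless loop
    while len(negatives) < num_negatives and attempts < max_attempts:
        attempts += 1
        head, relation, _ = rng.choice(positives)
        candidate = (head, relation, rng.choice(tails))
        if candidate not in known:
            negatives.append(candidate)
    return negatives


positives = [
    ("cat", "has_part", "tail"),
    ("dog", "has_part", "tail"),
    ("car", "has_part", "wheel"),
]
negatives = corrupt_tails(positives, num_negatives=4)
```

With only two distinct tails in this toy set, the possible negatives are "cat has_part wheel", "dog has_part wheel", and "car has_part tail", so repeats are expected when four are requested.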
ACCURACY
MODEL          WORDNET  FREEBASE 15K  BABELNET
MY MODEL       -        -             81.0
BILINEAR       84.1     87.7          -
NEURAL TENSOR  86.2     90.0          -
VECTOR SPACE
ZOOM 1
ZOOM 2
ZOOM 3
OOPS!
SYSTEM             CONLL  KORE50
DBPEDIA SPOTLIGHT  54.9%  52.8%
MY MODEL           40.9%  37.8%
BABELFY            82.1%  72.5%
WHY?
1. Learn synset embeddings from BabelNet+DBpedia?
2. Use DBpedia Spotlight to detect mentions and generate candidates
3. Use cosine similarity as the edge weight in the coherence graph
DEBUG
MODEL     SIMLEX-999  MEN    WORDSIM
MY MODEL  0.06        -0.00  0.11
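Scores on similarity benchmarks such as these are usually rank correlations between human ratings and model similarities. The slides do not name the metric, so the sketch below assumes Spearman's rho; the ratings and model scores are invented toy values.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0.0 or sd_y == 0.0:
        return 0.0
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed inputs."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: human ratings for four word pairs and a model's cosine similarities.
human_ratings = [9.0, 7.5, 2.0, 1.0]
model_scores = [0.8, 0.9, 0.1, 0.3]
rho = spearman(human_ratings, model_scores)
```

A value near zero, as in the row above, means the model's similarity ordering is essentially unrelated to the human ordering.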
DEBUG
MODEL             SIMLEX  MEN   WORDSIM  HITS@10
UNSTRUCTURED      0.43    0.55  0.42     35.3
TRANSE            0.42    0.41  0.39     75.4
SME (LINEAR)      0.19    0.13  0.07     65.1
SME (BILINEAR)    0.12    0.12  0.08     54.7
SE                0.03    0.02  -0.04    68.5
STATE-OF-THE-ART  0.52    0.80  0.81     -
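HITS@10 is normally the percentage of test triples whose correct entity lands in the top 10 of the model's ranking. The sketch below assumes that definition and uses a TransE-style score (negative distance between head + relation and tail) purely as an example scorer. It ranks against all entities without filtering known positives, and the vectors are toy values, not the setup used for this table.

```python
import math


def transe_score(head_vec, rel_vec, tail_vec):
    """TransE-style plausibility: negative L2 distance between head + relation and tail."""
    return -math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head_vec, rel_vec, tail_vec))
    )


def hits_at_k(test_triples, entity_vecs, relation_vecs, k=10):
    """Percentage of test triples whose true tail is among the k best-scoring entities."""
    hits = 0
    for head, relation, tail in test_triples:
        h, r = entity_vecs[head], relation_vecs[relation]
        ranked = sorted(
            entity_vecs,
            key=lambda e: transe_score(h, r, entity_vecs[e]),
            reverse=True,
        )
        if tail in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(test_triples)


# Toy embeddings invented for illustration only.
entity_vecs = {
    "cat": [0.0, 0.0],
    "tail": [1.0, 0.0],
    "wheel": [0.0, 1.0],
    "car": [5.0, 5.0],
}
relation_vecs = {"has_part": [1.0, 0.0]}
test_triples = [("cat", "has_part", "tail"), ("car", "has_part", "wheel")]

hits_at_1 = hits_at_k(test_triples, entity_vecs, relation_vecs, k=1)
hits_at_3 = hits_at_k(test_triples, entity_vecs, relation_vecs, k=3)
```

Note that this metric only checks whether the right entity is near the top of a ranking; it says nothing about whether distances between unrelated entities are meaningful, which is what the similarity benchmarks measure.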
DEBUG
Rank of each model per column (1 = best):
MODEL           SIMLEX  MEN  WORDSIM  HITS@10
UNSTRUCTURED    1       1    1        5
TRANSE          2       2    2        1
SME (LINEAR)    3       3    4        3
SME (BILINEAR)  4       4    3        4
SE              5       5    5        2
DEBUG
MODEL           SIMLEX  MEN   WORDSIM  #PARAMETERS
UNSTRUCTURED    0.43    0.55  0.42     O(n_e k)
TRANSE          0.42    0.41  0.39     O(n_e k + n_r k)
SME (LINEAR)    0.19    0.13  0.07     O(n_e k + n_r k + 4k^2)
SME (BILINEAR)  0.12    0.12  0.08     O(n_e k + n_r k + 2k^3)
SE              0.03    0.02  -0.04    O(n_e k + 2 n_r k^2)
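To make the parameter column concrete, the sketch below evaluates the five formulas, reading n_e as the number of entities, n_r as the number of relation types, and k as the embedding dimension. The entity count is the synset count from the DATASET slide; the values of n_r and k are hypothetical, since the slides give neither.

```python
def parameter_counts(n_e, n_r, k):
    """Leading-order parameter counts for each model in the table.

    n_e: number of entities, n_r: number of relation types, k: embedding dimension.
    """
    return {
        "Unstructured": n_e * k,
        "TransE": n_e * k + n_r * k,
        "SME (linear)": n_e * k + n_r * k + 4 * k**2,
        "SME (bilinear)": n_e * k + n_r * k + 2 * k**3,
        "SE": n_e * k + 2 * n_r * k**2,
    }


# n_e comes from the DATASET slide; n_r and k are assumed values for illustration.
counts = parameter_counts(n_e=1_435_175, n_r=20, k=50)
for model, count in sorted(counts.items(), key=lambda item: item[1]):
    print(f"{model:15s} {count:,}")
```

With over a million entities the n_e k term dominates every model, so the models differ mainly in how relations are parameterized rather than in total size.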
LESSON?
Careful with generalization
Don’t trust your eyes (do more than one type of evaluation)
We need better models of synset embeddings that can work well in link prediction and similarity/relatedness assessment
Thank you!