Presented by: Kenta Oono (Preferred Networks), 2014/11/22, EMNLP 2014 Reading Group @ PFI

Learning Image Embeddings using Convolutional Neural Networks for Improved Multi-Modal Semantics


DESCRIPTION

2014/11/22 EMNLP 2014 Reading @ PFI

Citation preview

2. (@delta2323_) 2012.3 PFI 2014.10 PFN http://delta2323.github.io
3. CNN DL
4. + / + / + etc. NN
5. [Feng+ 10] [Leong+ 11] [Bruni+ 12] [Roller+ 13] [Kiela+ 14] local feature (SIFT, HOG, PHOW) BOVW NN [Driancourt+ 90] Deep Learning RBM [Srivastava+ 12] [Feng+ 13] Auto-encoder [Wu+ 13] RNN [Socher+ 14] Stacked Auto-encoder [Silberer+ 14] Deep Learning [Frome+ 13] BOVW [Lazaridou+ 14]
6. 7 CNN [Oquab+ 14] 6 ImageNet: 12.5M images, 22K synsets ESP Game: 100K images (ave. 14 tags/image), 20525 words 6144
7. ImageNet ESP Game ESP Game 1 ImageNet: golden retriever ESP Game: dog, golden retriever, grass, field, house, door ImageNet synset http://wordnetweb.princeton.edu/perl/webwn
8. golden retriever ESP Game ImageNet
9. ImageNet [Krizhevsky+ 12] 256x256 16 224x224 128 ESP Game 224x224 128 0 padding
10. BOVW VLFeat Dense SIFT k-means 100 cluster
11. [Mikolov+ 13] log-linear skip-gram model Wikipedia (400M words) + British National Corpus (100M words) 100
12. (CNN-Mean/BOVW-Mean) or (CNN-Max) 0 L2 1
13. ImageNet NN=1000 N hypernym ESP Game N
14. WordSim353 (W353): 353 word pairs (OPEC/Arafat/Maradona), (antecedent/credibility) MEN: 751 words, 3000 word pairs ESP Game (50 images) ImageNet W353-Relevant/MEN-Relevant ImageNet/ESP Game
15. 2
16. (1) Ground Truth
17. (1) ImageNet/ESP Game (BOVW CNN) MEN WordSim353 WordSim353 multimodal
18. (2) MEN W353 All Relevant
19. (2) =0.5 =0.5 WordSim353=
20. (1) ImageNet ESP Game W353-Rel MEN-Rel
21. (2) ImageNet potatoes/tomato (MEN) ImageNet king/queen (W353) ESP Game stock/market (W353) dessert/bread/fruit (MEN)
22. GPU 100 vs. 6144
23. CNN
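Slides 12 and 19 name CNN-Mean/CNN-Max aggregation, L2, and a value of 0.5, but the explanation around them did not survive extraction. The following is a minimal sketch of how such a pipeline might look, assuming that per-image feature vectors for a word are combined by element-wise mean or max, that both modalities are L2-normalized, and that 0.5 is a mixing weight applied when concatenating the text and image vectors. All function names, the `alpha` convention, and the toy vectors are hypothetical and are not taken from the slides.

```python
import math


def aggregate(vectors, mode="mean"):
    """Combine per-image feature vectors into one vector per word.

    mode="mean" averages each dimension (CNN-Mean / BOVW-Mean style);
    mode="max" takes the element-wise maximum (CNN-Max style).
    """
    if not vectors:
        raise ValueError("need at least one feature vector")
    columns = list(zip(*vectors))  # one tuple per dimension
    if mode == "mean":
        return [sum(col) / len(col) for col in columns]
    if mode == "max":
        return [max(col) for col in columns]
    raise ValueError("mode must be 'mean' or 'max'")


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(text_vec, image_vec, alpha=0.5):
    """Concatenate normalized text and image vectors.

    Assumption: alpha weights the text part and (1 - alpha) the image part.
    """
    text_part = [alpha * x for x in l2_normalize(text_vec)]
    image_part = [(1 - alpha) * x for x in l2_normalize(image_vec)]
    return text_part + image_part


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy usage: two words, each with a 2-d text vector and two 2-d image vectors.
dog = fuse([1.0, 0.0], aggregate([[1.0, 4.0], [3.0, 2.0]], mode="mean"))
cat = fuse([0.8, 0.6], aggregate([[1.0, 4.0], [3.0, 2.0]], mode="max"))
similarity = cosine(dog, cat)
```

In an evaluation such as the one on WordSim353 or MEN, similarity scores of this kind would be compared against human ratings with a rank correlation; that step is omitted here.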