Methodology
Illud: Utilizing Semantic Similarity for Image Search
Team Members: Kristene Aguinaldo, Seerat Aziz, and Kristian Wu
Advisor: Jorge Ortiz, Department of Electrical & Computer Engineering
Introduction
References
Results
Data Management Pipeline: Conceptual Captions (3.3M [caption, image] pairs) → Doc2vec and LDA Topic Model → stored records (Caption, URL, Vector, Topic Neighborhood)
Execution Pipeline: Parse Document → Apply LDA Topic Model → StaySense Cosine Similarity on Records in Topic
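The execution pipeline can be illustrated with a minimal sketch. It assumes the query document has already been turned into a vector (Doc2Vec) and assigned a topic (LDA), and it replaces the Elasticsearch index with a small in-memory list. The names `RECORDS`, `cosine_similarity`, and `search`, as well as every vector, topic id, and URL below, are hypothetical and chosen only for illustration; this is not the project's actual code.

```python
import math

# Hypothetical stand-in for the indexed records. The captions, URLs,
# vectors, and topic ids are made-up illustrative values, not project data.
RECORDS = [
    {"caption": "a man playing a guitar", "url": "https://example.com/1.jpg",
     "vector": [0.9, 0.1, 0.0], "topic": 0},
    {"caption": "a drum in a ceremony", "url": "https://example.com/2.jpg",
     "vector": [0.5, 0.5, 0.0], "topic": 0},
    {"caption": "a dog in a park", "url": "https://example.com/3.jpg",
     "vector": [1.0, 0.0, 0.0], "topic": 1},
    {"caption": "a wooden flute", "url": "https://example.com/4.jpg",
     "vector": [0.0, 1.0, 0.0], "topic": 0},
]


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_u * norm_v)


def search(query_vector, query_topic, records, top_k=3):
    """Rank records in the query's topic by decreasing cosine similarity."""
    # Restrict the search to records that share the query's topic.
    candidates = [r for r in records if r["topic"] == query_topic]
    # Score each candidate against the query vector.
    scored = [(cosine_similarity(query_vector, r["vector"]), r)
              for r in candidates]
    # Sort on the score only, highest first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for score, record in search([1.0, 0.0, 0.0], 0, RECORDS):
        print(f"{score:.3f}  {record['caption']}  {record['url']}")
```

Filtering by topic before scoring means only a fraction of the records is compared against the query, which is the motivation for grouping records by topic before running the similarity search.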
Search engines commonly use properties such as keywords to query and return the most appropriate results. However, this procedure does not always return the most relevant results. For this reason, our project explores the use of natural language processing to enhance image search, as numerous image captioning datasets are available. Through this project, we seek to:
• Bridge the gap between visual and textual communication
• Make texts more digestible by breaking them down and finding relevant images
Acknowledgements
We would like to thank our advisor, Professor Jorge Ortiz, for his input and guidance throughout this project. We would also like to thank our friends and family for their support and response to our project. Lastly, we would like to thank the ECE department for making this project happen.
LDA Topic Model - Intertopic Distance Map
Challenges and Future Steps
A musical instrument is a device created to make musical sounds Anything that makes a sound can be used as a musical instrument The history of musical instruments goes back to the beginning of culture People first used instruments as ritual a hunter might use a trumpet to signal a successful hunt a drum might be used in a religious ceremony Cultures later composed and performed a set of sounds called a melody for entertainment Musical instruments were needed Some historians report that the earliest musical instrument was a simple flute. Many of the earliest musical instruments were made from animal skins bone wood and other non-durable materials
Snippet of a Sample Text: Musical Instruments
Resulting Images (Decreasing Cosine Similarity)
Version 1 Challenges
• PostgreSQL was very slow at querying results from the database (stored all 3.3 million records)
• Difficult to use KNN to search high-dimensional vectors
Version 2 Challenges
• AWS Elasticsearch (ES) service does not allow custom plugins to be installed
• AWS ES stores 3.1 million results (35 GB maximum)
• EC2 instance did not have enough storage to hold Conceptual Captions in a local ES index
Dataset Challenges
• Short captions and the LDA model resulted in many overlapping clusters
Next Steps
• Generate multiple captions per image so that the LDA model is more robust (short captions → noise)
• Check correlation between image features and caption
1. Q. Le, T. Mikolov. 2014. Distributed Representations of Sentences and Documents. In Proceedings of ICML 2014.
2. T. Doll, “LDA Topic Modeling,” Towards Data Science, 24-Jun-2018. [Online]. Available: https://towardsdatascience.com/lda-topic-modeling-an-explanation-e184c90aadcd. [Accessed: 20-Apr-2019].
3. StaySense, “fast-cosine-similarity,” GitHub repository. [Online]. Available: https://github.com/StaySense/fast-cosine-similarity
Fig 1 LDA Clusters
Fig 2 Doc2Vec Algorithm [1]
Fig 3
Fig 4
Fig 5
• Output quality was assessed through a survey answered by 54 people, who rated paragraph-to-image and paragraph-to-caption relevance on a scale of 1 to 4 (Fig 3)
• Survey participants responded that images seemed more relevant than their corresponding captions (Fig 5)
Fig 6
• Doc2Vec: uses paragraph vectors to predict words in a paragraph and to capture the context of the paragraph [1]
• LDA Topic Model: statistical model for assigning the text in a document to a set of topics [2]
• StaySense: fast vector scoring on Elasticsearch 6.4.x+ using vector embeddings [3]
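For reference, the cosine similarity used to rank results can be written out as follows. This is the standard definition; the symbols are assumptions introduced here for illustration, with q denoting the vector of the query text, c the vector of a stored caption, and n the vector dimension.

```latex
\[
\mathrm{sim}(\mathbf{q}, \mathbf{c})
  = \cos\theta
  = \frac{\mathbf{q} \cdot \mathbf{c}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{c} \rVert}
  = \frac{\sum_{i=1}^{n} q_i c_i}
         {\sqrt{\sum_{i=1}^{n} q_i^{2}} \; \sqrt{\sum_{i=1}^{n} c_i^{2}}}
\]
```

The value ranges from -1 to 1, and a larger value means the caption vector points in a direction closer to that of the query vector, so results are listed in order of decreasing similarity.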