
Methodology

Illud: Utilizing Semantic Similarity for Image Search

Team Members: Kristene Aguinaldo, Seerat Aziz, and Kristian Wu

Advisor: Jorge Ortiz, Department of Electrical & Computer Engineering

Introduction

References

Results

Data Management Pipeline: Conceptual Captions (3.3M [caption, image] pairs); Doc2vec; LDA Topic Model; record fields: Caption, URL, Vector, Topic, Neighborhood

Execution Pipeline: Parse Document → Apply LDA Topic Model → StaySense Cosine Similarity on Records in Topic
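The execution pipeline can be illustrated with a minimal sketch. It assumes each stored record already has a vector and a topic id from the data management pipeline. The `Record` class, the `search` function, the three toy records, the 3-dimensional vectors, and the example.com URLs are all hypothetical stand-ins, not the project's actual code or data. The Neighborhood field is omitted, and cosine similarity is computed in plain Python rather than through the StaySense plugin.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Record:
    caption: str
    url: str
    vector: List[float]  # stand-in for a Doc2Vec embedding (assumption)
    topic: int           # stand-in for an LDA topic id (assumption)


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vector: List[float], query_topic: int,
           records: List[Record], k: int = 3) -> List[Tuple[float, Record]]:
    """Keep only records in the query's topic, then rank them by cosine similarity."""
    candidates = [r for r in records if r.topic == query_topic]
    scored = [(cosine_similarity(query_vector, r.vector), r) for r in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # decreasing similarity
    return scored[:k]


# Hypothetical toy records; real records would come from Conceptual Captions.
RECORDS = [
    Record("a man playing a wooden flute", "https://example.com/flute.jpg", [0.9, 0.1, 0.0], 0),
    Record("a drum at a ceremony", "https://example.com/drum.jpg", [0.6, 0.4, 0.0], 0),
    Record("a dog running on a beach", "https://example.com/dog.jpg", [0.0, 0.2, 0.9], 1),
]

# The query vector and topic would come from parsing the input document.
results = search([1.0, 0.0, 0.0], 0, RECORDS)
for score, record in results:
    print(f"{score:.3f}  {record.caption}  {record.url}")
```

Filtering by topic before scoring keeps the number of cosine comparisons small, which is one plausible reason for restricting the similarity search to records in the same topic.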

Search engines commonly use properties such as keywords to query and return the most appropriate results. However, this procedure does not always return the most relevant results. For this reason, our project explores the use of natural language processing to enhance image search, as numerous image captioning datasets are available. Through this project, we seek to:

• Bridge the gap between visual and textual communication

• Make texts more digestible by breaking them down and finding relevant images

Acknowledgements

We would like to thank our advisor, Professor Jorge Ortiz, for his input and guidance through this project. We would also like to thank our friends and family for their support and response to our project. Lastly, we would like to thank the ECE department for making this project happen.

LDA Topic Model - Intertopic Distance Map

Challenges and Future Steps

A musical instrument is a device created to make musical sounds Anything that makes a sound can be used as a musical instrument The history of musical instruments goes back to the beginning of culture People first used instruments as ritual a hunter might use a trumpet to signal a successful hunt a drum might be used in a religious ceremony Cultures later composed and performed a set of sounds called a melody for entertainment Musical instruments were needed Some historians report that the earliest musical instrument was a simple flute. Many of the earliest musical instruments were made from animal skins bone wood and other non-durable materials

Snippet of a Sample Text: Musical Instruments

Resulting Images (Decreasing Cosine Similarity)

Version 1 Challenges

• PostgreSQL was very slow when querying results from the database (stored all 3.3 million records)

• Difficult to use KNN to search high-dimensional vectors

Version 2 Challenges

• AWS Elasticsearch (ES) service does not allow custom plugins to be installed

• AWS ES stores 3.1 million results (35 GB maximum)

• EC2 instance did not have enough storage to store Conceptual Captions in a local ES index

Dataset Challenges

• Short captions and LDA model resulted in many overlapping clusters

Next Steps

• Generate multiple captions per image so that LDA model is more robust (short captions → noise)

• Check correlation between image features and caption

1. Q. Le and T. Mikolov, “Distributed Representations of Sentences and Documents,” in Proceedings of ICML, 2014.

2. T. Doll, “LDA Topic Modeling,” Towards Data Science, 24-Jun-2018. [Online]. Available: https://towardsdatascience.com/lda-topic-modeling-an-explanation-e184c90aadcd. [Accessed: 20-Apr-2019].

3. StaySense, “fast-cosine-similarity,” GitHub. [Online]. Available: https://github.com/StaySense/fast-cosine-similarity

Fig 1 LDA Clusters

Fig 2 Doc2Vec Algorithm [1]

Fig 3 Fig 4

Fig 5

• Quality of output was assessed through a survey answered by 54 people, who rated the relevance of paragraph-to-image and paragraph-to-caption pairs from 1 to 4 (Fig 3)

• Survey participants responded that images seemed more relevant than their corresponding captions (Fig 5)

Fig 6

• Doc2Vec: learns a paragraph vector that is used to predict words in a paragraph and captures the context of the paragraph [1]

• LDA Topic Model: statistical model that assigns the text in a document to a set of topics [2]

• StaySense: fast vector scoring on Elasticsearch 6.4.x+ using vector embeddings [3]
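For reference, the cosine similarity used to score vectors is the standard definition below. The symbols are our own notation, not taken from the poster: $\mathbf{q}$ is assumed to be the embedding of the query paragraph and $\mathbf{d}$ the embedding of a stored caption, both of dimension $n$.

```latex
\[
\operatorname{sim}(\mathbf{q}, \mathbf{d})
  = \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert}
  = \frac{\sum_{i=1}^{n} q_i d_i}{\sqrt{\sum_{i=1}^{n} q_i^{2}} \; \sqrt{\sum_{i=1}^{n} d_i^{2}}}
\]
```

The score ranges from -1 to 1, and results are ordered by decreasing score, so captions whose vectors point in nearly the same direction as the query rank first.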

https://arxiv.org/pdf/1405.4053v2.pdf
