

Methodology

Illud: Utilizing Semantic Similarity for Image Search

Team Members: Kristene Aguinaldo, Seerat Aziz, and Kristian Wu

Advisor: Jorge Ortiz, Department of Electrical & Computer Engineering

Introduction

References

Results

Data Management Pipeline (diagram): Conceptual Captions (3.3M [caption, image] pairs); Doc2vec; LDA Topic Model; stored fields: Caption, URL, Vector, Topic, Neighborhood

Execution Pipeline (diagram): Parse Document; Apply LDA Topic Model; StaySense Cosine Similarity on Records in Topic
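The execution pipeline labels above (apply a topic model, then compute cosine similarity on the records in that topic) can be illustrated with a minimal sketch. It is not the project's implementation: the `Record` fields, the `search` function, the toy three-dimensional vectors, and the example captions and URLs are all hypothetical, and the query's topic is assumed to have already been assigned by an LDA model. Plain Python stands in for the Elasticsearch scoring plugin.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Record:
    """One stored caption record (field names are illustrative)."""
    caption: str
    url: str
    vector: List[float]  # assumed to be a Doc2Vec embedding of the caption
    topic: int           # assumed to be the topic id assigned by LDA


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def search(query_vector: List[float], query_topic: int,
           records: List[Record], k: int = 3) -> List[Tuple[float, Record]]:
    """Score only the records in the query's topic and return the top k."""
    candidates = [r for r in records if r.topic == query_topic]
    scored = [(cosine_similarity(query_vector, r.vector), r) for r in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # decreasing similarity
    return scored[:k]


# Hypothetical toy data; real vectors would have far more dimensions.
records = [
    Record("a man playing a flute", "https://example.com/flute.jpg", [0.9, 0.1, 0.0], 0),
    Record("a drum at a ceremony", "https://example.com/drum.jpg", [0.6, 0.4, 0.1], 0),
    Record("a dog running on a beach", "https://example.com/dog.jpg", [0.0, 0.2, 0.9], 1),
]

results = search([1.0, 0.0, 0.0], 0, records, k=2)
for score, record in results:
    print(f"{score:.3f}  {record.caption}  {record.url}")
```

Filtering by topic before scoring keeps the number of similarity computations proportional to the size of one topic rather than the full dataset, which is the motivation the pipeline labels suggest.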

Search engines commonly use properties such as keywords to query and return the most appropriate results. However, this procedure does not always return the most relevant results. For this reason, our project explores the use of natural language processing to enhance image search, as numerous image captioning datasets are available. Through this project, we seek to:

• Bridge the gap between visual and textual communication

• Make texts more digestible by breaking them down and finding relevant images

Acknowledgements

We would like to thank our advisor, Professor Jorge Ortiz, for his input and guidance through this project. We would also like to thank our friends and family for their support and response to our project. Lastly, we would like to thank the ECE department for making this project happen.

LDA Topic Model - Intertopic Distance Map

Challenges and Future Steps

A musical instrument is a device created to make musical sounds Anything that makes a sound can be used as a musical instrument The history of musical instruments goes back to the beginning of culture People first used instruments as ritual a hunter might use a trumpet to signal a successful hunt a drum might be used in a religious ceremony Cultures later composed and performed a set of sounds called a melody for entertainment Musical instruments were needed Some historians report that the earliest musical instrument was a simple flute. Many of the earliest musical instruments were made from animal skins bone wood and other non-durable materials

Snippet of a Sample Text: Musical Instruments

Resulting Images (Decreasing Cosine Similarity)

Version 1 Challenges

• PostgreSQL was very slow when querying results from the database (all 3.3 million records were stored)

• Difficult to use KNN to search high-dimensional vectors

Version 2 Challenges

• AWS Elasticsearch (ES) service does not allow custom plugins to be installed

• AWS ES stores 3.1 million results (35 GB maximum)

• EC2 instance did not have enough storage to hold Conceptual Captions in a local ES index

Dataset Challenges

• Short captions and the LDA model resulted in many overlapping clusters

Next Steps

• Generate multiple captions per image so that the LDA model is more robust (short captions → noise)

• Check correlation between image features and caption

1. Q. Le and T. Mikolov, “Distributed Representations of Sentences and Documents,” in Proceedings of ICML, 2014.

2. T. Doll, “LDA Topic Modeling,” Towards Data Science, 24-Jun-2018. [Online]. Available: https://towardsdatascience.com/lda-topic-modeling-an-explanation-e184c90aadcd. [Accessed: 20-Apr-2019].

3. StaySense, “fast-cosine-similarity,” GitHub repository. [Online]. Available: https://github.com/StaySense/fast-cosine-similarity

Fig 1: LDA Clusters

Fig 2: Doc2Vec Algorithm [1]

Fig 3

Fig 4

Fig 5

• Output quality was assessed through a survey answered by 54 people, who rated the relevance of paragraph-to-image and paragraph-to-caption pairs on a scale of 1 to 4 (Fig 3)

• Survey participants responded that images seemed more relevant than their corresponding captions (Fig 5)

Fig 6

• Doc2Vec: utilizes paragraph vectors for predicting words in a paragraph and providing the context of the paragraph [1]

• LDA Topic Model: statistical model for classifying text in a document to a set of topics [2]

• StaySense: fast vector scoring on Elasticsearch 6.4.x+ using vector embeddings [3]
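For reference, the cosine similarity used to rank records against a query is the standard definition below. The symbols are notation assumed here for illustration: q is the embedding of the query paragraph, c is the embedding of a stored caption, and n is the embedding dimension.

```latex
\operatorname{sim}(q, c)
  = \frac{q \cdot c}{\lVert q \rVert \, \lVert c \rVert}
  = \frac{\sum_{i=1}^{n} q_i c_i}
         {\sqrt{\sum_{i=1}^{n} q_i^{2}} \; \sqrt{\sum_{i=1}^{n} c_i^{2}}}
```

The score lies in [-1, 1] and depends only on the angle between the two vectors, not on their lengths, so results sorted by decreasing score are sorted by increasing angular distance from the query.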