Methodology

Illud: Utilizing Semantic Similarity for Image Search

Team Members: Kristene Aguinaldo, Seerat Aziz, and Kristian Wu

Advisor: Jorge Ortiz, Department of Electrical & Computer Engineering

Introduction

References

Results

[Figure: Data Management Pipeline. Conceptual Captions (3.3M [caption, image] pairs) → Doc2Vec and LDA Topic Model → stored fields: Caption, URL, Vector, Topic Neighborhood]

[Figure: Execution Pipeline. Parse Document → Apply LDA Topic Model → StaySense Cosine Similarity on Records in Topic]
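
The Execution Pipeline can be illustrated with a minimal sketch, assuming each stored record already carries an embedding vector and a topic label. The record fields, toy vectors, image paths, and the `search` helper are hypothetical and chosen only for illustration; the project itself relied on Doc2Vec vectors, an LDA topic model, and the StaySense plugin on Elasticsearch rather than this pure-Python loop.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_u * norm_v)


def search(query_vector, query_topic, records, top_k=3):
    """Keep only records in the query's topic, then rank them by
    decreasing cosine similarity to the query vector."""
    in_topic = [r for r in records if r["topic"] == query_topic]
    scored = [(cosine_similarity(query_vector, r["vector"]), r) for r in in_topic]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(round(score, 3), r["url"]) for score, r in scored[:top_k]]


# Toy records (hypothetical values standing in for the stored fields
# Caption, URL, Vector, and Topic).
RECORDS = [
    {"caption": "a man playing a wooden flute", "url": "img/flute.jpg",
     "vector": [0.9, 0.1, 0.2], "topic": 0},
    {"caption": "drummer at a ceremony", "url": "img/drum.jpg",
     "vector": [0.6, 0.5, 0.1], "topic": 0},
    {"caption": "a dog running on the beach", "url": "img/dog.jpg",
     "vector": [0.1, 0.9, 0.3], "topic": 1},
]

# Query assumed to have been parsed, embedded, and assigned to topic 0.
results = search([1.0, 0.0, 0.0], 0, RECORDS)
for score, url in results:
    print(score, url)
```

Restricting the comparison to records in the query's topic keeps the number of cosine computations small, which is the reason for applying the topic model before the similarity step.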

Search engines commonly use properties such as keywords to query and return the most appropriate results. However, this procedure does not always return the most relevant results. For this reason, our project explores the use of natural language processing to enhance image search, as numerous image captioning datasets are available. Through this project, we seek to:

• Bridge the gap between visual and textual communication

• Make texts more digestible by breaking them down and finding relevant images

Acknowledgements

We would like to thank our advisor, Professor Jorge Ortiz, for his input and guidance throughout this project. We would also like to thank our friends and family for their support and response to our project. Lastly, we would like to thank the ECE department for making this project happen.

LDA Topic Model - Intertopic Distance Map

Challenges and Future Steps

A musical instrument is a device created to make musical sounds Anything that makes a sound can be used as a musical instrument The history of musical instruments goes back to the beginning of culture People first used instruments as ritual a hunter might use a trumpet to signal a successful hunt a drum might be used in a religious ceremony Cultures later composed and performed a set of sounds called a melody for entertainment Musical instruments were needed Some historians report that the earliest musical instrument was a simple flute. Many of the earliest musical instruments were made from animal skins bone wood and other non-durable materials

Snippet of a Sample Text: Musical Instruments

Resulting Images (Decreasing Cosine Similarity)

Version 1 Challenges

• PostgreSQL was very slow when querying results from the database (which stored all 3.3 million records)

• Difficult to use KNN to search high-dimensional vectors

Version 2 Challenges

• AWS Elasticsearch (ES) service does not allow custom plugins to be installed

• AWS ES stores 3.1 million results (35 GB maximum)

• EC2 instance did not have enough storage to hold Conceptual Captions in a local ES index

Dataset Challenges

• Short captions and the LDA model resulted in many overlapping clusters

Next Steps

• Generate multiple captions per image so that the LDA model is more robust (short captions → noise)

• Check the correlation between image features and captions

1. Q. Le and T. Mikolov, “Distributed Representations of Sentences and Documents,” in Proceedings of ICML 2014, 2014.

2. T. Doll, “LDA Topic Modeling,” Towards Data Science, 24-Jun-2018. [Online]. Available: https://towardsdatascience.com/lda-topic-modeling-an-explanation-e184c90aadcd. [Accessed: 20-Apr-2019].

3. StaySense, “fast-cosine-similarity,” GitHub repository. [Online]. Available: https://github.com/StaySense/fast-cosine-similarity

Fig 1: LDA Clusters

Fig 2: Doc2Vec Algorithm [1]

Fig 3

Fig 4

• Output quality was assessed through a survey answered by 54 people, who rated paragraph-to-image and paragraph-to-caption relevance on a scale of 1–4 (Fig 3)

• Survey participants responded that images seemed more relevant than their corresponding captions (Fig 5)

• Doc2Vec: uses paragraph vectors to predict words in a paragraph and to represent the paragraph's context [1]

• LDA Topic Model: statistical model for assigning the text of a document to a set of topics [2]

• StaySense: fast vector scoring on Elasticsearch 6.4.x+ using vector embeddings [3]
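
For reference, the cosine similarity used to score vectors is, in its standard form, the expression below. The symbols are our own notation rather than the poster's: u and v denote two embedding vectors of the same dimension n, such as the Doc2Vec vector of a query paragraph and that of a stored caption.

```latex
\[
\operatorname{sim}(\mathbf{u}, \mathbf{v})
  = \frac{\mathbf{u} \cdot \mathbf{v}}
         {\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}
  = \frac{\sum_{i=1}^{n} u_i v_i}
         {\sqrt{\sum_{i=1}^{n} u_i^{2}} \, \sqrt{\sum_{i=1}^{n} v_i^{2}}}
\]
```

The value lies between -1 and 1 and depends only on the angle between the two vectors, not on their lengths; a larger value indicates a closer match.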
