Information Retrieval using Neural Networks
Vineeth Reddy Anugu
Computer Science Graduate Student
CMSC 676 Information Retrieval
University of Maryland, Baltimore County
Table of Contents
Abstract
Introduction
Introduction to Neural Networks
Convolutional Neural Networks
Survey of Relevant Work
Deep Relevance Matching Model
Word2Vec Model
Compare and Contrast
DRMM Comparison
Word2Vec vs FastText
Conclusion
References
Abstract
The field of machine learning has seen rapid development in recent years owing to its many areas of application. A major part of the field is the use of neural networks, which are constantly being researched and improved for a wide range of purposes. Models built on neural networks deliver impressive performance on tasks such as computer vision, speech recognition and natural language processing. One field where neural networks can be applied is Information Retrieval; examples include neural models for tasks such as query suggestion and conversational agents, among others. In this paper we discuss some well-established applications of neural networks in the field of Information Retrieval and how these architectures fare against other implementations.
Introduction
For this term paper I will discuss aspects of automation in the field of Information Retrieval. Some of the best ways to achieve this automation come from machine learning, and the branch most useful for retrieval purposes is neural networks. In particular, the subfield known as deep learning presents great possibilities. The use of shallow and deep neural networks for specific retrieval tasks is known as Neural Information Retrieval. Many Neural IR techniques are demanding in terms of the amount of training data they require, since neural networks feed off the data they are given and improve as that data grows. These techniques can be implemented by supervised or unsupervised means. Some of the better researched neural network architectures that can be used for information retrieval are 'Convolutional Neural Networks' and 'Generative Neural Networks', among others. In this paper we discuss the basic neural network architectures implemented for Information Retrieval purposes by reviewing some of the academic literature written about them.
Neural Information Retrieval refers to the use of neural networks, both deep and shallow, for the purpose of retrieving information. This appetite for large amounts of data sets these techniques apart from traditional information retrieval models, which do not require as much data to perform well; neural retrieval models, by contrast, keep getting better as more data is fed into them. As for the difference between supervised and unsupervised neural information retrieval, the supervised approach needs data optimized for the given task, usually labelled query-document pairs. If the data we possess is not in the form of query-document pairs, then the model we implement will be an unsupervised one, which draws inferences from either query data or document data exclusively.
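The distinction above can be made concrete with a small, entirely hypothetical data sketch; the field names, queries and document identifiers below are invented purely for illustration:

```python
# Hypothetical shape of supervised training data: labelled
# query-document pairs with a binary relevance judgement.
labelled_pairs = [
    {"query": "neural information retrieval",
     "doc_id": "doc-17", "relevant": 1},
    {"query": "neural information retrieval",
     "doc_id": "doc-42", "relevant": 0},
]

# An unsupervised model, by contrast, would see only raw query or
# document text, with no relevance labels attached.
unlabelled_corpus = ["neural networks learn from raw data",
                     "retrieval models rank documents"]

relevant = [p for p in labelled_pairs if p["relevant"]]
print(len(relevant))  # count of positive (relevant) pairs
```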
Introduction to Neural Networks
An important characteristic that sets neural networks apart from conventional information retrieval techniques is their ability to learn directly from raw data in order to perform the required task.
Convolutional Neural Networks
A Convolutional Neural Network defines a set of linear filters, also known as kernels, that connect only spatially local regions of the input, greatly reducing computation. These filters extract locally occurring patterns. A CNN is mainly used for image-related tasks such as image recognition, but given suitably prepared input the architecture can be adapted to textual data as well. The architecture of a CNN is very similar to that of a multi-layer perceptron, which is also a type of neural network. Convolutional Neural Networks usually do not contain any cycles, so they retain no memory of input that has previously passed through them; for this reason they are called feed-forward networks.
Fig 1: CNN Architecture[15]
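As a minimal sketch, the "linear filter over spatially local regions" idea can be written out directly. The toy two-dimensional embeddings and the hand-set filter below are invented for illustration, not taken from any trained network:

```python
def conv1d(embeddings, kernel):
    """Slide a linear filter over spatially local windows of a
    sequence of token embedding vectors, producing one activation
    per window (the core operation of a CNN applied to text)."""
    k = len(kernel)
    out = []
    for i in range(len(embeddings) - k + 1):
        window = embeddings[i:i + k]
        # dot product of the flattened window with the flattened kernel
        out.append(sum(w * f
                       for row, krow in zip(window, kernel)
                       for w, f in zip(row, krow)))
    return out

# Toy 2-dimensional embeddings for a 4-token sentence.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
kernel = [[0.5, 0.5], [0.5, 0.5]]  # one filter spanning 2 adjacent tokens
print(conv1d(tokens, kernel))  # → [1.0, 1.5, 1.0]
```

Because each output depends only on a small local window, far fewer weights are needed than in a fully connected layer over the whole input.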
Survey of Relevant Work
In this section we will look at two Neural Information Retrieval techniques that have been well researched and proven beneficial for the purposes of information retrieval while using neural networks as their architecture. Anything relating to the machine learning field, and especially to neural networks, requires an understanding of the complex mathematics behind the techniques; for the purposes of this paper, however, we will focus on the design of their architectures.
Deep Relevance Matching Model
To get a good understanding of the Deep Relevance Matching Model, also known as DRMM, we will look at the 2016 paper by Jiafeng Guo et al. [16] called "A Deep Relevance Matching Model for Ad-hoc Retrieval". While most neural networks are designed for Natural Language Processing, this paper discusses the use of relevance matching for ad-hoc retrieval. Two types of model can achieve ad-hoc retrieval: representation-based and interaction-based models. The approach discussed in the paper is interaction-based: local interactions are first built between the two texts based on some basic representations, and deep neural networks then learn hierarchical matching patterns over them. The deep network used is a feed-forward neural network with many neurons, similar in spirit to a Convolutional Neural Network.
Fig 2: DRMM Architecture [16]
We can observe from the above architecture that a Matching Histogram Mapping serves as the input to the network; it is obtained by computing local interactions between each pair of terms from a query and a document. Once the input has been fed into the network, the model is trained with a hinge loss, since the goal of ad-hoc retrieval is to rank documents by relevance. A term gating network is also used to explicitly model query-term importance, producing aggregation weights for the query terms that contribute to the final relevance score. For training, the authors implemented a pairwise ranking loss, and for optimization they applied stochastic gradient descent with mini-batches of size 20, so that the model can easily be parallelized on a single machine with multiple cores. Once training is done, the relevance score of a query can be calculated against a given set of documents.
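An illustrative sketch, not the authors' implementation, of the two ingredients just described: binning term-pair cosine similarities into a fixed-length matching histogram, and a pairwise hinge (ranking) loss. The bin count, margin, and similarity values here are arbitrary choices for the example:

```python
import math

def matching_histogram(similarities, bins=5):
    """Count query-term/document-term cosine similarities into
    equal-width bins over [-1, 1]; the (log-scaled) counts form a
    fixed-length input vector for the feed-forward network."""
    hist = [0] * bins
    for s in similarities:
        idx = min(int((s + 1.0) / 2.0 * bins), bins - 1)
        hist[idx] += 1
    return [math.log1p(c) for c in hist]  # log counts tame large values

def hinge_loss(score_pos, score_neg, margin=1.0):
    """Pairwise ranking loss: penalise a relevant document that is
    not scored at least `margin` above an irrelevant one."""
    return max(0.0, margin - score_pos + score_neg)

sims = [0.9, 0.8, -0.2, 0.1, 1.0]  # toy similarities for one query term
print(matching_histogram(sims))
print(hinge_loss(2.0, 1.5))  # → 0.5
```

Note how, unlike raw similarity sequences, the histogram has the same length regardless of document length, which is what lets a fixed-size network consume it.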
Word2Vec Model
Word2Vec is a model discussed in the 2013 paper by Tomas Mikolov et al. [4] called "Efficient Estimation of Word Representations in Vector Space". The model is used to learn embeddings, which are vector representations of words. These learned vectors are then fed into a neural network according to the task's requirements. Noise in textual data is critical, as any addition or removal of words can completely change the meaning of the data; this differs from image or video data, where noise does not play such a major role. Two methods can be used to obtain the embeddings with the Word2Vec model: the skip-gram model and the continuous bag-of-words (CBOW) model.
Fig 3: Word2Vec model architectures [4]
The main difference between the two models is that the skip-gram model is designed to predict context from a word, whereas the CBOW model is designed to predict a word from its context. These models are therefore used in a very application-specific manner, as they perform essentially different tasks. We can observe from the image above that both architectures closely resemble a shallow neural network with a single hidden layer. To use these models, the input words must first be transformed into one-hot vectors, since the mathematical operations that run inside a neural network cannot be performed on string inputs. After this conversion, the model is trained on the maximum likelihood principle with softmax and negative sampling, which resembles the loss implemented in the popular TensorFlow package. After training on the required data, we obtain the embeddings as our output, depending on the model chosen.
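The difference between the two objectives can be sketched as the training examples each would generate. This is a simplified illustration with a context window of one, not the actual Word2Vec code:

```python
def skipgram_pairs(tokens, window=1):
    """Skip-gram: each centre word predicts each context word,
    yielding one (centre, context) pair per neighbour."""
    pairs = []
    for i, centre in enumerate(tokens):
        for j in range(max(0, i - window),
                       min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs

def cbow_examples(tokens, window=1):
    """CBOW: the context words jointly predict the centre word,
    yielding one (context-list, centre) example per position."""
    examples = []
    for i, centre in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window),
                                  min(len(tokens), i + window + 1))
                   if j != i]
        examples.append((context, centre))
    return examples

sentence = ["neural", "information", "retrieval"]
print(skipgram_pairs(sentence))
print(cbow_examples(sentence))
```

The same text thus produces word-to-word prediction pairs for skip-gram but context-to-word examples for CBOW, which is why the two variants suit different tasks.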
Compare and Contrast
In this section we will compare the models discussed in the previous section to other models that are currently in use.
DRMM Comparison
Experiments on the DRMM model were conducted using two collections, ClueWeb-09-Cat-B and Robust04, large datasets containing an average of 200 queries each. The DRMM model is compared with traditional IR models such as BM25 and the query likelihood (QL) model, representation-based models such as DSSMT/DSSMD, C-DSSMT/C-DSSMD and ARC-I, and interaction-based models such as ARC-II and MatchPyramid (MP). The models are compared on metrics such as Mean Average Precision (MAP), normalized Discounted Cumulative Gain at rank 20 (nDCG@20), and Precision at rank 20 (P@20). The results for both datasets can be seen below.
Fig 4: DRMM Comparison with other models [16]
Word2Vec vs FastText
Similar to Word2Vec, there is a widely used industry library called FastText, developed by Facebook, which is known as an extension of Word2Vec. Unlike Word2Vec, which learns one embedding per whole word, FastText breaks each word into several character n-grams. One big advantage of FastText is that it can generate embeddings for rare and out-of-vocabulary words by composing their n-grams. The practical difference between the two libraries shows up on a task-specific basis, as one may work better than the other in different contexts; both models should therefore be considered and chosen based on the requirement.
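The character n-gram idea behind FastText can be sketched as follows. This is a simplified illustration rather than the library's implementation; real FastText uses a range of n-gram lengths and hashes them into an embedding table:

```python
def char_ngrams(word, n=3):
    """Break a word into character n-grams, with '<' and '>' marking
    the word boundaries so prefixes and suffixes stay distinguishable.
    Even an out-of-vocabulary word maps onto known sub-word units."""
    padded = "<" + word + ">"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

print(char_ngrams("where"))
# → ['<wh', 'whe', 'her', 'ere', 're>']
```

A word's embedding is then formed from the embeddings of its n-grams, which is what allows rare or unseen words to receive a sensible vector.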
Conclusion
In this paper we discussed various techniques in which neural networks are used for the purpose of Information Retrieval. We also discussed how they compare against information retrieval methods that are already in use, as well as other methods still under research.
References
[1] Bhaskar Mitra and Nick Craswell. 2017. Neural Models for Information Retrieval. CoRR
abs/1705.01509 (2017). arXiv:1705.01509
[2] Ye Zhang, Md Mustafizur Rahman, Alex Braylan, Brandon Dang, Heng- Lu Chang, Henna Kim,
Quinten McNamara, Aaron Angert, Edward Banner, Vivek Khetan, et al. 2016. Neural information
retrieval: A literature review. arXiv preprint arXiv:1611.06792.
[3] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document
recognition. Proceedings of the IEEE, November 1998.
[4] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. In ICLR: Proceedings of the International Conference on Learning Representations, Workshop Track, 2013.
[5] J. Guo, Y. Fan, Q. Ai, and W. B. Croft. A deep relevance matching model for ad-hoc retrieval. In Proceedings of the 25th ACM International Conference on Information and Knowledge Management, CIKM '16, ACM, New York, NY, USA, 2016.
[6] J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. JMLR, 12:2121–2159, 2011.
[7] R. Caruana, S. Lawrence, and C. L. Giles. Overfitting in neural nets: Backpropagation, conjugate gradient, and early stopping. In NIPS, volume 13, page 402. MIT Press, 2001.
[8] M. Shokouhi. Learning to personalize query auto-completion. In Proc. SIGIR, pages 103–112, 2013.
[9] A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, and T. Mikolov. FastText.zip: Compressing text classification models, 2016.
[10] C. Zhai and J. Lafferty. A study of smoothing methods for language models applied to ad hoc information retrieval. In SIGIR, pages 334–342. ACM, 2001.
[11] S. E. Robertson and S. Walker. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. In SIGIR, pages 232–241. ACM, 1994.
[12] P.-S. Huang, X. He, J. Gao, L. Deng, A. Acero, and L. Heck. Learning deep structured semantic models for web search using clickthrough data. In CIKM, pages 2333–2338. ACM, 2013.
[13] Y. Shen, X. He, J. Gao, L. Deng, and G. Mesnil. Learning semantic representations using convolutional neural networks for web search. In WWW, pages 373–374, 2014.
[14] B. Hu, Z. Lu, H. Li, and Q. Chen. Convolutional neural network architectures for matching natural language sentences. In NIPS, pages 2042–2050, 2014.
[15] https://missinglink.ai/guides/convolutional-neural-networks/convolutional-neural-network-architecture-forging-pathways-future/
[16] Jiafeng Guo, Yixing Fan, Qingyao Ai, and W. Bruce Croft. 2016. A Deep Relevance Matching
Model for Ad-hoc Retrieval. In Proceedings of the 25th ACM International on Conference on
Information and Knowledge Management (CIKM ’16). Association for Computing Machinery,
New York, NY, USA, 55–64. DOI:https://doi.org/10.1145/2983323.2983769