Information Retrieval using Neural Networks
Vineeth Reddy Anugu
Computer Science Graduate Student
CMSC 676 Information Retrieval
University of Maryland, Baltimore County
Table of Contents
Abstract
Introduction
Introduction to Neural Networks
Convolutional Neural Networks
Survey of Relevant Work
Deep Relevance Matching Model
Word2Vec Model
Compare and Contrast
DRMM Comparison
Word2Vec vs FastText
Conclusion
References
Abstract
The field of machine learning has seen rapid development in recent years owing to its many areas of application. A major part of the field is the use of neural networks, which are constantly being researched and improved for a wide range of purposes. Models built on neural networks deliver impressive performance on tasks such as computer vision, speech recognition and natural language processing. One field where neural networks can be applied is Information Retrieval; examples include neural models for tasks such as query suggestion and conversational agents, among others. In this paper we discuss some well-established applications of neural networks in the field of Information Retrieval and how these architectures fare against other implementations.
Introduction
For this term paper I will discuss aspects of automation in the field of Information Retrieval. Some of the best ways to achieve this automation come from machine learning, and the branch most useful for retrieval purposes is neural networks. In particular, the subfield known as deep learning presents great possibilities. The use of shallow and deep neural networks for specific retrieval tasks is known as Neural Information Retrieval. Many Neural IR techniques are demanding in terms of the amount of training data they require, since neural networks feed off the data they are given and improve as that data grows. These techniques can be implemented by supervised or unsupervised means. Some of the better researched neural network architectures that can be used for information retrieval are 'Convolutional Neural Networks' and 'Generative Neural Networks', among others. In this paper we discuss the basic neural network architectures implemented for Information Retrieval purposes by reviewing some of the academic literature written about them.
Neural Information Retrieval refers to the use of neural networks, both deep and shallow, for the purpose of retrieving information. This appetite for large amounts of data sets these techniques apart from traditional information retrieval models, which do not require as much data to perform well; neural retrieval models, by contrast, keep getting better as more data is fed into them. As for the difference between supervised and unsupervised neural information retrieval, the supervised approach needs data optimized for the given task, usually labelled query-document pairs. If the data we possess is not in the form of query-document pairs, then the model we implement will be an unsupervised one, which draws inferences from either query data or document data exclusively.
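The distinction above can be made concrete with a small, entirely hypothetical data sketch; the field names, queries and document identifiers below are invented purely for illustration:

```python
# Hypothetical shape of supervised training data: labelled
# query-document pairs with a binary relevance judgement.
labelled_pairs = [
    {"query": "neural information retrieval",
     "doc_id": "doc-17", "relevant": 1},
    {"query": "neural information retrieval",
     "doc_id": "doc-42", "relevant": 0},
]

# An unsupervised model, by contrast, would see only raw query or
# document text, with no relevance labels attached.
unlabelled_corpus = ["neural networks learn from raw data",
                     "retrieval models rank documents"]

relevant = [p for p in labelled_pairs if p["relevant"]]
print(len(relevant))  # count of positive (relevant) pairs
```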
Introduction to Neural Networks
An important characteristic that sets neural networks apart from conventional information retrieval techniques is their ability to learn directly from raw data in order to perform the required task.
Convolutional Neural Networks
A Convolutional Neural Network defines a set of linear filters, also known as kernels, that connect only spatially local regions of the input, greatly reducing computation. These filters extract locally occurring patterns. A CNN is mainly used for image-related tasks such as image recognition, but given suitably prepared input the architecture can be adapted to textual data as well. The architecture of a CNN is very similar to that of a multi-layer perceptron, which is also a type of neural network. Convolutional Neural Networks usually do not contain any cycles, so they retain no memory of input that has previously passed through them; for this reason they are called feed-forward networks.
Fig 1: CNN Architecture[15]
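As a minimal sketch, the "linear filter over spatially local regions" idea can be written out directly. The toy two-dimensional embeddings and the hand-set filter below are invented for illustration, not taken from any trained network:

```python
def conv1d(embeddings, kernel):
    """Slide a linear filter over spatially local windows of a
    sequence of token embedding vectors, producing one activation
    per window (the core operation of a CNN applied to text)."""
    k = len(kernel)
    out = []
    for i in range(len(embeddings) - k + 1):
        window = embeddings[i:i + k]
        # dot product of the flattened window with the flattened kernel
        out.append(sum(w * f
                       for row, krow in zip(window, kernel)
                       for w, f in zip(row, krow)))
    return out

# Toy 2-dimensional embeddings for a 4-token sentence.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
kernel = [[0.5, 0.5], [0.5, 0.5]]  # one filter spanning 2 adjacent tokens
print(conv1d(tokens, kernel))  # → [1.0, 1.5, 1.0]
```

Because each output depends only on a small local window, far fewer weights are needed than in a fully connected layer over the whole input.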
Survey of Relevant Work
In this section we will look at two Neural Information Retrieval techniques that have been well researched and proven beneficial for the purposes of information retrieval while using neural networks as their architecture. Anything relating to the machine learning field, and especially to neural networks, requires an understanding of the complex mathematics behind the techniques; for the purposes of this paper, however, we will focus on the design of their architectures.
Deep Relevance Matching Model
To get a good understanding of the Deep Relevance Matching Model, also known as DRMM, we will look at the 2016 paper by Jiafeng Guo et al. [16] called "A Deep Relevance Matching Model for Ad-hoc Retrieval". While most neural networks are designed for Natural Language Processing, this paper discusses the use of relevance matching for ad-hoc retrieval. Two types of model can achieve ad-hoc retrieval: representation-based and interaction-based models. The approach discussed in the paper is interaction-based: local interactions are first built between the two texts based on some basic representations, and deep neural networks then learn hierarchical matching patterns over them. The deep network used is a feed-forward neural network with many neurons, similar in spirit to a Convolutional Neural Network.
Fig 2: DRMM Architecture [16]
We can observe from the above architecture that a Matching Histogram Mapping serves as the input to the network; it is obtained by computing local interactions between each pair of terms from a query and a document. Once the input has been fed into the network, the model is trained with a hinge loss, since the goal of ad-hoc retrieval is to rank documents by relevance. A term gating network is also used to explicitly model query-term importance, producing aggregation weights for the query terms that contribute to the final relevance score. For training, the authors implemented a pairwise ranking loss, and for optimization they applied stochastic gradient descent with mini-batches of size 20, so that the model can easily be parallelized on a single machine with multiple cores. Once training is done, the relevance score of a query can be calculated against a given set of documents.
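An illustrative sketch, not the authors' implementation, of the two ingredients just described: binning term-pair cosine similarities into a fixed-length matching histogram, and a pairwise hinge (ranking) loss. The bin count, margin, and similarity values here are arbitrary choices for the example:

```python
import math

def matching_histogram(similarities, bins=5):
    """Count query-term/document-term cosine similarities into
    equal-width bins over [-1, 1]; the (log-scaled) counts form a
    fixed-length input vector for the feed-forward network."""
    hist = [0] * bins
    for s in similarities:
        idx = min(int((s + 1.0) / 2.0 * bins), bins - 1)
        hist[idx] += 1
    return [math.log1p(c) for c in hist]  # log counts tame large values

def hinge_loss(score_pos, score_neg, margin=1.0):
    """Pairwise ranking loss: penalise a relevant document that is
    not scored at least `margin` above an irrelevant one."""
    return max(0.0, margin - score_pos + score_neg)

sims = [0.9, 0.8, -0.2, 0.1, 1.0]  # toy similarities for one query term
print(matching_histogram(sims))
print(hinge_loss(2.0, 1.5))  # → 0.5
```

Note how, unlike raw similarity sequences, the histogram has the same length regardless of document length, which is what lets a fixed-size network consume it.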
Word2Vec Model
Word2Vec is a model discussed in the 2013 paper by Tomas Mikolov et al. [4] called "Efficient Estimation of Word Representations in Vector Space". The model is used to learn embeddings, which are vector representations of words. These learned vectors are then fed into a neural network according to the task's requirements. Noise in textual data is critical, as any addition or removal of words can completely change the meaning of the data; this differs from image or video data, where noise does not play such a major role. Two methods can be used to obtain the embeddings with the Word2Vec model: the skip-gram model and the continuous bag-of-words (CBOW) model.
Fig 3: Word2Vec model architectures [4]
The main difference between the two models is that the skip-gram model is designed to predict context from a word, whereas the CBOW model is designed to predict a word from its context. These models are therefore used in a very application-specific manner, as they perform essentially different tasks. We can observe from the image above that both architectures closely resemble a shallow neural network with a single hidden layer. To use these models, the input words must first be transformed into one-hot vectors, since the mathematical operations that run inside a neural network cannot be performed on string inputs. After this conversion, the model is trained on the maximum likelihood principle with softmax and negative sampling, which resembles the loss implemented in the popular TensorFlow package. After training on the required data, we obtain the embeddings as our output, depending on the model chosen.
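The difference between the two objectives can be sketched as the training examples each would generate. This is a simplified illustration with a context window of one, not the actual Word2Vec code:

```python
def skipgram_pairs(tokens, window=1):
    """Skip-gram: each centre word predicts each context word,
    yielding one (centre, context) pair per neighbour."""
    pairs = []
    for i, centre in enumerate(tokens):
        for j in range(max(0, i - window),
                       min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs

def cbow_examples(tokens, window=1):
    """CBOW: the context words jointly predict the centre word,
    yielding one (context-list, centre) example per position."""
    examples = []
    for i, centre in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window),
                                  min(len(tokens), i + window + 1))
                   if j != i]
        examples.append((context, centre))
    return examples

sentence = ["neural", "information", "retrieval"]
print(skipgram_pairs(sentence))
print(cbow_examples(sentence))
```

The same text thus produces word-to-word prediction pairs for skip-gram but context-to-word examples for CBOW, which is why the two variants suit different tasks.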
Compare and Contrast
In this section we will compare the models discussed in the previous section to other models that are currently in use.
DRMM Comparison
Experiments on the DRMM model were conducted using two collections, ClueWeb-09-Cat-B and Robust04, large datasets containing an average of 200 queries each. The DRMM model is compared with traditional IR models such as BM25 and the query likelihood (QL) model, representation-based models such as DSSMT/DSSMD, C-DSSMT/C-DSSMD and ARC-I, and interaction-based models such as ARC-II and MatchPyramid (MP). The models are compared on metrics such as Mean Average Precision (MAP), normalized Discounted Cumulative Gain at rank 20 (nDCG@20), and Precision at rank 20 (P@20). The results for both datasets can be seen below.
Fig 4: DRMM Comparison with other models [16]
Word2Vec vs FastText
Similar to Word2Vec, there is a widely used industry library called FastText, developed by Facebook, which is known as an extension of Word2Vec. Unlike Word2Vec, which learns one embedding per whole word, FastText breaks each word into several character n-grams. One big advantage of FastText is that it can generate embeddings for rare and out-of-vocabulary words by composing their n-grams. The practical difference between the two libraries shows up on a task-specific basis, as one may work better than the other in different contexts; both models should therefore be considered and chosen based on the requirement.
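The character n-gram idea behind FastText can be sketched as follows. This is a simplified illustration rather than the library's implementation; real FastText uses a range of n-gram lengths and hashes them into an embedding table:

```python
def char_ngrams(word, n=3):
    """Break a word into character n-grams, with '<' and '>' marking
    the word boundaries so prefixes and suffixes stay distinguishable.
    Even an out-of-vocabulary word maps onto known sub-word units."""
    padded = "<" + word + ">"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

print(char_ngrams("where"))
# → ['<wh', 'whe', 'her', 'ere', 're>']
```

A word's embedding is then formed from the embeddings of its n-grams, which is what allows rare or unseen words to receive a sensible vector.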
Conclusion
In this paper we discussed various techniques in which neural networks are used for the purpose of Information Retrieval. We also discussed how they compare against information retrieval methods that are already in use, as well as other methods still under research.
References
[1] Bhaskar Mitra and Nick Craswell. 2017. Neural Models for Information Retrieval. CoRR
abs/1705.01509 (2017). arXiv:1705.01509
[2] Ye Zhang, Md Mustafizur Rahman, Alex Braylan, Brandon Dang, Heng- Lu Chang, Henna Kim,
Quinten McNamara, Aaron Angert, Edward Banner, Vivek Khetan, et al. 2016. Neural information
retrieval: A literature review. arXiv preprint arXiv:1611.06792.
[3] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document
recognition. Proceedings of the IEEE, November 1998.
[4] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. In ICLR: Proceedings of the International Conference on Learning Representations, Workshop Track, 2013.
[5] J. Guo, Y. Fan, Q. Ai, and W. B. Croft. A deep relevance matching model for ad-hoc retrieval. In Proceedings of the 25th ACM International Conference on Information and Knowledge Management, CIKM '16, ACM, New York, NY, USA, 2016.
[6] J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. JMLR, 12:2121–2159, 2011.
[7] R. Caruana, S. Lawrence, and C. L. Giles. Overfitting in neural nets: Backpropagation, conjugate gradient, and early stopping. In NIPS, volume 13, page 402. MIT Press, 2001.
[8] M. Shokouhi. Learning to personalize query auto-completion. In Proc. SIGIR, pages 103–112, 2013.
[9] A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, and T. Mikolov. FastText.zip: Compressing text classification models, 2016.
[10] C. Zhai and J. Lafferty. A study of smoothing methods for language models applied to ad hoc information retrieval. In SIGIR, pages 334–342. ACM, 2001.
[11] S. E. Robertson and S. Walker. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. In SIGIR, pages 232–241. ACM, 1994.
[12] P.-S. Huang, X. He, J. Gao, L. Deng, A. Acero, and L. Heck. Learning deep structured semantic models for web search using clickthrough data. In CIKM, pages 2333–2338. ACM, 2013.
[13] Y. Shen, X. He, J. Gao, L. Deng, and G. Mesnil. Learning semantic representations using convolutional neural networks for web search. In WWW, pages 373–374, 2014.
[14] B. Hu, Z. Lu, H. Li, and Q. Chen. Convolutional neural network architectures for matching natural language sentences. In NIPS, pages 2042–2050, 2014.
[15] https://missinglink.ai/guides/convolutional-neural-networks/convolutional-neural-network-architecture-forging-pathways-future/
[16] Jiafeng Guo, Yixing Fan, Qingyao Ai, and W. Bruce Croft. 2016. A Deep Relevance Matching
Model for Ad-hoc Retrieval. In Proceedings of the 25th ACM International on Conference on
Information and Knowledge Management (CIKM ’16). Association for Computing Machinery,
New York, NY, USA, 55–64. DOI:https://doi.org/10.1145/2983323.2983769