
Page 1: Reasoning Over Knowledge Base

Reasoning Over Knowledge Base
Information Extraction and Retrieval Project

Surbhi Gupta201505533

Shubham Agarwal201330144

Darshan Jaju201301098

Team No. 15

Page 2: Reasoning Over Knowledge Base

Abstract

➔ Reasoning with Neural Tensor Networks for Knowledge Base Completion has become something of a seminal paper in the short span of three years, cited by nearly every knowledge base completion (KBC) paper since its publication in 2013. It was one of the first major successful forays into the field of deep learning approaches to knowledge base completion, and was unique for using deep learning "end to end".

➔ We reimplemented the paper's architecture in the Torch framework, achieving similar accuracy results with an elegant implementation in a modern language.

➔ Our goal is to predict the likely truth of additional facts based on existing facts in a knowledge base (KB), giving the system reasoning capability. For instance, when told that a new species of monkey has been discovered, a person does not need to find textual evidence to know that this new monkey, too, will have legs.

Page 3: Reasoning Over Knowledge Base

Neural Tensor Networks for Knowledge Base (KB) Completion

➔ A knowledge base is a representation of factual knowledge. A KB characteristically suffers from incompleteness, in the form of missing edges. If A is related to B, and B is related to C, oftentimes A is related to C, but KBs often don't have these relations explicitly listed, because they're simply common sense. This is the "common sense" often missing in artificially intelligent systems, especially question answering systems, which rely heavily on KBs. Hence, the goal is to develop a model that assigns a high likelihood to ("MIT", "located in", "Massachusetts") but a low likelihood to ("MIT", "located in", "Belize"): a model that recognizes that some facts hold purely because of other relations in the KB.

➔ Socher et al. introduce "Neural Tensor Networks", which differ from regular neural networks in the way they relate entities directly to one another via the bilinear tensor product, the core operation of a Neural Tensor Network.

Page 4: Reasoning Over Knowledge Base

Project Pipeline/Architecture

Entity Representation → Neural Tensor Network → Training

Page 5: Reasoning Over Knowledge Base

Entity Representation - A word2vec approach

➔ One of the insights from this paper is that entities can be represented as some function of their constituent words, which provides for the sharing of statistical strength between similar entities. For example, "African Elephant" and "Asian Elephant" might have much in common, but previous approaches oftentimes embedded each of these separately. We embed each word ("African", "Asian", and "Elephant") and then build representations for entities as the average of those entities' constituent word vectors.

➔ Word embedding vectors may be initialized randomly, but the model benefits strongly from using pre-trained word embeddings (e.g. from word2vec), which already have some structure built into them. However, once we have vectors representing words, we pass gradients back to them, meaning that those embeddings are by no means static or final.
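The averaging scheme above can be sketched in a few lines of NumPy (a minimal illustration, not the authors' Torch implementation; the toy vocabulary and dimension are made up for the example):

```python
import numpy as np

# Toy vocabulary with random vectors; the paper initializes these from
# pre-trained (e.g. word2vec) embeddings instead, which works much better.
rng = np.random.default_rng(0)
d = 4  # embedding dimension (illustrative only)
vocab = {w: rng.standard_normal(d) for w in ["african", "asian", "elephant"]}

def entity_vector(entity_name, word_vectors):
    """Represent an entity as the average of its constituent word vectors."""
    words = entity_name.lower().split()
    return np.mean([word_vectors[w] for w in words], axis=0)

# "African Elephant" and "Asian Elephant" share the "elephant" vector,
# so their representations share statistical strength from the start.
v_african = entity_vector("African Elephant", vocab)
v_asian = entity_vector("Asian Elephant", vocab)
```

Because gradients flow back to the word vectors during training, the shared "elephant" component is updated by examples involving either entity.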

Page 6: Reasoning Over Knowledge Base

Entity Representation - Visualization

The visualization (of entity vectors learned on the FreeBase dataset) reveals some aspects of the vector space. First, the universe of the problem is divided into relations and persons (notice that nothing told the model about that; it learned it from data). Second, two families are clearly separated. With some exceptions, females are to the right and males are to the left in each family. It seems that family and gender are the most informative dimensions in this problem.

Page 7: Reasoning Over Knowledge Base

Neural Tensor Network (NTN) - The Architecture

➔ The goal of our approach is to state whether two entities (e1, e2) are in a certain relationship R, and with what certainty. For instance, whether the triplet (e1, R, e2) = (Bengal tiger, has part, tail) is true. To this end, we define a set of parameters indexed by R for each relation's scoring function. Let e1, e2 ∈ ℝ^d be the vector representations (or features) of the two entities. For now we can assume that each value of these vectors is initialized to a small uniformly random number.

➔ The NTN replaces a standard linear neural network layer with a bilinear tensor layer that directly relates the two entity vectors across multiple dimensions. The model computes a score of how likely it is that two entities are in a certain relationship by the following NTN-based function:

g(e1, R, e2) = u_R^T f( e1^T W_R^[1:k] e2 + V_R [e1; e2] + b_R )

where f = tanh is applied element-wise, W_R^[1:k] ∈ ℝ^(d×d×k) is a tensor, and the bilinear tensor product e1^T W_R^[1:k] e2 results in a vector h ∈ ℝ^k with each entry computed by one tensor slice: h_i = e1^T W_R^[i] e2. The remaining parameters V_R ∈ ℝ^(k×2d), b_R ∈ ℝ^k, and u_R ∈ ℝ^k form a standard neural network layer.
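The scoring function can be sketched in NumPy (a toy illustration with made-up dimensions and randomly initialized per-relation parameters, not the paper's Torch code):

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 4, 2  # entity dimension and number of tensor slices (toy sizes)

# Parameters for one relation R, randomly initialized for illustration
W = rng.standard_normal((k, d, d))   # bilinear tensor W_R^[1:k]
V = rng.standard_normal((k, 2 * d))  # standard layer weights V_R
b = rng.standard_normal(k)           # bias b_R
u = rng.standard_normal(k)           # output weights u_R

def ntn_score(e1, e2):
    """g(e1, R, e2) = u_R^T tanh(e1^T W_R^[1:k] e2 + V_R [e1; e2] + b_R)."""
    bilinear = np.array([e1 @ W[i] @ e2 for i in range(k)])  # one entry per slice
    linear = V @ np.concatenate([e1, e2])                    # standard layer term
    return u @ np.tanh(bilinear + linear + b)

score = ntn_score(rng.standard_normal(d), rng.standard_normal(d))
```

Note how the bilinear term multiplies the two entity vectors directly, which is exactly the interaction a standard layer (which only sees their concatenation) cannot express.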

Page 8: Reasoning Over Knowledge Base

Neural Tensor Network (NTN) - Visualization

Visualization of the Neural Tensor Network. Each dashed box represents one slice of the tensor, in this case there are k = 2 slices.

Page 9: Reasoning Over Knowledge Base

Neural Tensor Network (NTN) - Training

➔ All models are trained with contrastive max-margin objective functions. The main idea is that each triplet T^(i) = (e_1^(i), R^(i), e_2^(i)) in the training set should receive a higher score than a triplet in which one of the entities is replaced with a random entity.

➔ Each relation has its associated neural tensor net parameters. We call the corrupted triplet T_c^(i) = (e_1^(i), R^(i), e_c), where entity e_c is sampled randomly from the set of all entities that can appear at that position in that relation.

➔ We minimize the following objective:

J(Ω) = Σ_{i=1..N} Σ_{c=1..C} max(0, 1 − g(T^(i)) + g(T_c^(i))) + λ ‖Ω‖₂²

where N is the number of training triplets, C is the number of corrupted examples per triplet, and Ω denotes all model parameters; we score the correct relation triplet higher than its corrupted one up to a margin of 1, with L2 regularization on the parameters.

➔ We use mini-batched L-BFGS for optimization, which converges to a local optimum of our non-convex objective function.
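The hinge term and the entity-corruption step described above can be sketched as follows (a minimal illustration; `corrupt` here only replaces the second entity, whereas training can corrupt either position):

```python
import numpy as np

rng = np.random.default_rng(2)

def margin_loss(score_correct, score_corrupt, margin=1.0):
    """Hinge term of the objective: zero once the correct triplet
    outscores the corrupted one by at least the margin."""
    return max(0.0, margin - score_correct + score_corrupt)

def corrupt(triplet, entities):
    """Build a corrupted triplet T_c by swapping in a random entity
    (here always at the second position, for simplicity)."""
    e1, r, _ = triplet
    return (e1, r, entities[rng.integers(len(entities))])

# The correct triplet must beat its corruption by the margin of 1:
loss = margin_loss(score_correct=2.0, score_corrupt=1.5)  # within margin -> 0.5
```

The regularized sum of these hinge terms over all training triplets and sampled corruptions is what L-BFGS minimizes.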

Page 10: Reasoning Over Knowledge Base

Experiments

➔ Experiments are conducted on both WordNet and FreeBase to predict whether some relations hold using other facts in the database. Our goal is to predict correct facts in the form of relations (e1 , R, e2 ) in the testing data. This could be seen as answering questions such as Does a dog have a tail?, using the scores g(dog, has part, tail) computed by the various models.

➔ We use the development set to find a threshold TR for each relation such that if g(e1 , R, e2 ) ≥ TR , the relation (e1 , R, e2 ) holds, otherwise it does not hold.

➔ The final accuracy is based on how many triplets are classified correctly.
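The per-relation threshold search can be sketched as a sweep over candidate values on the development set (a simple exhaustive procedure; the exact search method is an assumption, as is the toy data):

```python
import numpy as np

def best_threshold(dev_scores, dev_labels):
    """Pick the threshold T_R with the best dev accuracy, predicting
    that (e1, R, e2) holds iff g(e1, R, e2) >= T_R."""
    s = np.sort(dev_scores)
    # Candidates: midpoints between adjacent scores, plus the two extremes
    candidates = np.concatenate([[s[0] - 1.0], (s[:-1] + s[1:]) / 2, [s[-1] + 1.0]])
    accs = [np.mean((dev_scores >= t) == dev_labels) for t in candidates]
    best = int(np.argmax(accs))
    return candidates[best], accs[best]

# Toy dev set for one relation: model scores and gold truth labels
scores = np.array([0.9, 0.8, 0.3, 0.1])
labels = np.array([True, True, False, False])
t_r, acc = best_threshold(scores, labels)  # separable, so dev accuracy is 1.0
```

The threshold found on the development set is then applied unchanged to the test triplets of that relation.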

Page 11: Reasoning Over Knowledge Base

Experiments - Results

Accuracy of the NTN model on the FreeBase dataset

Using learned word embeddings vs. using random word embeddings

Page 12: Reasoning Over Knowledge Base

Experiments - Results

Accuracy of the NTN model on the WordNet dataset

Using learned word embeddings vs. using random word embeddings

Page 13: Reasoning Over Knowledge Base

Experiments - Results

Model comparisons for the two datasets

Page 14: Reasoning Over Knowledge Base

Experiments - Results Visualization

A reasoning example in FreeBase. Black lines denote relationships in training, red lines denote relationships the model inferred. The dashed line denotes word vector sharing.

Page 15: Reasoning Over Knowledge Base

Conclusions

➔ We introduced Neural Tensor Networks for knowledge base completion. Unlike previous models for predicting relationships using entity representations in knowledge bases, our model allows mediated interaction of entity vectors via a tensor. The model obtains the highest accuracy in predicting unseen relationships between entities through reasoning inside a given knowledge base, and it enables the extension of databases even without external textual resources.

➔ We further show that by representing entities through their constituent words and initializing these word representations using readily available word vectors, the performance of all models improves substantially.

Applications

➔ Link Prediction
➔ Semantic Parsing
➔ Question Answering
➔ Knowledge Base Expansion
➔ Named Entity Recognition

Page 16: Reasoning Over Knowledge Base

Thank You!