Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection (R. Socher et al., 2011). Presenter: Shun Yoshida


Page 1

Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection

R. Socher et al., 2011. Presenter: Shun Yoshida

Page 2

Purpose of This Paper

Objective: to detect paraphrases, i.e., pairs of sentences with (nearly) the same meaning.

S1: The judge also refused to postpone the trial date of Sept. 29.
S2: Obus also denied a defense motion to postpone the September trial date.

➔ Identifying paraphrases is an important task for information retrieval, text summarization, evaluation of machine translation, etc.

Relevance to my research: this could help me classify sentiment more precisely.


Page 3

Word Representation

In general, words are represented as vectors.

1. One-hot representation
Each word is assigned its own ID; its vector is 1 at that index and 0 everywhere else:

[ 0, 0, …, 1, 0, …, 0 ]

Vocabulary: 1: apple, 2: book, …, 200: zoo

Problems:
• Very sparse
• High dimensional
• Unable to measure the similarity between words

A short sketch of these points follows.
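As an illustration, a minimal Python sketch of one-hot vectors (the toy vocabulary and its size are assumptions, not from the paper):

import numpy as np

# Toy vocabulary: every word gets its own integer ID.
vocab = {"apple": 0, "book": 1, "zoo": 199}
V = 200  # vocabulary size

def one_hot(word):
    """All zeros except a single 1 at the word's ID."""
    v = np.zeros(V)
    v[vocab[word]] = 1.0
    return v

# Any two distinct words are equally dissimilar:
# their dot product is always 0, so similarity cannot be measured.
print(one_hot("apple") @ one_hot("zoo"))  # 0.0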

Page 4

Word Representation

2. Distributed representation
Each word is represented by a dense, low-dimensional vector that captures semantic and syntactic information. Such a vector is called a word embedding, and this method aims to learn it.

zoo: [ 1.5, 1.8, 0.3, 4 ]

Merits:
• Low dimension
• Similar words take similar vectors (see the sketch below)
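To illustrate "similar words take similar vectors", a minimal sketch with made-up embedding values (real embeddings come from a trained model):

import numpy as np

zoo    = np.array([1.5, 1.8, 0.3, 4.0])   # embedding from the slide
animal = np.array([1.4, 1.6, 0.5, 3.8])   # hypothetical nearby word

def cosine(a, b):
    """Cosine similarity: near 1 when two vectors point the same way."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(zoo, animal))  # close to 1 -> semantically similar words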

Page 5

Autoencoder

An autoencoder is a kind of neural network:

• The hidden layer has fewer units than the input layer
• It is trained to reconstruct its own input

➔ This lets it learn low-dimensional representations that capture the input information well.

Page 6

Autoencoder

The autoencoder can be viewed as a binary tree: the input layer holds the two children c1 and c2, and the hidden layer holds the parent p.

c1, c2: word embeddings (initialized by a neural language model)
W_e: encoding weights, W_d: decoding weights

children to parent (encoding): p = f(W_e [c1; c2] + b_e)

reconstruction (decoding): [c1'; c2'] = f(W_d p + b_d)

reconstruction error: E_rec = || [c1; c2] − [c1'; c2'] ||²

A runnable sketch of one such step follows.
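A minimal numpy sketch of one encode/decode step; the dimension, the random initialization, and tanh as f are illustrative assumptions:

import numpy as np

d = 100                        # embedding dimension (assumed)
rng = np.random.default_rng(0)
f = np.tanh                    # element-wise nonlinearity

# Encoding maps 2d -> d, decoding maps d -> 2d.
W_e, b_e = rng.normal(scale=0.01, size=(d, 2 * d)), np.zeros(d)
W_d, b_d = rng.normal(scale=0.01, size=(2 * d, d)), np.zeros(2 * d)

def encode(c1, c2):
    """Children to parent: p = f(W_e [c1; c2] + b_e)."""
    return f(W_e @ np.concatenate([c1, c2]) + b_e)

def decode(p):
    """Reconstruction: [c1'; c2'] = f(W_d p + b_d)."""
    out = f(W_d @ p + b_d)
    return out[:d], out[d:]

def rec_error(c1, c2):
    """E_rec = || [c1; c2] - [c1'; c2'] ||^2."""
    c1r, c2r = decode(encode(c1, c2))
    return np.sum((c1 - c1r) ** 2) + np.sum((c2 - c2r) ** 2)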

Page 7

Recursive Autoencoders

The parent has the same dimension as each child, so we can repeat the same step until the full tree is constructed (figure: the leaves are word embeddings, the root is the phrase vector).

reconstruction error of the tree: E_rec(T) = Σ_{p ∈ T} E_rec(p), summed over all non-terminal nodes p.

A sketch of this bottom-up pass follows.
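Reusing encode and rec_error from the previous sketch, a left-branching fold over the word embeddings (the real RAE follows a parse tree or a greedily built tree; the fixed tree shape here is an assumption for brevity):

def rae_forward(word_vectors):
    """Combine vectors pairwise until one phrase vector remains,
    accumulating the reconstruction error of every non-terminal node."""
    nodes = list(word_vectors)      # leaves: word embeddings
    total_error = 0.0
    while len(nodes) > 1:
        total_error += rec_error(nodes[0], nodes[1])
        nodes = [encode(nodes[0], nodes[1])] + nodes[2:]  # parent replaces its children
    return nodes[0], total_error    # phrase vector, E_rec of the tree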

Page 8

Unfolding RAE

The unfolding RAE trains each hidden layer so that it best reconstructs its entire subtree down to the leaf nodes, not just its two direct children. A sketch follows.
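A minimal sketch of the unfolding reconstruction error, again reusing encode/decode; for simplicity it assumes a balanced tree over a power-of-two number of leaves:

import numpy as np

def unfold(p, depth):
    """Recursively decode p back into 2**depth leaf vectors."""
    if depth == 0:
        return [p]
    c1, c2 = decode(p)
    return unfold(c1, depth - 1) + unfold(c2, depth - 1)

def unfolding_error(leaves):
    """Encode bottom-up to the root, unfold back down, and compare
    the reconstructed leaves with the original ones."""
    level, depth = list(leaves), 0
    while len(level) > 1:
        level = [encode(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        depth += 1
    recon = unfold(level[0], depth)
    return sum(np.sum((a - b) ** 2) for a, b in zip(leaves, recon))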


Page 9

Why Unfolding RAE?

Problems of the standard RAE:

• It gives equal weight to both children, even though each child can represent a different number of words (figure: one child spans 1 word, the other 3 words)
• It can lower the reconstruction error simply by scaling the hidden-layer representations down to very small values

➔ The unfolding RAE solves these problems.

Page 10

RAE Training

Training minimizes the sum of the reconstruction errors over all nodes of all trees in the training corpus. This objective J is a function of the word embeddings L and the neural-network weights θ = (W_e, b_e, W_d, b_d), with an L2 regularizer on θ. ➔ After training we obtain both the word embeddings and the phrase vectors. A minimal sketch of the objective follows.
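Reusing rae_forward and the weights from the earlier sketches, a minimal version of the objective (the regularization constant and its exact form are assumptions; the paper regularizes embeddings and weights with separate strengths):

def objective(trees, lam=1e-4):
    """Sum of all trees' reconstruction errors plus an L2 penalty on the weights."""
    rec = sum(rae_forward(tree)[1] for tree in trees)   # tree = list of word vectors
    reg = lam / 2 * (np.sum(W_e ** 2) + np.sum(W_d ** 2))
    return rec + reg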


Page 11

Similarity Matrix

After training, we compute the similarities (Euclidean distances) between all word and phrase vectors of the two sentences. These distances fill a similarity matrix S.

Example: S[3,4] is the distance between node 4 of sentence 1 ("mice") and node 3 of sentence 2 ("mice"); since the words are identical, the distance is zero. A sketch of this computation follows.
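A minimal sketch of filling S from the two sentences' node vectors (the lists of all 2n−1 and 2m−1 word/phrase vectors):

import numpy as np

def similarity_matrix(nodes1, nodes2):
    """S[i, j] = Euclidean distance between node i of sentence 1 and
    node j of sentence 2; nodes are all word and phrase vectors."""
    S = np.zeros((len(nodes1), len(nodes2)))
    for i, u in enumerate(nodes1):
        for j, v in enumerate(nodes2):
            S[i, j] = np.linalg.norm(u - v)   # 0.0 for identical words
    return S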

Page 12

Why Dynamic Pooling?

Classifying from the average distance or from a histogram of the distances in S does not give good performance. ➔ We need to feed S itself into a classifier.

Problem: the dimensions of S vary with the sentence lengths.

Solution: map S into a matrix of fixed size.

➔ Dynamic pooling


Page 13

Dynamic Pooling

Example: S has 2n−1 = 3 rows and 2m−1 = 9 columns, both divisible by n_p = 3.

1. Partition S into an n_p × n_p grid; each window has size (3/3) × (9/3) = 1 × 3.
2. Define each element of the pooled matrix to be the minimum value of its window (a small value means similar words or phrases occur in both sentences, so taking the minimum preserves this information).

Page 14

Dynamic Pooling

Example: S again has 2n−1 = 3 rows and 2m−1 = 9 columns, but now they are NOT divisible by n_p = 2.

1. Partition S into an n_p × n_p grid; the base window size is 1 × 4.
2. Distribute the remaining rows/columns to the last windows, then take the minimum of each window as before.

A sketch covering both cases follows.
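A minimal sketch handling both slides' cases; spreading the leftover rows/columns over the last windows follows the slide's description and is otherwise an assumption:

import numpy as np

def dynamic_pool(S, n_p):
    """Min-pool a variable-size matrix S into a fixed n_p x n_p matrix.
    (Assumes each dimension of S is at least n_p.)"""
    def edges(size):
        step, rem = divmod(size, n_p)   # base window size and leftover
        sizes = [step] * n_p
        for k in range(rem):            # leftovers go to the last windows
            sizes[n_p - 1 - k] += 1
        return np.concatenate([[0], np.cumsum(sizes)])

    rows, cols = edges(S.shape[0]), edges(S.shape[1])
    pooled = np.empty((n_p, n_p))
    for i in range(n_p):
        for j in range(n_p):
            # the minimum keeps evidence of very similar words/phrases
            pooled[i, j] = S[rows[i]:rows[i + 1], cols[j]:cols[j + 1]].min()
    return pooled

# The slides' shapes: a 3 x 9 matrix pooled to 3 x 3 (windows 1 x 3)
# and to 2 x 2 (base windows 1 x 4, the last column window absorbing the rest).
S = np.arange(27, dtype=float).reshape(3, 9)
print(dynamic_pool(S, 3))
print(dynamic_pool(S, 2))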

Page 15

Experiments

1. Do the autoencoders capture phrase information?

➔ The unfolding RAE is better.


Page 16

Experiments

2. Does the unfolding RAE really decode back to the leaf nodes?

➔ The unfolding RAE is better; it can reconstruct phrases of up to five words very well.


Page 17

Experiments

3. How well does the proposed method detect paraphrases?

➔ The proposed method achieves state-of-the-art performance.


Page 18

Experiments

4. Examples of classified data.


Page 19

The End