Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection (R. Socher et al., 2011). Presenter: Shun Yoshida


Page 1

Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection

R. Socher et al., 2011. Presenter: Shun Yoshida

Page 2

Purpose of This Paper

Objective: to detect paraphrases, i.e., pairs of sentences with (nearly) the same meaning.

S1: The judge also refused to postpone the trial date of Sept. 29.
S2: Obus also denied a defense motion to postpone the September trial date.

➔ Identifying paraphrases is an important task for information retrieval, text summarization, evaluation of machine translation, etc.

Relevance to my research: this could help me classify sentiment more precisely.


Page 3

Word Representation

In general, words are represented as vectors.

1. One-hot representation
Each word is assigned its own ID; its vector is 1 at that index and 0 everywhere else:

[ 0, 0, …, 1, 0, …, 0 ]

Vocabulary: 1: apple, 2: book, …, 200: zoo

Problems:
• Very sparse
• High dimensional
• Unable to measure the similarity between words

A short sketch of these points follows.
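As an illustration, a minimal Python sketch of one-hot vectors (the toy vocabulary and its size are assumptions, not from the paper):

import numpy as np

# Toy vocabulary: every word gets its own integer ID.
vocab = {"apple": 0, "book": 1, "zoo": 199}
V = 200  # vocabulary size

def one_hot(word):
    """All zeros except a single 1 at the word's ID."""
    v = np.zeros(V)
    v[vocab[word]] = 1.0
    return v

# Any two distinct words are equally dissimilar:
# their dot product is always 0, so similarity cannot be measured.
print(one_hot("apple") @ one_hot("zoo"))  # 0.0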

Page 4

Word Representation

2. Distributed representation
Each word is represented by a dense, low-dimensional vector that captures semantic and syntactic information. Such a vector is called a word embedding, and this method aims to learn it.

zoo: [ 1.5, 1.8, 0.3, 4 ]

Merits:
• Low dimension
• Similar words take similar vectors (see the sketch below)
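To illustrate "similar words take similar vectors", a minimal sketch with made-up embedding values (real embeddings come from a trained model):

import numpy as np

zoo    = np.array([1.5, 1.8, 0.3, 4.0])   # embedding from the slide
animal = np.array([1.4, 1.6, 0.5, 3.8])   # hypothetical nearby word

def cosine(a, b):
    """Cosine similarity: near 1 when two vectors point the same way."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(zoo, animal))  # close to 1 -> semantically similar words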

Page 5

Autoencoder

An autoencoder is a kind of neural network:

• The hidden layer has fewer units than the input layer
• It is trained to reconstruct its own input

➔ This lets it learn low-dimensional representations that capture the input information well.

Page 6

Autoencoder

The autoencoder can be viewed as a binary tree: the input layer holds the two children c1 and c2, and the hidden layer holds the parent p.

c1, c2: word embeddings (initialized by a neural language model)
W_e: encoding weights, W_d: decoding weights

children to parent (encoding): p = f(W_e [c1; c2] + b_e)

reconstruction (decoding): [c1'; c2'] = f(W_d p + b_d)

reconstruction error: E_rec = || [c1; c2] − [c1'; c2'] ||²

A runnable sketch of one such step follows.
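A minimal numpy sketch of one encode/decode step; the dimension, the random initialization, and tanh as f are illustrative assumptions:

import numpy as np

d = 100                        # embedding dimension (assumed)
rng = np.random.default_rng(0)
f = np.tanh                    # element-wise nonlinearity

# Encoding maps 2d -> d, decoding maps d -> 2d.
W_e, b_e = rng.normal(scale=0.01, size=(d, 2 * d)), np.zeros(d)
W_d, b_d = rng.normal(scale=0.01, size=(2 * d, d)), np.zeros(2 * d)

def encode(c1, c2):
    """Children to parent: p = f(W_e [c1; c2] + b_e)."""
    return f(W_e @ np.concatenate([c1, c2]) + b_e)

def decode(p):
    """Reconstruction: [c1'; c2'] = f(W_d p + b_d)."""
    out = f(W_d @ p + b_d)
    return out[:d], out[d:]

def rec_error(c1, c2):
    """E_rec = || [c1; c2] - [c1'; c2'] ||^2."""
    c1r, c2r = decode(encode(c1, c2))
    return np.sum((c1 - c1r) ** 2) + np.sum((c2 - c2r) ** 2)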

Page 7

Recursive Autoencoders

The parent has the same dimension as each child, so we can repeat the same step until the full tree is constructed (figure: the leaves are word embeddings, the root is the phrase vector).

reconstruction error of the tree: E_rec(T) = Σ_{p ∈ T} E_rec(p), summed over all non-terminal nodes p.

A sketch of this bottom-up pass follows.
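Reusing encode and rec_error from the previous sketch, a left-branching fold over the word embeddings (the real RAE follows a parse tree or a greedily built tree; the fixed tree shape here is an assumption for brevity):

def rae_forward(word_vectors):
    """Combine vectors pairwise until one phrase vector remains,
    accumulating the reconstruction error of every non-terminal node."""
    nodes = list(word_vectors)      # leaves: word embeddings
    total_error = 0.0
    while len(nodes) > 1:
        total_error += rec_error(nodes[0], nodes[1])
        nodes = [encode(nodes[0], nodes[1])] + nodes[2:]  # parent replaces its children
    return nodes[0], total_error    # phrase vector, E_rec of the tree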

Page 8

Unfolding RAE

The unfolding RAE trains each hidden layer so that it best reconstructs its entire subtree down to the leaf nodes, not just its two direct children. A sketch follows.
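A minimal sketch of the unfolding reconstruction error, again reusing encode/decode; for simplicity it assumes a balanced tree over a power-of-two number of leaves:

import numpy as np

def unfold(p, depth):
    """Recursively decode p back into 2**depth leaf vectors."""
    if depth == 0:
        return [p]
    c1, c2 = decode(p)
    return unfold(c1, depth - 1) + unfold(c2, depth - 1)

def unfolding_error(leaves):
    """Encode bottom-up to the root, unfold back down, and compare
    the reconstructed leaves with the original ones."""
    level, depth = list(leaves), 0
    while len(level) > 1:
        level = [encode(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        depth += 1
    recon = unfold(level[0], depth)
    return sum(np.sum((a - b) ** 2) for a, b in zip(leaves, recon))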


Page 9

Why Unfolding RAE?

Problems of the standard RAE:

• It gives equal weight to both children, even though each child can represent a different number of words (figure: one child spans 1 word, the other 3 words)
• It can lower the reconstruction error simply by scaling the hidden-layer representations down to very small values

➔ The unfolding RAE solves these problems.

Page 10

RAE Training

Training minimizes the sum of the reconstruction errors over all nodes of all trees in the training corpus. This objective J is a function of the word embeddings L and the neural-network weights θ = (W_e, b_e, W_d, b_d), with an L2 regularizer on θ. ➔ After training we obtain both the word embeddings and the phrase vectors. A minimal sketch of the objective follows.
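Reusing rae_forward and the weights from the earlier sketches, a minimal version of the objective (the regularization constant and its exact form are assumptions; the paper regularizes embeddings and weights with separate strengths):

def objective(trees, lam=1e-4):
    """Sum of all trees' reconstruction errors plus an L2 penalty on the weights."""
    rec = sum(rae_forward(tree)[1] for tree in trees)   # tree = list of word vectors
    reg = lam / 2 * (np.sum(W_e ** 2) + np.sum(W_d ** 2))
    return rec + reg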


Page 11

Similarity Matrix

After training, we compute the similarities (Euclidean distances) between all word and phrase vectors of the two sentences. These distances fill a similarity matrix S.

Example: S[3,4] is the distance between node 4 of sentence 1 ("mice") and node 3 of sentence 2 ("mice"); since the words are identical, the distance is zero. A sketch of this computation follows.
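A minimal sketch of filling S from the two sentences' node vectors (the lists of all 2n−1 and 2m−1 word/phrase vectors):

import numpy as np

def similarity_matrix(nodes1, nodes2):
    """S[i, j] = Euclidean distance between node i of sentence 1 and
    node j of sentence 2; nodes are all word and phrase vectors."""
    S = np.zeros((len(nodes1), len(nodes2)))
    for i, u in enumerate(nodes1):
        for j, v in enumerate(nodes2):
            S[i, j] = np.linalg.norm(u - v)   # 0.0 for identical words
    return S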

Page 12

Why Dynamic Pooling?

Classifying from the average distance or from a histogram of the distances in S does not give good performance. ➔ We need to feed S itself into a classifier.

Problem: the dimensions of S vary with the sentence lengths.

Solution: map S into a matrix of fixed size.

➔ Dynamic pooling


Page 13

Dynamic Pooling

Example: S has 2n−1 = 3 rows and 2m−1 = 9 columns, both divisible by n_p = 3.

1. Partition S into an n_p × n_p grid; each window has size (3/3) × (9/3) = 1 × 3.
2. Define each element of the pooled matrix to be the minimum value of its window (a small value means similar words or phrases occur in both sentences, so taking the minimum preserves this information).

Page 14

Dynamic Pooling

Example: S again has 2n−1 = 3 rows and 2m−1 = 9 columns, but now they are NOT divisible by n_p = 2.

1. Partition S into an n_p × n_p grid; the base window size is 1 × 4.
2. Distribute the remaining rows/columns to the last windows, then take the minimum of each window as before.

A sketch covering both cases follows.
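A minimal sketch handling both slides' cases; spreading the leftover rows/columns over the last windows follows the slide's description and is otherwise an assumption:

import numpy as np

def dynamic_pool(S, n_p):
    """Min-pool a variable-size matrix S into a fixed n_p x n_p matrix.
    (Assumes each dimension of S is at least n_p.)"""
    def edges(size):
        step, rem = divmod(size, n_p)   # base window size and leftover
        sizes = [step] * n_p
        for k in range(rem):            # leftovers go to the last windows
            sizes[n_p - 1 - k] += 1
        return np.concatenate([[0], np.cumsum(sizes)])

    rows, cols = edges(S.shape[0]), edges(S.shape[1])
    pooled = np.empty((n_p, n_p))
    for i in range(n_p):
        for j in range(n_p):
            # the minimum keeps evidence of very similar words/phrases
            pooled[i, j] = S[rows[i]:rows[i + 1], cols[j]:cols[j + 1]].min()
    return pooled

# The slides' shapes: a 3 x 9 matrix pooled to 3 x 3 (windows 1 x 3)
# and to 2 x 2 (base windows 1 x 4, the last column window absorbing the rest).
S = np.arange(27, dtype=float).reshape(3, 9)
print(dynamic_pool(S, 3))
print(dynamic_pool(S, 2))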

Page 15

Experiments

1. Do the autoencoders capture phrase information?

➔ The unfolding RAE is better.


Page 16

Experiments

2. Does the unfolding RAE really decode back to the leaf nodes?

➔ The unfolding RAE is better; it can reconstruct phrases of up to five words very well.


Page 17

Experiments

3. How well does the proposed method detect paraphrases?

➔ The proposed method achieves state-of-the-art performance.


Page 18

Experiments

4. Examples of classified data.


Page 19

The End