
DCNN for text

B01902004 蔡捷恩

A CNN for modeling Sentences

Kalchbrenner, Nal, Edward Grefenstette, and Phil Blunsom. "A Convolutional Neural Network for Modelling Sentences." arXiv:1404.2188 (2014).

Sentence model

• Sentence -> feature vector, that's all!

• However, it is the core of: sentiment analysis, paraphrase detection, entailment recognition, summarisation, discourse analysis, machine translation, grounded language learning, image retrieval …

Contribution

• Does not rely on a parse tree
• Easily applicable to any language?

How to model a sentence?

• Composition-based methods: need human knowledge to compose

• Automatically extracted logical forms: e.g. RNN, TDNN

Brief network structure

• Interleaving k-max pooling and 1-D convolution (TDNN-style) generates a feature graph over the sentence

A kind of syntax tree?

NN sentence model with syntax tree (Recursive NN, RecNN)

References the syntax tree while training

Shares weights and stacks up to form the network

RNN for sentence model: linear "structure"

Back to DCNN

• Convolution
• TDNN
• k-max pooling (dynamic k-max pooling)

Convolution

Narrow type, window = 5 (no padding); wide type, window = 5 (zero-padding)
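The narrow/wide distinction can be sketched with NumPy's `convolve` (a minimal illustration; the sequence and filter values here are made up):

```python
import numpy as np

def conv1d(s, w, kind):
    """1-D convolution of sequence s with filter w.
    'narrow': no padding, len(s) - len(w) + 1 outputs;
    'wide'  : zero-padding, len(s) + len(w) - 1 outputs."""
    return np.convolve(s, w, mode='valid' if kind == 'narrow' else 'full')

s = np.arange(7, dtype=float)  # toy "sentence" of length 7
w = np.ones(5)                 # filter with window = 5
print(conv1d(s, w, 'narrow').shape)  # (3,)
print(conv1d(s, w, 'wide').shape)    # (11,)
```

Wide convolution guarantees every output is defined even when the filter is longer than the sentence, which is why the paper uses it.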

Max-TDNN

Goal: recognize features independent of time-shift (i.e. sequence position)

Take a look at DCNN

Needs to be optimized during training

If we use Max-TDNN

K-max pooling

• Given k, whatever the input length, pool the top-k activations as output; the order of the output corresponds to their order in the input

• Better than max-TDNN because it:
  – preserves the order of features
  – discerns more finely how strongly a feature is activated

• Guarantees that the input length to the fully connected layer is independent of sentence length
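A minimal NumPy sketch of k-max pooling on one feature row: keep the k largest activations but in their original positions' order (toy values):

```python
import numpy as np

def kmax_pool(row, k):
    """Keep the k largest values of one feature row, in input order."""
    idx = np.sort(np.argsort(row)[-k:])  # top-k positions, re-sorted by position
    return row[idx]

x = np.array([3., 1., 5., 2., 4.])
print(kmax_pool(x, 3))  # [3. 5. 4.] -- order follows the input, not magnitude
```

Unlike plain max pooling, relative order between the surviving activations is preserved, so later layers can still see sequential structure.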

Only the fully connected layer needs a fixed length

• Intermediate layers can be more flexible
• Dynamic k-max pooling!

Dynamic k-max Pooling

• k is a function of the input sentence length and the depth of the network

k_l = max(k_top, ⌈(L − l)/L · s⌉)

where k_l is the k of the currently concerned layer l, k_top is the fixed k of the k-max pooling at the top, L is the total number of conv layers in the network (the depth), and s is the input sentence length.
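Following the paper, k at layer l is k_l = max(k_top, ⌈(L − l)/L · s⌉); a direct sketch (variable names are mine):

```python
import math

def dynamic_k(l, L, s, k_top):
    """k for conv layer l (1-indexed) out of L conv layers,
    given sentence length s and the fixed top-level k_top."""
    return max(k_top, math.ceil((L - l) / L * s))

# s = 18, L = 3, k_top = 3: k shrinks linearly toward the top of the network
print([dynamic_k(l, 3, 18, 3) for l in (1, 2, 3)])  # [12, 6, 3]
```

The pooled width thus tapers with depth, so deeper layers compose progressively wider-range features regardless of the sentence length.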

Folding

• Feature detectors in different rows are independent of each other until the top fully connected layer

• Folding simply sums every pair of adjacent rows (vector sum), halving the number of rows
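Folding's adjacent-row sum can be sketched in a couple of lines (toy feature map, shapes only for illustration):

```python
import numpy as np

def fold(F):
    """Sum every pair of adjacent rows of a (d, n) feature map -> (d//2, n)."""
    return F[0::2] + F[1::2]  # rows 0+1, 2+3, ... summed elementwise

F = np.arange(12.).reshape(4, 3)  # d = 4 feature rows, sentence length 3
print(fold(F).shape)  # (2, 3)
```

This introduces a cheap dependency between pairs of feature rows without any extra parameters.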

Properties

• Sensitive to the order of words
• Filters of the first layer model n-grams, n ≤ m
• Invariance to absolute position is captured by upper-layer convolutions
• Induces a feature-graph property

Experiments: sentiment analysis on the Stanford Sentiment Treebank (movie reviews; 5-class and binary +/− labels)

Experiments: question type prediction on TREC

Experiments: Twitter sentiment dataset, binary labels

Experiments

• Visualizing feature detectors

Think about it

• Can this kind of k-max pooling be applied to image tasks?

A CNN for matching natural language sentences

Hu, Baotian, et al. "Convolutional Neural Network Architectures for Matching Natural Language Sentences." Advances in Neural Information Processing Systems, 2014.

Why a convolutional approach

• No prior knowledge needed

Contribution

• Hierarchical sentence modeling

• Captures rich matching patterns at different levels of abstraction

Convolutional Sentence Modeling

Word2vec pre-trained embeddings

Max pooling with window 2; fixed input length

A trick on zero-padding

• The sentence length may vary over a fairly broad range

• Introduce a gate operation

• g(z) = 0 while z = 0 (an all-zero padding vector); otherwise, 1
• No bias, so zero-padded positions stay exactly zero
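A minimal sketch of the gating trick, assuming the gate zeroes the output of a bias-free convolution whenever the input window is all-zero padding (function and variable names are mine):

```python
import numpy as np

def g(z):
    """Gate: 0 for an all-zero (padding) input vector, else 1."""
    return 1.0 if np.any(z) else 0.0

def gated_conv(windows, W):
    """Bias-free convolution over stacked windows; because there is no
    bias term, gated padding windows produce exactly zero output."""
    return np.array([g(z) * (W @ z) for z in windows])

W = np.ones((2, 3))                              # toy filter bank: 2 filters, window 3
windows = [np.array([1., 2., 3.]), np.zeros(3)]  # second window is padding
print(gated_conv(windows, W))  # [[6. 6.] [0. 0.]]
```

Without the gate (or with a bias), padding positions would leak nonzero activations into the pooled features.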

Conv + max-pool composition

RNN vs ConvNet (W = wins, L = loses)

                                ConvNet   RNN
Hierarchical structure             W       L
Parallelism                        W       L
Capture far-away information       –       –
Explainable                        W       L
Variety                            L       W

Architecture-I

• Drawback: in the forward phase, the representation of each sentence is built without knowledge of the other

Architecture-II

• Builds directly on the interaction space between the 2 sentences
• From 1-D to 2-D convolution

Good trick at pooling

2D max-pooling

Model Generality

• Arc-II subsumes Arc-I as a special case

Cost function

• Large-margin (ranking) objective:

e(x, y⁺, y⁻) = max(0, 1 + s(x, y⁻) − s(x, y⁺))
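A minimal sketch of a large-margin ranking loss of the standard form max(0, 1 + s_neg − s_pos), with made-up scores:

```python
def margin_loss(s_pos, s_neg, margin=1.0):
    """The matching score of the true pair (s_pos) should beat a
    mismatched pair (s_neg) by at least `margin`; otherwise, penalize."""
    return max(0.0, margin + s_neg - s_pos)

print(margin_loss(s_pos=1.5, s_neg=1.0))  # 0.5 -- margin violated
print(margin_loss(s_pos=3.0, s_neg=1.0))  # 0.0 -- margin satisfied
```

Training thus only needs relative scores between a true pair and a sampled negative, not absolute match labels.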

Experiment – Sentence Completion

Experiment – Matching Response to Tweet

Experiment – Paraphrase Identification

• Determine whether two sentences have the same meaning

Discussion

• Sequence is important

Zhang, Xiang, and Yann LeCun. "Text Understanding from Scratch." arXiv preprint arXiv:1502.01710 (2015)

Text Understanding from Scratch

Contribution

• Character-level input
• No OOV problem
• Works for both English and Chinese

The model

Character encoding: characters not in the alphabet, and spaces, are encoded as the all-zero vector

Fixed-length window

H e l l o   w o r l
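A minimal sketch of the character quantization, using a made-up alphabet and frame length; out-of-alphabet characters (including the space) become all-zero columns:

```python
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # illustrative subset of the paper's alphabet
CHAR2IDX = {c: i for i, c in enumerate(ALPHABET)}

def quantize(text, length=10):
    """One-hot encode characters column by column; characters outside
    the alphabet (including space) stay all-zero columns; the text is
    clipped/padded to a fixed frame of `length` columns."""
    M = np.zeros((len(ALPHABET), length))
    for j, c in enumerate(text.lower()[:length]):
        if c in CHAR2IDX:
            M[CHAR2IDX[c], j] = 1.0
    return M

M = quantize("Hello worl")
print(M.shape)        # (26, 10)
print(M[:, 5].sum())  # 0.0 -- the space maps to an all-zero column
```

The fixed frame makes the encoded text a constant-size matrix, so a plain ConvNet can consume it directly.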

More detail

What about varying input lengths?

• Fix the frame to the longest text we are going to see (1014 characters in their experiments)

Data augmentation - Thesaurus

• Thesaurus: “a book that lists words in groups of synonyms and related concepts”

• http://www.libreoffice.org/

Comparison models

• Bag-of-words: 5000 most frequent words

• Bag-of-centroids: word vectors trained on the Google News corpus, clustered with 5000-means

DBpedia Ontology Classification


Amazon review sentiment analysis

• 1~5 indicating user’s subjective rating of a product.

• Collected by SNAP project


Yahoo! Answer Topic Classification


News Categorization in English


News Categorization in Chinese

• SogouCA and SogouCS corpora
• pypinyin package + jieba Chinese segmentation system

News Categorization in Chinese

Conclusion

• We can play a lot of tricks with pooling

Thank you