Text Summarization via Semantic Representation

吳旻誠

2014/07/16

Gist-content Question

• Asks for the main idea of the talk.

• The correct answer is the option closest to the overall theme of the content; the distractors refer only to small portions of it.

Gist-content Question

• Q. Which of the following is closest to the main idea of this talk?

– (A) We've had three explanations for why we might sleep.

– (B) When you're tired and you lack sleep, you have poor memory, poor creativity, increased impulsiveness, and overall poor judgment.

– (C) If you have good sleep, it increases your concentration, attention, decision-making, creativity, social skills, and health.

– (D) You do not do anything much while you're asleep.

Gist-content question generation

• Treat the most important sentence as the main idea of the talk.

• Use LexRank to measure the importance of sentences.

LexRank

• Measures the importance of sentences.

• Graph-based model (undirected).

• The nodes represent the sentences.

• The edges are weighted by the cosine similarity between the sentences they connect (see the formulation below).
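For reference, the centrality these slides build up to is presumably the continuous LexRank formulation of Erkan and Radev (2004), where d is a damping factor, N is the number of sentences, and adj(u) is the set of sentences adjacent to u:

```latex
p(u) = \frac{d}{N} + (1 - d) \sum_{v \in \mathrm{adj}(u)}
       \frac{\cos(u, v)}{\sum_{z \in \mathrm{adj}(v)} \cos(z, v)} \, p(v)
```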

Conditions that should be satisfied

• Stochastic matrix.

• Irreducible.

• Aperiodic.
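Together, these three properties guarantee that the random walk over the sentence graph converges to a unique stationary distribution, which gives the importance scores. Below is a minimal Python sketch (my illustration, not the talk's code; the tf-idf vectorizer and the damping factor 0.85 are assumed choices):

```python
# Minimal LexRank sketch (illustrative). Damping makes the transition
# matrix stochastic, irreducible, and aperiodic, so power iteration
# converges to a unique stationary distribution = sentence importance.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def lexrank(sentences, damping=0.85, tol=1e-6):
    tfidf = TfidfVectorizer().fit_transform(sentences)
    sim = (tfidf @ tfidf.T).toarray()   # cosine similarity: tf-idf rows are L2-normalized
    np.fill_diagonal(sim, 0.0)          # ignore self-similarity
    n = len(sentences)
    row_sums = sim.sum(axis=1, keepdims=True)
    # Row-normalize into a stochastic matrix; sentences with no neighbors
    # get a uniform row so every row still sums to one.
    P = np.divide(sim, row_sums, out=np.full_like(sim, 1.0 / n), where=row_sums > 0)
    # Mixing in a uniform jump (damping) makes the chain irreducible and aperiodic.
    M = damping * P + (1.0 - damping) / n
    p = np.full(n, 1.0 / n)
    while True:                         # power iteration
        p_next = p @ M
        if np.abs(p_next - p).sum() < tol:
            return p_next               # importance score per sentence
        p = p_next
```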

Similarity Between Sentences

• But… what is the similarity between the following sentences?

– I will fully support you.

– I'll back you up all the way.
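A quick hypothetical check makes the problem concrete: as bags of words, these two paraphrases share only the token "you", so their surface cosine similarity is small even though the meanings match. This is what motivates moving to semantic representations below.

```python
# Hypothetical illustration: bag-of-words cosine similarity
# of two sentences that are paraphrases of each other.
import math
import re
from collections import Counter

def bow_cosine(a, b):
    va = Counter(re.findall(r"[a-z']+", a.lower()))
    vb = Counter(re.findall(r"[a-z']+", b.lower()))
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b)

print(bow_cosine("I will fully support you.", "I'll back you up all the way."))
# ≈ 0.17 — only "you" overlaps, despite the identical meaning.
```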

Deep Learning

• A set of algorithms in the machine-learning field.

• Learns representations of data.

• Has been applied to fields such as computer vision, automatic speech recognition, and natural language processing.

Reduce the Dimensionality of Data with Neural Networks (Hinton & Salakhutdinov, 2006)

Word2Vec

• An open-source tool from Google.

• Computes vector representations of words.

• Provides efficient implementations of

– the Continuous Bag-of-Words (CBOW) architecture.

– the Skip-gram architecture.
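A minimal usage sketch with gensim's open-source word2vec implementation (assuming the gensim 4.x API; the toy corpus and hyperparameters are illustrative, not from the talk):

```python
# Training both word2vec architectures with gensim (assumed 4.x API).
from gensim.models import Word2Vec

corpus = [
    ["i", "will", "fully", "support", "you"],
    ["i", "will", "back", "you", "up", "all", "the", "way"],
]

cbow = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=0)       # CBOW
skip_gram = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=1)  # Skip-gram

vector = skip_gram.wv["support"]           # 100-dimensional word vector
print(skip_gram.wv.most_similar("you"))    # nearest words by cosine similarity
```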

Continuous Bag-of-Words Model (CBOW)

• Predicts the current word from its surrounding context words.

Skip-gram Model

• Predicts the surrounding context words from the current word.

Softmax
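This slide presumably shows the standard word2vec output layer: the probability of an output word w_O given an input word w_I is a softmax over all W vocabulary words, with v_w and v'_w the input and output vector representations of word w.

```latex
p(w_O \mid w_I) =
  \frac{\exp\left( {v'_{w_O}}^{\top} v_{w_I} \right)}
       {\sum_{w=1}^{W} \exp\left( {v'_{w}}^{\top} v_{w_I} \right)}
```

Computing the normalizer costs O(W) per training example, which is what hierarchical softmax avoids.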

Hierarchical Softmax

• Uses a binary tree representation of the output layer with the W words as its leaves.

• Each word w can be reached by an appropriate path from the root of the tree.
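In the standard formulation (Mikolov et al., 2013), shown here for reference, n(w, j) is the j-th node on the path from the root to w, L(w) is that path's length, ch(n) is an arbitrary fixed child of n, [[x]] is +1 if x is true and -1 otherwise, and σ is the logistic sigmoid:

```latex
p(w \mid w_I) =
  \prod_{j=1}^{L(w)-1}
  \sigma\left( [\![\, n(w, j+1) = \mathrm{ch}(n(w, j)) \,]\!]
               \cdot {v'_{n(w,j)}}^{\top} v_{w_I} \right)
```

This replaces the O(W) softmax sum with roughly log2(W) sigmoid evaluations per word.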

Sentence Representations

Now that we have representations of words, how can we represent sentences with them? A recursive deep-learning model was proposed by the Stanford Natural Language Processing Group.

Recursive Autoencoder
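A minimal numpy sketch of the idea (random untrained weights, so purely illustrative): an encoder composes two child vectors into a parent vector, a decoder reconstructs the children, and greedily merging the adjacent pair with the lowest reconstruction error builds a tree whose root vector represents the whole sentence.

```python
# Illustrative recursive autoencoder with untrained random weights.
import numpy as np

d = 100                                        # vector dimensionality (illustrative)
rng = np.random.default_rng(0)
W_e = rng.normal(scale=0.1, size=(d, 2 * d))   # encoder weights
b_e = np.zeros(d)
W_d = rng.normal(scale=0.1, size=(2 * d, d))   # decoder weights
b_d = np.zeros(2 * d)

def encode(c1, c2):
    """Compose two child vectors into one parent vector."""
    return np.tanh(W_e @ np.concatenate([c1, c2]) + b_e)

def reconstruction_error(c1, c2):
    """How badly the parent's decoding reconstructs its children."""
    c1_hat, c2_hat = np.split(np.tanh(W_d @ encode(c1, c2) + b_d), 2)
    return np.sum((c1 - c1_hat) ** 2) + np.sum((c2 - c2_hat) ** 2)

def sentence_vector(word_vectors):
    """Greedily merge the lowest-error adjacent pair until one vector remains."""
    nodes = list(word_vectors)
    while len(nodes) > 1:
        errors = [reconstruction_error(nodes[i], nodes[i + 1])
                  for i in range(len(nodes) - 1)]
        i = int(np.argmin(errors))
        nodes[i:i + 2] = [encode(nodes[i], nodes[i + 1])]
    return nodes[0]
```

In the actual model the weights are trained to minimize the summed reconstruction error over a corpus; this sketch only shows the wiring.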

Dynamic Pooling
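As described in Socher et al. (2011), dynamic pooling maps the variable-size matrix of pairwise node similarities between two sentences onto a fixed-size grid, so a downstream classifier always sees the same input shape. A sketch (my illustration; the grid size is an assumed parameter):

```python
# Illustrative dynamic min-pooling of a variable-size similarity matrix.
import numpy as np

def dynamic_min_pool(S, out_size=15):
    # Repeat rows/columns when the matrix is smaller than the target grid,
    # so that no pooling region is empty.
    reps = (-(-out_size // S.shape[0]), -(-out_size // S.shape[1]))  # ceil division
    S = np.tile(S, reps)
    rows = np.array_split(np.arange(S.shape[0]), out_size)
    cols = np.array_split(np.arange(S.shape[1]), out_size)
    pooled = np.empty((out_size, out_size))
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            pooled[i, j] = S[np.ix_(r, c)].min()
    return pooled
```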
