Introduction: What is Text Summarization?

Introduction

What is Text Summarization?

An automatically generated summary of a document or collection which is at least as good as a human can produce.

Introduction

We do not yet know good ways of doing this, so which other fields can we borrow from to do what we need?

Information Extraction

Information Retrieval

Text Mining

Text Generation

Types of Text Summarization

What types of summaries are there?

indicative versus informative

extract versus abstract

generic versus query-oriented

background versus just-the-news

single-document versus multi-document source

Types of Text Summarization

Summarization tasks can vary in what information is considered the source:

Summaries can draw on all the information in the document(s), or

only on the information deemed relevant to a specific task

Types of Text Summarization

This can be re-stated as:

top-down (query-driven focus)

versus

bottom-up (text-driven focus)

What do Human Summarizers Do?

Generally,

delete extraneous information

generalize concepts

make concepts more compact

What do Human Summarizers Do?

Example:

Father was washing dishes. Mother was working on her new book. The daughter was busy painting the window frames.

After summarization:

The whole family was busy.

What do Human Summarizers Do?

Example 2:

Father was washing dishes. Mother was working on her new book. The daughter was busy painting the window frames. All of a sudden, the publisher called in and told mother that he needed the manuscript a month earlier than foreseen. Father left the dishes and finished the drawings instead. The daughter dropped the brush and rushed to do the proofreading. Supported by her family, mother managed to finish her book in time.

What do Human Summarizers Do?

Topic of story has shifted

Example stresses importance of understanding entire story before abstracting from it

Humans read entire document before summarizing

Computational approaches can look at entire document or subpart related to task

What do Human Summarizers Do?

Discourse cues that aid in summarization:

knowledge of the topic domain

syntactic cues (topic-comment structure; connectives such as "but", "however", "because", "for example")

stylistic and rhetorical cues ("The most pressing thing to do was", "I conclude that")

structural cues (narrative structure)

context or situational cues

What do Human Summarizers Do?

General strategies:

What to keep: facts, items relating to the topic, items that discuss purpose, items that are stated positively, items that contrast other items, items that are stressed

What to delete: reasons, comments, examples

What do Human Summarizers Do?

Studies on consistency found that when abstracting documents:

A single human subject varies widely when abstracting the same article at two different times

Variation among different abstractors was even more significant

Even without much consistency, all abstracts produced were adequate

Computational Approaches

How do we do Text Summarization?

Knowledge-based

Selection-based

Historical Approaches

First text summarization algorithm by Luhn (1958):

1. words are input from the text;

2. common/non-substantive words are deleted through table look-up;

3. content words are stored, along with their position in the text, as well as any punctuation that is located immediately to the left and/or right of the word;

4. content words are sorted alphabetically

Historical Approaches

Luhn Algorithm (cont.)

5. similar spellings are consolidated into word types (a rough approximation of a stemmer)

5a. any two tokens with fewer than seven non-matching letters are considered to be of the same word type:

frequently

frequent

10 letters, 8 match, 2 non-match
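The letter-matching rule in step 5a can be illustrated with a short sketch. It assumes that letters are compared from the left until the first mismatch and that every remaining letter of the longer token counts as a non-match, which reproduces the "frequently"/"frequent" figures on the slide. The function names and the `max_non_matches` parameter are hypothetical, not part of Luhn's description.

```python
def non_matching_letters(a: str, b: str) -> int:
    """Count non-matching letters, assuming a left-to-right comparison."""
    a, b = a.lower(), b.lower()
    shared = 0
    # Walk both tokens from the left until the first differing letter.
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            break
        shared += 1
    # Everything past the shared prefix of the longer token is a non-match.
    return max(len(a), len(b)) - shared


def same_word_type(a: str, b: str, max_non_matches: int = 7) -> bool:
    """Slide rule: fewer than seven non-matches means the same word type."""
    return non_matching_letters(a, b) < max_non_matches


print(non_matching_letters("frequently", "frequent"))  # 2
print(same_word_type("frequently", "frequent"))        # True
```

Taken literally, the rule would also merge short unrelated words. Applying it only to neighbours in the alphabetically sorted list from step 4 limits that effect.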

Historical Approaches

Luhn Algorithm (cont.):

5b. the frequencies of word types are compared

5c. low frequencies deleted

5d. remaining words were considered significant

Problems: anaphora. The same referent can appear as "white elephant", "those big animals", or "they are big and white", and each form is counted separately.
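Steps 2 and 5b to 5d can be sketched as a stop-word filter followed by a frequency cut-off. This is a minimal illustration, not Luhn's implementation: the stop-word table, the tokenizer, and the `min_count` threshold are all assumptions chosen for the example.

```python
import re
from collections import Counter

# Illustrative stop-word table; a real system would use a much larger one.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "are", "was", "to", "in", "that", "it"}


def significant_words(text: str, min_count: int = 2) -> dict:
    """Return content words whose frequency reaches min_count."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Step 2: delete common words through table look-up.
    content = [t for t in tokens if t not in STOP_WORDS]
    # Steps 5b-5d: compare frequencies and drop the low-frequency words.
    counts = Counter(content)
    return {word: n for word, n in counts.items() if n >= min_count}


text = "The elephant is white. The elephant is big. Those animals are big."
print(significant_words(text))  # {'elephant': 2, 'big': 2}
```

The output also shows the anaphora problem: "those animals" refers to the elephants but contributes nothing to the count for "elephant".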

Historical Approaches

Luhn Algorithm (cont.)

6. remaining word types are sorted into location order;

7. sentence representativeness determined by dividing sentences into substrings defined by distances between significant words

Better to see you with, my dear

Better to

to see

you with, my

with, my dear

Historical Approaches

Better to see you with, my dear.

Substring 1: Better(2) to

Substring 2: to see(4)

Substring 3: you(6) with, my

Substring 4: with, my dear(1)

Substring 1: 2/2 = 1

Substring 2: 4/2 = 2

Substring 3: 6/3 = 2

Substring 4: 1/3 = 0.333

Total value for sentence = 5.33

8. for each substring, a representativeness value was calculated by dividing the number of representative tokens in the cluster by the total number of tokens in the cluster;

9. sentences reaching a value above a preset threshold were selected for inclusion
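The arithmetic of the worked example can be reproduced with a few lines of code. The sketch follows the numbers on the slide: each substring contributes the value shown in parentheses after its significant word, divided by the number of tokens in the substring. The function names and the threshold value are hypothetical.

```python
def sentence_score(substrings):
    """Sum value / token_count over (value, token_count) pairs."""
    return sum(value / n_tokens for value, n_tokens in substrings)


def select_sentences(scored_sentences, threshold):
    """Step 9: keep sentences whose score is above a preset threshold."""
    return [text for text, score in scored_sentences if score > threshold]


# "Better to see you with, my dear." split as on the slide:
# Better(2) to | to see(4) | you(6) with, my | with, my dear(1)
substrings = [(2, 2), (4, 2), (6, 3), (1, 3)]
score = sentence_score(substrings)
print(round(score, 2))  # 5.33

# The threshold of 5.0 is an arbitrary value for illustration.
print(select_sentences([("Better to see you with, my dear.", score)], threshold=5.0))
```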

Historical Approaches

TRW (1960s) builds upon Luhn model by:

adding weights for words that occurred in the title or subtitles of the document

sentences earlier or later in a paragraph were given higher weights than those in the middle

However, largest drawback at this point is that whole sentences are extracted, not rewritten.
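The two TRW additions can be pictured as extra terms added to a sentence score. The slide does not give the actual weighting formulas, so the linear position weight, the per-word title bonus, and all names below are assumptions made only to show the idea.

```python
def position_weight(index: int, n_sentences: int, edge_bonus: float = 1.0) -> float:
    """Weight grows toward either end of the paragraph (assumed linear)."""
    if n_sentences <= 1:
        return edge_bonus
    middle = (n_sentences - 1) / 2
    return edge_bonus * abs(index - middle) / middle


def title_weight(sentence_tokens, title_tokens, title_bonus: float = 1.0) -> float:
    """Add a bonus for every sentence token that also occurs in the title."""
    title = {t.lower() for t in title_tokens}
    return title_bonus * sum(1 for t in sentence_tokens if t.lower() in title)


# First, middle, and last sentence of a five-sentence paragraph.
print([position_weight(i, 5) for i in (0, 2, 4)])  # [1.0, 0.0, 1.0]
print(title_weight(["Text", "summarization", "works"], ["Text", "Summarization"]))  # 2.0
```

Both values would be added to a frequency-based sentence score. The result is still an extract: whole sentences are selected, not rewritten.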

Historical Approaches

Models Influenced by Cognitive Science

make use of frames and scripts to simulate schemas, which are formats of knowledge representations

FRUMP

PAULINE

Historical Approaches

FRUMP:

expectation-driven model

the knowledge base consists of sketchy scripts

looks for instances of the knowledge base in the text to be summarized

full parsing is not necessary for this method to work

Historical Approaches

PAULINE:

pragmatically driven

can generate 100 different summaries from 1 original

initially asks user for information to help guide its behavior

asks user for conversation topics

collects information on the topic and then creates sentences

pragmatics that are used include: make listener like me, use a "highfalutin" tone of voice, persuade the listener to change their opinion

Current Approaches

Newer methods are characterized by:

stochastic methods

integration of corpus linguistics

shallow parsing methods

lexical semantics knowledge through use of WordNet

integration of different methods in one model

summarization from structured knowledge

integration of information from different media

Current Approaches

Using related fields:

IE DB Compression Text Generation

Current Approaches

Think Smaller!

[Sentence Compression]

Current Approaches

Sentence Compression

Noisy Channel

Current Approaches

Sentence Compression

Source

Channel

Decoder

Current Approaches

Sentence Compression

Focus of the Compression

Current Approaches

Sentence Compression

Sentences or Trees?

Current Approaches

Sentence Compression

Q: So, how do we do it?

A: Estimate the probability that the original sentence is an expansion of the generated sentence
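In the usual noisy-channel formulation, this answer corresponds to Bayes' rule. The notation below is an assumption, not taken from the slides: s is a candidate short sentence and t is the original long sentence.

```latex
\hat{s} \;=\; \arg\max_{s} P(s \mid t)
        \;=\; \arg\max_{s} \frac{P(s)\,P(t \mid s)}{P(t)}
        \;=\; \arg\max_{s} P(s)\,P(t \mid s)
```

Here P(s) plays the role of the source model (how plausible the short sentence is on its own), P(t | s) is the channel model (how likely the original is as an expansion of s), and the search for the maximizing s is the decoder. P(t) can be dropped because it is the same for every candidate.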

Current Approaches

Example

Beyond that basic level, the operations of the three products vary widely (1514588)

Beyond that level, the operations of the three products vary widely (1430374)

Beyond that level, the operations of the three products vary (1249223)

Beyond that basic level, the operations of the products vary (1181377)

The operations of the three products vary widely (939912)

The operations of the products vary widely (872066)

The operations of the products vary (748761)

The operations of products vary (809158)

The operations vary (522402)

Operations vary (662642)

Current Approaches

Example

Finally, another advantage of broadband is distance.

Finally another advantage of broadband is distance.

Another advantage of broadband is distance.

Advantage of broadband is distance.

Another advantage is distance.

Advantage is distance.

Current Approaches

Example

The documentation is typical of Epson quality; excellent.

Documentation is excellent.

Current Approaches

Example

All of our design goals were achieved and the delivered performance matches the speed of the underlying device.

All design goals were achieved.

Current Approaches

Example

Reach's E-mail product, MailMan, is a message-management system designed initially for VINES LANs that will eventually be operating system-independent.

MailMan will eventually be system-independent.

Current Approaches

Example

Although the modules themselves may be physically and/or electronically incompatible, the cable-specific jacks on them provide industry-standard connections.

Cable-specific jacks provide industry-standard connections.

Current Approaches

Example

Ingres/Star prices start at $2,100.

Ingres/Star prices start at $2,100.

Current Approaches

Example

Original: Beyond the basic level, the operations of the three products vary widely.

Baseline: Beyond the basic level, the operations of the three products vary widely.

Noisy-Channel: The operations of the three products vary widely.

Decision-based: The operations of the three products vary widely.

Humans: The operations of the three products vary widely.

Current Approaches

Example

Original: Arborscan is reliable and worked accurately in testing, but it produces very large dxf files.

Baseline: Arborscan and worked in, but very large dxf.

Noisy-Channel: Arborscan is reliable and worked accurately in testing, but it produces very large dxf files.

Decision-based: Arborscan is reliable and worked accurately in testing very large dxf files.

Humans: Arborscan produces very large dxf files.

Current Approaches

Example

Original: Many debugging features, including user-defined break points and variable-watching and message-watching windows, have been added.

Baseline: Debugging, user-defined and variable-watching and message-watching, have been.

Noisy-Channel: Many debugging features, including user-defined points and variable-watching and message-watching windows, have been added.

Decision-based: Many debugging features.

Humans: Many debugging features have been added.

Future Work

Noisy Channel

Knowledge-Based - CYC

Other

Summary

Text summarization comprises several different methods and subtasks and, as with most recent developments in computational linguistics, more work is needed before automatic processes match human expectations.