UNC-CH at DUC2007: Query Expansion, Lexical Simplification, and Sentence Selection Strategies for Multi-Document Summarization
Catherine Blake, Julia Kampov, Andreas Orphanides, David West, Cory Lown


Goals in 2007

Get a system up and running

Components

• Query Expansion: WordNet

• Lexical Compression: linguistically motivated pruning

• Sentence Selection: clustering

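System Architecture

[Figure: System architecture flowchart (key: experiment setting, processing path). Topic articles are split into sentences; sentences are simplified (noun appositives, attributions, participial modifiers), and both the original sentences and the sub-sentences are added to the candidate pool, which is clustered. The best sentence is selected (experiment settings: query expansion, TF, IDF, novel terms) and added to the final summary; if the summary is not yet over 250 words, selection repeats. Once it is, sentences are simplified (lead adverbials) and the final sentences are ordered to produce the summary.]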
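The processing path in the figure is an iterative, greedy loop: score the remaining candidates, add the best one, and stop once the summary passes the word limit. A minimal Python sketch, assuming the candidate pool already holds the original sentences plus sub-sentences and that the caller supplies the scoring function; `greedy_summary`, `word_count`, and the `novel_terms` score are hypothetical names for illustration, and the novel-terms score stands in for only one of the experiment settings.

```python
def word_count(sentences):
    """Number of whitespace-separated words in a list of sentences."""
    return sum(len(s.split()) for s in sentences)


def greedy_summary(candidates, score, limit=250):
    """Add the best-scoring candidate until the summary exceeds `limit` words.

    `score(sentence, summary)` is supplied by the caller, so different
    experiment settings can be plugged in without changing the loop.
    """
    remaining = list(candidates)
    summary = []
    while remaining and word_count(summary) <= limit:
        best = max(remaining, key=lambda s: score(s, summary))
        summary.append(best)
        remaining.remove(best)
    return summary


def novel_terms(sentence, summary):
    """Hypothetical score: word types in `sentence` not yet in the summary."""
    seen = {w.lower() for s in summary for w in s.split()}
    return len({w.lower() for w in sentence.split()} - seen)


# Usage with a 5-word limit
print(greedy_summary(["a b c", "a b", "d e f g"], novel_terms, limit=5))
# ['d e f g', 'a b c']
```

Passing the score as a function keeps the loop fixed while the selection strategy varies, which mirrors how the settings are compared later in the deck.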

Query Expansion - Approach

Goal: increase responsiveness

Approach

• A – Weak baseline: any term in the topic or query

• B – Baseline: A with stop words removed, including a small set of tailored terms

• C – Weak WordNet: WordNet synsets of the terms in B

• D – WordNet: synsets from C plus synonyms
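The four settings form a chain in which each one builds on the previous. A minimal Python sketch of that chain, assuming tiny hard-coded tables in place of a real stop list and WordNet; `STOP_WORDS`, `TAILORED_TERMS`, `SYNSETS`, and `SYNONYMS` are hypothetical stand-ins, and reading D as "C plus further synonyms of every C term" is an interpretation of the slide, not a confirmed detail.

```python
import re

# Hypothetical stand-ins so the sketch runs without WordNet.
STOP_WORDS = {"the", "a", "of", "and", "in", "to", "is", "what", "are"}
TAILORED_TERMS = {"describe", "discuss", "identify"}  # hypothetical tailored terms
SYNSETS = {"car": {"auto", "automobile"}, "crash": {"collision"}}  # toy synsets
SYNONYMS = {"collision": {"smash"}, "auto": {"motorcar"}}  # toy extra synonyms


def tokens(text):
    return re.findall(r"[a-z]+", text.lower())


def setting_a(topic, query):
    # A - weak baseline: any term in the topic or query
    return set(tokens(topic)) | set(tokens(query))


def setting_b(topic, query):
    # B - baseline: A without stop words and tailored terms
    return setting_a(topic, query) - STOP_WORDS - TAILORED_TERMS


def setting_c(topic, query):
    # C - weak WordNet: B plus the synset members of each B term
    terms = setting_b(topic, query)
    expanded = set(terms)
    for t in terms:
        expanded |= SYNSETS.get(t, set())
    return expanded


def setting_d(topic, query):
    # D - WordNet (as interpreted here): C plus synonyms of every C term
    terms = setting_c(topic, query)
    expanded = set(terms)
    for t in terms:
        expanded |= SYNONYMS.get(t, set())
    return expanded


print(sorted(setting_d("Car crash", "What are the causes of a car crash?")))
# ['auto', 'automobile', 'car', 'causes', 'collision', 'crash', 'motorcar', 'smash']
```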

Query Expansion - Evaluation

Query selection

• Rank the 2006 queries by overall responsiveness

Relevance

• 3 annotators identified sentences with “information pertinent to the topic” for 9 topics

• For evaluation, a gold-standard sentence counted as identified when it contained a term from setting A, B, C, or D
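The matching rule above can be scored per setting. A minimal Python sketch, assuming simple lowercase word matching; the rule itself only defines which gold sentences are found (recall), so the precision figure over all matched sentences is an added assumption, and `identified` and `evaluate` are hypothetical names.

```python
import re


def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def identified(sentence, terms):
    # A sentence counts as identified if it shares at least one term with the setting
    return bool(tokens(sentence) & terms)


def evaluate(sentences, gold_ids, terms):
    """Return (precision, recall) of term matching for one topic.

    sentences: sentence strings for the topic
    gold_ids:  indices of sentences the annotators marked as pertinent
    terms:     term set produced by one setting (A, B, C, or D)
    """
    hits = {i for i, s in enumerate(sentences) if identified(s, terms)}
    true_pos = len(hits & gold_ids)
    precision = true_pos / len(hits) if hits else 0.0
    recall = true_pos / len(gold_ids) if gold_ids else 0.0
    return precision, recall


sentences = ["The car crash killed two.", "Rain fell all day.",
             "A collision closed the road.", "Stocks rose."]
print(evaluate(sentences, {0, 2}, {"car", "crash"}))               # (1.0, 0.5)
print(evaluate(sentences, {0, 2}, {"car", "crash", "collision"}))  # (1.0, 1.0)
```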

Inter-rater reliability

• Topics 6 and 34 had fair to moderate agreement

• Annotators reached consensus for topics 6 and 34

• Annotators then reworked the other topics

Annotators didn’t know how the system would summarize text, but knew that the task was going to be automated
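"Fair" and "moderate" are the labels of the Landis and Koch scale commonly used with kappa statistics (0.21 to 0.40 fair, 0.41 to 0.60 moderate). The slides do not say which statistic was computed; assuming Fleiss' kappa, a standard choice for three annotators giving binary pertinent/not-pertinent labels, a minimal Python sketch is below. `fleiss_kappa` is a hypothetical helper name.

```python
def fleiss_kappa(ratings, categories=(0, 1)):
    """Fleiss' kappa for a fixed number of annotators.

    ratings: one row per sentence, one label per annotator
    (here 1 = pertinent to the topic, 0 = not pertinent).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    # counts[i][j]: number of annotators who gave sentence i label categories[j]
    counts = [[row.count(c) for c in categories] for row in ratings]
    # Observed agreement per sentence, then averaged over sentences
    p_items = [
        (sum(x * x for x in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the overall label proportions
    p_cats = [sum(row[j] for row in counts) / (n_items * n_raters)
              for j in range(len(categories))]
    p_e = sum(p * p for p in p_cats)
    if p_e == 1:
        return 1.0  # every label identical: agreement is trivially perfect
    return (p_bar - p_e) / (1 - p_e)


# Three annotators, four sentences
print(fleiss_kappa([[1, 1, 0], [0, 0, 0], [1, 1, 1], [0, 1, 0]]))  # about 0.333, "fair"
```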

Query Selection

[Figure: 2006 queries ranked by overall responsiveness (x-axis: rank; y-axis: overall responsiveness); topics 34 and 6 are labeled.]

Query Expansion – Evaluation


Decision: No WordNet Query Expansion

Lexical Simplification

Goal

• Increase linguistic quality

Approach

• Representation: typed dependency tree (de Marneffe et al., 2006) from Stanford Parser version 1.5 (Klein & Manning, 2002; 2003)

• Identify short, stand-alone sentences

• Prune both original and short sentences using

  • parser tags

  • cue phrases identified in previous DUC submissions

Short Stand-Alone Sentences

But it went on to say that economic reform has not brought political freedom and that Chinese who try to dissent “live in an environment filled with repression.”

[Figure: typed dependency parse of the example sentence, with the sub-sentences extracted from it marked; arc labels include nsubj, dobj, ccomp, conj, cc, aux, advmod, amod, det, partmod, and dep.]
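One way to read sub-sentences off a typed dependency tree is to take each clause attached by a clausal relation that has its own subject and keep its subtree. A minimal Python sketch of that idea, assuming a hand-built parse of a toy sentence rather than real Stanford Parser output; the relation sets `CLAUSE_RELS` and `SKIP_RELS` and the function `sub_sentences` are hypothetical choices, not the rules the system used.

```python
from collections import defaultdict

# Hypothetical choice: clause-level relations whose dependents may stand alone
CLAUSE_RELS = {"ccomp", "conj"}
# Branches left out of a sub-sentence: coordinated clauses, conjunctions, complementizers
SKIP_RELS = {"conj", "cc", "complm", "mark"}


def sub_sentences(tokens, arcs):
    """tokens: words in order; arcs: (head, relation, dependent), 1-based, 0 = root."""
    children = defaultdict(list)
    for head, rel, dep in arcs:
        children[head].append((rel, dep))

    def collect(node):
        # Gather the clause head and its dependents, minus skipped branches
        nodes = {node}
        for rel, dep in children[node]:
            if rel not in SKIP_RELS:
                nodes |= collect(dep)
        return nodes

    results = []
    for _head, rel, dep in arcs:
        has_subject = any(r == "nsubj" for r, _ in children[dep])
        if rel in CLAUSE_RELS and has_subject:
            span = sorted(collect(dep))
            results.append(" ".join(tokens[i - 1] for i in span))
    return results


# Toy parse of "Officials said the plan failed and critics were angry"
tokens = ["Officials", "said", "the", "plan", "failed", "and", "critics", "were", "angry"]
arcs = [(2, "nsubj", 1), (0, "root", 2), (4, "det", 3), (5, "nsubj", 4),
        (2, "ccomp", 5), (5, "cc", 6), (9, "nsubj", 7), (9, "cop", 8), (5, "conj", 9)]
print(sub_sentences(tokens, arcs))  # ['the plan failed', 'critics were angry']
```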

Pruning

• Noun appositive: For nearly a decade, Queen Latifah, the first lady of hip-hop, has been bobbing and weaving questions about …

• Participial modifier: Indeed, some people reading this report could get the impression that Amnesty believes violence can be a legitimate instrument, the statement said
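Parser-tag pruning of the two constructions above amounts to deleting the subtree hanging off an `appos` or `partmod` arc. A minimal Python sketch, assuming hand-built, simplified parses of the two examples (commas left unattached, as older Stanford dependency output often omits punctuation) and a hypothetical comma-cleanup rule; `prune` and `PRUNE_RELS` are illustrative names.

```python
from collections import defaultdict

# Stanford dependency labels for the two pruned constructions
PRUNE_RELS = {"appos", "partmod"}


def prune(tokens, arcs):
    """Drop appositive and participial-modifier subtrees from a sentence.

    tokens: words and commas in order (token i is tokens[i - 1])
    arcs:   (head, relation, dependent) triples, 1-based, 0 = root
    """
    children = defaultdict(list)
    for head, _rel, dep in arcs:
        children[head].append(dep)

    def subtree(node):
        nodes = {node}
        for dep in children[node]:
            nodes |= subtree(dep)
        return nodes

    removed = set()
    for _head, rel, dep in arcs:
        if rel in PRUNE_RELS:
            span = subtree(dep)
            removed |= span
            # Hypothetical cleanup: also drop commas that bracket the pruned phrase
            for edge in (min(span) - 1, max(span) + 1):
                if 1 <= edge <= len(tokens) and tokens[edge - 1] == ",":
                    removed.add(edge)
    return " ".join(t for i, t in enumerate(tokens, 1) if i not in removed)


tokens = ["Queen", "Latifah", ",", "the", "first", "lady", "of", "hip-hop", ",",
          "has", "been", "bobbing"]
arcs = [(2, "nn", 1), (12, "nsubj", 2), (6, "det", 4), (6, "amod", 5), (2, "appos", 6),
        (6, "prep", 7), (7, "pobj", 8), (12, "aux", 10), (12, "aux", 11), (0, "root", 12)]
print(prune(tokens, arcs))  # Queen Latifah has been bobbing
```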

Pruning

Lead adverbials

• 15 cue phrases from previous DUCs

Attribution

• Parser tags

• Cue phrases: said, according

Example: Separately, the report said that the murder rate by Indians in 1996 was 4 per 100000, below the national average …
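The cue-phrase side of this pruning can be approximated with regular expressions. A minimal Python sketch, assuming a hypothetical list of lead adverbials (the 15 phrases are not listed on the slide; "Separately" and "Indeed" come from the example sentences) and hypothetical patterns for "said"/"according" attributions. It covers only the cue-phrase half; the parser-tag half is not modeled here.

```python
import re

# Hypothetical subset of lead adverbials; the slide's 15 phrases are not listed.
LEAD_ADVERBIALS = ["Separately", "Indeed", "Meanwhile", "However", "In addition"]
LEAD_RE = re.compile(r"^(?:%s),\s*" % "|".join(map(re.escape, LEAD_ADVERBIALS)))
# "<the> <noun> <noun>? said <that>?" at the start of a sentence
LEADING_SAID_RE = re.compile(r"^(?:the\s+)?\w+(?:\s+\w+)?\s+said\s+(?:that\s+)?", re.IGNORECASE)
# "according to <source>," at the start of a sentence
LEADING_ACCORDING_RE = re.compile(r"^according to [^,]+,\s*", re.IGNORECASE)
# ", <the> <noun> <noun>? said." at the end of a sentence
TRAILING_SAID_RE = re.compile(r",\s+(?:the\s+)?\w+(?:\s+\w+)?\s+said\s*([.!?]?)$", re.IGNORECASE)


def prune_cues(sentence):
    """Strip a lead adverbial, then simple attribution frames, then re-capitalize."""
    s = LEAD_RE.sub("", sentence)
    s = LEADING_SAID_RE.sub("", s)
    s = LEADING_ACCORDING_RE.sub("", s)
    s = TRAILING_SAID_RE.sub(r"\1", s)  # keep the final punctuation mark
    return s[:1].upper() + s[1:] if s else s


print(prune_cues("Separately, the report said that the murder rate by Indians "
                 "in 1996 was 4 per 100000, below the national average."))
# The murder rate by Indians in 1996 was 4 per 100000, below the national average.
```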


Sentence Selection - Settings

No WordNet query expansion

• Terms in their original and base forms

Percentage of topic/query terms

• (number of stemmed sentence terms that appear in the topic/query) / (number of stemmed terms in the sentence)

Percentage of unique terms

• (number of stemmed terms in the new sentence that are not in the already-selected sentences) / (number of stemmed terms in the sentence)

Weighted term frequency * IDF
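The two percentage features are simple ratios over stemmed terms. A minimal Python sketch, assuming a crude suffix-stripping `stem` function as a stand-in for a real stemmer (such as Porter's) and counting term occurrences rather than distinct terms; `pct_topic_terms` and `pct_unique_terms` are hypothetical names.

```python
import re


def stem(word):
    # Crude stand-in for a real stemmer: strips a few common suffixes only
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stems(text):
    return [stem(w) for w in re.findall(r"[a-z]+", text.lower())]


def pct_topic_terms(sentence, query):
    """Share of the sentence's stemmed terms that also occur in the topic/query."""
    sent = stems(sentence)
    query_terms = set(stems(query))
    return sum(t in query_terms for t in sent) / len(sent) if sent else 0.0


def pct_unique_terms(sentence, selected):
    """Share of the sentence's stemmed terms not found in already-selected sentences."""
    sent = stems(sentence)
    seen = {t for s in selected for t in stems(s)}
    return sum(t not in seen for t in sent) / len(sent) if sent else 0.0


print(pct_topic_terms("Cars crashed on the bridge", "car crash"))              # 0.4
print(pct_unique_terms("Cars crashed on the bridge", ["A car hit the wall"]))  # 0.6
```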

Sentence Selection - Settings

Weighted Term Frequency (tottf)

Feature                       Weight
Stopword or punctuation       0
Topic/Query ∧ ¬Summary        1
Topic/Query ∧ Summary         0.5
¬Topic/Query ∧ ¬Summary       0.01
¬Topic/Query ∧ Summary        0.001
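The weight table translates directly into a per-token lookup, and tottf is the sum over a sentence's tokens. A minimal Python sketch, assuming "Summary" means the term already appears in the summary built so far and using a hypothetical tiny stop list; `term_weight` and `tottf` are illustrative names.

```python
import re

STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "was"}  # hypothetical


def term_weight(term, topic_terms, summary_terms):
    """Weight of one token, following the tottf table."""
    if term in STOP_WORDS or not term.isalnum():
        return 0.0  # stopword or punctuation
    in_topic = term in topic_terms
    in_summary = term in summary_terms  # assumed: term already in the summary so far
    if in_topic:
        return 0.5 if in_summary else 1.0
    return 0.001 if in_summary else 0.01


def tottf(sentence, topic_terms, summary_terms):
    # Words and punctuation marks become separate tokens
    tokens = re.findall(r"\w+|[^\w\s]", sentence.lower())
    return sum(term_weight(t, topic_terms, summary_terms) for t in tokens)


score = tottf("The bridge collapse followed the storm, officials said.",
              topic_terms={"bridge", "collapse"}, summary_terms={"bridge", "storm"})
print(round(score, 3))  # 1.531
```

Dividing this score by the number of words in the sentence gives the tottf/numWdSent variant compared in the evaluation.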

Sentence Selection

Clustering

• Oracle clustering tool

• K-means, 1000 iterations

• Determiners, prepositions, etc. removed

Favor sentences

• from different clusters

• from popular clusters, i.e., clusters with many sentences

• that are representative of their cluster
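The Oracle clustering tool is not reproduced here. As a swapped-in illustration, a plain k-means over bag-of-words vectors, plus a hypothetical cluster weight that combines the two preferences above: cluster size relative to all sentences (popularity) times cosine similarity to the centroid (representativeness). The deterministic seeding, the stop list, and the `cluster_weights` formula are assumptions, not the system's actual settings.

```python
import math
import re
from collections import Counter

STOP = {"the", "a", "an", "of", "in", "on", "to", "and", "for", "with"}  # hypothetical


def vectorize(sentence):
    return Counter(w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP)


def dist2(vec, centroid):
    keys = set(vec) | set(centroid)
    return sum((vec.get(k, 0) - centroid.get(k, 0)) ** 2 for k in keys)


def cosine(u, v):
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def kmeans(vectors, k, max_iter=1000):
    centroids = [dict(v) for v in vectors[:k]]  # deterministic seeding for the sketch
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: dist2(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                total = Counter()
                for m in members:
                    total.update(m)
                centroids[c] = {t: n / len(members) for t, n in total.items()}
    return labels, centroids


def cluster_weights(vectors, labels, centroids):
    """Hypothetical CW: cluster popularity times similarity to the cluster centroid."""
    sizes = Counter(labels)
    return [sizes[lab] / len(vectors) * cosine(v, centroids[lab])
            for v, lab in zip(vectors, labels)]


sentences = ["bridge collapse kills commuters", "storm floods coastal towns",
             "bridge collapse traps commuters", "storm floods towns"]
vectors = [vectorize(s) for s in sentences]
labels, centroids = kmeans(vectors, k=2)
print(labels)  # [0, 1, 0, 1]
```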

Sentence Selection – Evaluation

ROUGE-1 scores by sentence selection strategy
(tottf = weighted term frequency; numWdSent = words in the sentence; CW = cluster weight; %WdTopic / %WdNew = percentage of topic/query terms / unique terms)

ID  Description               DUC06   DUC07
I   tottf/numWdSent * CW      0.3981  0.4212
F   %WdTopic * CW             0.3979  0.4171
E   tottf * CW                0.3977  0.4183
B   CW                        0.3947  0.4169
D   Tfidf                     0.3912  0.4086
G   tottf                     0.3904  0.4109
H   tottf/numWdSent           0.3754  0.3913
A   %WdTopic + %WdNew + CW    0.3749  0.3963
C   %WdTopic + %WdNew         0.3623  0.3786


Official DUC 2007 Evaluation

UNC-CH = System 22

Automatic evaluation

• ROUGE-2 score = 0.10329 (13th)

Manual evaluation

• Responsiveness = 2.956 (7th)

• Linguistic quality = 2.987 (24th)
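ROUGE-1 and ROUGE-2 are n-gram recall measures against the reference (manual) summaries: matched n-grams, clipped by count, divided by the total n-grams in the references. A minimal Python sketch of ROUGE-N recall under simple lowercase tokenization; the official scores come from the ROUGE toolkit with its own options (stemming, jackknifing, and so on), so this sketch will not reproduce them, and `rouge_n` is a hypothetical name.

```python
import re
from collections import Counter


def ngrams(text, n):
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def rouge_n(candidate, references, n):
    """ROUGE-N recall: clipped n-gram matches over all reference n-grams."""
    cand = ngrams(candidate, n)
    overlap = total = 0
    for ref in references:
        ref_counts = ngrams(ref, n)
        overlap += sum(min(c, cand[g]) for g, c in ref_counts.items())
        total += sum(ref_counts.values())
    return overlap / total if total else 0.0


refs = ["the cat lay on the mat"]
print(rouge_n("the cat sat on the mat", refs, 1))  # 5 of 6 reference unigrams
print(rouge_n("the cat sat on the mat", refs, 2))  # 3 of 5 reference bigrams
```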

What we have learned so far

Sentence selection

• Best-performing strategy: weighted term frequency / sentence length * cluster weight

• Clustering really helps

Lexical simplification

• Rework sub-sentences

• Pronoun resolution

Query expansion had negligible effect

Next Steps

Alternative query expansion

• Error analysis of medical questions underway

• Concept representation: Unified Medical Language System (UMLS)

Tune sentence selection strategy

Lexical simplification

• Rework sub-sentences

• Add basic pronoun resolution

Sentence re-ordering

• Combine with lexical simplification

Acknowledgements

The organizers for running this conference and providing manual summaries

Previous DUC paper authors for making their system designs explicit

Monica Sanchez and Stephanie Haas for earlier discussions

Thom Hailey, Scott Krauss and Toshiba Burns-Johnson for annotating queries