UNC-CH at DUC2007: Query Expansion, Lexical Simplification, and Sentence Selection Strategies for Multi-Document Summarization
Catherine Blake, Julia Kampov, Andreas Orphanides, David West, Cory Lown
Goals in 2007
Get a system up and running
Components
• Query Expansion
  • WordNet
• Lexical Compression
  • Linguistically motivated pruning
• Sentence Selection
  • Clustering
System Architecture
[Figure: system architecture flowchart. Box labels: Topic; Topic articles; Query expansion; TF; IDF; Novel terms; Clustering; Simplify sentences (noun appositives, attributions, participial modifiers); Add original sentences; Add sub-sentences; Select the best sentence; Add best sentence to final; decision "Summary sentences > 250 words" (YES / NO); Simplify sentences (lead adverbial); Order final sentences. Key: Experiment Setting, Processing Path.]
Query Expansion - Approach
Goal: Increase responsiveness
Approach
• A – Weak Baseline
  • any term in topic or query
• B – Baseline
  • remove stop words, including a small set of tailored terms
• C – Weak WordNet
  • WordNet synsets from terms in B
• D – WordNet
  • Synsets from C + synonyms
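The four settings can be sketched as progressively larger term sets. This is a minimal illustration, not the system's code: `STOP_WORDS` and `TOY_SYNONYMS` are hypothetical stand-ins (the slides use WordNet synsets, which are not reproduced here), and a single lookup function stands in for both C and D.

```python
# Hypothetical stop-word list, including a few "tailored" task words.
STOP_WORDS = {"the", "of", "in", "and", "a", "to", "describe", "discuss"}

# Hypothetical toy thesaurus standing in for WordNet synsets and synonyms.
TOY_SYNONYMS = {
    "reform": {"reform", "improvement"},
    "freedom": {"freedom", "liberty"},
}

PUNCTUATION = ".,;:?!\"'()"


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    tokens = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [token for token in tokens if token]


def setting_a(topic, query):
    """A (weak baseline): any term in the topic or the query."""
    return set(tokenize(topic)) | set(tokenize(query))


def setting_b(topic, query):
    """B (baseline): setting A with stop words removed."""
    return {term for term in setting_a(topic, query) if term not in STOP_WORDS}


def setting_expanded(topic, query, thesaurus=TOY_SYNONYMS):
    """Stand-in for C/D: add related words for every term in B."""
    terms = setting_b(topic, query)
    expanded = set(terms)
    for term in terms:
        expanded |= thesaurus.get(term, set())
    return expanded
```

Swapping the toy thesaurus for a real WordNet lookup would change only `setting_expanded`; the A and B baselines stay the same.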
Query Expansion - Evaluation
Query selection
• Rank 2006 queries by overall responsiveness
Relevance
• 3 annotators identified sentences with “information pertinent to the topic” for 9 topics
• For evaluation, a sentence was identified when a term from A, B, C, or D appeared in a gold standard sentence
Inter-rater reliability
• Topics 6 and 34 had fair to moderate agreement
• Annotators reached consensus for topics 6 and 34
• Annotators then reworked other topics
Annotators didn’t know how the system would summarize text, but knew that the task was going to be automated
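The term-matching rule above can be sketched as follows. The slides only state that a sentence counts as identified when an expansion term appears in a gold standard sentence; reporting the result as a coverage fraction, and the function names, are assumptions made for illustration.

```python
PUNCTUATION = ".,;:?!\"'()"


def sentence_hit(sentence, terms):
    """True if any expansion term appears as a word in the sentence."""
    words = {word.strip(PUNCTUATION).lower() for word in sentence.split()}
    return not words.isdisjoint(terms)


def gold_coverage(gold_sentences, terms):
    """Fraction of gold standard sentences matched by at least one term.

    The fraction is an assumed summary statistic, not one named in the slides.
    """
    if not gold_sentences:
        return 0.0
    hits = sum(1 for sentence in gold_sentences if sentence_hit(sentence, terms))
    return hits / len(gold_sentences)
```

Running `gold_coverage` once per setting (A, B, C, D) on the same gold sentences would give a rough way to compare how much each term set retrieves.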
Query Selection
[Figure: overall responsiveness (y-axis, 40 to 110) plotted against rank (x-axis, 0 to 50) for the 2006 queries; topics 6 and 34 are labeled.]
Lexical Simplification
[Figure: system architecture flowchart, repeated from the System Architecture slide.]
Decision: No WordNet Query Expansion
Lexical Simplification
Goal
• Increase linguistic quality
Approach
• Representation
  • Typed dependency tree (de Marneffe et al., 2006)
  • Stanford Parser Version 1.5 (Klein & Manning, 2002; 2003)
• Identify short, stand-alone sentences
• Prune both original and short sentences using
  • Parser tags
  • Cue phrases identified in previous DUC submissions
Short Stand-Alone Sentences
But it went on to say that economic reform has not brought political freedom and that Chinese who try to dissent “live in an environment filled with repression.”
[Figure: typed dependency parse of the sentence above, with arcs labeled nsubj, dobj, ccomp, conj, cc, aux, advmod, amod, det, partmod, and dep; the sub-sentences are marked.]
Pruning
Noun Appositive
• For nearly a decade, Queen Latifah, the first lady of hip-hop, has been bobbing and weaving questions about …
Participial Modifier
• Indeed, some people reading this report could get the impression that Amnesty believes violence can be a legitimate instrument, the statement said
Pruning
Lead Adverbials
• 15 cue phrases from previous DUCs
Attribution
• Parser tags
• Cue phrases: said, according
Example: Separately, the report said that the murder rate by Indians in 1996 was 4 per 100,000, below the national average …
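A rough, string-based sketch of lead-adverbial and attribution pruning is shown below. The system described in the slides works from parser tags plus cue phrases; this example uses regular expressions only, so it is an approximation. `LEAD_ADVERBIALS` is a hypothetical two-item subset (the slides mention 15 cue phrases but do not list them), and only the "said" cue is handled.

```python
import re

# Hypothetical subset of lead-adverbial cue phrases.
LEAD_ADVERBIALS = ("separately", "indeed")

# Trailing attribution such as ", the statement said" at the end of a sentence.
TRAILING_SAID = re.compile(r",\s+[^,]*\bsaid\.?\s*$")

# Leading attribution such as "The report said that ".
LEADING_SAID = re.compile(r"^[A-Z][\w ]{0,40}?\bsaid that\s+")


def _capitalize_first(text):
    """Uppercase the first character and leave the rest unchanged."""
    return text[:1].upper() + text[1:]


def prune_lead_adverbial(sentence):
    """Drop a leading cue phrase followed by a comma, e.g. 'Separately,'."""
    lowered = sentence.lower()
    for cue in LEAD_ADVERBIALS:
        prefix = cue + ","
        if lowered.startswith(prefix):
            return _capitalize_first(sentence[len(prefix):].lstrip())
    return sentence


def prune_attribution(sentence):
    """Drop a trailing '..., X said' clause and a leading 'X said that'."""
    pruned = TRAILING_SAID.sub("", sentence)
    if pruned != sentence:
        pruned += "."  # restore the final period removed with the clause
    match = LEADING_SAID.match(pruned)
    if match:
        pruned = _capitalize_first(pruned[match.end():])
    return pruned
```

Applying `prune_lead_adverbial` and then `prune_attribution` to the example above would leave a sentence that starts at "The murder rate". A parser-based version would be less brittle than these patterns, which miss attributions phrased in other ways.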
Sentence Selection - Settings
No WordNet query expansion
• original + base form
Percentage of Topic/Query Terms
• (num stemmed terms in query) / (num stemmed terms in sentence)
Percentage of Unique Terms
• (num stemmed terms in new sentence that are not in selected sentences) / (num stemmed terms in sentence)
Weighted Term Frequency * IDF
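The two percentage features can be sketched as simple ratios over stemmed tokens. This assumes stemming has already been done, and it reads the first numerator as "sentence stems that also occur in the topic/query", which is an interpretation of the slide rather than something it states; the function names are illustrative.

```python
def pct_topic_terms(sentence_stems, query_stems):
    """Share of the sentence's stems that also occur in the topic/query."""
    if not sentence_stems:
        return 0.0
    matches = sum(1 for stem in sentence_stems if stem in query_stems)
    return matches / len(sentence_stems)


def pct_unique_terms(sentence_stems, selected_stems):
    """Share of the sentence's stems not yet present in selected sentences."""
    if not sentence_stems:
        return 0.0
    new = sum(1 for stem in sentence_stems if stem not in selected_stems)
    return new / len(sentence_stems)
```

Both functions take the sentence as a list of stems and the comparison set as a `set`, so repeated stems in the sentence are counted once per occurrence.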
Sentence Selection - Settings
Weighted Term Frequency (tottf)
Feature                       Weight
Stopword or punctuation       0
Topic/Query ∧ ¬Summary        1
Topic/Query ∧ Summary         0.5
¬Topic/Query ∧ ¬Summary       0.01
¬Topic/Query ∧ Summary        0.001
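The weight table can be turned into a small scoring function. The sketch below assumes that tottf is the sum of per-term weights over the tokens of one sentence, which the slide does not spell out; `term_weight` and `tottf` are illustrative names.

```python
# Weights from the table, keyed by (in topic/query, already in summary).
WEIGHTS = {
    (True, False): 1.0,
    (True, True): 0.5,
    (False, False): 0.01,
    (False, True): 0.001,
}


def term_weight(term, query_terms, summary_terms, stop_words):
    """Weight of one term; stop words and pure punctuation score 0."""
    if term in stop_words or not any(ch.isalnum() for ch in term):
        return 0.0
    return WEIGHTS[(term in query_terms, term in summary_terms)]


def tottf(sentence_terms, query_terms, summary_terms, stop_words):
    """Assumed definition: sum of term weights over the sentence's tokens."""
    return sum(
        term_weight(term, query_terms, summary_terms, stop_words)
        for term in sentence_terms
    )
```

Because terms already in the summary receive a lower weight, a sentence's score drops as the summary grows, which discourages repeating content that has already been selected.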
Sentence Selection
Clustering
• Oracle clustering tool
• K-means
• 1000 iterations
• removed determiners, prepositions, etc.
Favor Sentences from
• Different clusters
• Popular clusters, i.e. those with lots of sentences
• How representative the sentence is of the cluster
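One way to favor different and popular clusters is a greedy pass over sentences ranked by score times a cluster weight. The slides do not define the cluster weight (CW), so the sketch below assumes it is the cluster's share of all sentences; cluster assignments are taken as given, representativeness is not modeled, and the 250-word budget is applied here as a hard cap.

```python
from collections import Counter


def cluster_weights(cluster_ids):
    """Hypothetical CW: share of all sentences that fall in each cluster."""
    counts = Counter(cluster_ids)
    total = len(cluster_ids)
    return {cluster: count / total for cluster, count in counts.items()}


def select_sentences(sentences, cluster_ids, base_scores, max_words=250):
    """Greedy selection: one sentence per cluster first, then fill the budget."""
    weights = cluster_weights(cluster_ids)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: base_scores[i] * weights[cluster_ids[i]],
        reverse=True,
    )
    chosen, used_clusters, words = [], set(), 0
    for allow_repeat in (False, True):
        for i in ranked:
            if i in chosen:
                continue
            if not allow_repeat and cluster_ids[i] in used_clusters:
                continue
            length = len(sentences[i].split())
            if words + length > max_words:
                continue  # sentence does not fit in the remaining budget
            chosen.append(i)
            used_clusters.add(cluster_ids[i])
            words += length
    return [sentences[i] for i in chosen]
```

The first pass spreads the summary across clusters, and the second pass uses any remaining words on the best leftover sentences regardless of cluster.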
Sentence Selection – Evaluation
ROUGE-1 Score
ID  Description               DUC06   DUC07
I   tottf/numWdSent * CW      0.3981  0.4212
F   %WdTopic * CW             0.3979  0.4171
E   tottf * CW                0.3977  0.4183
B   CW                        0.3947  0.4169
D   tfidf                     0.3912  0.4086
G   tottf                     0.3904  0.4109
H   tottf/numWdSent           0.3754  0.3913
A   %WdTopic + %WdNew + CW    0.3749  0.3963
C   %WdTopic + %WdNew         0.3623  0.3786
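For readers unfamiliar with the metric, ROUGE-1 measures unigram overlap between a system summary and a reference summary. The sketch below is a simplified recall-only version against a single reference, with whitespace tokenization and no stemming; it is not the official ROUGE toolkit used for the scores above and will not reproduce them.

```python
from collections import Counter


def rouge1_recall(candidate, reference):
    """Clipped unigram overlap divided by the number of reference unigrams."""
    candidate_counts = Counter(candidate.lower().split())
    reference_counts = Counter(reference.lower().split())
    if not reference_counts:
        return 0.0
    overlap = sum(
        min(count, candidate_counts[word])
        for word, count in reference_counts.items()
    )
    return overlap / sum(reference_counts.values())
```

Clipping with `min` means a word repeated in the candidate only earns credit up to the number of times it appears in the reference.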
Official DUC 2007 Evaluation
UNC-CH = System 22
Automatic Evaluation
• ROUGE-2 score 0.10329 (13th)
Manual Evaluation
• Responsiveness = 2.956 (7th)
• Linguistic Quality = 2.987 (24th)
What we have learned so far
Sentence selection
• Optimal strategy: weighted term frequency / sentence length * cluster weight
• Clustering really helps
Lexical simplification
• Rework sub-sentences
• Pronoun resolution
Query expansion had negligible effect
Next Steps
Alternative Query Expansion
• Error analysis of medical questions underway
• Concept representation
  • Unified Medical Language System (UMLS)
Tune sentence selection strategy
Lexical simplification
• Rework sub-sentences
• Add basic pronoun resolution
Sentence Re-Ordering
• Combine with lexical simplification
Acknowledgements
The organizers for running this conference and providing manual summaries
Previous DUC paper authors for making their system designs explicit
Monica Sanchez and Stephanie Haas for earlier discussions
Thom Hailey, Scott Krauss and Toshiba Burns-Johnson for annotating queries