INTERESTING NUGGETS AND THEIR IMPACT ON DEFINITIONAL QUESTION ANSWERING
Kian-Wei Kor, Tat-Seng Chua
Department of Computer Science, School of Computing, National University of Singapore
(SIGIR’07)
Speaker : Yi-Ling Tai
Date : 2009/07/28
OUTLINE
- Introduction
  - Definitional Question Answering
  - Topic Nuggets
- Human Interest Model
  - Web Resources
  - Implementation
- Initial Experiments
- Refinements
  - Weighting Interesting Terms
  - Selecting Web Resources
  - Unifying
- Conclusion
DEFINITIONAL QUESTION ANSWERING
- Introduced as a main task in the TREC Question Answering Track in 2003.
- Given a topic X, the questions take the form “What is X?” or “Who is X?”
- A definitional QA system searches through a corpus and returns a set of answers that best describes the topic.
- Each answer should be a unique, topic-specific nugget that makes up one facet in the definition of the topic.
THE TWO ASPECTS OF TOPIC NUGGETS
Informative nuggets
- A sentence fragment that describes some factual information, including topic properties, relationships with other related entities, or events that happened to the topic.
Interesting nuggets
- Have a trivia-like quality; typically, these can pique a human reader’s interest.
- They answer questions such as “What is X famous for?” or “What is extraordinary about X?”
THE TWO ASPECTS OF TOPIC NUGGETS
TREC 2005 topic: “George Foreman”
Informative nuggets
- Was graduate of Job Corps.
- Became oldest world champion in boxing history.
Interesting nuggets
- Has lent his name to line of food preparation products.
- Waved American flag after winning 1968 Olympics championship.
- Returned to boxing after 10 yr hiatus.
HUMAN INTEREST MODEL
- Identifying sentences that a human reader would find interesting is a tall order.
- We assume that most sentences within web documents will contain interesting facets about the topic.
- These articles are written by humans, for human readers, and thus contain the critical human world knowledge.
WEB RESOURCES
We build the “Interest Corpus” by collecting articles from the following external resources:
- Wikipedia: we use a snapshot of Wikipedia taken in March 2006 and include the most relevant article.
- NewsLibrary: for each topic, we download the 50 most relevant articles and include the title and first paragraph.
- Google Snippets: for each topic, we extract the top 100 snippets.
WEB RESOURCES
Some resources are more specific in nature, and we do not always obtain a relevant document from them:
- Biography.com
- Bartleby.com
- s9.com
- Google Definitions
- WordNet
MULTIPLE INTERESTING CENTROIDS
- Relevance-based approaches are focused on identifying highly relevant sentences.
- The use of only a single collection of centroid words over-emphasizes topic relevance.
- Instead, we perform a pairwise sentence comparison between the Interest Corpus and candidate sentences retrieved from the AQUAINT corpus.
- An answer can only be highly ranked if it is strongly similar to a sentence in the Interest Corpus and is also strongly relevant to the topic (see the sketch below).
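Illustrative sketch only (not the paper’s exact formulation): a candidate is scored by its best pairwise match against the Interest Corpus, gated on topic relevance. The similarity argument stands in for the system’s actual weighted-term edit distance.

def score_candidate(sentence, interest_sents, topic_terms, similarity):
    """Best pairwise match against any Interest Corpus sentence,
    kept only if the candidate also mentions the topic."""
    tokens = set(sentence.lower().split())
    if not tokens & {t.lower() for t in topic_terms}:
        return 0.0  # not topic-relevant, so it cannot rank highly
    return max(similarity(sentence, s) for s in interest_sents)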
IMPLEMENTATION
The AQUAINT Retrieval module
- Given a set of words describing the topic, the AQUAINT Retrieval module performs query expansion using Google and searches an index of AQUAINT documents to retrieve the 800 most relevant documents (a simplified expansion sketch follows).
The Web Retrieval module
- Searches the online resources to populate the Interest Corpus.
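As an illustration only (the real module uses Google for expansion terms and a pre-built index of the AQUAINT collection), query expansion can be sketched as appending frequent content words from retrieved web snippets to the topic query:

from collections import Counter

def expand_query(topic, snippets, k=10):
    """Hypothetical expansion: append the k most frequent content words
    seen in web snippets about the topic to the original query terms."""
    stopwords = {"the", "and", "that", "with", "from", "this", "have", "were", "their"}
    words = (w.strip(".,\"'()") for s in snippets for w in s.lower().split())
    counts = Counter(w for w in words if len(w) > 3 and w not in stopwords)
    return topic.lower().split() + [w for w, _ in counts.most_common(k)]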
IMPLEMENTATION
HIM Ranker
- First build the unigram language model, I, from the collected web documents. This language model is used to weight the importance of terms within sentences.
- Segment all 800 retrieved documents into individual sentences.
- Perform a pairwise similarity comparison between each candidate sentence and the sentences in our external documents using a weighted-term edit distance algorithm.
- Select the top 12 highest-ranked, non-redundant sentences as definitional answers (a sketch follows).
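A compact sketch of the ranker under two stated assumptions: term weights are the unigram probabilities from the model I, and the weighted-term edit distance charges each insertion, deletion, or substitution the weight of the term involved. The paper’s exact cost scheme may differ; treat this as illustrative only.

from collections import Counter

def unigram_model(web_docs):
    """Build the unigram language model I from the collected web documents."""
    counts = Counter(w for doc in web_docs for w in doc.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def weighted_edit_distance(cand, ref, weight):
    """Token-level edit distance where each operation costs the weight of the
    term involved (an assumption; the paper's exact costs may differ)."""
    m, n = len(cand), len(ref)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + weight(cand[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + weight(ref[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0.0 if cand[i - 1] == ref[j - 1] else max(weight(cand[i - 1]), weight(ref[j - 1]))
            d[i][j] = min(d[i - 1][j] + weight(cand[i - 1]),
                          d[i][j - 1] + weight(ref[j - 1]),
                          d[i - 1][j - 1] + sub)
    return d[m][n]

def rank_sentences(candidates, interest_sents, model, top_k=12):
    """Rank AQUAINT candidates by their closest Interest Corpus sentence
    (lowest weighted distance); redundancy filtering is omitted here."""
    weight = lambda w: model.get(w, 1e-6)
    refs = [s.lower().split() for s in interest_sents]
    scored = sorted((min(weighted_edit_distance(c.lower().split(), r, weight)
                         for r in refs), c) for c in candidates)
    return [c for _, c in scored[:top_k]]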
INITIAL EXPERIMENTS
- We compare against the soft-pattern bigram model.
- To ensure comparable results, both systems use identical input data:
  - the same web articles retrieved by our Web Retrieval module;
  - both rank the same set of candidate sentences retrieved by our AQUAINT Retrieval module.
INITIAL EXPERIMENTS
- TREC provides a list of vital and okay nuggets for each question topic.
- Every question is scored on nugget recall (NR) and nugget precision (NP), and a single final score is computed using the F-measure with β = 3 (the standard formulas are given below).
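For reference, the standard TREC definition scoring is roughly as follows (stated here from the usual TREC formulation rather than the slides): β = 3 weights recall three times as heavily as precision, and precision is a length-based approximation with an allowance of 100 non-whitespace characters per returned nugget.

\[
\mathrm{NR} = \frac{\#\,\text{vital nuggets returned}}{\#\,\text{vital nuggets}},
\qquad
\text{allowance} = 100 \times \#\,(\text{vital} + \text{okay nuggets returned})
\]
\[
\mathrm{NP} =
\begin{cases}
1 & \text{if } \text{length} \le \text{allowance} \\[4pt]
1 - \dfrac{\text{length} - \text{allowance}}{\text{length}} & \text{otherwise}
\end{cases}
\qquad
F_{\beta} = \frac{(\beta^{2} + 1)\,\mathrm{NP}\cdot\mathrm{NR}}{\beta^{2}\,\mathrm{NP} + \mathrm{NR}},\quad \beta = 3
\]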
INITIAL EXPERIMENTS
- We also examine how well the Human Interest Model performs for different types of topics.
- We manually divided the TREC 2005 topics into four broad categories: PERSON, ORGANIZATION, THING, and EVENT.
REFINEMENTS
Weighting Interesting Terms
- Interesting nuggets often have a trivia-like quality, so we hypothesize that interesting nuggets are likely to occur rarely in a text corpus.
- We consider three term weighting schemes that give more weight to low-frequency terms (sketched below):
  - TFIDF
  - Kullback-Leibler divergence
  - Jensen-Shannon divergence
- Here A is the AQUAINT corpus modeled as a unigram language model of general English, and I is the Interest Corpus modeled as a unigram language model consisting of topic-specific terms and general English terms.
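One plausible reading of these divergence-based weights (the paper’s exact formulation may differ) scores each term t by its pointwise contribution to the divergence between the Interest Corpus model I and the AQUAINT model A, so that terms common in the Interest Corpus but rare in general English receive high weight:

\[
w_{\mathrm{KL}}(t) = P_I(t)\,\log\frac{P_I(t)}{P_A(t)}
\]
\[
w_{\mathrm{JS}}(t) = \tfrac{1}{2}\,P_I(t)\log\frac{P_I(t)}{M(t)} + \tfrac{1}{2}\,P_A(t)\log\frac{P_A(t)}{M(t)},
\qquad
M(t) = \tfrac{1}{2}\bigl(P_I(t) + P_A(t)\bigr)
\]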
REFINEMENTS
- TFIDF performed the worst. The reason is that most terms appear only once within each sentence, resulting in a term frequency of 1.
- For both KL and JS divergence, we observed that high-frequency relevant terms still dominate the top of the weighted term list.
SELECTING WEB RESOURCES
The quality of web resources included in the Interest Corpus may have a direct impact on the results.
We split the web resources into four major groups:
- N - News: title and first paragraph of the top 50 most relevant articles found in NewsLibrary.
- W - Wikipedia: text from the most relevant article found in Wikipedia.
- S - Snippets: snippets extracted from the top 100 most relevant links after querying Google.
- M - Miscellaneous sources.
UNIFYING INFORMATIVENESS WITH INTERESTINGNESS
From the perspective of a human reader, both informative and interesting nuggets are useful and definitional.
A good definitional question answering system should provide the reader with a combined mixture of both nugget types.
We had initially hoped to unify the Soft Pattern Bigram Model and the Human Interest Model.
However, none of the ensemble learning methods we attempted could outperform our Human Interest Model.
UNIFYING INFORMATIVENESS WITH INTERESTINGNESS
- The two models disagree on which sentences are definitional.
- Among the top 10 sentences from both systems, only 4.4% of the sentences appeared in both answer sets.
- To verify whether both systems are selecting the same answer nuggets, we:
  - randomly selected a subset of 10 topics from the TREC 2005 question set;
  - manually identified the correct answer nuggets from both systems.
- The nugget agreement rate between both systems was 16.6%.
- Definitions are indeed made up of a mixture of informative and interesting nuggets.
UNIFYING INFORMATIVENESS WITH INTERESTINGNESS
- The best approach we found for combining both answer sets is to merge and rerank them while boosting agreement.
- When both systems agree that a sentence is definitional, the sentence’s score is boosted.
- Using this approach, we achieve an F3 score of 0.3081 (a minimal sketch follows).
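A minimal sketch of this merge-and-rerank step, assuming each system returns a mapping from sentences to scores in [0, 1] and that agreement is rewarded with a simple multiplicative boost; the actual boosting scheme used in the paper is not reproduced here.

def merge_and_rerank(him_answers, sp_answers, boost=2.0, top_k=12):
    """Merge two ranked answer sets and boost sentences both systems return.
    `him_answers` / `sp_answers`: dicts mapping sentence -> score in [0, 1].
    The boost factor of 2.0 is an illustrative assumption."""
    merged = {}
    for answers in (him_answers, sp_answers):
        for sent, score in answers.items():
            merged[sent] = max(merged.get(sent, 0.0), score)
    for sent in him_answers.keys() & sp_answers.keys():
        merged[sent] *= boost  # both models agree the sentence is definitional
    ranked = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
    return [sent for sent, _ in ranked[:top_k]]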
CONCLUSION
- We presented a novel perspective for answering definitional questions through the identification of interesting nuggets.
- Using a combination of different external corpora, we can build a definitional question answering module.
- The inherent differences between the two types of nuggets, reflected in the low agreement rate between the two models, have made combining them a difficult task.