INTERESTING NUGGETS AND THEIR IMPACT ON DEFINITIONAL QUESTION ANSWERING
Kian-Wei Kor, Tat-Seng Chua
Department of Computer Science, School of Computing, National University of Singapore
(SIGIR’07)
Speaker : Yi-Ling Tai
Date : 2009/07/28
OUTLINE
- Introduction
  - Definitional Question Answering
  - Topic Nuggets
- Human Interest Model
  - Web Resources
  - Implementation
- Initial Experiments
- Refinements
  - Weighting Interesting Terms
  - Selecting Web Resources
  - Unifying
- Conclusion
DEFINITIONAL QUESTION ANSWERING
- Introduced as a main task in the TREC Question Answering Track in 2003.
- Given a topic X, the questions take the form “What is X?” or “Who is X?”
- A definitional QA system searches through a corpus and returns a set of answers that best describes the topic.
- Each answer should be a unique, topic-specific nugget that makes up one facet in the definition of the topic.
THE TWO ASPECTS OF TOPIC NUGGETS
Informative nuggets
- A sentence fragment that describes some factual information, including topic properties, relationships with other related entities, or events that happened to the topic.
Interesting nuggets
- Have a trivia-like quality; typically, these can pique a human reader’s interest.
- They answer questions such as “What is X famous for?” or “What is extraordinary about X?”
THE TWO ASPECTS OF TOPIC NUGGETS
TREC 2005 topic: “George Foreman”
Informative nuggets
- Was graduate of Job Corps.
- Became oldest world champion in boxing history.
Interesting nuggets
- Has lent his name to line of food preparation products.
- Waved American flag after winning 1968 Olympics championship.
- Returned to boxing after 10 yr hiatus.
HUMAN INTEREST MODEL
- Identifying sentences that a human reader would find interesting is a tall order.
- We assume that most sentences within web documents will contain interesting facets about the topic.
- These articles are written by humans, for human readers, and thus contain the critical human world knowledge.
WEB RESOURCES
We build the “Interest Corpus” by collecting articles from the following external resources:
- Wikipedia: we use a snapshot of Wikipedia taken in March 2006 and include the most relevant article.
- NewsLibrary: for each topic, we download the 50 most relevant articles and include the title and first paragraph.
- Google Snippets: for each topic, we extract the top 100 snippets.
WEB RESOURCES
Some resources are more specific in nature, and we do not always obtain a relevant document from them:
- Biography.com
- Bartleby.com
- s9.com
- Google Definitions
- WordNet
MULTIPLE INTERESTING CENTROIDS
- Relevance-based approaches are focused on identifying highly relevant sentences.
- The use of only a single collection of centroid words over-emphasizes topic relevance.
- Instead, we perform a pairwise sentence comparison between the Interest Corpus and candidate sentences retrieved from the AQUAINT corpus.
- An answer can only be highly ranked if it is strongly similar to a sentence in the Interest Corpus and is also strongly relevant to the topic (see the sketch below).
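Illustrative sketch only (not the paper’s exact formulation): a candidate is scored by its best pairwise match against the Interest Corpus, gated on topic relevance. The similarity argument stands in for the system’s actual weighted-term edit distance.

def score_candidate(sentence, interest_sents, topic_terms, similarity):
    """Best pairwise match against any Interest Corpus sentence,
    kept only if the candidate also mentions the topic."""
    tokens = set(sentence.lower().split())
    if not tokens & {t.lower() for t in topic_terms}:
        return 0.0  # not topic-relevant, so it cannot rank highly
    return max(similarity(sentence, s) for s in interest_sents)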
IMPLEMENTATION
The AQUAINT Retrieval module
- Given a set of words describing the topic, the AQUAINT Retrieval module performs query expansion using Google and searches an index of AQUAINT documents to retrieve the 800 most relevant documents (a simplified expansion sketch follows).
The Web Retrieval module
- Searches the online resources to populate the Interest Corpus.
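As an illustration only (the real module uses Google for expansion terms and a pre-built index of the AQUAINT collection), query expansion can be sketched as appending frequent content words from retrieved web snippets to the topic query:

from collections import Counter

def expand_query(topic, snippets, k=10):
    """Hypothetical expansion: append the k most frequent content words
    seen in web snippets about the topic to the original query terms."""
    stopwords = {"the", "and", "that", "with", "from", "this", "have", "were", "their"}
    words = (w.strip(".,\"'()") for s in snippets for w in s.lower().split())
    counts = Counter(w for w in words if len(w) > 3 and w not in stopwords)
    return topic.lower().split() + [w for w, _ in counts.most_common(k)]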
IMPLEMENTATION
HIM Ranker
- First build the unigram language model, I, from the collected web documents. This language model is used to weight the importance of terms within sentences.
- Segment all 800 retrieved documents into individual sentences.
- Perform a pairwise similarity comparison between each candidate sentence and the sentences in our external documents using a weighted-term edit distance algorithm.
- Select the top 12 highest-ranked, non-redundant sentences as definitional answers (a sketch follows).
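A compact sketch of the ranker under two stated assumptions: term weights are the unigram probabilities from the model I, and the weighted-term edit distance charges each insertion, deletion, or substitution the weight of the term involved. The paper’s exact cost scheme may differ; treat this as illustrative only.

from collections import Counter

def unigram_model(web_docs):
    """Build the unigram language model I from the collected web documents."""
    counts = Counter(w for doc in web_docs for w in doc.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def weighted_edit_distance(cand, ref, weight):
    """Token-level edit distance where each operation costs the weight of the
    term involved (an assumption; the paper's exact costs may differ)."""
    m, n = len(cand), len(ref)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + weight(cand[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + weight(ref[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0.0 if cand[i - 1] == ref[j - 1] else max(weight(cand[i - 1]), weight(ref[j - 1]))
            d[i][j] = min(d[i - 1][j] + weight(cand[i - 1]),
                          d[i][j - 1] + weight(ref[j - 1]),
                          d[i - 1][j - 1] + sub)
    return d[m][n]

def rank_sentences(candidates, interest_sents, model, top_k=12):
    """Rank AQUAINT candidates by their closest Interest Corpus sentence
    (lowest weighted distance); redundancy filtering is omitted here."""
    weight = lambda w: model.get(w, 1e-6)
    refs = [s.lower().split() for s in interest_sents]
    scored = sorted((min(weighted_edit_distance(c.lower().split(), r, weight)
                         for r in refs), c) for c in candidates)
    return [c for _, c in scored[:top_k]]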
INITIAL EXPERIMENTS
- We compare against the soft-pattern bigram model.
- To ensure comparable results, both systems use identical input data:
  - the same web articles retrieved by our Web Retrieval module;
  - both rank the same set of candidate sentences retrieved by our AQUAINT Retrieval module.
INITIAL EXPERIMENTS
- TREC provides a list of vital and okay nuggets for each question topic.
- Every question is scored on nugget recall (NR) and nugget precision (NP), and a single final score is computed using the F-measure with β = 3 (the standard formulas are given below).
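For reference, the standard TREC definition scoring is roughly as follows (stated here from the usual TREC formulation rather than the slides): β = 3 weights recall three times as heavily as precision, and precision is a length-based approximation with an allowance of 100 non-whitespace characters per returned nugget.

\[
\mathrm{NR} = \frac{\#\,\text{vital nuggets returned}}{\#\,\text{vital nuggets}},
\qquad
\text{allowance} = 100 \times \#\,(\text{vital} + \text{okay nuggets returned})
\]
\[
\mathrm{NP} =
\begin{cases}
1 & \text{if } \text{length} \le \text{allowance} \\[4pt]
1 - \dfrac{\text{length} - \text{allowance}}{\text{length}} & \text{otherwise}
\end{cases}
\qquad
F_{\beta} = \frac{(\beta^{2} + 1)\,\mathrm{NP}\cdot\mathrm{NR}}{\beta^{2}\,\mathrm{NP} + \mathrm{NR}},\quad \beta = 3
\]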
INITIAL EXPERIMENTS
- We also examine how well the Human Interest Model performs for different types of topics.
- We manually divided the TREC 2005 topics into four broad categories: PERSON, ORGANIZATION, THING, and EVENT.
REFINEMENTS
Weighting Interesting Terms
- Interesting nuggets often have a trivia-like quality, so we hypothesize that interesting nuggets are likely to occur rarely in a text corpus.
- We consider three term weighting schemes that give more weight to low-frequency terms (sketched below):
  - TFIDF
  - Kullback-Leibler divergence
  - Jensen-Shannon divergence
- Here A is the AQUAINT corpus modeled as a unigram language model of general English, and I is the Interest Corpus modeled as a unigram language model consisting of topic-specific terms and general English terms.
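One plausible reading of these divergence-based weights (the paper’s exact formulation may differ) scores each term t by its pointwise contribution to the divergence between the Interest Corpus model I and the AQUAINT model A, so that terms common in the Interest Corpus but rare in general English receive high weight:

\[
w_{\mathrm{KL}}(t) = P_I(t)\,\log\frac{P_I(t)}{P_A(t)}
\]
\[
w_{\mathrm{JS}}(t) = \tfrac{1}{2}\,P_I(t)\log\frac{P_I(t)}{M(t)} + \tfrac{1}{2}\,P_A(t)\log\frac{P_A(t)}{M(t)},
\qquad
M(t) = \tfrac{1}{2}\bigl(P_I(t) + P_A(t)\bigr)
\]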
REFINEMENTS
- TFIDF performed the worst. The reason is that most terms appear only once within each sentence, resulting in a term frequency of 1.
- For both KL and JS divergence, we observed that high-frequency relevant terms still dominate the top of the weighted term list.
SELECTING WEB RESOURCES
The quality of web resources included in the Interest Corpus may have a direct impact on the results.
We split the web resources into four major groups:
- N - News: title and first paragraph of the top 50 most relevant articles found in NewsLibrary.
- W - Wikipedia: text from the most relevant article found in Wikipedia.
- S - Snippets: snippets extracted from the top 100 most relevant links after querying Google.
- M - Miscellaneous sources.
UNIFYING INFORMATIVENESS WITH INTERESTINGNESS
From the perspective of a human reader, both informative and interesting nuggets are useful and definitional.
A good definitional question answering system should provide the reader with a combined mixture of both nugget types.
We had initially hoped to unify the Soft Pattern Bigram Model and the Human Interest Model.
However, none of the ensemble learning methods we attempted could outperform our Human Interest Model.
UNIFYING INFORMATIVENESS WITH INTERESTINGNESS
- The two models disagree on which sentences are definitional.
- Among the top 10 sentences from both systems, only 4.4% of the sentences appeared in both answer sets.
- To verify whether both systems are selecting the same answer nuggets, we:
  - randomly selected a subset of 10 topics from the TREC 2005 question set;
  - manually identified the correct answer nuggets from both systems.
- The nugget agreement rate between both systems was 16.6%.
- Definitions are indeed made up of a mixture of informative and interesting nuggets.
UNIFYING INFORMATIVENESS WITH INTERESTINGNESS
- The best approach we found for combining both answer sets is to merge and rerank them while boosting agreement.
- When both systems agree that a sentence is definitional, the sentence’s score is boosted.
- Using this approach, we achieve an F3 score of 0.3081 (a minimal sketch follows).
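A minimal sketch of this merge-and-rerank step, assuming each system returns a mapping from sentences to scores in [0, 1] and that agreement is rewarded with a simple multiplicative boost; the actual boosting scheme used in the paper is not reproduced here.

def merge_and_rerank(him_answers, sp_answers, boost=2.0, top_k=12):
    """Merge two ranked answer sets and boost sentences both systems return.
    `him_answers` / `sp_answers`: dicts mapping sentence -> score in [0, 1].
    The boost factor of 2.0 is an illustrative assumption."""
    merged = {}
    for answers in (him_answers, sp_answers):
        for sent, score in answers.items():
            merged[sent] = max(merged.get(sent, 0.0), score)
    for sent in him_answers.keys() & sp_answers.keys():
        merged[sent] *= boost  # both models agree the sentence is definitional
    ranked = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
    return [sent for sent, _ in ranked[:top_k]]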
CONCLUSION
- We presented a novel perspective for answering definitional questions through the identification of interesting nuggets.
- Using a combination of different external corpora, we can build a definitional question answering module.
- The inherent differences between the two types of nuggets, reflected in the low agreement rate between the two models, have made combining them a difficult task.