Third International Conference on Language Resources and Evaluation, Las Palmas, Canary Islands, Spain, 29-31 May 2002
Columbia University
Catalogued the information recommended by 5 prescriptive guidelines for annotated bibliography entries (A.B.E.s)
Using the Annotated Bibliography as a Resource for Indicative Summarization
Min-Yen Kan*, Judith L. Klavans** and Kathleen R. McKeown*
{min, judith, kathy}@cs.columbia.edu
Selected Summary Dimensions
1. Extract versus Abstract
2. Informative versus Indicative
3. Generic versus Query-biased
4. Single document versus Multiple
News Summaries
Scientific Summaries
Snippets
Card Catalog Entries
Annotated Bibliography Entries
Corpus Collection & Encoding
Corpus Availability
The corpus is available for academic and not-for-profit research, by request to:
An annotation guide, explaining the annotation tagging guidelines in more detail, is also available. Command-line and web CGI utilities are also provided to modify, insert and extract attributes from the corpus.
* Department of Computer Science
** Center for Research on Information Access
<bibEntry id="id26" title="Analysis of covariance" url="http://www.math.yorku.ca/SCS/biblio.html" type="paper" domain="statistics" microCollection="Analysis of Covariance" offset="4">
<beforeContext>Maxwell, S. E., Delaney, H. D., & O'Callaghan, M. F. (1993). Analysis of...</beforeContext>
<entry><OVERVIEW>This <MEDIATYPES>paper </MEDIATYPES>gives a brief history of ANCOVA, and then discusses ANCOVA in ... contains no matrix algebra.</DIFFICULTY></entry>
<parsedEntry>PROB 14659 -112.252 0 TOP -112.252 S -105.049 NP-A -8.12201 NPB -7.82967 DT 0 This NN 0 paper ...</parsedEntry>
</bibEntry>
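A record in this format can be read with a standard XML parser. The following is a minimal sketch, not one of the utilities distributed with the corpus: it assumes each record is well-formed XML, and `SAMPLE` and `load_entry` are hypothetical names. The sample is a shortened stand-in for the example above, with the elided text left elided and the ampersand escaped.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for a corpus record (assumption: the
# distributed files are well-formed; elided text is left elided here).
SAMPLE = """
<bibEntry id="id26" title="Analysis of covariance"
          url="http://www.math.yorku.ca/SCS/biblio.html" type="paper"
          domain="statistics" microCollection="Analysis of Covariance"
          offset="4">
  <beforeContext>Maxwell, S. E., Delaney, H. D., &amp; O'Callaghan, M. F. (1993). Analysis of...</beforeContext>
  <entry><OVERVIEW>This <MEDIATYPES>paper </MEDIATYPES>gives a brief history of ANCOVA ...</OVERVIEW></entry>
</bibEntry>
"""


def load_entry(xml_text):
    """Return the attributes and child-field text of one <bibEntry>."""
    root = ET.fromstring(xml_text.strip())
    record = dict(root.attrib)
    for child in root:
        # itertext() flattens any nested semantic tags into plain text
        record[child.tag] = "".join(child.itertext()).strip()
    return record


record = load_entry(SAMPLE)
print(record["title"], record["offset"])
print(record["entry"])
```

Attribute values come back as strings, so a field such as `offset` would need an explicit `int(...)` before any numeric comparison.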
Our language resource of annotated bibliography entries was designed to ease the collection of the corpus as well as to make many features available for subsequent analysis for summarization and related natural language applications.
Presently:
- 1200 documents containing “annotated bibliography” were spidered
- of those, 64 documents were hand-parsed, yielding 2000 entries
- of those 2000, 100 of the parsed <entry> tags were further annotated with semantic tags
- <beforeContext>: the text before the body of the entry
- <title>: the subject or theme
- <url>: the location of the source document
- <domain>: a coarser granularity than title
- <offset>: the position of the entry on the page
Other fields, also optional:
- <afterContext>: text that is distinctly marked off as coming after the entry
- <macroCollection>: the division that the page represents in the set of related pages
- <microCollection>: the internal division in the page that this entry belongs to
<entry>: the text with the 24 semantic tags
<parsedEntry>: Collins’ 96 parse of the entry
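Because the semantic tags are nested inside <entry>, the tagged spans can be listed by walking the element tree. This is a sketch under assumptions: `ENTRY` is a hypothetical, shortened entry that uses three tag names from the corpus example (OVERVIEW, MEDIATYPES, DIFFICULTY) with illustrative wording, and `semantic_spans` is not a corpus utility.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged entry modeled on the corpus example; the wording is
# illustrative and only three of the semantic tags appear.
ENTRY = (
    "<entry><OVERVIEW>This <MEDIATYPES>paper</MEDIATYPES> gives a brief "
    "history of ANCOVA.</OVERVIEW> <DIFFICULTY>It contains no matrix "
    "algebra.</DIFFICULTY></entry>"
)


def semantic_spans(entry_xml):
    """List (tag, text) pairs for every semantic tag, in document order."""
    root = ET.fromstring(entry_xml)
    spans = []
    for elem in root.iter():
        if elem is root:
            continue  # skip the <entry> wrapper itself
        # Join nested text and normalize whitespace
        text = " ".join("".join(elem.itertext()).split())
        spans.append((elem.tag, text))
    return spans


for tag, text in semantic_spans(ENTRY):
    print(f"{tag}: {text}")
```

A nested tag such as MEDIATYPES is reported both on its own and as part of the text of the enclosing OVERVIEW span.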
Annotated Bibliography Entries are indicative summaries.
- longer than both card catalog summaries and snippets
- organized around a theme; ideal standard for “query-based” summaries
- have explicit comparisons of one resource versus another
- have prefacing overviews of the documents in the bibliography.
- rich in meta-information features.
We study them as models for summaries by examining prescriptive guidelines and performing a corpus study.
Prescriptive Guidelines: Ree70, EBC98, Les01, AACC98, Wil02 (each x marks one guideline that recommends the feature)
Corpus Study: # tags in corpus, % entries with tag

Feature                             Guidelines  # tags in corpus  % entries with tag
Topicality Features
Detail                                          139               47%
Overview                                        72                64%
Topic                                           34                28%
Metadata and Other Features
Media Type                                      55                48%
Author/Editor                                   43                27%
Content Types/Special Feature       x x         41                29%
Subjective Assess/Coverage          x x x x     36                24%
Authority/Authoritativeness         x x x       26                20%
Background/Source                               21                16%
Navigation/Internal Structure       x           16                11%
Collection Size                                 13                10%
Purpose                             x x x       13                10%
Audience                            x x x x     12                12%
Contributor                                     12                12%
Cross-resource comparison           x           10                9%
Size/Length                                     9                 7%
Style                                           8                 6%
Query Relevance                     x x         4                 3%
Readability                                     4                 3%
Difficulty                                      4                 4%
Edition/Publication Information                 3                 3%
Language                                        2                 2%
Copyright                                       2                 1%
Award/Quality/Defects               x x x       2                 1%
x x
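The two corpus-study columns measure different things: a tag that occurs several times in one entry raises the tag count but not the entry percentage, which is how Detail can have 139 tags while appearing in 47% of the entries. The sketch below shows one way such figures could be computed. It assumes each annotated entry has been reduced to a list of its tag names; `tag_statistics` and the toy data are hypothetical and not taken from the corpus.

```python
from collections import Counter


def tag_statistics(entries):
    """entries: list of lists of tag names, one inner list per entry.

    Returns {tag: (total tag instances, percent of entries with the tag)}.
    """
    n = len(entries)
    if n == 0:
        return {}
    instances = Counter()
    containing = Counter()
    for tags in entries:
        instances.update(tags)        # every occurrence counts
        containing.update(set(tags))  # each entry counts at most once
    return {
        tag: (instances[tag], round(100 * containing[tag] / n))
        for tag in instances
    }


# Toy data, not taken from the corpus: four entries with made-up tag lists.
toy = [
    ["Overview", "Detail", "Detail"],
    ["Overview", "MediaType"],
    ["Detail", "Detail", "Detail"],
    ["Overview"],
]
stats = tag_statistics(toy)
print(stats["Detail"])    # (5, 50)
print(stats["Overview"])  # (3, 75)
```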
Card catalog entries consist of structured fields, of which a summary is an optional field. Other types of information (such as notes, book jacket texts, or book reviews) are often substituted for summaries.
Snippets are short indicative descriptions given by authors of web pages. They are often very short (e.g., on Yahoo! or ODP category pages). Amitay (2000) shows strategies for locating and extracting snippets and for ranking them by fitness as a summary.
There have been a number of studies using abstracts of scientific articles as target summaries (e.g., Kupiec et al., 1995). Abstracts tend to summarize the document's topics well but make little use of metadata.
DUC provides a large corpus for informative summaries. Jing and McKeown (1999) use source document and target summary relations for “cut and paste” summarization.
Summary types compared: Scientific Abstracts, Snippets, Card Catalog Entries, Ziff Davis, DUC, Annotated Bibliography Entries

Extract vs. Abstract  Indicative vs. Informative  Generic vs. Query-based  Single vs. Multidocument  Uses Metadata?  Corpus vs. Algorithm
Abstract              Both                        Both                     Mostly Single             Yes             Corpus
Both                  Informative                 Generic                  Both                      Yes             Corpus
Abstract              Indicative                  Both                     Single                    Yes             Algorithm
Abstract              Indicative                  Generic                  Single                    Yes             Corpus
Abstract              Informative                 Generic                  Single                    No              Corpus
Mostly Extract        Informative                 Generic                  Single                    No              Corpus
Corpus
Performed study of 100 entries (see right)