Building a Bank of Semantically Encoded ?· Building a Bank of Semantically Encoded Narratives David…

  • View
    212

  • Download
    0

Embed Size (px)

Transcript

  • Building a Bank of Semantically Encoded Narratives

    David K. Elson, Kathleen R. McKeown

    Columbia University, New York City{delson, kathy}@cs.columbia.edu

    AbstractWe propose a methodology for a novel type of discourse annotation whose model is tuned to the analysis of a text as narrative. Thisis intended to be the basis of a story bank resource that would facilitate the automatic analysis of narrative structure and content.The methodology calls for annotators to construct propositions that approximate a reference text, by selecting predicates and argumentsfrom among controlled vocabularies drawn from resources such as WordNet and VerbNet. Annotators then integrate the propositionsinto a conceptual graph that maps out the entire discourse; the edges represent temporal, causal and other relationships at the level ofstory content. Because annotators must identify the recurring objects and themes that appear in the text, they also perform coreferenceresolution and word sense disambiguation as they encode propositions. We describe a collection experiment and a method for determininginter-annotator agreement when multiple annotators encode the same short story. Finally, we describe ongoing work toward extendingthe method to integrate the annotators interpretations of character agency (the goals, plans and beliefs that are relevant, yet not explictlystated in the text).

    1. IntroductionThis paper describes a methodology for constructing anovel type of language resource consisting of a corpus ofannotated textual stories. The annotation takes the form ofa single interconnected graph that encodes the entire dis-course. Nodes in the graph represent spans of the refer-ence text, objects in the story, and propositions (predicate-argument structures) that convey actions, statives, and mod-ifiers. All of the symbols are linked to formal ontolo-gies. Nodes are connected by arcs that represent textualconnections such as causality and motivation. In addi-tion, we describe an initial collection experiment and amethod for measuring inter-annotator agreement. We detailin (Elson and McKeown, 2009) an annotation tool calledSCHEHERAZADE that elicits such story graphs from non-expert annotators. In this paper, we focus on the procedurefor encoding a story graph from a reference text.We have been motivated to develop this annotation schemeby the lack of an existing language resource that covers allthese facets of a text. For example, large-scale annotationprojects have focused on identifying predicates and argu-ments at the sentence level (Kingsbury and Palmer, 2002)or the connections between sentences at the discourse level(Prasad et al., 2008; Carlson et al., 2001). Temporal andmodal properties of a corpus have been separately mod-eled (Pustejovsky et al., 2003). We are interested in a moreholistic view of a story, so that the interplay of many fea-tures can be understood our model involves word sensedisambiguation, coreference, interpretation of tense and as-pect into a symbolic view of time and modality, and theencoding of actions, statives and modifiers into proposi-tional structures. In addition, we are currently extendingthe model to include crucial elements of character agency(goals, plans and beliefs) even when they are not stated inthe text. We believe that a corpus of narratives annotatedinto story graphs (a story bank) would galvanize workin this area of text understanding, and allow us to developthe tools that reveal how stories work: the structure thatthe narrator chooses, the roles that characters take on, thesimilarities between stories and across genres, and so on.

    In the following sections, we describe our method for syn-thesizing several modes of annotation into a single proce-dure in which annotators build a structure that reflects thecontent of a reference text. Section 2 reviews our formalrepresentation. We outline the three stages of annotationin Section 3: extraction of objects and themes, construc-tion of predicate-argument structures, and assembly into asingle interconnected graph. In Section 4, we describe acollection experiment and a method for calculating inter-annotator agreement that measures the similarity betweentwo story graphs. Section 5 describes ongoing work ineliciting implied information about character agency, whichwas not included in the collection experiment. Finally, wereview related work and conclude.

    2. Story graph approachFollowing theorists such as (Bal, 1997), we define a nar-rative discourse as one which refers to events that tran-spire in the world of the story, ordered with temporal re-lations and linked by threads of causality and motivation.Our methodology guides annotators toward encoding theunderlying events present in a story, rather than the lex-ical choice or other elements of the storys realization intext. In an earlier paper (Elson and McKeown, 2007), weoutline our method of encoding these events with a combi-nation of propositions (predicate-argument structures) andconceptual graphs. While this paper focuses on the encod-ing process from the annotators perspective, this sectionsummarizes our approach to story representation.Nodes in the conceptual graph represent actions, statives,modifiers, time states, spans of reference text and meta-data such as the characters, locations and props found inthe text. Action, stative and modifier nodes contain defini-tional propositions. The predicate-argument structures areadapted from the VerbNet lexical database (Kipper et al.,2006) while a taxonomy of nouns and adjectival modifiersare supplied by WordNet (Fellbaum, 1998). The arcs ofthe conceptual graph include temporal relationships (whichactions take place at which time states or intervals), di-rect causal relationships (which travel from an action to its

    2068

  • State1 State2

    crow1=newfox1=newtree1=newcheese1=newpiece()

    ObjectExtrac,ons

    Timeline

    sit(crow1,branch(tree1))in(cheese1,beak(crow1))

    beginsAt

    Ac,onsandsta,ves

    observe(fox1,crow1)

    beginsAt

    ACrowwassi9ngonabranchofatreewithapieceofcheeseinherbeakwhenaFoxobservedher.

    ReferenceText

    Figure 1: Schematic of a portion of a story graph.

    direct consequence actions, if any) and equivalence rela-tions between spans of reference text and their correspond-ing propositions. Not all propositions need to describe ac-tual story events; actions, statives and even entire alternatetimelines can be set to modalities including the obliga-tions, beliefs and fears of characters. For example, a char-acter can hope that something will not happen in the futureor has not happened in the past. Figure 1 illustrates a por-tion of a story graph schematic, showing the first sentenceof a reference text, as well as objects, statives, actions andpoints in time identified in the text by an annotator.The reference text we will use in this paper is one of thefables attributed to Aesop. We have developed our method-ology using Aesops fables as a development corpus, butour knowledge sources are not tuned to these stories or theirdomain. In this paper, we will discuss a fable titled The Foxand the Crow, which has been featured in other story repre-sentation schemes (Ryan, 1991), and is reproduced below:

    A Crow was sitting on a branch of a tree with apiece of cheese in her beak when a Fox observedher and set his wits to work to discover some wayof getting the cheese. Coming and standing underthe tree he looked up and said, What a noble birdI see above me! Her beauty is without equal, thehue of her plumage exquisite. If only her voice isas sweet as her looks are fair, she ought withoutdoubt to be Queen of the Birds. The Crow washugely flattered by this, and just to show the Foxthat she could sing she gave a loud caw. Downcame the cheese, of course, and the Fox, snatch-ing it up, said, You have a voice, madam, I see:what you want is wits.

    One important aspect of the story graph is that it is a staticmodel of the entire story, without a beginning or end.The telling of the story that is, the ordering of proposi-tions that constitutes a narration of the graph from a be-ginning to an ending is given through the arcs that theannotator provides between the spans of reference text andtheir related proposition(s). As one progresses through thereference text in a story graph from start to finish, one willencounter arcs to propositional nodes attached to the sym-bolic timeline. Many narratives, including Aesops, revealthe actions in their timelines in a nonlinear order (i.e., withjumps to the past or to the future and back again). Whilethis fable is strictly linear, the foxs dialogue refers to a hy-pothetical future.

    2.1. Encoding toolWe have implemented and publicly released a graphical an-notation tool that facilitates the process of encoding a refer-ence text into a story graph. Figure 2 shows the main screenas configured for annotating The Fox and the Crow. Themethodology we describe below takes place in the contextof the buttons and menus offered by this tool. The centralpanel shows a timeline with states (called Story Points)arranged in a linear graph. Alternate timelines can also beaccessed from this screen. The top-right and bottom-rightpanels contain the reference text and a feedback text, re-spectively. The latter is an automatically generated realiza-tion of the story graph that uses syntactic frames from Verb-Net and a model of tense and aspect assignment. It facil-itates the encoding process by giving annotators a natural-language equivalent of the story graph which they can com-pare to the reference text at every step.

    3. Annotation ProcedureThe first task for an annotator is to read the text and fullycomprehend it. As a story graph is a discourse unit ratherthan a sentential unit, it is important for the annotator totake a holistic view of the text before beginning the encod-ing process.The method for creati