REFERENTIAL CHOICE: FACTORS AND MODELING


Andrej A. Kibrik, Mariya V. Khudyakova, Grigoriy B. Dobrov, and Anastasia S. Linnik ([email protected]). Night Whites, SPb, February 28, 2014.


  • REFERENTIAL CHOICE: FACTORS AND MODELING

    Andrej A. Kibrik, Mariya V. Khudyakova, Grigoriy B. Dobrov, and Anastasia S. Linnik ([email protected])

    Night Whites, SPb, February 28, 2014

  • Referential choice in discourse

    When a speaker needs to mention (or refer to) a specific, definite referent, s/he chooses between several options, including:
    - Full noun phrase
      - Proper name (e.g. Peter)
      - Description = common noun, with or without modifiers (e.g. the tzar)
      - Mix (e.g. Peter the Great)
    - Reduced NP, particularly a third-person pronoun (e.g. he)

  • Example

    The Victorian house that Ms. Johnson is inspecting has been deemed unsafe by town officials. But she asks a workman toting the bricks from the lawn to give her a boost through an open first-floor window. Once inside, she spends nearly four hours measuring and diagramming each room in the 80-year-old house, gathering enough information to estimate what it would cost to rebuild it. She snaps photos of the buckled floors and the plaster that has fallen away from the walls.

    Legend: description, proper name, pronoun, zero

  • Research question

    How is referential choice made?

  • Why is this question important?

    - Reference is among the most basic cognitive operations performed by language users
    - Reference constitutes the lion's share of all information in natural communication
    - Consider text manipulation according to the method of Biber et al. 1999: 230-232

  • Referential expressions marked in green

    The Victorian house that Ms. Johnson is inspecting has been deemed unsafe by town officials. But she asks a workman toting the bricks from the lawn to give her a boost through an open first-floor window.

  • Referential expressions removed

    The Victorian house that Ms. Johnson is inspecting has been deemed unsafe by town officials. But she asks a workman toting the bricks from the lawn to give her a boost through an open first-floor window.

  • Referential expressions kept

    The Victorian house that Ms. Johnson is inspecting has been deemed unsafe by town officials. But she asks a workman toting the bricks from the lawn to give her a boost through an open first-floor window.

  • Types of referential devices: levels of granularity

    We mostly concentrate on the two upper levels in this hierarchy.

    REG (referring expression generation) tradition: most attention to varieties of descriptive full NPs

  • Multi-factorial character of referential choice

    Multiple factors of referential choice:
    Properties of the discourse context
    - Distance to antecedent
      - along the linear discourse structure (Givón)
      - along the hierarchical discourse structure (Fox, Kibrik)
    - Antecedent role (Centering theory)
    Properties of the referent
    - Referent animacy (Dahl)
    - Protagonisthood (Grimes)
    - ...

  • Cognitive multi-factorial model of referential choice

    Factors of referential choice (discourse context; referent's properties) → referent activation in working memory → referential choice

  • Rhetorical distance

    - Distance along the hierarchical discourse structure between:
      - the current point in discourse, where referential choice is to be made, and
      - the antecedent
    - Measured in elementary discourse units (EDUs), roughly equal to clauses
    - Rhetorical Structure Theory (RST) by Mann and Thompson
    - A very important factor
    - RST Discourse Treebank corpus (Marcu et al.)
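The contrast between linear and hierarchical distance can be made concrete with a toy discourse tree. This is a minimal Python sketch, not the authors' measure. The `Node` class, the example tree, and the use of plain tree-path length (the number of edges between the two EDU leaves) are all assumptions made for illustration. The study's own RST-based count of rhetorical distance may be defined differently, so `tree_distance` is only a simplified proxy.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Node of a hypothetical discourse tree; only leaves carry an EDU index."""
    edu: int | None = None
    children: list[Node] = field(default_factory=list)


def leaf(i: int) -> Node:
    return Node(edu=i)


def path_to_leaf(root: Node, edu: int) -> list[Node] | None:
    """Nodes from the root down to the leaf for `edu`, or None if absent."""
    if root.edu == edu:
        return [root]
    for child in root.children:
        sub = path_to_leaf(child, edu)
        if sub is not None:
            return [root] + sub
    return None


def linear_distance(anaphor_edu: int, antecedent_edu: int) -> int:
    """Distance along the linear structure, counted in EDUs."""
    return anaphor_edu - antecedent_edu


def tree_distance(root: Node, anaphor_edu: int, antecedent_edu: int) -> int:
    """Simplified hierarchical distance: edges on the tree path between two leaves."""
    a = path_to_leaf(root, anaphor_edu)
    b = path_to_leaf(root, antecedent_edu)
    if a is None or b is None:
        raise ValueError("EDU not found in tree")
    shared = 0  # length of the common prefix (root ... lowest common ancestor)
    for x, y in zip(a, b):
        if x is not y:
            break
        shared += 1
    return (len(a) - shared) + (len(b) - shared)


# Toy tree: EDUs 2-4 form an embedded unit, EDU 5 attaches high in the tree.
tree = Node(children=[
    Node(children=[leaf(1), Node(children=[leaf(2), leaf(3), leaf(4)])]),
    leaf(5),
])
print(linear_distance(5, 1), tree_distance(tree, 5, 1))  # 4 3
print(linear_distance(5, 4), tree_distance(tree, 5, 4))  # 1 4
```

In this toy tree, EDU 1 is linearly farther from EDU 5 than EDU 4 is, but it is hierarchically closer. In cases like this the two kinds of distance can lead to different predictions.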

  • Example of a rhetorical graph from the RST Discourse Treebank

  • RefRhet and MoRA

    - RST Discourse Treebank + our annotation = RefRhet corpus
    - Subcorpus RefRhet 3 (2013-2014)
    - Annotation scheme MoRA (Moscow Referential Annotation)

  • RefRhet 3

    - 64 texts
    - 6294 markables
    - 1852 anaphor-antecedent pairs:
      - 475 pronouns
      - 1377 full NPs: 706 descriptions, 671 proper names
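The counts on this slide are internally consistent: 475 + 1377 = 1852 pairs, and 706 + 671 = 1377 full NPs. The slides do not mention a majority-class baseline, but it is a common reference point for accuracy figures. It is the accuracy you get by always predicting the most frequent form. Here is a minimal Python sketch, assuming the anaphor-antecedent pairs are the instances being classified:

```python
from __future__ import annotations

# Counts from the RefRhet 3 slide.
PRONOUNS = 475
DESCRIPTIONS = 706
PROPER_NAMES = 671
FULL_NPS = DESCRIPTIONS + PROPER_NAMES  # 1377
PAIRS = PRONOUNS + FULL_NPS             # 1852


def majority_baseline(class_counts: dict[str, int]) -> float:
    """Accuracy obtained by always predicting the most frequent class."""
    total = sum(class_counts.values())
    return max(class_counts.values()) / total


two_way = {"full NP": FULL_NPS, "pronoun": PRONOUNS}
three_way = {"description": DESCRIPTIONS, "proper name": PROPER_NAMES, "pronoun": PRONOUNS}

print(f"two-way baseline:   {majority_baseline(two_way):.3f}")    # 0.744
print(f"three-way baseline: {majority_baseline(three_way):.3f}")  # 0.381
```

Under that assumption, model accuracy on the two-way task is best read against roughly 74%, and on the three-way task against roughly 38%.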

  • Candidate factors of referential choice

    - Some values are drawn from the MoRA annotation
    - Others are computed automatically

    Diagram labels: factor-predicted variable; discourse context

  • Windows of the MMAX2 program

  • Some properties of the MoRA scheme

    - Wide range of activation factors and their values, e.g. multiple values of the grammatical role factor
    - Annotation of groups, i.e. complex markables serving as antecedents:
      - and-coordinate
      - or-coordinate
      - prepositional (children with their parents)
      - discontinuous
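Group markables can be pictured in code as a small record type. It stores the group's type and the text segment(s) the group covers, and a discontinuous group simply has more than one segment. This is a hypothetical Python sketch: `GroupType`, `Span`, and `GroupMarkable` are invented names, and the sketch does not reproduce the actual MoRA or MMAX2 data format.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class GroupType(Enum):
    AND_COORDINATE = "and-coordinate"
    OR_COORDINATE = "or-coordinate"
    PREPOSITIONAL = "prepositional"  # e.g. "children with their parents"
    DISCONTINUOUS = "discontinuous"  # members separated by other material


@dataclass(frozen=True)
class Span:
    start: int  # index of the first token (inclusive)
    end: int    # index after the last token (exclusive)


@dataclass
class GroupMarkable:
    group_type: GroupType
    segments: list[Span]  # one span, or several for a discontinuous group

    def __post_init__(self) -> None:
        if self.group_type is GroupType.DISCONTINUOUS and len(self.segments) < 2:
            raise ValueError("a discontinuous group needs at least two segments")

    def is_discontinuous(self) -> bool:
        return len(self.segments) > 1

    def surface(self, tokens: list[str]) -> str:
        """Return the group's text, joining separate segments with ' + '."""
        ordered = sorted(self.segments, key=lambda s: s.start)
        return " + ".join(" ".join(tokens[s.start:s.end]) for s in ordered)


tokens = "Children with their parents arrived . John met Mary ; they talked".split()
prep = GroupMarkable(GroupType.PREPOSITIONAL, [Span(0, 4)])
disc = GroupMarkable(GroupType.DISCONTINUOUS, [Span(8, 9), Span(6, 7)])
print(prep.surface(tokens))  # Children with their parents
print(disc.surface(tokens))  # John + Mary
```

In the toy sentence, the antecedent of "they" is the discontinuous group John + Mary, whose members are separated by "met".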

  • A discontinuous group

  • Tasks for machine learning

    - Candidate factors: all potential parameters implemented in the corpus annotation
    - Factor-predicted variable: form of referential expression (np_form)
    - Two-way task: full NP vs. pronoun
    - Three-way task: definite description vs. proper name vs. pronoun
    - Accuracy maximization, where accuracy is the ratio of correct predictions to the overall number of instances
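The accuracy measure and the relation between the two tasks fit in a few lines of Python. Collapsing descriptions and proper names into full NPs turns three-way labels into two-way ones. As a result, confusing a description with a proper name counts as an error only in the three-way task. The label strings and the toy predictions are hypothetical, and the corpus's actual `np_form` values may be named differently.

```python
from __future__ import annotations

# Hypothetical label strings; the corpus's np_form values may differ.
THREE_TO_TWO = {
    "description": "full NP",
    "proper name": "full NP",
    "pronoun": "pronoun",
}


def accuracy(gold: list[str], predicted: list[str]) -> float:
    """Ratio of correct predictions to the overall number of instances."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def to_two_way(labels: list[str]) -> list[str]:
    """Map three-way labels onto the two-way full NP vs. pronoun distinction."""
    return [THREE_TO_TWO[label] for label in labels]


gold = ["pronoun", "description", "proper name", "description"]
pred = ["pronoun", "proper name", "proper name", "description"]
print(accuracy(gold, pred))                          # 0.75
print(accuracy(to_two_way(gold), to_two_way(pred)))  # 1.0
```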

  • Machine learning methods (Weka, a data mining system)

    - Logical algorithms:
      - Decision trees (C4.5)
      - Decision rules (JRip)
    - Logistic regression
    - Compositions:
      - Boosting
      - Bagging
    - Quality control: the cross-validation method
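Weka performs cross-validation internally. The sketch below only spells out what the procedure does: each instance is tested exactly once, by a model trained on the other folds, and accuracy is pooled over all folds. It is written in plain Python, not Weka. The fold count k = 10 is an assumption, since the slides do not state it. `train_majority` is a deliberately trivial placeholder learner standing in for C4.5, JRip, or logistic regression.

```python
from __future__ import annotations

import random
from collections import Counter
from typing import Callable

Predictor = Callable[[list[float]], str]
Trainer = Callable[[list[list[float]], list[str]], Predictor]


def k_fold_indices(n: int, k: int, seed: int = 0) -> list[list[int]]:
    """Shuffle instance indices and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X: list[list[float]], y: list[str], train: Trainer,
                   k: int = 10, seed: int = 0) -> float:
    """Pooled accuracy: every instance is predicted by a model that never saw it."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_X = [x for i, x in enumerate(X) if i not in held_out]
        train_y = [c for i, c in enumerate(y) if i not in held_out]
        predict = train(train_X, train_y)
        correct += sum(predict(X[i]) == y[i] for i in test_idx)
    return correct / len(X)


def train_majority(X: list[list[float]], y: list[str]) -> Predictor:
    """Placeholder learner: always predicts the most frequent training label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label


# Toy data: 30 full NPs and 10 pronouns with a dummy feature.
y = ["full NP"] * 30 + ["pronoun"] * 10
X = [[0.0] for _ in y]
print(cross_validate(X, y, train_majority))  # 0.75
```

Swapping in a real learner only means passing a different `train` function. The fold logic stays the same.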

  • Results of machine learning on RefRhet 3 and MoRA

  • Non-categorical referential choice (Kibrik 1999)

    Referent activation, on a scale from min to max:
    - Cognitive plane: graded variable
    - Linguistic plane: binary variable

  • Non-categorical referential choice

    - In many instances, more than one referential option can be used
    - Referential choice is less than fully categorical (cf. Belz & Varges 2007, van Deemter et al. 2012: 173-179)
    - In instances of intermediate activation, both the original text author and the algorithm make a more or less random categorical decision on the linguistic plane
    - These decisions do not always coincide
    - Therefore, no model can predict the actual referential choice with 100% accuracy
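A toy simulation can illustrate the argument that no model can reach 100% accuracy. Every assumption here is made for illustration and is not a model from the study:

- Activation is drawn uniformly from [0, 1].
- Below a lower threshold, both author and model choose a full NP.
- Above an upper threshold, both choose a pronoun.
- In the intermediate zone, each makes an independent 50/50 choice.

Under these assumptions, agreement tops out at 1 - w/2, where w is the width of the intermediate zone.

```python
import random


def simulate_agreement(n: int, low: float, high: float, seed: int = 0) -> float:
    """Fraction of instances where the author's and the model's choices coincide.

    Hypothetical setup: activation ~ Uniform(0, 1); outside [low, high] the
    choice is categorical and both agree; inside, each picks independently 50/50.
    """
    rng = random.Random(seed)
    agree = 0
    for _ in range(n):
        activation = rng.random()
        if activation < low or activation > high:
            agree += 1  # categorical zone: same choice
        else:
            author_pronoun = rng.random() < 0.5
            model_pronoun = rng.random() < 0.5
            agree += author_pronoun == model_pronoun
    return agree / n


def expected_ceiling(low: float, high: float) -> float:
    """Analytic maximum agreement: intermediate cases agree only half the time."""
    return 1 - (high - low) / 2


print(f"{expected_ceiling(0.4, 0.6):.3f}")                 # 0.900
print(f"{simulate_agreement(100_000, 0.4, 0.6):.3f}")      # close to 0.900
```

The wider the intermediate zone, the lower the ceiling. This is the sense in which prediction results close to, but below, 100% can already be near the theoretical maximum.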

  • Experiment: Understanding (allegedly non-categorical) referential expressions

    - 9 texts in which the algorithms' predictions diverged from the original referential choice
    - 9 original texts (proper name) and 9 altered texts (pronoun), distributed between 2 experimental lists
    - 60 participants
    - 1 experimental question + 2 control questions
    - If the instances of divergence are explained by intermediate referent activation, accuracy on experimental questions should not be lower than accuracy on control questions*

  • Experiment: results*

    Accuracy by question type:
    - Control questions: 84%
    - Questions about proper names: 84%
    - Questions about pronouns: 75%
    - If we exclude questions #2 and #5, accuracy on questions about pronouns is 80%, not significantly different from control and proper-name questions
    - In general, the algorithm diverges from the original where this is acceptable, that is, where referent activation is intermediate

  • Non-categorical referential choice

    - Sometimes referential choice allows more than one option
    - A proper model of referential choice must account for this property of human speakers
    - Our modeling procedures actually conform to this requirement

  • Further studies

    - Explore logistic regression's ability to evaluate the certainty of prediction, and attempt to correlate that with:
      - humans' assessment of non-categorical referential choice
      - the theoretical notion of intermediate referent activation
    - Cheap data modeling
    - Secondary referential options, such as demonstrative descriptions
    - Genres and referential choice
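One natural way to read certainty off a logistic regression is to measure how far the predicted probability lies from 0.5. The sketch below does this in plain Python. The single feature (a distance to the antecedent), its weight, and the bias are hypothetical numbers chosen for illustration, not fitted values from the study. The certainty score is likewise just one possible choice.

```python
import math


def logistic(z: float) -> float:
    """Numerically stable logistic (sigmoid) function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_with_certainty(weights: list[float], bias: float,
                           features: list[float]) -> tuple[str, float, float]:
    """Return (label, P(pronoun), certainty in [0, 1])."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    p = logistic(z)
    label = "pronoun" if p >= 0.5 else "full NP"
    certainty = abs(p - 0.5) * 2  # 0 at p = 0.5, 1 at p = 0 or p = 1
    return label, p, certainty


# Hypothetical model: a larger distance to the antecedent favors a full NP.
WEIGHTS, BIAS = [-1.5], 3.0
for distance in (1, 2, 4):
    label, p, c = predict_with_certainty(WEIGHTS, BIAS, [distance])
    print(distance, label, round(p, 3), round(c, 3))
```

In this toy model, distance 2 gives p = 0.5 and certainty 0. Such low-certainty instances are the natural candidates to compare with human judgments and with intermediate referent activation.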

  • Conclusions

    - Multi-factorial approach
    - Corpus large enough for machine-learning modeling
    - Results of prediction close to the theoretical maximum
    - Account of the non-deterministic character of referential choice
    - This approach can be applied to a wide range of other linguistic choices

  • Thank you for your attention
