Classification of Discourse Coherence Relations: An Exploratory Study using Multiple Knowledge Sources
Ben Wellner†*, James Pustejovsky†, Catherine Havasi†, Anna Rumshisky† and Roser Saurí†
† Brandeis University
* The MITRE Corporation



Page 1

Classification of Discourse Coherence Relations: An Exploratory Study using Multiple Knowledge Sources

Ben Wellner†*, James Pustejovsky†, Catherine Havasi†, Anna Rumshisky† and Roser Saurí†

† Brandeis University
* The MITRE Corporation

Page 2

Outline of Talk

- Overview and Motivation for Modeling Discourse
- Background
- Objectives
- The Discourse GraphBank
  - Overview
  - Coherence Relations
  - Issues with the GraphBank
- Modeling Discourse
  - Machine learning approach
- Knowledge Sources and Features
- Experiments and Analysis
- Conclusions and Future Work

Page 3

Modeling Discourse: Motivation

Why model discourse?
- Dialogue
- General text understanding applications
  - Text summarization and generation
  - Information extraction
    - MUC Scenario Template Task

Discourse is vital for understanding how events are related.
Modeling discourse generally may aid specific extraction tasks.

Page 4

Background

Different approaches to discourse
- Semantics/formalisms: Hobbs [1985], Mann and Thompson [1987], Grosz and Sidner [1986], Asher [1993], others
Different objectives
- Informational vs. intentional, dialog vs. general text
Different inventories of discourse relations
- Coarse vs. fine-grained
Different representations
- Tree representation vs. graph

Same steps involved:
1. Identifying discourse segments
2. Grouping discourse segments into sequences
3. Identifying the presence of a relation
4. Identifying the type of the relation

Page 5

Discourse Steps #1*

Mary is in a bad mood because Fred played tuba while she was taking a nap.

1. Segment: A | B | C
2. Group
3. Connect segments: r1, r2
4. Relation Type: r1 = cause-effect; r2 = elaboration

* Example from [Danlos 2004]

Page 6

Discourse Steps #2*

Fred played the tuba. Next he prepared a pizza to please Mary.

1. Segment: A | B | C
2. Group
3. Connect segments: r1, r2
4. Relation Type: r1 = temporal precedence; r2 = cause-effect

* Example from [Danlos 2004]

Page 7

Objectives

Our main focus: Step 4, classifying discourse relations
- Important for all approaches to discourse
- Can be approached independently of representation
  - But relation types and structure are probably quite dependent
- Task will vary with the inventory of relation types

What types of knowledge/features are important for this task?

Can we apply the same approach to Step 3: identifying whether two segment groups are linked?

Page 8

Discourse GraphBank: Overview

Graph-based representation of discourse
- Tree representation inadequate: multiple parents, crossing dependencies
Discourse composed of clausal segments
- Segments can be grouped into sequences
- Relations need not exist between segments within a group
Coherence relations between segment groups
- Roughly those of Hobbs [1985]

Why GraphBank?
- Similar inventory of relations as SDRT
  - Linked to lexical representations
  - Semantics well-developed
- Includes non-local discourse links
- Existing annotated corpus, unexplored outside of [Wolf and Gibson, 2005]

Page 9

Resemblance Relations

Similarity (parallel):
  The first flight to Frankfurt this morning was delayed.
  The second flight arrived late as well.

Contrast:
  The first flight to Frankfurt this morning was delayed.
  The second flight arrived on time.

Example:
  There have been many previous missions to Mars.
  A famous example is the Pathfinder mission.

Elaboration*:
  A probe to Mars was launched from the Ukraine this week.
  The European-built “Mars Express” is scheduled to reach Mars by Dec.

Generalization:
  Two missions to Mars in 1999 failed.
  There are many missions to Mars that have failed.

* The elaboration relation is given one or more sub-types: organization, person, location, time, number, detail

Page 10

Causal, Temporal and Attribution Relations

Causal
  Cause-effect:
    There was bad weather at the airport
    and so our flight got delayed.
  Conditional:
    If the new software works,
    everyone should be happy.
  Violated Expectation:
    The new software worked great,
    but nobody was happy.

Temporal
  Precedence:
    First, John went grocery shopping.
    Then, he disappeared into a liquor store.

Attribution
  Attribution:
    John said that
    the weather would be nice tomorrow.
  Same:
    The economy, according to analysts, is expected to improve by early next year.

Page 11

Some Issues with GraphBank

Coherence relations
- Conflation of actual causation and intention/purpose:

  ?? John pushed the door to open it.  (cause)

Granularity
- Desirable for relations to hold between eventualities or entities, not necessarily entire clausal segments:

  The university spent $30,000 to upgrade lab equipment in 1987  (cause)

  the new policy came about after President Reagan’s historic decision in mid-December to reverse the policy of refusing to deal with members of the organization, long shunned as a band of terrorists. Reagan said PLO chairman Yasser Arafat had met US demands.  (elaboration)

Page 12

A Classifier-based Approach

For each pair of discourse segments, classify the relation type between them
- For segment pairs on which we know a relation exists

Advantages
- Include arbitrary knowledge sources as features
- Easier than implementing inference on top of semantic interpretations
- Robust performance
- Gain insight into how different knowledge sources contribute

Disadvantages
- Difficult to determine why mistakes happen

Maximum Entropy
- Commonly used discriminative classifier
- Allows for a high number of non-independent features
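The setup above can be sketched as a small maximum entropy (multinomial logistic regression) model over sparse binary features extracted from a segment pair. This is an illustrative re-implementation, not the authors' code: the plain-SGD trainer is an assumption (the talk used a Gaussian prior with variance 2.0, omitted here), and the feature strings merely follow the style of the later example slides.

```python
import math
from collections import defaultdict

class MaxEnt:
    """Multinomial logistic regression over sparse binary features."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.w = defaultdict(float)  # one weight per (label, feature) pair

    def probs(self, feats):
        # Log-linear score per label, softmax-normalized over the label set.
        scores = {y: sum(self.w[(y, f)] for f in feats) for y in self.labels}
        m = max(scores.values())
        exps = {y: math.exp(s - m) for y, s in scores.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

    def train(self, data, epochs=50, lr=0.5):
        # Plain SGD on the conditional log-likelihood; no regularization here.
        for _ in range(epochs):
            for feats, gold in data:
                p = self.probs(feats)
                for y in self.labels:
                    grad = (1.0 if y == gold else 0.0) - p[y]
                    for f in feats:
                        self.w[(y, f)] += lr * grad

    def classify(self, feats):
        p = self.probs(feats)
        return max(p, key=p.get)
```

In use, each training instance is simply a list of feature strings (from the feature classes on the following slides) paired with a gold relation label.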

Page 13

Knowledge Sources

- Proximity
- Cue Words
- Lexical Similarity
- Events
- Modality and Subordinating Relations
- Grammatical Relations
- Temporal Relations

Associate one or more feature classes with each knowledge source.

Page 14

Example

SEG2: The university spent $30,000
SEG1: to upgrade lab equipment in 1987

Page 15

Proximity

Motivation
- Some relations tend to be local, i.e. their arguments appear nearby in the text
  - Attribution, cause-effect, temporal precedence, violated expectation
- Other relations can span larger portions of text
  - Elaboration
  - Similar, contrast

Feature Class
Proximity:
- Whether segments are adjacent or not
- Directionality (which argument appears earlier in the text)
- Number of intervening segments
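The Proximity feature class can be sketched as below. Segment positions are assumed to be integer indices in document order; the distance thresholds follow the example features shown on later slides, while the `sent_of` mapping and the exact direction convention are hypothetical details not specified on the slide.

```python
def proximity_features(i, j, sent_of=None):
    """Binary proximity features for segments at document positions i and j."""
    feats = []
    dist = abs(i - j)
    if dist == 1:
        feats.append("adjacent")
    if dist < 3:
        feats.append("dist<3")
    if dist < 5:
        feats.append("dist<5")
    if j < i:
        # Assumed convention: the second argument appears earlier in the text.
        feats.append("direction-reverse")
    # sent_of (hypothetical) maps a segment index to its sentence index.
    if sent_of is not None and sent_of[i] == sent_of[j]:
        feats.append("same-sentence")
    return feats
```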

Page 16

Example

SEG2: The university spent $30,000
SEG1: to upgrade lab equipment in 1987

Fea. Class       Example Feature
Proximity        adjacent; dist<3; dist<5; direction-reverse; same-sentence

Page 17

Cue Words

Motivation:
- Many coherence relations are frequently identified by a discourse cue word or phrase: “therefore”, “but”, “in contrast”
- Cues are generally captured by the first word in a segment
  - Obviates enumerating all potential cue words
  - Non-traditional discourse markers (e.g. adverbials or even determiners) may indicate a preference for certain relation types

Feature Class
Cue Words:
- First word in each segment
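A minimal sketch of this feature class: the cue is approximated by the first word of each segment, so no cue-word list has to be enumerated. The feature-string format follows the worked example on the Example slides.

```python
def cue_word_features(seg1_tokens, seg2_tokens):
    """Approximate discourse cues by each segment's first token."""
    return [f"First1={seg1_tokens[0]}", f"First2={seg2_tokens[0]}"]
```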

Page 18

Example

SEG2: The university spent $30,000
SEG1: to upgrade lab equipment in 1987

Fea. Class       Example Feature
Proximity        adjacent; dist<3; dist<5; direction-reverse; same-sentence
Cue Words        First1=“to”; First2=“The”

Page 19

Lexical Coherence

Motivation:
- Identify lexical associations, lexical/semantic similarities
  - E.g. push/fall, crash/injure, lab/university

Brandeis Semantic Ontology (BSO)
- Taxonomy of types (i.e. senses)
- Includes qualia information for words
  - Telic (purpose), agentive (creation), constitutive (parts)

Word Sketch Engine (WSE)
- Similarity of words as measured by their contexts in a corpus (BNC)

Feature Class
BSO:
- Paths between words up to length 10
WSE:
- Number of word pairs with similarity > 0.05, > 0.01
- Segment similarities (sum of word-pair similarities / # words)
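The WSE features might be computed roughly as follows. Here `sim` is a hypothetical stand-in for the Word Sketch Engine's distributional word-pair similarity lookup, the thresholds come from the slide, and the threshold features are simplified to presence tests (the slide counts qualifying pairs); the feature name matches the example slides.

```python
def wse_features(seg1, seg2, sim):
    """WSE-style lexical-similarity features for two token lists."""
    sims = [sim.get((w1, w2), 0.0) for w1 in seg1 for w2 in seg2]
    feats = []
    if any(s > 0.05 for s in sims):
        feats.append("WSE>0.05")
    if any(s > 0.01 for s in sims):
        feats.append("WSE>0.01")
    # Segment similarity: sum of word-pair similarities over the number
    # of words considered (per the slide's description).
    seg_sim = sum(sims) / (len(seg1) + len(seg2))
    feats.append(f"WSE-sentence-similarity={seg_sim:.6f}")
    return feats
```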

Page 20

Example

SEG2: The university spent $30,000
SEG1: to upgrade lab equipment in 1987

Fea. Class       Example Feature
Proximity        adjacent; dist<3; dist<5; direction-reverse; same-sentence
Cue Words        First1=“to”; First2=“The”
BSO              Research Lab=>Educational Activity=>University
WSE              WSE>0.05; WSE-sentence-similarity=0.005417

Page 21

Events

Motivation:
- Certain events and event-pairs are indicative of certain relation types (e.g. “push”-“fall”: cause)
- Allow the learner to associate events and event-pairs with particular relation types

Evita: EVents In Text Analyzer
- Performs domain-independent identification of events
- Identifies all event-referring expressions (that can be temporally ordered)

Feature Class
Events:
- Event mentions in each segment
- Event mention pairs drawn from both segments
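A sketch of the Events feature class: event mentions per segment plus all cross-segment event pairs. Event detection itself (Evita in the talk) is assumed to have already run; here the detected event words are simply passed in as lists.

```python
def event_features(events1, events2):
    """Event-mention and cross-segment event-pair features."""
    feats = [f"Event1={e}" for e in events1]
    feats += [f"Event2={e}" for e in events2]
    feats += [f"event-pair={e1}-{e2}" for e1 in events1 for e2 in events2]
    return feats
```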

Page 22

Example

SEG2: The university spent $30,000
SEG1: to upgrade lab equipment in 1987

Fea. Class       Example Feature
Proximity        adjacent; dist<3; dist<5; direction-reverse; same-sentence
Cue Words        First1=“to”; First2=“The”
BSO              Research Lab=>Educational Activity=>University
WSE              WSE>0.05; WSE-sentence-similarity=0.005417
Events           Event1=“upgrade”; Event2=“spent”; event-pair=“upgrade-spent”

Page 23

Modality and Subordinating Relations

Motivation:
- Event modality and subordinating relations are indicative of certain relations

SlinkET [Saurí et al. 2006]
- Identifies subordinating contexts, classifying them as:
  - Factive, counter-factive, evidential, negative evidential, or modal
  - E.g. evidential => attribution relation
- Event class, polarity, tense, etc.

Feature Class
SlinkET:
- Event class, polarity, tense and modality of events in each segment
- Subordinating relations between event pairs

Page 24

Example

SEG2: The university spent $30,000
SEG1: to upgrade lab equipment in 1987

Fea. Class       Example Feature
Proximity        adjacent; dist<3; dist<5; direction-reverse; same-sentence
Cue Words        First1=“to”; First2=“The”
BSO              Research Lab=>Educational Activity=>University
WSE              WSE>0.05; WSE-sentence-similarity=0.005417
Events           Event1=“upgrade”; Event2=“spent”; event-pair=“upgrade-spent”
SlinkET          Class1=“occurrence”; Class2=“occurrence”; Tense1=“infinitive”; Tense2=“past”; modal-relation

Page 25

Cue Words and Events

Motivation
- Certain events (event types) are likely to appear in particular discourse contexts keyed by certain connectives
- Pairing connectives with events captures this more precisely than connectives or events on their own

Feature Class
CueWords + Events:
- First word of SEG1 and each event mention in SEG2
- First word of SEG2 and each event mention in SEG1

Page 26

Example

SEG2: The university spent $30,000
SEG1: to upgrade lab equipment in 1987

Fea. Class       Example Feature
Proximity        adjacent; dist<3; dist<5; direction-reverse; same-sentence
Cue Words        First1=“to”; First2=“The”
BSO              Research Lab=>Educational Activity=>University
WSE              WSE>0.05; WSE-sentence-similarity=0.005417
Events           Event1=“upgrade”; Event2=“spent”; event-pair=“upgrade-spent”
SlinkET          Class1=“occurrence”; Class2=“occurrence”; Tense1=“infinitive”; Tense2=“past”; modal-relation
CueWord+Events   First1=“to”-Event2=“spent”; First2=“The”-Event1=“upgrade”

Page 27

Grammatical Relations

Motivation:
- Certain intra-sentential relations are captured or ruled out by particular dependency relations between clausal headwords
- Identification of headwords is also important
  - Main events identified

RASP parser

Feature Class
Syntax:
- Grammatical relations between the two segments
- GR + SEG1 head word
- GR + SEG2 head word
- GR + both head words

Page 28

Example

SEG2: The university spent $30,000
SEG1: to upgrade lab equipment in 1987

Fea. Class       Example Feature
Proximity        adjacent; dist<3; dist<5; direction-reverse; same-sentence
Cue Words        First1=“to”; First2=“The”
BSO              Research Lab=>Educational Activity=>University
WSE              WSE>0.05; WSE-sentence-similarity=0.005417
Events           Event1=“upgrade”; Event2=“spent”; event-pair=“upgrade-spent”
SlinkET          Class1=“occurrence”; Class2=“occurrence”; Tense1=“infinitive”; Tense2=“past”; modal-relation
CueWord+Events   First1=“to”-Event2=“spent”; First2=“The”-Event1=“upgrade”
Syntax           Gr=“ncmod”; Gr=“ncmod”-Head1=“equipment”; Gr=“ncmod”-Head2=“spent”

Page 29

Temporal Relations

Motivation:
- Temporal ordering between events constrains possible coherence relations
  - E.g. E1 BEFORE E2 => NOT(E2 CAUSE E1)

Temporal Relation Classifier
- Trained on TimeBank 1.2 using MaxEnt
- See [Mani et al., “Machine Learning of Temporal Relations”, ACL 2006]

Feature Class
TLink:
- Temporal relations holding between segments
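The constraint on this slide can be sketched as a consistency check: a BEFORE ordering rules out a cause-effect relation in the opposite direction, since an effect cannot precede its cause. The tuple encodings below are illustrative; in the system the TLink label would come from the temporal relation classifier rather than being passed in directly.

```python
def consistent(tlink, coherence):
    """Check a TLink ('before', e1, e2) against a relation
    ('cause-effect', cause, effect); True means not contradictory."""
    if tlink[0] == "before" and coherence[0] == "cause-effect":
        # e1 BEFORE e2 is inconsistent with e2 causing e1.
        return not (coherence[1] == tlink[2] and coherence[2] == tlink[1])
    return True
```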

Page 30

Example

SEG2: The university spent $30,000
SEG1: to upgrade lab equipment in 1987

Fea. Class       Example Feature
Proximity        adjacent; dist<3; dist<5; direction-reverse; same-sentence
Cue Words        First1=“to”; First2=“The”
BSO              Research Lab=>Educational Activity=>University
WSE              WSE>0.05; WSE-sentence-similarity=0.005417
Events           Event1=“upgrade”; Event2=“spent”; event-pair=“upgrade-spent”
SlinkET          Class1=“occurrence”; Class2=“occurrence”; Tense1=“infinitive”; Tense2=“past”; modal-relation
CueWord+Events   First1=“to”-Event2=“spent”; First2=“The”-Event1=“upgrade”
Syntax           Gr=“ncmod”; Gr=“ncmod”-Head1=“equipment”; Gr=“ncmod”-Head2=“spent”
TLink            Seg2-before-Seg1

Page 31

Relation Classification

Identify
- Specific coherence relation
  - Ignoring elaboration subtypes (too sparse)
- Coarse-grained relation (resemblance, cause-effect, temporal, attributive)

Evaluation Methodology
- Used Maximum Entropy classifier (Gaussian prior variance = 2.0)
- 8-fold cross validation

- Specific relation accuracy: 81.06%
- Inter-annotator agreement: 94.6%
- Majority class baseline: 45.7%
  - Classifying all relations as elaboration
- Coarse-grained relation accuracy: 87.51%

Page 32

F-Measure Results

Relation              Precision  Recall  F-measure  # True positives
elaboration           88.72      95.31   91.90      512
attribution           91.14      95.10   93.09      184
similar (parallel)    71.89      83.33   77.19      132
same                  87.09      75.00   80.60      72
cause-effect          78.78      41.26   54.16      63
contrast              65.51      66.67   66.08      57
example               78.94      48.39   60.00      31
temporal precedence   50.00      20.83   29.41      24
violated expectation  33.33      16.67   22.22      12
conditional           45.45      62.50   52.63      8
generalization        0          0       0          0

Page 33

Results: Confusion Matrix

Rows: reference; columns: hypothesis

        elab  par  attr  ce  temp  contr  same  exmp  expv  cond  gen
elab    488   3    7     3   1     0      2     4     0     3     1
par     6     110  2     2   0     8      2     0     0     2     0
attr    4     0    175   0   0     1      2     0     1     1     0
ce      18    9    3     26  3     2      2     0     0     0     0
temp    6     8    2     0   5     3      0     0     0     0     0
contr   4     12   0     0   0     38     0     0     3     0     0
same    3     9    2     2   0     2      54    0     0     0     0
exmp    15    1    0     0   0     0      0     15    0     0     0
expv    3     1    1     0   1     4      0     0     2     0     0
cond    3     0    0     0   0     0      0     0     0     5     0
gen     0     0    0     0   0     0      0     0     0     0     0

Page 34

Feature Class Analysis

What is the utility of each feature class?
- Features overlap significantly and are highly correlated
How can we estimate utility?
- Independently
  - Start with the Proximity feature class (baseline)
  - Add each feature class separately
  - Determine improvement over baseline
- In combination with other features
  - Start with all features
  - Remove each feature class individually
  - Determine reduction from removal of the feature class
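The two ablation protocols above can be sketched as a simple loop. Here `evaluate` is a hypothetical stand-in for "train the classifier with these feature classes and return cross-validated accuracy"; the function names are not from the talk.

```python
def feature_class_analysis(all_classes, baseline, evaluate):
    """Score each feature class independently (added to a baseline)
    and in combination (removed from the full feature set)."""
    added, removed = {}, {}
    base_acc = evaluate({baseline})
    full_acc = evaluate(set(all_classes))
    for fc in all_classes:
        if fc != baseline:
            # Utility in isolation: improvement when added to the baseline.
            added[fc] = evaluate({baseline, fc}) - base_acc
        # Utility in combination: drop when removed from the full set.
        removed[fc] = full_acc - evaluate(set(all_classes) - {fc})
    return added, removed
```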

Page 35

Feature Class Analysis Results

Feature Class Contributions in Isolation
Feature Class     Accuracy  Coarse-grain Acc.
Proximity         60.08%    69.43%
+ Cuewords        76.77%    83.50%
+ BSO             62.92%    74.40%
+ WSE             62.20%    70.10%
+ Events          63.84%    78.16%
+ SlinkET         69.00%    75.91%
+ CueWord/Event   67.18%    78.63%
+ Syntax          70.30%    80.84%
+ TLink           64.19%    72.30%

Feature Class Contributions in Conjunction
Feature Class     Accuracy  Coarse-grain Acc.
All Features      81.06%    87.51%
- Proximity       71.52%    84.88%
- Cuewords        75.71%    84.69%
- BSO             80.65%    87.04%
- WSE             80.26%    87.14%
- Events          80.90%    86.92%
- SlinkET         79.68%    86.89%
- CueWord/Event   80.41%    87.14%
- Syntax          80.20%    86.89%
- TLink           80.30%    87.36%

Page 36

Relation IdentificationRelation Identification

Given:
- Discourse segments (and segment sequences)

Identify:
- For each pair of segments, whether a relation (any relation) exists on those segments

Two issues:
- Highly skewed classification: many negatives, few positives
- Many of the relations are transitive; these aren't annotated and will be false negative instances
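The candidate-generation step behind the skew can be sketched as below. This is an illustration, not the authors' code: `segments` is assumed to be an ordered list of segment identifiers and `gold` a set of index pairs carrying an annotated relation; every other pair becomes a negative instance, including unannotated transitive links.

```python
from itertools import combinations

def make_instances(segments, gold):
    """Build (pair, label) training instances for relation identification.

    Every unordered segment pair is a candidate; pairs not in the gold
    annotation are labeled 0, which is why negatives dominate and why
    unannotated transitive relations surface as false negatives.
    """
    instances = []
    for i, j in combinations(range(len(segments)), 2):
        label = 1 if (i, j) in gold else 0
        instances.append(((segments[i], segments[j]), label))
    return instances
```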


Relation Identification Results

For all pairs of segment sequences in a document:
- Used the same features as for classification
- Achieved accuracy only slightly above the majority-class baseline

For segment pairs in the same sentence:
- Accuracy: 70.04% (baseline 58%)

Identification and classification in the same sentence:
- Accuracy: 64.53% (baseline 58%)


Inter-relation Dependencies

Each relation shouldn't be identified in isolation:
- When identifying a relation between s_i and s_j, consider the other relations involving s_i and s_j:

  { R(s_k, s_i) | k ≠ j } and { R(s_j, s_l) | l ≠ i }

- Include as features the other (gold-standard true) relation types both segments are involved in
  - Adding this feature class improves performance to 82.3% (a 6.3% error reduction)

Indicates room for improvement with:
- Collective classification (where outputs influence each other)
- Incorporating explicit modeling constraints
  - Tree-based parsing model
  - Constrained DAGs [Danlos 2004]
- Including and deducing transitive links may help further
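One way to realize the feature sets { R(s_k, s_i) | k ≠ j } and { R(s_j, s_l) | l ≠ i } as classifier features is sketched below. The representation is hypothetical: `relations` is assumed to map ordered segment-index pairs to gold relation-type strings, and the feature-name strings are placeholders.

```python
def inter_relation_features(i, j, relations):
    """Collect the gold relation types of other relations touching s_i or s_j.

    For the candidate pair (s_i, s_j): relations R(s_k, s_i) with k != j,
    and relations R(s_j, s_l) with l != i, become string-valued features.
    """
    feats = []
    for (a, b), rtype in relations.items():
        if b == i and a != j:          # R(s_k, s_i), k != j
            feats.append(f"other-rel-with-si={rtype}")
        if a == j and b != i:          # R(s_j, s_l), l != i
            feats.append(f"other-rel-with-sj={rtype}")
    return feats
```

At test time the gold relations are of course unavailable, which is exactly why the slide points toward collective classification, where the model's own outputs fill this role.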


Conclusions
- A classification approach with many features achieves good performance at classifying coherence relation types
- All feature classes are helpful, but:
  - The discriminative power of most individual feature classes is captured by the union of the remaining feature classes
  - Proximity + CueWords achieves 76.77%; the remaining features reduce error by 23.7%
- The classification approach performs less well on the task of identifying the presence of a relation, using the same features as for classifying coherence relation types
  - "Parsing" may prove better for local relationships


Future Work
- Additional linguistic analysis
  - Co-reference: both entities and events
  - Word sense: lexical similarity confounded with multiple types for a lexeme
- Pipelined or 'stacked' architecture
  - Classify the coarse-grained category first, then the specific coherence relation
  - Justification: different categories require different types of knowledge
- Relational classification
  - Model decisions collectively
  - Include constraints on structure
- Investigate transitivity of resemblance relations
- Consider other approaches for identification of relations


Questions?


Backup Slides


GraphBank Annotation Statistics

Corpus and annotator statistics:
- 135 doubly annotated newswire articles
- Identifying discourse segments had high agreement (> 90% from a pilot study of 10 documents)
  - Corpus segments were ultimately annotated once (by both annotators together)
- Segment grouping: Kappa 0.8424
- Relation identification and typing: Kappa 0.8355
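For reference, the agreement measure quoted above is Cohen's kappa, which corrects observed agreement for chance agreement. A minimal sketch for two annotators over nominal labels (not the authors' implementation):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_o is observed
    agreement and p_e is expected agreement from marginal label counts."""
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```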


Factors Involved in Identifying Coherence Relations
- Proximity
  - E.g. attribution is local, elaboration non-local
- Lexical and phrasal cues
  - Constrain possible relation types
  - "But" => 'contrast', 'expected violation'
  - "And" => 'elaboration', 'similar', 'contrast'
- Co-reference
  - Coherence established with references to mentioned entities/events
- Argument structure
  - E.g. similar => similar/same event and/or participants
- Lexical knowledge
  - Type inclusion, word sense
  - Qualia (purpose of an object, resulting state of an action), event structure
  - Paraphrases: delay => arrive late
- World knowledge
  - E.g. Ukraine is part of Europe


Architecture

[Architecture diagram: a Pre-processing stage feeds Knowledge Source 1 … Knowledge Source n; a Feature Constructor combines their outputs; Training produces a Model, which is used for Prediction to yield Classifications.]
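The feature-construction step of the diagram above can be sketched as a simple composition over independent knowledge sources. All names here are placeholders; each source is assumed to map an instance to a list of string features, mirroring how the slides describe feature classes being combined.

```python
def build_features(instance, knowledge_sources):
    """Concatenate the feature lists produced by each knowledge source.

    Each knowledge source is a callable taking an instance (e.g. a pair
    of discourse segments) and returning a list of features.
    """
    feats = []
    for ks in knowledge_sources:
        feats.extend(ks(instance))
    return feats
```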


Scenario Extraction: MUC

Pull together relevant facts related to a "complex event":
- Management succession
- Mergers and acquisitions
- Natural disasters
- Satellite launches

Requires identifying relations between events:
- Parallel, cause-effect, elaboration
- Also: identity, part-of

Hypothesis:
- Task-independent identification of discourse relations will allow rapid development of scenario extraction systems


Information Extraction: Current

[Diagram: a shared Pre-process stage feeds separate domains (Domain 1, Domain 2, … Domain N), each with its own per-domain tasks (Task 1.1 … Task 1.N, Task 2.1 … Task 2.N) performing fact extraction and scenario extraction.]


Information Extraction: Future

[Diagram, truncated: Pre-process → Fact Extraction → Discourse …]