Experiments with Annotating Discourse Relations in the Hindi Discourse Relation Bank (HDRB)

Umangi Oza†, Rashmi Prasad‡, Sudheer Kolachina†, Suman Meena§, Dipti Misra Sharma†, Aravind Joshi‡

† Language Technologies Research Center, International Institute of Information Technology, Hyderabad, India
§ Center for Language, Literature and Cultural Studies, Jawaharlal Nehru University, New Delhi, India
‡ Institute for Research in Cognitive Science / Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA
Slide 2
December 2009, ICON. HDRB, Umangi et al.

Introduction: Why Discourse?
- For many NLP applications, such as Question-Answering, Text Summarization, and Language Generation, sentence-level analysis derived from an annotated corpus is insufficient, e.g.:
  - Penn Treebank (PTB) (Marcus et al., 1993)
  - Propbank (Palmer et al., 2005)
- Need for discourse-level information
  - Penn Discourse Treebank (Prasad et al., 2008)
  - Annotation over the same WSJ raw corpus as PTB and Propbank has resulted in an enriched annotated resource
  - The browser allows the viewing of both annotations
Slide 3
Penn Discourse TreeBank (PDTB) (Prasad et al., 2008)
- Large-scale corpus of lexically-grounded annotations of discourse relations between abstract objects (AOs)
  - Discourse relations: cause, contrast, elaboration, etc.
  - Abstract objects: eventualities and propositions (Asher, 1993)
- Discourse relation triggers:
  - Explicit connectives: closed-class expressions from well-defined grammatical classes
  - Alternative Lexicalizations (AltLex): expressions not definable as explicit connectives
  - Implicit connectives: inferred relations for which connectives are inserted
Slide 4
PDTB
- When no discourse relation can be inferred:
  - EntRel (an entity-based coherence relation)
  - NoRel (no discourse relation)
- The abstract object arguments of a discourse relation are called Arg1 and Arg2
  - Arg2 goes with the clause/AO in which the connective occurs
- Minimality Principle: select only as much of the argument text span as is minimally necessary to interpret the relation
- Sense Annotation: each relation is assigned a sense label based on a hierarchical sense classification scheme
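The pieces of a PDTB-style annotation listed above (relation type, the two arguments, an optional connective, and a sense label) can be pictured as a small record. The sketch below is only an illustration: the class name `Relation`, its field names, and the rule that EntRel and NoRel carry no sense label are assumptions made for this example, not the PDTB or HDRB file format.

```python
from dataclasses import dataclass
from typing import Optional

# Relation types named on the slides; the string spellings are illustrative.
RELATION_TYPES = {"Explicit", "Implicit", "AltLex", "EntRel", "NoRel"}


@dataclass
class Relation:
    """Hypothetical record for one annotated relation."""
    rel_type: str
    arg1: str
    arg2: str
    connective: Optional[str] = None  # explicit, inserted, or AltLex expression
    sense: Optional[str] = None       # label from the sense hierarchy

    def __post_init__(self):
        if self.rel_type not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {self.rel_type}")
        # Assumption: EntRel and NoRel mark the absence of a discourse
        # relation, so they are given no sense label here.
        if self.rel_type in {"EntRel", "NoRel"} and self.sense is not None:
            raise ValueError("EntRel/NoRel take no sense label")


example = Relation(
    rel_type="Explicit",
    arg1="By most measures, the nation's industrial sector is now growing very slowly",
    arg2="many economists aren't predicting that the economy is about to slip into recession",
    connective="Yet",
    sense="Concession",
)
```

The Minimality Principle is not encoded here; it constrains how an annotator chooses the `arg1` and `arg2` spans rather than how they are stored.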
Slide 5
PDTB examples
Convention: Arg1 in [ ] and Arg2 in { }

- Explicit Connective:
  [By most measures, the nation's industrial sector is now growing very slowly]. Factory payrolls fell in September. So did the Federal Reserve Board's industrial production index. Yet, {many economists aren't predicting that the economy is about to slip into recession.} (sense: Concession)
- AltLex:
  [Under a post-1987 crash reform, the Chicago Mercantile Exchange wouldn't permit the December S&P futures to fall further than 12 points for an hour.] {That caused a brief period of panic selling of stocks on the Big Board.} (sense: Result)
- Implicit Connective:
  [The voters, as well as numerous Latin American and East European countries that hope to adopt the Spanish model, are supporting the direction Spain is taking.] IMPLICIT=SO {It would be sad for Mr. Gonzalez to abandon them to appease his foes.} (sense: Result)
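The bracket convention used in these examples (Arg1 in square brackets, Arg2 in curly braces) is simple enough to read mechanically. The following is a minimal sketch, assuming arguments are never nested; the function name `split_args` is made up for this illustration and is not part of any PDTB or HDRB tool.

```python
import re


def split_args(annotated: str):
    """Return (arg1, arg2) from text that marks Arg1 as [...] and Arg2 as {...}.

    Discontinuous arguments (several bracketed pieces) are joined with a space.
    Assumes brackets are balanced and not nested.
    """
    arg1_parts = re.findall(r"\[([^\]]*)\]", annotated)
    arg2_parts = re.findall(r"\{([^}]*)\}", annotated)
    arg1 = " ".join(part.strip() for part in arg1_parts)
    arg2 = " ".join(part.strip() for part in arg2_parts)
    return arg1, arg2


text = "[There are many groups in the Sangh] but {there is just one ideology.}"
arg1, arg2 = split_args(text)
```

Text outside both kinds of brackets, such as the connective "but" or intervening sentences, is simply ignored by this sketch.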
Slide 6
Hindi Discourse Relation Bank (HDRB)
- HDRB aims at creating a large-scale annotated corpus of discourse relations in Hindi texts, following the PDTB approach.
- Corpus:
  - A 200K corpus drawn from a 400K corpus on which Hindi syntactic dependency annotation is being conducted independently (Begum et al., 2008)
  - Multi-domain newspaper corpus
- Other cross-linguistic discourse annotation projects:
  - Chinese (Xue, 2005)
  - Czech (Mladova et al., 2008)
  - Turkish (Zeyrek and Webber, 2008)
Slide 7
Syntactic Classes of Explicit Connectives
Explicit connectives are closed-class expressions drawn from a set of well-defined grammatical classes:
- Subordinating conjunctions
- Coordinating conjunctions
- Adverbials
- Pied-piped sentential relativizers
- Subordinators
- Particles
Slide 8
Subordinating Conjunctions
- Lexical items conjoining finite adverbial clauses to their matrix clause
- Typically occur clause-initially
- Both single forms (e.g., the connective glossed "because") and paired forms (e.g., "if ... then")
- Examples (English glosses):
  (Cause) [Today the lamp has been lit] because {it is my birthday}.
  (Conditional) If [one were to ask you to quit taking salt] then {even you would not quit}.
Slide 9
Coordinating Conjunctions
- Lexical items conjoining clauses or phrases of the same syntactic status
- Occur clause-initially, e.g., the connectives glossed "and" and "but"
- Single as well as paired forms, e.g., "not only ... but also"
- Example (English gloss):
  (Concession) [There are many groups in the Sangh] but {there is just one ideology.}
Slide 10
Adverbials
- Adverbial and prepositional phrases claimed to function as anaphoric discourse connectives (Webber et al., 2003)
- Some examples are the connectives glossed "so", "then", "otherwise", "in fact", "just then", "in addition to this", etc.
- Example (English gloss):
  (Expansion) [The coastal vegetation on the west coast of the Andaman has been completely destroyed due to wild waves]. In addition, {the coral reefs have also been damaged}.
Slide 11
Pied-piped Sentential Relativizers
- Pied-piped relative phrases that conjoin a relative clause with the predication of its matrix clause (rather than some NP)
- Examples are the connectives glossed "so that" and "because of which"
- Example (English gloss):
  (Cause) [Dropping all his work, he picked up the bird and ran towards the dispensary] so that {it could be given proper treatment}.
- The relative pronoun modifies the event expressed in the matrix clause
Slide 12
Subordinators
- Postpositions, verbal participles, and suffixes that introduce non-finite clauses
- Examples (English glosses):
  (Succession) After {Baa left} [he called the boy to him].
  (Synchronous) ... while [playing] {he forgets that if his friend too didn't let him touch his toy, then he would feel very bad too}.
Slide 13
Particles
- Certain particles can function as discourse connectives in Hindi
- Example (English gloss):
  (Conjunction) [People see this as a consequence of the improving relation between the two countries]. {The Kashmiris are} also {learning a political lesson from this}.
- Only instances where the particle indicates the inclusion of verbs are taken as discourse connectives (compare: "He didn't eat anything.")
Slide 14
Arguments of Discourse Relations
- In PDTB, Arg2 is the argument syntactically associated with the connective, and Arg1 is the other argument.
- In HDRB, argument naming is based on the sense of the relation.
  - Each relation definition specifies its own convention for argument naming.
  - E.g., in the cause relation, one argument is the cause and the other is the effect. HDRB convention: Arg1 = effect; Arg2 = cause.
- Advantages of the semantic naming scheme: it is more meaningful, and it simplifies the sense classification hierarchy.
Slide 15
Arguments of Discourse Relations
- Cause after effect, hence Arg1-Arg2:
  (Cause) After the competition, Sonal said that [when her name was announced as the winner, she could not believe herself for some time], because {she was thinking that the competition was fixed}.
- Cause before effect, hence Arg2-Arg1:
  (Cause) Fashion designers say that the most prevalent thefts or copies are of monopoly designs. {Designers know this fact very well} so [it does not matter to them many times].
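The difference between the two naming conventions can be made concrete with a small sketch. It assumes each argument span has already been tagged with its semantic role ("cause" or "effect") and that we know which span hosts the connective; the function names, the role strings, and the mapping table are illustrative assumptions, not part of the HDRB guidelines.

```python
# HDRB convention for the cause relation as stated on the slides:
# Arg1 = effect, Arg2 = cause. The dictionary layout is an assumption.
SEMANTIC_ROLE_TO_LABEL = {
    "Cause": {"effect": "Arg1", "cause": "Arg2"},
}


def hdrb_labels(sense, spans):
    """Label spans by semantic role. `spans` is a list of (role, text)
    pairs in linear text order."""
    mapping = SEMANTIC_ROLE_TO_LABEL[sense]
    return [(mapping[role], text) for role, text in spans]


def pdtb_labels(spans, connective_host):
    """Label spans syntactically: the span hosting the connective is Arg2,
    the other is Arg1. `connective_host` is an index into `spans`."""
    return [
        ("Arg2" if i == connective_host else "Arg1", text)
        for i, (_role, text) in enumerate(spans)
    ]


# Cause-before-effect example; the connective "so" attaches to the second span.
spans = [
    ("cause", "Designers know this fact very well"),
    ("effect", "it does not matter to them many times"),
]
hdrb = hdrb_labels("Cause", spans)
pdtb = pdtb_labels(spans, connective_host=1)
```

Under the semantic convention the linear order comes out as Arg2-Arg1, while the syntactic convention yields Arg1-Arg2 for the same text.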
Slide 16
Implicit Discourse Relations
For adjacent sentences not related by an explicit connective, four possibilities are considered in order:
(1) Infer a discourse relation and insert an implicit connective between the sentences.
    (Causal) {All the players in this game are greater than even Sachin Tendulkar} IMPLICIT=so [it is not possible for anyone to get them clean bowled.]
(2) If a relation is inferred but insertion of a connective leads to redundancy, find and annotate an alternative lexicalization (AltLex) of the relation.
    {Bangladesh's judiciary has seen an improvement}. That is why (AltLex) [India has decided to participate in the conference.]
Slide 17
Other Relations
(3) If no discourse relation is inferred but coherence results from an entity-based relation, annotate the relation as EntRel.
    [Prakash Jha's latest film Apaharan will be premiered at the film festival.] EntRel {This is Jha's second film on a different subject after Gangajal.}
(4) If no discourse relation or EntRel is perceived, annotate the relation as NoRel.
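The four possibilities for adjacent sentences without an explicit connective are tried in a fixed order, which can be sketched as a short decision function. The three inputs stand for the annotator's judgments; the function and parameter names are hypothetical, and nothing here automates the judgments themselves.

```python
from typing import Optional


def classify_adjacent_pair(
    inferred_sense: Optional[str],
    connective_is_redundant: bool,
    entity_based_coherence: bool,
) -> str:
    """Return the relation type for an adjacent sentence pair that has no
    explicit connective, following the ordered steps (1) to (4)."""
    if inferred_sense is not None:
        if not connective_is_redundant:
            return "Implicit"  # (1) insert an implicit connective
        return "AltLex"        # (2) relation already lexicalized in the text
    if entity_based_coherence:
        return "EntRel"        # (3) entity-based coherence only
    return "NoRel"             # (4) nothing perceived


tendulkar = classify_adjacent_pair("Cause", False, False)
bangladesh = classify_adjacent_pair("Cause", True, False)
apaharan = classify_adjacent_pair(None, False, True)
```

Because the checks are ordered, a pair with both an inferable discourse relation and shared entities is never labeled EntRel in this sketch.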
Slide 18
HDRB Sense Classification (adapted from PDTB)
Slide 19
Results
- Annotated 35 texts from the HDRB corpus
- Total of 602 relations annotated (both explicit and implicit)
- Overall distribution of relation types
- Comparison with PDTB distributions (Prasad et al., 2008)
Slide 20
Types and Tokens of Discourse Relations
- Lexical strategies are employed as often as morphological marking
- A design difference between the two projects (implicit relations are annotated between all adjacent sentences in HDRB, unlike in PDTB) is probably the reason for the different relative proportions of explicit and implicit relations across the two corpora
- The higher proportion of AltLex compared to PDTB suggests that Hindi makes greater use of cohesive strategies to link with the prior discourse
Slide 21
Senses of Discourse Relations
- Sense distributions are similar cross-linguistically
- Expansion and Contingency relations are less likely to be explicit than Comparison and Temporal relations
Slide 22
Additional Exploration of Discourse Adverbials
- Discourse adverbials are argued to be anaphoric (Webber et al., 2003), so their arguments may be harder to identify than those of other types of connectives
- Investigated the distributions of two discourse adverbials to explore the extent of difficulty in resolving their arguments:
  - a contrastive adverbial, glossed "nevertheless"
  - a conjunctive adverbial, glossed "in addition"
- Observations:
  - "in addition" had non-adjacent LHArgs in 16% of cases
  - "nevertheless" always took adjacent arguments
- Thus, despite their anaphoric properties, some discourse adverbials seem to be more constrained than others, and therefore easier to resolve
Slide 23
Summary
- Adapting the PDTB scheme to Hindi discourse annotation led to:
  - Identification of syntactic categories for explicit connectives that appear to be more frequent than in English (Particles, Sentential Relatives)
  - A more meaningful and simplified sense classification hierarchy
- Some of our observations from the initial annotations:
  - The correlation between the use of cohesive strategies and the morphological richness of a language is not completely settled by our annotations so far. Perhaps the study of languages with morphology richer than Hindi may shed further light on this issue.
  - Sense distributions in PDTB and HDRB were similar, consistent with the expectation of no cross-linguistic semantic differences
  - Annotation of discourse adverbials shows further evidence of the locality of arguments, which can significantly benefit anaphora resolution for connectives
Slide 24
Thank you
Slide 25
Questions?
Slide 26
Back-up slides
Slide 27
Subordinators
(11) Upon {hearing Baa's words}, [Gandhiji felt very ashamed].
- Some instances of subordinators are not discourse connectives, such as when they denote the manner of an action (Ex. 12)
(12) [Lit.] He caught Baa's hand and took her to the door by dragging her.
- Preliminary annotation experiments suggest that distinguishing the discourse and non-discourse usage of subordinators is a difficult task
- They will be annotated in a later phase of the project
Slide 28
Arguments of Discourse Relations
- In PDTB, the assignment of Arg1/Arg2 labels is syntactically driven
- In HDRB, it is semantically driven, i.e., based on the sense of the relation to which the arguments belong
- In examples 15 and 16, both relations have the sense "cause"
  - Cause sense definition: one cause and one effect
- In 15, the cause comes after the effect, hence Arg1-Arg2
(15) After the competition, Sonal said that [when her name was announced as the winner, she could not believe herself for some time], because {she was thinking that the competition was fixed}.
Slide 29
Arguments of Discourse Relations
- In 16, the cause comes before the effect (Arg2-Arg1)
(16) Fashion designers say that the most prevalent thefts or copies are of monopoly designs. {Designers know this fact very well} so [it does not matter to them many times].
- According to the PDTB convention, both would have been Arg1-Arg2 (syntactic argument order)
- The semantics-based convention has the added advantage of simplifying the sense classification scheme
Slide 30
Implicit Discourse Relations
- If a relation can be inferred between sentences, an implicit connective is inserted
- Insertable connectives are drawn primarily from the list of explicit connectives, but can include others too
(17) {All the players in this game are greater than even Sachin Tendulkar} IMPLICIT=so [it is not possible for anyone to get them clean bowled.]
- In this example, an implicit connective expressing a causal relation is inserted
Slide 31
Implicit Discourse Relations
- If a discourse relation can be inferred but insertion of a connective leads to redundancy in the expression of the relation, this suggests that the second sentence of the pair contains an alternatively lexicalized non-connective expression: AltLex
- AltLex is not a closed-class element
(18) {Bangladesh's judiciary has seen an improvement}. That is why (AltLex) [India has decided to participate in the conference.]
Slide 32
Implicit Discourse Relations
- If no discourse relation can be inferred, identify an entity-based relation (EntRel) across the two sentences
- The second sentence provides further description of an entity (or entities) from the previous sentence
(19) [Prakash Jha's latest film Apaharan will be premiered at the film festival.] EntRel {This is Jha's second film on a different subject after Gangajal.}
- The only purpose of the second sentence is to provide additional information about Jha's second film
- If there is neither a discourse relation nor an EntRel, then NoRel (no relation)
Slide 33
Points of Departure from PDTB Sense Scheme
- Elimination of senses that exist due to syntactic argument-naming conventions (as per the HDRB semantic naming convention for arguments)
- Restricted back-offs in the sense hierarchy (PDTB allowed back-offs up to the top level, but the HDRB position is that top-level senses are too coarse-grained to be useful; thus, back-offs are allowed only to the second level)
- Uniform treatment of pragmatic relations, with more refined senses (epistemic, speech-act, propositional (Sweetser, 1990))
- Addition of the Goal sense (it was included as a causal relation in PDTB)
Slide 34
Elimination of argument-specific labels
- While some of the subtype distinctions represent the arguments' relative semantic roles, others are refinements of the relation's semantics
- Furthermore, it was observed that the distinctions expressed by the subtype labels were related to variation in the linear order of the arguments
  - E.g., the reason and result subtypes under Contingency.cause: reason (cause after effect) and result (cause before effect)
- In HDRB, the assignment of argument labels is semantically driven; therefore, these subtype labels are eliminated to avoid inconsistencies
Slide 35
Restricted back-offs
- In PDTB, annotators were allowed to back off to higher levels in the hierarchy when they found it difficult to identify the more refined senses at the lower levels
  - For example, Comparison at the class level instead of Comparison.Contrast or Comparison.Concession, in case of ambiguity between Contrast and Concession
- In HDRB, such back-offs are restricted to the type level
  - Class-level senses are too coarse-grained to be useful
  - This guideline is consistent with the fact that argument ordering specifications are provided at the type level
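The back-off restriction amounts to a minimum depth on the sense label. The sketch below assumes labels are written as dot-separated paths (Class.Type, optionally followed by a subtype), as in "Comparison.Contrast"; the function names and this string encoding are assumptions for illustration only.

```python
def sense_depth(label: str) -> int:
    """Number of levels in a dot-separated sense label, e.g. 2 for
    'Comparison.Contrast'. Empty components make the label invalid (0)."""
    parts = label.split(".")
    return len(parts) if all(parts) else 0


def allowed_in_pdtb(label: str) -> bool:
    # PDTB permits backing off as far as the class (top) level.
    return sense_depth(label) >= 1


def allowed_in_hdrb(label: str) -> bool:
    # HDRB permits backing off only as far as the type (second) level.
    return sense_depth(label) >= 2


class_level = "Comparison"
type_level = "Comparison.Contrast"
```

With this encoding, an annotator unsure between Contrast and Concession could record "Comparison" under the PDTB rule but would have to commit to one of the two types under the HDRB rule.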
Slide 36
Uniform treatment of pragmatic relations
- Pragmatic relations in HDRB are based broadly on the distinction made in PDTB between semantic and pragmatic relations (Sanders et al., 1992)
- Discourse relations are viewed as:
  - Semantic when they relate the propositional content of the arguments
  - Pragmatic when the relation has to be inferred from the propositional content of the arguments
- In HDRB, the PDTB pragmatic senses are replaced with a uniform three-way classification: epistemic, speech-act, and propositional
Slide 37
Uniform treatment of pragmatic relations
- Epistemic and speech-act inferences are based on Sweetser's (Sweetser, 1990) analysis of polysemous connectives in terms of conceptual domains
- An epistemic interpretation obtains when the relation involves a conclusion (expressed in one argument) based on some observation (expressed in the other argument)
  - John loved Mary, because he came back.
- A speech-act interpretation obtains when the relation is between a speech act and the speaker's justification for performing it
  - What are you doing tonight, because there's a good movie on.
- In both, the relation is a pragmatic one, since it involves the inference of a modality, either epistemic (e.g., conclude(speaker, X)) or speech-act (e.g., ask(speaker, X)), that takes scope over the propositional content of one of the arguments (X)
Slide 38
Uniform treatment of pragmatic relations
- Propositional inference involves the inference of a complete proposition
- The relation is taken to hold between this inferred proposition and the propositional content of one of the arguments
- The example below illustrates pragmatic concession of the propositional subtype
(20) [One of the drivers denied his involvement in the issue in spite of his knowledge about the weapons]. But {the court said that had he informed the police on time, the blast could have been prevented}.
Slide 39
The Goal sense
- Under the Contingency class, a new type, Goal, has been added
- It applies to relations where the situation described in one of the arguments is the goal of the situation described in the other argument (which enables the achievement of the goal)
- The argument describing the goal is marked as Arg2, and the other argument is marked as Arg1 (Ex. 21)
(21) Subhash has alleged that [the RJD chief wants to give a ticket to Rana] so that {he does not become a government witness in the fodder scam trial}.
- In PDTB, goal is subsumed by the result subtype
- Distinguishing between cause and goal has important consequences, for example, in the way questions are formulated over the relation
Slide 40
Additional Exploration of Discourse Adverbials
(22) [The coastal vegetation on the west coast of the Andaman has been completely destroyed due to wild waves]. In addition, {the coral reefs have also been damaged}.
(23) [Raha was avoiding the formalities from the beginning itself.] Of all the oil PSUs, ONGC is the only company which has not even signed the agreement on the profit, loss, etc. of this fiscal year. In addition, {Raha had also refused to participate in the PSU review meeting}.
- Examples 22 and 23 illustrate adjacent and non-adjacent arguments of isalAvA, respectively
Slide 41
References
Florian Wolf and Edward Gibson. 2005. Representing Discourse Coherence: A Corpus-Based Study. Computational Linguistics, 31(2):249-288.
Lynn Carlson, Daniel Marcu, and Mary Ellen Okurowski. 2001. Building a Discourse-Tagged Corpus in the Framework of Rhetorical Structure Theory. In Proceedings of the 2nd SIGDIAL Workshop on Discourse and Dialogue, Eurospeech 2001, Denmark, September 2001.
Nicholas Asher. 1993. Reference to Abstract Objects in Discourse. Kluwer, Dordrecht.
Raya Begum, Samar Husain, Arun Dhwaj, Dipti Mishra Sharma, Lakshmi Bai, and Rajeev Sangal. 2008. Dependency annotation scheme for Indian languages. In Proceedings of IJCNLP-2008.
Alistair Knott. 1996. A Data-driven Methodology for Motivating a Set of Coherence Relations. Ph.D. thesis, Department of Artificial Intelligence, University of Edinburgh.
Slide 42
References
Yamuna Kachru. 2006. Hindi. John Benjamins Publishing Co., Amsterdam.
James R. Martin. 1992. English text: System and structure. Benjamins, Amsterdam.
Lucie Mladova, Sarka Zikanova, and Eva Hajicova. 2008. From sentence to discourse: Building an annotation scheme for discourse based on Prague dependency treebank. In Proceedings of LREC-2008.
Rashmi Prasad, Nikhil Dinesh, Alan Lee, Aravind Joshi, and Bonnie Webber. 2007. Attribution and its annotation in the Penn discourse treebank. Traitement Automatique des Langues, Special Issue on Computational Approaches to Document and Discourse, 47(2).
Rashmi Prasad, Nikhil Dinesh, Alan Lee, Eleni Miltsakaki, Livio Robaldo, Aravind Joshi, and Bonnie Webber. 2008. The Penn Discourse Treebank 2.0. In Proceedings of LREC-2008.
Chaturbhuj Sahay. 2007. Hindi Padvigyaan. Kumar Prakashan, Agra.
Slide 43
References
Ted J. M. Sanders, Wilbert P. M. Spooren, and Leo G. M. Noordman. 1992. Toward a taxonomy of coherence relations. Discourse Processes, 15:1-35.
Eve Sweetser. 1990. From etymology to pragmatics: Metaphorical and cultural aspects of semantic structure. Cambridge University Press.
Bonnie Webber and Aravind Joshi. 1998. Anchoring a lexicalized tree-adjoining grammar for discourse. In Proceedings of the ACL/COLING Workshop on Discourse Relations and Discourse Markers.
Bonnie Webber, Aravind Joshi, Matthew Stone, and Alistair Knott. 2003. Anaphora and discourse structure. Computational Linguistics, 29(4):545-587.
Nianwen Xue. 2005. Annotating discourse connectives in the Chinese treebank. In Proceedings of the ACL Workshop on Frontiers in Corpus Annotation II: Pie in the Sky.
Deniz Zeyrek and Bonnie Webber. 2008. A discourse resource for Turkish: Annotating discourse connectives in the METU corpus. In Proceedings of IJCNLP-2008.