72
Exploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University of Colorado, Boulder

Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Exploring PropBanks for English and Hindi

Ashwini Vaidya

Dept of Linguistics

University of Colorado, Boulder

Page 2: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Why is semantic information important?

• Imagine an automatic question answering system

• Who created the first effective polio vaccine?

• Two possible choices:

– Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine

– The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Page 3: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Question Answering

• Who created the first effective polio vaccine?

– Becton Dickinson created the first disposable syringe for use with the mass administration of the first effective polio vaccine

– The first effective polio vaccine was created in 1952 by Jonas Salk at the University of Pittsburgh

Page 4: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Question Answering

• Who created the first effective polio vaccine?

– [Becton Dickinson] created the [first disposable syringe] for use with the mass administration of the first effective polio vaccine

– [The first effective polio vaccine] was created in 1952 by [Jonas Salk] at the University of Pittsburgh

Page 5: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Question Answering

• Who created the first effective polio vaccine?

– [Becton Dickinsonagent] created the [first disposable syringetheme] for use with the mass administration of the first effective polio vaccine

– [The first effective polio vaccinetheme] was created in 1952 by [Jonas Salkagent] at the University of Pittsburgh

Page 6: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Question Answering

• We need semantic information to prefer the right answer

• The theme of create should be ‘the first effective polio vaccine’

• The theme in the first sentence was ‘the first disposable syringe’

• We can filter out the wrong answer

Page 7: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

We need semantic information

• To find out about events and their participants

• To capture semantic information across syntactic variation

Page 8: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Semantic information

• Semantic information about verbs and participants expressed through semantic roles

• Agent, Experiencer, Theme, Result etc.

• However, difficult to have a standard set of thematic roles

Page 9: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Proposition Bank

• Proposition Bank (PropBank) provides a way to carry out general purpose Semantic role labelling

• A PropBank is a large annotated corpus of predicate-argument information

• A set of semantic roles is defined for each verb

• A syntactically parsed corpus is then tagged with verb-specific semantic role information

Page 10: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Outline

• English PropBank• Background

• Annotation

• Frame files & Tagset

• Hindi PropBank development• Adapting Frame files

• Light verbs

• Mapping from dependency labels

Page 11: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Proposition Bank

• The first (English) PropBank was created on a 1 million syntactically parsed Wall Street Journal corpus

• PropBank annotation has also been done on different genres e.g. web text, biomedical text

• Arabic, Chinese & Hindi PropBanks have been created

Page 12: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

English PropBank

• English PropBank envisioned as the next level of Penn Treebank (Kingsbury & Palmer, 2003)

• Added a layer of predicate-argument information to the Penn Treebank

• Broad in its coverage- covering every instance of a verb and its semantic arguments in the corpus

• Amenable to collecting representative statistics

Page 13: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

English PropBank Annotation

• Two steps are involved in annotation

– Choose a sense ID for the predicate

– Annotate the arguments of that predicate with semantic roles

• This requires two components: frame files and PropBank tagset

Page 14: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

PropBank Frame files

• PropBank defines semantic roles on a verb-by-verb basis

• This is defined in a verb lexicon consisting of frame files

• Each predicate will have a set of roles associated with a distinct usage

• A polysemous predicate can have several rolesets within its frame file

Page 15: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

An example

• John rings the bell

ring.01 Make sound of bell

Arg0 Causer of ringing

Arg1 Thing rung

Arg2 Ring for

Page 16: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

An example

• John rings the bell

• Tall aspen trees ring the lake

ring.01 Make sound of bell

Arg0 Causer of ringing

Arg1 Thing rung

Arg2 Ring for

ring.02 To surround

Arg1 Surrounding entity

Arg2 Surrounded entity

Page 17: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

An example

• [John] rings [the bell]

• [Tall aspen trees] ring [the lake]

ring.01 Make sound of bell

Arg0 Causer of ringing

Arg1 Thing rung

Arg2 Ring forring.02 To surround

Arg1 Surrounding entity

Arg2 Surrounded entity

Ring.01

Ring.02

Page 18: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

An example

• [JohnARG0] rings [the bellARG1]

• [Tall aspen treesARG1] ring [the lakeARG2]

ring.01 Make sound of bell

Arg0 Causer of ringing

Arg1 Thing rung

Arg2 Ring forring.02 To surround

Arg1 Surrounding entity

Arg2 Surrounded entity

Ring.01

Ring.02

Page 19: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Frame files

• The Penn Treebank had about 3185 unique lemmas (Palmer, Gildea, Kingsbury, 2005)

• Most frequently occurring verb: say

• Small number of verbs had several framesets e.g. go, come, take, make

• Most others had only one frameset per file

Page 20: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

PropBank annotation pane in Jubilee

Page 21: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

English PropBank Tagset

• Numbered arguments Arg0, Arg1, and so on until Arg4

• Modifiers with function tags e.g. ArgM-LOC (location) , ArgM-TMP (time), ArgM-PRP (purpose)

• Modifiers give additional information about when, where or how the event occurred

Page 22: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

PropBank tagset

Numbered Argument

Description

Arg0 Agent, causer, experiencer

Arg1 Theme, patient

Arg2 Instrument, benefactive, attribute

Arg3 starting point, benefactive, attribute

Arg4 ending point

• Correspond to the valency requirements of the verb• Or, those that occur with high frequency with that verb

Page 23: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

PropBank tagset

• 15 modifier labels for English PropBank

• [HeArg0] studied [economic growthArg1] [in IndiaArgM-LOC]

Modifier Description

ArgM-LOC Location

ArgM-TMP Time

ArgM-GOL Goal

ArgM-MNR Manner

ArgM-CAU Cause

ArgM-ADV Adverbial

Page 24: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

PropBank tagset

• Verb specific and more generalized

• Arg0 and Arg1 correspond to Dowty’s Proto Roles

• Leverage the commonalities among semantic roles

• Agents, causers, experiencers – Arg0

• Undergoers, patients, themes- Arg1

Page 25: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

PropBank tagset

• While annotating Arg0 and Arg1:

– Unaccusative verbs take Arg1 as their subject argument

• [The windowArg1] broke

– Unergatives will take Arg0

• [JohnArg0] sang

• Distinction is also made between internally caused events (blush: Arg0) & externally caused events (redden: Arg1)

Page 26: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

PropBank tagset

• How might these map to the more familiar thematic roles?

• Yi, Loper & Palmer (2007) describe such a mapping to VerbNet roles

Page 27: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

•More frequent Arg0 and Arg1 (85%) are learnt more easily by automatic systems•Arg2 is less frequent, maps to more than one thematic role•Arg3-5 are even more infrequent

Page 28: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Using PropBank

• As a computational resource– Train semantic role labellers (Pradhan et al, 2005)– Question answering systems (with FrameNet)– Project semantic roles onto a parallel corpus in

another language (Pado & Lapata, 2005)

• For linguists, to study various phenomena related to predicate-argument structure

Page 29: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Outline

• English PropBank• Background

• Annotation

• Frame files & Tagset

• Hindi PropBank development• Adapting Frame files

• Light verbs

• Mapping from dependency labels

Page 30: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Developing PropBank for Hindi-Urdu

• Hindi-Urdu PropBank is part of a project to develop a Multi-layered and multi-representational treebank for Hindi-Urdu

– Hindi Dependency Treebank

– Hindi PropBank

– Hindi Phrase Structure Treebank

• Ongoing project at CU-Boulder

Page 31: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Hindi-Urdu PropBank

• Corpus of 400,000 words for Hindi

• Smaller corpus of 150,000 words for Urdu

• Hindi corpus consists of newswire text from ‘Amar Ujala’

• So far..

– 220 verb frames

– ~100K words annotated

Page 32: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Developing Hindi PropBank

• Making a PropBank resource for a new language

– Linguistic differences

• Capturing relevant language-specific phenomena

– Annotation practices

• Maintain similar annotation practices

– Consistency across PropBanks

Page 33: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Developing Hindi PropBank

• PropBank annotation for English, Chinese & Arabic was done on top of phrase structure trees

• Hindi PropBank is annotated on dependency trees

Page 34: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Dependency tree

• Represent relations that hold between constituents (chunks)

• Karaka labels show the relations between head verb and its dependents

दि येgave

k1k2

पैसेmoney

औरत कोwoman dat

राम नेRaam erg

k4

Page 35: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Hindi PropBank

• There are three components to the annotation

• Hindi Frame file creation

• Insertion of empty categories

• Semantic role labelling

Page 36: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Hindi PropBank

• There are three components to the annotation

• Hindi Frame file creation

• Insertion of empty categories

• Semantic role labelling

• Both frame creation and labelling require new strategies for Hindi

Page 37: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Hindi PropBank

• Hindi frame files were adapted to include

– Morphological causatives

– Unaccusative verbs

– Experiencers

• Additionally, changes had to be made to analyze the large number (nearly 40%) of light verbs

Page 38: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Causatives

• Verbs that are related via morphological derivation are grouped together as individual predicates in the same frame file.

• E.g. Cornerstone’s multi-lemma mode is used for the verbs खा (KA ‘eat’, खखऱा (KilA ‘feed’ and खखऱवा (KilvA‘cause to feed’)

Page 39: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Causatives

raam neArg0 khaanaArg1 khaayaaRam erg food eat-perf

‘Ram ate the food’

mohan neArg0 raam koArg2 khaanaArg1 khilaayaaMohan erg Ram dat food eat-caus-perf

‘Mohan made Ram eat the food’

sita neArgC mohan seArgA raam koArg2 khaanaArg1 khilvaayaaSita erg Mohan instr Ram acc food eat-ind.caus-caus-perf

‘Sita, through Mohan made Ram eat the food ’

Roleset id: KA.01 to eat

Arg0 eater

Arg1 the thing being eaten

Roleset id: KilA.01 to feed

Arg0 feeder

Arg2 eater

Arg1 the thing being eaten

Roleset id: KilvA.01 to causeto be fed

ArgC Causer of feeding

ArgA feeder

Arg2 Eater

Arg1 the thing eaten

Page 40: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Unaccusatives

• PropBank needs to distinguish proto agents and proto patients

• Sudha danced –– Unergative, animate agentive arguments- Arg0

• The door opened – Unaccusative, non animate patient arguments- Arg1

• For English, these distinctions were available in VerbNet

• For Hindi, various diagnostic tests are applied to make this distinction

Page 41: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Yes

No- Second Stage

Ergative. E.g. naac, dauRa, bEtha?

First stage

Yes.Unaccus-ative. E.g. khulnaa, barasnaa

No-Third stage

Yes.Unacc-usative. Eliminate bEtha

No. Take the majority vote on the tests.

cognate object & ergative case tests

Applicable to verbs undergoing transitivity alternation: eliminate those that take (mostly) animate subjects

For non-alternating verbs and others that remain, the verb will be unaccusative if:• Impersonal passives are not possible• use of ‘huaa’ (past participial relative) is possible• the inanimate subject appears without overt genitive

Unaccusativity diagnostics (5)

Page 42: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Unaccusative verb : महक mahak ‘to smell good’

Arg1 entity that smells good

• impersonal passive test• past participial relative test

Unaccusativity

• This information is then captured in the frame

Page 43: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Experiencers

• Arg0 includes agents, causers, experiencers• In Hindi, the experiencer subjects occur with dikhnaa (to

glimpse), milnaa (to find), suujhnaa (to be struck with) etc.

• Typically marked with dative case– Mujh-ko chaand dikhaa

I. Dat moon glimpse.pf

(I glimpsed the moon)

• PropBank labels these as Arg0, with an additional descriptor field ‘experiencer’– Internally caused events

Page 44: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Complex predicates

• A large number of complex predicates in Hindi

– Noun-verb complex predicates

• vichaar karna; think do (to think)

• dhyaan denaa; attention give (to pay attention)

– Adjectives can also occur

• accha lagnaa: good seem (to like)

– the predicating element is not the verb alone, but forms a composite with the noun/adj

Page 45: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Complex predicates

• There are also a number of verb-verb complex predicates

• Combine with the bare form of the verb to convey different meanings

• ro paRaa; cry lie- burst out crying

• Add some aspectual meaning to the verb

• As these occur with only a certain set of verbs, we find them automatically, based on part of speech tag

Page 46: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Complex predicates

• However, the noun-verb complex predicates are more pervasive

• They occur with a large number of nouns

• Frame files for the nominal predicate need to be created

• E.g. in a sample of 90K words, the light verb kar; ‘do’ occurred 1738 times with 298 different nominals

Page 47: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Complex predicates

• Annotation strategy

• Additional resources (frame files)

• Cluster the large number of nominals?

Page 48: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Light verb annotation

• A convention for the annotation of light verbs has been adopted across Hindi, Arabic, Chinese and English PropBanks (Hwang et. al. 2010)

• Annotation is carried out in three passes– Manual identification of light verb– Annotation of arguments based on frame file– Deterministic merging of the light verb and ‘true’

predicate

• In Hindi, this process may be simplified because of the dependency label ‘pof’ that identifies a light verb

Page 49: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

राम ने पैसे चोरी कीएRaam erg Money theft Do.perf

‘Ram stole the money’

1. Identify the N+V sequences that are complex predicates. 2. Annotate predicating expression with ARG-PRX.

REL: karARG-PRX: corii

3. Annotate the arguments and modifiers of the complex predicate with a nominal predicate frame

4. Automatically merge the nominal with the light verb.REL: corii_kiyeArg0: raamArg1: paese

Page 50: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Hindi PropBank tagset

24 labels

Label Description

ARG0 Agent, causer, experiencer

ARG1 Patient, theme, undergoer

ARG2 Beneficiary

ARG3 Instrument

50

Page 51: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Annotating Hindi PropBank

Label Description

ARG0 Agent, causer, experiencer

ARG1 Patient, theme, undergoer

ARG2 Beneficiary

ARG3 Instrument

ARG2-ATRARG2-LOC

attribute ARG2-GOLARG2-SOU

goal

location source

Function tags

51

Page 52: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Annotating Hindi PropBank

Label Description

ARG0 Agent, causer, experiencer

ARG1 Patient, theme, undergoer

ARG2 Beneficiary

ARG3 Instrument

ARG2-ATRARG2-LOC

attribute ARG2-GOLARG2-SOU

goal

location source

ARGC causer

ARGA secondary causer

Function tags

Causative

52

Page 53: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Annotating Hindi PropBank

Label Description

ARG0 Agent, causer, experiencer

ARG1 Patient, theme, undergoer

ARG2 Beneficiary

ARG3 Instrument

ARG2-ATRARG2-LOC

attribute ARG2-GOLARG2-SOU

goal

location source

ARGC causer

ARGA secondary causer

ARGM-VLV Verb-verb construction

ARGM-PRX Noun-verb construction

Function tags

Causative

Complex predicate

53

Page 54: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Annotating Hindi PropBank

Label Description Label Description

ARGM-ADV adverb ARGM-CAU cause

ARGM-DIR direction ARGM-DIS discourse

ARGM-EXT extent ARGM-LOC location

ARGM-MNR manner ARGM-MNS means

ARGM-MOD modal ARGM-NEG negation

ARGM-PRP purpose ARGM-TMP time

•Other modifier labels

54

Page 55: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Hindi PropBank annotation

Annotation pane Frameset file display

Page 56: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Annotation practice

• Need to maintain good annotation practices

• Current practices- double blind, followed by adjudication

• Inter-annotator agreement measures the consistency in the annotation task

• English PropBank had high inter annotator agreement K=0.91

Page 57: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Hindi PropBank annotation

• Improve consistency & annotation speed

• PropBank annotation on dependency trees has some advantages

• Hindi Treebank uses a large set of dependency labels that have rich semantic information

दि येgave

Arg0k1

Arg1k2

Arg2k4

पैसेmoney

औरत कोwoman dat

राम नेRaam erg

Page 58: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Deriving PropBank labels from dependencies

• We can derive Hindi PropBank labels from Dependency labels

• Mapping will reduce annotation effort, improve speed

• The dependency tagset has labels that are in some ways fairly similar to PropBank

– Verb specific labels k1- 5

– Verb modifier labels k7p, k7t, rh etc.

Page 59: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Label comparison

•Using linguistic intuition, we can compare HDT labels with the numbered arguments in HPB

59

Page 60: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Label comparison

• Similarly, linguistic intuition gives us the mapping from HDT for HPB modifiers

60

Page 61: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Label comparison

• These mappings are included in the PB frame files, for example, the verb ‘A: to come’

• Only for numbered arguments

• Basis for the linguistically motivated rules

Roleset Usage Rule

A.01 To come (path) k1 Arg1k2p Arg2-GOL

A.03 to arrive k1 Arg0k2p Arg2-GOLk5 Arg2-SOU

61

Page 62: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Automatic mapping of DT to PB

• A rule based, probabilistic system for automatic mapping

• We use two kinds of resources:

– Annotated corpus [Treebank+ PropBank]

• 32,300 tokens, 2005 predicates

– Frame files with mapping rules

62

Page 63: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Argument classification

• We use three kinds of rules to carry out automatic mapping– Deterministic rules

• Dependency label mapped directly onto PropBank

– Empirically derived rules• Using corpus statistics associated with dependency &

PropBank labels

– Linguistically motivated rules• Derived from linguistic intuition & captured in frame

files

63

Page 64: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Example of the rules

Features Count PropBank labels

xe.01_active_k1(give)

32 Arg0:0.93

Arg1: 0.03

Arg2:0.03

xe.01_active_k2 65 Arg1:0.95

Arg2:0.01

Arg0:0.01

xe.01_active_k4 34 Arg2:0.94

Arg0:0.02

•Associate the probability of each label in combination with a particular feature tuple•We use only 3: roleset ID, voice, dependency label•For the verb give, we get the correct mapping to the Hindi labels

Page 65: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Evaluation

• 32,300 tokens of annotated data

• Ten-fold cross validation for evaluation

65

Page 66: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Results

PRECISION RECALL F1 SCORE

Empirically derived rules

90.59 47.92 62.69

Linguistically motivated rules

89.80 55.28 68.44

66

Page 67: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Results

PRECISION RECALL F1 SCORE

Empirically derived rules

90.59 47.92 62.69

Linguistically motivated rules

89.80 55.28 68.44

Numbered Argument Accuracy

PRECISION RECALL F-SCORE

Empiricallyderived rules

93.63 58.76 72.21

Linguistically motivated

rules

91.87 72.36 80.96

67

Page 68: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Evaluation

• Linguistically motivated rules improve the recall with a slight drop in the precision

• With the most frequent PropBank labels, empirically derived rules perform well

• More data should improve the performance for modifier arguments

68

Page 69: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Evaluation

• Annotation practices also affect the mapping

– PB labels are coarse-grained: E.g. ArgM-ADV maps to four different dependency labels

– PB are fine-grained: E.g. ‘means’ and ‘causes’ are distinguished (ArgM-MNS, ArgM-CAU)but are lumped together under a single label in dependency treebank

69

Page 70: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Evaluation

• Our goal is to find a useful set of mappings to use at a pre-annotation stage

• We expect that with more PropBanked data, empirically derived rules will perform better

• Using additional resources such as Hindi WordNet can help to make up for the lack of training data

70

Page 71: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Summary

• Semantic role labels

• English PropBank

– Frame files

– PropBank tagset

• Development of Hindi PropBank

– Linguistic issues

– Experiments

Page 72: Exploring PropBanks for English and Hindifaculty.washington.edu/fxia/lsa2011/slides/propbank.pdfExploring PropBanks for English and Hindi Ashwini Vaidya Dept of Linguistics University

Questions?