The AVENUE Project Data Elicitation System Lori Levin Language Technologies Institute School of Computer Science Carnegie Mellon University

The AVENUE Project Data Elicitation System

Lori Levin
Language Technologies Institute
School of Computer Science
Carnegie Mellon University

Joint work with

• Dr. Jeff Good

• Dr. Robert Frederking

• Alison Alvarez

Outline

• The AVENUE MT project
  – Including a list of languages we have worked on

• The elicitation tool
  – Including which kinds of fonts it works for

• The elicitation corpus
  – Including which languages it has been translated into

• Tools for building and revising elicitation corpora

MT Approaches

[Diagram: MT pyramid from source sentence to target sentence]

• Source: Mi chiamo Lori
• Target: My name is Lori
• Direct: SMT, EBMT
• Transfer rules
  – Syntactic parsing: Pronoun-acc-1-sg chiamare-1sg N
  – Text generation: [np poss-1sg “name”] BE-pres N
  – AVENUE: automate rule learning
• Interlingua: introduce-self
  – Semantic analysis
  – Sentence planning

AVENUE Machine Translation System

Rule components: type information, synchronous context-free rules, alignments, x-side constraints, y-side constraints, and xy-constraints, e.g. ((Y1 AGR) = (X1 AGR))

;SL: the old man, TL: ha-ish ha-zaqen
NP::NP [DET ADJ N] -> [DET N DET ADJ]
(
(X1::Y1)
(X1::Y3)
(X2::Y4)
(X3::Y2)
((X1 AGR) = *3-SING)
((X1 DEF) = *DEF)
((X3 AGR) = *3-SING)
((X3 COUNT) = +)
((Y1 DEF) = *DEF)
((Y3 DEF) = *DEF)
((Y2 AGR) = *3-SING)
((Y2 GENDER) = (Y4 GENDER))
)

Jaime Carbonell (PI), Alon Lavie (Co-PI), Lori Levin (Co-PI)

Rule learning: Katharina Probst
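
The reordering part of the NP rule on this slide can be illustrated with a small sketch. This is not the AVENUE transfer engine: the `TransferRule` class and its `reorder` method are hypothetical names introduced here, and the agreement and definiteness constraints are ignored entirely. It only shows how the alignments (X1::Y1), (X1::Y3), (X2::Y4), (X3::Y2) map source positions to target positions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TransferRule:
    """Hypothetical container for one synchronous rule (constraints omitted)."""
    lhs: str                            # type information, e.g. "NP::NP"
    x_side: List[str]                   # source-side constituents
    y_side: List[str]                   # target-side constituents
    alignments: List[Tuple[int, int]]   # (X index, Y index), 1-based

    def reorder(self, source_words: List[str]) -> List[str]:
        """Copy each source word to every target slot it is aligned to."""
        if len(source_words) != len(self.x_side):
            raise ValueError("source length does not match the x-side")
        target = [None] * len(self.y_side)
        for x, y in self.alignments:
            target[y - 1] = source_words[x - 1]
        return target


rule = TransferRule(
    lhs="NP::NP",
    x_side=["DET", "ADJ", "N"],
    y_side=["DET", "N", "DET", "ADJ"],
    alignments=[(1, 1), (1, 3), (2, 4), (3, 2)],
)

# Source-language words stand in for their translations here.
print(rule.reorder(["the", "old", "man"]))  # ['the', 'man', 'the', 'old']
```

The output order mirrors the Hebrew gloss "ha-ish ha-zaqen" (the-man the-old); lexical translation and constraint checking would be separate steps.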

AVENUE

• Rules can be written by hand or learned automatically.

• Hybrid
  – Rule-based transfer
  – Statistical decoder
  – Multi-engine combinations with SMT and EBMT

AVENUE systems (small and experimental, but tested on unseen data)

• Hebrew-to-English
  – Alon Lavie, Shuly Wintner, Katharina Probst
  – Hand-written and automatically learned
  – Automatic rules trained on 120 sentences perform slightly better than about 20 hand-written rules.

• Hindi-to-English
  – Lavie, Peterson, Probst, Levin, Font, Cohen, Monson
  – Automatically learned
  – Performs better than SMT when training data is limited to 50K words

AVENUE systems (small and experimental, but tested on unseen data)

• English-to-Spanish
  – Ariadna Font Llitjos
  – Hand-written, automatically corrected

• Mapudungun-to-Spanish
  – Roberto Aranovich and Christian Monson
  – Hand-written

• Dutch-to-English
  – Simon Zwarts
  – Hand-written

Outline

• The AVENUE MT project

• The elicitation tool

• The questionnaire

• Tools for building questionnaires

Elicitation

• Get data from someone who is
  – Bilingual
  – Literate
    • With consistent spelling
  – Not experienced with linguistics

English-Hindi Example

Elicitation Tool: Erik Peterson

English-Chinese Example

Note: Translator has to insert spaces between words in Chinese.

English-Arabic Example

Outline

• The AVENUE MT project

• The elicitation tool

• The elicitation corpus

• Tools for building elicitation corpora

Size of Questionnaire

• Around 3200 sentences

• 20K words

EC Sample: clause level

Sample sentences:
• Mary is writing a book for John.
• Who let him eat the sandwich?
• Who had the machine crush the car?
• They did not make the policeman run.
• Mary had not blinked.
• The policewoman was willing to chase the boy.
• Our brothers did not destroy files.
• He said that there is not a manual.
• The teacher who wrote a textbook left.
• The policeman chased the man who was a thief.
• Mary began to work.

Phenomena covered:
• Tense, aspect, transitivity, animacy
• Questions, causation and permission
• Interaction of lexical and grammatical aspect
• Volitionality
• Embedded clauses and sequence of tense
• Relative clauses
• Phase aspect

EC Sample: noun phrase level

Sample sentences:
• The man quit in November.
• The man works in the afternoon.
• The balloon floated over the library.
• The man walked over the platform.
• The man came out from among the group of boys.
• The long weekly meeting ended.
• The large bus to the post office broke down.
• The second man laughed.
• All five boys laughed.

Phenomena covered:
• Temporal and locative meanings
• Quantifiers
• Numbers
• Combinations of different types of modifiers
  – My book
    • Possession, definiteness
  – A book of mine
    • Possession, indefiniteness

Organization into Minimal Pairs

srcsent: Tú caíste.
tgtsent: Eymi ütrünagimi.
aligned: ((1,1),(2,2))
context: tú = Juan [masculino, 2a persona del singular]
comment: You (John) fell

srcsent: Tú estás cayendo.
tgtsent: Eymi petu ütrünagimi.
aligned: ((1,1),(2 3,2 3))
context: tú = Juan [masculino, 2a persona del singular]
comment: You (John) are falling

srcsent: Tú caíste.
tgtsent: Eymi ütrunagimi.
aligned: ((1,1),(2,2))
context: tú = María [femenino, 2a persona del singular]
comment: You (Mary) fell
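
As a rough illustration of how records like these could be read and compared, the sketch below parses the `field: value` layout into a dictionary and lists the target words that appear in one member of a minimal pair but not the other. The function name `parse_record` and the one-field-per-line layout are assumptions made for this example; the actual file format used by the elicitation tool may differ.

```python
import re


def parse_record(text):
    """Parse 'field: value' lines into a dict; 'aligned' becomes index lists."""
    record = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(":")
        record[key.strip()] = value.strip()
    # Each alignment is "(source indices,target indices)", indices space-separated.
    pairs = re.findall(r"\(([\d ]+),([\d ]+)\)", record.get("aligned", ""))
    record["aligned"] = [
        ([int(i) for i in src.split()], [int(j) for j in tgt.split()])
        for src, tgt in pairs
    ]
    return record


fell = parse_record("""
srcsent: Tú caíste.
tgtsent: Eymi ütrünagimi.
aligned: ((1,1),(2,2))
comment: You (John) fell
""")

falling = parse_record("""
srcsent: Tú estás cayendo.
tgtsent: Eymi petu ütrünagimi.
aligned: ((1,1),(2 3,2 3))
comment: You (John) are falling
""")

# Target words that only appear in the progressive member of the pair.
extra = [w for w in falling["tgtsent"].split() if w not in fell["tgtsent"].split()]
print(extra)               # ['petu']
print(falling["aligned"])  # [([1], [1]), ([2, 3], [2, 3])]
```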

Feature Detection: Spanish

The girl saw a red book.
((1,1)(2,2)(3,3)(4,4)(5,6)(6,5))
La niña vió un libro rojo

A girl saw a red book
((1,1)(2,2)(3,3)(4,4)(5,6)(6,5))
Una niña vió un libro rojo

I saw the red book
((1,1)(2,2)(3,3)(4,5)(5,4))
Yo vi el libro rojo

I saw a red book.
((1,1)(2,2)(3,3)(4,5)(5,4))
Yo vi un libro rojo

Feature: definiteness
Values: definite, indefinite
Function-of-*: subj, obj
Marked-on-head-of-*: no
Marked-on-dependent: yes
Marked-on-governor: no
Marked-on-other: no
Add/delete-word: no
Change-in-alignment: no
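
Two of the fields in this summary, Add/delete-word and Change-in-alignment, can be approximated by directly comparing the members of a minimal pair. The sketch below is an assumption about how such a comparison might look, not the project's feature detection code; `detect_marking` is a hypothetical name. Deciding whether a changed word is a head, dependent, or governor would need syntactic information that this sketch does not have.

```python
def detect_marking(member_a, member_b):
    """Compare the two members of a minimal pair.

    Each member is (target_words, alignment), where alignment is a list of
    (source_index, target_index) tuples, 1-based.
    """
    words_a, align_a = member_a
    words_b, align_b = member_b
    # 1-based target positions whose word differs between the two members.
    changed = [
        i + 1 for i, (a, b) in enumerate(zip(words_a, words_b)) if a != b
    ]
    return {
        "add/delete-word": len(words_a) != len(words_b),
        "change-in-alignment": sorted(align_a) != sorted(align_b),
        "changed-target-positions": changed,
    }


alignment = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 6), (6, 5)]
definite = ("La niña vió un libro rojo".split(), alignment)
indefinite = ("Una niña vió un libro rojo".split(), alignment)

result = detect_marking(definite, indefinite)
print(result)
```

For the Spanish subject pair, only target position 1 (the determiner) changes, with no added word and no alignment change, which is consistent with the "no" values on the slide.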

Feature Detection: Chinese

A girl saw a red book.

((1,2)(2,2)(3,3)(3,4)(4,5)(5,6)(5,7)(6,8))

有 一个 女人 看见 了 一本 红色 的 书 。

The girl saw a red book.

((1,1)(2,1)(3,3)(3,4)(4,5)(5,6)(6,7))

女人 看见 了 一本 红色的 书

Feature: definiteness

Values: definite, indefinite

Function-of-*: subject

Marked-on-head-of-*: no

Marked-on-dependent: no

Marked-on-governor: no

Add/delete-word: yes

Change-in-alignment: no

Feature Detection: Chinese

I saw the red book
((1, 3)(2, 4)(2, 5)(4, 1)(5, 2))
红色的 书, 我 看见 了

I saw a red book.
((1,1)(2,2)(2,3)(2, 4)(4,5)(5,6))
我 看见 了 一本 红色的 书 。

Feature: definiteness
Values: definite, indefinite
Function-of-*: object
Marked-on-head-of-*: no
Marked-on-dependent: no
Marked-on-governor: no
Add/delete-word: yes
Change-in-alignment: yes

Feature Detection: Hebrew

A girl saw a red book.
((2,1)(3,2)(5,4)(6,3))
ילדה ראתה ספר אדום

The girl saw a red book
((1,1)(2,1)(3,2)(5,4)(6,3))
הילדה ראתה ספר אדום

I saw a red book.
((2,1)(4,3)(5,2))
ראיתי ספר אדום

I saw the red book.
((2,1)(3,3)(3,4)(4,4)(5,3))
ראיתי את הספר האדום

Feature: definiteness
Values: definite, indefinite
Function-of-*: subj, obj
Marked-on-head-of-*: yes
Marked-on-dependent: yes
Marked-on-governor: no
Add-word: no
Change-in-alignment: no

Feature Detection Feeds into…

• Corpus Navigation: which minimal pairs to pursue next.
  – Don’t pursue gender in Mapudungun
  – Do pursue definiteness in Hebrew

• Morphology Learning:
  – Morphological learner identifies the forms of the morphemes
  – Feature detection identifies the functions

• Rule learning:
  – Rule learner will have to learn a constraint for each morpho-syntactic marker that is discovered
    • E.g., Adjectives and nouns agree in gender, number, and definiteness in Hebrew.

Languages

• The set of feature structures with English sentences has been delivered to the Linguistic Data Consortium as part of the Reflex program.

• Translated (by LDC) into:
  – Thai
  – Bengali

• Plans to translate into:
  – Seven “strategic” languages per year for five years.
    • As one small part of a language pack (BLARK) for each language.

Languages

• Spanish version in progress at New Mexico State University (Helmreich and Cowie)
  – Plans to translate into Guarani

• Portuguese version in progress in Brazil (Marcello Modesto)
  – Plans to translate into Karitiana
    • 200 speakers

• Plans to translate into Inupiaq (Kaplan and MacLean)

Previous Elicitation Work

• Pilot corpus
  – Around 900 sentences
  – No feature structures

• Mapudungun
  – Two partial translations

• Quechua
  – Three translations

• Aymara
  – Seven translations

• Hebrew

• Hindi
  – Several translations

• Dutch

Feature Structures

• The EC is actually a corpus of feature structures that happen to have English or Spanish sentences attached to them.

Bengali example with feature structure

srcsent: The large bus to the post office broke down.
context:
tgtsent:

((actor ((modifier ((mod-role mod-descriptor)(mod-role role-loc-general-to)))
         (np-identifiability identifiable)
         (np-specificity specific)
         (np-biological-gender bio-gender-n/a)
         (np-animacy anim-inanimate)
         (np-person person-third)
         (np-function fn-actor)
         (np-general-type common-noun-type)
         (np-number num-sg)
         (np-pronoun-exclusivity inclusivity-n/a)
         (np-pronoun-antecedent antecedent-n/a)
         (np-distance distance-neutral)))
 (c-general-type declarative-clause)
 (c-my-causer-intentionality intentionality-n/a)
 (c-comparison-type comparison-n/a)
 (c-relative-tense relative-n/a)
 (c-our-boundary boundary-n/a)
 (c-comparator-function comparator-n/a)
 (c-causee-control control-n/a)
 (c-our-situations situations-n/a)
 (c-comparand-type comparand-n/a)
 (c-causation-directness directness-n/a)
 (c-source source-neutral)
 (c-causee-volitionality volition-n/a)
 (c-assertiveness assertiveness-neutral)
 (c-solidarity solidarity-neutral)
 (c-polarity polarity-positive)
 (c-v-grammatical-aspect gram-aspect-neutral)
 (c-adjunct-clause-type adjunct-clause-type-n/a)
 (c-v-phase-aspect phase-aspect-neutral)
 (c-v-lexical-aspect activity-accomplishment)
 (c-secondary-type secondary-neutral)
 (c-event-modality event-modality-none)
 (c-function fn-main-clause)
 (c-minor-type minor-n/a)
 (c-copula-type copula-n/a)
 (c-v-absolute-tense past)
 (c-power-relationship power-peer)
 (c-our-shared-subject shared-subject-n/a)
 (c-question-gap gap-n/a))
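
A parenthesized feature structure like the one above can be read with a very small recursive parser. The following is a minimal sketch, assuming the notation is plain nested parentheses with whitespace-separated atoms; `parse_fs` and `clause_features` are hypothetical helper names, and the sample input is a shortened fragment built from features shown on the slide.

```python
import re


def parse_fs(text):
    """Parse a parenthesized feature structure into nested Python lists."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token          # an atom such as "np-number"
        items = []
        while tokens[pos] != ")":
            items.append(read())
        pos += 1                  # skip the closing parenthesis
        return items

    return read()


def clause_features(fs):
    """Collect top-level (feature value) pairs whose value is atomic."""
    return {
        item[0]: item[1]
        for item in fs
        if len(item) == 2 and isinstance(item[1], str)
    }


sample = """((actor ((np-number num-sg)(np-person person-third)))
 (c-polarity polarity-positive)(c-v-absolute-tense past))"""

fs = parse_fs(sample)
print(clause_features(fs))
# {'c-polarity': 'polarity-positive', 'c-v-absolute-tense': 'past'}
```

The nested `actor` entry is skipped by `clause_features` because its value is itself a list of features rather than a single atom.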

Why feature structures?

• Decide what grammatical meaning to elicit.

• Represent it in a feature structure.

• Formulate an English or Spanish sentence that expresses that meaning.
  – We can use the same corpus of feature structures for several elicitation languages

• Have the informant translate it.

Grammatical meanings vs syntactic categories

• Features and values are based on a collection of grammatical meanings
  – Many of which are similar to the grammatemes of the Prague Treebanks

Grammatical Meanings

YES
• Semantic Roles
• Identifiability
• Specificity
• Time
  – Before, after, or during time of speech
• Modality

NO
• Case
• Voice
• Determiners
• Auxiliary verbs

Grammatical Meanings

YES
• How is identifiability expressed?
  – Determiner
  – Word order
  – Optional case marker
  – Optional verb agreement
• How is specificity expressed?
• How are generics expressed?
• How are predicate nominals marked?

NO
• How are English determiners translated?
  – The boy cried.
  – The lion is a fierce beast.
  – I ate a sandwich.
  – He is a soldier.
    • Il est soldat.

Argument Roles

• Actor

• Undergoer

• Predicate and predicatee
  – The woman is the manager.

• Recipient
  – I gave a book to the students.

• Beneficiary
  – I made a phone call for Sam.

Why not subject and object?

• Languages use their voice systems for different purposes.

• Mapudungun obligatorily uses an inverse marked verb when third person acts on first or second person.
  – Verb agrees with undergoer
  – Undergoer exhibits other subjecthood properties
  – Actor may be object.

• Yes: How are actor and undergoer encoded in combination with other semantic features like adversity (Japanese) and person (Mapudungun)?

• No: How is English voice translated into another language?

Argument Roles

• Accompaniment
  – With someone
  – With pleasure

• Material
  – (out) of wood

• About 20 more roles
  – From the Lingua checklist; Comrie & Smith (1977)
  – Many also found in tectogrammatical representations in the Prague Treebanks

• Around 80 locative relations
  – From Lingua checklist

• Many temporal relations

Noun Phrase Features

• Person
• Number
• Biological gender
• Animacy
• Distance (for deictics)
• Identifiability
• Specificity
• Possession
• Other semantic roles
  – Accompaniment, material, location, time, etc.
• Type
  – Proper, common, pronoun
• Cardinals
• Ordinals
• Quantifiers
• Given and new information
  – Not used yet because of limited context in the elicitation tool.

Clause level features

• Tense
• Aspect
  – Lexical, grammatical, phase
• Type
  – Declarative, open-q, yes-no-q
• Function
  – Main, argument, adjunct, relative
• Source
  – Hearsay, first-hand, sensory, assumed
• Assertedness
  – Asserted, presupposed, wanted
• Modality
  – Permission, obligation
  – Internal, external

Other clause types (Constructions)

• Causative
  – Make/let/have someone do something

• Predication
  – May be expressed with or without an overt copula.

• Existential
  – There is a problem.

• Impersonal
  – One doesn’t smoke in restaurants in the US.

• Lament
  – If only I had read the paper.

• Conditional
• Comparative
• Etc.

Outline

• The AVENUE MT project

• The elicitation tool

• The elicitation corpus

• Tools for elicitation corpora

Tools for Creating Elicitation Corpora (Mar 1, 2006)

[Pipeline diagram; components shown:]
• List of semantic features and values
• Feature Specification
• Feature Maps: which combinations of features and values are of interest
  – Clause-Level, Noun-Phrase, Tense & Aspect, Modality, …
• Feature Structure Sets
• Reverse Annotated Feature Structure Sets: add English sentences
• The Corpus
• Sampling: Smaller Corpus
• Tools shown on this slide: XML Schema, XSLT Script

Tools for Creating Elicitation Corpora (continued)

[Same pipeline diagram as on the previous slide; tool shown on this slide: Combination Formalism]

Tools for Creating Elicitation Corpora (continued)

[Same pipeline diagram as on the previous slides; tool shown on this slide: Feature Structure Viewer]

Feature Specification

• Defines Features and their values

• Sets default values for features

• Specifies feature requirements and restrictions

• Written in XML

Feature Specification

Feature: c-copula-type (a copula is a verb like “be”; some languages do not have copulas)

Values:

copula-n/a
  Restrictions: 1. ~(c-secondary-type secondary-copula)
  Notes:

copula-role
  Restrictions: 1. (c-secondary-type secondary-copula)
  Notes: 1. A role is something like a job or a function. "He is a teacher" "This is a vegetable peeler"

copula-identity
  Restrictions: 1. (c-secondary-type secondary-copula)
  Notes: 1. "Clark Kent is Superman" "Sam is the teacher"

copula-location
  Restrictions: 1. (c-secondary-type secondary-copula)
  Notes: 1. "The book is on the table" There is a long list of locative relations later in the feature specification.

copula-description
  Restrictions: 1. (c-secondary-type secondary-copula)
  Notes: 1. A description is an attribute. "The children are happy." "The books are long."
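
The restrictions listed for c-copula-type can be read as simple checks on the rest of the clause. The sketch below is illustrative only: the real specification is written in XML, whereas the `RESTRICTIONS` dictionary and the `allowed` function here are assumptions introduced for this example. A `False` flag encodes the "~" (must not hold) restriction on copula-n/a.

```python
# Restrictions transcribed from the slide.
# Each entry: (must_hold, (feature, value)); must_hold=False encodes "~".
SECONDARY_COPULA = ("c-secondary-type", "secondary-copula")
RESTRICTIONS = {
    "copula-n/a": [(False, SECONDARY_COPULA)],
    "copula-role": [(True, SECONDARY_COPULA)],
    "copula-identity": [(True, SECONDARY_COPULA)],
    "copula-location": [(True, SECONDARY_COPULA)],
    "copula-description": [(True, SECONDARY_COPULA)],
}


def allowed(copula_value, clause):
    """Return True if every restriction on copula_value holds for the clause."""
    for must_hold, (feature, value) in RESTRICTIONS[copula_value]:
        if (clause.get(feature) == value) != must_hold:
            return False
    return True


copular_clause = {"c-secondary-type": "secondary-copula"}
other_clause = {"c-secondary-type": "secondary-neutral"}

print(allowed("copula-role", copular_clause))  # True
print(allowed("copula-n/a", copular_clause))   # False
print(allowed("copula-n/a", other_clause))     # True
```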

Feature Maps

• Some features interact in the grammar
  – English –s reflects person and number of the subject and tense of the verb.
  – In expressing the English present progressive tense, the auxiliary verb is in a different place in a question and a statement:
    • He is running.
    • Is he running?

• We need to check many, but not all combinations of features and values.

• Using unlimited feature combinations leads to an unmanageable number of sentences

Feature Combination Template

((predicatee
   ((np-general-type pronoun-type common-noun-type)
    (np-person person-first person-second person-third)
    (np-number num-sg num-pl)
    (np-biological-gender bio-gender-male bio-gender-female)))
 {[(predicate ((np-general-type common-noun-type)
               (np-person person-third)))
   (c-copula-type role)]
  [(predicate ((adj-general-type quality-type)
               (c-copula-type attributive)))]
  [(predicate ((np-general-type common-noun-type)
               (np-person person-third)
               (c-copula-type identity)))]}
 (c-secondary-type secondary-copula)
 (c-polarity #all)
 (c-general-type declarative)
 (c-speech-act sp-act-state)
 (c-v-grammatical-aspect gram-aspect-neutral)
 (c-v-lexical-aspect state)
 (c-v-absolute-tense past present future)
 (c-v-phase-aspect durative))

Summarizes 288 feature structures, which are automatically generated.
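
The basic idea of expanding a template into individual feature structures is a cross product over the values listed for each feature. The sketch below shows only that core idea on a flat, simplified template; `expand` is a hypothetical function, and it does not handle nested roles, the `{[...]}` alternatives, `#all`, or the restrictions from the feature specification. For that reason it is not expected to reproduce the figure of 288 quoted on the slide.

```python
from itertools import product


def expand(template):
    """Yield one flat feature structure per combination of listed values."""
    features = list(template)
    for values in product(*(template[f] for f in features)):
        yield dict(zip(features, values))


# Simplified, flat template using feature names and values from the slide.
template = {
    "np-person": ["person-first", "person-second", "person-third"],
    "np-number": ["num-sg", "num-pl"],
    "c-v-absolute-tense": ["past", "present", "future"],
    "c-v-lexical-aspect": ["state"],
}

structures = list(expand(template))
print(len(structures))  # 3 * 2 * 3 * 1 = 18
print(structures[0])
```

Even this small template shows why unrestricted combination grows quickly: each added multi-valued feature multiplies the number of sentences to elicit.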

Adding Sentences to Feature Structures

srcsent: Mary was not a leader.
context: Translate this as though it were spoken to a peer co-worker;

((actor ((np-function fn-actor)(np-animacy anim-human)
         (np-biological-gender bio-gender-female)
         (np-general-type proper-noun-type)
         (np-identifiability identifiable)(np-specificity specific)…))
 (pred ((np-function fn-predicate-nominal)(np-animacy anim-human)
        (np-biological-gender bio-gender-female)
        (np-general-type common-noun-type)
        (np-specificity specificity-neutral)…))
 (c-v-lexical-aspect state)
 (c-copula-type copula-role)
 (c-secondary-type secondary-copula)
 (c-solidarity solidarity-neutral)
 (c-v-grammatical-aspect gram-aspect-neutral)
 (c-v-absolute-tense past)
 (c-v-phase-aspect phase-aspect-neutral)
 (c-general-type declarative-clause)
 (c-polarity polarity-negative)
 (c-my-causer-intentionality intentionality-n/a)
 (c-comparison-type comparison-n/a)
 (c-relative-tense relative-n/a)
 (c-our-boundary boundary-n/a)…)

Difficult Issues in Adding Sentences

• Have to remember that the grammatical meanings don’t correspond exactly to English morphemes.
  – Identifiability and specificity vs the and a
  – Modality, tense, aspect vs auxiliary verbs

• The meaning has to be clear to a translator.
  – If English is going to be the source language for translation, the clearest way to say something may not be the most common way it is said in real text or conversation.

Hard Problems

• Expressing meanings that are not grammaticalized in English.
  – Evidentiality:
    • He stole the bread.
    • Context: Translate this as if you do not have first hand knowledge. In English, we might say, “They say that he stole the bread” or “I hear that he stole the bread.”

Hard Problems

• Reverse annotating things that can be said in several ways in English.
  – Impersonals:
    • One doesn’t smoke here.
    • You don’t smoke here.
    • They don’t smoke here.
    • There’s no smoking here.
    • Credit cards aren’t accepted.
  – Problem in the Reflex corpus because space was limited.

Evaluation

• Current funding has not covered evaluation of the questionnaire.
  – Except for informal observations as it was translated into several languages.

• Does it elicit the meanings it was intended to elicit?
  – Informal observation: usually

• Is it useful for machine translation?

Navigation

• Currently, feature combinations are specified by a human.

• Plan to work in active learning mode.
  – Build seed questionnaire
  – Translate some data
  – Do some learning
  – Identify most valuable pieces of information to get next
  – Generate an RTB for those pieces of information
  – Translate more
  – Learn more
  – Generate more, etc.

Summary

• Feature Specification:
  – Lists features and values
  – Grammatical meanings

• Feature Combinations

• Set of Feature Structures

• Add English or Spanish Sentences

• Get a translation and word alignment from a bilingual, literate informant