AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources
Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking
Students and Staff: Erik Peterson, Christian Monson, Ariadna Font Llitjós, Alison Alvarez, Roberto Aranovich, Rodolfo Vega
Mar 1, 2006 AVENUE/LETRAS 2
Outline
• Scientific Objectives
• Framework Overview
• Learning Morphology
• Elicitation
• Learning Transfer Rules
• Automatic Rule Refinement
• Language Prototypes
• New Directions
Mar 1, 2006 AVENUE/LETRAS 3
Why Machine Translation for Languages with Limited Resources?
• We are in the age of information explosion
  – The internet + web + Google: anyone can get the information they want anytime…
• But what about the text in all those other languages?
  – How do they read all this English stuff?
  – How do we read all the stuff that they put online?
• MT for these languages would enable:
  – Better government access to native, indigenous, and minority communities
  – Better minority and native community participation in information-rich activities (health care, education, government) without giving up their languages
  – Civilian and military applications (e.g., disaster relief)
  – Language preservation
Mar 1, 2006 AVENUE/LETRAS 4
The Roadmap to Learning-based MT
• Automatic acquisition of necessary language resources and knowledge using machine learning methodologies
• A framework for integrating the acquired MT resources into effective MT prototype systems
• Effective integration of acquired knowledge with statistical/distributional information
Mar 1, 2006 AVENUE/LETRAS 5
CMU’s AVENUE Approach
• Elicitation: use bilingual native informants to produce a small high-quality word-aligned bilingual corpus of translated phrases and sentences
• Transfer-rule Learning: apply ML-based methods to automatically acquire syntactic transfer rules for translation between the two languages
  – Learn from major language to minor language
  – Translate from minor language to major language
• XFER + Decoder:
  – XFER engine produces a lattice of possible transferred structures at all levels
  – Decoder searches and selects the best scoring combination
• Rule Refinement: automatically refine and correct the acquired transfer rules via a process of interaction with bilingual informants who help the system identify translation errors
• Morphology Learning: unsupervised learning of the morpheme structure of words based on their organization into paradigms and distributional information
Mar 1, 2006 AVENUE/LETRAS 6
AVENUE MT Approach
[Diagram: MT pyramid from Source (e.g. Quechua) to Target (e.g. English). Direct approaches (SMT, EBMT) at the base; Transfer Rules connect the Syntactic Parsing / Semantic Analysis side to the Sentence Planning / Text Generation side; Interlingua at the apex. AVENUE: Automate Rule Learning.]
Mar 1, 2006 AVENUE/LETRAS 7
Avenue Architecture
[Architecture diagram: the Elicitation Tool produces the Elicitation Corpus, which yields a Word-Aligned Parallel Corpus; the Rule Learning Module derives Learned Transfer Rules from it; the Morphology Learning Module produces a Morphology Analyzer; the Run-Time System combines the Run Time Transfer System and Decoder with the Learned Transfer Rules, Handcrafted rules, and Lexical Resources to turn INPUT TEXT into OUTPUT TEXT; a Translation Correction Tool feeds corrections to the Rule Refinement Module, which updates the rules.]
Mar 1, 2006 AVENUE/LETRAS 8
Transfer Rule Formalism
Annotated components: type information; part-of-speech/constituent information; alignments; x-side constraints; y-side constraints; xy-constraints, e.g. ((Y1 AGR) = (X1 AGR))

;SL: the old man, TL: ha-ish ha-zaqen
NP::NP [DET ADJ N] -> [DET N DET ADJ]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)
 ((X1 AGR) = *3-SING)
 ((X1 DEF) = *DEF)
 ((X3 AGR) = *3-SING)
 ((X3 COUNT) = +)
 ((Y1 DEF) = *DEF)
 ((Y3 DEF) = *DEF)
 ((Y2 AGR) = *3-SING)
 ((Y2 GENDER) = (Y4 GENDER)))
Mar 1, 2006 AVENUE/LETRAS 9
Transfer Rule Formalism (II)
Highlighted components: value constraints (e.g. ((X1 AGR) = *3-SING)); agreement constraints (e.g. ((Y2 GENDER) = (Y4 GENDER)))

;SL: the old man, TL: ha-ish ha-zaqen
NP::NP [DET ADJ N] -> [DET N DET ADJ]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)
 ((X1 AGR) = *3-SING)
 ((X1 DEF) = *DEF)
 ((X3 AGR) = *3-SING)
 ((X3 COUNT) = +)
 ((Y1 DEF) = *DEF)
 ((Y3 DEF) = *DEF)
 ((Y2 AGR) = *3-SING)
 ((Y2 GENDER) = (Y4 GENDER)))
Mar 1, 2006 AVENUE/LETRAS 10
Transfer and Decoding
[AVENUE architecture diagram repeated, highlighting the Run Time Transfer System and Decoder.]
Mar 1, 2006 AVENUE/LETRAS 11
The Transfer Engine
Analysis
Source text is parsed into its grammatical structure, which determines transfer application ordering.
Example: ראיתי את האיש הזקן
Gloss: (I) saw *acc the man the old
[Source parse: S → VP; VP → V P NP; NP → D N D Adj]

Transfer
A target-language tree is created by reordering, insertion, and deletion; source words are translated with the transfer lexicon.
[Target parse: S → NP VP; VP → V NP; NP → DET Adj N: "I saw the old man"]

Generation
Target-language constraints are checked, target morphology is applied, and the final translation is produced (e.g., "saw" is selected for past tense).
Final translation: "I saw the old man"
Mar 1, 2006 AVENUE/LETRAS 12
Symbolic Decoder
• System rarely finds a full parse/transfer for the complete input sentence
• XFER engine produces a comprehensive lattice of segment translations
• Decoder selects the best combination of translation segments
• Search for the optimal scoring path of partial translations, based on multiple features:
  – Target Language Model scores
  – XFER Rule Scores
  – Path Fragmentation
  – Other features…
• Symbolic decoding is essential in scenarios with insufficient data for training a large target LM
  – Effective rule scoring is crucial
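The search over the segment lattice can be sketched as a simple dynamic program. Everything below is a toy: the lattice spans, the rule and LM scores, and the fragmentation penalty are invented values, and summing features with a flat penalty per segment is a simplification of the multi-feature scoring the slide lists.

```python
import math
from functools import lru_cache

# Toy lattice over a 5-word input: (start, end, rule_score, lm_score).
LATTICE = [
    (0, 2, -1.0, -2.0),
    (2, 5, -1.5, -2.5),
    (0, 5, -4.0, -3.0),
    (2, 3, -0.5, -1.0),
    (3, 5, -0.5, -1.5),
]
FRAG_PENALTY = -0.7  # charged once per segment: fewer segments, less fragmentation

def best_path(lattice, n):
    """Maximize summed feature scores over segment paths that cover [0, n)."""
    @lru_cache(maxsize=None)
    def best_from(i):
        if i == n:
            return 0.0, ()
        candidates = []
        for (s, e, rule, lm) in lattice:
            if s == i:
                tail_score, tail = best_from(e)
                candidates.append((rule + lm + FRAG_PENALTY + tail_score,
                                   ((s, e),) + tail))
        return max(candidates) if candidates else (-math.inf, ())
    return best_from(0)

score, path = best_path(LATTICE, 5)
print(score, path)
```

With these numbers the single full-sentence segment (0, 5) wins over any combination of partial segments, because the fragmentation penalty outweighs their better individual scores.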
Mar 1, 2006 AVENUE/LETRAS 13
Morphology Learning
[AVENUE architecture diagram repeated, highlighting the Morphology Learning Module.]
Mar 1, 2006 AVENUE/LETRAS 14
The Challenge of Morphology

Mapudungun (indigenous language of Chile and Argentina, ~1 million speakers)

Allkütulekefun
Segmentation: Allkütu -le -ke -fu -n
Glosses: Allkütu "listen", -le (progressive), -ke (habitual), -fu (past), -n (indicative 1sg)
Translation: "I used to listen"

Tasks for Morphology
• Segment Words
• Map Morphemes onto Features
Mar 1, 2006 AVENUE/LETRAS 21
The Challenge of Morphology
Tasks for Morphology
• Segment Words
• Map Morphemes onto Features
• Learn these tasks
  – unsupervised
  – from data
  – for any language
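Once a paradigm has been induced, the segmentation task becomes a lookup problem. Here is a minimal, hypothetical sketch (the paradigm contents and function name are invented for illustration): try every split of a word and accept the ones licensed by the stem and suffix sets.

```python
# A toy induced paradigm: a stem set and the suffix set its stems take.
PARADIGM = {"stems": {"blam", "solv", "roam"},
            "suffixes": {"e", "es", "ed", "ing"}}

def segment(word, paradigm):
    """Return all (stem, suffix) splits licensed by the paradigm."""
    splits = []
    for i in range(1, len(word)):
        stem, suffix = word[:i], word[i:]
        if stem in paradigm["stems"] and suffix in paradigm["suffixes"]:
            splits.append((stem, suffix))
    return splits

print(segment("blames", PARADIGM))   # -> [('blam', 'es')]
print(segment("roaming", PARADIGM))  # -> [('roam', 'ing')]
```

Mapping each suffix onto features (e.g. -ing → progressive) is the separate, harder task the slide lists.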
Mar 1, 2006 AVENUE/LETRAS 22
Our Approach
Leverage the Natural Structure of Morphology
• Paradigm
  – A set of affixes that interchangeably attach to a set of stems

Example Vocabulary: blame, blamed, blames, roamed, roaming, roams, solve, solves, solving

Candidate paradigms built incrementally over this vocabulary:
  – Ø.s: blame, solve
  – Ø.s.d: blame
  – s: blame, roam, solve
  – e.es: blam, solv
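The candidate-paradigm idea can be sketched in a few lines of Python. This is a hypothetical illustration, not the project's algorithm: it considers every stem/suffix split of every word, records the suffixes each candidate stem is attested with, and groups stems by their full attested suffix set (sub-paradigms like Ø.s for blame would come from intersecting these sets).

```python
from collections import defaultdict

VOCAB = ["blame", "blamed", "blames", "roamed", "roaming", "roams",
         "solve", "solves", "solving"]

def candidate_paradigms(vocab):
    """Group candidate stems by the set of suffixes they are attested with."""
    stem_to_suffixes = defaultdict(set)
    for word in vocab:
        for i in range(1, len(word) + 1):
            stem, suffix = word[:i], word[i:]
            stem_to_suffixes[stem].add(suffix or "Ø")  # empty suffix = Ø
    paradigms = defaultdict(set)
    for stem, suffixes in stem_to_suffixes.items():
        if len(suffixes) > 1:  # keep stems attested with 2+ suffixes
            paradigms[frozenset(suffixes)].add(stem)
    return paradigms

paradigms = candidate_paradigms(VOCAB)
```

On this vocabulary the grouping recovers Ø.s.d for blame and e.es.ing for solv, and also spurious candidates such as me.mes.med for the pseudo-stem "bla", which is exactly the noise the search procedure on the following slides has to filter out.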
Mar 1, 2006 AVENUE/LETRAS 29
Mar 1, 2006 AVENUE/LETRAS 30
[Diagram: the full space of candidate paradigms for the example vocabulary, from single suffixes (Ø, s, d, e, es, ed, …) up to larger suffix sets such as Ø.s.d (blame) and e.es.ed (blam), including spurious candidates such as me.mes.med (bla) and med (bla, roa).]
Mar 1, 2006 AVENUE/LETRAS 31
Spanish Newswire Corpus: 40,011 tokens, 6,975 types
[Diagram: candidate-paradigm lattice induced from the corpus; each node is a suffix set with its stem-type count and sample stems, e.g. a.as.o.os (43: african, cas, jurídic, l, …), a.o (214: id, indi, indonesi, inmediat, …), a (1237), o (1139), os (534), as (404), and spurious nodes such as tro (16: catas, ce, cen, cua, …).]
Mar 1, 2006 AVENUE/LETRAS 32
[Same lattice, annotated: the vertical level is the suffix count (Level 5 = 5 suffixes); each node shows its suffixes, its stem-type count, and sample stems.]
Mar 1, 2006 AVENUE/LETRAS 33
[Same lattice, annotated: a.as.o.os (43: african, cas, jurídic, l, …) is the adjective paradigm; the nodes a.as.o.os.tro (cas), a.tro (cas, cen), and tro (16: catas, ce, cen, cua, …) arise from the spurious suffix "tro".]
Mar 1, 2006 AVENUE/LETRAS 34
Basic Search Procedure
[Same lattice, annotated with the search directions: suffix count increases, and stem count decreases, as the search climbs the lattice.]
Mar 1, 2006 AVENUE/LETRAS 35
Examples and Evaluation of Automatically Selected Suffix Sets
Ø.ba.n.ndo ada.adas.ado.ados.aron.ó
a.aba.ado.ados.ar.ará.arán ada.ado.ados.ar.o
a.aciones.ación.adas.ado.ar ado.adores.o
a.ada.adas.ado.ar.ará ado.ados.arse.e
a.adas.ado.an.ar ado.ar.aron.arse.ará
a.ado.ados.ar.ó do.dos.ndo.r.ron
a.ado.an.arse.ó e.ida.ido
a.ado.aron.arse.ó emos.ido.ía.ían
aba.ada.ado.ar.o.os ida.ido.idos.ir.ió
aciones.ación.ado.ados ido.iendo.ir
aciones.ado.ados.ará ido.ir.ro
ación.ado.an.e
Global Suffix Evaluation
Precision: 0.506
Recall: 0.517
F1: 0.511
(Figure key: correct vs. wrong suffix sets)
Mar 1, 2006 AVENUE/LETRAS 36
Next Steps for Morphology Induction
• Improve the Quality of Induced Paradigms
  – Current work
• Convert Paradigms into a Segmenter
  – Soon
• Learn Mappings from Morphemes to Features
  – Future goal
Mar 1, 2006 AVENUE/LETRAS 37
Elicitation
[AVENUE architecture diagram repeated, highlighting the Elicitation Tool and Elicitation Corpus.]
Mar 1, 2006 AVENUE/LETRAS 38
Purpose of Elicitation
• Provide a small but highly targeted corpus of hand-aligned data
  – To support machine learning from a small data set
  – To cover all basic morpho-syntactic phenomena

newpair
srcsent: Tú caíste
tgtsent: eymi ütrünagimi
aligned: ((1,1),(2,2))
context: tú = Juan [masculino, 2a persona del singular]
comment: You (John) fell

newpair
srcsent: Tú estás cayendo
tgtsent: eymi petu ütünagimi
aligned: ((1,1),(2 3,2 3))
context: tú = Juan [masculino, 2a persona del singular]
comment: You (John) are falling

newpair
srcsent: Tú caíste
tgtsent: eymi ütrunagimi
aligned: ((1,1),(2,2))
context: tú = María [femenino, 2a persona del singular]
comment: You (Mary) fell
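The record format shown on this slide is simple enough to parse mechanically. The sketch below is hypothetical (the format is reconstructed from the slide, and the function name is invented): records start at a "newpair" line and each following "field: value" line is collected into a dict.

```python
def parse_records(text):
    """Split an elicitation file into records keyed by field name."""
    records, current = [], None
    for line in text.strip().splitlines():
        line = line.strip()
        if line == "newpair":
            current = {}
            records.append(current)
        elif ":" in line and current is not None:
            field, _, value = line.partition(":")
            current[field] = value.strip()
    return records

SAMPLE = """\
newpair
srcsent: Tú caíste
tgtsent: eymi ütrünagimi
aligned: ((1,1),(2,2))
context: tú = Juan [masculino, 2a persona del singular]
comment: You (John) fell
"""

recs = parse_records(SAMPLE)
print(recs[0]["tgtsent"])  # -> eymi ütrünagimi
```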
Mar 1, 2006 AVENUE/LETRAS 39
Purpose of Elicitation
• To get data from someone who is
  – Bilingual
  – Literate
  – Not experienced with linguistics
Mar 1, 2006 AVENUE/LETRAS 40
English-Hindi Example
Mar 1, 2006 AVENUE/LETRAS 41
English-Chinese Example
Mar 1, 2006 AVENUE/LETRAS 42
English-Arabic Example
Mar 1, 2006 AVENUE/LETRAS 43
The Elicitation Tool has been used with these languages
• Mapudungun
• Hindi
• Hebrew
• Quechua
• Aymara
• Thai
• Japanese
• Chinese
• Dutch
• Arabic
Mar 1, 2006 AVENUE/LETRAS 44
Elicitation Corpus: List of Minimal Pairs of Sentences in a Major Language
Eliciting from Spanish: Canto | Canté | Estoy cantando | Cantaste
Eliciting from English: I sing | I sang | I am singing | You sang
Mar 1, 2006 AVENUE/LETRAS 45
AVENUE Elicitation Corpora
• The Functional-Typological Corpus
  – Designed to elicit elements of meaning that may have morpho-syntactic realization
• The Structural Elicitation Corpus
  – Based on sentence structures from the Penn TreeBank
Mar 1, 2006
The Process
[Process diagram: the Feature Specification (list of semantic features and values) feeds Feature Maps, which state which combinations of features and values are of interest (clause-level, noun-phrase, tense & aspect, modality, …); these produce Feature Structure Sets, which are reverse-annotated by adding English sentences; the result is the Corpus, which can be sampled down to a smaller corpus.]
Mar 1, 2006 AVENUE/LETRAS 47
Feature Structures
srcsent: Mary was not a leader.
context: Translate this as though it were spoken to a peer co-worker.

((actor ((np-function fn-actor) (np-animacy anim-human) (np-biological-gender bio-gender-female) (np-general-type proper-noun-type) (np-identifiability identifiable) (np-specificity specific) …))
 (pred ((np-function fn-predicate-nominal) (np-animacy anim-human) (np-biological-gender bio-gender-female) (np-general-type common-noun-type) (np-specificity specificity-neutral) …))
 (c-v-lexical-aspect state) (c-copula-type copula-role) (c-secondary-type secondary-copula) (c-solidarity solidarity-neutral) (c-v-grammatical-aspect gram-aspect-neutral) (c-v-absolute-tense past) (c-v-phase-aspect phase-aspect-neutral) (c-general-type declarative-clause) (c-polarity polarity-negative) (c-my-causer-intentionality intentionality-n/a) (c-comparison-type comparison-n/a) (c-relative-tense relative-n/a) (c-our-boundary boundary-n/a) …)
Mar 1, 2006 AVENUE/LETRAS 48
Feature Specification
• Defines Features and their values
• Sets default values for features
• Specifies feature requirements and restrictions
• Written in XML
Mar 1, 2006 AVENUE/LETRAS 49
Feature Specification
Feature: c-copula-type (a copula is a verb like "be"; some languages do not have copulas)
Values:
• copula-n/a
  – Restrictions: ~(c-secondary-type secondary-copula)
• copula-role
  – Restrictions: (c-secondary-type secondary-copula)
  – Notes: a role is something like a job or a function. "He is a teacher", "This is a vegetable peeler"
• copula-identity
  – Restrictions: (c-secondary-type secondary-copula)
  – Notes: "Clark Kent is Superman", "Sam is the teacher"
• copula-location
  – Restrictions: (c-secondary-type secondary-copula)
  – Notes: "The book is on the table". There is a long list of locative relations later in the feature specification.
• copula-description
  – Restrictions: (c-secondary-type secondary-copula)
  – Notes: a description is an attribute. "The children are happy", "The books are long"
Mar 1, 2006 AVENUE/LETRAS 50
Feature Maps
• Some features interact in the grammar
  – English –s reflects person and number of the subject and tense of the verb
  – In expressing the English present progressive tense, the auxiliary verb is in a different place in a question and a statement:
    • He is running.
    • Is he running?
• We need to check many, but not all, combinations of features and values
• Using unlimited feature combinations leads to an unmanageable number of sentences
Mar 1, 2006 AVENUE/LETRAS 51
Mar 1, 2006 AVENUE/LETRAS 52
Evidentiality Map
[Feature map table:
• Lexical Aspect: activity-accomplishment
• Assertiveness: assertiveness-asserted, assertiveness-neutral
• Polarity: polarity-positive, polarity-negative
• Source: hearsay, quotative, inferred, assumption | visual, auditory, non-visual-or-auditory
• Tense: past, present, future | past, present
• Gram. Aspect: perfective, progressive, habitual, neutral | habitual, neutral, progressive]
Mar 1, 2006 AVENUE/LETRAS 53
Current Work
• Navigation
  – Start: large search space of all possible feature combinations
  – Finish: each feature has been eliminated as irrelevant or has been explored
  – Goal: dynamically find the most efficient path through the search space for each language
Mar 1, 2006 AVENUE/LETRAS 54
Current Work
• Feature Detection
  – Which features have an effect on morphosyntax?
  – What is the effect?
  – Drives the Navigation process
Mar 1, 2006 AVENUE/LETRAS 55
Feature Detection: Spanish
The girl saw a red book. → La niña vió un libro rojo
  ((1,1)(2,2)(3,3)(4,4)(5,6)(6,5))
A girl saw a red book → Una niña vió un libro rojo
  ((1,1)(2,2)(3,3)(4,4)(5,6)(6,5))
I saw the red book → Yo vi el libro rojo
  ((1,1)(2,2)(3,3)(4,5)(5,4))
I saw a red book. → Yo vi un libro rojo
  ((1,1)(2,2)(3,3)(4,5)(5,4))

Feature: definiteness
Values: definite, indefinite
Function-of-*: subj, obj
Marked-on-head-of-*: no
Marked-on-dependent: yes
Marked-on-governor: no
Marked-on-other: no
Add/delete-word: no
Change-in-alignment: no
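The core of feature detection on a minimal pair can be sketched very simply: when the two source sentences differ only in the feature under study, any target position whose word changes is where that feature surfaces. The code below is a hypothetical illustration (data structures and function name are invented), covering only the easy case where the alignment is preserved.

```python
def detect_marking(pair_a, pair_b):
    """Given two translations of a minimal pair with identical word order,
    return the 1-indexed target positions whose words changed."""
    tgt_a, tgt_b = pair_a["tgt"].split(), pair_b["tgt"].split()
    assert len(tgt_a) == len(tgt_b)  # only valid for alignment-preserving pairs
    return [i + 1 for i, (a, b) in enumerate(zip(tgt_a, tgt_b)) if a != b]

definite   = {"src": "The girl saw a red book", "tgt": "La niña vió un libro rojo"}
indefinite = {"src": "A girl saw a red book",   "tgt": "Una niña vió un libro rojo"}

print(detect_marking(definite, indefinite))  # -> [1]
```

Position 1 is the determiner, the dependent of the subject noun, which matches the "Marked-on-dependent: yes" entry above; the Chinese and Hebrew cases on the next slides need the harder variants (word addition/deletion, alignment change).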
Mar 1, 2006 AVENUE/LETRAS 56
Feature Detection: Chinese
A girl saw a red book. → 有 一个 女人 看见 了 一本 红色 的 书 。
  ((1,2)(2,2)(3,3)(3,4)(4,5)(5,6)(5,7)(6,8))
The girl saw a red book. → 女人 看见 了 一本 红色的 书
  ((1,1)(2,1)(3,3)(3,4)(4,5)(5,6)(6,7))

Feature: definiteness
Values: definite, indefinite
Function-of-*: subject
Marked-on-head-of-*: no
Marked-on-dependent: no
Marked-on-governor: no
Add/delete-word: yes
Change-in-alignment: no
Mar 1, 2006 AVENUE/LETRAS 57
Feature Detection: Chinese
I saw the red book → 红色的 书, 我 看见 了
  ((1,3)(2,4)(2,5)(4,1)(5,2))
I saw a red book. → 我 看见 了 一本 红色的 书 。
  ((1,1)(2,2)(2,3)(2,4)(4,5)(5,6))

Feature: definiteness
Values: definite, indefinite
Function-of-*: object
Marked-on-head-of-*: no
Marked-on-dependent: no
Marked-on-governor: no
Add/delete-word: yes
Change-in-alignment: yes
Mar 1, 2006 AVENUE/LETRAS 58
Feature Detection: Hebrew
A girl saw a red book. → ילדה ראתה ספר אדום
  ((2,1)(3,2)(5,4)(6,3))
The girl saw a red book → הילדה ראתה ספר אדום
  ((1,1)(2,1)(3,2)(5,4)(6,3))
I saw a red book. → ראיתי ספר אדום
  ((2,1)(4,3)(5,2))
I saw the red book. → ראיתי את הספר האדום
  ((2,1)(3,3)(3,4)(4,4)(5,3))

Feature: definiteness
Values: definite, indefinite
Function-of-*: subj, obj
Marked-on-head-of-*: yes
Marked-on-dependent: yes
Marked-on-governor: no
Add-word: no
Change-in-alignment: no
Mar 1, 2006 AVENUE/LETRAS 59
Feature Detection Feeds into…
• Corpus Navigation: which minimal pairs to pursue next
  – Don't pursue gender in Mapudungun
  – Do pursue definiteness in Hebrew
• Morphology Learning:
  – Morphological learner identifies the forms of the morphemes
  – Feature detection identifies the functions
• Rule Learning:
  – Rule learner will have to learn a constraint for each morpho-syntactic marker that is discovered
    • E.g., adjectives and nouns agree in gender, number, and definiteness in Hebrew
Mar 1, 2006 AVENUE/LETRAS 60
Rule Learning
[AVENUE architecture diagram repeated, highlighting the Rule Learning Module.]
Mar 1, 2006 AVENUE/LETRAS 61
Rule Learning - Overview
• Goal: acquire syntactic transfer rules
• Use available knowledge from the major-language side (grammatical structure)
• Three steps:
  1. Flat Seed Generation: first guesses at transfer rules; flat syntactic structure
  2. Compositionality Learning: use previously learned rules to learn hierarchical structure
  3. Constraint Learning: refine rules by learning appropriate feature constraints
Mar 1, 2006 AVENUE/LETRAS 62
Flat Seed Rule Generation
Learning Example: NP
Eng: the big apple
Heb: ha-tapuax ha-gadol
Generated Seed Rule:
NP::NP [ART ADJ N] [ART N ART ADJ]
((X1::Y1)
(X1::Y3)
(X2::Y4)
(X3::Y2))
Mar 1, 2006 AVENUE/LETRAS 63
Flat Seed Rule Generation
• Create a "flat" transfer rule specific to the sentence pair, partially abstracted to POS
  – Words that are aligned word-to-word and have the same POS in both languages are generalized to their POS
  – Words that have complex alignments (or different POS) remain lexicalized
• One seed rule for each translation example
• No feature constraints are associated with seed rules (but each rule records the example(s) from which it was learned)
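The generalization step can be sketched as follows. This is hypothetical illustration code, not the AVENUE implementation, and it simplifies the slide's criterion slightly: a source word is generalized to its POS whenever all of its aligned target words carry the same POS (so "the", aligned to both Hebrew articles, still generalizes to ART, as in the seed-rule example above).

```python
def seed_rule(src, tgt, alignments):
    """src, tgt: lists of (word, pos); alignments: 1-indexed (x, y) pairs.
    Returns the source and target constituent sequences of the seed rule."""
    src_seq = [word for word, _ in src]
    tgt_seq = [word for word, _ in tgt]
    for x in range(1, len(src) + 1):
        ys = [y for (a, y) in alignments if a == x]
        if ys and all(src[x - 1][1] == tgt[y - 1][1] for y in ys):
            src_seq[x - 1] = src[x - 1][1]      # generalize to POS
            for y in ys:
                tgt_seq[y - 1] = tgt[y - 1][1]  # generalize targets too
    return src_seq, tgt_seq

ENG = [("the", "ART"), ("big", "ADJ"), ("apple", "N")]
HEB = [("ha-", "ART"), ("tapuax", "N"), ("ha-", "ART"), ("gadol", "ADJ")]
ALIGN = [(1, 1), (1, 3), (2, 4), (3, 2)]

print(seed_rule(ENG, HEB, ALIGN))
# -> (['ART', 'ADJ', 'N'], ['ART', 'N', 'ART', 'ADJ'])
```

Together with the alignment list itself, this reproduces the NP::NP [ART ADJ N] -> [ART N ART ADJ] seed rule from the "the big apple" example.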
Mar 1, 2006 AVENUE/LETRAS 64
Compositionality Learning
Initial Flat Rules: S::S [ART ADJ N V ART N] [ART N ART ADJ V P ART N]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2) (X4::Y5) (X5::Y7) (X6::Y8))
NP::NP [ART ADJ N] [ART N ART ADJ]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))
NP::NP [ART N] [ART N]
((X1::Y1) (X2::Y2))
Generated Compositional Rule:
S::S [NP V NP] [NP V P NP]
((X1::Y1) (X2::Y2) (X3::Y4))
Mar 1, 2006 AVENUE/LETRAS 65
Compositionality Learning
• Detection: traverse the c-structure of the English sentence, adding compositional structure for translatable chunks
• Generalization: adjust constituent sequences and alignments
• Two implemented variants:
  – Safe Compositionality: there exists a transfer rule that correctly translates the sub-constituent
  – Maximal Compositionality: generalize the rule if supported by the alignments, even in the absence of an existing transfer rule for the sub-constituent
Mar 1, 2006 AVENUE/LETRAS 66
Constraint Learning
Input: Rules and their Example Sets
S::S [NP V NP] [NP V P NP] {ex1,ex12,ex17,ex26}
((X1::Y1) (X2::Y2) (X3::Y4))
NP::NP [ART ADJ N] [ART N ART ADJ] {ex2,ex3,ex13}
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))
NP::NP [ART N] [ART N] {ex4,ex5,ex6,ex8,ex10,ex11}
((X1::Y1) (X2::Y2))
Output: Rules with Feature Constraints:
S::S [NP V NP] [NP V P NP]
((X1::Y1) (X2::Y2) (X3::Y4)
(X1 NUM = X2 NUM)
(Y1 NUM = Y2 NUM)
(X1 NUM = Y1 NUM))
Mar 1, 2006 AVENUE/LETRAS 67
Constraint Learning
• Goal: add appropriate feature constraints to the acquired rules
• Methodology:
  – Preserve general structural transfer
  – Learn specific feature constraints from the example set
• Seed rules are grouped into clusters of similar transfer structure (type, constituent sequences, alignments)
• Each cluster forms a version space: a partially ordered hypothesis space with a specific and a general boundary
• The seed rules in a group form the specific boundary of the version space
• The general boundary is the (implicit) transfer rule with the same type, constituent sequences, and alignments, but no feature constraints
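One simple way to read the specific boundary is as the set of agreement constraints that hold in every seed example of a cluster. The sketch below is a hypothetical illustration with invented feature values, not the project's version-space algorithm: it proposes every pairwise constraint (Xi FEAT = Xj FEAT) and keeps only those every example satisfies.

```python
from itertools import combinations

# Toy seed-rule cluster: feature values of constituents X1 and X2 per example.
EXAMPLES = [
    {"X1": {"NUM": "sg", "GEN": "f"}, "X2": {"NUM": "sg", "GEN": "m"}},
    {"X1": {"NUM": "pl", "GEN": "m"}, "X2": {"NUM": "pl", "GEN": "m"}},
]

def agreement_constraints(examples):
    """Keep a candidate constraint (Xi FEAT = Xj FEAT) only if every
    example in the cluster satisfies it."""
    constituents = sorted(examples[0])
    features = sorted(examples[0][constituents[0]])
    constraints = []
    for a, b in combinations(constituents, 2):
        for feat in features:
            if all(ex[a][feat] == ex[b][feat] for ex in examples):
                constraints.append(f"({a} {feat} = {b} {feat})")
    return constraints

print(agreement_constraints(EXAMPLES))  # -> ['(X1 NUM = X2 NUM)']
```

Number agreement survives both examples while gender agreement is refuted by the first, which mirrors how the S::S rule above acquires (X1 NUM = X2 NUM) but no gender constraint.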
Mar 1, 2006 AVENUE/LETRAS 68
Rule Refinement
[AVENUE architecture diagram repeated, highlighting the Rule Refinement Module.]
Mar 1, 2006 AVENUE/LETRAS 69
Interactive and Automatic Refinement of Translation Rules
• Problem: improve machine translation quality.
• Proposed Solution: put bilingual speakers back into the loop; use their corrections to detect the source of the error and automatically improve the lexicon and the grammar.
• Approach: automate post-editing efforts by feeding them back into the MT system; automatically refine the translation rules that caused an error, going beyond post-editing.
• Goal: improve MT coverage and overall quality.
Mar 1, 2006 AVENUE/LETRAS 70
Technical Challenges
• Elicit minimal MT information from non-expert users
• Automatically refine and expand translation rules minimally (both manually written and automatically learned)
• Automatic evaluation of the refinement process
AVENUE/LETRAS 71
Error Typology for Automatic Rule Refinement (simplified)
• Missing word
• Extra word
• Wrong word order
  – Local vs. long distance
  – Word vs. phrase
• Incorrect word
  – Sense
  – Form
  – Selectional restrictions
  – Idiom
  – + Word change
• Wrong agreement
  – Missing constraint
  – Extra constraint
Mar 1, 2006 AVENUE/LETRAS 72
TCTool (Demo)
Actions:
• Add a word
• Delete a word
• Modify a word
• Change word order

Interactive elicitation of error information:
                       precision   recall
error detection           90%        89%
error classification      72%        71%
Mar 1, 2006 AVENUE/LETRAS 73
1. Refine a translation rule: R0 → R1 (change R0 to make it more specific or more general)

Example (Automatic Rule Adaptation):
R0: NP [DET ADJ N] → NP [DET N ADJ]: "a nice house" → *"una casa bonito"
R1: same rule plus the constraint N gender = ADJ gender: "a nice house" → "una casa bonita"
Mar 1, 2006 AVENUE/LETRAS 74
2. Bifurcate a translation rule: R0 → R0 (same, general rule) + R1 (add a new, more specific rule)

Example (Automatic Rule Adaptation):
R0: NP [DET ADJ N] → NP [DET N ADJ]: "a nice house" → "una casa bonita"
R1: NP [DET ADJ N] → NP [DET ADJ N], with ADJ type: pre-nominal: "a great artist" → "un gran artista"
AVENUE/LETRAS 75
Error Information Elicitation
Refinement Operation Typology
Automatic Rule Adaptation: a concrete example

Change word order
SL: Gaudí was a great artist
MT system output (TL): Gaudí era un artista grande
User correction: *Gaudí era un artista grande → Gaudí era un gran artista
[The slide marks the clue word, the error span, and the correction in these sentences.]
Finding Triggering Feature(s): (error word, corrected word) = ("grande", "gran") → need to postulate a new binary feature: feat1

Blame assignment (from MT system output)
tree: <((S,1 (NP,2 (N,5:1 "GAUDI") )
        (VP,3 (VB,2 (AUX,17:2 "ERA") )
         (NP,8 (DET,0:3 "UN")
          (N,4:5 "ARTISTA")
          (ADJ,5:4 "GRANDE") ) ) ) )>

Grammar: S,1 … NP,1 … NP,8 …

Lexical entries:
ADJ::ADJ |: [great] -> [grande]
((X1::Y1)
 ((x0 form) = great)
 ((y0 agr num) = sg)
 ((y0 agr gen) = masc))

ADJ::ADJ |: [great] -> [gran]
((X1::Y1)
 ((x0 form) = great)
 ((y0 agr num) = sg)
 ((y0 agr gen) = masc))
Mar 1, 2006 AVENUE/LETRAS 77
Refining Rules
• Bifurcate NP,8 → NP,8 (R0) + NP,8' (R1) (flip order of ADJ and N)

{NP,8'}
NP::NP : [DET ADJ N] -> [DET ADJ N]
((X1::Y1) (X2::Y2) (X3::Y3)
 ((x0 def) = (x1 def))
 (x0 = x3)
 ((y1 agr) = (y3 agr))  ; det-noun agreement
 ((y2 agr) = (y3 agr))  ; adj-noun agreement
 (y2 = x3)
 ((y2 feat1) =c +))
Automatic Rule Adaptation
Mar 1, 2006 AVENUE/LETRAS 78
Refining Lexical Entries

ADJ::ADJ |: [great] -> [grande]
((X1::Y1)
 ((x0 form) = great)
 ((y0 agr num) = sg)
 ((y0 agr gen) = masc)
 ((y0 feat1) = -))

ADJ::ADJ |: [great] -> [gran]
((X1::Y1)
 ((x0 form) = great)
 ((y0 agr num) = sg)
 ((y0 agr gen) = masc)
 ((y0 feat1) = +))
Automatic Rule Adaptation
Mar 1, 2006 AVENUE/LETRAS 79
Evaluating Improvement
Automatic Rule Adaptation
• Given the initial and final translation lattices, the Rule Refinement module needs to take into account whether the following are present:
  – the corrected translation sentence
  – the original translation sentence (labelled as incorrect by the user)

Lattice alternatives:
un artista gran
un gran artista
un grande artista
*un artista grande
Mar 1, 2006 AVENUE/LETRAS 80
Evaluating Improvement
Automatic Rule Adaptation
• Given the initial and final translation lattices, the Rule Refinement module needs to take into account whether the following are present:
  – the corrected translation sentence
  – the original translation sentence (labelled as incorrect by the user)

Lattice alternatives (incorrect ones now marked):
*un artista gran
un gran artista
*un grande artista
*un artista grande
Mar 1, 2006 AVENUE/LETRAS 81
Challenges and future work
• Credit and blame assignment from TCTool log files and the XFER engine's trace
• Order of corrections matters: explore rule interactions
• Explore the space between batch mode and a fully interactive system
• Online TCTool always running to collect corrections from bilingual speakers; make it into a game with rewards for the best users
Mar 1, 2006 AVENUE/LETRAS 82
AVENUE Prototypes
• General XFER framework under development for the past three years
• Prototype systems so far:
  – German-to-English, Dutch-to-English
  – Chinese-to-English
  – Hindi-to-English
  – Hebrew-to-English
• In progress or planned:
  – Mapudungun-to-Spanish
  – Quechua-to-Spanish
  – Native Alaskan languages (Inupiaq) to English
  – Native Bolivian languages (Aymara) to Spanish
  – Native Brazilian languages to Brazilian Portuguese
Mar 1, 2006 AVENUE/LETRAS 83
Mapudungun
• Indigenous language of Chile and Argentina
• ~1 million Mapuche speakers
Mar 1, 2006 AVENUE/LETRAS 84
Collaboration
• Mapuche Language Experts: Universidad de la Frontera (UFRO)
  – Instituto de Estudios Indígenas (IEI), the Institute for Indigenous Studies
• Chilean Funding
  – Chilean Ministry of Education (Mineduc), Bilingual and Multicultural Education Program

Collaborators: Eliseo Cañulef, Rosendo Huisca, Hugo Carrasco, Hector Painequeo, Flor Caniupil, Luis Caniupil Huaiquiñir, Marcela Collio Calfunao, Cristian Carrillan Anton, Salvador Cañulef, Carolina Huenchullan Arrúe, Claudio Millacura Salas
Mar 1, 2006 AVENUE/LETRAS 85
Accomplishments
• Corpora Collection
  – Spoken Corpus
    • Collected by Luis Caniupil Huaiquiñir
    • Medical domain
    • 3 of 4 Mapudungun dialects:
      – 120 hours of Nguluche
      – 30 hours of Lafkenche
      – 20 hours of Pwenche
    • Transcribed in Mapudungun, translated into Spanish
  – Written Corpus
    • ~200,000 words
    • Bilingual Mapudungun-Spanish
    • Historical and newspaper text

Sample transcription:
nmlch-nmjm1_x_0405_nmjm_00:
M: <SPA>no pütokovilu kay ko
C: no, si me lo tomaba con agua
M: chumgechi pütokoki femuechi pütokon pu <Noise>
C: como se debe tomar, me lo tomé pués
nmlch-nmjm1_x_0406_nmlch_00:
M: Chengewerkelafuymi ürke
C: Ya no estabas como gente entonces!
Mar 1, 2006 AVENUE/LETRAS 86
Accomplishments
• Developed at UFRO
  – Bilingual dictionary with examples: 1,926 entries
  – Spelling-corrected Mapudungun word list: 117,003 fully-inflected word forms
  – Segmented word list: 15,120 forms, with stems translated into Spanish
Mar 1, 2006 AVENUE/LETRAS 87
Accomplishments
• Developed at LTI using Mapudungun language resources from UFRO
– Spelling Checker
• Integrated into OpenOffice
– Hand-built Morphological Analyzer
– Prototype Machine Translation Systems
• Rule-Based
• Example-Based
– Website: LenguasAmerindias.org
Mar 1, 2006 AVENUE/LETRAS 88
Quechua-to-Spanish MT
• V-Unit: funded summer project in Cusco (Peru), June-August 2005 (preparations and data collection started earlier)
• Intensive Quechua course at Centro Bartolome de las Casas (CBC)
• Worked with two native Quechua speakers and one non-native speaker on developing infrastructure (correcting elicited translations, segmenting and translating a list of the most frequent words)
Mar 1, 2006 AVENUE/LETRAS 89
Quechua-to-Spanish Prototype MT System
• Stem lexicon (semi-automatically generated): 753 lexical entries
• Suffix lexicon: 21 suffixes (of the ~150 listed by Cusihuaman)
• Quechua morphology analyzer
• 25 translation rules
• Spanish morphology generation module
• User studies: 10 sentences, 3 users (2 native, 1 non-native)
Mar 1, 2006 AVENUE/LETRAS 90
Challenges for Hebrew MT
• Paucity of existing language resources for Hebrew
– No publicly available broad-coverage morphological analyzer
– No publicly available bilingual lexicons or dictionaries
– No POS-tagged corpus or parse tree-bank for Hebrew
– No large Hebrew/English parallel corpus
• Scenario well suited for CMU transfer-based MT framework for languages with limited resources
Mar 1, 2006 AVENUE/LETRAS 91
Hebrew Morphology Example
• Input word: B$WRH
0 1 2 3 4
|--------B$WRH--------|
|-----B-----|$WR|--H--|
|--B--|-H--|--$WRH---|
Mar 1, 2006 AVENUE/LETRAS 92
Hebrew Morphology Example
Y0: ((SPANSTART 0) (SPANEND 4) (LEX B$WRH) (POS N) (GEN F) (NUM S) (STATUS ABSOLUTE))
Y1: ((SPANSTART 0) (SPANEND 2) (LEX B) (POS PREP))
Y2: ((SPANSTART 1) (SPANEND 3) (LEX $WR) (POS N) (GEN M) (NUM S) (STATUS ABSOLUTE))
Y3: ((SPANSTART 3) (SPANEND 4) (LEX $LH) (POS POSS))
Y4: ((SPANSTART 0) (SPANEND 1) (LEX B) (POS PREP))
Y5: ((SPANSTART 1) (SPANEND 2) (LEX H) (POS DET))
Y6: ((SPANSTART 2) (SPANEND 4) (LEX $WRH) (POS N) (GEN F) (NUM S) (STATUS ABSOLUTE))
Y7: ((SPANSTART 0) (SPANEND 4) (LEX B$WRH) (POS LEX))
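The analyses above form a lattice of arcs over character spans, and each complete segmentation of the input word is a contiguous path through the lattice. A minimal Python sketch, with arc values copied from the feature structures above (the `paths` function and tuple encoding are illustrative, not part of the AVENUE system):

```python
# Each arc of the morphology lattice for B$WRH, encoded as
# (span_start, span_end, lexeme, pos), taken from Y0-Y7 above.
ARCS = [
    (0, 4, "B$WRH", "N"),
    (0, 2, "B", "PREP"),
    (1, 3, "$WR", "N"),
    (3, 4, "$LH", "POSS"),
    (0, 1, "B", "PREP"),
    (1, 2, "H", "DET"),
    (2, 4, "$WRH", "N"),
    (0, 4, "B$WRH", "LEX"),
]

def paths(start=0, end=4):
    """Enumerate arc sequences that cover [start, end] contiguously."""
    if start == end:
        return [[]]
    results = []
    for arc in ARCS:
        if arc[0] == start and arc[1] <= end:
            for rest in paths(arc[1], end):
                results.append([arc] + rest)
    return results

for p in paths():
    print(" + ".join(f"{lex}/{pos}" for _, _, lex, pos in p))
```

The transfer engine receives all such paths at once and lets later parsing stages disambiguate among them.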
Mar 1, 2006 AVENUE/LETRAS 93
Sample Output (dev-data)
maxwell anurpung comes from ghana for israel four years ago and since worked in cleaning in hotels in eilat
a few weeks ago announced if management club hotel that for him to leave israel according to the government instructions and immigration police
in a letter in broken english which spread among the foreign workers thanks to them hotel for their hard work and announced that will purchase for hm flight tickets for their countries from their money
Mar 1, 2006 AVENUE/LETRAS 94
Future Research Directions
• Automatic Transfer Rule Learning:
– In the "large-data" scenario: from large volumes of automatically word-aligned, uncontrolled parallel text
– In the absence of morphology or POS-annotated lexica
– Learning mappings for non-compositional structures
– Effective models for rule scoring for:
• Decoding: using scores at runtime
• Pruning the large collections of learned rules
– Learning Unification Constraints
• Integrated XFER Engine and Decoder
– Improved models for scoring tree-to-tree mappings, integration with LM and other knowledge sources in the course of the search
Mar 1, 2006 AVENUE/LETRAS 95
Future Research Directions
• Automatic Rule Refinement
• Morphology Learning
• Feature Detection and Corpus Navigation
• Prototypes for New Languages
Mar 1, 2006 AVENUE/LETRAS 96
Publications
• 2005, Carbonell, J. G., A. Lavie, L. Levin and A. Black. "Language Technologies for Humanitarian Aid". In Technology for Humanitarian Action, K. M. Cahill (ed.), pp. 111-138, Fordham University Press, ISBN 0-8232-2393-0, 2005.
• 2005. Font Llitjós, A., R. Aranovich and L. Levin. "Building Machine translation systems for indigenous languages". Second Conference on the Indigenous Languages of Latin America (CILLA II), 27-29 October 2005, Texas, USA.
• 2005, Font-Llitjos, A., J.G. Carbonell and A. Lavie. "A Framework for Interactive and Automatic Refinement of Transfer-based Machine Translation" . In Proceedings of the 10th Annual Conference of the European Association for Machine Translation (EAMT-2005), Budapest, Hungary, May 2005.
• 2004, Lavie, A., S. Wintner, Y. Eytani, E. Peterson and K. Probst. "Rapid Prototyping of a Transfer-based Hebrew-to-English Machine Translation System". In Proceedings of the 10th International Conference on Theoretical and Methodological Issues in Machine Translation (TMI-2004), Baltimore, MD, October 2004. Pages 1-10.
• 2004, Probst, K. and A. Lavie. "A Structurally Diverse Minimal Corpus for Eliciting Structural Mappings between Languages". In Proceedings of the 6th Conference of the Association for Machine Translation in the Americas (AMTA-2004), Washington, DC, September 2004.
Mar 1, 2006 AVENUE/LETRAS 97
Publications
• 2004, Font Llitjós, A., K. Probst and J.G. Carbonell. "Error Analysis of Two Types of Grammar for the Purpose of Automatic Rule Refinement". In Proceedings of the 6th Conference of the Association for Machine Translation in the Americas (AMTA-2004), Washington, DC, September 2004.
• 2004, Monson, C., A. Lavie, J. Carbonell and L. Levin "Unsupervised Induction of Natural Language Morphology Inflection Classes". In Proceedings of Workshop on Current Themes in Computational Phonology and Morphology at the 42th Annual Meeting of the Association of Computational Linguistics (ACL-2004), Barcelona, Spain, July 2004.
• 2004, Monson, C., L. Levin, R. Vega, R. Brown, A. Font Llitjos, A. Lavie, J. Carbonell, E. Cañulef, R. Huisca. "Data Collection and Analysis of Mapudungun Morphology for Spelling Correction". In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC-2004), Lisbon, Portugal, May 2004.
• 2004, Font Llitjós, A. and J.G. Carbonell. "The Translation Correction Tool: English-Spanish user studies". In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC-2004), Lisbon, Portugal, May 2004.
• 2004, Lavie, A., K. Probst, E. Peterson, S. Vogel, L.Levin, A. Font-Llitjos and J. Carbonell. "A Trainable Transfer-based Machine Translation Approach for Languages with Limited Resources". In Proceedings of Workshop of the European Association for Machine Translation (EAMT-2004), Valletta, Malta, April 2004.
Mar 1, 2006 AVENUE/LETRAS 98
Publications
• 2003, Lavie, A., S. Vogel, L. Levin, E. Peterson, K. Probst, A. Font Llitjos, R. Reynolds, J. Carbonell, and R. Cohen. "Experiments with a Hindi-to-English Transfer-based MT System under a Miserly Data Scenario". ACM Transactions on Asian Language Information Processing (TALIP), 2(2). June 2003. Pages 143-163.
• 2002, Probst, K., L. Levin, E. Peterson, A. Lavie, and J. Carbonell, "MT for Minority Languages Using Elicitation-Based Learning of Syntactic Transfer Rules". Machine Translation, 17(4). Pages 245-270.
• 2002, Carbonell, J., K. Probst, E. Peterson, C. Monson, A. Lavie, R. Brown and L. Levin. "Automatic Rule Learning for Resource Limited MT". In Proceedings of 5th Conference of the Association for Machine Translation in the Americas (AMTA-2002), Tiburon, CA, October 2002.
• 2002, Levin, L., R. Vega, J. Carbonell, R. Brown, A. Lavie, E. Canulef and C. Huenchullan. "Data Collection and Language Technologies for Mapudungun". In Proceedings of International Workshop on Resources and Tools in Field Linguistics at the Third International Conference on Language Resources and Evaluation (LREC-2002), Las Palmas, Canary Islands, Spain, June 2002.
• 2001, Probst, K., R. Brown, J. Carbonell, A. Lavie, L. Levin, and E. Peterson. "Design and Implementation of Controlled Elicitation for Machine Translation of Low-density Languages". In Proceedings of the MT-2010 Workshop at MT-Summit VIII, Santiago de Compostela, Spain, September 2001.
Mar 1, 2006 AVENUE/LETRAS 99
Mapudungun-to-Spanish Example
Mapudungun
pelafiñ Maria
Spanish
No vi a María
English
I didn’t see Maria
Mar 1, 2006 AVENUE/LETRAS 100
Mapudungun-to-Spanish Example
Mapudungun
pelafiñ Maria
pe -la -fi -ñ Maria
see -neg -3.obj -1.subj.indicative Maria
Spanish
No vi a María
No vi a María
neg see.1.subj.past.indicative acc Maria
English
I didn’t see Maria
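The interlinear gloss above amounts to a morpheme-to-feature lookup. A minimal sketch, limited to the suffixes in this example (the `SUFFIX_FEATURES` table and feature labels are illustrative, not the AVENUE lexicon format):

```python
# Suffix glosses for this example, as given on the slide.
SUFFIX_FEATURES = {
    "la": {"negation": "+"},           # -la : negation
    "fi": {"object_person": 3},        # -fi : 3rd-person object
    "ñ":  {"person": 1, "number": "sg", "mood": "ind"},  # -ñ : 1.subj.ind
}

def analyze(segmented_verb):
    """Split a hyphen-segmented verb and merge its suffix features."""
    root, *suffixes = segmented_verb.split("-")
    features = {}
    for s in suffixes:
        features.update(SUFFIX_FEATURES[s])
    return root, features

print(analyze("pe-la-fi-ñ"))
```

The slides that follow show how the XFER engine builds exactly this feature structure compositionally, one suffix at a time.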
Mar 1, 2006 AVENUE/LETRAS 101
[Tree: the verb root pe of pe-la-fi-ñ Maria is analyzed as V]
Mar 1, 2006 AVENUE/LETRAS 102
[Tree: the suffix la is analyzed as VSuff, contributing negation = +]
Mar 1, 2006 AVENUE/LETRAS 103
[Tree: la is promoted to VSuffG; pass all features up]
Mar 1, 2006 AVENUE/LETRAS 104
[Tree: the suffix fi is analyzed as VSuff, contributing object person = 3]
Mar 1, 2006 AVENUE/LETRAS 105
[Tree: VSuffG and fi combine into a new VSuffG; pass all features up from both children]
Mar 1, 2006 AVENUE/LETRAS 106
[Tree: the suffix ñ is analyzed as VSuff, contributing person = 1, number = sg, mood = ind]
Mar 1, 2006 AVENUE/LETRAS 107
[Tree: VSuffG and ñ combine into a new VSuffG; pass all features up from both children]
Mar 1, 2006 AVENUE/LETRAS 108
[Tree: V and VSuffG combine into a full verb V; pass all features up from both children, and check that (1) negation = + and (2) tense is undefined]
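The bottom-up steps so far can be sketched with plain dictionaries. This is a simplification: the real XFER engine uses unification-based feature structures, and the function name `pass_up` is illustrative:

```python
# "Pass all features up" simplified as merging child dicts into the parent.
def pass_up(*children):
    """Merge the children's feature structures into the parent's."""
    parent = {}
    for child in children:
        parent.update(child)
    return parent

# Features contributed by each suffix of pe-la-fi-ñ (from the slides).
la = {"negation": "+"}
fi = {"object_person": 3}
ñ = {"person": 1, "number": "sg", "mood": "ind"}

# Build the VSuffG feature structure bottom-up: (la + fi) + ñ.
vsuffg = pass_up(pass_up(la, fi), ñ)

# The V-level rule then checks: negation = + and tense is undefined.
assert vsuffg["negation"] == "+"
assert "tense" not in vsuffg
```

Because no suffix supplied a tense feature, the "tense is undefined" constraint succeeds, which later licenses assigning past tense on the Spanish side.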
Mar 1, 2006 AVENUE/LETRAS 109
[Tree: Maria is analyzed as N, then NP, with person = 3, number = sg, human = +]
Mar 1, 2006 AVENUE/LETRAS 110
[Tree: V and NP combine into VP, then S; check that the NP is human = +, and pass features up from V]
Mar 1, 2006 AVENUE/LETRAS 111
Transfer to Spanish: Top-Down
[Tree: the Mapudungun S maps to a Spanish S, and its VP to a Spanish VP]
Mar 1, 2006 AVENUE/LETRAS 112
Transfer to Spanish: Top-Down
[Tree: the Spanish VP expands to V "a" NP; pass all features to the Spanish side]
Mar 1, 2006 AVENUE/LETRAS 113
Transfer to Spanish: Top-Down
[Tree: pass all features down]
Mar 1, 2006 AVENUE/LETRAS 114
Transfer to Spanish: Top-Down
[Tree: pass object features down]
Mar 1, 2006 AVENUE/LETRAS 115
Transfer to Spanish: Top-Down
[Tree: the accusative marker "a" on objects is introduced because human = +]
Mar 1, 2006 AVENUE/LETRAS 116
Transfer to Spanish: Top-Down
[Tree: the VP-level transfer rule that introduces the accusative marker]

VP::VP [VBar NP] -> [VBar "a" NP]
((X1::Y1)
 (X2::Y3)
 ((X2 type) = (*NOT* personal))
 ((X2 human) =c +)
 (X0 = X1)
 ((X0 object) = X2)
 (Y0 = X0)
 ((Y0 object) = (X0 object))
 (Y1 = Y0)
 (Y3 = (Y0 object))
 ((Y1 objmarker person) = (Y3 person))
 ((Y1 objmarker number) = (Y3 number))
 ((Y1 objmarker gender) = (Y3 gender)))
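The rule's constraint logic can be rendered procedurally. A hypothetical sketch (the function name `apply_a_insertion` and dict encoding are mine; the real XFER engine interprets the declarative rule directly):

```python
# Insert the Spanish accusative marker "a" before a human,
# non-pronominal object NP, copying its agreement features onto
# the verb's object marker, per the VP::VP rule shown above.
def apply_a_insertion(vbar, np):
    """Return the Spanish constituent sequence for [VBar NP]."""
    # ((X2 human) =c +) and ((X2 type) = (*NOT* personal))
    if np.get("human") != "+" or np.get("type") == "personal":
        return [vbar, np]  # constraints fail: rule does not apply
    vbar = dict(vbar)      # copy, then add objmarker agreement features
    vbar["objmarker"] = {k: np[k] for k in ("person", "number", "gender")
                         if k in np}
    return [vbar, '"a"', np]

maria = {"lex": "Maria", "human": "+", "person": 3, "number": "sg"}
print(apply_a_insertion({"lex": "ver"}, maria))
```

With a non-human object (e.g. a book) the constraint `human =c +` fails and the sequence comes back without the marker, which matches Spanish usage.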
Mar 1, 2006 AVENUE/LETRAS 117
Transfer to Spanish: Top-Down
[Tree: pass person, number, and mood features to the Spanish verb; assign tense = past]
Mar 1, 2006 AVENUE/LETRAS 118
Transfer to Spanish: Top-Down
[Tree: the Spanish negation marker "no" is introduced because negation = +]
Mar 1, 2006 AVENUE/LETRAS 119
Transfer to Spanish: Top-Down
[Tree: the verb stem pe translates to the Spanish stem ver]
Mar 1, 2006 AVENUE/LETRAS 120
Transfer to Spanish: Top-Down
[Tree: ver is inflected as vi, with person = 1, number = sg, mood = indicative, tense = past]
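This final generation step (ver plus the transferred features yields vi) can be sketched as a paradigm lookup. The one-entry table below is purely illustrative; the actual prototype uses a full Spanish morphology generation module:

```python
# Illustrative paradigm table: (lemma, person, number, mood, tense) -> form.
PARADIGMS = {
    ("ver", 1, "sg", "indicative", "past"): "vi",
}

def generate(lemma, feats):
    """Look up the inflected Spanish form for a lemma plus features."""
    key = (lemma, feats["person"], feats["number"],
           feats["mood"], feats["tense"])
    return PARADIGMS[key]

feats = {"person": 1, "number": "sg", "mood": "indicative", "tense": "past"}
print(generate("ver", feats))  # prints: vi
```

Note that the person, number, and mood values come from the Mapudungun suffixes, while the past tense was assigned by a transfer rule on the Spanish side.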
Mar 1, 2006 AVENUE/LETRAS 121
Transfer to Spanish: Top-Down
[Tree: Maria maps to the Spanish N María; pass features over to the Spanish side]
Mar 1, 2006 AVENUE/LETRAS 122
I didn't see Maria
[Tree: the completed parallel trees, Mapudungun pe-la-fi-ñ Maria and Spanish No vi a María, yielding "I didn't see Maria"]
N
Mar 1, 2006 AVENUE/LETRAS 123