AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources

Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking
Students and Staff: Erik Peterson, Christian Monson, Ariadna Font Llitjós, Alison Alvarez, Roberto Aranovich, Rodolfo Vega



Mar 1, 2006 AVENUE/LETRAS 2

Outline

• Scientific Objectives
• Framework Overview
• Learning Morphology
• Elicitation
• Learning Transfer Rules
• Automatic Rule Refinement
• Language Prototypes
• New Directions


Why Machine Translation for Languages with Limited Resources?

• We are in the age of information explosion
  – With the internet + web + Google, anyone can get the information they want anytime…
• But what about the text in all those other languages?
  – How do they read all this English stuff?
  – How do we read all the stuff that they put online?
• MT for these languages would enable:
  – Better government access to native indigenous and minority communities
  – Better minority and native community participation in information-rich activities (health care, education, government) without giving up their languages
  – Civilian and military applications (disaster relief)
  – Language preservation


The Roadmap to Learning-based MT

• Automatic acquisition of necessary language resources and knowledge using machine learning methodologies

• A framework for integrating the acquired MT resources into effective MT prototype systems

• Effective integration of acquired knowledge with statistical/distributional information


CMU’s AVENUE Approach

• Elicitation: use bilingual native informants to produce a small high-quality word-aligned bilingual corpus of translated phrases and sentences

• Transfer-rule Learning: apply ML-based methods to automatically acquire syntactic transfer rules for translation between the two languages
  – Learn from major language to minor language
  – Translate from minor language to major language

• XFER + Decoder:
  – XFER engine produces a lattice of possible transferred structures at all levels
  – Decoder searches and selects the best scoring combination

• Rule Refinement: automatically refine and correct the acquired transfer rules through interaction with bilingual informants, who help the system identify translation errors

• Morphology Learning: unsupervised learning of morpheme structure of words based on their organization into paradigms and distributional information


AVENUE MT Approach

[Figure: MT pyramid between Source (e.g. Quechua) and Target (e.g. English). Labels: Direct: SMT, EBMT; Transfer Rules; Syntactic Parsing; Semantic Analysis; Interlingua; Sentence Planning; Text Generation. Callout: AVENUE: Automate Rule Learning]


Avenue Architecture

[Figure: AVENUE architecture diagram. Panels: Elicitation, Morphology, Rule Learning, Run-Time System, Rule Refinement. Components: Elicitation Tool, Elicitation Corpus, Word-Aligned Parallel Corpus, Morphology Analyzer, Learning Module, Handcrafted rules, Learned Transfer Rules, Lexical Resources, Run Time Transfer System, Decoder, Translation Correction Tool, Rule Refinement Module, INPUT TEXT, OUTPUT TEXT]


Transfer Rule Formalism

;SL: the old man, TL: ha-ish ha-zaqen

NP::NP [DET ADJ N] -> [DET N DET ADJ]
(
(X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)
((X1 AGR) = *3-SING)
((X1 DEF) = *DEF)
((X3 AGR) = *3-SING)
((X3 COUNT) = +)
((Y1 DEF) = *DEF)
((Y3 DEF) = *DEF)
((Y2 AGR) = *3-SING)
((Y2 GENDER) = (Y4 GENDER))
)

Parts of the rule:
• Type information: NP::NP
• Part-of-speech/constituent information: [DET ADJ N] -> [DET N DET ADJ]
• Alignments: (X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)
• x-side constraints: the constraints on X1 and X3
• y-side constraints: the constraints on Y1, Y2, Y3 and Y4
• xy-constraints, e.g. ((Y1 AGR) = (X1 AGR))


Transfer Rule Formalism (II)

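;SL: the old man, TL: ha-ish ha-zaqen

NP::NP [DET ADJ N] -> [DET N DET ADJ]
(
(X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)
((X1 AGR) = *3-SING)
((X1 DEF) = *DEF)
((X3 AGR) = *3-SING)
((X3 COUNT) = +)
((Y1 DEF) = *DEF)
((Y3 DEF) = *DEF)
((Y2 AGR) = *3-SING)
((Y2 GENDER) = (Y4 GENDER))
)

• Value constraints: a feature is set to a fixed value, e.g. ((X1 AGR) = *3-SING)
• Agreement constraints: two features must have the same value, e.g. ((Y2 GENDER) = (Y4 GENDER))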
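To make the alignment part of the formalism concrete, here is a minimal Python sketch that stores the NP rule above as a data structure and uses only its alignments to lay source words out in target order. The class `TransferRule`, the function `reorder`, and the toy lexicon are hypothetical names introduced for illustration; they are not the XFER engine's actual representation, and constraints are stored as plain strings without being checked.

```python
from dataclasses import dataclass, field


@dataclass
class TransferRule:
    """Illustrative container for one transfer rule (not the real XFER format)."""
    rule_type: str                 # e.g. "NP::NP"
    source: list                   # x-side constituent sequence
    target: list                   # y-side constituent sequence
    alignments: list               # (x_index, y_index) pairs, 1-based as in the slides
    constraints: list = field(default_factory=list)  # kept as uninterpreted strings


OLD_MAN_RULE = TransferRule(
    rule_type="NP::NP",
    source=["DET", "ADJ", "N"],
    target=["DET", "N", "DET", "ADJ"],
    alignments=[(1, 1), (1, 3), (2, 4), (3, 2)],
    constraints=["((X1 AGR) = *3-SING)", "((Y2 GENDER) = (Y4 GENDER))"],
)


def reorder(rule, source_words):
    """Copy each source word to every target slot it is aligned to."""
    if len(source_words) != len(rule.source):
        raise ValueError("input length does not match the rule's x-side")
    slots = [None] * len(rule.target)
    for x, y in rule.alignments:
        slots[y - 1] = source_words[x - 1]
    return slots


# Toy lexicon, assumed for this example only.
TOY_LEXICON = {"the": "ha", "old": "zaqen", "man": "ish"}

reordered = reorder(OLD_MAN_RULE, ["the", "old", "man"])
translated = [TOY_LEXICON[w] for w in reordered]
print(reordered)   # ['the', 'man', 'the', 'old']
print(translated)  # ['ha', 'ish', 'ha', 'zaqen']
```

Because X1 is aligned to both Y1 and Y3, the single English determiner is copied to both Hebrew determiner positions, which is how the rule yields ha-ish ha-zaqen.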


Transfer and Decoding


The Transfer Engine

Analysis
Source text is parsed into its grammatical structure. Determines transfer application ordering.

Example: ראיתי את האיש הזקן
(I) saw *acc the man the old

Source tree: [S [VP V P [NP D N D Adj]]]

Transfer
A target language tree is created by reordering, insertion, and deletion.

Target tree: [S [NP N] [VP V [NP DET Adj N]]] over "I saw the old man"

Source words translated with transfer lexicon.

Generation
Target language constraints are checked, target morphology applied, and final translation produced.
E.g. “saw” in past tense selected.

Final translation: “I saw the old man”


Symbolic Decoder

• System rarely finds a full parse/transfer for complete input sentence
• XFER engine produces comprehensive lattice of segment translations
• Decoder selects best combination of translation segments
• Search for optimal scoring path of partial translations, based on multiple features:
  – Target Language Model scores
  – XFER Rule Scores
  – Path Fragmentation
  – Other features…
• Symbolic decoding essential for scenarios where there is insufficient data for training large target LM
  – Effective Rule Scoring is crucial
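The search over the lattice can be sketched as a small dynamic program. This is an illustration under stated assumptions, not the AVENUE decoder: each lattice edge is assumed to carry a single precomputed score (standing in for the language model and rule scores), and path fragmentation is modeled as a fixed penalty per segment. The function name `decode`, the edge format, and all numbers are hypothetical.

```python
def decode(n_words, edges, fragment_penalty=1.0):
    """Pick the best-scoring sequence of segment translations covering the input.

    edges: list of (start, end, text, score), with 0 <= start < end <= n_words.
    Returns (total_score, [segment texts]) or None if no full cover exists.
    """
    best = {0: (0.0, [])}  # best[pos] = best partial path ending at word position pos
    for pos in range(1, n_words + 1):
        candidates = []
        for start, end, text, score in edges:
            if end == pos and start in best:
                prev_score, prev_path = best[start]
                # Each extra segment costs fragment_penalty, so fewer, longer
                # segments are preferred when their scores are comparable.
                candidates.append((prev_score + score - fragment_penalty,
                                   prev_path + [text]))
        if candidates:
            best[pos] = max(candidates, key=lambda c: c[0])
    return best.get(n_words)


# Toy lattice for a 4-word input; scores are made up for the example.
LATTICE = [
    (0, 2, "I saw", -2.0),
    (2, 4, "the old man", -1.5),
    (2, 3, "the man", -0.5),
    (3, 4, "the old", -0.5),
]

score, path = decode(4, LATTICE)
print(score, path)  # -5.5 ['I saw', 'the old man']
```

With these numbers the two-segment path wins over the three-segment one even though its raw scores are lower, because the three-segment path pays the fragmentation penalty once more.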


Morphology Learning


The Challenge of Morphology

Mapudungun (Indigenous Language of Chile and Argentina, ~1 Million Speakers)

Allkütulekefun


Mapudungun: Allkütu-le-ke-fu-n

  Allkütu   Listen
  -le       -prog.
  -ke       -habitual
  -fu       -past
  -n        -indic.1sg

"I used to listen"

Tasks for Morphology
• Segment Words
• Map Morphemes onto Features
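The two tasks can be illustrated on the Mapudungun example with a short Python sketch. It assumes the stem and a suffix-to-feature table are already known, which is exactly what the project aims to learn; the table below is copied by hand from the slide, and the function name `segment` is hypothetical.

```python
# Suffix-to-feature table taken from the example slide (hand-written here,
# not learned).
SUFFIX_FEATURES = {
    "le": "prog.",
    "ke": "habitual",
    "fu": "past",
    "n": "indic.1sg",
}


def segment(word, stem, suffix_features):
    """Split word into stem + suffixes and map each suffix to its feature."""
    if not word.startswith(stem):
        raise ValueError("word does not start with the given stem")
    rest = word[len(stem):]
    analysis = []
    while rest:
        # Try longer suffixes first so a short suffix cannot shadow a longer one.
        for suffix in sorted(suffix_features, key=len, reverse=True):
            if rest.startswith(suffix):
                analysis.append((suffix, suffix_features[suffix]))
                rest = rest[len(suffix):]
                break
        else:
            raise ValueError(f"cannot segment remainder {rest!r}")
    return analysis


print(segment("Allkütulekefun", "Allkütu", SUFFIX_FEATURES))
# [('le', 'prog.'), ('ke', 'habitual'), ('fu', 'past'), ('n', 'indic.1sg')]
```

Greedy longest-match is only a stand-in here; it works for this word but would not resolve genuinely ambiguous segmentations.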


The Challenge of Morphology

Tasks for Morphology
• Segment Words
• Map Morphemes onto Features
• Learn these tasks
  – unsupervised
  – from data
  – for any language


Our Approach

Leverage the Natural Structure of Morphology
• Paradigm
  – Set of affixes that interchangeably attach to a set of stems


Our Approach

Example Vocabulary
blame blamed blames roamed roaming roams solve solves solving

Candidate paradigms (suffixes: stems)
  Ø.s:    blame, solve
  Ø.s.d:  blame
  s:      blame, roam, solve
  e.es:   blam, solv


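[Figure: network of all candidate paradigms for the example vocabulary. Nodes (suffixes: stems):]
  Ø:           blame, blames, blamed, roams, roamed, roaming, solve, solves, solving
  Ø.s:         blame, solve
  Ø.d:         blame
  Ø.s.d:       blame
  s:           blame, roam, solve
  d:           blame, roame
  s.d:         blame
  e:           blam, solv
  es:          blam, solv
  ed:          blam, roam
  e.es:        blam, solv
  e.ed:        blam
  es.ed:       blam
  e.es.ed:     blam
  me:          bla
  mes:         bla
  med:         bla, roa
  me.mes:      bla
  me.med:      bla
  mes.med:     bla
  me.mes.med:  bla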
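The candidate paradigms in this network can be reproduced from the example vocabulary with a few lines of Python. This is a minimal sketch of the idea, not the project's induction code: it assumes every split of a word into a non-empty stem plus a (possibly empty) suffix is a candidate, and the helper names `suffix_to_stems` and `stems_for` are hypothetical.

```python
from collections import defaultdict

NULL = "Ø"  # the empty suffix, written as on the slides

VOCAB = ["blame", "blamed", "blames", "roamed", "roaming",
         "roams", "solve", "solves", "solving"]


def suffix_to_stems(vocab):
    """Map every candidate suffix to the set of stems it attaches to."""
    table = defaultdict(set)
    for word in vocab:
        for i in range(1, len(word) + 1):
            stem, suffix = word[:i], word[i:]
            table[suffix or NULL].add(stem)
    return table


def stems_for(suffixes, table):
    """Stems that occur with every suffix in the given suffix set."""
    sets = [table.get(s, set()) for s in suffixes]
    return set.intersection(*sets) if sets else set()


TABLE = suffix_to_stems(VOCAB)
print(sorted(stems_for([NULL, "s"], TABLE)))       # ['blame', 'solve']
print(sorted(stems_for([NULL, "s", "d"], TABLE)))  # ['blame']
print(sorted(stems_for(["s"], TABLE)))             # ['blame', 'roam', 'solve']
print(sorted(stems_for(["e", "es"], TABLE)))       # ['blam', 'solv']
```

Each node of the network is one suffix set together with the stems returned by `stems_for`; adding a suffix to the set can only shrink the stem set, which is why stem counts fall as suffix counts rise.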


Spanish Newswire Corpus: 40,011 Tokens, 6,975 Types

[Figure: network of candidate suffix sets. Nodes:]

Suffixes        Stem count  Stems
a.as.o.os.tro   1           cas
a.as.o.os       43          african, cas, jurídic, l, ...
a.as.os         50          afectad, cas, jurídic, l, ...
a.as.o          59          cas, citad, jurídic, l, ...
a.o.os          105         impuest, indonesi, italian, jurídic, ...
as.o.os         54          cas, implicad, jurídic, l, ...
a.as            199         huelg, incluid, industri, inundad, ...
a.os            134         impedid, impuest, indonesi, inundad, ...
as.os           68          cas, implicad, inundad, jurídic, ...
a.o             214         id, indi, indonesi, inmediat, ...
as.o            85          intern, jurídic, just, l, ...
o.os            268         human, implicad, indici, indocumentad, ...
a.tro           2           cas, cen
a               1237        huelg, ib, id, iglesi, ...
as              404         huelg, huelguist, incluid, industri, ...
os              534         humorístic, human, hígad, impedid, ...
o               1139        hub, hug, human, huyend, ...
tro             16          catas, ce, cen, cua, ...


[Figure: the same Spanish suffix-set network, with callouts naming the parts of each node: Suffixes, Stem Type Count, and Stems. Level 5 = 5 suffixes, e.g. the node a.as.o.os.tro (1 stem: cas).]


[Figure: the same Spanish suffix-set network with two callouts. "Adjective Paradigm": the node a.as.o.os (43 stems: african, cas, jurídic, l, ...). "From the spurious suffix “tro”": the nodes a.as.o.os.tro (1 stem: cas), a.tro (2 stems: cas, cen) and tro (16 stems: catas, ce, cen, cua, ...).]


Basic Search Procedure

[Figure: the same Spanish suffix-set network, with two axes: Decreasing Stem Count and Increasing Suffix Count.]


Examples and Evaluation of Automatically Selected Suffix Sets

Ø.ba.n.ndo ada.adas.ado.ados.aron.ó

a.aba.ado.ados.ar.ará.arán ada.ado.ados.ar.o

a.aciones.ación.adas.ado.ar ado.adores.o

a.ada.adas.ado.ar.ará ado.ados.arse.e

a.adas.ado.an.ar ado.ar.aron.arse.ará

a.ado.ados.ar.ó do.dos.ndo.r.ron

a.ado.an.arse.ó e.ida.ido

a.ado.aron.arse.ó emos.ido.ía.ían

aba.ada.ado.ar.o.os ida.ido.idos.ir.ió

aciones.ación.ado.ados ido.iendo.ir

aciones.ado.ados.ará ido.ir.ro

ación.ado.an.e

Global Suffix Evaluation
Precision: 0.506
Recall: 0.517
F1: 0.511

Key: Correct / Wrong
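The three reported figures are mutually consistent if F1 is the usual harmonic mean of precision and recall. The slide does not say how suffixes were matched against the answer key, so the check below only covers the final arithmetic; `f1_score` is a helper name chosen for this example.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.506, 0.517), 3))  # 0.511
```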


Next Steps for Morphology Induction

• Improve the Quality of Induced Paradigms
  – Current Work

• Convert Paradigms into a Segmenter
  – Soon

• Learn Mappings from Morphemes to Features
  – Future Goal


Elicitation


Purpose of Elicitation

• Provide a small but highly targeted corpus of hand-aligned data
  – To support machine learning from a small data set
  – To cover all basic morpho-syntactic phenomena

newpair
srcsent: Tú caíste
tgtsent: eymi ütrünagimi
aligned: ((1,1),(2,2))
context: tú = Juan [masculino, 2a persona del singular]
comment: You (John) fell

newpair
srcsent: Tú estás cayendo
tgtsent: eymi petu ütünagimi
aligned: ((1,1),(2 3,2 3))
context: tú = Juan [masculino, 2a persona del singular]
comment: You (John) are falling

newpair
srcsent: Tú caíste
tgtsent: eymi ütrunagimi
aligned: ((1,1),(2,2))
context: tú = María [femenino, 2a persona del singular]
comment: You (Mary) fell
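A record in this format can be read with a short parser. The sketch below assumes one field per line, as in the records above, and assumes that an alignment such as (2 3,2 3) links source words 2 and 3 to target words 2 and 3 as a group. The function names are hypothetical and this is not the Elicitation Tool's own reader.

```python
import re


def parse_alignment(text):
    """'((1,1),(2 3,2 3))' -> [([1], [1]), ([2, 3], [2, 3])]"""
    pairs = []
    for src, tgt in re.findall(r"\(([\d ]+),([\d ]+)\)", text):
        pairs.append(([int(i) for i in src.split()],
                      [int(i) for i in tgt.split()]))
    return pairs


def parse_record(block):
    """Turn one 'newpair' block into a dict of its fields."""
    record = {}
    for line in block.strip().splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skips the bare 'newpair' marker line
            record[key.strip()] = value.strip()
    if "aligned" in record:
        record["aligned"] = parse_alignment(record["aligned"])
    return record


EXAMPLE = """
newpair
srcsent: Tú estás cayendo
tgtsent: eymi petu ütünagimi
aligned: ((1,1),(2 3,2 3))
context: tú = Juan [masculino, 2a persona del singular]
comment: You (John) are falling
"""

rec = parse_record(EXAMPLE)
print(rec["srcsent"], rec["aligned"])
```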


Purpose of Elicitation

• To get data from someone who is
  – Bilingual
  – Literate
  – Not experienced with linguistics


English-Hindi Example


English-Chinese Example


English-Arabic Example


The Elicitation Tool has been used with these languages

• Mapudungun
• Hindi
• Hebrew
• Quechua
• Aymara
• Thai
• Japanese
• Chinese
• Dutch
• Arabic


Elicitation Corpus: List of Minimal Pairs of Sentences in a Major Language

Eliciting from Spanish    Eliciting from English
Canto                     I sing
Canté                     I sang
Estoy cantando            I am singing
Cantaste                  You sang


AVENUE Elicitation Corpora

• The Functional-Typological Corpus
  – Designed to elicit elements of meaning that may have morpho-syntactic realization

• The Structural Elicitation Corpus
  – Based on sentence structures from the Penn TreeBank


The Process

[Figure: flow diagram of how the elicitation corpus is built. Labels: Feature Specification (list of semantic features and values); Feature Maps (which combinations of features and values are of interest): Clause-Level, Noun-Phrase, Tense & Aspect, Modality, …; Feature Structure Sets; Reverse Annotated Feature Structure Sets (add English sentences); The Corpus; Sampling; Smaller Corpus]


Feature Structures

srcsent: Mary was not a leader.
context: Translate this as though it were spoken to a peer co-worker;

((actor ((np-function fn-actor)(np-animacy anim-human)(np-biological-gender bio-gender-female)
         (np-general-type proper-noun-type)(np-identifiability identifiable)(np-specificity specific)…))
 (pred ((np-function fn-predicate-nominal)(np-animacy anim-human)(np-biological-gender bio-gender-female)
        (np-general-type common-noun-type)(np-specificity specificity-neutral)…))
 (c-v-lexical-aspect state)(c-copula-type copula-role)(c-secondary-type secondary-copula)
 (c-solidarity solidarity-neutral)(c-v-grammatical-aspect gram-aspect-neutral)(c-v-absolute-tense past)
 (c-v-phase-aspect phase-aspect-neutral)(c-general-type declarative-clause)(c-polarity polarity-negative)
 (c-my-causer-intentionality intentionality-n/a)(c-comparison-type comparison-n/a)
 (c-relative-tense relative-n/a)(c-our-boundary boundary-n/a)…)


Feature Specification

• Defines Features and their values

• Sets default values for features

• Specifies feature requirements and restrictions

• Written in XML


Feature Specification

Feature: c-copula-type
(a copula is a verb like “be”; some languages do not have copulas)

Values:

copula-n/a
  Restrictions: 1. ~(c-secondary-type secondary-copula)
  Notes:

copula-role
  Restrictions: 1. (c-secondary-type secondary-copula)
  Notes: 1. A role is something like a job or a function. "He is a teacher" "This is a vegetable peeler"

copula-identity
  Restrictions: 1. (c-secondary-type secondary-copula)
  Notes: 1. "Clark Kent is Superman" "Sam is the teacher"

copula-location
  Restrictions: 1. (c-secondary-type secondary-copula)
  Notes: 1. "The book is on the table" There is a long list of locative relations later in the feature specification.

copula-description
  Restrictions: 1. (c-secondary-type secondary-copula)
  Notes: 1. A description is an attribute. "The children are happy." "The books are long."
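The restrictions on c-copula-type can be read as a simple consistency check on a feature structure. The sketch below encodes them in a Python dict rather than the project's XML (whose schema is not shown here), treats `~` as negation, and represents a feature structure as a flat dict; the names `COPULA_RESTRICTIONS` and `satisfies_restriction` are hypothetical.

```python
# value -> (restricted feature, restricted value, must_hold)
# must_hold=False encodes the negated restriction ~(feature value).
COPULA_RESTRICTIONS = {
    "copula-n/a":         ("c-secondary-type", "secondary-copula", False),
    "copula-role":        ("c-secondary-type", "secondary-copula", True),
    "copula-identity":    ("c-secondary-type", "secondary-copula", True),
    "copula-location":    ("c-secondary-type", "secondary-copula", True),
    "copula-description": ("c-secondary-type", "secondary-copula", True),
}


def satisfies_restriction(feature_structure):
    """Check the c-copula-type restriction for a flat feature structure."""
    value = feature_structure.get("c-copula-type")
    if value not in COPULA_RESTRICTIONS:
        raise ValueError(f"unknown c-copula-type value: {value!r}")
    feature, required, must_hold = COPULA_RESTRICTIONS[value]
    return (feature_structure.get(feature) == required) == must_hold


ok = {"c-copula-type": "copula-role", "c-secondary-type": "secondary-copula"}
bad = {"c-copula-type": "copula-n/a", "c-secondary-type": "secondary-copula"}
print(satisfies_restriction(ok), satisfies_restriction(bad))  # True False
```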


Feature Maps

• Some features interact in the grammar
  – English –s reflects person and number of the subject and tense of the verb.
  – In expressing the English present progressive tense, the auxiliary verb is in a different place in a question and a statement:
    • He is running.
    • Is he running?

• We need to check many, but not all combinations of features and values.

• Using unlimited feature combinations leads to an unmanageable number of sentences
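The growth in sentence count can be made concrete with a toy calculation. The feature inventory below is hypothetical (a few names echo the feature-structure example, the value lists are invented for illustration), and a "feature map" is modeled simply as the subset of features whose combinations are enumerated while the rest stay at a default.

```python
from itertools import product
from math import prod

# Hypothetical toy inventory: feature -> possible values.
FEATURES = {
    "c-polarity": ["polarity-positive", "polarity-negative"],
    "c-v-absolute-tense": ["past", "present", "future"],
    "c-v-grammatical-aspect": ["perfective", "progressive", "habitual",
                               "gram-aspect-neutral"],
    "c-general-type": ["declarative-clause", "interrogative-clause"],
}


def count_unrestricted(features):
    """Number of feature structures if every combination is generated."""
    return prod(len(values) for values in features.values())


def enumerate_map(features, selected):
    """Enumerate combinations of the selected features only."""
    names = list(selected)
    for combo in product(*(features[n] for n in names)):
        yield dict(zip(names, combo))


print(count_unrestricted(FEATURES))  # 48
print(len(list(enumerate_map(FEATURES, ["c-polarity", "c-v-absolute-tense"]))))  # 6
```

Even this four-feature toy yields 48 combinations; every added feature multiplies the total, whereas a map over two features stays at 6.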


Evidentiality Map

Lexical Aspect:  activity-accomplishment
Assertiveness:   Assertiveness-asserted, Assertiveness-neutral
Polarity:        Polarity-positive, Polarity-negative

Source                                     Tense            Gram. Aspect
Hearsay, quotative, inferred, assumption   Past             Perfective, progressive, habitual, neutral
                                           Present, Future  habitual, neutral, progressive
Visual, Auditory, non-visual-or-auditory   Past             Perfective, progressive, habitual, neutral
                                           Present          habitual, neutral, progressive


Current Work

• Navigation
  – Start: large search space of all possible feature combinations
  – Finish: each feature has been eliminated as irrelevant or has been explored
  – Goal: dynamically find the most efficient path through the search space for each language.


Current Work

• Feature Detection
  – Which features have an effect on morphosyntax?
  – What is the effect?
  – Drives the Navigation process


Feature Detection: Spanish

The girl saw a red book.
((1,1)(2,2)(3,3)(4,4)(5,6)(6,5))
La niña vió un libro rojo

A girl saw a red book
((1,1)(2,2)(3,3)(4,4)(5,6)(6,5))
Una niña vió un libro rojo

I saw the red book
((1,1)(2,2)(3,3)(4,5)(5,4))
Yo vi el libro rojo

I saw a red book.
((1,1)(2,2)(3,3)(4,5)(5,4))
Yo vi un libro rojo

Feature: definiteness
Values: definite, indefinite
Function-of-*: subj, obj
Marked-on-head-of-*: no
Marked-on-dependent: yes
Marked-on-governor: no
Marked-on-other: no
Add/delete-word: no
Change-in-alignment: no
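The core comparison behind feature detection can be sketched as a diff over a minimal pair of elicited translations. This is a simplified illustration, not the project's detector: it only compares whitespace-separated tokens, reports whether the word count changed and which positions differ, and leaves the head/dependent classification to a later step. The function name `compare_minimal_pair` is hypothetical.

```python
def compare_minimal_pair(translation_a, translation_b):
    """Compare two translations that differ in one source feature value."""
    a, b = translation_a.split(), translation_b.split()
    if len(a) != len(b):
        # A length change suggests the feature is expressed by adding or
        # deleting a word rather than by changing an existing one.
        return {"add_delete_word": True, "changed_positions": []}
    changed = [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return {"add_delete_word": False, "changed_positions": changed}


# Definite vs. indefinite subject, then definite vs. indefinite object.
print(compare_minimal_pair("La niña vió un libro rojo",
                           "Una niña vió un libro rojo"))
# {'add_delete_word': False, 'changed_positions': [1]}
print(compare_minimal_pair("Yo vi el libro rojo", "Yo vi un libro rojo"))
# {'add_delete_word': False, 'changed_positions': [3]}
```

In both Spanish pairs only the article changes, which matches the slide's summary that definiteness is marked on a dependent with no added or deleted word.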


Feature Detection: Chinese

A girl saw a red book.

((1,2)(2,2)(3,3)(3,4)(4,5)(5,6)(5,7)(6,8))

有 一个 女人 看见 了 一本 红色 的 书 。

The girl saw a red book.

((1,1)(2,1)(3,3)(3,4)(4,5)(5,6)(6,7))

女人 看见 了 一本 红色的 书

Feature: definiteness

Values: definite, indefinite

Function-of-*: subject

Marked-on-head-of-*: no

Marked-on-dependent: no

Marked-on-governor: no

Add/delete-word: yes

Change-in-alignment: no


Feature Detection: Chinese

I saw the red book
((1, 3)(2, 4)(2, 5)(4, 1)(5, 2))
红色的 书, 我 看见 了

I saw a red book.
((1,1)(2,2)(2,3)(2, 4)(4,5)(5,6))
我 看见 了 一本 红色的 书 。

Feature: definiteness
Values: definite, indefinite
Function-of-*: object
Marked-on-head-of-*: no
Marked-on-dependent: no
Marked-on-governor: no
Add/delete-word: yes
Change-in-alignment: yes


Feature Detection: Hebrew

A girl saw a red book.
((2,1) (3,2)(5,4)(6,3))
ילדה ראתה ספר אדום

The girl saw a red book
((1,1)(2,1)(3,2)(5,4)(6,3))
הילדה ראתה ספר אדום

I saw a red book.
((2,1)(4,3)(5,2))
ראיתי ספר אדום

I saw the red book.
((2,1)(3,3)(3,4)(4,4)(5,3))
ראיתי את הספר האדום

Feature: definiteness
Values: definite, indefinite
Function-of-*: subj, obj
Marked-on-head-of-*: yes
Marked-on-dependent: yes
Marked-on-governor: no
Add-word: no
Change-in-alignment: no


Feature Detection Feeds into…

• Corpus Navigation: which minimal pairs to pursue next.
– Don’t pursue gender in Mapudungun
– Do pursue definiteness in Hebrew

• Morphology Learning:
– Morphological learner identifies the forms of the morphemes
– Feature detection identifies the functions

• Rule learning:
– Rule learner will have to learn a constraint for each morpho-syntactic marker that is discovered
• E.g., Adjectives and nouns agree in gender, number, and definiteness in Hebrew.

Page 60: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 60

Rule Learning

[Figure: AVENUE system architecture diagram. Stage labels: Elicitation, Morphology, Rule Learning, Run-Time System, Rule Refinement. Component labels: Elicitation Tool, Elicitation Corpus, Word-Aligned Parallel Corpus, Morphology Analyzer, Learning Module, Handcrafted Rules, Learned Transfer Rules, Lexical Resources, Run Time Transfer System, Decoder, Translation Correction Tool, Rule Refinement Module. The pipeline runs from INPUT TEXT to OUTPUT TEXT.]

Page 61: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 61

Rule Learning - Overview

• Goal: Acquire Syntactic Transfer Rules
• Use available knowledge from the major-language side (grammatical structure)
• Three steps:
1. Flat Seed Generation: first guesses at transfer rules; flat syntactic structure
2. Compositionality Learning: use previously learned rules to learn hierarchical structure
3. Constraint Learning: refine rules by learning appropriate feature constraints

Page 62: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 62

Flat Seed Rule Generation

Learning Example: NP

Eng: the big apple

Heb: ha-tapuax ha-gadol

Generated Seed Rule:

NP::NP [ART ADJ N] [ART N ART ADJ]

((X1::Y1)

(X1::Y3)

(X2::Y4)

(X3::Y2))

Page 63: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 63

Flat Seed Rule Generation

• Create a “flat” transfer rule specific to the sentence pair, partially abstracted to POS
– Words that are aligned word-to-word and have the same POS in both languages are generalized to their POS
– Words that have complex alignments (or not the same POS) remain lexicalized
• One seed rule for each translation example
• No feature constraints associated with seed rules (but mark the example(s) from which it was learned)
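The two generalization conditions above can be sketched as a small function. This is a strict reading of the bullets under stated assumptions, not the project's learner: `flat_seed_rule` and its data layout are hypothetical, and the example phrase pair (an article-free variant of the NP example) is chosen only so that every link is one-to-one.

```python
def flat_seed_rule(category, src, tgt, links):
    """Build a flat seed rule string.

    src, tgt: lists of (word, pos) pairs.
    links: set of 1-based (source_index, target_index) word alignments.
    """
    def one_to_one(i, j):
        # True when source word i and target word j are linked only to each other.
        return (sum(1 for a, _ in links if a == i) == 1
                and sum(1 for _, b in links if b == j) == 1)

    # Start fully lexicalized, then abstract to POS where the bullets allow it.
    src_seq = [f'"{word}"' for word, _ in src]
    tgt_seq = [f'"{word}"' for word, _ in tgt]
    for i, j in links:
        src_pos, tgt_pos = src[i - 1][1], tgt[j - 1][1]
        if one_to_one(i, j) and src_pos == tgt_pos:
            src_seq[i - 1] = src_pos
            tgt_seq[j - 1] = tgt_pos

    alignments = " ".join(f"(X{i}::Y{j})" for i, j in sorted(links))
    return (f"{category}::{category} [{' '.join(src_seq)}] "
            f"[{' '.join(tgt_seq)}] ({alignments})")


# Hypothetical example: "big apple" / "tapuax gadol".
rule = flat_seed_rule(
    "NP",
    [("big", "ADJ"), ("apple", "N")],
    [("tapuax", "N"), ("gadol", "ADJ")],
    {(1, 2), (2, 1)},
)
```

Under this strict reading a one-to-many link, such as the English article aligned to two Hebrew articles, stays lexicalized; the seed rule on the earlier slide abstracts that case to ART, so the real learner presumably treats it differently.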

Page 64: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 64

Compositionality Learning

Initial Flat Rules: S::S [ART ADJ N V ART N] [ART N ART ADJ V P ART N]

((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2) (X4::Y5) (X5::Y7) (X6::Y8))

NP::NP [ART ADJ N] [ART N ART ADJ]

((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))

NP::NP [ART N] [ART N]

((X1::Y1) (X2::Y2))

Generated Compositional Rule:

S::S [NP V NP] [NP V P NP]

((X1::Y1) (X2::Y2) (X3::Y4))

Page 65: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 65

Compositionality Learning

• Detection: traverse the c-structure of the English sentence, add compositional structure for translatable chunks
• Generalization: adjust constituent sequences and alignments
• Two implemented variants:
– Safe Compositionality: there exists a transfer rule that correctly translates the sub-constituent
– Maximal Compositionality: Generalize the rule if supported by the alignments, even in the absence of an existing transfer rule for the sub-constituent
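A rough sketch of the generalization step, in the spirit of the "safe" variant, is given below. It is an illustration under assumptions rather than the implemented algorithm: the function `compose` is hypothetical, it matches sub-rules by their POS sequences only (it does not verify the sub-rule's internal alignments or traverse a c-structure), and it requires that all alignment links touching a chunk stay inside that chunk.

```python
def compose(src, tgt, links, sub_rules):
    """Replace spans covered by known lower-level rules with their label.

    src, tgt: lists of labels; links: set of 1-based (src_idx, tgt_idx);
    sub_rules: list of (label, src_seq, tgt_seq).
    Returns (new_src, new_tgt, new_links).
    """
    src_chunk = [None] * len(src)   # chunk id covering each source position
    tgt_chunk = [None] * len(tgt)   # chunk id covering each target position
    labels = []                     # label of each chunk, indexed by chunk id

    # Try longer sub-rules first so they are not pre-empted by shorter ones.
    for label, s_seq, t_seq in sorted(sub_rules, key=lambda r: -len(r[1])):
        for i in range(len(src) - len(s_seq) + 1):
            s_span = range(i, i + len(s_seq))
            if src[i:i + len(s_seq)] != s_seq or any(src_chunk[k] is not None for k in s_span):
                continue
            for j in range(len(tgt) - len(t_seq) + 1):
                t_span = range(j, j + len(t_seq))
                if tgt[j:j + len(t_seq)] != t_seq or any(tgt_chunk[k] is not None for k in t_span):
                    continue
                # Links touching either span must lie entirely inside both spans.
                touching = [(a, b) for a, b in links if a - 1 in s_span or b - 1 in t_span]
                if touching and all(a - 1 in s_span and b - 1 in t_span for a, b in touching):
                    chunk_id = len(labels)
                    labels.append(label)
                    for k in s_span:
                        src_chunk[k] = chunk_id
                    for k in t_span:
                        tgt_chunk[k] = chunk_id
                    break

    def collapse(seq, chunk_of):
        # Merge each chunk into one label and record old-index -> new-index.
        out, index_map, last = [], {}, None
        for k, item in enumerate(seq):
            chunk_id = chunk_of[k]
            if chunk_id is None or chunk_id != last:
                out.append(item if chunk_id is None else labels[chunk_id])
            index_map[k + 1] = len(out)
            last = chunk_id
        return out, index_map

    new_src, src_map = collapse(src, src_chunk)
    new_tgt, tgt_map = collapse(tgt, tgt_chunk)
    new_links = sorted({(src_map[a], tgt_map[b]) for a, b in links})
    return new_src, new_tgt, new_links


# The flat S rule and the two NP rules from the Compositionality Learning slide.
flat_src = ["ART", "ADJ", "N", "V", "ART", "N"]
flat_tgt = ["ART", "N", "ART", "ADJ", "V", "P", "ART", "N"]
flat_links = {(1, 1), (1, 3), (2, 4), (3, 2), (4, 5), (5, 7), (6, 8)}
np_rules = [
    ("NP", ["ART", "ADJ", "N"], ["ART", "N", "ART", "ADJ"]),
    ("NP", ["ART", "N"], ["ART", "N"]),
]
composed = compose(flat_src, flat_tgt, flat_links, np_rules)
```

Applied to the slide's flat S rule, the sketch yields the sequences [NP V NP] and [NP V P NP] with alignments (X1::Y1) (X2::Y2) (X3::Y4).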

Page 66: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 66

Constraint Learning

Input: Rules and their Example Sets

S::S [NP V NP] [NP V P NP] {ex1,ex12,ex17,ex26}

((X1::Y1) (X2::Y2) (X3::Y4))

NP::NP [ART ADJ N] [ART N ART ADJ] {ex2,ex3,ex13}

((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))

NP::NP [ART N] [ART N] {ex4,ex5,ex6,ex8,ex10,ex11}

((X1::Y1) (X2::Y2))

Output: Rules with Feature Constraints:

S::S [NP V NP] [NP V P NP]

((X1::Y1) (X2::Y2) (X3::Y4)

(X1 NUM = X2 NUM)

(Y1 NUM = Y2 NUM)

(X1 NUM = Y1 NUM))

Page 67: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 67

Constraint Learning

• Goal: add appropriate feature constraints to the acquired rules
• Methodology:
– Preserve general structural transfer
– Learn specific feature constraints from example set

• Seed rules are grouped into clusters of similar transfer structure (type, constituent sequences, alignments)

• Each cluster forms a version space: a partially ordered hypothesis space with a specific and a general boundary

• The seed rules in a group form the specific boundary of a version space

• The general boundary is the (implicit) transfer rule with the same type, constituent sequences, and alignments, but no feature constraints
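One simple way to picture moving from the unconstrained rule toward constraints supported by the data is to keep only those agreement constraints that hold in every example of a cluster. The sketch below does just that; it is a deliberate simplification, not the version-space procedure itself, and the function name, the example feature values, and the constituent numbering (X1 subject NP, X2 verb, X3 object NP) are all hypothetical.

```python
from itertools import combinations


def consistent_agreement_constraints(examples, feature):
    """Return agreement constraints on `feature` that hold in every example.

    examples: list of dicts mapping a constituent name (e.g. "X1") to a
    dict of feature values for that constituent in one training example.
    """
    # Only constituents present in all examples can be constrained.
    names = sorted(set.intersection(*(set(example) for example in examples)))
    kept = []
    for a, b in combinations(names, 2):
        holds_everywhere = all(
            feature in example[a]
            and feature in example[b]
            and example[a][feature] == example[b][feature]
            for example in examples
        )
        if holds_everywhere:
            kept.append(f"({a} {feature} = {b} {feature})")
    return kept


# Hypothetical feature values for two examples of S::S [NP V NP].
examples = [
    {"X1": {"NUM": "sg"}, "X2": {"NUM": "sg"}, "X3": {"NUM": "pl"}},
    {"X1": {"NUM": "pl"}, "X2": {"NUM": "pl"}, "X3": {"NUM": "pl"}},
]
constraints = consistent_agreement_constraints(examples, "NUM")
```

With these invented examples only subject-verb number agreement survives, in the same notation as the constraint shown on the earlier Constraint Learning slide.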

Page 68: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 68

Rule Refinement

[Figure: AVENUE system architecture diagram. Stage labels: Elicitation, Morphology, Rule Learning, Run-Time System, Rule Refinement. Component labels: Elicitation Tool, Elicitation Corpus, Word-Aligned Parallel Corpus, Morphology Analyzer, Learning Module, Handcrafted Rules, Learned Transfer Rules, Lexical Resources, Run Time Transfer System, Decoder, Translation Correction Tool, Rule Refinement Module. The pipeline runs from INPUT TEXT to OUTPUT TEXT.]

Page 69: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 69

Interactive and Automatic Refinement of Translation Rules

• Problem: Improve Machine Translation quality.

• Proposed Solution: Put bilingual speakers back into the loop; use their corrections to detect the source of the error and automatically improve the lexicon and the grammar.

• Approach: Automate post-editing efforts by feeding them back into the MT system. Automatic refinement of translation rules that caused an error beyond post-editing.

• Goal: Improve MT coverage and overall quality.

Page 70: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 70

Technical Challenges

• Elicit minimal MT information from non-expert users
• Automatically refine and expand translation rules (manually written or automatically learned) minimally
• Automatic evaluation of the refinement process

Page 71: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

AVENUE/LETRAS 71

Error Typology for Automatic Rule Refinement (simplified)

• Missing word
• Extra word
• Wrong word order
– Local vs. long distance
– Word vs. phrase
– + Word change
• Incorrect word
– Sense
– Form
– Selectional restrictions
– Idiom
• Wrong agreement
– Missing constraint
– Extra constraint

Page 72: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 72

TCTool (Demo)

Interactive elicitation of error information

Actions:
• Add a word
• Delete a word
• Modify a word
• Change word order

                       precision   recall
error detection        90%         89%
error classification   72%         71%

Page 73: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 73

Automatic Rule Adaptation

Types of Refinement Operations

1. Refine a translation rule: R0 → R1 (change R0 to make it more specific or more general)

R0: NP [DET ADJ N] → NP [DET N ADJ]
a nice house → una casa bonito

R1: NP [DET ADJ N] → NP [DET N ADJ], with N gender = ADJ gender
a nice house → una casa bonita

Page 74: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 74

Automatic Rule Adaptation

Types of Refinement Operations

2. Bifurcate a translation rule: R0 → R0 (same, general rule) + R1 (add a new, more specific rule)

R0: NP [DET ADJ N] → NP [DET N ADJ]
a nice house → una casa bonita

R1: NP [DET ADJ N] → NP [DET ADJ N], with ADJ type: pre-nominal
a great artist → un gran artista

Page 75: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

AVENUE/LETRAS 75

Error Information Elicitation

Refinement Operation Typology

Automatic Rule Adaptation

A concrete example

Change word order

SL: Gaudí was a great artist
MT system output (TL): Gaudí era un artista grande
User correction: *Gaudí era un artista grande → Gaudí era un gran artista

[Figure labels: clue word, error, correction]

Page 76: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

76

Finding Triggering Feature(s): (error word, corrected word) =

need to postulate a new binary feature: feat1

Blame assignment (from MT system output)

tree: <((S,1 (NP,2 (N,5:1 "GAUDI") )
             (VP,3 (VB,2 (AUX,17:2 "ERA") )
                   (NP,8 (DET,0:3 "UN")
                         (N,4:5 "ARTISTA")
                         (ADJ,5:4 "GRANDE") ) ) ) )>

Automatic Rule Adaptation

S,1

NP,1

NP,8

…Grammar

ADJ::ADJ |: [great] -> [grande]
((X1::Y1)
((x0 form) = great)
((y0 agr num) = sg)
((y0 agr gen) = masc))

ADJ::ADJ |: [great] -> [gran]
((X1::Y1)
((x0 form) = great)
((y0 agr num) = sg)
((y0 agr gen) = masc))

Page 77: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 77

Refining Rules

• Bifurcate NP,8 → NP,8 (R0) + NP,8’ (R1) (flip order of ADJ-N)

{NP,8’}
NP::NP : [DET ADJ N] -> [DET ADJ N]
(
(X1::Y1) (X2::Y2) (X3::Y3)
((x0 def) = (x1 def))
(x0 = x3)
((y1 agr) = (y3 agr)) ; det-noun agreement
((y2 agr) = (y3 agr)) ; adj-noun agreement
(y2 = x3)
((y2 feat1) =c +)
)

Automatic Rule Adaptation

Page 78: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 78

Refining Lexical Entries

ADJ::ADJ |: [great] -> [grande]
((X1::Y1)
((x0 form) = great)
((y0 agr num) = sg)
((y0 agr gen) = masc)
((y0 feat1) = -))

ADJ::ADJ |: [great] -> [gran]
((X1::Y1)
((x0 form) = great)
((y0 agr num) = sg)
((y0 agr gen) = masc)
((y0 feat1) = +))
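How the postulated feature interacts with the bifurcated rule can be pictured with a toy model. The sketch below is illustrative only: the `LEXICON` layout and `translate_np` are hypothetical, the `=c` constraint is read as a check that `feat1` is present with value `+` (an assumption about the formalism), and the general rule R0 is left unconstrained.

```python
# Toy lexicon mirroring the two refined entries for "great".
LEXICON = {
    "great": [
        {"form": "grande", "feat1": "-"},
        {"form": "gran", "feat1": "+"},
    ]
}


def translate_np(det, adjective, noun):
    """Return (rule, translation) candidates for a DET ADJ N phrase."""
    candidates = []
    for entry in LEXICON[adjective]:
        # R0, the original general rule: post-nominal adjective.
        candidates.append(("R0", f"{det} {noun} {entry['form']}"))
        # R1, the bifurcated rule: pre-nominal adjective, only when the
        # entry carries feat1 = + (our reading of the =c constraint).
        if entry.get("feat1") == "+":
            candidates.append(("R1", f"{det} {entry['form']} {noun}"))
    return candidates


candidates = translate_np("un", "great", "artista")
```

In this toy model R1 produces "un gran artista" and never "un grande artista", because only the entry for "gran" carries `feat1 = +`.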

Automatic Rule Adaptation

Page 79: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 79

Evaluating Improvement

Automatic Rule Adaptation

- Given the initial and final Translation Lattices, the Rule Refinement module needs to take into account whether the following are present:
- Corrected Translation Sentence
- Original Translation Sentence (labelled as incorrect by the user)

un artista gran

un gran artista

un grande artista

*un artista grande

Page 80: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 80

Evaluating Improvement

Automatic Rule Adaptation

- Given the initial and final Translation Lattices, the Rule Refinement module needs to take into account whether the following are present:
- Corrected Translation Sentence
- Original Translation Sentence (labelled as incorrect by the user)

*un artista gran

un gran artista

*un grande artista

*un artista grande
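The presence checks described on this slide can be written down in a few lines. The sketch is a simplification under assumptions: lattices are modeled as plain sets of candidate strings, the function `evaluate_refinement` is hypothetical, and the two example lattices are invented for illustration rather than taken from system output.

```python
def evaluate_refinement(initial_lattice, final_lattice, corrected, original_incorrect):
    """Compare the lattices before and after refinement.

    Lattices are modeled as sets of candidate translations (an assumption).
    """
    return {
        # The user's corrected sentence is now produced.
        "corrected_present": corrected in final_lattice,
        # It was not produced before the refinement.
        "corrected_is_new": corrected in final_lattice and corrected not in initial_lattice,
        # The sentence the user marked as incorrect is no longer produced.
        "incorrect_removed": original_incorrect in initial_lattice
        and original_incorrect not in final_lattice,
    }


# Invented lattices for the Gaudí NP.
report = evaluate_refinement(
    initial_lattice={"un artista grande"},
    final_lattice={"un gran artista"},
    corrected="un gran artista",
    original_incorrect="un artista grande",
)
```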

Page 81: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 81

Challenges and future work

• Credit and Blame assignment from TCTool Log Files and Xfer engine’s trace

• Order of corrections matters ~ explore rule interactions

• Explore the space between batch mode and fully interactive system

• Online TCTool: always running to collect corrections from bilingual speakers; make it into a game with rewards for the best users

Page 82: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 82

AVENUE Prototypes

• General XFER framework under development for past three years

• Prototype systems so far:
– German-to-English, Dutch-to-English
– Chinese-to-English
– Hindi-to-English
– Hebrew-to-English

• In progress or planned:
– Mapudungun-to-Spanish
– Quechua-to-Spanish
– Native Alaskan languages (Inupiaq) to English
– Native-Bolivian languages (Aymara) to Spanish
– Native-Brazilian languages to Brazilian Portuguese

Page 83: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 83

Mapudungun

• Indigenous Language of Chile and Argentina
• ~ 1 Million Mapuche Speakers

Page 84: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 84

Collaboration

• Mapuche Language Experts
– Universidad de la Frontera (UFRO)
• Instituto de Estudios Indígenas (IEI)
– Institute for Indigenous Studies
• Chilean Funding
– Chilean Ministry of Education (Mineduc)
• Bilingual and Multicultural Education Program

Eliseo Cañulef

Rosendo Huisca

Hugo Carrasco

Hector Painequeo

Flor Caniupil

Luis Caniupil Huaiquiñir

Marcela Collio Calfunao

Cristian Carrillan Anton

Salvador Cañulef

Carolina Huenchullan Arrúe

Claudio Millacura Salas

Page 85: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 85

Accomplishments

• Corpora Collection

– Spoken Corpus
• Collected: Luis Caniupil Huaiquiñir
• Medical Domain
• 3 of 4 Mapudungun Dialects
– 120 hours of Nguluche
– 30 hours of Lafkenche
– 20 hours of Pwenche
• Transcribed in Mapudungun
• Translated into Spanish

– Written Corpus
• ~ 200,000 words
• Bilingual Mapudungun – Spanish
• Historical and newspaper text

nmlch-nmjm1_x_0405_nmjm_00:
M: <SPA>no pütokovilu kay ko
C: no, si me lo tomaba con agua

M: chumgechi pütokoki femuechi pütokon pu <Noise>
C: como se debe tomar, me lo tomé pués

nmlch-nmjm1_x_0406_nmlch_00:
M: Chengewerkelafuymiürke
C: Ya no estabas como gente entonces!

Page 86: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 86

Accomplishments

• Developed at UFRO
– Bilingual Dictionary with Examples
• 1,926 entries
– Spelling Corrected Mapudungun Word List
• 117,003 fully-inflected word forms
– Segmented Word List
• 15,120 forms
• Stems translated into Spanish

Page 87: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 87

Accomplishments

• Developed at LTI using Mapudungun language resources from UFRO
– Spelling Checker
• Integrated into OpenOffice
– Hand-built Morphological Analyzer
– Prototype Machine Translation Systems
• Rule-Based
• Example-Based
– Website: LenguasAmerindias.org

Page 88: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 88

Quechua-to-Spanish MT

• V-Unit: funded Summer project in Cusco (Peru) June-August 2005 [preparations and data collection started earlier]

• Intensive Quechua course in Centro Bartolome de las Casas (CBC)

• Worked together with two native and one non-native Quechua speakers on developing infrastructure (correcting elicited translations, segmenting and translating a list of the most frequent words)

Page 89: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 89

Quechua-to-Spanish Prototype MT System

• Stem Lexicon (semi-automatically generated): 753 lexical entries
• Suffix lexicon: 21 suffixes
– (150 Cusihuaman)
• Quechua morphology analyzer
• 25 translation rules
• Spanish morphology generation module
• User-Studies: 10 sentences, 3 users (2 native, 1 non-native)

Page 90: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 90

Challenges for Hebrew MT

• Paucity of existing language resources for Hebrew
– No publicly available broad coverage morphological analyzer
– No publicly available bilingual lexicons or dictionaries
– No POS-tagged corpus or parse tree-bank corpus for Hebrew
– No large Hebrew/English parallel corpus

• Scenario well suited for CMU transfer-based MT framework for languages with limited resources

Page 91: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 91

Hebrew Morphology Example

• Input word: B$WRH

0 1 2 3 4

|--------B$WRH--------|

|-----B-----|$WR|--H--|

|--B--|-H--|--$WRH---|

Page 92: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 92

Hebrew Morphology Example

Y0: ((SPANSTART 0) (SPANEND 4) (LEX B$WRH) (POS N) (GEN F) (NUM S) (STATUS ABSOLUTE))
Y1: ((SPANSTART 0) (SPANEND 2) (LEX B) (POS PREP))
Y2: ((SPANSTART 1) (SPANEND 3) (LEX $WR) (POS N) (GEN M) (NUM S) (STATUS ABSOLUTE))
Y3: ((SPANSTART 3) (SPANEND 4) (LEX $LH) (POS POSS))
Y4: ((SPANSTART 0) (SPANEND 1) (LEX B) (POS PREP))
Y5: ((SPANSTART 1) (SPANEND 2) (LEX H) (POS DET))
Y6: ((SPANSTART 2) (SPANEND 4) (LEX $WRH) (POS N) (GEN F) (NUM S) (STATUS ABSOLUTE))
Y7: ((SPANSTART 0) (SPANEND 4) (LEX B$WRH) (POS LEX))

Page 93: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 93

Sample Output (dev-data)

maxwell anurpung comes from ghana for israel four years ago and since worked in cleaning in hotels in eilat

a few weeks ago announced if management club hotel that for him to leave israel according to the government instructions and immigration police

in a letter in broken english which spread among the foreign workers thanks to them hotel for their hard work and announced that will purchase for hm flight tickets for their countries from their money

Page 94: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 94

Future Research Directions

• Automatic Transfer Rule Learning:
– In the “large-data” scenario: from large volumes of uncontrolled parallel text automatically word-aligned
– In the absence of morphology or POS annotated lexica
– Learning mappings for non-compositional structures
– Effective models for rule scoring for
• Decoding: using scores at runtime
• Pruning the large collections of learned rules
– Learning Unification Constraints

• Integrated Xfer Engine and Decoder
– Improved models for scoring tree-to-tree mappings, integration with LM and other knowledge sources in the course of the search

Page 95: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 95

Future Research Directions

• Automatic Rule Refinement

• Morphology Learning

• Feature Detection and Corpus Navigation

• Prototypes for New Languages


Page 99: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 99

Mapudungun-to-Spanish Example

Mapudungun

pelafiñ Maria

Spanish

No vi a María

English

I didn’t see Maria

Page 100: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 100

Mapudungun-to-Spanish Example

Mapudungun

pelafiñ Maria
pe -la -fi -ñ Maria
see -neg -3.obj -1.subj.indicative Maria

Spanish

No vi a María
No vi a María
neg see.1.subj.past.indicative acc Maria

English

I didn’t see Maria

Page 101: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 101

Analysis of pe-la-fi-ñ Maria (the tree is built up one node at a time across slides 101-110):

1. V: pe
2. VSuff: la (negation = +)
3. VSuffG over VSuff (la): pass all features up
4. VSuff: fi (object person = 3)
5. VSuffG over VSuffG and VSuff (fi): pass all features up from both children
6. VSuff: ñ (person = 1, number = sg, mood = ind)
7. VSuffG over VSuffG and VSuff (ñ): pass all features up from both children
8. V over V (pe) and VSuffG: check that (1) negation = + and (2) tense is undefined
9. NP over N: Maria (N: person = 3, number = sg, human = +)
10. VP over V and NP, and S over VP: pass features up from V; check that NP is human = +
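The feature passing in this analysis can be sketched as repeated unification of suffix features into the verb's feature structure. This is a minimal sketch, not the project's morphological analyzer or transfer grammar: the `SUFFIX_FEATURES` table, the feature names, and both functions are hypothetical, with values taken from the annotations on these slides.

```python
# Features contributed by each suffix, as annotated on the slides.
SUFFIX_FEATURES = {
    "la": {"negation": "+"},
    "fi": {"object_person": 3},
    "ñ": {"person": 1, "number": "sg", "mood": "ind"},
}


def unify(left, right):
    """Merge two flat feature dicts, failing on conflicting values."""
    merged = dict(left)
    for name, value in right.items():
        if name in merged and merged[name] != value:
            raise ValueError(f"feature clash on {name!r}")
        merged[name] = value
    return merged


def analyze_verb(segmented):
    """Pass features up from the suffix group to the verb node."""
    stem, *suffixes = segmented.split("-")
    suffix_group = {}
    for suffix in suffixes:
        # Each VSuffG passes up all features from both of its children.
        suffix_group = unify(suffix_group, SUFFIX_FEATURES[suffix])
    return unify({"lex": stem}, suffix_group)


verb = analyze_verb("pe-la-fi-ñ")
# The checks made at the top V node: negation = + and tense undefined.
checks_pass = verb.get("negation") == "+" and "tense" not in verb
```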

Page 111: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Mar 1, 2006 AVENUE/LETRAS 111

Transfer to Spanish: Top-Down

The Spanish tree is built top-down from the Mapudungun analysis (slides 111-116): S over VP, then VP over V, "a", and NP. Annotations, one per slide:

• Pass all features to Spanish side
• Pass all features down
• Pass object features down
• Accusative marker on objects is introduced because human = +

VP::VP [VBar NP] -> [VBar "a" NP]
(
(X1::Y1)
(X2::Y3)
((X2 type) = (*NOT* personal))
((X2 human) =c +)
(X0 = X1)
((X0 object) = X2)
(Y0 = X0)
((Y0 object) = (X0 object))
(Y1 = Y0)
(Y3 = (Y0 object))
((Y1 objmarker person) = (Y3 person))
((Y1 objmarker number) = (Y3 number))
((Y1 objmarker gender) = (Y3 gender))
)

Page 117: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Transfer to Spanish: Top-Down

[Figure: Mapudungun parse tree (S, VP, V "pe", VSuffG/VSuff "la", VSuffG/VSuff "fi", VSuffG/VSuff "ñ", NP, N "Maria") beside the Spanish tree under construction (S, VP, "no", V, "a", NP).]

Pass person, number, and mood features to Spanish Verb

Assign tense = past

Page 118: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

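Transfer to Spanish: Top-Down

[Figure: Mapudungun parse tree (S, VP, V "pe", VSuffG/VSuff "la", VSuffG/VSuff "fi", VSuffG/VSuff "ñ", NP, N "Maria") beside the Spanish tree under construction (S, VP, "no", V, "a", NP).]

"no" is introduced because negation = +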
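As a small illustration of this step, the sketch below inserts the literal "no" when the verb's feature structure carries negation = +. The function name `realize_negation` and the dict representation of features are assumptions for illustration, not part of the AVENUE system.

```python
def realize_negation(verb_features, verb_form):
    """Return the Spanish word sequence for the verb, prefixed by the
    literal "no" when the verb's features include negation = +."""
    if verb_features.get("negation") == "+":
        return ["no", verb_form]
    return [verb_form]


print(realize_negation({"negation": "+"}, "vi"))
```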

Page 119: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

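Transfer to Spanish: Top-Down

[Figure: Mapudungun parse tree (S, VP, V "pe", VSuffG/VSuff "la", VSuffG/VSuff "fi", VSuffG/VSuff "ñ", NP, N "Maria") beside the Spanish tree under construction (S, VP, "no", V, "a", NP), with the Spanish V now filled by "ver".]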

Page 120: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

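Transfer to Spanish: Top-Down

[Figure: Mapudungun parse tree (S, VP, V "pe", VSuffG/VSuff "la", VSuffG/VSuff "fi", VSuffG/VSuff "ñ", NP, N "Maria") beside the Spanish tree under construction (S, VP, "no", V, "a", NP), with the Spanish V inflected from "ver" to "vi".]

ver -> vi

person = 1
number = sg
mood = indicative
tense = past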
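The generation step on this slide, turning the lemma plus its features into a surface form, can be sketched as a table lookup. The `PARADIGM` table and the `inflect` function are hypothetical stand-ins for a Spanish morphological generator; only the two entries shown are included.

```python
# Hypothetical miniature paradigm table:
# (lemma, person, number, mood, tense) -> inflected form
PARADIGM = {
    ("ver", 1, "sg", "indicative", "past"): "vi",
    ("ver", 3, "sg", "indicative", "past"): "vio",
}


def inflect(lemma, features):
    """Look up the inflected form for a lemma and its feature values.

    Falls back to the citation form when the combination is not in the table.
    """
    key = (
        lemma,
        features["person"],
        features["number"],
        features["mood"],
        features["tense"],
    )
    return PARADIGM.get(key, lemma)


features = {"person": 1, "number": "sg", "mood": "indicative", "tense": "past"}
print(inflect("ver", features))
```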

Page 121: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

Transfer to Spanish: Top-Down

[Figure: Mapudungun parse tree (S, VP, V "pe", VSuffG/VSuff "la", VSuffG/VSuff "fi", VSuffG/VSuff "ñ", NP, N "Maria") beside the Spanish tree (S, VP, "no", V "vi", "a", NP, N "María").]

Pass features over to Spanish side

Page 122: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students

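I didn't see Maria

[Figure: completed Mapudungun parse tree (S, VP, V "pe", VSuffG/VSuff "la", VSuffG/VSuff "fi", VSuffG/VSuff "ñ", NP, N "Maria") beside the completed Spanish tree (S, VP, "no", V "vi", "a", NP, N "María").]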

Page 123: AVENUE/LETRAS: Learning-based MT for Languages with Limited Resources Faculty: Jaime Carbonell, Alon Lavie, Lori Levin, Ralf Brown, Robert Frederking Students
