
A Trainable Transfer-based MT Approach for Languages with Limited Resources

Alon Lavie, Language Technologies Institute

Carnegie Mellon University

Joint Work with: Lori Levin, Jaime Carbonell, Katharina Probst, Erik Peterson, Stephan Vogel and Ariadna Font-Llitjos

26 April 2004, EAMT Meeting, Malta

Why Machine Translation for Languages with Limited Resources?

• Commercial MT is economically feasible for only a handful of major languages with large resources (corpora, human developers)

• Statistical MT looks promising, but requires very large volumes of parallel text

• Is there hope for MT for languages with limited electronic data resources?

• Benefits include:
– Better government information access to indigenous communities
– Better participation of indigenous communities in information-rich activities (health care, education, government) without giving up their languages
– Civilian and military applications (disaster relief)
– Language preservation


MT for Languages with Limited Resources: Challenges

• Minimal amount of parallel text
• Possible lack of standards for orthography/spelling
• Often relatively few trained linguists
• Access to native bilingual informants possible
• No real economic incentive; limited financial resources for developing MT
– Need to minimize development time and cost


AVENUE Partners

Language | Country | Institutions
Mapudungun (in place) | Chile | Universidad de la Frontera, Institute for Indigenous Studies, Ministry of Education
Quechua (discussion) | Peru | Ministry of Education
Aymara (discussion) | Bolivia, Peru | Ministry of Education


AVENUE: Two Technical Approaches

• Generalized EBMT
– Parallel text 50K-2MB (uncontrolled corpus)
– Rapid implementation
– Proven for major languages with reduced data

• Transfer-rule learning
– Elicitation (controlled) corpus to extract grammatical properties
– Seeded version-space learning


Transfer with Strong Decoding

[Figure: architecture of the transfer system with strong decoding. Components: word-aligned elicited data, Learning Module, Transfer Rules (example learned rule: {PP,4894} ;;Score:0.0470 PP::PP [NP POSTP] -> [PREP NP] ((X2::Y1) (X1::Y2))), Translation Lexicon, Run Time Transfer System, Lattice Decoder, English Language Model, Word-to-Word Translation Probabilities; SL input in, TL output out.]


Learning Transfer-Rules for Languages with Limited Resources

• Rationale:
– Large bilingual corpora not available
– Bilingual native informant(s) can translate and align a small pre-designed elicitation corpus, using an elicitation tool
– Elicitation corpus designed to be typologically comprehensive and compositional
– Transfer-rule engine and new learning approach support acquisition of generalized transfer rules from the data


English-Hindi Example


Spanish-Mapudungun Example


English-Arabic Example


The Elicitation Corpus

• Translated and aligned by a bilingual informant
• Rich information about the sentences elicited
• Corpus consists of linguistically diverse constructions
• Based on elicitation and documentation work of field linguists (e.g. Comrie 1977, Bouquiaux 1992)
• Organized compositionally: elicit simple structures first, then use them as building blocks
• Goal: minimize size, maximize linguistic coverage
• The typological elicitation corpus (EC) currently contains about 1000 sentences
• Work in progress:
– Feature detection
– Navigation control through the corpus during elicitation
– Extensions to phenomena not currently covered
– Experimenting with alternative types of elicited data


Transfer Rule Formalism

Parts of a rule: type information; part-of-speech/constituent information; alignments; x-side constraints; y-side constraints; xy-constraints, e.g. ((Y1 AGR) = (X1 AGR))

;SL: the old man, TL: ha-ish ha-zaqen
NP::NP [DET ADJ N] -> [DET N DET ADJ]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)
((X1 AGR) = *3-SING)
((X1 DEF) = *DEF)
((X3 AGR) = *3-SING)
((X3 COUNT) = +)
((Y1 DEF) = *DEF)
((Y3 DEF) = *DEF)
((Y2 AGR) = *3-SING)
((Y2 GENDER) = (Y4 GENDER)))
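The alignment list is what drives reordering at transfer time. The following is a minimal Python sketch, not the system's code: `LEXICON` is a hypothetical toy lexicon (Hebrew transliterated, article split off) and `apply_alignments` is a hypothetical helper. Each (Xi::Yj) pair copies the translation of source constituent i into target slot j, so one source word can fill several slots (X1 fills both Y1 and Y3). Feature constraints are ignored here.

```python
# Hypothetical toy lexicon for the example rule (Hebrew transliterated, article split off).
LEXICON = {"the": "ha", "old": "zaqen", "man": "ish"}

# Alignments of NP::NP [DET ADJ N] -> [DET N DET ADJ] as (source, target), 1-based.
ALIGNMENTS = [(1, 1), (1, 3), (2, 4), (3, 2)]

def apply_alignments(src_words, n_target, alignments, lexicon):
    """Fill each target slot with the translation of its aligned source word."""
    target = [None] * n_target
    for x, y in alignments:
        target[y - 1] = lexicon[src_words[x - 1]]
    if None in target:
        # This sketch does not handle target slots with no aligned source word.
        raise ValueError("unaligned target position")
    return target

print(apply_alignments(["the", "old", "man"], 4, ALIGNMENTS, LEXICON))
# ['ha', 'ish', 'ha', 'zaqen']
```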


Transfer Rule Formalism (II)

Constraint types: value constraints, e.g. ((X1 AGR) = *3-SING); agreement constraints, e.g. ((Y2 GENDER) = (Y4 GENDER))

;SL: the old man, TL: ha-ish ha-zaqen
NP::NP [DET ADJ N] -> [DET N DET ADJ]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)
((X1 AGR) = *3-SING)
((X1 DEF) = *DEF)
((X3 AGR) = *3-SING)
((X3 COUNT) = +)
((Y1 DEF) = *DEF)
((Y3 DEF) = *DEF)
((Y2 AGR) = *3-SING)
((Y2 GENDER) = (Y4 GENDER)))
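The two constraint types differ in what sits on the right-hand side: an atomic value, or another feature path. Below is a rough Python sketch that only *checks* already-built feature structures; the real engine uses unification, where "=" can also assign values. The data layout, the function names, and the treatment of `*` as merely marking an atomic value (dropped here) are all assumptions for illustration.

```python
# Hypothetical layout: one dict of features per constituent (X1..Xn, Y1..Ym).
CONSTRAINTS = [
    ("value", ("X1", "AGR"), "3-SING"),              # ((X1 AGR) = *3-SING)
    ("agree", ("Y2", "GENDER"), ("Y4", "GENDER")),  # ((Y2 GENDER) = (Y4 GENDER))
]

def satisfied(constraint, fs):
    """Check one constraint against existing feature structures (no unification)."""
    kind, (node, feat), rhs = constraint
    left = fs.get(node, {}).get(feat)
    if kind == "value":
        return left == rhs
    other_node, other_feat = rhs
    return left is not None and left == fs.get(other_node, {}).get(other_feat)

def rule_applies(fs):
    return all(satisfied(c, fs) for c in CONSTRAINTS)

ok = {"X1": {"AGR": "3-SING"}, "Y2": {"GENDER": "masc"}, "Y4": {"GENDER": "masc"}}
print(rule_applies(ok))  # True
```

Changing either gender value on the target side makes the agreement constraint, and hence the rule, fail.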


The Transfer Engine

Analysis: the source text is parsed into its grammatical structure, which determines the order in which transfer rules are applied.

Example: 他看书。 (he read book)

(S (NP (N 他)) (VP (V 看) (NP 书)))

Transfer: a target-language tree is created by reordering, insertion, and deletion.

(S (NP (N he)) (VP (V read) (NP (DET a) (N book))))

The article "a" is inserted into the object NP; source words are translated with the transfer lexicon.

Generation: target-language constraints are checked and the final translation is produced. E.g. "reads" is chosen over "read" to agree with "he".

Final translation: "He reads a book"
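The three stages can be traced on this example with a toy Python sketch. Everything here is hypothetical: the lexicon and its features, the hard-coded parse returned by `analysis`, the article-insertion rule in `insert_article`, and the naive "+s" agreement in `generate`. It only illustrates the division of labor between analysis, transfer, and generation, not how the actual engine is implemented.

```python
# Hypothetical toy transfer lexicon: source word -> (English lemma, features).
LEXICON = {
    "他": ("he", {"PERS": 3, "NUM": "sg"}),
    "看": ("read", {}),
    "书": ("book", {"NUM": "sg"}),
}

def analysis():
    """Stand-in for the parser: the tree for 他看书 (he read book)."""
    return ("S", ("NP", ("N", "他")), ("VP", ("V", "看"), ("NP", ("N", "书"))))

def translate(node):
    """Transfer step 1: replace each source leaf with (label, lemma, features)."""
    label, *rest = node
    if isinstance(rest[0], str):
        lemma, feats = LEXICON[rest[0]]
        return (label, lemma, feats)
    return (label, *[translate(child) for child in rest])

def insert_article(node):
    """Transfer step 2 (toy rule): add DET 'a' to an object NP with a singular noun."""
    label, *kids = node
    if label == "S":
        return ("S", *[insert_article(kid) for kid in kids])
    if label == "VP":
        verb, obj = kids
        noun = obj[1]
        if noun[2].get("NUM") == "sg":
            obj = ("NP", ("DET", "a", {}), noun)
        return ("VP", verb, obj)
    return node

def generate(tree):
    """Generation (toy): enforce subject-verb agreement, then read off the words."""
    _, subj_np, vp = tree
    subj, verb, obj = subj_np[1], vp[1], vp[2]
    verb_form = verb[1]
    if subj[2].get("PERS") == 3 and subj[2].get("NUM") == "sg":
        verb_form += "s"  # naive 3rd-person-singular inflection
    sentence = " ".join([subj[1], verb_form] + [leaf[1] for leaf in obj[1:]])
    return sentence[0].upper() + sentence[1:]

print(generate(insert_article(translate(analysis()))))  # He reads a book
```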


Rule Learning - Overview

• Goal: Acquire syntactic transfer rules
• Use available knowledge from the source side (grammatical structure)
• Three steps:

1. Flat Seed Generation: first guesses at transfer rules; flat syntactic structure

2. Compositionality: use previously learned rules to add hierarchical structure

3. Seeded Version Space Learning: refine rules by learning appropriate feature constraints


Flat Seed Rule Generation

Learning Example: NP

Eng: the big apple

Heb: ha-tapuax ha-gadol

Generated Seed Rule:

NP::NP [ART ADJ N] -> [ART N ART ADJ]
((X1::Y1)
(X1::Y3)
(X2::Y4)
(X3::Y2))
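A flat seed rule is essentially a rewrite of one word-aligned, POS-tagged example: the tag sequences become the constituent sequences and the word alignments become the (Xi::Yj) pairs. Here is a minimal Python sketch, assuming the tags for both sides and the alignment are already given. The tokenization of "ha-tapuax ha-gadol" into four tokens and the function name `flat_seed` are illustrative assumptions.

```python
def flat_seed(label, src_tagged, tgt_tagged, alignments):
    """Build a flat seed rule from one word-aligned, POS-tagged example.

    src_tagged / tgt_tagged: lists of (word, POS); alignments: 1-based (src, tgt) pairs.
    """
    src_pos = " ".join(pos for _, pos in src_tagged)
    tgt_pos = " ".join(pos for _, pos in tgt_tagged)
    align = " ".join(f"(X{x}::Y{y})" for x, y in sorted(alignments))
    return f"{label}::{label} [{src_pos}] -> [{tgt_pos}]\n({align})"

eng = [("the", "ART"), ("big", "ADJ"), ("apple", "N")]
heb = [("ha", "ART"), ("tapuax", "N"), ("ha", "ART"), ("gadol", "ADJ")]
print(flat_seed("NP", eng, heb, [(1, 1), (1, 3), (2, 4), (3, 2)]))
# NP::NP [ART ADJ N] -> [ART N ART ADJ]
# ((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))
```

The label (NP) comes from the source side's grammatical structure; the same label is reused for the target side, as in the seed rule above.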


Compositionality

Initial Flat Rules:

S::S [ART ADJ N V ART N] -> [ART N ART ADJ V P ART N]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2) (X4::Y5) (X5::Y7) (X6::Y8))

NP::NP [ART ADJ N] -> [ART N ART ADJ]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))

NP::NP [ART N] -> [ART N]
((X1::Y1) (X2::Y2))

Generated Compositional Rule:

S::S [NP V NP] -> [NP V P NP]
((X1::Y1) (X2::Y2) (X3::Y4))


Compositionality - Overview

• Traverse the c-structure of the English sentence, add compositional structure for translatable chunks

• Adjust constituent sequences, alignments in the transfer rule
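The adjustment of constituent sequences and alignments can be made concrete with the S rule from the compositionality example. In this Python sketch the chunk positions are passed in by hand (standing in for the English c-structure traversal), and each chunk is assumed to be contiguous on both sides. The function names are hypothetical. Every position inside a chunk maps to the chunk's new position, and duplicate alignments collapse.

```python
def collapse(seq, spans):
    """Replace each (start, length, label) span of seq by its label.

    Returns the new sequence and a map from old to new 1-based positions.
    """
    starts = {start: (length, label) for start, length, label in spans}
    new_seq, pos_map, i = [], {}, 0
    while i < len(seq):
        length, label = starts.get(i, (1, seq[i]))  # default: keep a single item
        new_seq.append(label)
        for k in range(i, i + length):
            pos_map[k + 1] = len(new_seq)
        i += length
    return new_seq, pos_map

def compose(src, tgt, alignments, chunks):
    """Rewrite a flat rule, replacing chunks covered by previously learned rules.

    chunks: list of ((label, rule_src, rule_tgt), src_start, tgt_start), 0-based starts.
    """
    src_spans, tgt_spans = [], []
    for (label, rule_src, rule_tgt), s, t in chunks:
        # The chunk must match the learned rule's sequences on both sides.
        if tuple(src[s:s + len(rule_src)]) != rule_src or tuple(tgt[t:t + len(rule_tgt)]) != rule_tgt:
            raise ValueError(f"chunk at {s}/{t} does not match rule {label}")
        src_spans.append((s, len(rule_src), label))
        tgt_spans.append((t, len(rule_tgt), label))
    new_src, src_map = collapse(src, src_spans)
    new_tgt, tgt_map = collapse(tgt, tgt_spans)
    new_align = sorted({(src_map[x], tgt_map[y]) for x, y in alignments})
    return new_src, new_tgt, new_align

NP_ADJ = ("NP", ("ART", "ADJ", "N"), ("ART", "N", "ART", "ADJ"))
NP_SIMPLE = ("NP", ("ART", "N"), ("ART", "N"))
flat_src = ["ART", "ADJ", "N", "V", "ART", "N"]
flat_tgt = ["ART", "N", "ART", "ADJ", "V", "P", "ART", "N"]
flat_align = [(1, 1), (1, 3), (2, 4), (3, 2), (4, 5), (5, 7), (6, 8)]
print(compose(flat_src, flat_tgt, flat_align, [(NP_ADJ, 0, 0), (NP_SIMPLE, 4, 6)]))
# (['NP', 'V', 'NP'], ['NP', 'V', 'P', 'NP'], [(1, 1), (2, 2), (3, 4)])
```

The unaligned target P survives as its own constituent, which is why the compositional rule has four target constituents but only three alignments.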


Seeded Version Space Learning

Input: Rules and their Example Sets

S::S [NP V NP] -> [NP V P NP] {ex1,ex12,ex17,ex26}
((X1::Y1) (X2::Y2) (X3::Y4))

NP::NP [ART ADJ N] -> [ART N ART ADJ] {ex2,ex3,ex13}
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))

NP::NP [ART N] -> [ART N] {ex4,ex5,ex6,ex8,ex10,ex11}
((X1::Y1) (X2::Y2))

Output: Rules with Feature Constraints:

S::S [NP V NP] -> [NP V P NP]
((X1::Y1) (X2::Y2) (X3::Y4)
((X1 NUM) = (X2 NUM))
((Y1 NUM) = (Y2 NUM))
((X1 NUM) = (Y1 NUM)))


Seeded Version Space Learning: Overview

• Goal: add appropriate feature constraints to the acquired rules

• Methodology:
– Preserve general structural transfer
– Learn specific feature constraints from the example set

• Seed rules are grouped into clusters of similar transfer structure (type, constituent sequences, alignments)

• Each cluster forms a version space: a partially ordered hypothesis space with a specific and a general boundary

• The seed rules in a group form the specific boundary of a version space

• The general boundary is the (implicit) transfer rule with the same type, constituent sequences, and alignments, but no feature constraints
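The clustering step and the direction of generalization can be illustrated with a deliberately simplified Python sketch. This is *not* the project's Seeded Version Space Learning algorithm. It only groups seeds by (type, constituent sequences, alignments), keeps value constraints shared by all seeds in a cluster, and proposes an agreement constraint wherever two same-named features have equal values in every seed. The per-example NUM values for ex1 and ex12 are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations

def cluster(seeds):
    """Group seed rules by (type, source sequence, target sequence, alignments)."""
    groups = defaultdict(list)
    for structure, constraints in seeds:
        groups[structure].append(constraints)
    return dict(groups)

def generalize(constraint_sets):
    """Simplified generalization of one cluster's seed constraints.

    Keeps value constraints shared by every seed, and proposes an agreement
    constraint for each pair of same-named features that have equal values
    within every seed.
    """
    shared = {k: v for k, v in constraint_sets[0].items()
              if all(cs.get(k) == v for cs in constraint_sets[1:])}
    common = set.intersection(*(set(cs) for cs in constraint_sets))
    agreements = set()
    for a, b in combinations(sorted(common), 2):
        if a[1] != b[1] or (a in shared and b in shared):
            continue  # different features, or both already fixed by value constraints
        if all(cs[a] == cs[b] for cs in constraint_sets):
            agreements.add((a, b))
    return shared, agreements

def num_features(value):
    """Hypothetical seed constraints: the same NUM value on X1, X2, Y1, Y2."""
    return {(c, "NUM"): value for c in ("X1", "X2", "Y1", "Y2")}

S_RULE = ("S", ("NP", "V", "NP"), ("NP", "V", "P", "NP"), ((1, 1), (2, 2), (3, 4)))
groups = cluster([(S_RULE, num_features("sg")), (S_RULE, num_features("pl"))])
shared, agreements = generalize(groups[S_RULE])
print(shared)  # {}
print((("X1", "NUM"), ("X2", "NUM")) in agreements)  # True
```

Because one seed is singular and the other plural, no value constraint survives, but NUM agreement does. Those agreements include the three constraints shown in the output rule above.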


Examples of Automatically Learned Rules (Hindi-to-English)

{NP,14244}
;;Score:0.0429
NP::NP [N] -> [DET N]
((X1::Y2))

{NP,14434}
;;Score:0.0040
NP::NP [ADJ CONJ ADJ N] -> [ADJ CONJ ADJ N]
((X1::Y1) (X2::Y2) (X3::Y3) (X4::Y4))

{PP,4894}
;;Score:0.0470
PP::PP [NP POSTP] -> [PREP NP]
((X2::Y1) (X1::Y2))
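Each learned rule carries an identifier, a score comment, the type and constituent sequences, and the alignments. A small Python parser for exactly the layout on this slide makes those parts explicit. It is a hypothetical helper based only on these three examples, not a parser for the full formalism: feature constraints in the body are ignored, and only the (Xi::Yj) pairs are extracted.

```python
import re

# Pattern for the learned-rule layout on this slide (header, score comment, rule, alignments).
RULE_RE = re.compile(
    r"\{(?P<type>\w+),(?P<id>\d+)\}\s*"
    r";;Score:(?P<score>[\d.]+)\s*"
    r"(?P<x0>\w+)::(?P<y0>\w+)\s*"
    r"\[(?P<src>[^\]]*)\]\s*->\s*\[(?P<tgt>[^\]]*)\]\s*"
    r"\((?P<body>.*)\)\s*$",
    re.S,
)
ALIGN_RE = re.compile(r"\(X(\d+)::Y(\d+)\)")

def parse_learned_rule(text):
    m = RULE_RE.match(text.strip())
    if m is None:
        raise ValueError("not a learned transfer rule")
    return {
        "type": m["type"],
        "id": int(m["id"]),
        "score": float(m["score"]),
        "lhs": (m["x0"], m["y0"]),
        "src": m["src"].split(),
        "tgt": m["tgt"].split(),
        "alignments": [(int(x), int(y)) for x, y in ALIGN_RE.findall(m["body"])],
    }

rule = parse_learned_rule("""{PP,4894}
;;Score:0.0470
PP::PP [NP POSTP] -> [PREP NP]
((X2::Y1) (X1::Y2))""")
print(rule["src"], rule["tgt"], rule["alignments"])
# ['NP', 'POSTP'] ['PREP', 'NP'] [(2, 1), (1, 2)]
```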


Manual Transfer Rules: Hindi Example

;; PASSIVE OF SIMPLE PAST (NO AUX) WITH LIGHT VERB
;; passive of 43 (7b)
{VP,28}
VP::VP : [V V V] -> [Aux V]
( (X1::Y2)
  ((x1 form) = root)
  ((x2 type) =c light)
  ((x2 form) = part)
  ((x2 aspect) = perf)
  ((x3 lexwx) = 'jAnA')
  ((x3 form) = part)
  ((x3 aspect) = perf)
  (x0 = x1)
  ((y1 lex) = be)
  ((y1 tense) = past)
  ((y1 agr num) = (x3 agr num))
  ((y1 agr pers) = (x3 agr pers))
  ((y2 form) = part) )
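This rule mixes plain "=" equations with one "=c" equation on the second verb. The sketch below assumes the usual reading from LFG-style unification grammars: "=" succeeds if the feature is unset (assigning it) or already equal, while "=c" is a constraining equation that only checks a value already present and never assigns. Under that assumption, ((x2 type) =c light) requires x2 to arrive already marked as a light verb. The function names and feature values are hypothetical.

```python
def unify_value(fs, feature, value):
    """'=': succeeds if the feature is unset (and then sets it) or already has this value."""
    if feature not in fs:
        fs[feature] = value
        return True
    return fs[feature] == value

def check_value(fs, feature, value):
    """'=c': succeeds only if the feature is already set to this value; never assigns."""
    return fs.get(feature) == value

x2 = {"form": "part", "aspect": "perf"}   # hypothetical features of the second verb
print(check_value(x2, "type", "light"))   # False: 'type' was never set
print(unify_value(x2, "type", "light"))   # True, and x2 now has type = light
```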


Manual Transfer Rules: Example

; NP1 ke NP2 -> NP2 of NP1
; Ex: jIvana ke eka aXyAya
; life of (one) chapter
; ==> a chapter of life
{NP,12}
NP::NP : [PP NP1] -> [NP1 PP]
( (X1::Y2)
  (X2::Y1)
; ((x2 lexwx) = 'kA')
)

{NP,13}
NP::NP : [NP1] -> [NP1]
( (X1::Y1) )

{PP,12}
PP::PP : [NP Postp] -> [Prep NP]
( (X1::Y2)
  (X2::Y1) )

Source tree: (NP (PP (NP (N jIvana)) (P ke)) (NP1 (Adj eka) (N aXyAya)))

Target tree: (NP (NP1 (Adj one) (N chapter)) (PP (P of) (NP (N life))))
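The two reorderings in this example compose recursively: {PP,12} flips the postposition inside the PP, and {NP,12} then swaps the PP and NP1. This Python sketch applies rules bottom-up over a nested-tuple tree. The tuple encoding, the toy lexicon, and the use of the label "Postp" in the tree (so it matches {PP,12}) are assumptions, and constraints such as the commented-out 'kA' check are ignored.

```python
# Hypothetical toy lexicon (transliterated Hindi -> English).
LEXICON = {"jIvana": "life", "ke": "of", "eka": "one", "aXyAya": "chapter"}

# (parent, source child labels) -> (target child labels, 1-based (x, y) alignments)
RULES = {
    ("NP", ("PP", "NP1")): (("NP1", "PP"), [(1, 2), (2, 1)]),     # {NP,12}
    ("PP", ("NP", "Postp")): (("Prep", "NP"), [(1, 2), (2, 1)]),  # {PP,12}
}

def is_leaf(node):
    return len(node) == 2 and isinstance(node[1], str)

def transfer(node):
    """Translate leaves with the lexicon, then reorder and relabel children by rule."""
    if is_leaf(node):
        label, word = node
        return (label, LEXICON.get(word, word))
    label, children = node[0], [transfer(c) for c in node[1:]]
    key = (label, tuple(c[0] for c in node[1:]))
    if key not in RULES:
        return (label, *children)  # no rule: keep source order
    tgt_labels, alignments = RULES[key]
    out = [None] * len(tgt_labels)
    for x, y in alignments:
        out[y - 1] = (tgt_labels[y - 1],) + children[x - 1][1:]
    return (label, *out)

def words(node):
    if is_leaf(node):
        return [node[1]]
    return [w for child in node[1:] for w in words(child)]

source = ("NP",
          ("PP", ("NP", ("N", "jIvana")), ("Postp", "ke")),
          ("NP1", ("Adj", "eka"), ("N", "aXyAya")))
print(" ".join(words(transfer(source))))  # one chapter of life
```

The output follows the target tree above ("one chapter of life"); producing "a chapter of life" would need a further lexical choice for eka that this sketch does not model.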


A Limited Data Scenario for Hindi-to-English

• Conducted during a DARPA “Surprise Language Exercise” (SLE) in June 2003

• Put together a scenario with "miserly" data resources:
– Elicited Data corpus: 17589 phrases
– Cleaned portion (top 12%) of LDC dictionary: ~2725 Hindi words (23612 translation pairs)
– Manually acquired resources during the SLE:
   • 500 manual bigram translations
   • 72 manually written phrase transfer rules
   • 105 manually written postposition rules
   • 48 manually written time expression rules

• No additional parallel text!!


Manual Grammar Development

• Covers mostly NPs, PPs and VPs (verb complexes)

• ~70 grammar rules, covering basic and recursive NPs and PPs, verb complexes of main tenses in Hindi (developed in two weeks)


Adding a “Strong” Decoder

• XFER system produces a full lattice of translation fragments, ranging from single words to long phrases or sentences

• Edges are scored using word-to-word translation probabilities, trained from the limited bilingual data

• Decoder uses an English LM (70M words)

• Decoder can also reorder words or phrases (up to 4 positions ahead)

• For XFER (strong), ONLY edges from the basic XFER system are used!
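The core of lattice decoding is a dynamic program over source positions that combines each edge's translation score with a target language model. The Python sketch below is a simplified stand-in, not the XFER decoder. It is monotone (the up-to-4-position reordering is omitted), uses a bigram LM whose state is the last output word, and assumes a hypothetical floor log-probability for unseen bigrams and a single `lm_weight`. Every edge must have end > start.

```python
import math
from collections import defaultdict

UNSEEN = math.log(1e-4)  # hypothetical log-probability floor for unseen bigrams

def lm_score(bigrams, words, prev):
    """Bigram LM log-probability of words following prev; returns (score, last word)."""
    score = 0.0
    for w in words:
        score += bigrams.get((prev, w), UNSEEN)
        prev = w
    return score, prev

def decode(n, edges, bigrams, lm_weight=1.0):
    """Monotone Viterbi search over a lattice of (start, end, words, tm_logprob) edges."""
    by_start = defaultdict(list)
    for start, end, words, tm in edges:
        by_start[start].append((end, words, tm))
    # chart[pos] maps the last output word (the bigram LM state) to (score, output so far).
    chart = [{} for _ in range(n + 1)]
    chart[0]["<s>"] = (0.0, ())
    for pos in range(n):
        for prev, (score, out) in chart[pos].items():
            for end, words, tm in by_start[pos]:
                lm, last = lm_score(bigrams, words, prev)
                new = score + tm + lm_weight * lm
                if last not in chart[end] or new > chart[end][last][0]:
                    chart[end][last] = (new, out + words)
    finals = [(score + lm_weight * bigrams.get((prev, "</s>"), UNSEEN), out)
              for prev, (score, out) in chart[n].items()]
    return list(max(finals)[1]) if finals else None

lg = math.log
edges = [(0, 1, ("he",), lg(0.9)), (1, 2, ("reads",), lg(0.5)),
         (1, 2, ("read",), lg(0.5)), (0, 2, ("he", "read"), lg(0.3))]
bigrams = {("<s>", "he"): lg(0.5), ("he", "reads"): lg(0.4), ("he", "read"): lg(0.01),
           ("reads", "</s>"): lg(0.5), ("read", "</s>"): lg(0.5)}
print(decode(2, edges, bigrams))  # ['he', 'reads']
```

Both "read" and "reads" get the same translation score here; the language model is what breaks the tie.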


Testing Conditions

• Tested on a section of the JHU-provided data: 258 sentences with four reference translations
– SMT system (stand-alone)
– EBMT system (stand-alone)
– XFER system (naïve decoding)
– XFER system with "strong" decoder
   • No grammar rules (baseline)
   • Manually developed grammar rules
   • Automatically learned grammar rules
– XFER+SMT with strong decoder (MEMT)


Automatic MT Evaluation Metrics

• Intended to replace or complement human assessment of the quality of MT-produced translations

• Principal idea: measure how similar the MT-produced translation is to human reference translation(s) of the same input

• Main metric in use today: IBM's BLEU
– Count n-gram (unigram, bigram, trigram, etc.) overlap between the MT output and several reference translations
– Calculate a combined n-gram precision score

• NIST variant of BLEU used for official DARPA evaluations
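The "combined n-gram precision" can be made concrete with a sentence-level Python sketch of BLEU: clipped (modified) n-gram precision against multiple references, a geometric mean over n = 1..4, and a brevity penalty based on the reference length closest to the candidate's. Official scoring is corpus-level (counts are summed over all sentences before dividing), and the NIST variant used for DARPA additionally weights n-grams by information content; neither is implemented here.

```python
import math
from collections import Counter
from fractions import Fraction

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(candidate, references, n):
    """Candidate n-gram counts, each clipped by its maximum count in any reference."""
    cand = ngrams(candidate, n)
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    total = sum(cand.values())
    return Fraction(clipped, total) if total else Fraction(0)

def bleu(candidate, references, max_n=4):
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_p = sum(math.log(p) for p in precisions) / max_n
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]  # closest ref length
    bp = 1.0 if c > r else math.exp(1 - r / c)  # brevity penalty
    return bp * math.exp(log_p)

refs = ["the cat is on the mat".split(), "there is a cat on the mat".split()]
print(modified_precision("the the the the the the the".split(), refs, 1))  # 2/7
```

Clipping is what stops a degenerate output like "the the the ..." from scoring well: "the" occurs at most twice in any single reference, so only 2 of its 7 occurrences count.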


Results on JHU Test Set

System | BLEU | M-BLEU | NIST
EBMT | 0.058 | 0.165 | 4.22
SMT | 0.093 | 0.191 | 4.64
XFER (naïve), manual grammar | 0.055 | 0.177 | 4.46
XFER (strong), no grammar | 0.109 | 0.224 | 5.29
XFER (strong), learned grammar | 0.116 | 0.231 | 5.37
XFER (strong), manual grammar | 0.135 | 0.243 | 5.59
XFER+SMT | 0.136 | 0.243 | 5.65


Effect of Reordering in the Decoder

[Figure: NIST vs. Reordering. NIST score (y-axis, 4.8 to 5.7) against reordering window (x-axis, 0 to 4) for four systems: no grammar, learned grammar, manual grammar, and MEMT (XFER+SMT).]


Observations and Lessons (I)

• XFER with strong decoder outperformed SMT even without any grammar rules in the miserly data scenario
– SMT was trained on elicited phrases that are very short
– SMT has insufficient data to train more discriminative translation probabilities
– XFER takes advantage of morphology: token coverage is 0.6989 without morphology and 0.7892 with morphology

• Manual grammar currently somewhat better than automatically learned grammar
– Learned rules did not yet use version-space learning
– Large room for improvement on learning rules
– Importance of effective, well-founded scoring of learned rules
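Token coverage here is the fraction of input tokens for which the lexicon offers a translation. Morphological analysis raises it because an inflected form can be reduced to a lemma that is in the lexicon. A minimal Python sketch follows. The suffix-stripping `toy_analyzer` and the English toy data are hypothetical stand-ins for a real Hindi morphological analyzer and dictionary.

```python
def token_coverage(tokens, lexicon, analyze=None):
    """Fraction of tokens with a lexicon entry, directly or via a morphological analysis."""
    def covered(tok):
        if tok in lexicon:
            return True
        return analyze is not None and any(lemma in lexicon for lemma in analyze(tok))
    return sum(covered(t) for t in tokens) / len(tokens)

def toy_analyzer(token):
    """Hypothetical analyzer: proposes lemmas by stripping a few English suffixes."""
    return [token[: -len(suf)] for suf in ("ing", "s") if token.endswith(suf)]

lexicon = {"book", "read"}
tokens = ["books", "read", "book", "reading", "quickly"]
print(token_coverage(tokens, lexicon))                # 0.4
print(token_coverage(tokens, lexicon, toy_analyzer))  # 0.8
```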


Observations and Lessons (II)

• MEMT (XFER and SMT) based on strong decoder produced best results in the miserly scenario.

• Reordering within the decoder provided very significant score improvements
– Much room for more sophisticated grammar rules
– Strong decoder can carry some of the reordering "burden"


XFER MT for Hebrew-to-English

• Two-month intensive effort to apply our XFER approach to the development of a Hebrew-to-English MT system

• Challenges:
– No large parallel corpus
– Limited-coverage translation lexicon
– Rich morphology: incomplete analyzer available

• Plan:
– Collect available resources, establish methodology for processing Hebrew input
– Translate and align Elicitation Corpus
– Learn XFER rules
– Develop (small) manual XFER grammar as a point of comparison
– System debugging and development
– Evaluate performance on unseen test data using automatic evaluation metrics


Hebrew-to-English XFER System

• Accomplished:
– Baseline system in place
– Good lexical coverage: 24634 translation pairs
– Reasonable morphological coverage
– Small manual grammar: 29 rules, mostly NPs
– Translated and aligned elicitation corpora
– Learning of automatic grammar
– Testing and development on dev-test in progress
– Results on unseen data within a couple of weeks…

• Translation Example:
in agreement with the interior ministry that copy fund will come to Haaretz agreed hotel homes to do all efforts to remove the african employees from Israel within days from the arrival of the new workers and to let people activities immigration police


Conclusions

• Transfer rules (both manual and learned) offer significant contributions that can outperform existing data-driven approaches
– Also in medium and large data settings?

• Initial steps toward development of a well-grounded transfer-based MT system with:
– Translation segments that are scored based on a well-founded probability model
– Strong and effective decoding that incorporates the most advanced techniques used in SMT decoding

• Working from the "opposite" end of research on incorporating models of syntax into "standard" SMT systems [Knight et al]

• Our direction makes sense in the limited data scenario


Future Directions

• Continued work on automatic rule learning (especially Seeded Version Space Learning)
– Use Hebrew and Hindi systems as test platforms for experimenting with advanced learning research

• Correcting and refining transfer rules by interaction with native bilingual speakers

• Developing a well-founded model for assigning scores (probabilities) to transfer rules

• Improving the strong decoder to better fit the specific characteristics of the XFER model

• Further improved MEMT with:
– Combination of output from different translation engines with different scorings
– Strong decoding capabilities