A Trainable Transfer-based MT Approach for Languages with Limited Resources
Alon Lavie
Language Technologies Institute
Carnegie Mellon University
Joint Work with: Lori Levin, Jaime Carbonell, Katharina Probst, Erik Peterson, Stephan Vogel and Ariadna Font-Llitjos
26 April, 2004 EAMT Meeting/ Malta 2
Why Machine Translation for Languages with Limited Resources?
• Commercial MT economically feasible for only a handful of major languages with large resources (corpora, human developers)
• Statistical MT looks promising – but requires very large volumes of parallel texts
• Is there hope for MT for languages with limited electronic data resources?
• Benefits include:
– Better government information access to indigenous communities
– Better participation of indigenous communities in information-rich activities (health care, education, government) without giving up their languages
– Civilian and military applications (disaster relief)
– Language preservation
MT for Languages with Limited Resources: Challenges
• Minimal amount of parallel text
• Possibly lack of standards for orthography/spelling
• Often relatively few trained linguists
• Access to native bilingual informants possible
• No real economic incentive, limited financial resources for developing MT
– Need to minimize development time and cost
AVENUE Partners
Language                Country         Institutions
Mapudungun (in place)   Chile           Universidad de la Frontera, Institute for Indigenous Studies, Ministry of Education
Quechua (discussion)    Peru            Ministry of Education
Aymara (discussion)     Bolivia, Peru   Ministry of Education
AVENUE: Two Technical Approaches
• Generalized EBMT
– Parallel text 50K-2MB (uncontrolled corpus)
– Rapid implementation
– Proven for major languages with reduced data
• Transfer-rule learning
– Elicitation (controlled) corpus to extract grammatical properties
– Seeded version-space learning
Transfer with Strong Decoding
[Figure: system architecture diagram. Components shown: Word-aligned elicited data, Learning Module, Transfer Rules (example: {PP,4894} ;;Score:0.0470 PP::PP [NP POSTP] -> [PREP NP] ((X2::Y1)(X1::Y2))), Translation Lexicon, Run Time Transfer System, Lattice Decoder, English Language Model, Word-to-Word Translation Probabilities. SL input enters the system and TL output leaves it.]
Learning Transfer-Rules for Languages with Limited Resources
• Rationale:
– Large bilingual corpora not available
– Bilingual native informant(s) can translate and align a small pre-designed elicitation corpus, using elicitation tool
– Elicitation corpus designed to be typologically comprehensive and compositional
– Transfer-rule engine and new learning approach support acquisition of generalized transfer-rules from the data
English-Hindi Example
Spanish-Mapudungun Example
English-Arabic Example
The Elicitation Corpus
• Translated, aligned by bilingual informant
• Rich information about the sentences elicited
• Corpus consists of linguistically diverse constructions
• Based on elicitation and documentation work of field linguists (e.g. Comrie 1977, Bouquiaux 1992)
• Organized compositionally: elicit simple structures first, then use them as building blocks
• Goal: minimize size, maximize linguistic coverage
• Typological EC currently about 1000 sentences
• Work in progress:
– Feature Detection
– Navigation control through the corpus during elicitation
– Extensions to phenomena not currently covered
– Experimenting with alternative types of elicited data
Transfer Rule Formalism
Rule components: type information; part-of-speech/constituent information; alignments; x-side constraints; y-side constraints; xy-constraints, e.g. ((Y1 AGR) = (X1 AGR))
;SL: the old man, TL: ha-ish ha-zaqen
NP::NP [DET ADJ N] -> [DET N DET ADJ]
(
(X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)
((X1 AGR) = *3-SING)
((X1 DEF) = *DEF)
((X3 AGR) = *3-SING)
((X3 COUNT) = +)
((Y1 DEF) = *DEF)
((Y3 DEF) = *DEF)
((Y2 AGR) = *3-SING)
((Y2 GENDER) = (Y4 GENDER))
)
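The structural part of such a rule (type, constituent sequences, alignments) can be illustrated with a minimal sketch. The `TransferRule` class, its `apply` method, and the toy lexicon are hypothetical names introduced here for illustration; this is not the AVENUE transfer engine, and feature constraints are ignored.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TransferRule:
    """Hypothetical container for one rule: type, constituent sequences, alignments."""
    rule_type: str                      # e.g. "NP::NP"
    x_side: List[str]                   # source constituent sequence
    y_side: List[str]                   # target constituent sequence
    alignments: List[Tuple[int, int]]   # 1-based (Xi, Yj) pairs

    def apply(self, source_words: List[str], lexicon: Dict[str, str]) -> List[str]:
        """Fill each target slot with the translation of its aligned source word."""
        if len(source_words) != len(self.x_side):
            raise ValueError("input length does not match the x-side")
        target = [None] * len(self.y_side)
        for x, y in self.alignments:
            target[y - 1] = lexicon[source_words[x - 1]]
        return target


# The NP rule from the slide: "the old man" -> "ha-ish ha-zaqen".
np_rule = TransferRule(
    rule_type="NP::NP",
    x_side=["DET", "ADJ", "N"],
    y_side=["DET", "N", "DET", "ADJ"],
    alignments=[(1, 1), (1, 3), (2, 4), (3, 2)],
)

# Toy lexicon, assumed for illustration only.
toy_lexicon = {"the": "ha", "old": "zaqen", "man": "ish"}

print(np_rule.apply(["the", "old", "man"], toy_lexicon))
```

Because X1 is aligned to both Y1 and Y3, the single source determiner is copied into two target positions, which is how the rule expresses the doubled definite marker.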
Transfer Rule Formalism (II)
• Value constraints fix a feature to a value, e.g. ((X1 AGR) = *3-SING)
• Agreement constraints equate features of two constituents, e.g. ((Y2 GENDER) = (Y4 GENDER))
• Both examples come from the NP rule for “the old man” (TL: ha-ish ha-zaqen) on the previous slide
The Transfer Engine
Analysis: Source text is parsed into its grammatical structure. Determines transfer application ordering.
Example: 他 看 书。 (he read book)
[S [NP [N 他]] [VP [V 看] [NP 书]]]
Transfer: A target language tree is created by reordering, insertion, and deletion.
[S [NP [N he]] [VP [V read] [NP [DET a] [N book]]]]
Article “a” is inserted into object NP. Source words translated with transfer lexicon.
Generation: Target language constraints are checked and final translation produced. E.g. “reads” is chosen over “read” to agree with “he”.
Final translation: “He reads a book”
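The three stages can be traced on this one sentence with a toy sketch. Everything here is an assumption made for illustration: the function names, the hard-coded three-word analysis, the lexicon, and the agreement table. A real engine parses with a grammar and applies learned or manual transfer rules.

```python
# Toy transfer lexicon and agreement table; both are assumptions for illustration.
LEXICON = {"他": "he", "看": "read", "书": "book"}
THIRD_SINGULAR = {"he", "she", "it"}


def analyze(sentence):
    """Stand-in for parsing: returns a fixed S -> NP VP analysis of a 3-word input."""
    subj, verb, obj = sentence.split()
    return ("S", ("NP", ("N", subj)), ("VP", ("V", verb), ("NP", obj)))


def transfer(tree):
    """Translate leaves with the lexicon and insert an article into the object NP."""
    _, (_, (_, subj)), (_, (_, verb), (_, obj)) = tree
    return ("S",
            ("NP", ("N", LEXICON[subj])),
            ("VP", ("V", LEXICON[verb]),
                   ("NP", ("DET", "a"), ("N", LEXICON[obj]))))


def generate(tree):
    """Apply subject-verb agreement, then read the words off the target tree."""
    _, (_, (_, subj)), (_, (_, verb), (_, (_, det), (_, noun))) = tree
    if subj in THIRD_SINGULAR:
        verb += "s"
    return f"{subj.capitalize()} {verb} {det} {noun}"


print(generate(transfer(analyze("他 看 书"))))
```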
Rule Learning - Overview
• Goal: Acquire Syntactic Transfer Rules
• Use available knowledge from the source side (grammatical structure)
• Three steps:
1. Flat Seed Generation: first guesses at transfer rules; flat syntactic structure
2. Compositionality: use previously learned rules to add hierarchical structure
3. Seeded Version Space Learning: refine rules by learning appropriate feature constraints
Flat Seed Rule Generation
Learning Example: NP
Eng: the big apple
Heb: ha-tapuax ha-gadol
Generated Seed Rule:
NP::NP [ART ADJ N] -> [ART N ART ADJ]
((X1::Y1)
(X1::Y3)
(X2::Y4)
(X3::Y2))
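A flat seed rule of this kind follows directly from the part-of-speech sequences and the word alignments of one example. The sketch below assumes a hypothetical helper `flat_seed_rule` and assumes that POS tags for both sides are already available; how the target-side tags are obtained is not covered here.

```python
def flat_seed_rule(rule_type, src_pos, tgt_pos, alignments):
    """Build a flat seed rule string from POS sequences and 1-based word alignments."""
    for x, y in alignments:
        if not (1 <= x <= len(src_pos) and 1 <= y <= len(tgt_pos)):
            raise ValueError(f"alignment ({x}, {y}) is out of range")
    header = f"{rule_type}::{rule_type} [{' '.join(src_pos)}] -> [{' '.join(tgt_pos)}]"
    body = " ".join(f"(X{x}::Y{y})" for x, y in alignments)
    return f"{header}\n({body})"


# Eng: the big apple / Heb: ha-tapuax ha-gadol
rule = flat_seed_rule(
    "NP",
    ["ART", "ADJ", "N"],
    ["ART", "N", "ART", "ADJ"],
    [(1, 1), (1, 3), (2, 4), (3, 2)],
)
print(rule)
```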
Compositionality
Initial Flat Rules:
S::S [ART ADJ N V ART N] -> [ART N ART ADJ V P ART N]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2) (X4::Y5) (X5::Y7) (X6::Y8))
NP::NP [ART ADJ N] -> [ART N ART ADJ]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))
NP::NP [ART N] -> [ART N]
((X1::Y1) (X2::Y2))
Generated Compositional Rule:
S::S [NP V NP] -> [NP V P NP]
((X1::Y1) (X2::Y2) (X3::Y4))
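The step from the flat S rule to the compositional one can be sketched as span replacement plus alignment renumbering. The function `compose` and the chunk format are assumptions for illustration. The sketch also assumes the chunk spans are already known (found by applying the lower-level NP rules) and that no alignment crosses a chunk boundary.

```python
def compose(x_seq, y_seq, alignments, chunks):
    """Replace chunk spans with constituent labels and renumber alignments.

    chunks: list of (label, (x_start, x_end), (y_start, y_end)), 1-based inclusive.
    """
    def collapse(seq, spans):
        # Returns the collapsed sequence and a map from old to new positions.
        new_seq, pos_map = [], {}
        i = 1
        while i <= len(seq):
            span = next((s for s in spans if s[1][0] == i), None)
            if span:
                label, (start, end) = span
                new_seq.append(label)
                for old in range(start, end + 1):
                    pos_map[old] = len(new_seq)
                i = end + 1
            else:
                new_seq.append(seq[i - 1])
                pos_map[i] = len(new_seq)
                i += 1
        return new_seq, pos_map

    new_x, x_map = collapse(x_seq, [(c[0], c[1]) for c in chunks])
    new_y, y_map = collapse(y_seq, [(c[0], c[2]) for c in chunks])
    new_align = []
    for x, y in alignments:
        pair = (x_map[x], y_map[y])
        if pair not in new_align:  # alignments inside one chunk collapse to one pair
            new_align.append(pair)
    return new_x, new_y, new_align


flat_x = ["ART", "ADJ", "N", "V", "ART", "N"]
flat_y = ["ART", "N", "ART", "ADJ", "V", "P", "ART", "N"]
flat_align = [(1, 1), (1, 3), (2, 4), (3, 2), (4, 5), (5, 7), (6, 8)]
np_chunks = [("NP", (1, 3), (1, 4)), ("NP", (5, 6), (7, 8))]

print(compose(flat_x, flat_y, flat_align, np_chunks))
```

On the slide's example this yields the constituent sequences [NP V NP] and [NP V P NP] with alignments (X1::Y1) (X2::Y2) (X3::Y4).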
Compositionality - Overview
• Traverse the c-structure of the English sentence, add compositional structure for translatable chunks
• Adjust constituent sequences, alignments in the transfer rule
Seeded Version Space Learning
Input: Rules and their Example Sets
S::S [NP V NP] -> [NP V P NP] {ex1,ex12,ex17,ex26}
((X1::Y1) (X2::Y2) (X3::Y4))
NP::NP [ART ADJ N] -> [ART N ART ADJ] {ex2,ex3,ex13}
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))
NP::NP [ART N] -> [ART N] {ex4,ex5,ex6,ex8,ex10,ex11}
((X1::Y1) (X2::Y2))
Output: Rules with Feature Constraints:
S::S [NP V NP] -> [NP V P NP]
((X1::Y1) (X2::Y2) (X3::Y4)
(X1 NUM = X2 NUM)
(Y1 NUM = Y2 NUM)
(X1 NUM = Y1 NUM))
Seeded Version Space Learning: Overview
• Goal: add appropriate feature constraints to the acquired rules
• Methodology:
– Preserve general structural transfer
– Learn specific feature constraints from example set
• Seed rules are grouped into clusters of similar transfer structure (type, constituent sequences, alignments)
• Each cluster forms a version space: a partially ordered hypothesis space with a specific and a general boundary
• The seed rules in a group form the specific boundary of a version space
• The general boundary is the (implicit) transfer rule with the same type, constituent sequences, and alignments, but no feature constraints
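The clustering and the two boundaries can be sketched as follows. The dictionary layout, the function names, and the example constraints are hypothetical. The "shared" entry (constraints common to every seed in a cluster) is only one assumed candidate hypothesis between the boundaries, not the learning procedure itself, which the slides do not spell out.

```python
from collections import defaultdict


def structure_key(rule):
    """Type, constituent sequences and alignments identify a cluster."""
    return (rule["type"], tuple(rule["x"]), tuple(rule["y"]), tuple(rule["align"]))


def build_version_spaces(seed_rules):
    """Group seed rules by transfer structure and set up one version space per group."""
    clusters = defaultdict(list)
    for rule in seed_rules:
        clusters[structure_key(rule)].append(rule)
    spaces = {}
    for key, rules in clusters.items():
        constraint_sets = [frozenset(r["constraints"]) for r in rules]
        spaces[key] = {
            # Specific boundary: the seed rules with all of their constraints.
            "specific": constraint_sets,
            # General boundary: same structure, no feature constraints.
            "general": frozenset(),
            # Assumed intermediate hypothesis: constraints shared by every seed.
            "shared": frozenset.intersection(*constraint_sets),
        }
    return spaces


# Hypothetical seed rules; the constraint strings are invented for illustration.
seeds = [
    {"type": "NP::NP", "x": ["ART", "N"], "y": ["ART", "N"],
     "align": [(1, 1), (2, 2)],
     "constraints": ["((X2 NUM) = sg)", "((Y2 NUM) = (X2 NUM))"]},
    {"type": "NP::NP", "x": ["ART", "N"], "y": ["ART", "N"],
     "align": [(1, 1), (2, 2)],
     "constraints": ["((X2 NUM) = pl)", "((Y2 NUM) = (X2 NUM))"]},
    {"type": "NP::NP", "x": ["ART", "ADJ", "N"], "y": ["ART", "N", "ART", "ADJ"],
     "align": [(1, 1), (1, 3), (2, 4), (3, 2)],
     "constraints": ["((Y2 GENDER) = (Y4 GENDER))"]},
]

spaces = build_version_spaces(seeds)
for key, space in spaces.items():
    print(key[1], "->", key[2], sorted(space["shared"]))
```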
Examples of Automatically Learned Rules (Hindi-to-English)
{NP,14244}
;;Score:0.0429
NP::NP [N] -> [DET N]
(
(X1::Y2)
)
{NP,14434}
;;Score:0.0040
NP::NP [ADJ CONJ ADJ N] ->
[ADJ CONJ ADJ N]
(
(X1::Y1) (X2::Y2)
(X3::Y3) (X4::Y4)
)
{PP,4894}
;;Score:0.0470
PP::PP [NP POSTP] -> [PREP NP]
(
(X2::Y1)
(X1::Y2)
)
Manual Transfer Rules: Hindi Example
;; PASSIVE OF SIMPLE PAST (NO AUX) WITH LIGHT VERB
;; passive of 43 (7b)
{VP,28}
VP::VP : [V V V] -> [Aux V]
(
 (X1::Y2)
 ((x1 form) = root)
 ((x2 type) =c light)
 ((x2 form) = part)
 ((x2 aspect) = perf)
 ((x3 lexwx) = 'jAnA')
 ((x3 form) = part)
 ((x3 aspect) = perf)
 (x0 = x1)
 ((y1 lex) = be)
 ((y1 tense) = past)
 ((y1 agr num) = (x3 agr num))
 ((y1 agr pers) = (x3 agr pers))
 ((y2 form) = part)
)
Manual Transfer Rules: Example
; NP1 ke NP2 -> NP2 of NP1
; Ex: jIvana ke eka aXyAya
;     life of (one) chapter
; ==> a chapter of life
;
{NP,12}
NP::NP : [PP NP1] -> [NP1 PP]
(
 (X1::Y2)
 (X2::Y1)
; ((x2 lexwx) = 'kA')
)
{NP,13}
NP::NP : [NP1] -> [NP1]
(
 (X1::Y1)
)
{PP,12}
PP::PP : [NP Postp] -> [Prep NP]
(
 (X1::Y2)
 (X2::Y1)
)
Source tree: [NP [PP [NP [N1 [N jIvana]]] [P ke]] [NP1 [Adj eka] [N aXyAya]]]
Target tree: [NP [NP1 [Adj one] [N chapter]] [PP [P of] [NP [N1 [N life]]]]]
A Limited Data Scenario for Hindi-to-English
• Conducted during a DARPA “Surprise Language Exercise” (SLE) in June 2003
• Put together a scenario with “miserly” data resources:
– Elicited Data corpus: 17589 phrases
– Cleaned portion (top 12%) of LDC dictionary: ~2725 Hindi words (23612 translation pairs)
– Manually acquired resources during the SLE:
  • 500 manual bigram translations
  • 72 manually written phrase transfer rules
  • 105 manually written postposition rules
  • 48 manually written time expression rules
• No additional parallel text!!
Manual Grammar Development
• Covers mostly NPs, PPs and VPs (verb complexes)
• ~70 grammar rules, covering basic and recursive NPs and PPs, verb complexes of main tenses in Hindi (developed in two weeks)
Adding a “Strong” Decoder
• XFER system produces a full lattice of translation fragments, ranging from single words to long phrases or sentences
• Edges are scored using word-to-word translation probabilities, trained from the limited bilingual data
• Decoder uses an English LM (70m words)
• Decoder can also reorder words or phrases (up to 4 positions ahead)
• For XFER(strong), ONLY edges from basic XFER system are used!
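The idea of choosing a path through a lattice of scored fragments can be shown with a heavily simplified sketch. It is not the decoder described above: it is monotone (no reordering window), and the language model scores each phrase in isolation instead of using n-gram context across edges. The function name, the edge format, and all probabilities are assumptions for illustration.

```python
import math


def decode(n_words, edges, lm_logprob):
    """Best monotone path through a lattice of translation fragments.

    edges: list of (start, end, target_phrase, translation_prob), where the edge
    covers source positions start..end-1 and end > start.
    lm_logprob: function returning a log score for a target phrase.
    """
    best = {0: (0.0, [])}  # source position -> (log score, phrases so far)
    for pos in range(n_words):
        if pos not in best:
            continue  # no path reaches this position
        score, phrases = best[pos]
        for start, end, phrase, prob in edges:
            if start != pos:
                continue
            new_score = score + math.log(prob) + lm_logprob(phrase)
            if end not in best or new_score > best[end][0]:
                best[end] = (new_score, phrases + [phrase])
    if n_words not in best:
        return None  # the lattice does not cover the whole input
    return " ".join(best[n_words][1])


# Toy lattice for a 3-word input; all scores are invented.
toy_edges = [
    (0, 1, "he", 0.9),
    (1, 2, "read", 0.5),
    (1, 2, "reads", 0.4),
    (2, 3, "book", 0.6),
    (2, 3, "a book", 0.3),
    (1, 3, "reads a book", 0.2),
]
toy_lm = {"he": -1.0, "read": -3.0, "reads": -2.0,
          "book": -3.0, "a book": -1.5, "reads a book": -2.0}

print(decode(3, toy_edges, lambda phrase: toy_lm.get(phrase, -10.0)))
```

In this toy lattice the long fragment wins over the word-by-word path because its language model score outweighs its lower translation probability.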
Testing Conditions
• Tested on section of JHU provided data: 258 sentences with four reference translations
– SMT system (stand-alone)
– EBMT system (stand-alone)
– XFER system (naïve decoding)
– XFER system with “strong” decoder
  • No grammar rules (baseline)
  • Manually developed grammar rules
  • Automatically learned grammar rules
– XFER+SMT with strong decoder (MEMT)
Automatic MT Evaluation Metrics
• Intended to replace or complement human assessment of the quality of MT-produced translations
• Principal idea: measure how similar the MT-produced translation is to human reference translation(s) of the same input
• Main metric in use today: IBM’s BLEU
– Count n-gram (unigrams, bigrams, trigrams, etc.) overlap between the MT output and several reference translations
– Calculate a combined n-gram precision score
• NIST variant of BLEU used for official DARPA evaluations
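The two BLEU steps named above (counting n-gram overlap, then combining the precisions) can be sketched as follows. This is a simplified illustration, not a reference implementation: full BLEU also multiplies in a brevity penalty and is computed over a whole test set, both omitted here. The function names are chosen for this sketch.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of one token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, references, n):
    """Share of candidate n-grams found in a reference, clipped by reference counts."""
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    matched = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return matched / sum(cand.values())


def combined_precision(candidate, references, max_n=4):
    """Geometric mean of the clipped precisions for n = 1..max_n."""
    precisions = [clipped_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    return math.exp(sum(math.log(p) for p in precisions) / max_n)


refs = ["he reads a book".split(), "he is reading a book".split()]
hyp = "he read a book".split()
print(clipped_precision(hyp, refs, 1), clipped_precision(hyp, refs, 2))
print(combined_precision(hyp, refs, max_n=2))
```

In the example, three of four unigrams and one of three bigrams match a reference, so the combined score with `max_n=2` is the geometric mean of 0.75 and 1/3.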
Results on JHU Test Set
System                          BLEU    M-BLEU  NIST
EBMT                            0.058   0.165   4.22
SMT                             0.093   0.191   4.64
XFER (naïve) man grammar        0.055   0.177   4.46
XFER (strong) no grammar        0.109   0.224   5.29
XFER (strong) learned grammar   0.116   0.231   5.37
XFER (strong) man grammar       0.135   0.243   5.59
XFER+SMT                        0.136   0.243   5.65
Effect of Reordering in the Decoder
[Figure: NIST vs. Reordering. x-axis: reordering window (0 to 4); y-axis: NIST score (4.8 to 5.7). Series: no grammar, learned grammar, manual grammar, MEMT: XFER+SMT.]
Observations and Lessons (I)
• XFER with strong decoder outperformed SMT even without any grammar rules in the miserly data scenario
– SMT trained on elicited phrases that are very short
– SMT has insufficient data to train more discriminative translation probabilities
– XFER takes advantage of morphology
  • Token coverage without morphology: 0.6989
  • Token coverage with morphology: 0.7892
• Manual grammar currently somewhat better than automatically learned grammar
– Learned rules did not yet use version-space learning
– Large room for improvement on learning rules
– Importance of effective well-founded scoring of learned rules
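Token coverage of the kind reported above can be read as the fraction of input tokens for which the lexicon has an entry, either directly or after morphological analysis. The sketch below assumes that reading. The function, the suffix-stripping analyzer, and the English toy data are invented for illustration and have no connection to the Hindi resources or the reported figures.

```python
def token_coverage(tokens, lexicon, analyze=None):
    """Fraction of tokens found in the lexicon, optionally via a morphological analyzer."""
    if not tokens:
        return 0.0
    covered = 0
    for token in tokens:
        candidates = [token]
        if analyze is not None:
            candidates.extend(analyze(token))  # add candidate stems
        if any(c in lexicon for c in candidates):
            covered += 1
    return covered / len(tokens)


def toy_analyzer(token):
    """Invented stand-in for a morphological analyzer: strips two English suffixes."""
    stems = []
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) > len(suffix):
            stems.append(token[: -len(suffix)])
    return stems


toy_tokens = ["books", "read", "reading", "xyz"]
toy_lexicon = {"book", "read"}

print(token_coverage(toy_tokens, toy_lexicon))                # surface forms only
print(token_coverage(toy_tokens, toy_lexicon, toy_analyzer))  # with analysis
```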
Observations and Lessons (II)
• MEMT (XFER and SMT) based on strong decoder produced best results in the miserly scenario.
• Reordering within the decoder provided very significant score improvements
– Much room for more sophisticated grammar rules
– Strong decoder can carry some of the reordering “burden”
XFER MT for Hebrew-to-English
• Two-month intensive effort to apply our XFER approach to the development of a Hebrew-to-English MT system
• Challenges:
– No large parallel corpus
– Limited coverage translation lexicon
– Rich morphology: incomplete analyzer available
• Plan:
– Collect available resources, establish methodology for processing Hebrew input
– Translate and align Elicitation Corpus
– Learn XFER rules
– Develop (small) manual XFER grammar as a point of comparison
– System debugging and development
– Evaluate performance on unseen test data using automatic evaluation metrics
Hebrew-to-English XFER System
• Accomplished:
– Baseline system in place
– Good lexical coverage: 24634 translation pairs
– Reasonable morphological coverage
– Small manual grammar: 29 rules, mostly NPs
– Translated and aligned elicitation corpora
– Learning of automatic grammar
– Testing and development on dev-test in progress
– Results on unseen data within a couple of weeks…
• Translation Example:
in agreement with the interior ministry that copy fund will come to Haaretz agreed hotel homes to do all efforts to remove the african employees from Israel within days from the arrival of the new workers and to let people activities immigration police
Conclusions
• Transfer rules (both manual and learned) offer significant contributions that can outperform existing data-driven approaches
– Also in medium and large data settings?
• Initial steps to development of a well-grounded transfer-based MT system with:
– Translation segments that are scored based on a well-founded probability model
– Strong and effective decoding that incorporates the most advanced techniques used in SMT decoding
• Working from the “opposite” end of research on incorporating models of syntax into “standard” SMT systems [Knight et al]
• Our direction makes sense in the limited data scenario
Future Directions
• Continued work on automatic rule learning (especially Seeded Version Space Learning)
– Use Hebrew and Hindi systems as test platforms for experimenting with advanced learning research
• Correcting and refining transfer rules by interaction with native bilingual speakers
• Developing a well-founded model for assigning scores (probabilities) to transfer rules
• Improving the strong decoder to better fit the specific characteristics of the XFER model
• Further improved MEMT with:
– Combination of output from different translation engines with different scorings
– Strong decoding capabilities