MEMT: Multi-Engine Machine Translation

Faculty: Alon Lavie, Jaime Carbonell

Students and Staff: Gregory Hanneman, Justin Merrill (Shyamsundar Jayaraman, Satanjeev Banerjee)



March 10, 2005 MEMT 2

Goals and Approach

• Combine the output of multiple MT engines into a synthetic output that outperforms the originals in translation quality
• Synthetic combination of the originals, NOT selecting the best system
• Two main approaches:
• Approach-1: Merging of lattice outputs + joint decoding
  – Each MT system produces a lattice of translation fragments, indexed based on source word positions
  – Lattices are merged into a single common lattice
  – Statistical MT decoder selects a translation “path” through the lattice
• Approach-2: Align best output from engines + new decoder
  – Each MT system produces a sentence translation output
  – Establish an explicit word matching between all words of the various MT engine outputs
  – “Decoding”: create a collection of synthetic combinations of the original strings based on matched words, target LM, and constraints + re-combination and pruning
  – Score resulting hypotheses and select a final output


Synthetic Translation MEMT

• Idea:
  – Start with output sentences of the various MT engines
  – Explicitly align the words that are common between any pair of systems, and apply transitivity
  – Use the alignments as reinforcement and as indicators of possible locations for the words
  – Each engine has a “weight” that is used for the words that it contributes
  – Decoder searches for an optimal synthetic combination of words and phrases that optimizes a scoring function that combines the alignment weights and an LM score


The Word-level Matcher

• Developed by Satanjeev Banerjee as a component in our METEOR Automatic MT Evaluation metric

• Finds maximal alignment match with minimal “crossing branches”

• Implementation: Clever search algorithm for best match using pruning of sub-optimal sub-solutions
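
The matching objective can be illustrated with a small sketch: link identical words between two outputs, prefer the largest number of links, and break ties by the fewest crossing links. This is an illustrative exhaustive search with simple pruning, not the METEOR matcher itself. The names `count_crossings` and `match_words` are hypothetical, and only exact case-insensitive matches are considered.

```python
def count_crossings(pairs):
    """Count pairs of links (i, j) that cross each other."""
    crossings = 0
    for x in range(len(pairs)):
        for y in range(x + 1, len(pairs)):
            (i1, j1), (i2, j2) = pairs[x], pairs[y]
            if (i1 - i2) * (j1 - j2) < 0:
                crossings += 1
    return crossings


def match_words(a, b):
    """Return links (index in a, index in b): most links, then fewest crossings."""
    a = [w.lower() for w in a]
    b = [w.lower() for w in b]
    best = {"key": (-1, 0), "pairs": []}

    def search(i, used, pairs):
        # Prune: even linking every remaining word cannot reach the best size.
        if len(pairs) + (len(a) - i) < best["key"][0]:
            return
        if i == len(a):
            key = (len(pairs), -count_crossings(pairs))
            if key > best["key"]:
                best["key"], best["pairs"] = key, list(pairs)
            return
        for j, w in enumerate(b):
            if w == a[i] and j not in used:
                search(i + 1, used | {j}, pairs + [(i, j)])
        search(i + 1, used, pairs)  # leave word i unmatched

    search(0, frozenset(), [])
    return best["pairs"]


ibm = "the sri lankan prime minister criticizes head of the country's".split()
cmu = "Lankan Prime Minister criticizes her country".split()
links = match_words(ibm, cmu)  # [(2, 0), (3, 1), (4, 2), (5, 3)]
```

Exact matching leaves “country's” and “country” unlinked; whether the real matcher links such variants is not stated on the slide.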


Matcher Example

IBM: the sri lankan prime minister criticizes head of the country's

ISI: The President of the Sri Lankan Prime Minister Criticized the President of the Country

CMU: Lankan Prime Minister criticizes her country


The MEMT Algorithm

• Algorithm builds collections of partial hypotheses of increasing length
• Partial hypotheses are extended by selecting the “next available” word from one of the original systems
• Sentences are assumed synchronous:
  – Each word is either aligned with another word or is an alternative of another word
• Extending a partial hypothesis with a word “pulls” and “uses” its aligned words with it, and marks its alternatives as “used”
  – “vectors” keep track of this
• Partial hypotheses are scored and ranked
• Pruning and re-combination
• Hypothesis can end if any original system proposes an end of sentence as next word
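
The search loop described on this slide can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual MEMT decoder: alignment is a stand-in that links the k-th occurrence of a word across systems, “alternatives” and the horizon parameters are omitted, the score is just the number of systems supporting each chosen word, and the names `align_by_occurrence` and `combine` are hypothetical.

```python
from collections import defaultdict


def align_by_occurrence(sents):
    """Stand-in aligner: the k-th occurrence of a word is linked across systems."""
    groups = defaultdict(set)
    for s, sent in enumerate(sents):
        seen = defaultdict(int)
        for i, word in enumerate(sent):
            w = word.lower()
            groups[(w, seen[w])].add((s, i))
            seen[w] += 1
    return {pos: frozenset(grp) for grp in groups.values() for pos in grp}


def combine(sents, beam=5):
    """Beam search over partial hypotheses; expects at least one system output."""
    aligned = align_by_occurrence(sents)
    hyps = [(0, (), frozenset())]  # (support score, words so far, used positions)
    finished = []
    while hyps:
        extended = {}
        for score, words, used in hyps:
            for s, sent in enumerate(sents):
                # "Next available" word of system s: its first unused position.
                nxt = next((i for i in range(len(sent)) if (s, i) not in used), None)
                if nxt is None:
                    # System s proposes end of sentence: the hypothesis may stop.
                    finished.append((score, words))
                    continue
                group = aligned[(s, nxt)]  # the word "pulls" its aligned words
                cand = (score + len(group), words + (sent[nxt].lower(),), used | group)
                key = (cand[1], cand[2])
                if key not in extended or extended[key][0] < cand[0]:
                    extended[key] = cand  # re-combination of identical states
        hyps = sorted(extended.values(), key=lambda h: -h[0])[:beam]  # pruning
    # Rank finished hypotheses by average support per word.
    best = max(finished, key=lambda h: h[0] / max(len(h[1]), 1))
    return list(best[1])
```

The used-position set plays the role of the “vectors” mentioned above: every extension marks the chosen word and everything aligned to it as consumed, so the loop always terminates.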


The MEMT Algorithm

• Scoring:
  – Alignment score based on reinforcement from alignments of the words
  – LM score based on trigram LM
  – Sum logs of alignment score and LM score (equivalent to product of probabilities)
  – Select best scoring hypothesis based on:
    • Total score (bias towards shorter hypotheses)
    • Average score per word
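
A small numeric sketch of the two selection criteria, assuming each word carries an alignment score and an LM probability in (0, 1]. The function names and the numbers are made up for illustration. Because every log term is negative, the total score tends to favor the shorter hypothesis, while the per-word average removes that length bias.

```python
import math


def total_score(word_scores):
    """Sum of log alignment score and log LM probability over all words."""
    return sum(math.log(align) + math.log(lm) for align, lm in word_scores)


def average_score(word_scores):
    """Total score normalized by hypothesis length."""
    return total_score(word_scores) / len(word_scores)


# Hypothetical (alignment score, LM probability) pairs, one per word.
shorter = [(0.5, 0.2), (0.5, 0.2)]   # two weakly supported words
longer = [(0.8, 0.25)] * 4           # four better supported words

prefers_shorter = total_score(shorter) > total_score(longer)
prefers_longer = average_score(longer) > average_score(shorter)
```

Here both flags come out true: the total score picks the shorter hypothesis, and the average score picks the longer one.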


The MEMT Algorithm

• Parameters:
  – “lingering word” horizon: how long is a word allowed to linger when words following it have already been used?
  – “lookahead” horizon: how far ahead can we look for an alternative for a word that is not aligned?
  – “POS matching”: limit search for an alternative to only words of the same POS
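
One possible reading of the lookahead and POS-matching parameters, sketched under assumptions: the search for an alternative scans another system's output from a starting position up to the lookahead horizon, skips words that are already aligned, and optionally requires the same POS tag. The `MemtParams` class, its default values, and `find_alternative` are hypothetical; the lingering horizon is only recorded here, not applied.

```python
from dataclasses import dataclass


@dataclass
class MemtParams:
    lingering_horizon: int = 3   # hypothetical default, unused in this sketch
    lookahead_horizon: int = 2   # hypothetical default
    pos_matching: bool = True


def find_alternative(pos_tag, start, other_tags, other_aligned, params):
    """Return the index of a candidate alternative in another system, or None."""
    stop = min(len(other_tags), start + params.lookahead_horizon + 1)
    for j in range(start, stop):
        if other_aligned[j]:
            continue  # words that are already aligned are not alternatives
        if params.pos_matching and other_tags[j] != pos_tag:
            continue  # POS matching: only consider words with the same tag
        return j
    return None


tags = ["NN", "VB", "NN", "JJ"]
is_aligned = [True, False, False, False]
hit = find_alternative("NN", 0, tags, is_aligned, MemtParams(lookahead_horizon=2))
```

With a lookahead of 2 the unaligned noun at index 2 is found; shrinking the horizon to 1 puts it out of reach.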


Example

IBM: korea stands ready to allow visits to verify that it does not manufacture nuclear weapons 0.7407

ISI: North Korea Is Prepared to Allow Washington to Verify that It Does Not Make Nuclear Weapons 0.8007

CMU: North Korea prepared to allow Washington to the verification of that is to manufacture nuclear weapons 0.7668

Selected MEMT Sentence : north korea is prepared to allow washington to verify that it does not manufacture nuclear weapons . 0.8894 (-2.75135)


Example

IBM: victims russians are one man and his wife and abusing their eight year old daughter plus a ( 11 and 7 years ) man and his wife and driver , egyptian nationality . : 0.6327

ISI: The victims were Russian man and his wife, daughter of the most from the age of eight years in addition to the young girls ) 11 7 years ( and a man and his wife and the bus driver Egyptian nationality. : 0.7054

CMU: the victims Cruz man who wife and daughter both critical of the eight years old addition to two Orient ( 11 ) 7 years ) woman , wife of bus drivers Egyptian nationality . : 0.5293

MEMT Sentence:

Selected: the victims were russian man and his wife and daughter of the eight years from the age of a 11 and 7 years in addition to man and his wife and bus drivers egyptian nationality . 0.7647 -3.25376

Oracle : the victims were russian man and wife and his daughter of the eight years old from the age of a 11 and 7 years in addition to the man and his wife and bus drivers egyptian nationality young girls . 0.7964 -3.44128


Example

IBM: the sri lankan prime minister criticizes head of the country's : 0.8862

ISI: The President of the Sri Lankan Prime Minister Criticized the President of the Country : 0.8660

CMU: Lankan Prime Minister criticizes her country: 0.6615

MEMT Sentence:

Selected: the sri lankan prime minister criticizes president of the country . 0.9353 -3.27483

Oracle: the sri lankan prime minister criticizes president of the country's . 0.9767 -3.75805


Current System

• Initial development tests performed on TIDES 2003 Arabic-to-English MT data, using IBM, ISI and CMU SMT system output

• Further development tests performed on Arabic-to-English EBMT, Apptek, and SYSTRAN system output and on three Chinese-to-English COTS systems


Experimental Results: Chinese-to-English

System                               METEOR Score
Online Translator A                  .4917
Online Translator B                  .4859
Online Translator C                  .4910
Choosing best online translation     .5381
MEMT                                 .5301
Best hypothesis generated by MEMT    .5840


Experimental Results: Arabic-to-English

System                               METEOR Score
Apptek                               .4241
EBMT                                 .4231
Systran                              .4405
Choosing best online translation     .4432
MEMT                                 .5185
Best hypothesis generated by MEMT    .5883


Other Examples

http://www-2.cs.cmu.edu/afs/cs/user/alavie/Students/Shyam/Comps100


Architecture and Engineering

• Challenge: How do we construct an effective architecture for running MEMT within large-scale distributed projects?
  – Example: GALE Project
  – Multiple MT engines running at different locations
  – Input may be text or output of speech recognizers; output may go downstream to other applications (IE, Summarization, TDT)
• Approach: Using IBM’s UIMA: Unstructured Information Management Architecture
  – Provides support for building robust processing “workflows” with heterogeneous components
  – Components act as “annotators” at the character level within documents


UIMA-based MEMT

• MT engines and MEMT engine are set up as distributed servers:
  – Communication over socket connections
  – Sentence-by-sentence translation
• Java “wrappers” convert these into UIMA-style annotator components
• UIMA-based “workflows” implement a variety of asynchronous tasks, with results stored in a common Annotations Database (ADB)
  – Translation workflows
  – MEMT workflow
  – Evaluation/scoring workflow


UIMA-based MEMT: Examples

• Translation Workflow:
  – Retrieve document from ADB
  – “Annotate” document with translation annotator X
  – Write back new “annotation” into ADB
• MEMT Workflow:
  – Retrieve document translation annotations labeled by X, Y, Z from ADB
  – “Annotate” the document with a new MEMT annotation
  – Write back MEMT annotation into ADB
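
The two workflows can be mimicked with a toy in-memory store. This sketch does not use the UIMA API: `AnnotationsDB`, `translation_workflow`, and `memt_workflow` are hypothetical names, the translators are plain functions standing in for the socket-based MT servers, and the combiner is a placeholder for the MEMT engine.

```python
class AnnotationsDB:
    """In-memory stand-in for the common Annotations Database (ADB)."""

    def __init__(self):
        self.documents = {}
        self.annotations = {}

    def add_document(self, doc_id, sentences):
        self.documents[doc_id] = list(sentences)
        self.annotations[doc_id] = {}

    def write(self, doc_id, label, value):
        self.annotations[doc_id][label] = value

    def read(self, doc_id, label):
        return self.annotations[doc_id][label]


def translation_workflow(adb, doc_id, label, translate):
    sentences = adb.documents[doc_id]                  # retrieve document
    translated = [translate(s) for s in sentences]     # annotate, sentence by sentence
    adb.write(doc_id, label, translated)               # write back the annotation


def memt_workflow(adb, doc_id, labels, combine):
    outputs = [adb.read(doc_id, label) for label in labels]  # annotations X, Y, Z
    # zip(*outputs) groups the engines' translations of the same sentence.
    combined = [combine(list(group)) for group in zip(*outputs)]
    adb.write(doc_id, "MEMT", combined)


adb = AnnotationsDB()
adb.add_document("doc1", ["hello world", "good day"])
engines = {"X": str.upper, "Y": str.lower, "Z": str.title}  # toy "MT engines"
for name, engine in engines.items():
    translation_workflow(adb, "doc1", name, engine)
memt_workflow(adb, "doc1", ["X", "Y", "Z"], lambda outs: outs[0])  # placeholder combiner
```

Because each workflow only reads from and writes to the shared store, the translation workflows can run independently of one another, and the MEMT workflow can run whenever its three input annotations exist.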


Conclusions and Open Research Issues

• New sentence-level MEMT approach with promising performance
• Easy to run on both research and COTS systems
• UIMA-based architecture design for effective integration in large distributed systems/projects (e.g., GALE)
• Main Open Research Issues:
  – Improvements to the underlying algorithm: better word alignments, “artificial” word alignments
  – Confidence scores at the sentence or word level
  – Decoding is still suboptimal
    • Oracle scores show there is much room for improvement
    • Need for additional discriminant features
  – Extend approach to Multi-Engine SR combination
  – Engineering issues: synchronization, human-friendly workflows


Demo


Approach-1: Lattice MEMT

• Approach:
  – Multiple MT systems produce a lattice of output segments
  – Create a “union” lattice of the various systems
  – Decode the joint lattice and select best synthetic output


Approach-1: Lattice MEMT

• Lattice Decoder from CMU’s SMT:
  – Lattice arcs are scored uniformly using word-to-word translation probabilities, regardless of which engine produced the arc
  – Decoder searches for path that optimizes combination of Translation Model score and Language Model score
  – Decoder can also reorder words or phrases (up to 4 positions ahead)
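
A minimal sketch of decoding a union lattice, assuming each arc is a tuple `(start, end, phrase, log_score)` indexed by source word positions with `end > start`. It finds the best monotone path by dynamic programming over arc scores only; the language model and the reordering window of the CMU decoder are deliberately left out. The function name and the toy arcs are hypothetical.

```python
def decode_union_lattice(lattices, source_length):
    """Best monotone path from position 0 to source_length, or None if no path."""
    # The union lattice is simply the pooled arcs of all engines.
    arcs = [arc for lattice in lattices for arc in lattice]
    best = {0: (0.0, [])}  # source position -> (best score, phrases so far)
    for pos in range(source_length):
        if pos not in best:
            continue  # no path reaches this source position
        score, phrases = best[pos]
        for start, end, phrase, logp in arcs:
            if start != pos:
                continue
            cand = (score + logp, phrases + [phrase])
            if end not in best or cand[0] > best[end][0]:
                best[end] = cand
    return best.get(source_length)


# Toy lattices from two engines over a 3-word source sentence.
engine_a = [(0, 1, "north korea", -0.5), (1, 3, "stands ready", -1.2)]
engine_b = [(0, 1, "korea", -0.9), (1, 2, "is", -0.2), (2, 3, "prepared", -0.3)]
result = decode_union_lattice([engine_a, engine_b], 3)
```

In this toy case the best path mixes arcs from both engines (“north korea” from the first, “is prepared” from the second), which is the kind of synthetic output the union lattice is meant to allow.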


Initial Experiment: Hindi-to-English Systems

• Put together a scenario with “miserly” data resources:
  – Elicited Data corpus: 17589 phrases
  – Cleaned portion (top 12%) of LDC dictionary: ~2725 Hindi words (23612 translation pairs)
  – Manually acquired resources during the DARPA SLE:
    • 500 manual bigram translations
    • 72 manually written phrase transfer rules
    • 105 manually written postposition rules
    • 48 manually written time expression rules
• No additional parallel text!!


Initial Experiment: Hindi-to-English Systems

• Tested on section of JHU provided data: 258 sentences with four reference translations
  – SMT system (stand-alone)
  – EBMT system (stand-alone)
  – XFER system (naïve decoding)
  – XFER system with “strong” decoder
    • No grammar rules (baseline)
    • Manually developed grammar rules
    • Automatically learned grammar rules
  – XFER+SMT with strong decoder (MEMT)


Results on JHU Test Set (very miserly training data)

System                           BLEU    M-BLEU   NIST
EBMT                             0.058   0.165    4.22
SMT                              0.093   0.191    4.64
XFER (naïve) man grammar         0.055   0.177    4.46
XFER (strong) no grammar         0.109   0.224    5.29
XFER (strong) learned grammar    0.116   0.231    5.37
XFER (strong) man grammar        0.135   0.243    5.59
XFER+SMT                         0.136   0.243    5.65


Effect of Reordering in the Decoder

[Figure: “NIST vs. Reordering”. NIST score (y-axis, 4.8 to 5.7) plotted against reordering window (x-axis, 0 to 4) for four configurations: no grammar, learned grammar, manual grammar, and MEMT: XFER + SMT.]


Further Experiments: Arabic-to-English Systems

• Combined:
  – CMU’s SMT system
  – CMU’s EBMT system
  – UMD rule-based system
  – (IBM didn’t work out)
• TM scores from CMU SMT system
• Built large new English LM
• Tested on TIDES 2003 Test set


Arabic-to-English Systems: Lattice MEMT Results

System          BLEU                    M-BLEU                  METEOR
UMD only        .0335 [.0300, .0374]    .1099 [.1074, .1129]    .2356 [.2293, .3419]
EBMT only       .1090 [.1017, .1160]    .1861 [.1799, .1921]    .3666 [.3574, .3752]
SMT only        .2779 [.2782, .2886]    .3499 [.3412, .3582]    .5754 [.5649, .5855]
EBMT+UMD        .1206 [.1133, .1288]    .2069 [.2010, .2135]    .4061 [.3976, .4151]
SMT+EBMT        .2586 [.2477, .2702]    .3309 [.3222, .3403]    .5450 [.5360, .5545]
SMT+UMD         .2622 [.2519, .2724]    .3363 [.3281, .3446]    .5666 [.5575, .5764]
SMT+UMD+EBMT    .2527 [.2426, .2640]    .3262 [.3181, .3349]    .5394 [.5290, .5504]


Lattice MEMT

• Main Drawbacks:
  – Requires MT engines to provide lattice output: difficult to obtain!
  – Lattice output from all engines must be compatible, with common indexing based on source word positions: difficult to standardize!
  – Common TM used for scoring edges may not work well for all engines
  – Decoding does not take into account any reinforcements from multiple engines proposing the same translation for any portion of the input


Demonstration


Experimental Results: Arabic-to-English

System                               P       R       F1      Fmean
Apptek                               .5137   .5336   .5235   .5316
EBMT                                 .5710   .4781   .5204   .4860
Systran                              .4994   .5474   .5223   .5422
Choosing best online translation     .
MEMT                                 .5383   .6212   .5768   .6118
Best hypothesis generated by MEMT    .