
Search Applications: Machine Translation


Page 1: Search Applications: Machine Translation

Search Applications:Machine Translation

Next time: Constraint Satisfaction

Reading for today: See “Machine Translation Paper” under links

Reading for next time: Chapter 5

Page 2: Search Applications: Machine Translation

Homework Questions?

Page 3: Search Applications: Machine Translation

Agenda

Introduction to machine translation
Statistical approaches
Use of parallel data
Alignment

What functions must be optimized?

Comparison of A* and greedy local search (hill climbing) algorithms for translation
How they work
Their performance

Page 4: Search Applications: Machine Translation

Approach to Statistical MT

Translate from past experience

Observe how words, phrases, and sentences are translated

Given a new sentence in the source language, choose the most probable translation in the target language

Data: a large corpus of parallel text, e.g., Canadian parliamentary proceedings

Page 5: Search Applications: Machine Translation

Data

Example: Ce n’est pas clair. / It is not clear.

Quantity: 200 billion words (2004 MT evaluation)

Sources:
Hansards: Canadian parliamentary proceedings
Hong Kong: official documents published in multiple languages
Newspapers published in multiple languages
Religious and literary works

Page 6: Search Applications: Machine Translation

Alignment – the first step

Which sentences or paragraphs in one language correspond to which paragraphs or sentences in the other? (Or which words?)

Problems:
Translators don’t translate word for word
Crossing alignments

Types of alignment: 1:1 (90% of cases), 1:2, 2:1, 3:1, 1:3

Page 7: Search Applications: Machine Translation

French (aligned phrases bracketed):
Quant aux [(les) eaux minérales et aux limonades], [elles rencontrent toujours plus d’adeptes.] En effet [notre sondage] [fait ressortir] [des ventes] [nettement supérieures] [à celles de 1987] pour [les boissons à base de cola] notamment.

Word-for-word gloss:
With regard to [the] mineral waters and the lemonades (soft drinks), they encounter still more users. Indeed our survey makes stand out the sales clearly superior to those in 1987 for cola-based drinks especially.

Published English translation:
According to [our survey,] 1988 [sales] of [mineral water and soft drinks] were [much higher] [than in 1987,] reflecting [the growing popularity] of these products. [Cola drink] manufacturers [in particular] achieved above average growth rates.

An example of 2:2 alignment

Page 8: Search Applications: Machine Translation

Fertility: a word may be translated by more than one word
notamment -> in particular (fertility 2)
limonades -> soft drinks

Fertility 0: a word translated by zero words
des ventes -> sales (des has fertility 0)
les boissons à base de cola -> cola drinks

Many to many:
elles rencontrent toujours plus d’adeptes -> the growing popularity

Page 9: Search Applications: Machine Translation

Bead for sentence alignment

A group of sentences in one language that corresponds in content to some group of sentences in the other language

Either group can be empty

How much content has to overlap between sentences to count it as alignment?

An overlapping clause can be sufficient

Page 10: Search Applications: Machine Translation

Methods for alignment

Length based

Offset alignment

Word based

Anchors (e.g., cognates)

Page 11: Search Applications: Machine Translation

Word Based Alignment

Assume the first and last sentences of the texts align (anchors).

Then, until most sentences are aligned:
Form an envelope of possible alignments from the Cartesian product of the lists of sentences
Exclude alignments that cross anchors or are too distant
Choose pairs of words that tend to co-occur in aligned sentence pairs
Find pairs of source and target sentences that contain many possible lexical correspondences
The most reliable pairs augment the set of anchors (see the sketch below)
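A minimal sketch of one pass of this procedure. All names are hypothetical, the distance and score thresholds are illustrative, and word_pairs stands in for a set of known source-target word correspondences:

    from itertools import product

    def lexical_score(src_sent, tgt_sent, word_pairs):
        # Count known word-pair correspondences shared by the two sentences
        return sum((s, t) in word_pairs
                   for s, t in product(src_sent.split(), tgt_sent.split()))

    def extend_anchors(src_sents, tgt_sents, anchors, word_pairs, min_score=3):
        # One pass: between consecutive anchors, form the Cartesian product of
        # candidate sentence pairs and promote the most reliable ones to anchors.
        new_anchors = list(anchors)
        for (i0, j0), (i1, j1) in zip(anchors, anchors[1:]):
            candidates = sorted(
                ((lexical_score(src_sents[i], tgt_sents[j], word_pairs), i, j)
                 for i in range(i0 + 1, i1)
                 for j in range(j0 + 1, j1)
                 if abs((i - i0) - (j - j0)) <= 2),  # exclude pairs too distant
                reverse=True)
            for score, i, j in candidates:
                # Exclude pairs that cross an already-accepted anchor
                crosses = any((i < ai) != (j < aj) for ai, aj in new_anchors)
                if score >= min_score and not crosses:
                    new_anchors.append((i, j))
        return sorted(new_anchors)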

Page 12: Search Applications: Machine Translation

The Noisy Channel Model for MT

Language model: P(e)

Translation model: P(f|e)

Decoder: e’ = argmax_e P(e|f) = argmax_e P(e) P(f|e)
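In code the decoder is just that argmax. A minimal sketch, assuming we can enumerate candidate translations (real decoders search rather than enumerate) and hypothetical lm_logprob / tm_logprob functions built as on the following slides:

    def decode(f, candidates, lm_logprob, tm_logprob):
        # e' = argmax_e P(e|f) = argmax_e P(e) * P(f|e); work in log space
        # so products over long sentences don't underflow.
        return max(candidates, key=lambda e: lm_logprob(e) + tm_logprob(f, e))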

Page 13: Search Applications: Machine Translation

The problem

Language model constructed from a large corpus of English

Bigram model: probability of word pairs
Trigram model: probability of three words in a row
From these, compute sentence probability

Translation model can be derived from alignment

For any pair of English/French words, what is the probability that pair is a translation?

Decoding is the problem: Given an unseen French sentence, how do we determine the translation?

Page 14: Search Applications: Machine Translation

Language Model

Predict the next word given the previous words: P(w_n | w_1 … w_{n-1})

Markov assumption: only the last few words affect the next word
Usual cases: bigram, trigram, 4-gram
Example: Sue swallowed the large green …

Parameter estimation (20,000-word vocabulary):
Bigram: 20,000 × 19,000 ≈ 400 million
Trigram: 20,000^2 × 19,000 ≈ 8 trillion
4-gram: 20,000^3 × 19,000 ≈ 1.6 × 10^17
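To make the counting concrete, a minimal maximum-likelihood bigram model over tokenized sentences. This sketch does no smoothing, so it fails on unseen words or word pairs; real systems smooth these estimates:

    from collections import Counter
    import math

    def train_bigram(sentences):
        # Count each context word and each adjacent word pair
        unigrams, bigrams = Counter(), Counter()
        for sent in sentences:
            words = ["<s>"] + sent.split() + ["</s>"]
            unigrams.update(words[:-1])
            bigrams.update(zip(words, words[1:]))
        return unigrams, bigrams

    def logprob(sent, unigrams, bigrams):
        # log P(w_1..w_n) = sum of log P(w_i | w_{i-1}), per the Markov assumption
        words = ["<s>"] + sent.split() + ["</s>"]
        return sum(math.log(bigrams[(p, w)] / unigrams[p])
                   for p, w in zip(words, words[1:]))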

Page 15: Search Applications: Machine Translation

Translation Model

For a particular word alignment, multiply the word-translation probabilities:

P(Jean aime Marie | John loves Mary) = P(Jean|John) × P(aime|loves) × P(Marie|Mary)

Then sum the probabilities over all alignments
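Enumerating all alignments is exponential, but under IBM Model 1 (uniform alignment probabilities) the sum over alignments factorizes into a product over French positions of sums over English words. A sketch, assuming a hypothetical translation table t keyed by (french, english) pairs and dropping the constant ε:

    import math

    def model1_logprob(f_words, e_words, t, floor=1e-12):
        # log P(f|e) under IBM Model 1:
        #   P(f|e) ~ (1/(l+1))^m * prod_j sum_i t(f_j | e_i)
        e_padded = ["NULL"] + list(e_words)   # allow alignment to the empty word
        logp = -len(f_words) * math.log(len(e_padded))
        for f in f_words:
            logp += math.log(sum(t.get((f, e), floor) for e in e_padded))
        return logp

For example, t might contain t[("Jean", "John")] = 0.9, estimated from aligned data.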

Page 16: Search Applications: Machine Translation

Decoding is NP-complete

When considering arbitrary word re-ordering:
Swapped words
Words with fertility > 1 (insertions)
Words with fertility 0 (deletions)

Usual strategy: examine a subset of likely possibilities and choose from that

Search error: the decoder returns e’ although there exists some e such that P(e|f) > P(e’|f)

Page 17: Search Applications: Machine Translation

Example Decoding Errors

Search error
Source: Permettez que je donne un exemple à la chambre.
Reference: Let me give the House one example.
Decoder: Let me give an example in the House

Model error
Source: Vous avez besoin de toute l’aide disponible.
Reference: You need all the help you can get.
Decoder: You need of the whole benefits available.

Page 18: Search Applications: Machine Translation

Search

Traditional decoding method: stack decoder (A* algorithm)
Deeply explores each hypothesis

Fast greedy algorithm
Much faster than A*
How often does it fail?

Integer programming method
Transforms the problem to the Traveling Salesman Problem (see paper)
Very slow
Guaranteed to find the best choice

Page 19: Search Applications: Machine Translation

Large branching factors in machine translation

Input: a sequence of n words, each with up to 200 possible target-word translations.

Output: a sequence of m words in the target language that has a high score under some goodness criterion.

Search space: a 6-word French sentence has 10^300 distinct translation scores under the IBM Model 4 translation model. [Soricut, Knight, Marcu, AMTA 2002]

Page 20: Search Applications: Machine Translation

Stack decoder: A*

Initialize the stack with an empty hypothesis

Loop:
Pop h, the best hypothesis, off the stack
If h is a complete sentence, output h and terminate
For each possible next word w, extend h by adding w and push the resulting hypothesis onto the stack
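A minimal runnable version of that loop. The next_words, score, and complete functions are hypothetical stand-ins; Python's heapq is a min-heap, so scores are negated. For true A* behavior, score must also include an admissible estimate of the cost of the words not yet translated:

    import heapq

    def stack_decode(source, next_words, score, complete):
        # Stack (A*) decoding: pop the best hypothesis, finish if complete,
        # otherwise push every one-word extension of it.
        stack = [(-score((), source), ())]
        while stack:
            _, h = heapq.heappop(stack)          # best hypothesis on the stack
            if complete(h, source):
                return h
            for w in next_words(h, source):      # possible next target words
                h2 = h + (w,)
                heapq.heappush(stack, (-score(h2, source), h2))
        return None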

Page 21: Search Applications: Machine Translation

Complications

It’s not a simple left-to-right translation

Because we multiply probabilities as we add words, shorter hypotheses will always win

Use multiple stacks, one for each length (see the sketch after this slide)

Given fertility possibilities, when we add a new target word for an input source word, how many do we add?
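One way to realize the multiple-stack idea, sketched with hypothetical bookkeeping helpers: hypotheses only compete against others with the same number of probability factors, so short hypotheses cannot starve longer ones.

    import heapq
    from collections import defaultdict

    # One priority queue per hypothesis length
    stacks = defaultdict(list)

    def push(hyp, score):
        heapq.heappush(stacks[len(hyp)], (-score, hyp))

    def pop_best(length):
        # Best hypothesis of the given length, or None if that stack is empty
        return heapq.heappop(stacks[length])[1] if stacks[length] else None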

Page 22: Search Applications: Machine Translation

Example

Page 23: Search Applications: Machine Translation

Hill climbing

    function HillClimbing(problem, initial-state, queuing-fn)
        node ← MakeNode(initial-state(problem));
        while T do
            next ← Best(SearchOperator-fn(node, cost-fn));
            if (IsBetter-fn(next, node)) then node ← next; continue;
            else if (GoalTest(node)) then return node;
            else exit;
        end while
        return Failure;

MT (Germann et al., ACL-2001)

    node ← targetGloss(sourceSentence);
    while T do
        next ← Best(LocallyModifiedTranslationOf(node));
        if (IsBetter(next, node)) then node ← next; continue;
        else print node; exit;
    end while
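The same greedy loop in runnable form. Here gloss, neighbors, and score are stand-ins: neighbors plays the role of LocallyModifiedTranslationOf, generating the change types listed on the next slide.

    def greedy_decode(source, gloss, neighbors, score):
        # Hill climbing: start from a word-for-word gloss, repeatedly move to
        # the best local modification, stop at the first local optimum.
        node = gloss(source)
        while True:
            candidates = list(neighbors(node, source))
            if not candidates:
                return node
            best = max(candidates, key=score)
            if score(best) > score(node):
                node = best                  # keep climbing
            else:
                return node                  # no improving neighbor: stop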

Page 24: Search Applications: Machine Translation

Types of changes

Translate one or two words (j1 e1 j2 e2)

Translate and insert (j e1 e2)

Remove word of fertility 0 (i)

Swap segments (i1 i2 j1 j2)

Join words (i1 i2)
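A sketch of two of these operators as a neighbor generator for the greedy loop above. It assumes a 1:1 word-aligned hypothesis (a tuple of target words) and a hypothetical lexicon of alternative translations, so the fertility-changing operators are left out:

    def neighbors(hyp, source, lexicon):
        # Translate one word: swap in an alternative translation at position j
        for j, src_word in enumerate(source):
            for alt in lexicon.get(src_word, ()):
                if alt != hyp[j]:
                    yield hyp[:j] + (alt,) + hyp[j + 1:]
        # Swap segments: simplest case, exchanging two single words
        for i1 in range(len(hyp)):
            for i2 in range(i1 + 1, len(hyp)):
                swapped = list(hyp)
                swapped[i1], swapped[i2] = swapped[i2], swapped[i1]
                yield tuple(swapped)

To plug this into greedy_decode, bind the table first, e.g. functools.partial(neighbors, lexicon=table).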

Page 25: Search Applications: Machine Translation

Example

Total of 77,421 possible translations attempted

Page 26: Search Applications: Machine Translation

Page 27: Search Applications: Machine Translation

Page 28: Search Applications: Machine Translation

How to search better?

MakeNode(initial-state(problem))

RemoveFront(Q)

SearchOperator-fn(node, cost-fn);

queuing-fn(problem, Q, (Next,Cost));

Page 29: Search Applications: Machine Translation

Example 1: Greedy Search MakeNode(initial-state(problem))

Machine Translation (Marcu and Wong, EMNLP-2002)

    node ← targetGloss(sourceSentence);
    while T do
        next ← Best(LocallyModifiedTranslationOf(node));
        if (IsBetter(next, node)) then node ← next; continue;
        else print node; exit;
    end while

[Chart: translation scores reached by greedy search from three starting points: the IBM gloss, the JM p(E | F) gloss, and the JM p(E, F) gloss; y-axis roughly 20.5 to 23.5]

Page 30: Search Applications: Machine Translation

Climbing the wrong peak

Which sentence is more grammatical?
1. better bart than madonna , i say
2. i say better than bart madonna ,

Can you make a sentence with these words? a and apparently as be could dissimilar firing identical neural really so things thought two

Model validation

Model stress-testing

Page 31: Search Applications: Machine Translation

Language-model stress-testing

Input: a bag of words
Output: the best sequence according to a linear combination of an n-gram LM and a syntax-based LM (Collins, 1997)
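For short bags this stress test can be run by brute force. A sketch with hypothetical ngram_lp and syntax_lp log-score functions and an assumed interpolation weight:

    from itertools import permutations

    def best_order(bag, ngram_lp, syntax_lp, alpha=0.5):
        # Score every ordering of a small bag of words under a linear
        # combination of two LM log scores; n! orderings, so only
        # feasible for the short (3-7 word) bags.
        return max(permutations(bag),
                   key=lambda seq: alpha * ngram_lp(seq)
                                 + (1 - alpha) * syntax_lp(seq))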

Page 32: Search Applications: Machine Translation

Size: 10-25 words long
[Chart: percentage of identical outputs, search errors, and model errors for NGLM, NGLM+SBLM, and NGLM+SBLM*; y-axis 0% to 90%]
Best searched (51.6): and so could really be a neural apparently thought things as dissimilar firing two identical
Original word order (64.3): could two things so apparently dissimilar as a thought and neural firing really be identical

Size: 3-7 words long
[Chart: percentage of identical outputs, search errors, and model errors for NGLM, NGLM+SBLM, and NGLM+SBLM*; y-axis 0% to 90%]
Best searched (32.3): i say better than bart madonna ,
Original word order (41.6): better bart than madonna , i say

SBLM*: trained on an additional 160k WSJ sentences.

Page 33: Search Applications: Machine Translation

End of Class: Questions?