Phrase-based Translation

Instructor: Yoav Artzi

CS5740: Natural Language Processing, Spring 2017

Slides adapted from Michael Collins and Yejin Choi




Overview
• Learning phrases from alignments
• A phrase-based model
• Decoding in phrase-based model
• MT evaluation


Phrase-based Models
• The first stage in training a phrase-based (PB) model is extraction of a PB lexicon
• A PB lexicon pairs strings in one language with strings in another language, e.g.,


An Example
• A training example:

• Some (not all) phrase pairs extracted from this example:

• We will see how to do this using alignments from IBM models (e.g., IBM Model 2)

[From tutorial by Koehn and Knight]


Recap: IBM Model 2
• IBM Model 2 defines a distribution 𝑝(𝑎, 𝑓|𝑒,𝑚) where 𝑓 is a target (French) sentence, 𝑒 is a source (English) sentence, 𝑎 is an alignment, and 𝑚 is the length of the foreign sentence

• A useful by-product: for any pair (𝑓, 𝑒), we can calculate

$$a^* = \arg\max_a p(a \mid f, e, m) = \arg\max_a p(a, f \mid e, m)$$

where 𝑎∗ is the most likely alignment

English: Mary did not slap the green witch

Spanish: Maria no daba una bofetada a la bruja verde
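The most likely alignment can be sketched in a few lines. This assumes the usual Model 2 parameterisation, with translation parameters t(f|e) and alignment parameters q(j|i,l,m); under that assumption the model factorises over target positions, so the argmax can be taken one position at a time. The function name, the dictionary layout, and the NULL handling are illustrative choices, not something given in the slides.

```python
def most_likely_alignment(f_words, e_words, t, q):
    """Return a list a where a[i-1] is the English position (0 = NULL)
    aligned to the i-th foreign word.

    t: dict mapping (f_word, e_word) -> t(f|e)      (hypothetical layout)
    q: dict mapping (j, i, l, m)     -> q(j|i,l,m)  (hypothetical layout)
    Missing entries are treated as probability 0.
    """
    l, m = len(e_words), len(f_words)
    e_with_null = ["NULL"] + list(e_words)  # position 0 is the NULL word
    alignment = []
    for i, f in enumerate(f_words, start=1):
        # Each foreign position is independent given the parameters,
        # so pick the best English position for it in isolation.
        best_j = max(
            range(l + 1),
            key=lambda j: q.get((j, i, l, m), 0.0) * t.get((f, e_with_null[j]), 0.0),
        )
        alignment.append(best_j)
    return alignment
```

With real parameters, t and q would come from EM training; here they are simply passed in.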


Representation as Alignment Matrix

• In IBM Model 2, each target (Spanish) word is aligned to exactly one English word. The matrix shows these alignments.

bof’ = bofetada


Finding Alignment Matrices
• Step 1: train IBM Model 2 for 𝑝(𝑓|𝑒), and find the most likely alignment for each (𝑒, 𝑓) pair

• Step 2: train IBM Model 2 for 𝑝(𝑒|𝑓), and find the most likely alignment for each (𝑒, 𝑓) pair

• Given the two alignments, take the intersection of the two as a starting point


The intersection of the two alignments has been found to be a very reliable starting point
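As a rough illustration of the two-direction procedure, the sketch below turns each direction's alignment into a set of (English position, foreign position) points and intersects them. The vector encoding (one aligned position per word, 0 meaning NULL) and the function names are assumptions made for this example.

```python
def points_from_f_given_e(a):
    """a[i-1] is the English position aligned to foreign word i (0 = NULL)."""
    return {(j, i) for i, j in enumerate(a, start=1) if j != 0}


def points_from_e_given_f(b):
    """b[j-1] is the foreign position aligned to English word j (0 = NULL)."""
    return {(j, i) for j, i in enumerate(b, start=1) if i != 0}


def intersect_and_union(a, b):
    """Return (intersection, union) of the two directional alignments."""
    fe = points_from_f_given_e(a)
    ef = points_from_e_given_f(b)
    return fe & ef, fe | ef
```

The intersection keeps only the points both models agree on; the union is the pool that the growing heuristics later draw from.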


Heuristics for Growing Alignments

• Only explore alignment points in the union of the 𝑝(𝑓|𝑒) and 𝑝(𝑒|𝑓) alignments

• Add one alignment point at a time

• Only add alignment points which align a word that currently has no alignment

• At first, restrict to alignment points that are “neighbors” (adjacent or diagonal) of current alignment points

• Later, consider other alignment points
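The heuristics above can be sketched as a two-phase loop. This is one possible reading of the bullets, not a reference implementation: the visiting order of candidate points, the exact neighborhood test, and the function name are assumptions.

```python
NEIGHBOR_OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1),
                    (-1, -1), (-1, 1), (1, -1), (1, 1)]


def grow_alignment(e2f_points, f2e_points):
    """Grow from the intersection towards the union of two alignments.

    Points are (e_index, f_index) pairs.
    """
    union = set(e2f_points) | set(f2e_points)
    alignment = set(e2f_points) & set(f2e_points)  # reliable starting point

    def aligns_new_word(point):
        # True if the English word or the foreign word has no alignment yet.
        e_i, f_j = point
        e_aligned = any(p[0] == e_i for p in alignment)
        f_aligned = any(p[1] == f_j for p in alignment)
        return not e_aligned or not f_aligned

    # Phase 1: only neighbors (adjacent or diagonal) of current points.
    added = True
    while added:
        added = False
        for e_i, f_j in sorted(union - alignment):
            is_neighbor = any((e_i + de, f_j + df) in alignment
                              for de, df in NEIGHBOR_OFFSETS)
            if is_neighbor and aligns_new_word((e_i, f_j)):
                alignment.add((e_i, f_j))  # one point at a time
                added = True

    # Phase 2: consider the remaining points in the union.
    for point in sorted(union - alignment):
        if aligns_new_word(point):
            alignment.add(point)
    return alignment
```

Points outside the union are never considered, and a point is skipped whenever both of its words are already aligned.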


Extracting Phrase Pairs from the Alignment Matrix

• A phrase-pair consists of a sequence of source (English) words, 𝑒, paired with a sequence of target (French) words, 𝑓

• A phrase-pair (𝑒, 𝑓) is consistent if:
  – There is at least one word in 𝑒 aligned to a word in 𝑓
  – There are no words in 𝑓 aligned to words outside 𝑒
  – There are no words in 𝑒 aligned to words outside 𝑓

• Extract all consistent phrase pairs from the training example


Candidate phrase pairs:
(Maria, Mary)
(no, did not)
(Maria no, Mary did not)
(no daba, did not slap)
(no daba una bof’, did not slap)
(daba una bof’, slap)
(a la, the)
(verde, green)
(bruja, witch)
(bruja verde, green witch)
(la bruja verde, the green witch)

[Two of the pairs above are marked X on the slide]
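The consistency test can be sketched directly from its three conditions. The alignment points below are an assumption chosen to agree with the phrase pairs listed above (the alignment matrix itself is not reproduced in this text), and the function names and the `max_len` cap are illustrative.

```python
E = "Mary did not slap the green witch".split()
F = "Maria no daba una bofetada a la bruja verde".split()

# ASSUMED alignment points (e_index, f_index), 0-based.
ALIGNMENT = {(0, 0), (1, 1), (2, 1), (3, 2), (3, 3), (3, 4),
             (4, 5), (4, 6), (5, 8), (6, 7)}


def is_consistent(e_span, f_span, alignment):
    """Spans are inclusive (lo, hi) index pairs."""
    (e_lo, e_hi), (f_lo, f_hi) = e_span, f_span
    has_link = False
    for e_i, f_j in alignment:
        e_in = e_lo <= e_i <= e_hi
        f_in = f_lo <= f_j <= f_hi
        if e_in != f_in:      # a link crosses the phrase-pair boundary
            return False
        if e_in and f_in:     # at least one link inside the pair
            has_link = True
    return has_link


def extract_phrase_pairs(e_words, f_words, alignment, max_len=4):
    """Enumerate all consistent (f_phrase, e_phrase) pairs up to max_len words."""
    pairs = []
    for e_lo in range(len(e_words)):
        for e_hi in range(e_lo, min(e_lo + max_len, len(e_words))):
            for f_lo in range(len(f_words)):
                for f_hi in range(f_lo, min(f_lo + max_len, len(f_words))):
                    if is_consistent((e_lo, e_hi), (f_lo, f_hi), alignment):
                        pairs.append((" ".join(f_words[f_lo:f_hi + 1]),
                                      " ".join(e_words[e_lo:e_hi + 1])))
    return pairs
```

Under this assumed alignment, (no daba, did not slap) is rejected because "slap" is also linked to words outside "no daba", and (la bruja verde, the green witch) is rejected because "the" is also linked to "a".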


Probabilities for Phrase Pairs
• For any phrase pair (f, e) extracted from the training data, we can calculate:

$$t(f \mid e) = \frac{\text{count}(f, e)}{\text{count}(e)}$$

• For example:

$$t(\text{daba una bofetada} \mid \text{slap}) = \frac{\text{count}(\text{daba una bofetada}, \text{slap})}{\text{count}(\text{slap})}$$

• Probabilistic model?
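A minimal sketch of this relative-frequency estimate, assuming count(e) is the number of extracted phrase pairs whose English side is e. The function name and the input format (a flat list of extracted pairs) are illustrative.

```python
from collections import Counter


def phrase_translation_probs(extracted_pairs):
    """extracted_pairs: list of (f_phrase, e_phrase) tuples from the corpus.

    Returns a dict mapping (f_phrase, e_phrase) -> count(f, e) / count(e).
    """
    pair_counts = Counter(extracted_pairs)
    e_counts = Counter(e for _, e in extracted_pairs)
    return {(f, e): c / e_counts[e] for (f, e), c in pair_counts.items()}
```

For each fixed e, the estimates sum to one over the f phrases seen with it.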


Example Phrase Translation Table


• Language model score
• Phrase score
• Distortion score

Each choice

Search the space of choices

Key problem?


Phrase-based Translation


Definitions


Valid Derivations


Examples

Distortion limit = 4


Valid Derivations

How many valid derivations exist?


Scoring Derivations


Example


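$$\log p(\text{we} \mid *, *) + \log p(\text{must} \mid \text{we}, *) + \log p(\text{also} \mid \text{must}, \text{we}) + \log p(\text{take} \mid \text{also}, \text{must}) + \cdots + \log p(\text{seriously} \mid \text{criticism}, \text{this})$$

$$+\; g(1, 3, \text{we must also}) + g(7, 7, \text{take}) + g(4, 5, \text{this criticism}) + g(6, 6, \text{seriously})$$

$$+\; \eta\,|0 + 1 - 1| + \eta\,|3 + 1 - 7| + \eta\,|7 + 1 - 4| + \eta\,|5 + 1 - 6|$$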
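The three groups of terms in the example above (language model, phrase scores, distortion) can be computed with a short sketch. The callables `lm_logprob` and `g` and the value of `eta` are placeholders supplied by the caller; only the terms shown in the example are included, so no end-of-sentence term is added.

```python
def distortion_distances(derivation):
    """derivation: list of phrases (s, t, english_string) in output order.

    Returns |t(previous phrase) + 1 - s(current phrase)| for each phrase,
    using 0 as the end position before the first phrase.
    """
    distances = []
    prev_end = 0
    for s, t, _ in derivation:
        distances.append(abs(prev_end + 1 - s))
        prev_end = t
    return distances


def score_derivation(derivation, lm_logprob, g, eta):
    """Sum of trigram log-probabilities, phrase scores, and distortion terms.

    lm_logprob(word, prev, prev_prev) and g(phrase) are hypothetical callables.
    """
    total = 0.0
    prev, prev_prev = "*", "*"
    for phrase in derivation:
        for word in phrase[2].split():
            total += lm_logprob(word, prev, prev_prev)
            prev, prev_prev = word, prev
        total += g(phrase)
    total += eta * sum(distortion_distances(derivation))
    return total
```

For the derivation in the example, the distortion distances come out as 0, 3, 4 and 0, all within a distortion limit of 4.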


Decoding Algorithm: Definitions


State Length: 𝑙𝑒𝑛(𝑞)
• Given a state 𝑞, 𝑙𝑒𝑛(𝑞) is the number of words translated
  – The number of 1’s in the bitmask 𝑏
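A small sketch of a state and its length, assuming a state is a five-tuple (last two English words, bitmask, end position of the last phrase, score) as in the state examples elsewhere in these slides. Storing the bitmask as a string of 0/1 characters is an illustrative choice.

```python
from typing import NamedTuple


class State(NamedTuple):
    e1: str       # second-to-last English word produced
    e2: str       # last English word produced
    b: str        # bitmask over source words, e.g. "1110001"
    r: int        # end position of the last translated phrase
    alpha: float  # score of the partial derivation


def state_len(q: State) -> int:
    """Number of source words translated so far: the count of 1's in b."""
    return q.b.count("1")
```

The decoder can use this length to group states that have translated the same number of words.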


States and the Search Space


(*, *, 0000000, 0, 0) → (must, also, 1110000, 3, ?) → (also, take, 1110001, 7, ?) → (this, criticism, 1111101, 5, ?) → (criticism, seriously, 1111111, 6, ?)


Transitions


Transition Function: Example

(3, 3, also)

(1, 2,we must)

(4, 5, this criticism)

(5, 6, criticism seriously)

(6, 6, seriously)

(5, 5, review)

[Check (V) and cross (X) marks from the slide; their pairing with the phrases above was lost in extraction]


The next function

next((must, also, 1110000, 3, ?), (7, 7, take)) = (also, take, 1110001, 7, ?)
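The example above can be reproduced with a sketch of the next function. The structural part (new last two words, updated bitmask, new end position) follows the example directly; the score update is an assumption that adds the phrase score, the trigram log-probabilities of the new words, and a distortion term, using placeholder callables `lm_logprob` and `g` and a placeholder `eta`.

```python
from typing import NamedTuple


class State(NamedTuple):
    e1: str
    e2: str
    b: str        # bitmask as a string of 0/1 characters
    r: int
    alpha: float


class Phrase(NamedTuple):
    s: int        # first source position covered (1-based)
    t: int        # last source position covered (1-based)
    e: str        # English string


def next_state(q, p, lm_logprob, g, eta):
    """State reached from q by appending phrase p."""
    history = [q.e1, q.e2]
    alpha = q.alpha + g(p) + eta * abs(q.r + 1 - p.s)
    for word in p.e.split():
        # lm_logprob(word, prev, prev_prev) is a hypothetical trigram scorer.
        alpha += lm_logprob(word, history[-1], history[-2])
        history.append(word)
    bits = list(q.b)
    for i in range(p.s, p.t + 1):
        bits[i - 1] = "1"   # mark source words s..t as translated
    return State(history[-2], history[-1], "".join(bits), p.t, alpha)
```

This sketch does not check that p is allowed to follow q (no overlap with the bitmask, distortion limit respected); that check belongs to the transition function.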


The Equality Function


The Decoding Algorithm


Definition of Add (Q,q’,q,p)


Definition of beam(Q)
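The slide bodies for the equality function, Add, and beam are not reproduced in this text. The sketch below follows the formulation in Michael Collins's lecture notes on phrase-based models, from which these slides are adapted, and should be read as an assumption about what the slides define: two states are equal if they agree on everything except the score, Add keeps only the best-scoring copy of equal states together with a backpointer, and beam keeps states whose score is within a width β of the best. The dictionary layout and the name `beta` are illustrative.

```python
from typing import NamedTuple


class State(NamedTuple):
    e1: str
    e2: str
    b: str
    r: int
    alpha: float


def eq(q1, q2):
    """States are equal if all fields except the score match."""
    return q1[:4] == q2[:4]


def add(Q, q_new, q_prev, p):
    """Insert q_new into Q (a dict keyed by the first four fields).

    If an equal state is already present, keep whichever has the higher
    score. The stored value is (state, backpointer), where the
    backpointer records the previous state and the phrase used.
    """
    key = q_new[:4]
    if key not in Q or q_new.alpha > Q[key][0].alpha:
        Q[key] = (q_new, (q_prev, p))


def beam(Q, beta):
    """States in Q whose score is within beta of the best score in Q."""
    best = max(q.alpha for q, _ in Q.values())
    return [q for q, _ in Q.values() if q.alpha >= best - beta]
```

Keying the dictionary on the first four fields plays the role of the equality test, so recombination takes constant time per insertion.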


Automatic Evaluation
• Human evaluations: subjective measures, fluency/adequacy

• Automatic measures: n-gram match to references
  – NIST measure: n-gram recall (worked poorly)
  – BLEU: n-gram precision (no one really likes it, but everyone uses it)

• BLEU:
  – P1 = unigram precision
  – P2, P3, P4 = bi-, tri-, 4-gram precision
  – Weighted geometric mean of P1-4
  – Brevity penalty (why?)
  – Somewhat hard to game…
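The BLEU ingredients listed above can be sketched for a single hypothesis and a single reference. Several details here are assumptions rather than something stated in the slides: uniform weights in the geometric mean, clipping of n-gram counts against the reference, the exponential form of the brevity penalty, and returning 0 when any precision is 0. Real BLEU is computed over a whole test corpus, usually with multiple references.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(hyp, ref, n):
    """Fraction of hypothesis n-grams found in the reference, with each
    n-gram credited at most as often as it occurs in the reference."""
    hyp_counts = ngram_counts(hyp, n)
    ref_counts = ngram_counts(ref, n)
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(c, ref_counts[gram]) for gram, c in hyp_counts.items())
    return clipped / total


def bleu(hyp, ref, max_n=4):
    precisions = [modified_precision(hyp, ref, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    # Geometric mean of P1..P4 with uniform weights.
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: precision alone would reward very short outputs.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_mean)
```

The brevity penalty answers the "why?" above: a hypothesis consisting of a few safe words can reach perfect precision, so short outputs have to be penalised separately.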


Correlation with Human Evaluation