SP11 cs288 lecture 10 -- phrase alignment (2PP)klein/cs288/sp11... · frais .. Learning weights has been tried, several times: [Marcu and Wong, 02] [DeNero et al, 06] … and others


Statistical NLP
Spring 2011

Lecture 10: Phrase Alignment
Dan Klein – UC Berkeley

Phrase Weights


Phrase Scoring

[Figure: phrase alignment grid pairing French "les chats aiment le poisson frais ." with English "cats like fresh fish ."]

• Learning weights has been tried, several times:
  • [Marcu and Wong, 02]
  • [DeNero et al, 06]
  • … and others

• Seems not to work well, for a variety of partially understood reasons

• Main issue: big chunks get all the weight, obvious priors don’t help
  • Though, [DeNero et al 08]

Phrase Size

• Phrases do help
• But they don’t need to be long
• Why should this be?


Lexical Weighting

Phrase Alignment


Identifying Phrasal Translations

In the past two years , a number of US citizens …

过去两年中 , 一批美国公民 …

past two year in , one lots US citizen

Phrase alignment models: Choose a segmentation and a one-to-one phrase alignment

[Figure: candidate phrase translations labeled "past" and "go over"]

Underlying assumption: There is a correct phrasal segmentation


Unique Segmentations?




Problem 1: Overlapping phrases can be useful (and complementary)

Problem 2: Phrases and their sub-phrases can both be useful

Hypothesis: This is why models of phrase alignment don’t work well

Identifying Phrasal Translations

This talk: Modeling sets of overlapping, multi-scale phrase pairs




Input: sentence pairs

Output: extracted phrases


… But the Standard Pipeline has Overlap!

MOTIVATION

[Figure: the standard pipeline. Sentence pair ("In the past two years" / 过去 两 年 中; gloss: past two year in) → word alignment → extracted phrases]

Our Task: Predict Extraction Sets

MOTIVATION

Conditional model of extraction sets given sentence pairs

[Figure: sentence pair (过去 两 年 中, grid indexed 0 to 4 by 0 to 5) → extracted phrases + "word alignments"]


Alignments Imply Extraction Sets

MODEL

[Figure: alignment grid for 过去 两 年 中 (gloss: past two year in), indexed 0 to 4 by 0 to 5. Word-level alignment links → word-to-span alignments → extraction set of bispans]

Incorporating Possible Alignments

MODEL

[Figure: alignment grid for 过去 两 年 中 (gloss: past two year in), indexed 0 to 4 by 0 to 5. Sure and possible word links → word-to-span alignments → extraction set of bispans]


Linear Model for Extraction Sets

MODEL

[Figure: alignment grid for 过去 两 年 中, indexed 0 to 4 by 0 to 5]

Features on sure links

Features on all bispans
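
A minimal sketch of the linear model, assuming the score of an extraction set is the sum of w·ɸ over its sure links plus w·ɸ over all of its bispans. The names `score_extraction_set`, `link_features` and `bispan_features` are hypothetical placeholders for illustration, not the feature set used in the lecture.

```python
from typing import Callable, Dict, Iterable, Tuple

Features = Dict[str, float]
Link = Tuple[int, int]              # (source word index, target word index)
Bispan = Tuple[int, int, int, int]  # source span [i, j) paired with target span [k, l)


def dot(w: Features, phi: Features) -> float:
    """Sparse dot product between a weight vector and a feature vector."""
    return sum(value * w.get(name, 0.0) for name, value in phi.items())


def score_extraction_set(
    w: Features,
    sure_links: Iterable[Link],
    bispans: Iterable[Bispan],
    link_features: Callable[[Link], Features],
    bispan_features: Callable[[Bispan], Features],
) -> float:
    """Score = features on sure links + features on all bispans (assumed additive)."""
    link_score = sum(dot(w, link_features(link)) for link in sure_links)
    bispan_score = sum(dot(w, bispan_features(b)) for b in bispans)
    return link_score + bispan_score
```

Because the score decomposes over links and bispans, the arg max can in principle be searched with a dynamic program rather than by enumerating whole extraction sets.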

Features on Bispans and Sure Links

FEATURES

[Figure: 过 ↔ "go over", 地球 ↔ "Earth"; English phrase "over the Earth"]

Some features on sure links:
• HMM posteriors
• Presence in dictionary
• Numbers & punctuation

Features on bispans:
• HMM phrase table features: e.g., phrase relative frequencies
• Lexical indicator features for phrases with common words
• Monolingual phrase features: e.g., “the _____”
• Shape features: e.g., Chinese character counts


Getting Gold Extraction Sets

TRAINING

Hand-aligned sure and possible word links → word-to-span alignments → extraction set of bispans

Step 1 (deterministic): Find the min and max alignment index for each word

Step 2 (deterministic): A bispan is included iff every word within the bispan aligns within the bispan
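
The two deterministic steps can be sketched as follows. This is a simplified illustration, not the lecture's exact procedure: it uses sure links only, treats unaligned words as unconstrained, requires at least one link inside each bispan, and caps phrase length with a hypothetical `max_len` parameter. The example links for 过去 两 年 中 / "In the past two years" are illustrative.

```python
from typing import List, Optional, Set, Tuple

Link = Tuple[int, int]              # (source word index, target word index)
Span = Optional[Tuple[int, int]]    # (min, max) aligned index, or None if unaligned
Bispan = Tuple[int, int, int, int]  # source words [i, j) paired with target words [k, l)


def word_to_span(links: Set[Link], n_src: int, n_tgt: int) -> Tuple[List[Span], List[Span]]:
    """Step 1: min and max aligned index for each source and each target word."""
    src: List[Span] = [None] * n_src
    tgt: List[Span] = [None] * n_tgt
    for s, t in links:
        src[s] = (t, t) if src[s] is None else (min(src[s][0], t), max(src[s][1], t))
        tgt[t] = (s, s) if tgt[t] is None else (min(tgt[t][0], s), max(tgt[t][1], s))
    return src, tgt


def _inside(spans: List[Span], start: int, end: int, lo: int, hi: int) -> bool:
    """True if every aligned word in spans[start:end] aligns within [lo, hi)."""
    return all(sp is None or (lo <= sp[0] and sp[1] < hi) for sp in spans[start:end])


def extraction_set(links: Set[Link], n_src: int, n_tgt: int, max_len: int = 4) -> Set[Bispan]:
    """Step 2: keep a bispan iff every word within it aligns within it."""
    src, tgt = word_to_span(links, n_src, n_tgt)
    result: Set[Bispan] = set()
    for i in range(n_src):
        for j in range(i + 1, min(n_src, i + max_len) + 1):
            if all(sp is None for sp in src[i:j]):
                continue  # assumption: a bispan must contain at least one link
            for k in range(n_tgt):
                for l in range(k + 1, min(n_tgt, k + max_len) + 1):
                    if _inside(src, i, j, k, l) and _inside(tgt, k, l, i, j):
                        result.add((i, j, k, l))
    return result


# Illustrative links: 过去-past, 两-two, 年-years, 中-In ("the" left unaligned).
links = {(0, 2), (1, 3), (2, 4), (3, 0)}
bispans = extraction_set(links, n_src=4, n_tgt=5)
```

With these links the set contains both 过去 ↔ "past" and 过去 两 年 ↔ "past two years", so a phrase and its sub-phrase are extracted together. The unaligned "the" also yields overlapping pairs such as 中 ↔ "In" and 中 ↔ "In the".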

Discriminative Training with MIRA

TRAINING

Loss function: F-score of bispan errors (precision & recall)

Training Criterion: Minimal change to w such that the gold is preferred to the guess by a loss-scaled margin

[Figure: gold (annotated) extraction set vs. guess (arg max w·ɸ)]


Inference: An ITG Parser

INFERENCE

ITG captures some bispans

Experimental Setup

RESULTS

Chinese-to-English newswire

Parallel corpus: 11.3 million words; sentence length ≤ 40

MT systems: Tuned and tested on NIST '04 and '05

Supervised data: 150 training & 191 test sentences (NIST '02)

Unsupervised model: Jointly trained HMM (Berkeley Aligner)


Baselines and Limited Systems

RESULTS

HMM:
• State-of-the-art unsupervised baseline
• Joint training & competitive posterior decoding
• Source of many features for supervised models

ITG:
• Supervised ITG aligner with block terminals
• State-of-the-art supervised baseline
• Re-implementation of Haghighi et al., 2009

Coarse:
• Supervised block ITG + possible alignments
• Coarse pass of full extraction set model

Word Alignment Performance

RESULTS

System   Precision   Recall   1 - AER
HMM      84.0        76.9     80.4
ITG      83.4        83.8     83.6
Coarse   82.2        84.2     83.1
Full     84.7        84.0     84.4
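
For reference, a short sketch of how these three metrics are usually computed from sure (S) and possible (P) gold links, using the standard alignment error rate definition: precision = |A∩P| / |A|, recall = |A∩S| / |S|, and AER = 1 − (|A∩S| + |A∩P|) / (|A| + |S|). It assumes that every sure link also counts as possible. The function name `alignment_scores` is hypothetical.

```python
from typing import Set, Tuple

Link = Tuple[int, int]  # (source word index, target word index)


def alignment_scores(predicted: Set[Link], sure: Set[Link], possible: Set[Link]) -> Tuple[float, float, float]:
    """Return (precision, recall, 1 - AER) for a predicted word alignment."""
    possible = possible | sure  # sure links are treated as possible too
    if not predicted or not sure:
        raise ValueError("predicted and sure link sets must be non-empty")
    hits_sure = len(predicted & sure)
    hits_possible = len(predicted & possible)
    precision = hits_possible / len(predicted)
    recall = hits_sure / len(sure)
    one_minus_aer = (hits_sure + hits_possible) / (len(predicted) + len(sure))
    return precision, recall, one_minus_aer
```

Predicting a possible link costs nothing in precision, while only sure links count toward recall.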


Extracted Bispan Performance

RESULTS

System   Precision   Recall   F1     F5
HMM      69.5        59.5     64.1   59.9
ITG      75.8        62.3     68.4   62.8
Coarse   70.0        72.9     71.4   72.8
Full     69.0        74.2     71.6   74.0
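
The F1 and F5 figures follow the usual F-measure, F_β = (1 + β²)·P·R / (β²·P + R), where β = 5 weights recall much more heavily than precision. A minimal sketch over sets of bispans, with hypothetical function names:

```python
from typing import Set, Tuple

Bispan = Tuple[int, int, int, int]  # source span [i, j) paired with target span [k, l)


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall; beta > 1 favors recall."""
    denom = beta * beta * precision + recall
    if denom == 0.0:
        return 0.0
    return (1.0 + beta * beta) * precision * recall / denom


def bispan_prf(gold: Set[Bispan], guess: Set[Bispan], beta: float = 1.0) -> Tuple[float, float, float]:
    """Precision, recall and F_beta of guessed bispans against gold bispans."""
    correct = len(gold & guess)
    precision = correct / len(guess) if guess else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall, f_beta(precision, recall, beta)
```

With precision 1.0 and recall 0.5, F1 is about 0.67 but F5 is about 0.51, which shows how strongly F5 penalizes missed bispans.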

Translation Performance (BLEU)

RESULTS

BLEU by alignment system and decoder:

System   Moses   Joshua
HMM      33.2    34.5
ITG      33.6    34.7
Coarse   34.2    35.7
Full     34.4    35.9

Supervised conditions also included HMM alignments
