Extraction of Bilingual Information from Parallel Texts

Mike Rosner

CSAW 2004, September 2004



Outline

• Machine Translation

• Traditional vs. Statistical Architectures

• Experimental Results

• Conclusions


Translational Equivalence: a many-to-many relation

[Figure: many-to-many mapping between SOURCE and TARGET]


Traditional Machine Translation


Remarks

• Character of System
  – Knowledge based.
  – High quality results if domain is well delimited.
  – Knowledge takes the form of specialised rules (analysis; synthesis; transfer).

• Problems
  – Limited coverage.
  – Knowledge acquisition bottleneck.
  – Extensibility.


Statistical Translation

• Robust

• Domain independent

• Extensible

• Does not require language specialists

• Uses noisy channel model of translation


Noisy Channel Model: Sentence Translation (Brown et al. 1990)

[Figure: noisy channel diagram; labels: source sentence, target sentence]


The Problem of Translation

• Given a sentence T of the target language, seek the sentence S from which a translator produced T, i.e. find S that maximises P(S|T).

• By Bayes' theorem

    P(S|T) = (P(S) x P(T|S)) / P(T)

  whose denominator is independent of S.

• Hence it suffices to maximise P(S) x P(T|S).
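
The decision rule above can be illustrated with a minimal Python sketch. The candidate sentences and all probability values are invented for illustration, and the helper name best_source is hypothetical; a real system would compute P(S) and P(T|S) from trained models rather than look them up.

```python
# Toy illustration of the noisy-channel decision rule: choose the source
# sentence S that maximises P(S) * P(T|S). All numbers are invented.

def best_source(candidates, lm_prob, tm_prob):
    """Return the candidate S with the highest P(S) * P(T|S)."""
    return max(candidates, key=lambda s: lm_prob[s] * tm_prob[s])

# Hypothetical candidate source sentences for T = "Jean aime Marie".
candidates = ["John loves Mary", "Mary loves John", "John Mary loves"]

# Hypothetical language-model probabilities P(S).
lm_prob = {
    "John loves Mary": 0.010,
    "Mary loves John": 0.010,
    "John Mary loves": 0.0001,
}

# Hypothetical translation-model probabilities P(T|S) for the fixed T.
tm_prob = {
    "John loves Mary": 0.30,
    "Mary loves John": 0.05,
    "John Mary loves": 0.30,
}

best = best_source(candidates, lm_prob, tm_prob)
print(best)
```

Dividing every score by the same P(T) would not change which candidate wins, which is why the denominator can be ignored. In this toy setting the language model rules out the ungrammatical word order and the translation model rules out the swapped names.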


A Statistical MT System

[Figure: the Source Language Model generates S and the Translation Model turns S into T, so that P(S) * P(T|S) = P(S,T); the Decoder takes T and recovers S]


The Three Components of a Statistical MT model

1. Method for computing language model probabilities (P(S))

2. Method for computing translation probabilities (P(T|S))

3. Method for searching amongst source sentences for one that maximises P(S) * P(T|S)


Probabilistic Language Models

• General: P(s1 s2 ... sn) = P(s1) * P(s2|s1) * ... * P(sn|s1 ... s(n-1))

• Trigram: P(s1 s2 ... sn) = P(s1) * P(s2|s1) * P(s3|s1,s2) * ... * P(sn|s(n-2),s(n-1))

• Bigram: P(s1 s2 ... sn) = P(s1) * P(s2|s1) * ... * P(sn|s(n-1))
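
As a rough illustration of the bigram case, the sketch below estimates the conditional probabilities by relative frequency from a tiny invented corpus. The start marker "<s>" (so that P(s1) becomes P(s1|<s>)), the function names, and the absence of smoothing are all assumptions made for the example, not details from the slides.

```python
from collections import Counter

def train_bigram(sentences):
    """Count bigrams and their histories, with <s> marking the sentence start."""
    histories, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        histories.update(tokens[:-1])          # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return histories, bigrams

def bigram_prob(sentence, histories, bigrams):
    """P(s1...sn) as a product of P(s_i | s_(i-1)), by relative frequency."""
    tokens = ["<s>"] + sentence.split()
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        if histories[prev] == 0:
            return 0.0  # unseen history: no estimate without smoothing
        prob *= bigrams[(prev, cur)] / histories[prev]
    return prob

# Invented three-sentence corpus of source-language text.
corpus = ["John loves Mary", "John loves Jane", "Mary loves John"]
histories, bigrams = train_bigram(corpus)
p = bigram_prob("John loves Mary", histories, bigrams)
print(p)
```

Any sentence containing an unseen bigram receives probability zero here, which is one reason practical language models need large amounts of text and some form of smoothing.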


A Simple Alignment Based Translation Model

Assumption: target sentence is generated from the source sentence word-by-word

S: John loves Mary

T: Jean aime Marie


Sentence Translation Probability

• According to this model, the translation probability of the sentence is just the product of the translation probabilities of the words.

• P(T|S) = P(Jean aime Marie|John loves Mary) = P(Jean|John) * P(aime|loves) * P(Marie|Mary)
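
The product above can be computed directly once word translation probabilities are available. In the sketch below the probability values are invented, and the function name and the strict same-position pairing of words are assumptions made to keep the example short.

```python
import math

# Hypothetical word translation probabilities P(t|s); values are invented.
t_prob = {
    ("Jean", "John"): 0.9,
    ("aime", "loves"): 0.8,
    ("Marie", "Mary"): 0.9,
}

def sentence_translation_prob(target, source, t_prob):
    """P(T|S) under a strict word-by-word, same-position alignment."""
    t_words, s_words = target.split(), source.split()
    if len(t_words) != len(s_words):
        raise ValueError("this simple model assumes equal sentence lengths")
    # Unknown word pairs get probability 0, so the whole product becomes 0.
    return math.prod(t_prob.get((t, s), 0.0) for t, s in zip(t_words, s_words))

p = sentence_translation_prob("Jean aime Marie", "John loves Mary", t_prob)
print(p)
```

The equal-length check makes the limitation of the word-by-word assumption explicit: sentence pairs of different lengths cannot be scored at all by this model.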


More Realistic Example

The proposal will not now be implemented

Les propositions ne seront pas mises en application maintenant


Some Further Parameters

• Word Translation Probability: P(t|s)

• Fertility: the number of words in the target that are paired with each source word: (0 – N)

• Distortion: the difference in sentence position between the source word and the target word: P(i|j,l)


Searching

• Maintain list of hypotheses. Initial hypothesis: (Jean aime Marie | *)

• Search proceeds iteratively. At each iteration we extend the most promising hypotheses with additional words:
  – Jean aime Marie | John(1) *
  – Jean aime Marie | * loves(2) *
  – Jean aime Marie | * Mary(3) *
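
The idea of keeping a list of hypotheses and extending the most promising ones can be sketched as a small beam search. This is a deliberately simplified stand-in, not the search procedure of Brown et al.: it assumes a monotone one-to-one alignment (the i-th source word explains the i-th target word), a bigram language model, and invented candidate lists and probabilities. All names are hypothetical.

```python
import heapq

def beam_search(target_words, candidates, t_prob, lm_prob, beam_width=2):
    """Extend partial source hypotheses one word at a time, keeping the best few.

    A hypothesis is scored by P(S so far) * P(T so far | S so far).
    """
    beam = [(1.0, [])]  # (score, partial source sentence)
    for t in target_words:
        extended = []
        for score, partial in beam:
            prev = partial[-1] if partial else "<s>"
            for s in candidates.get(t, []):
                new_score = score * lm_prob.get((prev, s), 0.0) * t_prob.get((t, s), 0.0)
                extended.append((new_score, partial + [s]))
        # Keep only the most promising hypotheses for the next iteration.
        beam = heapq.nlargest(beam_width, extended, key=lambda h: h[0])
    return beam[0] if beam else (0.0, [])

# Invented candidate source words for each target word.
candidates = {"Jean": ["John", "Jean"], "aime": ["loves", "likes"], "Marie": ["Mary"]}

# Invented word translation probabilities P(t|s).
t_prob = {("Jean", "John"): 0.9, ("Jean", "Jean"): 0.1,
          ("aime", "loves"): 0.6, ("aime", "likes"): 0.4,
          ("Marie", "Mary"): 0.9}

# Invented bigram language-model probabilities P(current | previous).
lm_prob = {("<s>", "John"): 0.5, ("<s>", "Jean"): 0.1,
           ("John", "loves"): 0.4, ("John", "likes"): 0.3,
           ("Jean", "loves"): 0.2, ("Jean", "likes"): 0.2,
           ("loves", "Mary"): 0.5, ("likes", "Mary"): 0.3}

score, words = beam_search(["Jean", "aime", "Marie"], candidates, t_prob, lm_prob)
print(" ".join(words), score)
```

Pruning to a fixed beam width trades optimality for speed: a hypothesis that looks weak early on is discarded even if it would have led to the best complete sentence.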


Parameter Estimation

• In general, large quantities of data are needed.

• For language model, we need only source language text.

• For translation model, we need pairs of sentences that are translations of each other.

• Use EM Algorithm (Baum 1972) to optimize model parameters.
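
To make the estimation step more concrete, here is a minimal sketch of EM for word translation probabilities alone, in the style of the simplest alignment model (often called IBM Model 1). It is an assumption-laden illustration: fertility and distortion are ignored, there is no empty source word, the three sentence pairs are invented, and the function name is hypothetical.

```python
from collections import defaultdict

def train_word_translation(pairs, iterations=10):
    """Estimate P(t|s) from (source_words, target_words) pairs with EM."""
    src_vocab = {s for s_sent, _ in pairs for s in s_sent}
    tgt_vocab = {t for _, t_sent in pairs for t in t_sent}
    # Uniform start: every target word equally likely for every source word.
    prob = {(t, s): 1.0 / len(tgt_vocab) for t in tgt_vocab for s in src_vocab}

    for _ in range(iterations):
        count = defaultdict(float)  # expected number of (t, s) pairings
        total = defaultdict(float)  # expected number of pairings involving s
        # E-step: share each target word among the source words of its sentence.
        for s_sent, t_sent in pairs:
            for t in t_sent:
                norm = sum(prob[(t, s)] for s in s_sent)
                for s in s_sent:
                    frac = prob[(t, s)] / norm
                    count[(t, s)] += frac
                    total[s] += frac
        # M-step: re-estimate P(t|s) from the expected counts.
        for (t, s) in prob:
            prob[(t, s)] = count[(t, s)] / total[s]
    return prob

# Invented toy corpus: (source sentence, target sentence) as word lists.
pairs = [
    (["the", "house"], ["la", "maison"]),
    (["the", "flower"], ["la", "fleur"]),
    (["a", "flower"], ["une", "fleur"]),
]
prob = train_word_translation(pairs)
print(prob[("la", "the")], prob[("maison", "house")])
```

Although every translation starts out equally likely, co-occurrence across sentence pairs is enough for the estimates to move toward the intuitive pairings, since "la" appears with "the" in two pairs while "maison" and "fleur" each appear with it only once.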


Experiment (Brown et al. 1990)

• Hansard. 40,000 pairs of sentences = approx. 800,000 words in each language.

• Considered 9,000 most common words in each language.

• Assumptions (initial parameter values)
  – each of the 9000 target words equally likely as translations of each of the source words.
  – each of the fertilities from 0 to 25 equally likely for each of the 9000 source words.
  – each target position equally likely given each source position and target length.


English: not

French         Probability
pas            .469
ne             .460
non            .024
pas du tout    .003
faux           .003
plus           .002
ce             .002
que            .002
jamais         .002

Fertility      Probability
2              .758
0              .133
1              .106


English: hear

French         Probability
bravo          .992
entendre       .005
entendu        .002
entends        .001

Fertility      Probability
0              .584
1              .416


Bajada 2003/4

• 400 sentence pairs from Malta/EU accession treaty

• Three different types of alignment
  – Paragraph (precision 97%, recall 97%)
  – Sentence (precision 91%, recall 95%)
  – Word: 2 translation models
    • Model 1: distortion independent
    • Model 2: distortion dependent


Bajada 2003/4

                        Model 1   Model 2
word pairs present      244       244
word pairs identified   145       145
correct                 58        77
incorrect               87        68
precision               40%       53%
recall                  24%       32%
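
The precision and recall figures follow from the counts in the table, taking precision as correct pairs over identified pairs and recall as correct pairs over pairs present. The short sketch below recomputes them; the function name is hypothetical.

```python
def precision_recall(correct, identified, present):
    """Precision = correct / identified; recall = correct / present."""
    return correct / identified, correct / present

# Counts taken from the table above.
results = {
    "Model 1": precision_recall(correct=58, identified=145, present=244),
    "Model 2": precision_recall(correct=77, identified=145, present=244),
}

for name, (p, r) in results.items():
    print(f"{name}: precision {p:.0%}, recall {r:.0%}")
# Model 1: precision 40%, recall 24%
# Model 2: precision 53%, recall 32%
```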


Conclusion/Future Work

• Larger data sets

• Finer models of word/word translation probabilities taking into account
  – fertility
  – morphological variants of the same words

• Role and tools for bilingual informant (not linguistic specialist)