Extraction of Bilingual Information from Parallel Texts

September 2004, CSAW 2004


Mike Rosner


Outline

• Machine Translation

• Traditional vs. Statistical Architectures

• Experimental Results

• Conclusions


Translational Equivalence: many-to-many relation

[Figure: SOURCE, TARGET]


Traditional Machine Translation


Remarks

• Character of System
– Knowledge based.
– High quality results if domain is well delimited.
– Knowledge takes the form of specialised rules (analysis; synthesis; transfer).

• Problems
– Limited coverage.
– Knowledge acquisition bottleneck.
– Extensibility.


Statistical Translation

• Robust

• Domain independent

• Extensible

• Does not require language specialists

• Uses noisy channel model of translation


Noisy Channel Model: Sentence Translation (Brown et al. 1990)

[Figure: source sentence, target sentence]


The Problem of Translation

• Given a sentence T of the target language, seek the sentence S from which a translator produced T, i.e. find the S that maximises P(S|T).

• By Bayes' theorem,

P(S|T) = P(S) * P(T|S) / P(T)

whose denominator is independent of S.

• Hence it suffices to maximise P(S) * P(T|S).
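The maximisation above can be sketched in a few lines of Python. This is only an illustration: the two probability tables, their values, and the function name best_source are invented for the example and do not come from the talk.

```python
# Hypothetical toy probabilities, invented purely for illustration.
language_model = {        # P(S): plausibility of each candidate source sentence
    "John loves Mary": 0.6,
    "John Mary loves": 0.1,
}
translation_model = {     # P(T|S): probability of target T given source S
    ("Jean aime Marie", "John loves Mary"): 0.5,
    ("Jean aime Marie", "John Mary loves"): 0.5,
}

def best_source(target, candidates):
    """Return the candidate S that maximises P(S) * P(T|S).

    P(T) is the same for every candidate, so it is left out of the score.
    """
    def score(source):
        p_s = language_model.get(source, 0.0)
        p_t_given_s = translation_model.get((target, source), 0.0)
        return p_s * p_t_given_s
    return max(candidates, key=score)

best = best_source("Jean aime Marie", list(language_model))
print(best)
```

Both candidates explain the target equally well here, so the language model alone decides in favour of the well-formed source sentence.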


A Statistical MT System

[Figure: Source Language Model produces S; Translation Model maps S to T; P(S) * P(T|S) = P(S,T); Decoder maps T back to S]


The Three Components of a Statistical MT model

1. Method for computing language model probabilities (P(S))

2. Method for computing translation probabilities (P(T|S))

3. Method for searching amongst source sentences for one that maximises P(S) * P(T|S)


Probabilistic Language Models

• General: P(s1 s2 ... sn) = P(s1) * P(s2|s1) * ... * P(sn|s1 ... s(n-1))

• Trigram: P(s1 s2 ... sn) = P(s1) * P(s2|s1) * P(s3|s1,s2) * ... * P(sn|s(n-2),s(n-1))

• Bigram: P(s1 s2 ... sn) = P(s1) * P(s2|s1) * ... * P(sn|s(n-1))
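As a rough illustration of the bigram case, the sketch below estimates the probabilities by relative frequency from a three-sentence corpus. The corpus, the variable names, and the choice to estimate P(s1) from sentence-initial counts are all assumptions made for the example; no smoothing is applied, so unseen bigrams get probability zero.

```python
from collections import Counter

# Hypothetical toy corpus, invented for illustration.
corpus = [
    ["john", "loves", "mary"],
    ["mary", "loves", "john"],
    ["john", "sleeps"],
]

starts = Counter()    # how often each word begins a sentence
contexts = Counter()  # how often each word is followed by another word
bigrams = Counter()   # how often each (previous, current) pair occurs
for sentence in corpus:
    starts[sentence[0]] += 1
    contexts.update(sentence[:-1])
    bigrams.update(zip(sentence, sentence[1:]))

def bigram_probability(sentence):
    """P(s1 ... sn) = P(s1) * P(s2|s1) * ... * P(sn|s(n-1)), unsmoothed."""
    p = starts[sentence[0]] / len(corpus)
    for prev, cur in zip(sentence, sentence[1:]):
        if contexts[prev] == 0:
            return 0.0  # prev never seen as a context
        p *= bigrams[(prev, cur)] / contexts[prev]
    return p

p = bigram_probability(["john", "loves", "mary"])
print(p)
```

A realistic language model would be trained on far more text and would smooth the counts so that unseen word pairs do not force the whole sentence probability to zero.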


A Simple Alignment Based Translation Model

Assumption: target sentence is generated from the source sentence word-by-word

S: John loves Mary

T: Jean aime Marie


Sentence Translation Probability

• According to this model, the translation probability of the sentence is just the product of the translation probabilities of the words.

• P(T|S) = P(Jean aime Marie | John loves Mary) = P(Jean|John) * P(aime|loves) * P(Marie|Mary)
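The product above is easy to compute once word translation probabilities are available. In the sketch below the probability values are made up for illustration, and the function assumes the strict word-by-word pairing of this simple model (equal lengths, same order).

```python
import math

# Hypothetical word translation probabilities P(t|s), invented for illustration.
word_prob = {
    ("Jean", "John"): 0.9,
    ("aime", "loves"): 0.8,
    ("Marie", "Mary"): 0.9,
}

def sentence_translation_probability(target, source):
    """P(T|S) as the product of P(t_i|s_i) under a one-to-one, in-order pairing."""
    if len(target) != len(source):
        raise ValueError("the word-by-word model needs sentences of equal length")
    return math.prod(word_prob.get((t, s), 0.0) for t, s in zip(target, source))

p = sentence_translation_probability(
    ["Jean", "aime", "Marie"], ["John", "loves", "Mary"]
)
print(p)
```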


More Realistic Example

The proposal will not now be implemented

Les propositions ne seront pas mises en application maintenant


Some Further Parameters

• Word Translation Probability: P(t|s)

• Fertility: the number of words in the target that are paired with each source word: (0 – N)

• Distortion: the difference in sentence position between the source word and the target word: P(i|j,l), the probability of target position i given source position j and target length l


Searching

• Maintain list of hypotheses. Initial hypothesis: (Jean aime Marie | *)

• Search proceeds iteratively. At each iteration we extend the most promising hypotheses with additional words:
Jean aime Marie | John(1) *
Jean aime Marie | * loves(2) *
Jean aime Marie | * Mary(3) *
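The idea of keeping a list of hypotheses and extending the most promising ones can be sketched as a small beam search. This is a deliberate simplification and not the search procedure of Brown et al.: it builds the source sentence strictly left to right, one source word per target word, and every table, value, and name in it is hypothetical.

```python
# Hypothetical toy tables, invented for illustration.
candidates = {  # for each target word: {source word: P(t|s)}
    "Jean": {"John": 0.9, "Jane": 0.1},
    "aime": {"loves": 0.7, "likes": 0.3},
    "Marie": {"Mary": 0.9, "Marie": 0.1},
}
bigram = {  # P(current | previous); "<s>" marks the sentence start
    ("<s>", "John"): 0.5, ("<s>", "Jane"): 0.5,
    ("John", "loves"): 0.6, ("John", "likes"): 0.4,
    ("Jane", "loves"): 0.5, ("Jane", "likes"): 0.5,
    ("loves", "Mary"): 0.8, ("loves", "Marie"): 0.2,
    ("likes", "Mary"): 0.8, ("likes", "Marie"): 0.2,
}

def decode(target, beam_width=2):
    """Return (score, source_words) for the best hypothesis found."""
    hypotheses = [(1.0, [])]  # initial hypothesis: no source words yet
    for t in target:
        extended = []
        for score, words in hypotheses:
            prev = words[-1] if words else "<s>"
            for s, p_t_given_s in candidates[t].items():
                p_lm = bigram.get((prev, s), 0.0)
                extended.append((score * p_lm * p_t_given_s, words + [s]))
        # keep only the most promising hypotheses for the next iteration
        extended.sort(key=lambda h: h[0], reverse=True)
        hypotheses = extended[:beam_width]
    return hypotheses[0]

score, source = decode(["Jean", "aime", "Marie"])
print(source, score)
```

Each hypothesis is scored by the running product of language model and translation model probabilities, which is the quantity P(S) * P(T|S) restricted to the words chosen so far.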


Parameter Estimation

• In general, large quantities of data are needed.

• For language model, we need only source language text.

• For translation model, we need pairs of sentences that are translations of each other.

• Use EM Algorithm (Baum 1972) to optimize model parameters.
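To make the EM step concrete, here is a minimal sketch that estimates word translation probabilities P(t|s) only, with no fertility or distortion parameters and no empty source word. The two-pair corpus and the function name are assumptions for illustration; the experiment described in the talk used a much richer model and far more data.

```python
from collections import defaultdict

def train_word_translation(pairs, iterations=10):
    """Estimate P(t|s) by EM from (source_words, target_words) sentence pairs."""
    source_vocab = {s for src, _ in pairs for s in src}
    target_vocab = {t for _, tgt in pairs for t in tgt}
    # Start from a uniform distribution over target words for every source word.
    t_prob = {(t, s): 1.0 / len(target_vocab)
              for t in target_vocab for s in source_vocab}
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of t paired with s
        total = defaultdict(float)  # expected count of s paired with anything
        # E-step: share each target word among the source words of its sentence.
        for src, tgt in pairs:
            for t in tgt:
                norm = sum(t_prob[(t, s)] for s in src)
                for s in src:
                    fraction = t_prob[(t, s)] / norm
                    count[(t, s)] += fraction
                    total[s] += fraction
        # M-step: renormalise the expected counts into probabilities.
        t_prob = {(t, s): count[(t, s)] / total[s] for (t, s) in t_prob}
    return t_prob

# Hypothetical toy corpus: (source sentence, target sentence).
pairs = [
    (["the", "house"], ["la", "maison"]),
    (["the"], ["la"]),
]
t_prob = train_word_translation(pairs)
print(t_prob[("la", "the")], t_prob[("maison", "house")])
```

Because "the" also occurs alone with "la", the model learns to prefer "la" for "the", which in turn pushes "maison" towards "house" even though "house" never appears on its own.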


Experiment (Brown et al. 1990)

• Hansard. 40,000 pairs of sentences = approx. 800,000 words in each language.

• Considered 9,000 most common words in each language.

• Assumptions (initial parameter values)
– each of the 9,000 target words equally likely as translations of each of the source words.
– each of the fertilities from 0 to 25 equally likely for each of the 9,000 source words.
– each target position equally likely given each source position and target length.
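Read literally, these assumptions amount to uniform starting values, which can be written down directly. The constant and function names below are invented for the sketch; the numbers 9,000 and 0 to 25 are the ones given above.

```python
NUM_WORDS = 9000     # vocabulary size considered in each language
MAX_FERTILITY = 25   # fertilities range over 0..25 inclusive

# Every target word is equally likely as a translation of a given source word.
initial_translation_prob = 1 / NUM_WORDS

# Fertilities 0 to 25 give 26 equally likely values.
initial_fertility_prob = 1 / (MAX_FERTILITY + 1)

def initial_distortion_prob(target_length):
    """Each target position is equally likely, whatever the source position."""
    return 1 / target_length

print(initial_translation_prob, initial_fertility_prob, initial_distortion_prob(8))
```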


English: not

French         Probability
pas            .469
ne             .460
non            .024
pas du tout    .003
faux           .003
plus           .002
ce             .002
que            .002
jamais         .002

Fertility      Probability
2              .758
0              .133
1              .106


English: hear

French         Probability
bravo          .992
entendre       .005
entendu        .002
entends        .001

Fertility      Probability
0              .584
1              .416


Bajada 2003/4

• 400 sentence pairs from Malta/EU accession treaty

• Three different types of alignment
– Paragraph (precision 97%, recall 97%)
– Sentence (precision 91%, recall 95%)
– Word: 2 translation models
  • Model 1: distortion independent
  • Model 2: distortion dependent


Bajada 2003/4

                        Model 1   Model 2
word pairs present      244       244
word pairs identified   145       145
correct                 58        77
incorrect               87        68
precision               40%       53%
recall                  24%       32%
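The precision and recall figures follow from the counts in the table: precision is correct pairs over identified pairs, recall is correct pairs over pairs present. A short check, with a helper name chosen for this sketch:

```python
def precision_recall(correct, identified, present):
    """Precision = correct / identified; recall = correct / present."""
    return correct / identified, correct / present

results = {}
for name, correct in [("Model 1", 58), ("Model 2", 77)]:
    precision, recall = precision_recall(correct, identified=145, present=244)
    results[name] = (round(100 * precision), round(100 * recall))
    print(f"{name}: precision {precision:.0%}, recall {recall:.0%}")
```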


Conclusion/Future Work

• Larger data sets

• Finer models of word/word translation probabilities taking into account
– fertility
– morphological variants of the same words

• Role and tools for bilingual informant (not linguistic specialist)
