Paper Presentation: HMM-based Alignment

HMM-based Word Alignment in Statistical Translation (1996)

Lekha Muraleedharan [133050002]
Sagar Ahire [133050073]

Roadmap

● Review of Alignment
● HMM-based Alignment
● Results and Examples


Review of Alignment

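● To translate a French sentence F into an English sentence E, we choose the E with the highest posterior probability. By Bayes' rule, and because P(F) is constant for a given F, this becomes:

$$E^* = \arg\max_E P(E \mid F) = \arg\max_E P(E)\,P(F \mid E)$$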

● To estimate P(F|E), we introduce hidden alignments A between the words of F and E and sum over them: P(F|E) = Σ_A P(F, A|E).
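The decision rule above can be sketched in a few lines. This is a toy illustration only: the candidate sentences and probability values are made up, and in a real system P(E) would come from a language model and P(F|E) from an alignment model. The sketch scores candidates in log space, which is equivalent to multiplying the probabilities but avoids underflow on long sentences.

```python
import math

def best_translation(candidates):
    """Return the candidate E maximizing P(E) * P(F|E) for a fixed source F.

    `candidates` maps each English candidate to a pair (p_e, p_f_given_e).
    The values used below are hypothetical, for illustration only.
    """
    # log P(E) + log P(F|E) has the same argmax as P(E) * P(F|E).
    return max(
        candidates,
        key=lambda e: math.log(candidates[e][0]) + math.log(candidates[e][1]),
    )

# Toy scores: (language-model P(E), translation-model P(F|E)).
candidates = {
    "Peter slept early": (0.02, 0.30),
    "Peter early slept": (0.001, 0.35),
    "Peter sleeps early": (0.015, 0.10),
}
print(best_translation(candidates))  # Peter slept early
```

Note that the ungrammatical "Peter early slept" has the highest P(F|E) here, but the language model term P(E) rules it out.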

● An alignment is a correspondence between the words of E and F that indicates, for each word in F, the word in E it is translated to.

● For example:

Hin: पीटर जल्दी सोया
Eng: Peter slept early
A: 1 3 2 (the English position aligned to each Hindi word)
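One simple way to store such an alignment, assuming 1-based positions as in the example, is a list with one entry per source (Hindi) word, where entry j holds the target (English) position that word is aligned to. The helper name `aligned_pairs` is hypothetical.

```python
def aligned_pairs(source, target, alignment):
    """Pair each source word with the target word it is aligned to.

    alignment[j] is the 1-based target position for source word j,
    so the list has exactly one entry per source word.
    """
    return [(source[j], target[i - 1]) for j, i in enumerate(alignment)]

hindi = ["पीटर", "जल्दी", "सोया"]
english = ["Peter", "slept", "early"]
print(aligned_pairs(hindi, english, [1, 3, 2]))
# [('पीटर', 'Peter'), ('जल्दी', 'early'), ('सोया', 'slept')]
```

Because each source word gets exactly one target position, this representation cannot express a source word with no translation; that limitation comes up again in the examples.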

Alignment Models

Depending on the assumptions made, there are several possible alignment models:
● IBM Models (1 to 5)
● HMM-based Alignment Models

[Figure: IBM Model 1 and IBM Model 2]

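IBM Models 1 and 2: The Math

IBM Model 1

● Assumes all alignments are equally likely
● Assumes each source word depends only on the target word it is aligned to

IBM Model 2

● Assumes alignments are more likely to “lie along the diagonal”, i.e. the alignment probability depends on the absolute positions of the two words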
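The slide's equations did not survive extraction, so here is a sketch of how the two models score a sentence pair, following the usual formulation with the empty word omitted: Model 1 uses a uniform alignment probability 1/I, and Model 2 replaces it with a position-dependent table a(i | j, I, J). The translation table `t_toy` and the diagonal-favouring table `diagonal_a` are hypothetical toy parameters, not trained values.

```python
import math
from typing import Callable, Dict, List, Tuple

# t[(f, e)] = p(f | e): probability of source word f given target word e.
TTable = Dict[Tuple[str, str], float]

def model1_likelihood(f: List[str], e: List[str], t: TTable) -> float:
    """IBM Model 1 without the empty word: prod_j sum_i (1/I) * t(f_j | e_i)."""
    I = len(e)
    prob = 1.0
    for fj in f:
        # Every target position is equally likely, hence the uniform 1/I weight.
        prob *= sum(t.get((fj, ei), 0.0) for ei in e) / I
    return prob

def model2_likelihood(f: List[str], e: List[str], t: TTable,
                      a: Callable[[int, int, int, int], float]) -> float:
    """IBM Model 2 without the empty word: prod_j sum_i a(i | j, I, J) * t(f_j | e_i).

    Positions i and j are 1-based; a(., j, I, J) should sum to 1 over i.
    """
    I, J = len(e), len(f)
    prob = 1.0
    for j, fj in enumerate(f, start=1):
        # The alignment weight now depends on the absolute positions i and j.
        prob *= sum(a(i, j, I, J) * t.get((fj, e[i - 1]), 0.0)
                    for i in range(1, I + 1))
    return prob

def diagonal_a(i: int, j: int, I: int, J: int) -> float:
    """Hypothetical alignment table preferring target positions near j * I / J."""
    weights = [math.exp(-abs(k - j * I / J)) for k in range(1, I + 1)]
    return weights[i - 1] / sum(weights)

# Hypothetical toy translation table for the example sentence pair.
t_toy = {("पीटर", "Peter"): 0.9, ("जल्दी", "early"): 0.8, ("सोया", "slept"): 0.7}
hindi = ["पीटर", "जल्दी", "सोया"]
english = ["Peter", "slept", "early"]
print(model1_likelihood(hindi, english, t_toy))
print(model2_likelihood(hindi, english, t_toy, diagonal_a))
```

With a uniform table a(i | j, I, J) = 1/I, Model 2 reduces to Model 1. On this pair, the diagonal-favouring table gives a slightly lower score than Model 1 (about 0.0174 vs. 0.0187), because two of the three words are aligned off the diagonal.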



HMM-based Alignment: The Math

HMM-based Alignment

● Assumes each alignment depends only on
○ the previous alignment (not on all previous alignments)
○ the jump width, i.e. the distance between the current and the previous aligned positions

● Thus, in this model alignments are relative to the previous position rather than absolute, as they are in IBM Model 2
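The bullets above can be sketched with the forward algorithm, which sums over all alignments efficiently. The transition probability is written in the jump-width form p(i | i', I) = s(i - i') / Σ_l s(l - i'), so it depends only on i - i'. Two assumptions are ours, for illustration: the first alignment position is uniformly distributed, and the jump score `s_toy` is a hypothetical hand-picked function, not a trained one.

```python
import math
from typing import Callable, Dict, List, Tuple

TTable = Dict[Tuple[str, str], float]  # t[(f, e)] = p(f | e)

def transition(i: int, i_prev: int, I: int, s: Callable[[int], float]) -> float:
    """p(i | i_prev, I) = s(i - i_prev) / sum_l s(l - i_prev).

    The score s sees only the jump width i - i_prev, never absolute positions.
    """
    norm = sum(s(l - i_prev) for l in range(1, I + 1))
    return s(i - i_prev) / norm

def hmm_likelihood(f: List[str], e: List[str], t: TTable,
                   s: Callable[[int], float]) -> float:
    """Forward algorithm: sum over alignments of prod_j p(a_j | a_{j-1}) t(f_j | e_{a_j}).

    Assumption for this sketch: the first alignment position is uniform over 1..I.
    """
    I = len(e)
    # alpha[i - 1] = P(f_1 .. f_j, a_j = i); initialised for j = 1.
    alpha = [t.get((f[0], e[i - 1]), 0.0) / I for i in range(1, I + 1)]
    for fj in f[1:]:
        # Each state sums over all previous states, weighted by the jump probability.
        alpha = [
            t.get((fj, e[i - 1]), 0.0)
            * sum(alpha[ip - 1] * transition(i, ip, I, s) for ip in range(1, I + 1))
            for i in range(1, I + 1)
        ]
    return sum(alpha)

def s_toy(d: int) -> float:
    """Hypothetical jump-width score favouring small forward jumps (d = +1 is best)."""
    return math.exp(-abs(d - 1))

t_toy = {("पीटर", "Peter"): 0.9, ("जल्दी", "early"): 0.8, ("सोया", "slept"): 0.7}
print(hmm_likelihood(["पीटर", "जल्दी", "सोया"], ["Peter", "slept", "early"], t_toy, s_toy))
```

For readability, `transition` recomputes its normaliser on every call; a real implementation would precompute the I × I transition matrix once per sentence length.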

A Comparison

[Figure: IBM Model 1, IBM Model 2 and HMM-based Model]
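The comparison figure did not survive extraction. For reference, the alignment probabilities of the three models can be written side by side in the notation commonly used for them (source words $f_1^J$, target words $e_1^I$, alignment $a_1^J$, empty word omitted). This is a reconstruction for comparison, not the slide's original figure; $a(\cdot)$ denotes the Model 2 alignment table and $s(\cdot)$ the HMM jump-width score.

```latex
\begin{aligned}
\Pr(f_1^J \mid e_1^I) &= \sum_{a_1^J} \prod_{j=1}^{J} p(a_j \mid \cdot)\; p(f_j \mid e_{a_j}) \\
\text{IBM 1:}\quad p(a_j = i \mid j, I, J) &= \frac{1}{I} \\
\text{IBM 2:}\quad p(a_j = i \mid j, I, J) &= a(i \mid j, I, J) \\
\text{HMM:}\quad p(a_j = i \mid a_{j-1} = i', I) &= \frac{s(i - i')}{\sum_{l=1}^{I} s(l - i')}
\end{aligned}
```

Only the HMM conditions on the previous alignment $a_{j-1}$, and only through the jump width $i - i'$.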



Statistical Results: Basic Framework

● Models compared:
○ IBM 1
○ IBM 2
○ HMM

● Corpora used (German to French):
○ Avalanche Bulletins Corpus (News)
○ Verbmobil Corpus (Spoken Dialog)
○ EuTrans Corpus (Travel & Tourism)

● Training process:
○ IBM 1: 10 iterations of EM
○ IBM 2: 5 iterations of maximum approximation
○ HMM: 5 iterations of maximum approximation

● Metric used:
○ Perplexity (Wikipedia: “a measurement of how well a probability model predicts a sample”); lower is better
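A common way to compute perplexity is per word: exponentiate the negative average log-probability the model assigns to the test data. The sketch below assumes that definition and natural logarithms; the exact normalisation used in the paper may differ. The function name `perplexity` and the toy data are illustrative.

```python
import math
from typing import Iterable, Tuple

def perplexity(sentences: Iterable[Tuple[float, int]]) -> float:
    """Per-word perplexity exp(-(1/N) * sum log P), with N the total word count.

    Each item is (natural-log probability of a sentence, number of words in it).
    """
    total_logprob = 0.0
    total_words = 0
    for logprob, n_words in sentences:
        total_logprob += logprob
        total_words += n_words
    return math.exp(-total_logprob / total_words)

# A model that gives every word probability 1/10 has perplexity 10.
toy = [(3 * math.log(0.1), 3), (5 * math.log(0.1), 5)]
print(round(perplexity(toy), 6))  # 10.0
```

Intuitively, a perplexity of k means the model is as uncertain as if it were choosing uniformly among k words at each position.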

Statistical Results

Perplexity (lower is better):

Corpus       IBM 1    IBM 2      HMM
EuTrans     16.267    9.781    9.686
Verbmobil   46.672   30.706   26.495
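The relative improvements implied by the table can be computed directly. The numbers below are copied from the table; the snippet only does the arithmetic.

```python
# Perplexities from the table: (IBM 1, IBM 2, HMM).
results = {
    "EuTrans": (16.267, 9.781, 9.686),
    "Verbmobil": (46.672, 30.706, 26.495),
}

def relative_reduction(old: float, new: float) -> float:
    """Fractional reduction from `old` to `new` (0.10 means 10% lower)."""
    return (old - new) / old

for corpus, (ibm1, ibm2, hmm) in results.items():
    print(f"{corpus}: HMM vs IBM 1 {relative_reduction(ibm1, hmm):.1%}, "
          f"HMM vs IBM 2 {relative_reduction(ibm2, hmm):.1%}")
```

This works out to a 13.7% reduction from IBM 2 to HMM on Verbmobil, but only about 1.0% on EuTrans. Against IBM 1, the reduction is above 40% on both corpora.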

Intuitive Example: 1

Hin: पीटर जल्दी सोया

Eng: Peter slept early
A: 1 3 2
Jump: N/A 2 -1

Intuitive Example: पीटर जल्दी सोया

● Relatively straightforward
● Since there are no large jumps, the translation probabilities dominate the choice of alignment
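The Jump row is just the difference between consecutive alignment positions, a_j - a_{j-1}. A minimal sketch (the function name is illustrative):

```python
from typing import List, Optional

def jump_widths(alignment: List[int]) -> List[Optional[int]]:
    """Jump width a_j - a_{j-1} at each position; the first has no predecessor."""
    return [None] + [cur - prev for prev, cur in zip(alignment, alignment[1:])]

print(jump_widths([1, 3, 2]))  # [None, 2, -1]
```

Here `None` plays the role of the N/A entry for the first word.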


Hin: पीटर घर लौटने पर जल्दी सोया

Eng: Peter slept early on returning home
A: 1 6 5 4 3 2
Jump: N/A 5 -1 -1 -1 -1

Intuitive Example: पीटर घर लौटने पर जल्दी सोया

● IBM 2 favours alignments near the diagonal, so it will have difficulty finding the correct alignment, since most alignment points lie near the anti-diagonal

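● The HMM only looks at the jump from the previous alignment: after a single large jump (+5), every remaining jump has the same small width (-1), so this alignment is cheap to model even though it runs against the diagonal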
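The contrast between the two models on this example can be made concrete by looking at the same alignment in two ways: as a histogram of jump widths (what the HMM's transition model sees) and as distances from the diagonal (a simple stand-in, assumed here, for IBM 2's position preference, with the diagonal taken at j·I/J). Both helper names are hypothetical.

```python
from collections import Counter
from typing import List

def jump_histogram(alignment: List[int]) -> Counter:
    """Count how often each jump width a_j - a_{j-1} occurs."""
    return Counter(cur - prev for prev, cur in zip(alignment, alignment[1:]))

def diagonal_offsets(alignment: List[int], I: int) -> List[float]:
    """Distance of each a_j from the diagonal position j * I / J (1-based j)."""
    J = len(alignment)
    return [abs(i - j * I / J) for j, i in enumerate(alignment, start=1)]

a = [1, 6, 5, 4, 3, 2]  # Example 2, with I = J = 6
print(jump_histogram(a))       # Counter({-1: 4, 5: 1})
print(diagonal_offsets(a, 6))  # [0.0, 4.0, 2.0, 0.0, 2.0, 4.0]
```

Seen as jumps, the alignment uses only two distinct widths, four of them identical; seen from the diagonal, several words sit far off it.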


Hin: पीटर बहुत ही जल्दी सोया

Eng: Peter slept very early
A: 1 3 ? 4 2

Intuitive Example: पीटर बहुत ही जल्दी सोया

● The HMM assumes that every source (Hindi) word has a corresponding target (English) word, but ही has none (marked ? in the alignment)

● Moreover, empty-word alignments are not incorporated in the basic HMM

● To model empty words, an HMM of order 2 is required


Hin: पीटर आज कल जल्दी सोता है

Eng: Peter sleeps early these days
A: 1 2,3 3 2 2

Intuitive Example: पीटर आज कल जल्दी सोता है

● सोता है ↔ sleeps can be handled by the HMM, since both Hindi words align to the same English word (a jump of 0)
● आज कल ↔ these days requires multi-word handling to avoid a word-by-word translation like “today tomorrow”

References

● HMM-Based Word Alignment in Statistical Translation (1996) by Stephan Vogel, Hermann Ney, Christoph Tillmann; COLING '96, Copenhagen

● The Mathematics of Statistical Machine Translation: Parameter Estimation (1993) by Peter Brown, Stephen Della Pietra, Vincent Della Pietra, Robert Mercer; Computational Linguistics
