CS460/IT632 Natural Language Processing/Language Technology for the Web Guest Lecture (31/03/06) Prof. Niladri Chatterjee IIT Delhi Guest Lecture on Machine Translation

Machine Translation


Page 1: Machine Translation

CS460/IT632 Natural Language Processing/Language Technology for the Web

Guest Lecture (31/03/06)

Prof. Niladri Chatterjee, IIT Delhi

Guest Lecture on Machine Translation

Page 2: Machine Translation

31/03/06 Prof. Pushpak Bhattacharyya, IIT Bombay


Machine Translation

Source Language -> Machine Translation System -> Target Language

Understanding

Page 3: Machine Translation


Problems in Machine Translation (MT)

1. I take rice with dal.
   I take rice with my friend.
   • Same syntax but different semantics

2. Polysemy

3. The computer prints data. It is fast.
   The computer prints data. It is numeric.
   • "It" refers to something different in each case: the computer in the first, the data in the second.

Page 4: Machine Translation


Problem with Multilingual MT systems

Suppose we have a multilingual MT system with N languages
• O(N^2) translators required

• Interlingua: an intermediate language that captures the semantics.
• The translation is: SL -> IL -> TL
• The number of translators required is 2N, i.e. O(N)
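The two counts can be checked with a short sketch. It assumes one translator per ordered (source, target) pair in the direct case, and one analyzer (SL -> IL) plus one generator (IL -> TL) per language in the interlingua case. The function names are illustrative only.

```python
def direct_translators(n: int) -> int:
    """Direct approach: one translator per ordered (source, target) pair."""
    return n * (n - 1)


def interlingua_translators(n: int) -> int:
    """Interlingua approach: one SL -> IL analyzer and one IL -> TL generator per language."""
    return 2 * n


# Compare the two counts as the number of languages grows.
for n in (3, 5, 10, 20):
    print(n, direct_translators(n), interlingua_translators(n))
```

Under these assumptions the two approaches cost the same at N = 3, and the interlingua design needs fewer translators for every larger N.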

Page 5: Machine Translation


Other Approaches for MT

• Word Based Approach
• Rule Based Approach
• Statistical Approach
• Generation-Heavy Approach
• Example Based Approach

Page 6: Machine Translation


Example Based Approach

• Knowledge base of translation examples.
• Given input, apply similarity metric to pick up a close match.
• Adapt the retrieved translation to suit the current requirement.

Page 7: Machine Translation


Example for English to Bengali translation using Example Based Approach

-Ram goes to school

Ram bidyalaya jaay

-Ram goes home

Ram bari jaay

-Sita goes to school

? (guess to get a feel)
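One way the retrieve-and-adapt idea could be sketched for this example is shown below. The example base is the two sentence pairs from the slide. The rest is assumed for illustration and is not the lecturer's system: the small dictionary (including the assumption that proper names carry over unchanged), the word-overlap similarity, and the restriction to word replacement.

```python
# Example base from the slide: (English, Bengali) pairs.
EXAMPLES = [
    ("Ram goes to school", "Ram bidyalaya jaay"),
    ("Ram goes home", "Ram bari jaay"),
]

# Hypothetical bilingual dictionary; assumes proper names are unchanged.
DICTIONARY = {"Ram": "Ram", "Sita": "Sita", "school": "bidyalaya", "home": "bari"}


def similarity(a: str, b: str) -> float:
    """Jaccard overlap between the word sets of two sentences (an assumed metric)."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)


def translate(sentence: str) -> str:
    # Retrieval: pick the stored example whose source side is closest to the input.
    source, target = max(EXAMPLES, key=lambda ex: similarity(sentence, ex[0]))
    src_words, new_words = source.split(), sentence.split()
    if len(src_words) != len(new_words):
        raise ValueError("this sketch only handles word replacement")
    # Adaptation: wherever the input differs from the example, swap the
    # corresponding target word using the dictionary.
    out_words = target.split()
    for old, new in zip(src_words, new_words):
        if old != new:
            out_words = [DICTIONARY[new] if w == DICTIONARY[old] else w for w in out_words]
    return " ".join(out_words)


print(translate("Sita goes to school"))
```

The closest stored example is "Ram goes to school", and the only adaptation needed is replacing the subject.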

Page 8: Machine Translation


Some considerations

1. Similarity measure

2. What are the adaptation strategies?

Page 9: Machine Translation


Typical Techniques used

• Word Deletion
  – Ram eats rice with spoon.
    • Ram chamach diye bhaat khaaye
  – Ram eats rice
    • ? (guess it, given that the dictionary gives "chamach" as the Bengali word for spoon)
• Word Addition
• Word Replacement
• Word Swapping
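Word deletion can be sketched as follows, using the stored pair from the slide. The word alignment is an assumption made for illustration: only "spoon" -> "chamach" comes from the slide's dictionary hint, and the remaining links (in particular "with" -> "diye") are guessed from the example pair.

```python
EXAMPLE_EN = "Ram eats rice with spoon"
EXAMPLE_BN = "Ram chamach diye bhaat khaaye"

# Hypothetical word alignment for the stored example.
# Only "spoon" -> "chamach" is given on the slide; the rest is assumed.
ALIGNMENT = {
    "Ram": ["Ram"],
    "eats": ["khaaye"],
    "rice": ["bhaat"],
    "with": ["diye"],
    "spoon": ["chamach"],
}


def adapt_by_deletion(new_sentence: str) -> str:
    """Adapt the stored translation by deleting words absent from the new input."""
    example_words = set(EXAMPLE_EN.split())
    kept = set(new_sentence.split())
    if not kept <= example_words:
        raise ValueError("this sketch only handles inputs that drop words from the example")
    # Source words present in the example but missing from the new input.
    extra = example_words - kept
    # Their aligned target words are removed from the stored translation.
    drop = {bn for en in extra for bn in ALIGNMENT[en]}
    return " ".join(w for w in EXAMPLE_BN.split() if w not in drop)


print(adapt_by_deletion("Ram eats rice"))
```

The other three techniques (addition, replacement, swapping) would need a dictionary or reordering rule in addition to the alignment, so they are not covered by this sketch.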

Page 10: Machine Translation


A simple assumption

“Sentences of similar structure in the source language have a similar structure in the target language.”

Page 11: Machine Translation


Problems with the assumption..

• Translation Divergence
  – It is running
    • Wah bhaag raha hai
  – It is raining
    • Baarish ho rahi hai

• Structural Divergence
  – Ram will attend the meeting
    • Ram sabha mein jayegaa
  – Ram will go to school
    • Ram school jayegaa

Page 12: Machine Translation

Problems.. (contd.)

• Promotional Divergence
  – The fan is on [adverb]
    • Pankha chal [verb] raha hai
  – The fan is good [adjective]
    • Pankha achcha [adjective] hai

• Conflational Divergence (conflate: to make bigger)
  – To get the same meaning we have to add more words than in SL.
  – Ram killed Ravana
    • Ram ne Ravan ko mara => No divergence
  – Ram stabbed Ravana
    • Ram ne Ravan ko chaku se mara => divergence


Page 13: Machine Translation

Problems.. (contd.)

• Categorical Divergence
  – She is hungry
    • Use bhookh lagi hai
  – She is beautiful
    • Wah sundar hai

• Divergence occurs in approx. 12% of sentences.


Page 14: Machine Translation

Solution to Divergence

• Classify as standard or divergence translation
  – Measure the similarity of a sentence in two databases.

• Example
  – She is in panic
  – She is in trouble
  – She is in pain

  – Present all the solutions to the user.
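A minimal sketch of the two-database idea follows. It assumes one example base of normally translated sentences and one of divergent ones, both filled here with English sentences taken from the earlier slides, and a simple word-overlap similarity. The database contents, the metric, and the tie handling are illustrative assumptions, not the method described in the lecture.

```python
# Assumed example bases, built from sentences on the earlier slides.
NORMAL = ["It is running", "She is beautiful", "The fan is good", "Ram killed Ravana"]
DIVERGENT = ["It is raining", "She is hungry", "The fan is on", "Ram stabbed Ravana"]


def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two sentences (an assumed similarity measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)


def best_score(sentence: str, database: list[str]) -> float:
    """Similarity of the sentence to its closest match in one database."""
    return max(jaccard(sentence, example) for example in database)


def classify(sentence: str) -> list[str]:
    """Return the likely translation type(s); both are returned on a tie."""
    normal = best_score(sentence, NORMAL)
    divergent = best_score(sentence, DIVERGENT)
    if normal > divergent:
        return ["standard"]
    if divergent > normal:
        return ["divergence"]
    return ["standard", "divergence"]


for s in ("The fan is on", "She is in pain"):
    print(s, "->", classify(s))
```

With these assumed databases, "She is in pain" is equally close to "She is beautiful" and "She is hungry", so surface similarity alone cannot decide. That is the kind of case where presenting all candidate solutions to the user is a reasonable fallback.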


Page 15: Machine Translation

Adaptation Problem

• There is more morphological variation in Hindi than in English


Page 16: Machine Translation

Divergence Identification

• 7 types of divergence between Hindi and English are defined
  – Based on 7K-8K sentences


Page 17: Machine Translation

Word Sense Disambiguation

• I saw the man with a binocular
  – Keep the ambiguity even in the translation
