CS460/IT632 Natural Language Processing/Language Technology for the Web
Guest Lecture (31/03/06)
Prof. Niladri Chatterjee, IIT Delhi
Guest Lecture on Machine Translation
31/03/06 Prof. Pushpak Bhattacharyya, IIT Bombay
Machine Translation
Source Language -> Machine Translation System -> Target Language
Understanding
Problems in Machine Translation (MT)
1. I take rice with dal.
   I take rice with my friend.
   • Same syntax but different semantics
2. Polysemy
3. The computer prints data. It is fast.
   The computer prints data. It is numeric.
   • “It” refers to a different noun in each case.
Problem with Multilingual MT systems
Suppose we have a multilingual MT system with N languages
• O(N^2) translators required
• Interlingua: an intermediate language that captures the semantics.
• The translation is: SL -> IL -> TL
• The number of MT translators required is 2N, i.e. O(N)
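The two growth rates above can be checked with a few lines of arithmetic. This is a minimal sketch: the exact count N(N-1) for direct translation assumes one translator per ordered language pair, which the slide only gives as O(N^2), and the function names are made up for this illustration.

```python
def direct_translators(n: int) -> int:
    """Translators needed when every ordered (source, target) pair is handled directly.

    Assumption: one translator per direction, hence n * (n - 1).
    """
    return n * (n - 1)


def interlingua_translators(n: int) -> int:
    """One analyser (SL -> IL) plus one generator (IL -> TL) per language."""
    return 2 * n


for n in (3, 5, 10, 20):
    print(f"N={n}: direct={direct_translators(n)}, interlingua={interlingua_translators(n)}")
```

Under this counting the two designs tie at N = 3, and the interlingua needs fewer translators for every larger N.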
Other Approaches for MT
• Word Based Approach
• Rule Based Approach
• Statistical Approach
• Generation-Heavy Approach
• Example Based Approach
Example Based Approach
• Knowledge base of translation examples.
• Given input, apply a similarity metric to pick up a close match.
• Adapt the retrieved translation to suit the current requirement.
Example for English to Bengali translation using Example Based Approach
- Ram goes to school
  Ram bidyalaya jaay
- Ram goes home
  Ram bari jaay
- Sita goes to school
  ? (guess to get a feel)
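The retrieve-then-adapt idea can be tried on these examples. This is only a sketch under stated assumptions: the similarity metric (word-set overlap) and the function names are chosen for illustration and are not given in the lecture, and the adaptation step assumes a proper noun is written the same way in both languages.

```python
# Example base taken from the slide: English sentence -> Bengali translation.
EXAMPLES = {
    "Ram goes to school": "Ram bidyalaya jaay",
    "Ram goes home": "Ram bari jaay",
}


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of word sets (an assumed, deliberately simple metric)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)


def retrieve(sentence: str) -> tuple[str, str]:
    """Return the (source, target) example whose source is closest to the input."""
    return max(EXAMPLES.items(), key=lambda pair: similarity(sentence, pair[0]))


def adapt_by_replacement(sentence: str, source: str, target: str) -> str:
    """Word replacement: swap the single differing word into the target.

    Assumes the differing word (a proper noun here) is spelled identically
    in the source and target languages.
    """
    new_words, old_words = sentence.split(), source.split()
    if len(new_words) != len(old_words):
        raise ValueError("sketch only handles sentences of equal length")
    diffs = [(o, n) for o, n in zip(old_words, new_words) if o != n]
    if not diffs:
        return target  # exact match: reuse the stored translation
    if len(diffs) != 1:
        raise ValueError("sketch only handles a single differing word")
    old, new = diffs[0]
    return " ".join(new if w == old else w for w in target.split())


source, target = retrieve("Sita goes to school")
print(source, "->", target)
print(adapt_by_replacement("Sita goes to school", source, target))
```

With these two examples the closest match is "Ram goes to school", and replacing the name gives one plausible candidate translation. A real system would need a bilingual dictionary for words that are not shared across the two languages.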
Some considerations
1. Similarity measure
2. What are the adaptation strategies?
Typical Techniques used
• Word Deletion
  • Ram eats rice with spoon.
    • Ram chamach diye bhaat khaaye
  • Ram eats rice
    • ? (guess it, given that from the dictionary the Bengali word for spoon is “chamach”)
• Word Addition
• Word Replacement
• Word Swapping
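Word deletion can be sketched on the spoon example. The dictionary entry "spoon" -> "chamach" comes from the slide; the entry "with" -> "diye" is an assumption added for this illustration, and the function name is hypothetical.

```python
# Retrieved example from the slide.
SOURCE = "Ram eats rice with spoon"
TARGET = "Ram chamach diye bhaat khaaye"

# Word-level dictionary. "spoon" -> "chamach" is given on the slide;
# "with" -> "diye" is an assumption made for this sketch.
DICTIONARY = {"spoon": "chamach", "with": "diye"}


def adapt_by_deletion(sentence: str, source: str, target: str) -> str:
    """Remove from the target the translations of source words absent from the input."""
    kept = set(sentence.lower().split())
    dropped = [w for w in source.lower().split() if w not in kept]
    to_delete = {DICTIONARY[w] for w in dropped if w in DICTIONARY}
    return " ".join(w for w in target.split() if w not in to_delete)


print(adapt_by_deletion("Ram eats rice", SOURCE, TARGET))
```

The sketch only deletes words it can look up, so a source word missing from the dictionary leaves its translation in place. Word addition, replacement and swapping would need analogous, position-aware rules.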
A simple assumption
“Sentences of similar structure in the source language have a similar structure in the target language.”
Problems with the assumption
• Translation Divergence
  – It is running
    • Wah bhaag raha hai
  – It is raining
    • Baarish ho rahi hai
• Structural Divergence
  – Ram will attend the meeting
    • Ram sabha mein jayegaa
  – Ram will go to school
    • Ram school jayegaa
Problems (contd.)
• Promotional Divergence
  – The fan is on [adverb]
    • Pankha chal [verb] raha hai
  – The fan is good [adjective]
    • Pankha achcha [adjective] hai
• Conflational Divergence (conflate: to fuse several meanings into one word)
  – To get the same meaning, the TL needs more words than the SL.
    • Ram killed Ravana
      – Ram ne Ravan ko mara => No divergence
    • Ram stabbed Ravana
      – Ram ne Ravan ko chaku se mara => divergence
Problems (contd.)
• Categorical Divergence
  – She is hungry
    • Use bhookh lagi hai
  – She is beautiful
    • Wah sundar hai
• Divergence occurs in approx. 12% of sentences.
Solution to Divergence
• Classify as standard or divergence translation
  – Measure the similarity of a sentence in two databases.
• Example
  • She is in panic
  • She is in trouble
  • She is in pain
  – Present all the solutions to the user.
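One way to read the two-database idea is sketched below. The two example bases reuse sentence pairs from the earlier divergence slides. The similarity metric, the `margin` threshold, the "undecided" outcome and all function names are assumptions made for this illustration, not the method described in the lecture.

```python
# Both example bases reuse sentence pairs from the divergence slides.
STANDARD = {
    "It is running": "Wah bhaag raha hai",
    "She is beautiful": "Wah sundar hai",
    "The fan is good": "Pankha achcha hai",
}
DIVERGENT = {
    "It is raining": "Baarish ho rahi hai",
    "She is hungry": "Use bhookh lagi hai",
    "The fan is on": "Pankha chal raha hai",
}


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of word sets (an assumed, deliberately simple metric)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)


def best_score(sentence: str, examples: dict[str, str]) -> float:
    """Highest similarity between the input and any source sentence in one database."""
    return max(similarity(sentence, src) for src in examples)


def classify(sentence: str, margin: float = 0.1) -> str:
    """Compare the best match in each database; stay undecided when scores are close."""
    std = best_score(sentence, STANDARD)
    div = best_score(sentence, DIVERGENT)
    if abs(std - div) < margin:
        return "undecided"  # e.g. offer both candidate translations to the user
    return "standard" if std > div else "divergence"


for s in ("The fan is on", "It is running", "She is in pain"):
    print(s, "->", classify(s))
```

A surface word-overlap metric cannot separate "She is in pain" from either database, which is one possible motivation for presenting every candidate to the user in such cases.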
Adaptation Problem
• There is more morphological variation in Hindi than in English
Divergence Identification
• 7 types of divergence between Hindi and English are defined
  – Based on 7K-8K sentences
Word Sense Disambiguation
• I saw the man with a binocular
  – Keep the ambiguity even in the translation