Introduction in Machine Translation
Maria Sukhareva, Christian Chiarcos
Goethe University Frankfurt
April 15, 2015
Maria Sukhareva, Christian Chiarcos (Goethe University)Introduction in Machine Translation April 15, 2015 1 / 23
Overview
1 Motivation
2 Statistical Machine Translation
3 Organisation of the Seminar
What do we want to translate?
Most translated       Title                                  Author                     Languages
Book                  Bible                                  -                          2883
Non-fictional book    Universal Declaration of Human Rights  UN                         440
Fictional book        The Little Prince                      Antoine de Saint-Exupéry   253
Website               www.jw.org                             Jehovah’s Witnesses        440
(a) Most translated literary works
Language   %
Mandarin   14.4
Spanish    6.15
English    5.43
Hindi      4.7
Arabic     4.43
(b) Most spoken languages
But! Literature of low literary status is most frequently translated: scientific and technical documents, commercial and business transactions, administrative memoranda, legal documentation, instruction manuals, agricultural and medical textbooks, industrial patents, publicity leaflets, newspaper reports, etc.
Machine Translation Use Cases
1 MT for information assimilation
2 MT for information dissemination
3 MT for communication
Machine Translation for assimilation
Definition
The class of translation in which an individual or organization wants to gather material written by others in a variety of languages and convert it all into his or her own language.
Requirements:
1 Fast translation of large volumes of data
2 Support of multiple foreign languages
3 Quantity over quality
4 Domain independence
Examples of MT for assimilation
1 Online MT systems are frequently used for assimilation (Google Translate, Bing Translator, PROMT etc.)
2 The user has little control over the input
3 The translation quality is usually not publishable
(Happy Birthday! I have not seen you for a while. How is your husband?)
Machine Translation for dissemination
Definition
The class in which an individual or organization wants to broadcast his or her own material, written in one language, in a variety of languages to the world.
Requirements:
1 Quality over quantity
2 Publishable output
3 No need for fast translation
Examples of MT for dissemination
1 Commercial MT Systems
2 The output of MT must be revised
3 Human-aided MT Systems
Dangers of MT dissemination
Machine Translation for communication
Definition
The class in which two or more individuals are in more or less immediate interaction, typically via email or otherwise online, with an MT system mediating between them.
Requirements:
1 Fast and robust translation
2 No need to translate large volumes
3 Robust towards mistakes in the input
Examples of MT for communication
1 Speech translation, e.g. various apps for voice translation
2 Translation of emails, chats, SMS etc.
3 VoxOx, JANUS, Jibbigo
Prof. Alex Waibel, LREC Antonio Zampolli Prize Talk: https://www.youtube.com/watch?v=g1uHRFPhnMA
History of MT
1 The Georgetown-IBM experiment claimed to have solved the MT problem
2 The Russian-English translator running on the IBM 701 used only 6 grammar rules for a vocabulary of 250 items
3 After no further progress was reported, MT research "fell into deep sleep"...
"there is no immediate or predictable prospect of useful machine translation"
— ALPAC Report, 1966
Figure: R. Reagan and H. Grosch at an IBM 701 in 1954
History: MT after ALPAC Report
1 Most of the research took place in Europe and Canada
2 Transfer and interlingua systems:
TAUM-METEO, TAUM-AVIATION (Montreal), SUSY (Saar), ARIANE-78 (Grenoble), METAL (Austin) etc.
Figure: Transfer-based and interlingual systems
Interlingual systems demand:
1 Dictionaries for the target language (TL) and the source language (SL)
2 Grammar rules for parsing and generation
3 Transition rules
4 Conceptual lexicon
Motivation for Statistical Machine Translation
1 It does not demand expensive manual labour to build rules for the various linguistic layers of representation
2 Large parallel corpora (EuroParl, OPUS, EAPCOUNT [1], Parallel Bible Corpus, etc.) are available and constantly growing
3 Statistical approaches have proved successful in multiple areas of NLP
4 Hybrid systems that build on SMT and integrate rule-based modules are possible
5 Dynamic learning from user feedback
[1] The English-Arabic Parallel Corpus of United Nations Texts
Statistical Machine Translation
1 Probabilistic view of MT: E is the target language and F the source language (conventionally English and French)
2 Find the most likely target sentence $e$ for a source sentence $f$: $\hat{e} = \arg\max_e P(e \mid f)$
3 By Bayes' rule, and because $P(f)$ is constant for a given $f$: $\hat{e} = \arg\max_e P(f \mid e)\,P(e)$

$\hat{e} = \arg\max_e P(f \mid e)\,P(e)$
1 Translation model $P(f \mid e)$: how likely the source sentence $f$ is as a translation of a candidate target sentence $e$
2 Language model $P(e)$: how likely it is to observe $e$
3 Decoding: the argmax operation, which searches the space of possible target translations
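The noisy-channel decision rule above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the source sentence, the three candidate translations and all probability values are invented, and a real decoder searches a far larger space rather than scoring a fixed list.

```python
import math

# Hypothetical scores for one source sentence f = "la maison bleue".
# All probabilities are invented for illustration only.
translation_model = {  # P(f | e) for each candidate target sentence e
    "the blue house": 0.20,
    "the house blue": 0.25,
    "blue the house": 0.25,
}
language_model = {  # P(e): fluency of each candidate
    "the blue house": 0.010,
    "the house blue": 0.001,
    "blue the house": 0.0001,
}


def decode(candidates):
    """Return the candidate e that maximises log P(f|e) + log P(e)."""
    return max(
        candidates,
        key=lambda e: math.log(translation_model[e]) + math.log(language_model[e]),
    )


best = decode(list(translation_model))
print(best)
```

In this toy setting the translation model alone slightly prefers the badly ordered candidates, and the language model is what pushes the fluent word order to the top.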
Word-based Machine Translation
Motivation
1 Context-independent approach
2 Word-based IBM Models
3 IBM Model 1: only lexical translation probabilities
4 IBM Model 1 example: on the blackboard
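Translation model: $P(f \mid e)$
Lexical translation probability: $t(\mathrm{WORD}_e \mid \mathrm{WORD}_f) = \dfrac{c(\mathrm{WORD}_e \mid \mathrm{WORD}_f)}{\mathrm{total}(\mathrm{WORD}_f)}$, i.e. the relative count $c$ divided by the total count $\mathrm{total}(f)$
IBM 1: find the alignment for which the product of $t(e \mid f)$ is at its maximum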
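As a stand-in for the blackboard example, here is a minimal sketch of how the lexical translation probabilities t(e|f) could be estimated with expectation maximisation, and how the highest-scoring alignment can then be read off. It is not taken from the slides: the function names `train_ibm1` and `best_alignment`, the three-sentence corpus and the omission of the NULL word are all assumptions made to keep the example short.

```python
from collections import defaultdict


def train_ibm1(corpus, iterations=10):
    """Estimate t(e|f) with EM in the style of IBM Model 1 (no NULL word).

    corpus: list of (e_words, f_words) pairs, each a list of tokens.
    Returns a dict mapping (e_word, f_word) to t(e_word | f_word).
    """
    e_vocab = {e for e_words, _ in corpus for e in e_words}
    uniform = 1.0 / len(e_vocab)
    # Start from a uniform value for every co-occurring word pair.
    t = {(e, f): uniform
         for e_words, f_words in corpus for e in e_words for f in f_words}

    for _ in range(iterations):
        count = defaultdict(float)  # expected relative count c(e|f)
        total = defaultdict(float)  # total count total(f)
        # E-step: distribute each target word over the source words.
        for e_words, f_words in corpus:
            for e in e_words:
                norm = sum(t[(e, f)] for f in f_words)
                for f in f_words:
                    c = t[(e, f)] / norm
                    count[(e, f)] += c
                    total[f] += c
        # M-step: t(e|f) = c(e|f) / total(f)
        for (e, f), c in count.items():
            t[(e, f)] = c / total[f]
    return t


def best_alignment(e_words, f_words, t):
    """For each target word, pick the source position with the highest t(e|f).

    In Model 1 the product of t(e|f) factorises over target words,
    so the per-word argmax maximises the whole product.
    """
    return [
        max(range(len(f_words)), key=lambda j: t.get((e, f_words[j]), 0.0))
        for e in e_words
    ]


# Invented toy corpus (English target, German source).
corpus = [
    (["the", "house"], ["das", "Haus"]),
    (["the", "book"], ["das", "Buch"]),
    (["a", "book"], ["ein", "Buch"]),
]
t = train_ibm1(corpus)
alignment = best_alignment(["the", "house"], ["das", "Haus"], t)
print(alignment)
```

Although every word pair starts with the same probability, co-occurrence across sentence pairs is enough for EM to concentrate the probability mass on the intuitive pairs.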
Phrase-based Machine Translation
Motivation
1 Phrases: n-grams
2 Context resolves ambiguity
3 Collocations
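Translation model:
$P(f \mid e) = \prod_{i=1}^{I} \phi(f_i \mid e_i)\, d(\mathit{start}_i - \mathit{end}_{i-1} - 1)$
where $I$ is the number of phrase pairs
Needs word alignment
Relative frequency:
$\phi(f_i \mid e_i) = \dfrac{\mathrm{count}(f_i, e_i)}{\sum_{f} \mathrm{count}(f, e_i)}$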
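The relative-frequency estimate of the phrase translation probability can be sketched in a few lines of Python. The list of extracted phrase pairs is invented and the helper name `phrase_table` is hypothetical. In practice the pairs would come from phrase extraction over a word-aligned corpus.

```python
from collections import Counter

# Hypothetical phrase pairs (foreign phrase, English phrase), as they might
# come out of phrase extraction; the data is invented for illustration.
phrase_pairs = [
    ("das Haus", "the house"),
    ("das Haus", "the house"),
    ("dem Haus", "the house"),
    ("das Buch", "the book"),
]


def phrase_table(pairs):
    """Compute phi(f|e) = count(f, e) / sum over f' of count(f', e)."""
    pair_counts = Counter(pairs)
    e_totals = Counter(e for _, e in pairs)  # the denominator for each e
    return {(f, e): c / e_totals[e] for (f, e), c in pair_counts.items()}


phi = phrase_table(phrase_pairs)
for (f, e), p in sorted(phi.items()):
    print(f"phi({f} | {e}) = {p:.3f}")
```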
Distance-Based Reordering
$\mathit{start}_i - \mathit{end}_{i-1} - 1$: $\mathit{start}_i$ is the starting position of the foreign phrase that translates to the $i$-th English phrase; $\mathit{end}_i$ is the ending position of the foreign phrase that translates to the $i$-th English phrase
Reordering probability $d$: $d(x) = \alpha^{|x|}$ for $\alpha \in [0, 1]$, decaying exponentially with distance
d is not learnt from the data
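A short sketch of the distance-based reordering score, under a few assumptions that are not stated on the slide: positions are 1-based and inclusive, the end position before the first phrase is taken to be 0, and alpha = 0.5 is an arbitrary choice. The function names are hypothetical.

```python
def reordering_score(start_i, end_prev, alpha=0.5):
    """d(x) = alpha ** |x| with x = start_i - end_{i-1} - 1."""
    x = start_i - end_prev - 1
    return alpha ** abs(x)


def total_reordering(spans, alpha=0.5):
    """Multiply d over all phrases.

    spans: (start, end) positions of the foreign phrases, listed in the
    order of the English phrases they translate to.
    """
    score = 1.0
    end_prev = 0  # assumed convention: end_0 = 0
    for start, end in spans:
        score *= reordering_score(start, end_prev, alpha)
        end_prev = end
    return score


# Monotone translation: every distance is 0, so the score is 1.
monotone = total_reordering([(1, 3), (4, 5), (6, 6), (7, 7)])
# Reordered translation: the distances are 0, +2, -3 and +1.
reordered = total_reordering([(1, 3), (6, 6), (4, 5), (7, 7)])
print(monotone, reordered)
```

Because alpha is fixed by hand rather than estimated, the model penalises any jump by the same rule whatever the language pair.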
Grading
1 Work engagement 33.3%
2 Presentation 33.3%
3 Technical report 33.3%
Work engagement
1 Two questions for each paper sent to my email
2 Or questions during the seminar
3 Absence: twice without a medical certificate; notifying us by email is mandatory
Presentation
1 A paper of your choice: 30 min presentation, 10 min discussion
2 Two papers per session
3 Papers will be assigned at the next session
Technical report
1 A practical experiment with machine translation or text normalisation tools
2 Training, testing and evaluation
3 Groups of two or three students
4 No more than 3 pages.
Topics
1 IBM Models
2 Word alignment
3 Language models (KenLM, SRILM, IRSTLM)
4 Phrase-based MT (Moses Toolkit)
5 Character-based MT
6 SMT Evaluation
7 SMT for under-resourced languages
Contacts
Maria Sukhareva
Doctoral [email protected]
Prof. Dr. Christian Chiarcos
chiarcos@informatik.uni-frankfurt.de
The End