This presentation is a part of the MosesCore project that encourages the development and usage of open source machine translation tools, notably the Moses statistical MT toolkit. MosesCore is supported by the European Commission Grant Number 288487 under the 7th Framework Programme. For the latest updates, follow us on Twitter - #MosesCore
TAUS MACHINE TRANSLATION SHOWCASE
Moses in the Mix: A Technology Agnostic Approach to a Winning MT Strategy
10:50 – 11:10, Wednesday, 12 June 2013
Lori Thicke, LexWorks
Moses in the Mix: A Technology Agnostic Approach to the Winning MT Strategy
What is LexWorks?
• McKinsey’s definition of the T-shaped company
• Language Services Provider (Lexcelera, founded 1986; managing translators & post-editors)
• MT Services Provider (training engines, post-editing, etc.)
• Technology Agnostic
• Developing new technologies to help MT work better with community content
Other Technology Agnostics
“A good MT strategy should be technology-agnostic and look for the most efficient solution on a case-by-case basis. The type of technology that best suits your needs will change depending on the language pair.”
All approaches - SMT, RBMT, Hybrid - are good when matched to the course
The process aims to define best of breed solutions for superior performance
MT is not a tool. MT is an industrial process.
1. Best of breed means raw MT that is perfectly understandable
Sentences             | MS Translator (%) | Systran Hybrid (%)
Not understandable    | 15.65             | 20.87
Partly understandable | 20.00             | 34.78
Fully understandable  | 64.35             | 44.35
Raw MT for FAQs and Forum Content
Score                        | MS Translator | Systran Hybrid
Average score on FAQ article | 2.6           | 2.4
Average score on forum       | 2.31          | 1.97
Overall score                | 2.48          | 2.23
2. Best of breed means managing post-editing costs
3. Best of breed means retaining your post-editors
4. Best of breed means clear metrics
Translation engine   | Engine type | BLEU score | GTM score (SymEval)
Systran              | Hybrid      | 69.74      | 72.69
Moses                | Statistical | 50.46      | 57.93
Microsoft Translator | Statistical | 54.01      | 60.81
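As a rough illustration of how clear metrics can drive a technology-agnostic choice, the sketch below ranks the three engines by the BLEU and GTM figures above. The scores are copied from the slide; the function names and the selection rule (pick an engine only if it leads on every metric) are assumptions made for this example, not a method described in the presentation.

```python
# Scores copied from the slide; everything else here is a hypothetical sketch.
SCORES = {
    "Systran": {"type": "Hybrid", "bleu": 69.74, "gtm": 72.69},
    "Moses": {"type": "Statistical", "bleu": 50.46, "gtm": 57.93},
    "Microsoft Translator": {"type": "Statistical", "bleu": 54.01, "gtm": 60.81},
}


def rank_engines(scores, metric):
    """Return engine names sorted from best to worst on one metric."""
    return sorted(scores, key=lambda name: scores[name][metric], reverse=True)


def best_engine(scores, metrics=("bleu", "gtm")):
    """Return the engine that leads on every metric, or None if the metrics disagree."""
    winners = {rank_engines(scores, metric)[0] for metric in metrics}
    return winners.pop() if len(winners) == 1 else None


if __name__ == "__main__":
    for metric in ("bleu", "gtm"):
        print(metric, rank_engines(SCORES, metric))
    print("best:", best_engine(SCORES))
```

In practice the same comparison would be repeated for each language pair and content type, since the winner on one test set need not win on another.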
Area       | Feature                                                              | RBMT | SMT
Capability | Add rare language pairs                                              |      |
Capability | Number of languages it can handle out of the box                     | 20   | 50
Cost       | Free or Open Source version exists                                   |      |
Quality    | Respects grammatical rules                                           |      |
Quality    | Handles software tags properly                                       |      |
Quality    | Output is fluent                                                     |      |
Quality    | Can handle bad grammar                                               |      |
Quality    | Quality improves with Controlled Authoring                           |      |
Quality    | Output is predictable                                                |      |
Quality    | Retains corrections to terminology (and applies the correct grammar) |      |
Area        | Feature                                                                            | RBMT | SMT
Suitability | Is better for User Generated Content and broad domain material such as patents     |      |
Suitability | Is better suited to on-the-fly translations of short shelf-life content            |      |
Suitability | Is better for documentation and even software                                      |      |
Suitability | Is suited for rare language pairs                                                  |      |
Suitability | Is better suited to post-editing                                                   |      |
Training    | Learns automatically                                                               |      |
Training    | Rapid development customization cycle                                              |      |
Training    | Effective with limited training corpus                                             |      |
Languages                 | Online | Hybrid | RBMT | SMT
French, Spanish           |        |        |      |
Russian, Japanese, German |        |        |      |
Norwegian, Danish, Thai   |        |        |      |
Content Type & Other Considerations                   | Online | Hybrid | RBMT | SMT
Documentation, reports, online help, UI               |        |        |      |
FAQs, forums, UGC                                     |        |        |      |
Patents, other broad domain                           |        |        |      |
Marketing materials                                   |        |        |      |
Insufficient in-domain/out-of-domain data ('I', 'me') |        |        |      |
Poor grammar, spelling                                |        |        |      |
Choose the horse that will win on your course