Neural Network Language Models for Candidate Scoring in Multi-System Machine Translation


Matīss Rikters
University of Latvia

COLING 2016, 6th Workshop on Hybrid Approaches to Translation
Osaka, Japan
December 11, 2016

Contents

1. Introduction
2. Baseline System
3. Example Sentence
4. Neural Network Language Models
5. Results
6. Related publications
7. Future plans

Chunking
– Parse sentences with the Berkeley Parser (Petrov et al., 2006)
– Traverse the syntax tree bottom up, from right to left
– Add a word to the current chunk if:
  • the current chunk is not too long (sentence word count / 4)
  • the word is non-alphabetic or only one symbol long
  • the word begins a genitive phrase (« of »)
– Otherwise, initialize a new chunk with the word
– If chunking results in too many chunks, repeat the process, allowing more (than sentence word count / 4) words in a chunk
  (a rough code sketch of this heuristic follows after this list)

Translation with online MT systems
– Google Translate; Bing Translator; Yandex.Translate; Hugo.lv

12-gram language model
– DGT-Translation Memory corpus (Steinberger, 2011): 3.1 million Latvian legal domain sentences
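A minimal Python sketch of the chunking heuristic above, assuming the parse tree has already been flattened to a token list. How exactly the three attachment conditions combine is an assumption here, and all names are illustrative.

```python
def chunk_sentence(tokens):
    """Greedy approximation of the slide's chunking heuristic.

    Walks the tokens from right to left, mimicking the bottom-up,
    right-to-left traversal of the syntax tree, and either attaches
    each word to the current chunk or starts a new one.
    """
    max_len = max(1, len(tokens) // 4)  # "sentence word count / 4"
    chunks, current = [], []

    for word in reversed(tokens):
        attach = bool(current) and (
            len(current) < max_len           # current chunk is not too long
            or not word.isalpha()            # non-alphabetic token
            or len(word) == 1                # only one symbol long
            or word.lower() == "of"          # begins a genitive phrase
        )
        if attach:
            current.insert(0, word)
        else:
            if current:
                chunks.insert(0, current)
            current = [word]
    if current:
        chunks.insert(0, current)

    # The slide's "repeat with a larger limit if there are too many chunks"
    # pass is omitted in this sketch.
    return [" ".join(c) for c in chunks]


# Example: the sentence from the "Example sentence" slides.
sentence = ("Recently there has been an increased interest in the automated "
            "discovery of equivalent expressions in different languages .").split()
print(chunk_sentence(sentence))
```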

Baseline System

Workflow: sentence tokenization → syntactic analysis → sentence chunking → translation with online MT APIs (Google Translate, Bing Translator, LetsMT) → selection of the best chunks → sentence recomposition → translation output

Baseline System

Sentence Chunking: Choose the best candidate

KenLM (Heafield, 2011) calculates probabilities based on the observed entry with the longest matching history $w_f^n$:

    $P(w_n \mid w_1^{n-1}) = P(w_n \mid w_f^{n-1}) \prod_{i=1}^{f-1} b(w_i^{n-1})$

where the probabilities $P$ and backoff penalties $b$ are given by an already-estimated language model. Perplexity is then calculated using this probability: given an unknown probability distribution $p$ and a proposed probability model $q$, the model is evaluated by determining how well it predicts a separate test sample $x_1, x_2, \ldots, x_N$ drawn from $p$:

    $PP(q) = 2^{-\frac{1}{N} \sum_{i=1}^{N} \log_2 q(x_i)}$

The candidate chunk translation with the lowest perplexity is selected; a small selection sketch follows below.
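As an illustration of this selection step, here is a minimal sketch using the kenlm Python bindings; the model file name and the candidate strings are placeholders, not the actual data or code behind the reported experiments.

```python
import kenlm  # Python bindings for KenLM (Heafield, 2011)

# Placeholder model path; the slides use a 12-gram model trained on DGT-TM.
lm = kenlm.Model("dgt-legal-lv.arpa")

def best_candidate(candidates):
    """Return the candidate chunk translation with the lowest perplexity."""
    return min(candidates, key=lm.perplexity)

# Hypothetical chunk translations returned by the different online MT APIs.
candidates = [
    "chunk translation from system A",
    "chunk translation from system B",
    "chunk translation from system C",
]
print(best_candidate(candidates))
```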

Example sentence

Recently there has been an increased interest
in the automated discovery
of equivalent expressions in different languages .

Neural Language Models

• RWTHLM
  • CPU only
  • Feed-forward, recurrent (RNN) and long short-term memory (LSTM) NNs
• MemN2N
  • CPU or GPU
  • End-to-end memory network (RNN with attention)
• Char-RNN
  • CPU or GPU
  • RNNs, LSTMs and gated recurrent units (GRU)
  • Character level

Best Models

• RWTHLM
  • one feed-forward input layer with a 3-word history, followed by one linear layer of 200 neurons with a sigmoid activation function
• MemN2N
  • internal state dimension of 150, linear part of the state 75, number of hops set to six
• Char-RNN
  • 2 LSTM layers with 1,024 neurons each, dropout set to 0.5 (a configuration sketch follows below)
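For concreteness, the Char-RNN settings listed above could be written roughly as follows. This is a PyTorch sketch for illustration only (the actual char-rnn toolkit is implemented in Torch/Lua); vocab_size stands for the size of the character vocabulary.

```python
import torch.nn as nn

class CharLM(nn.Module):
    """Character-level LSTM language model mirroring the slide's settings:
    2 LSTM layers with 1,024 units each and dropout 0.5."""

    def __init__(self, vocab_size, hidden_size=1024, num_layers=2, dropout=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.lstm = nn.LSTM(hidden_size, hidden_size, num_layers=num_layers,
                            dropout=dropout, batch_first=True)
        self.decoder = nn.Linear(hidden_size, vocab_size)

    def forward(self, char_ids, state=None):
        # char_ids: (batch, seq_len) integer character indices
        output, state = self.lstm(self.embed(char_ids), state)
        return self.decoder(output), state  # logits over the next character
```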

Char-RNN

• A character-level model works better for highly inflected languages when less training data is available

• Requires Torch scientific computing framework + additional packages

• Can run on CPU, NVIDIA GPU or AMD GPU

• Intended for generating new text; modified here to score new text (a scoring sketch follows below)

More in Andrej Karpathy’s blog
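A minimal sketch of the scoring side, assuming the character-level model (however it is implemented) already yields a log-probability for each character of a candidate translation; the function name is illustrative.

```python
import math

def char_perplexity(char_log_probs):
    """Perplexity of a candidate string, given per-character natural-log
    probabilities assigned by a character-level language model."""
    n = len(char_log_probs)
    return math.exp(-sum(char_log_probs) / n)
```

The chunk translation with the lowest value is preferred, mirroring the KenLM-based selection in the baseline system.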

Experiment Environment

Training
• Baseline KenLM and RWTHLM models
  • 8-core CPU with 16GB of RAM
• MemN2N
  • GeForce Titan X (12GB, 3,072 CUDA cores), 12-core CPU and 64GB RAM
• Char-RNN
  • Radeon HD 7950 (3GB, 1,792 cores), 8-core CPU and 16GB RAM

Translation
• All models
  • 4-core CPU with 16GB of RAM

Results

System     Perplexity   Training Corpus (sentences)   Trained On   Training Time   BLEU
KenLM           34.67              3.1M                  CPU          1 hour       19.23
RWTHLM         136.47              3.1M                  CPU          7 days       18.78
MemN2N          25.77              3.1M                  GPU          4 days       18.81
Char-RNN        24.46              1.5M                  GPU          2 days       19.53

[Chart: General domain. Perplexity and BLEU (BLEU-HY and BLEU-BG, with linear trend lines) plotted against training epoch.]

[Chart: Legal domain. Perplexity and BLEU (BLEU-BG and BLEU-HY, with linear trend lines) plotted against training epoch.]

Related publications

• Matīss Rikters. "Multi-system machine translation using online APIs for English-Latvian." ACL-IJCNLP 2015, 4th HyTra Workshop.
• Matīss Rikters and Inguna Skadiņa. "Syntax-based multi-system machine translation." LREC 2016.
• Matīss Rikters and Inguna Skadiņa. "Combining machine translated sentence chunks from multiple MT systems." CICLing 2016.
• Matīss Rikters. "K-translate – interactive multi-system machine translation." Baltic DB&IS 2016.
• Matīss Rikters. "Searching for the Best Translation Combination Across All Possible Variants." Baltic HLT 2016.

Code on GitHub

• Baseline system: http://ej.uz/ChunkMT
• Only the chunker + visualizer: http://ej.uz/chunker
• Interactive browser version: http://ej.uz/KTranslate
• With integrated usage of NN LMs: http://ej.uz/NNLMs

https://github.com/M4t1ss

Future work

• More enhancements for the chunking step
  – Try dependency parsing instead of constituency parsing
• Choose the best translation candidate with MT quality estimation
  – QuEst++ (Specia et al., 2015)
  – SHEF-NN (Shah et al., 2015)
• Add special processing of multi-word expressions (MWEs)
• Handle MWEs in neural machine translation systems

References

• Ahsan, A., and P. Kolachina. "Coupling Statistical Machine Translation with Rule-based Transfer and Generation." AMTA: The Ninth Conference of the Association for Machine Translation in the Americas, Denver, Colorado, 2010.
• Barrault, Loïc. "MANY: Open source machine translation system combination." The Prague Bulletin of Mathematical Linguistics 93 (2010): 147-155.
• Heafield, Kenneth. "KenLM: Faster and smaller language model queries." Proceedings of the Sixth Workshop on Statistical Machine Translation. Association for Computational Linguistics, 2011.
• Kim, Yoon, et al. "Character-aware neural language models." arXiv preprint arXiv:1508.06615 (2015).
• Mellebeek, Bart, et al. "Multi-engine machine translation by recursive sentence decomposition." (2006).
• Mikolov, Tomas, et al. "Recurrent neural network based language model." INTERSPEECH. Vol. 2. 2010.
• Petrov, Slav, et al. "Learning accurate, compact, and interpretable tree annotation." Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2006.
• Skadiņš, Raivis, Kārlis Goba, and Valters Šics. "Improving SMT for Baltic Languages with Factored Models." Proceedings of the Fourth International Conference Baltic HLT 2010, Frontiers in Artificial Intelligence and Applications, Vol. 219, 125-132. 2010.
• Rikters, M., and I. Skadiņa. "Syntax-based multi-system machine translation." LREC 2016 (2016).
• Rikters, M., and I. Skadiņa. "Combining machine translated sentence chunks from multiple MT systems." CICLing 2016 (2016).
• Pal, Santanu, et al. "USAAR-DCU Hybrid Machine Translation System for ICON 2014." The Eleventh International Conference on Natural Language Processing, 2014.
• Schwenk, Holger, Daniel Déchelotte, and Jean-Luc Gauvain. "Continuous space language models for statistical machine translation." Proceedings of the COLING/ACL on Main Conference Poster Sessions. Association for Computational Linguistics, 2006.
• Shah, Kashif, et al. "SHEF-NN: Translation Quality Estimation with Neural Networks." Proceedings of the Tenth Workshop on Statistical Machine Translation, 2015.
• Specia, Lucia, G. Paetzold, and Carolina Scarton. "Multi-level Translation Quality Prediction with QuEst++." 53rd Annual Meeting of the Association for Computational Linguistics and Seventh International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing: System Demonstrations, 2015.
• Steinberger, Ralf, et al. "DGT-TM: A freely available translation memory in 22 languages." arXiv preprint arXiv:1309.5226 (2013).
• Steinberger, Ralf, et al. "The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages." arXiv preprint cs/0609058 (2006).

Thank you!
