Under the hood of Neural Machine Translation Vincent Vandeghinste


Page 1

Under the hood of Neural Machine Translation

Vincent Vandeghinste

Page 2

Recipe for (data-driven) machine translation

Ingredients:

• 1 (or more) Parallel corpus

• 1 (or more) trainable MT engine + decoder
  • statistical machine translation
  • neural machine translation

Instructions:

• Pour the parallel corpus in the engine

• Let it simmer
  • for a day (when using SMT) + add seasoning (optimization tuning)
  • for a week (when using NMT)

Page 3

Freely Available Parallel Corpora

http://opus.nlpl.eu/

Page 4

Statistical machine translation (SMT)

www.statmt.org

STEP 1: WORD ALIGNMENT

Page 5

Statistical machine translation (SMT)

www.statmt.org

STEP 2: EXTRACT PHRASE TABLE

Page 6

Statistical machine translation (SMT)

www.statmt.org

STEP 3: ESTIMATE LANGUAGE MODEL

Page 7

Statistical machine translation (SMT)

www.statmt.org

STEP 4: OPTIMIZE PARAMETERS

Page 8

Statistical machine translation (SMT)

www.statmt.org

STEP 5: TRANSLATE

Page 9

Downsides of SMT

• Everything depends on the quality of word alignments
  • errors in word alignment propagate into the system

• Separate training of different models
  • translation model (phrase tables with probabilities)
  • language model (n-grams)
  • distortion model

• Everything happens in a local window
  • max phrase length: 7
  • max n-gram length: 5
  • does not cover long-distance phenomena, e.g. subject-verb agreement in Dutch subordinate clauses

Page 10

Neural machine translation (NMT)

www.OpenNMT.net

STEP 1: PREPROCESS

Page 11

Neural machine translation (NMT)

www.OpenNMT.net

STEP 2: TRAIN

Page 12

Neural machine translation (NMT)

www.OpenNMT.net

STEP 3: TRANSLATE

Page 13

Neural Networks: The Brain

• Used for information processing and to model the world around us

• Large interconnected network of neurons

• Neuron collects inputs from other neurons using dendrites

• Neurons sum all their inputs and, if the result is greater than a threshold, they fire

• The fired signal is sent to other neurons through the axon

Page 14

Artificial Neural Networks: The Perceptron

• Inputs are real numbers (positive or negative)
• Weights are real numbers
• Each input is individually weighted
• The weighted inputs are added together and passed into the activation function
• Example activation function: the step function: output 1 if input > threshold, 0 otherwise

Example:

x1 = 0.6, x2 = 1.0
w1 = 0.5, w2 = 0.8

x1*w1 = 0.6 * 0.5 = 0.3
x2*w2 = 1.0 * 0.8 = 0.8

0.3 + 0.8 = 1.1 > threshold = 1.0, so the neuron FIRES
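The weighted-sum-and-fire computation above can be sketched in a few lines of Python (a minimal illustration, not tied to any particular library):

```python
def perceptron(inputs, weights, threshold=1.0):
    """Weighted sum of the inputs; fire (output 1) only if the sum exceeds the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0

# The example above: 0.6*0.5 + 1.0*0.8 = 1.1 > 1.0, so the neuron fires.
print(perceptron([0.6, 1.0], [0.5, 0.8]))  # 1
```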

Page 15

Training

People learn from examples, both positive ("this is a bus") and negative ("this is not a bus").

Page 16

Training Perceptrons

Training data: the AND function

x1  x2  output
 0   0      0
 0   1      0
 1   0      0
 1   1      1

Random initialization of weights: w1 = 0.1, w2 = 0.2

Calculations (activation: fire if sum ≥ t = 0.5):

x1  x2  sum of weighted input  activation  error
 0   0  0                      0           0
 0   1  0.2                    0           0
 1   0  0.1                    0           0
 1   1  0.3                    0           1

Minimize this error: adapt the weights.

Page 17

Training Perceptrons

Training data: the AND function

x1  x2  output
 0   0      0
 0   1      0
 1   0      0
 1   1      1

Adapted weights: w1 = 0.2, w2 = 0.3

Calculations (activation: fire if sum ≥ t = 0.5):

x1  x2  sum of weighted input  activation  error
 0   0  0                      0           0
 0   1  0.3                    0           0
 1   0  0.2                    0           0
 1   1  0.5                    1           0

No more errors: we have learned.
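The weight adaptation in the tables above follows the classic perceptron learning rule. A minimal sketch (the learning rate of 0.1 is an assumption, chosen so the run reproduces the slides' numbers):

```python
def train_perceptron(data, lr=0.1, threshold=0.5, epochs=20):
    """Perceptron rule: for each wrong answer, nudge each weight by lr * error * input."""
    w = [0.1, 0.2]  # the initial weights from the slides
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in data:
            out = 1 if x1 * w[0] + x2 * w[1] >= threshold else 0
            err = target - out
            if err != 0:
                errors += 1
                w[0] += lr * err * x1
                w[1] += lr * err * x2
        if errors == 0:  # all four AND cases correct: we have learned
            break
    return w

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train_perceptron(AND)  # converges to roughly w1 = 0.2, w2 = 0.3
```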

Page 18

What is happening?

• The perceptron puts all the training instances into two categories:
  • those that fire (category 1)
  • those that don’t fire (category 2)

• It draws a line in a two-dimensional space:
  • points on one side fall into category 1
  • points on the other side fall into category 2

Page 19

What is happening?

• It is not always possible to draw a line

• Example: Exclusive OR (XOR)

x1 x2 output

0 0 0

0 1 1

1 0 1

1 1 0

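XOR is not linearly separable, but two layers of the same step-function units can compute it. A sketch with hand-set (not learned) weights: one hidden unit acts as OR, the other as AND, and the output fires for "OR but not AND":

```python
def step(s, t=0.5):
    """Step activation: fire if the weighted sum reaches the threshold t."""
    return 1 if s >= t else 0

def xor(x1, x2):
    h_or = step(x1 + x2)          # fires if at least one input is 1
    h_and = step(x1 + x2, t=1.5)  # fires only if both inputs are 1
    return step(h_or - h_and)     # OR but not AND
```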

Page 20

What do we need to learn this?

A more complex architecture than the perceptron

Page 21

Language Modeling

• used to predict the next word

• trained on large monolingual text

• In SMT, we represent words as discrete, atomic units

• In neural models, we represent words as points in a continuous space(word embeddings: meaning representations of words as a list of numbers)

Page 22

Language Modeling: n-grams
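An n-gram model estimates the probability of the next word from counts. A minimal bigram sketch over a toy corpus (the sentences are made up for illustration):

```python
from collections import Counter, defaultdict

def bigram_probs(corpus):
    """Maximum-likelihood P(next | previous) from bigram counts, without smoothing."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return {prev: {w: c / sum(nxts.values()) for w, c in nxts.items()}
            for prev, nxts in counts.items()}

probs = bigram_probs(["the cat sat", "the cat ran", "the dog sat"])
print(probs["the"]["cat"])  # 2/3: "the" is followed by "cat" in 2 of 3 cases
```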

Page 23

Neural Language Modeling

dictionary: 246 elements

one-hot vector: 246 dimensions

word embedding: 124 dimensions

dimensionality reduction!
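The reduction from a one-hot vector to an embedding is just a matrix multiplication, which amounts to a row lookup. A sketch with a toy 3-word dictionary and 2 embedding dimensions (the numbers are random, purely illustrative):

```python
import numpy as np

vocab = ["the", "cat", "sat"]        # toy dictionary: 3 elements
rng = np.random.default_rng(0)
E = rng.normal(size=(3, 2))          # embedding matrix: 3-dim one-hot -> 2-dim embedding

one_hot = np.array([0.0, 1.0, 0.0])  # one-hot vector for "cat"
embedding = one_hot @ E              # the product simply selects row 1 of E
print(np.allclose(embedding, E[1]))  # True
```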

Page 24

Word Embeddings: Properties

semantics of each dimension?

Page 25

Word Embeddings: Properties

• Words with similar meaning are close to each other

Page 26

Word Embeddings: Properties

• Can we do word arithmetic?
• king – man + woman = ?
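With toy, hand-picked 2-dimensional vectors (real embeddings have hundreds of dimensions, learned from text), the arithmetic can be made concrete: the nearest word to king – man + woman by cosine similarity is queen:

```python
import numpy as np

# Hand-picked toy vectors: one axis loosely encodes "royalty", the other "gender".
emb = {
    "king":  np.array([0.9, 0.9]),
    "queen": np.array([0.9, 0.1]),
    "man":   np.array([0.1, 0.9]),
    "woman": np.array([0.1, 0.1]),
}

def nearest(v, emb, exclude=()):
    """The vocabulary word whose vector has the highest cosine similarity to v."""
    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in emb if w not in exclude), key=lambda w: cos(v, emb[w]))

v = emb["king"] - emb["man"] + emb["woman"]
print(nearest(v, emb, exclude={"king", "man", "woman"}))  # queen
```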

Page 27

Word Embeddings: Properties

Page 28

Recurrent Neural Network
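A recurrent neural network reads a sequence one vector at a time, updating a hidden state that carries information forward. A minimal sketch (random weights; the 2-dimensional inputs and 4-dimensional hidden state are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
W_x = rng.normal(scale=0.1, size=(4, 2))  # input-to-hidden weights
W_h = rng.normal(scale=0.1, size=(4, 4))  # hidden-to-hidden weights (the recurrence)

def rnn(inputs):
    """h_t = tanh(W_x @ x_t + W_h @ h_{t-1}); the final h summarizes the sequence."""
    h = np.zeros(4)
    for x in inputs:
        h = np.tanh(W_x @ x + W_h @ h)
    return h

summary = rnn([np.array([1.0, 0.0]), np.array([0.0, 1.0])])  # one vector per word
```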

Page 29

Neural Machine Translation (NMT)

Page 30

NMT: Basic model

Page 31

NMT Encoding: 1-Hot vector

Page 32

NMT: Word Embedding

Page 33

NMT: Hidden layer

Page 34

NMT Summary Vector

Page 35

NMT Decoding

From a vector to a sequence of words

1. Compute hidden state of the decoder

Page 36

NMT Decoding

From a vector to a sequence of words

2. Next word probability
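The decoder turns a vector of scores over the target vocabulary into next-word probabilities with a softmax. A sketch with a hypothetical 4-word vocabulary and made-up scores:

```python
import numpy as np

def softmax(scores):
    """Exponentiate and normalize so the scores form a probability distribution."""
    e = np.exp(scores - np.max(scores))  # subtract the max for numerical stability
    return e / e.sum()

vocab = ["the", "cat", "sat", "</s>"]
scores = np.array([2.0, 0.5, 0.1, -1.0])  # hypothetical decoder output scores
p = softmax(scores)
print(vocab[int(np.argmax(p))])  # the
```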

Page 37

NMT Decoding

From a vector to a sequence of words

3. Generating the next word

Page 38

The Trouble with Simple Encoder-Decoder Architectures

• The input sequence is compressed into a fixed-size list of numbers (a vector)
• The translation is generated from this vector

This vector must
• contain every detail about the source sentence
• be large enough to compress sentences of any length

• Translation quality decreases as source sentence length increases (with a small model)

Page 39

The Trouble with Simple Encoder-Decoder Architectures

Page 40

The Trouble with Simple Encoder-Decoder Architectures

• RNNs remember recent symbols better: the further back a symbol is, the less likely the RNN's hidden state is to remember it

Page 41

Bi-directional representation

Combining the forward and backward hidden vectors represents the word in the context of the entire sentence.

The set of these representations is a variable-length representation of the source sentence.

Page 42

How does the decoder know which part of the encoding is relevant at each step of the generation?

Page 43

Attention Mechanism

The y‘s are our translated words produced by the decoder, and the x‘s are our source sentence words.

Each decoder output word y_t now depends on a weighted combination of all the input states, not just the last state.

The a‘s are weights that define how much of each input should be considered for each output.
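The weighted combination described above can be sketched directly: scores between the decoder state and each encoder state are normalized with a softmax into the a-weights, and the context vector is their weighted sum. (Dot-product scoring is one common choice; the vectors here are made up.)

```python
import numpy as np

def attention(decoder_state, encoder_states):
    """Dot-product scores -> softmax weights (the a's) -> weighted sum of encoder states."""
    scores = encoder_states @ decoder_state  # one alignment score per source position
    e = np.exp(scores - scores.max())
    weights = e / e.sum()                    # the a's: non-negative, sum to 1
    context = weights @ encoder_states       # weighted combination of all input states
    return weights, context

enc = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # three source-word states
a, context = attention(np.array([1.0, 0.0]), enc)
```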

Page 44

Attention Mechanism

Sample translations made by the neural machine translation model with the soft-attention mechanism. Edge thicknesses represent the attention weights found by the attention model.

Page 45

Advantages of NMT

1. End-to-end training: all parameters are simultaneously optimized to minimize a loss function

2. Distributed representations share strength: better exploitation of word and phrase similarities

3. Better exploitation of context: NMT can use a much bigger context, both source and partial target text, to translate more accurately

Page 46

Why neural machine translation (NMT)

1. Results show that NMT produces automatic translations that are significantly preferred by humans to other machine translation outputs.

2. Similar methods (often called seq2seq) are also effective for many other NLP and language-related applications such as dialogue, image captioning, and summarization.

3. NMT has been used as a representative application of the recent success of deep learning-based artificial intelligence.

source: opennmt.net

Page 47

1. NMT systems have lower quality out of domain, to the point that they completely sacrifice adequacy for the sake of fluency.

NMT compared to SMT (Koehn & Knowles 2017)

Page 48

2. NMT systems have a steeper learning curve with respect to the amount of training data, resulting in worse quality in low-resource settings, but better performance in high-resource settings.

NMT compared to SMT (Koehn & Knowles 2017)

Page 49

3. NMT systems that operate at the sub-word level perform better than SMT systems on extremely low-frequency words, but still show weakness in translating low-frequency words belonging to highly-inflected categories (e.g. verbs).

NMT compared to SMT (Koehn & Knowles 2017)

Page 50

4. NMT systems have lower translation quality on very long sentences, but do comparably better up to a sentence length of about 60 words.

NMT compared to SMT (Koehn & Knowles 2017)

Page 51

5. The attention model for NMT does not always fulfill the role of a word alignment model, but may in fact dramatically diverge.

NMT compared to SMT (Koehn & Knowles 2017)

Page 52

Conclusions

• NMT is better than SMT
  • if you have the hardware
  • if you have the time
  • if you have the data

• NMT is work in progress: a hot research topic
  • speeding up the learning
  • larger vocabularies
  • introducing linguistic information
    • part-of-speech tags
    • syntax trees
  • intelligibility: understanding what is being represented
  • work on low-frequency words
  • what about morphology?
  • …

Page 53

Sources and references

• https://medium.com/technologymadeeasy/for-dummies-the-introduction-to-neural-networks-we-all-need-c50f6012d5eb

• https://www.xenonstack.com/blog/data-science/overview-of-artificial-neural-networks-and-its-applications

• http://www.cs.stir.ac.uk/courses/ITNP4B/lectures/kms/2-Perceptrons.pdf

• http://blog.systransoft.com/how-does-neural-machine-translation-work/

• https://sites.google.com/site/acl16nmt/home

• https://devblogs.nvidia.com/introduction-neural-machine-translation-with-gpus/

• Koehn & Knowles (2017). Six challenges for Neural Machine Translation. https://arxiv.org/pdf/1706.03872.pdf