
1/22

Dropout improves Recurrent Neural Networks for Handwriting Recognition

Vu Pham
Théodore Bluche
Christopher Kermorvant
Jérôme Louradour

[email protected], [email protected]


2/22

Outline

1 RNN for Handwritten Text Line Recognition
    Offline Handwritten Text Recognition
    Recurrent Neural Networks (RNN)

2 Dropout for RNN

3 Experiments
    Improvement of RNN
    Improvement of the complete recognition system


3/22

RNN for Handwritten Text Line Recognition

Outline

1 RNN for Handwritten Text Line Recognition
    Offline Handwritten Text Recognition
    Recurrent Neural Networks (RNN)

2 Dropout for RNN

3 Experiments
    Improvement of RNN
    Improvement of the complete recognition system


4/22

RNN for Handwritten Text Line Recognition

Offline Handwritten Text Recognition

[Figure: a handwritten letter image, segmented into text-line images. Transcription of the sample:]

    Dear Charlize.
    You are cordially invited
    to the grand opening of
    my new art gallery intitled
    « The new era of Media Music
    and paintings ».
    on July 17th 2012
    P.S: UR presence is obligatory
    due to your great help of
    launching my career.

Line segmentation in the front-end

“Temporal Classification”: variable-length 1D or 2D input ↦ 1D target sequence (of a different length)


5/22

RNN for Handwritten Text Line Recognition

Modeling: Recurrent Neural Networks (RNN)
State-of-the-art in Handwritten Text Recognition

Task: Image (2D sequence) ↦ 1D sequence of characters

[Figure: the network architecture. Input image (2×2 blocks) → MDLSTM (2 features) → Convolutional (input 2×4, 6 features) → Sum & Tanh → MDLSTM (10 features) → Convolutional (input 2×4, 20 features) → Sum & Tanh → MDLSTM (50 features) → Fully-connected (N features) → Sum Collapse → N-way softmax → CTC. Below, the per-character output probabilities along the image width for the line “It was a splendid interpretation”.]

1 RNN Network Architecture (Graves & Schmidhuber, 2008)

    Multi-directional layers of LSTM units
    “Long Short-Term Memory”: 2D recurrence in 4 possible directions
    Convolutions: parameterized subsampling layers
    Collapse layer: from 2D to 1D (output ∼ log P)

2 CTC Training (“Connectionist Temporal Classification”)

    The network can output all possible symbols, plus a blank output
    Minimization of the Negative Log-Likelihood − log P(Y|X) (NLL)
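A minimal sketch of the collapse step, assuming the usual reading of “Sum Collapse”: the 2D feature map is summed over its height, leaving one frame per image column (NumPy; not the authors' code).

    import numpy as np

    def sum_collapse(feature_map):
        """Collapse layer: turn a 2D feature map of shape (height, width, N)
        into a 1D sequence of shape (width, N) by summing over the height.
        The collapsed outputs behave like unnormalized log-probabilities
        (output ~ log P), ready for the N-way softmax."""
        return feature_map.sum(axis=0)

    frames = sum_collapse(np.random.randn(12, 160, 80))  # (160, 80): one frame per column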


6/22

RNN for Handwritten Text Line Recognition

Modeling: Recurrent Neural Networks (RNN)
State-of-the-art in Handwritten Text Recognition

The recurrent neurons are Long Short-Term Memory (LSTM) units

[Figure: an LSTM unit with its input, forget and output gates, and the four MDLSTM scanning directions. In each direction, the hidden layer at position (i, j) receives the input at (i, j) plus the hidden states of two already-processed neighbours: (i−1, j) and (i, j−1); (i+1, j) and (i, j−1); (i−1, j) and (i, j+1); or (i+1, j) and (i, j+1).]
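The recurrence pattern of the figure, written out as data (a sketch; only the neighbour offsets come from the slide):

    # Offsets of the two previously-visited neighbours feeding the hidden
    # state at pixel (i, j), one entry per MDLSTM scanning direction.
    SCAN_DIRECTIONS = [
        [(-1, 0), (0, -1)],  # hidden at (i, j) uses (i-1, j) and (i, j-1)
        [(+1, 0), (0, -1)],  # uses (i+1, j) and (i, j-1)
        [(-1, 0), (0, +1)],  # uses (i-1, j) and (i, j+1)
        [(+1, 0), (0, +1)],  # uses (i+1, j) and (i, j+1)
    ]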


7/22

RNN for Handwritten Text Line Recognition

Loss function: Connectionist Temporal Classification (CTC)
Deals with several possible alignments between two 1D sequences

[Figure: CTC alignment lattice for the target sequence “Tea”. The target of length U is expanded with blanks ∅ before, between and after the symbols, giving length U′ = 2U + 1; every valid alignment of length T ≥ U corresponds to a path through the lattice, and the loss is − log P(Y|X).]

U = 3: number of target symbols

T: number of RNN outputs ∝ image width

Basic decoding strategy (with neither lexicon nor language model):

[∅ …] T … [∅ …] E … [∅ …] A … [∅ …] ↦ “TEA”
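A small sketch of the blank-expansion rule above (U′ = 2U + 1), in plain Python with ∅ written as a string:

    def expand_with_blanks(target, blank="∅"):
        """Expanded CTC target: one blank before, between and after each
        symbol, so a target of length U becomes U' = 2U + 1 states."""
        expanded = [blank]
        for symbol in target:
            expanded += [symbol, blank]
        return expanded

    print(expand_with_blanks("TEA"))  # ['∅', 'T', '∅', 'E', '∅', 'A', '∅'], U' = 7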


7/22

RNN for Handwritten Text Line Recognition

Loss function: Connectionist Temporal Classification (CTC)
Deals with several possible alignments between two 1D sequences

[Figure: the same CTC lattice for the target sequence “Tee”. Because the E is repeated, the blank between the two E's is mandatory, so valid alignments need length T ≥ U + 1; the loss is again − log P(Y|X).]

U = 3: number of target symbols

T: number of RNN outputs ∝ image width

Basic decoding strategy (with neither lexicon nor language model):

[∅ …] T … [∅ …] E … [∅ …] E … [∅ …] ↦ “TEE”
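To see why the blank between repeated symbols is mandatory, here is the collapse rule as a sketch (merge consecutive duplicates, then remove blanks); without a ∅ between them, the two E's would merge:

    def ctc_collapse(path, blank="∅"):
        """Map an alignment path to a label sequence: merge consecutive
        duplicates, then remove blanks."""
        labels, previous = [], None
        for symbol in path:
            if symbol != previous and symbol != blank:
                labels.append(symbol)
            previous = symbol
        return "".join(labels)

    print(ctc_collapse(list("TT∅EE∅∅E")))  # 'TEE': the ∅ keeps the E's apart
    print(ctc_collapse(list("TT∅EEEEE")))  # 'TE' : without it, the repeats merge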


8/22

RNN for Handwritten Text Line Recognition

Optimization: Stochastic Gradient Descent
Simple and efficient

No mathematical guarantee (no hope of converging to the true global minimum)

But popular with deep networks: works well in practice! (finds “good” local minima)

for (input, target) in Oracle() do
    output    = RNN.Forward(input)
    outGrad   = CTC_NLL.Gradient(output, target)
    paramGrad = RNN.BackwardGradient(input, ..., outGrad)
    RNN.Update(paramGrad)
end for
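The same loop in runnable form, as a minimal PyTorch sketch (not the authors' implementation; the tiny bidirectional LSTM and the random batch are placeholders for the MDLSTM network and the data oracle):

    import torch
    import torch.nn as nn

    class TinyOpticalModel(nn.Module):
        """Stand-in for the optical model: maps an input sequence to
        per-timestep log-probabilities over N character labels + 1 CTC blank."""
        def __init__(self, n_features=32, n_labels=78):
            super().__init__()
            self.rnn = nn.LSTM(n_features, 50, bidirectional=True, batch_first=True)
            self.out = nn.Linear(100, n_labels + 1)

        def forward(self, x):
            h, _ = self.rnn(x)
            return self.out(h).log_softmax(dim=-1)

    model = TinyOpticalModel()
    ctc_nll = nn.CTCLoss(blank=0)                 # the CTC NLL, -log P(Y|X)
    sgd = torch.optim.SGD(model.parameters(), lr=1e-3)

    # One random batch standing in for Oracle(): 4 lines of 120 frames with
    # 32 features each, and 4 target sequences of 20 labels each.
    oracle = [(torch.randn(4, 120, 32),
               torch.randint(1, 79, (4, 20)),
               torch.full((4,), 120, dtype=torch.long),
               torch.full((4,), 20, dtype=torch.long))]

    for inputs, targets, input_lens, target_lens in oracle:
        log_probs = model(inputs).transpose(0, 1)   # CTCLoss wants (T, batch, classes)
        loss = ctc_nll(log_probs, targets, input_lens, target_lens)
        sgd.zero_grad()
        loss.backward()                             # gradient of the CTC NLL
        sgd.step()                                  # SGD parameter update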


9/22

Dropout for RNN

Outline

1 RNN for Handwritten Text Line Recognition
    Offline Handwritten Text Recognition
    Recurrent Neural Networks (RNN)

2 Dropout for RNN

3 Experiments
    Improvement of RNN
    Improvement of the complete recognition system


10/22

Dropout for RNN

Dropout
General Principle [Krizhevsky & Hinton, 2012]

Training:

Randomly set intermediate activities (*) to 0 with probability p (typically p = 0.5)
(*) neuron outputs, usually in [−1, 1], [0, 1] or [0, ∞)
∼ Sampling from 2^N different architectures that share weights

Decoding:

All intermediate activities are scaled by 1 − p
∼ Geometric mean of the outputs from 2^N models

Featured in award-winning convolutional networks (ImageNet)
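A sketch of the two rules, assuming nothing beyond what is stated above (NumPy; note that the slide's convention scales at decoding time, unlike the “inverted” dropout of most modern libraries):

    import numpy as np

    def dropout(x, p=0.5, training=True):
        """Training: zero each activity independently with probability p.
        Decoding: scale every activity by (1 - p) instead, matching the
        expected value of the training-time mask."""
        if training:
            keep_mask = np.random.rand(*x.shape) >= p   # keep with prob. 1 - p
            return x * keep_mask
        return (1.0 - p) * x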


11/22

Dropout for RNN

Dropout
Dropout with a recurrent layer

Recurrent connections are kept untouched

Dropout can be implemented as a separate layer (outputs identical to inputs, except at dropped locations)
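A minimal PyTorch sketch of this placement (a hypothetical module, not the authors' code): dropout sits as a separate layer on the LSTM's outputs, so the recurrence inside the LSTM never sees dropped activities. (nn.Dropout uses the inverted-scaling convention, equivalent in expectation to the test-time scaling described earlier.)

    import torch.nn as nn

    class LSTMThenDropout(nn.Module):
        """Dropout as a separate layer after a recurrent layer: the
        recurrent connections are untouched; only the outputs passed
        on to the next layer are dropped."""
        def __init__(self, n_in, n_hidden, p=0.5):
            super().__init__()
            self.lstm = nn.LSTM(n_in, n_hidden, batch_first=True)
            self.drop = nn.Dropout(p)   # identity except at dropped locations

        def forward(self, x):
            h, _ = self.lstm(x)         # full, dropout-free recurrence
            return self.drop(h)         # dropout only on the feed-forward path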


12/22

Dropout for RNN

Dropout
Overview of the full network

[Figure: the same architecture with three Dropout layers inserted after the recurrent layers. Input image (2×2 blocks) → MDLSTM (2 features) → Dropout → Convolutional (input 2×4, stride 2×4, 6 features) → Sum & Tanh → MDLSTM (10 features) → Dropout → Convolutional (input 2×4, stride 2×4, 20 features) → Sum & Tanh → MDLSTM (50 features) → Dropout → Fully-connected (N features) → Sum Collapse → N-way softmax → CTC.]

After recurrent LSTM layers

Before feed-forward layers (convolutional and linear layers)


13/22

Experiments

Outline

1 RNN for Handwritten Text Line Recognition
    Offline Handwritten Text Recognition
    Recurrent Neural Networks (RNN)

2 Dropout for RNN

3 Experiments
    Improvement of RNN
    Improvement of the complete recognition system


14/22

Experiments

Databases and performance assessment

Training subset:

    Database   Language   # different characters   # labelled lines   # characters (in lines)
    IAM        English     78                        9,462               338,904
    Rimes      French     114                       11,065               429,099
    OpenHaRT   Arabic     154                       91,811             2,267,450

Training: minimizing the Negative Log-Likelihood (NLL) over CTC alignments.

Decoding: pick the best label at each timestep, remove duplicates, then blanks (see the sketch below).
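That decoding rule as a short sketch, over a (T, n_classes) array of network outputs (the same collapse rule as on the CTC slides):

    import numpy as np

    def best_path_decode(outputs, blank=0):
        """Pick the argmax label at each timestep, merge consecutive
        duplicates, then remove blanks."""
        best = outputs.argmax(axis=-1)          # best label per timestep
        labels, previous = [], None
        for label in best:
            if label != previous and label != blank:
                labels.append(int(label))
            previous = label
        return labels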

Evaluation: Character Error Rate (%), on a separate dataset; we report the error-rate reduction with vs. without dropout.

Training convergence time is also interesting, but not critical.


15/22

Experiments

Results: Dropout on the topmost LSTM layer

∼ Dropout on the high-level features used in a logistic regression

[Figure: error-rate reduction when varying the number of hidden units in the topmost layer.]


16/22

Experiments

Results: Dropout on all LSTM layers

Use the good recipe (dropout) wherever possible!

Number of hidden units tuned (on the validation dataset) to reach the best performance


17/22

Experiments

Results analysis: Dropout acts as Regularization

[Figure: convergence curves, CER (%) and WER (%) vs. number of updates (in millions, up to 60M), for 50, 100 and 200 topmost hidden units with and without dropout; for 50 units, both training and validation curves are shown.]

Less overfitting: the gap between training and validation loss is smaller

Training with dropout is slower: there is a trade-off between accuracy and training speed.
(However, decoding speed is the same for a given neural architecture!)


18/22

Experiments

Results analysis: Dropout acts as Regularization

[Figure: histograms of the classification weights, baseline vs. dropout, for IAM (English), Rimes (French) and OpenHaRT (Arabic); the weights range roughly over [−1.5, 1.5].]

Outgoing weights are smaller: L1 and L2 norms are greatly reduced

Better than L1/L2 Weight Decay (and also simple to implement)

Data-driven approach:
    no need to tune λ ∈ [0, +∞) to control the bias-variance tradeoff;
    only one hyper-parameter p ∈ [0, 1), which is less sensitive.
    NB: p = 0.5 works well!

On the other hand, tanh activations (in [−1, 1]) are sharper: more “helpful” features are learned by “preventing co-adaptation” (Hinton et al., 2012)


19/22

Experiments

Integration in a complete recognition system

Performance improves when language constraints (vocabulary, LM) are added.

Decoding in a hybrid RNN/HMM framework, with scaled likelihoods: p(y|x) / p(y) ∝ p(x|y) / p(x)

HMM: one state for each label (including blank), with a self-loop and an outgoing transition

Lexicon: each word is a sequence of character HMMs, with optional blanks in between

Language Model: Word n-grams

The goal is to find the optimal word sequence Ŵ:

Ŵ = arg max_W p(W|X) = arg max_W p(X|W) p(W)    (1)


20/22

Experiments

Results in a complete system:

Word Error Rate (%) of the full systems (Optical Model + Lexicon/Language Model):

    System           Rimes (Eval)   IAM (Eval)   OpenHaRT (Eval)
    best published       12.9           12.2          23.3
    w/o dropout          12.6           15.9          18.6
    with dropout         12.3           13.6          18.0

    Database   Language   # words   # words in vocabulary   % OOV   LM       Perplexity
    Rimes      French      5,639            12k              2.6%   4-gram        18
    IAM        English    25,920            50k              3.7%   3-gram       329
    OpenHaRT   Arabic     47,837            95k              6.8%   3-gram      1162


21/22

Conclusion

Conclusions and future work

Dropout acts as a regularizer: outgoing weights tend to be lower

Dropout improves the accuracy of offline text recognition with RNNs:
about a 10-20% relative improvement in CER and WER

Training convergence with dropout takes longer:
roughly twice as slow


22/22

Thank you for your attention!

Questions and comments are welcome.

[email protected], [email protected]
