Deep Recurrent Neural Networks
Artem Chernodub
e-mail: a.chernodub@gmail.com web: http://zzphoto.me
IMMSP NASU ZZ Photo
$p(x \mid y) = p(y \mid x)\, p(x) / p(y)$
Biologically-inspired models
2 / 28
Neuroscience
Machine Learning
Problem domain: processing the sequences
3 / 28
Neural Network Models under consideration
4 / 28
• Classic Feedforward (Shallow) Neural Networks.
• Feedforward Deep Neural Networks.
• Recurrent Neural Networks.
Biological Neural Networks
5 / 28
Sequence processing problems:
• Time series prediction
• Speech recognition
• Engine control
• Natural Language Processing
6 / 28
Pen handwriting recognition (IAM-OnDB)
A. Graves, S. Fernandez, M. Liwicki, H. Bunke, J. Schmidhuber. Unconstrained online handwriting recognition with recurrent neural networks // Advances in Neural Information Processing Systems 21 (NIPS 2008), pp. 577-584, MIT Press, Cambridge, MA, 2008.
Model   | # of parameters | Word Error Rate (WER)
HMM     | -               | 35.5%
CTC RNN | 13M             | 20.4%
Training data: 20,000 most frequently occurring words written by 221 different writers.
7 / 28
TIMIT Phoneme Recognition
Graves, A., Mohamed, A.-R., and Hinton, G. E. (2013). Speech recognition with deep recurrent neural networks // IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6645-6649.
Mohamed, A. and Hinton, G. E. (2010). Phone recognition using restricted Boltzmann machines // IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4354-4357.
Model                      | # of parameters | Error
Hidden Markov Model (HMM)  | -               | 27.3%
Deep Belief Network (DBN)  | ~4M             | 26.7%
Deep RNN                   | 4.3M            | 17.7%
Training data: 462 speakers train / 24 speakers test, 3.16 / 0.14 hrs.
8 / 28
Google Large Vocabulary Speech Recognition
H. Sak, A. Senior, F. Beaufays. Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling // INTERSPEECH'2014.
K. Vesely, A. Ghoshal, L. Burget, D. Povey. Sequence-discriminative training of deep neural networks // INTERSPEECH'2014.
Model                    | # of parameters | WER (%), cross-entropy training
ReLU DNN                 | 85M             | 11.3
Deep Projection LSTM RNN | 13M             | 10.7
Training data: 3M utterances (1900 hrs).
9 / 28
Dynamic process identification scheme
10 / 28
Feedforward and Recurrent Neural Networks: popularity 90% / 10%
FFNN RNN
11 / 28
Feedforward vs Recurrent Neural Networks
Dynamic Multilayer Perceptron, DMLP (FFNN)
12 / 28
Recurrent Multilayer Perceptron, RMLP (RNN)
Processing the sequences: Single-Step-Ahead predictions
13 / 28
Processing the sequences: Multi-Step-Ahead predictions, autoregression
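A minimal sketch of the autoregressive loop, for illustration (predict_one_step is a hypothetical stand-in for any trained one-step model; it is not defined on the slides):

    import numpy as np

    def multi_step_ahead(history, predict_one_step, horizon):
        # Multi-Step-Ahead prediction by autoregression: each one-step
        # prediction is fed back into the input window as if it were data.
        window = list(history)
        preds = []
        for _ in range(horizon):
            y_next = predict_one_step(np.array(window[-len(history):]))
            preds.append(y_next)
            window.append(y_next)
        return np.array(preds)

    # usage with a trivial stand-in model (mean of the window)
    preds = multi_step_ahead([0.1, 0.2, 0.3, 0.4, 0.5],
                             lambda w: float(np.mean(w)), horizon=3)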
14 / 28
Processing the sequences: Multi-Step-Ahead predictions under control
15 / 28
Classic Feedforward Neural Networks (before 2006).
• Single hidden layer (Kolmogorov-Cybenko Universal Approximation Theorem as the main hope).
• Vanishing gradients effect prevents using more layers.
• Less than 10K free parameters.
• Feature preprocessing stage is often critical.
16 / 28
Deep Feedforward Neural Networks
• Number of hidden layers > 1 (usually 6-9).
• 100K - 100M free parameters.
• No (or less) feature preprocessing stage.
• 2-stage training process: i) unsupervised pre-training; ii) fine-tuning (vanishing gradients problem is beaten!).
17 / 28
Training the Neural Network: derivative + optimization
18 / 28
DMLP: 1) forward propagation pass
$z_j = f\left(\sum_i w_{ji}^{(1)} x_i\right),$
$\tilde{y}(k+1) = g\left(\sum_j w_j^{(2)} z_j\right),$
where zj is the postsynaptic value for the j-th hidden neuron, w(1) are the hidden layer’s weights, f() are the hidden layer’s activation functions, w(2) are the output layer’s weights, and g() are the output layer’s activation functions.
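A minimal NumPy sketch of this forward pass (the layer sizes, the tanh hidden activation and the linear output are assumptions for illustration, not taken from the slides):

    import numpy as np

    def dmlp_forward(x, W1, W2, f=np.tanh, g=lambda s: s):
        # x: input vector (e.g. a tapped delay line of past samples)
        # W1: hidden-layer weights w(1), shape (n_hidden, n_inputs)
        # W2: output-layer weights w(2), shape (n_outputs, n_hidden)
        z = f(W1 @ x)   # postsynaptic values z_j of the hidden neurons
        y = g(W2 @ z)   # one-step-ahead prediction y~(k+1)
        return y, z

    # usage: 5 delayed inputs, 8 hidden neurons, 1 output
    rng = np.random.default_rng(0)
    W1 = rng.normal(scale=0.1, size=(8, 5))
    W2 = rng.normal(scale=0.1, size=(1, 8))
    y, z = dmlp_forward(rng.normal(size=5), W1, W2)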
19 / 28
DMLP: 2) backpropagation pass
Local gradients calculation:
$\delta^{OUT} = t(k+1) - \tilde{y}(k+1),$
$\delta_j^{HID} = f'(z_j)\, w_j^{(2)} \delta^{OUT}.$
Derivatives calculation:
$\frac{\partial E(k)}{\partial w_j^{(2)}} = \delta^{OUT} z_j,$
$\frac{\partial E(k)}{\partial w_{ji}^{(1)}} = \delta_j^{HID} x_i.$
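Continuing the NumPy sketch above, the backpropagation pass can be written as follows (the squared-error loss and the slides' sign convention, in which the local gradients enter the update Δw = η·δ·input, are assumptions):

    import numpy as np

    def dmlp_backward(x, z, y, t, W2, f_prime=lambda z: 1.0 - z**2):
        # x, z, y: as produced by dmlp_forward above; t: target t(k+1)
        # f_prime: derivative of tanh expressed through its output z
        delta_out = t - y                            # output local gradient
        delta_hid = f_prime(z) * (W2.T @ delta_out)  # hidden local gradients
        dE_dW2 = np.outer(delta_out, z)              # terms for updating w(2)
        dE_dW1 = np.outer(delta_hid, x)              # terms for updating w(1)
        return dE_dW1, dE_dW2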
20 / 28
Backpropagation Through Time (BPTT)
$\left.\frac{\partial E(k)}{\partial w_{ji}^{(1)}}\right|_{BPTT} = \frac{1}{h}\sum_{n=0}^{h} \frac{\partial E(k-n)}{\partial w_{ji}^{(1)}},$
where h = truncation depth.
BPTT = unrolling RNN to Deep Feedforward Neural Network
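A minimal sketch of truncated BPTT for a single-hidden-layer RNN (the Elman-style recurrence matrix Wr and the tanh activation are assumptions; the slides do not fix the exact RMLP topology):

    import numpy as np

    def bptt_gradients(xs, ts, W1, Wr, W2, h=10):
        # forward pass over the sequence, storing the hidden states
        zs = [np.zeros(W1.shape[0])]
        ys = []
        for x in xs:
            z = np.tanh(W1 @ x + Wr @ zs[-1])
            zs.append(z)
            ys.append(W2 @ z)
        # backward pass through the last h unrolled copies of the network
        dW1, dWr, dW2 = np.zeros_like(W1), np.zeros_like(Wr), np.zeros_like(W2)
        delta = np.zeros(W1.shape[0])
        k = len(xs) - 1
        for n in range(min(h, k + 1)):
            step = k - n
            delta_out = ts[step] - ys[step]          # slides' sign convention
            delta = (W2.T @ delta_out + Wr.T @ delta) * (1.0 - zs[step + 1] ** 2)
            dW2 += np.outer(delta_out, zs[step + 1])
            dW1 += np.outer(delta, xs[step])
            dWr += np.outer(delta, zs[step])
        return dW1 / h, dWr / h, dW2 / h             # 1/h averaging as on the slide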
21 / 28
Vanishing (exploding) gradients effect: the problem
$\frac{\partial E(k)}{\partial w_{ji}^{(m)}} = \delta_j^{(m)} z_i^{(m-1)},$
$\delta_j^{(m)} = f'\left(z_j^{(m)}\right) \sum_i w_{ij}^{(m+1)} \delta_i^{(m+1)},$
$\Rightarrow\ \frac{\partial E(k)}{\partial w_{ji}^{(m)}} \to 0 \quad \text{for}\ m \to 1.$
22 / 28
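A small numeric illustration of the effect (a 10-layer tanh network with random weights is an assumption for demonstration; nothing here is taken from the slides):

    import numpy as np

    # backpropagate a unit local gradient through 10 tanh layers
    rng = np.random.default_rng(1)
    n, depth = 20, 10
    Ws = [rng.normal(scale=0.1, size=(n, n)) for _ in range(depth)]
    z = rng.uniform(-0.9, 0.9, size=n)        # stand-in hidden activations
    delta = np.ones(n)
    for m in range(depth - 1, -1, -1):
        delta = (1.0 - z**2) * (Ws[m].T @ delta)
        print(f"layer {m}: ||delta|| = {np.linalg.norm(delta):.2e}")
    # with scale=0.1 the norms shrink towards 0 layer by layer;
    # with large weights (e.g. scale=1.0) they typically blow up instead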
Vanishing gradients effect: solutions
• Accurate initialization of weights (a minimal sketch follows after the references below):
  - if the weights are small, the gradients shrink exponentially;
  - if the weights are big, the gradients grow exponentially.
• Hessian-Free Optimization: use the second-order optimizer proposed by Martens.
J. Martens. Deep Learning via Hessian-Free Optimization // Proc. International Conference of Machine Learning, pp. 735-742, 2010. J. Martens and I. Sutskever. Learning Recurrent Neural Networks with Hessian-Free Optimization // Proc. ICML, 2011.
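One common initialization heuristic of the kind hinted at above is rescaling the recurrent weight matrix to a chosen spectral radius; a minimal sketch (the value 0.95 is an assumption, not a prescription from the slides):

    import numpy as np

    def init_recurrent_weights(n_hidden, spectral_radius=0.95, seed=0):
        # draw a random matrix and rescale it so that its largest
        # |eigenvalue| equals spectral_radius: keeping it close to 1
        # avoids gradients shrinking or growing too fast through time
        rng = np.random.default_rng(seed)
        Wr = rng.normal(size=(n_hidden, n_hidden))
        radius = np.max(np.abs(np.linalg.eigvals(Wr)))
        return Wr * (spectral_radius / radius)

    Wr = init_recurrent_weights(16)
    print(np.max(np.abs(np.linalg.eigvals(Wr))))   # ~0.95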
23 / 28
Time-series Single-Step-Ahead prediction problems
24 / 28
A. Chernodub. Training Dynamic Neural Networks Using the Extended Kalman Filter for Multi-Step-Ahead Predictions // Artificial Neural Networks Methods and Applications in Bio-/Neuroinformatics. Petia Koprinkova-Hristova, Valeri Mladenov, Nikola K. Kasabov (Eds.), Springer, 2014, pp. 221-244.
Time-series Single-Step-Ahead predictions: the results
Model      | MG17   | ICE    | SOLAR  | CHICKENPOX | RIVER  | LASER
DLNN (FF)  | 0.0042 | 0.0244 | 0.1316 | 0.8670     | 4.3638 | 0.7711
DMLP (FF)  | 0.0006 | 0.0376 | 0.1313 | 0.4694     | 0.9979 | 0.0576
RMLP (RNN) | 0.0050 | 0.0273 | 0.1493 | 0.7608     | 6.4790 | 0.2415
NARX (RNN) | 0.0010 | 0.1122 | 0.1921 | 1.4736     | 2.0685 | 0.1332
Mean NMSE of Single-Step-Ahead predictions for different neural network architectures
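For reference, a minimal sketch of how the NMSE values in this table are typically computed (normalizing the MSE by the variance of the target series is an assumption about the exact convention used here):

    import numpy as np

    def nmse(y_true, y_pred):
        # normalized mean squared error: 1.0 means "no better than
        # predicting the mean of the series"
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        return np.mean((y_true - y_pred) ** 2) / np.var(y_true)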
25 / 28
A. Chernodub. Training Dynamic Neural Networks Using the Extended Kalman Filter for Multi-Step-Ahead Predictions // Artificial Neural Networks Methods and Applications in Bio-/Neuroinformatics. Petia Koprinkova-Hristova, Valeri Mladenov, Nikola K. Kasabov (Eds.), Springer, 2014, pp. 221-244.
Multi-Step-Ahead prediction, Laser Dataset
(Figure: feedforward data flow vs. recurrent data flow.)
27 / 28
A. Chernodub. Training Dynamic Neural Networks Using the Extended Kalman Filter for Multi-Step-Ahead Predictions // Artificial Neural Networks Methods and Applications in Bio-/Neuroinformatics. Petia Koprinkova-Hristova, Valeri Mladenov, Nikola K. Kasabov (Eds.), Springer, 2014, pp. 221-244.
Catching long-term dependencies: T = 50, 100, 200, …
R. Pascanu, Y. Bengio. On the Difficulty of Training Recurrent Neural Networks // Tech. Rep. arXiv:1211.5063, Universite de Montreal (2012).
28 / 28
RNN vs FFNN

                                    | Deep Feed-Forward NNs | Recurrent NNs
Pre-training stage                  | Yes                   | No (rarely)
Derivatives calculation             | Backpropagation (BP)  | Backpropagation Through Time (BPTT)
Multi-Step-Ahead Prediction Quality | Bad (-)               | Good (+)
Sequence Recognition Quality        | Bad (-)               | Good (+)
Vanishing gradient effect           | Present (-)           | Present (-)
e-mail: a.chernodub@gmail.com web: http://zzphoto.me
Thanks!