
Christof Angermueller

https://cangermueller.com

cangermueller@gmail.com

@cangermueller

University of Cambridge, European Bioinformatics Institute (EBI-EMBL)

Cambridge, UK

Generative RNNs for sequence modeling

2016-01-21

Sequence modeling

•  Wanted: probability over sequences x

x ~ P(x_1, ..., x_T)

Applications (each illustrated with an example sequence X on the original slide):

•  Text translation
•  Speech recognition
•  Bioinformatics
•  Music modeling

Sequence modeling

•  Wanted: probability over sequences x

x ~ P(x_1, ..., x_T)

Models:

•  n-gram → Markov assumption
•  LDS → simple linear dynamics
•  HMM → discrete hidden state
•  RNN → non-linear transition function, continuous hidden state

The factorizations behind this comparison are sketched below.
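To make the comparison concrete, here is the underlying factorization in standard notation (a sketch, not taken verbatim from the slides): the chain rule is exact, an n-gram truncates the history (Markov assumption), and an RNN summarizes the full history in a continuous hidden state.

```latex
% Exact chain-rule factorization of the joint distribution
P(x_1, \dots, x_T) = \prod_{t=1}^{T} P(x_t \mid x_1, \dots, x_{t-1})

% n-gram: Markov assumption, only the previous n-1 symbols matter
P(x_t \mid x_1, \dots, x_{t-1}) \approx P(x_t \mid x_{t-n+1}, \dots, x_{t-1})

% RNN: a continuous hidden state h_{t-1} summarizes the entire history
P(x_t \mid x_1, \dots, x_{t-1}) \approx P(x_t \mid h_{t-1}), \qquad h_t = f_h(x_t, h_{t-1})
```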

Discriminative RNN

[Figure: inputs x_1, x_2, x_3 ("I like RNNs") feed hidden states h_1, h_2, h_3, which emit labels y_1, y_2, y_3 ("Subject Verb Object")]

P(y_1, ..., y_T | x_1, ..., x_T)

→ Requires target labels Y

Generative RNN

[Figure: inputs x_1, x_2, x_3 ("I like RNNs") feed hidden states h_1, h_2, h_3; each output y_t parameterizes the distribution over the next word: P(x_2 | y_1), P(x_3 | y_2), P(x_4 | y_3)]

Goal: x ~ P(x_1, ..., x_T)

Idea: predict the next word

h_t = f_h(W^{xh} x_t + W^{hh} h_{t-1} + b_h)
y_t = f_y(W^{hy} h_t + b_y)

The output y_t parameterizes P(x_{t+1} | y_t); the product of these conditionals is the likelihood, which serves as the loss function.

•  Training via BPTT
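As a minimal, runnable sketch of these update equations and the next-word loss (plain numpy, with tanh and softmax as hypothetical choices for f_h and f_y; the sizes are illustrative, not the model from the talk):

```python
import numpy as np

V, H = 50, 32                              # hypothetical vocabulary and hidden sizes
rng = np.random.default_rng(0)
Wxh = rng.normal(scale=0.1, size=(H, V))   # input-to-hidden weights W^{xh}
Whh = rng.normal(scale=0.1, size=(H, H))   # hidden-to-hidden weights W^{hh}
Why = rng.normal(scale=0.1, size=(V, H))   # hidden-to-output weights W^{hy}
bh, by = np.zeros(H), np.zeros(V)

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def neg_log_likelihood(x):
    """x: list of word ids; returns -log P(x_2, ..., x_T | x_1), the training loss."""
    h = np.zeros(H)
    nll = 0.0
    for t in range(len(x) - 1):
        x_onehot = np.zeros(V); x_onehot[x[t]] = 1.0
        h = np.tanh(Wxh @ x_onehot + Whh @ h + bh)   # h_t = f_h(W^{xh} x_t + W^{hh} h_{t-1} + b_h)
        y = softmax(Why @ h + by)                    # y_t parameterizes P(x_{t+1} | y_t)
        nll -= np.log(y[x[t + 1]])                   # next-word cross-entropy term
    return nll

print(neg_log_likelihood([3, 7, 1, 0]))              # toy sequence of word ids
```

In training, the gradient of this loss would be computed with BPTT; the backward pass is omitted here.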

Generating (sampling) sequences

[Figure: start from x_0 = <START> with initial hidden state h_0; at each step, sample the next word from the predicted distribution and feed it back as the next input: I ~ P(x_2 | y_1), like ~ P(x_3 | y_2), RNNs ~ P(x_4 | y_3), <STOP> ~ P(x_5 | y_4)]

•  Feed each sampled word back in as the next input
•  Stop once <STOP> is sampled
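A minimal sketch of the sampling loop just described (it reuses V, H, the weights, softmax, and rng from the previous sketch; treating word ids 0 and 1 as <START> and <STOP> is an illustrative assumption):

```python
def sample_sequence(max_len=20, start_id=0, stop_id=1):
    """Ancestral sampling: feed each sampled word back in as the next input."""
    h = np.zeros(H)
    x_t = start_id                        # x_0 = <START>
    words = []
    for _ in range(max_len):
        x_onehot = np.zeros(V); x_onehot[x_t] = 1.0
        h = np.tanh(Wxh @ x_onehot + Whh @ h + bh)
        y = softmax(Why @ h + by)         # distribution over the next word
        x_t = rng.choice(V, p=y)          # x_{t+1} ~ P(x_{t+1} | y_t)
        if x_t == stop_id:                # stop once <STOP> is sampled
            break
        words.append(x_t)
    return words

print(sample_sequence())
```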

Example: Wikipedia

Train on Wikipedia text, then generate new text (training and generated samples shown on the original slide).

•  x_t are characters instead of words!
•  Fewer parameters

char-RNN
https://github.com/karpathy/char-rnn
http://karpathy.github.io/2015/05/21/rnn-effectiveness/

Example: Linux source code

Conditional language model

P(x_1, ..., x_T | z_1, ..., z_L)

[Figure: an encoder RNN reads z_1, z_2, z_3 into hidden states h̃_1, h̃_2, h̃_3; a decoder RNN generates x_1, x_2, x_3 with hidden states h_1, h_2, h_3 and outputs y_1, y_2, y_3]

Encoder / Decoder: initialize the decoder with the last encoder hidden state.
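A minimal sketch of this coupling (it reuses the decoder weights and helpers from the earlier sketches; the encoder weight names Wzh, Whh_enc and the source vocabulary size Vz are hypothetical):

```python
Vz = 40                                        # hypothetical source vocabulary size
Wzh = rng.normal(scale=0.1, size=(H, Vz))      # encoder input-to-hidden weights
Whh_enc = rng.normal(scale=0.1, size=(H, H))   # encoder hidden-to-hidden weights
bh_enc = np.zeros(H)

def encode(z):
    """Run the encoder RNN over z_1, ..., z_L and return its last hidden state."""
    h = np.zeros(H)
    for z_t in z:
        z_onehot = np.zeros(Vz); z_onehot[z_t] = 1.0
        h = np.tanh(Wzh @ z_onehot + Whh_enc @ h + bh_enc)
    return h

def conditional_sample(z, max_len=20, start_id=0, stop_id=1):
    """Sample x ~ P(x_1, ..., x_T | z_1, ..., z_L)."""
    h = encode(z)                    # initialize the decoder with the last encoder hidden state
    x_t, words = start_id, []
    for _ in range(max_len):
        x_onehot = np.zeros(V); x_onehot[x_t] = 1.0
        h = np.tanh(Wxh @ x_onehot + Whh @ h + bh)
        y = softmax(Why @ h + by)
        x_t = rng.choice(V, p=y)
        if x_t == stop_id:
            break
        words.append(x_t)
    return words

print(conditional_sample([5, 2, 9]))             # toy source sequence of ids
```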

Example: Shakespeare

Why, Salisbury must find his flesh and thought That which I am not aps, not a man and in fire, To show the reining of the raven and the wars To grace my hand reproach within, and not a fair are hand, That Caesar and my goodly father's world; When I was heaven of presence and our fleets, We spare with hours, but cut thy council I am great, Murdered and by thy master's ready there My power to give thee but so much as hell: Some service in the noble bondman here, Would show him to her wine. O, if you were a feeble sight, the courtesy of your law, Your sight and several breath, will wear the gods With his heads, and my hands are wonder'd at the deeds, So drop upon your lordship's head, and your opinion Shall be against your honour.

Image caption generation

Hand-writing generation

[Figure: handwriting of "He dismissed the idea"]

Hand-writing generation

•  x_{t,1}, x_{t,2}: spatial position
•  s_t: pen state (0 = up, 1 = down)

x_t = (x_{t,1}, x_{t,2}, s_t)

Challenges

•  Multi-dimensional output
•  Multi-modal, with correlation between x_{t,1} and x_{t,2}
•  Cannot be represented by a simple output function such as

y_t = σ(W^{hy} h_t + b_y)

Solution: Mixture Density Network (MDN)

The output vector parameterizes a conditional probability distribution:

→ the RNN predicts the parameters of a Gaussian Mixture Model (GMM); a sketch follows.
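A minimal sketch of such a mixture density output layer for the handwriting case (self-contained numpy; the sizes, weight names, and the diagonal-covariance simplification are assumptions of this sketch, not the exact parameterization of Graves, 2013):

```python
import numpy as np

rng = np.random.default_rng(0)
H, K = 32, 5                             # hidden units and mixture components (hypothetical)
# One linear layer maps h_t to all parameters:
# K mixture weights + 2K means + 2K std devs + 1 pen-state logit
Wmdn = rng.normal(scale=0.1, size=(5 * K + 1, H))
bmdn = np.zeros(5 * K + 1)

def mdn_params(h):
    """Map a hidden state h_t to the parameters of P(x_{t+1} | h_t)."""
    out = Wmdn @ h + bmdn
    pi = np.exp(out[:K]); pi /= pi.sum()             # mixture weights (softmax)
    mu = out[K:3 * K].reshape(K, 2)                  # component means for (x_{t,1}, x_{t,2})
    sigma = np.exp(out[3 * K:5 * K]).reshape(K, 2)   # std devs, kept positive via exp
    p_pen = 1.0 / (1.0 + np.exp(-out[-1]))           # Bernoulli pen-down probability
    return pi, mu, sigma, p_pen

def mdn_sample(h):
    """Draw one (x_{t,1}, x_{t,2}, s_t) triple from the predicted mixture."""
    pi, mu, sigma, p_pen = mdn_params(h)
    k = rng.choice(K, p=pi)                          # pick a mixture component
    x1, x2 = rng.normal(mu[k], sigma[k])             # sample the pen position
    pen = int(rng.random() < p_pen)                  # sample the pen state
    return x1, x2, pen

print(mdn_sample(rng.normal(size=H)))
```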

[Figure: generated handwriting samples, e.g. "More of national temperament"]

Can we choose a writing style?

Conditional language model

[Figure: encoder RNN (inputs z_1, z_2, z_3, hidden states h̃_1, h̃_2, h̃_3) conditioning a decoder RNN (inputs x_1, x_2, x_3, hidden states h_1, h_2, h_3, outputs y_1, y_2, y_3); a seed sequence pair and a target sequence pair are shown, illustrated with "He dismissed the idea" and "I love RNNs"]

http://www.cs.toronto.edu/~graves/handwriting.html

Same idea: Chinese characters

http://blog.otoro.net/2015/12/28/recurrent-net-dreams-up-fake-chinese-characters-in-vector-format-with-tensorflow/

Polyphonic music modeling

[Figure: piano-roll representation with time on one axis and a binary note vector x_t on the other]

Challenges

1.  Correlation along time
2.  Correlation between notes
    •  High-dimensional, multi-modal output (see the factorization sketched after this list)
3.  Time-dependent, non-local factors of variation
    •  Theme, tune, chord progression, ...
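To make challenges 1 and 2 concrete (a standard way of writing it, not taken verbatim from the slides): each time step is a binary vector over N notes, and a plain per-note sigmoid output would treat the notes as conditionally independent, ignoring which notes tend to sound together.

```latex
% Correlation along time: autoregressive factorization over time steps
P(x_1, \dots, x_T) = \prod_{t=1}^{T} P(x_t \mid x_{<t}),
\qquad x_t \in \{0, 1\}^{N}

% Correlation between notes: a factorized sigmoid output assumes
P(x_t \mid x_{<t}) \approx \prod_{n=1}^{N} \mathrm{Bernoulli}\left(x_{t,n} \mid y_{t,n}\right)

% which cannot capture the joint structure over the 2^N note patterns;
% hence an RBM (or NADE) is used for the per-time-step distribution.
```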

RNN: correlation along time

[Figure: an RNN unrolled along the time axis of the piano roll]

RBM: correlation between notes

[Figure: an RBM over the binary note vector at a single time step]

RNN-RBM (Boulanger-Lewandowski et al., 2012)

[Figure: an RBM over the notes at each time step, combined with an RNN along time: RNN + RBM]

RBM 101

Likelihood:

P(v) = Σ_h P(v, h) = (1/Z) Σ_h exp(−E(v, h))

•  Conditional independence → inference of P(h | v) is easy
•  Intractable partition function Z → sampling v ~ P(v) is intractable → Contrastive Divergence (CD) approximation
•  Learning requires sampling
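A minimal sketch of one CD-1 update for a binary RBM (self-contained numpy; the sizes are hypothetical): infer P(h | v) exactly, take one Gibbs step to get a reconstruction, and use the difference of the two statistics as the gradient estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
Nv, Nh = 88, 50                          # visible units (e.g. notes) and hidden units
W = rng.normal(scale=0.01, size=(Nh, Nv))
bv, bh = np.zeros(Nv), np.zeros(Nh)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cd1_step(v0):
    """One Contrastive Divergence (CD-1) gradient estimate for a binary RBM."""
    # Positive phase: P(h | v) factorizes over hidden units, so inference is easy
    ph0 = sigmoid(W @ v0 + bh)
    h0 = (rng.random(Nh) < ph0).astype(float)
    # Negative phase: one Gibbs step back to the visibles and up again
    pv1 = sigmoid(W.T @ h0 + bv)
    v1 = (rng.random(Nv) < pv1).astype(float)
    ph1 = sigmoid(W @ v1 + bh)
    # Gradient estimate: data statistics minus reconstruction statistics
    dW = np.outer(ph0, v0) - np.outer(ph1, v1)
    dbv = v0 - v1
    dbh = ph0 - ph1
    return dW, dbv, dbh, v1              # v1 is the CD sample of the visibles

v = (rng.random(Nv) < 0.1).astype(float) # toy binary note vector
dW, dbv, dbh, v_sample = cd1_step(v)
```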

Sampling

RNN + RBM, at each time step:

1.  Update the RNN hidden state
2.  Predict the RBM bias terms
3.  Sample v(t) from the RBM via CD

Learning

RNN + RBM, at each time step:

1.  Propagate h'(t−1)
2.  Predict the bias terms b_h(t) and b_v(t)
3.  Sample v(t) from the RBM via CD

P(v) = Σ_h P(v, h) = (1/Z) Σ_h exp(−E(v, h))

4.  Estimate the gradients
5.  Back-propagate via BPTT
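A minimal sketch of one such RNN-RBM time step, tying the pieces together (it reuses W, bv, bh, sigmoid, and rng from the CD sketch above; the coupling weight names Wuv, Wuh, Wvu, Wuu are hypothetical, and BPTT itself is omitted):

```python
Nu = 64                                           # RNN hidden units (hypothetical)
Wuv = rng.normal(scale=0.01, size=(Nv, Nu))       # RNN state -> visible bias
Wuh = rng.normal(scale=0.01, size=(Nh, Nu))       # RNN state -> hidden bias
Wvu = rng.normal(scale=0.01, size=(Nu, Nv))       # observation -> RNN state
Wuu = rng.normal(scale=0.01, size=(Nu, Nu))       # RNN state -> RNN state
bu = np.zeros(Nu)

def rnn_rbm_step(v_t, u_prev):
    """One time step of the training loop: biases from the RNN, CD on the RBM."""
    # Steps 1-2: the propagated RNN state u(t-1) predicts the bias terms b_v(t), b_h(t)
    bv_t = bv + Wuv @ u_prev
    bh_t = bh + Wuh @ u_prev
    # Step 3: one CD step for the RBM at this time step, using the predicted biases
    ph0 = sigmoid(W @ v_t + bh_t)
    h0 = (rng.random(Nh) < ph0).astype(float)
    pv1 = sigmoid(W.T @ h0 + bv_t)
    v1 = (rng.random(Nv) < pv1).astype(float)
    ph1 = sigmoid(W @ v1 + bh_t)
    # Step 4: gradient estimate (data minus reconstruction statistics)
    dW = np.outer(ph0, v_t) - np.outer(ph1, v1)
    # Update the RNN state from the observation; in step 5 gradients would
    # flow back through this recurrence via BPTT
    u_t = np.tanh(Wvu @ v_t + Wuu @ u_prev + bu)
    return u_t, dW

u = np.zeros(Nu)
for v_t in [(rng.random(Nv) < 0.1).astype(float) for _ in range(4)]:   # toy piano roll
    u, dW = rnn_rbm_step(v_t, u)
```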

Results

[Figure: comparison of RBM, RNN, RNN-RBM, and RNN-NADE]

http://www-etud.iro.umontreal.ca/~boulanni/icml2012

Big picture

[Figure: an RNN with inputs x_1, x_2, x_3, hidden states h_1, h_2, h_3, and outputs y_1, y_2, y_3; the RNN prediction parameterizes the likelihood function]

•  RNN-GMM (Graves, 2013)
•  RNN-RBM (Boulanger-Lewandowski, 2012)
•  RNN-NADE (Boulanger-Lewandowski, 2012)
•  RNN-DBN (Gan et al., 2015)

More latent-variable RNNs

•  Bayer and Osendorfer, 2014: Learning Stochastic Recurrent Networks
   •  Stochastic Gradient Variational Bayes (SGVB) to speed up training
•  Krishnan, Shalit, and Sontag, 2015: Deep Kalman Filters
•  Chung et al., 2015: A Recurrent Latent Variable Model for Sequential Data

Conclusions

•  Generative RNNs enable sequence modeling
•  Different output functions make it possible to model data of different modalities
•  Latent-variable RNNs make it possible to model highly structured data, at the cost of runtime

References

•  Graves, "Generating Sequences With Recurrent Neural Networks."
•  Boulanger-Lewandowski, Bengio, and Vincent, "Modeling Temporal Dependencies in High-Dimensional Sequences."
•  Boulanger-Lewandowski, Bengio, and Vincent, "High-Dimensional Sequence Transduction."
•  Bayer and Osendorfer, "Learning Stochastic Recurrent Networks."
•  Chung et al., "A Recurrent Latent Variable Model for Sequential Data."
•  Gan et al., "Deep Temporal Sigmoid Belief Networks for Sequence Modeling."
•  Bowman et al., "Generating Sentences from a Continuous Space."

References (recommended)

•  Krishnan, Shalit, and Sontag, "Deep Kalman Filters."
•  Gregor et al., "DRAW."
•  Kingma and Welling, "Auto-Encoding Variational Bayes."
•  Larochelle and Murray, "The Neural Autoregressive Distribution Estimator."
•  Brakel, Stroobandt, and Schrauwen, "Training Energy-Based Models for Time-Series Imputation."
•  Goel and Vohra, "Learning Temporal Dependencies in Data Using a DBN-BLSTM."
•  Fabius and van Amersfoort, "Variational Recurrent Auto-Encoders."
•  Jozefowicz, Zaremba, and Sutskever, "An Empirical Exploration of Recurrent Network Architectures."