
Page 1: Generative RNNs for sequence modeling

Christof Angermueller

https://cangermueller.com

[email protected]

@cangermueller

University of Cambridge, European Bioinformatics Institute (EBI-EMBL)

Cambridge, UK

Generative RNNs for sequence modeling

2016-01-21

Page 2: Sequence modeling

Sequence modeling

• Wanted: a probability distribution over sequences x

x ~ P(x1, ..., xT)

Applications
• Text translation
• Speech recognition
• Bioinformatics
• Music modeling

[Each application illustrated with an example sequence X.]

Page 3: Sequence modeling

Sequence modeling

• Wanted: a probability distribution over sequences x

x ~ P(x1, ..., xT)

Models
• n-gram → Markov assumption
• LDS → simple linear dynamics
• HMM → discrete hidden state
• RNN → non-linear transition function, continuous hidden state

Page 4: Discriminative RNN

Discriminative RNN

[Diagram: inputs X = x1..x3 ("I like RNNs") feed hidden states h1..h3, which emit labels Y = y1..y3 ("Subject", "Verb", "Object").]

P(y1, ..., yT | x1, ..., xT)

→ Requires target labels Y

Page 5: Generative RNN

Generative RNN

[Diagram: an unlabeled sequence X = x1..x3 ("I like RNNs").]

x ~ P(x1, ..., xT)

Idea: factorize P(x1, ..., xT) = ∏_t P(xt | x1, ..., xt−1) and let an RNN predict each next element.

Page 6: Generative RNN

Generative RNN

[Diagram: inputs x1..x3 ("I like RNNs") feed hidden states h1..h3; no target labels are needed.]

Idea: try to predict the next word.

h_t = f_h(W_xh x_t + W_hh h_{t−1} + b_h)

Page 7: Generative RNN

Generative RNN

[Diagram: as before, with an output y1 attached to h1; y1 parameterizes P(x2 | y1).]

Idea: try to predict the next word.

h_t = f_h(W_xh x_t + W_hh h_{t−1} + b_h)
y_t = f_y(W_hy h_t + b_y)

y_t parameterizes the distribution over the next word, P(x_{t+1} | y_t).
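Below is a minimal NumPy sketch of these two update equations, with f_h = tanh and f_y = softmax; all names, sizes, and the toy vocabulary are illustrative assumptions, not code from the slides.

```python
import numpy as np

V, H = 4, 8                                   # toy vocabulary and hidden size (assumed)
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(H, V))     # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(H, H))     # hidden-to-hidden weights
W_hy = rng.normal(scale=0.1, size=(V, H))     # hidden-to-output weights
b_h, b_y = np.zeros(H), np.zeros(V)

def step(x_t, h_prev):
    """One step: h_t = f_h(W_xh x_t + W_hh h_{t-1} + b_h), y_t = f_y(W_hy h_t + b_y)."""
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)
    logits = W_hy @ h_t + b_y
    y_t = np.exp(logits - logits.max())       # softmax, numerically stabilized
    y_t /= y_t.sum()
    return h_t, y_t                           # y_t parameterizes P(x_{t+1} | y_t)

h_1, y_1 = step(np.eye(V)[2], np.zeros(H))    # feed a one-hot token into h_0 = 0
```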

Page 8: Generative RNN

Generative RNN

[Diagram: outputs y1 and y2 parameterize P(x2 | y1) and P(x3 | y2).]

Idea: try to predict the next word.

Page 9: Generative RNN

Generative RNN

[Diagram: outputs y1..y3 parameterize P(x2 | y1), P(x3 | y2), P(x4 | y3).]

Idea: try to predict the next word.

Likelihood: P(x2 | y1) · P(x3 | y2) · P(x4 | y3) · …
Loss function: the negative log-likelihood

• Training via BPTT
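Continuing the toy sketch above, the loss sums the negative log-probability of each observed next token under the softmax outputs; BPTT then differentiates this sum through time (a sketch, not the slides' code):

```python
def nll(xs):
    """Negative log-likelihood of a list of one-hot tokens xs[0..T-1]."""
    h, loss = np.zeros(H), 0.0
    for t in range(len(xs) - 1):
        h, y = step(xs[t], h)                 # y parameterizes P(x_{t+1} | y_t)
        loss -= np.log(y @ xs[t + 1])         # -log P(observed next token)
    return loss

tokens = [np.eye(V)[i] for i in (0, 2, 1, 3)] # a toy "sentence"
print(nll(tokens))
```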

Page 10: Sampling sequences

Sampling sequences

[Diagram: sampling starts from the initial hidden state h0 and the <START> token; the first output parameterizes P(x2 | h1).]

Page 11: Generating sequences

Generating sequences

[Diagram: the first word is sampled: "I" ~ P(x2 | y1).]

Page 12: Generating sequences

Generating sequences

[Diagram: the sampled word "I" is fed back as the next input, giving h1, y1, and P(x3 | h2).]

Page 13: Generating sequences

Generating sequences

[Diagram: the next word is sampled: "like" ~ P(x3 | y2).]

Page 14: Generating sequences

Generating sequences

[Diagram: sampling continues, feeding each sampled word back in as the next input, until the stop token: "I like RNNs <STOP>", with <STOP> ~ P(x5 | y4).]
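The sampling loop sketched below mirrors the diagram: each sampled token is fed back as the next input (continuing the toy setup above; treating <START> and <STOP> as reserved indices 0 and V−1 is an assumption):

```python
def sample(max_len=20, start=0, stop=V - 1):
    """Ancestral sampling from the generative RNN."""
    h, token, out = np.zeros(H), start, []
    for _ in range(max_len):
        h, y = step(np.eye(V)[token], h)      # y = P(next token | history)
        token = rng.choice(V, p=y)            # x_{t+1} ~ P(. | y_t)
        if token == stop:                     # stop when <STOP> is sampled
            break
        out.append(token)
    return out
```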

Page 16: Example: Wikipedia

Example: Wikipedia

[Figure: train on Wikipedia text, then generate new text.]

• x_t are characters instead of words!
• Fewer parameters

Page 17: char-RNN

char-RNN

https://github.com/karpathy/char-rnn http://karpathy.github.io/2015/05/21/rnn-effectiveness/

Page 18: Example: Linux source code

Example: Linux source code

Page 19: Conditional language model

Conditional language model

P(x1, ..., xT | z1, ..., zL)

[Diagram: an encoder RNN reads z1..z3 into hidden states h̃1..h̃3; a decoder RNN then generates x1..x3 with outputs y1..y3.]

Initialize the decoder with the last encoder hidden state.
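A minimal sketch of this wiring, reusing the generative RNN above as the decoder; the encoder weights are assumptions, and only the state hand-off comes from the slide:

```python
W_zh = rng.normal(scale=0.1, size=(H, V))     # encoder input-to-hidden (assumed)
U_hh = rng.normal(scale=0.1, size=(H, H))     # encoder hidden-to-hidden (assumed)

def encode(zs):
    """Run the encoder over z_1..z_L and return its last hidden state."""
    h = np.zeros(H)
    for z in zs:
        h = np.tanh(W_zh @ z + U_hh @ h + b_h)
    return h

# Decoding: start the generative RNN from encode(zs) instead of zeros,
# so generation is conditioned on the input sequence z_1..z_L.
```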

Page 20: Example: Shakespeare

Example: Shakespeare

Why, Salisbury must find his flesh and thought That which I am not aps, not a man and in fire, To show the reining of the raven and the wars To grace my hand reproach within, and not a fair are hand, That Caesar and my goodly father's world; When I was heaven of presence and our fleets, We spare with hours, but cut thy council I am great, Murdered and by thy master's ready there My power to give thee but so much as hell: Some service in the noble bondman here, Would show him to her wine. O, if you were a feeble sight, the courtesy of your law, Your sight and several breath, will wear the gods With his heads, and my hands are wonder'd at the deeds, So drop upon your lordship's head, and your opinion Shall be against your honour.

Page 21: Image caption generation

Image caption generation

Page 22: Hand-writing generation

Hand-writing generation

[Figure: generated handwriting of "He dismissed the idea".]

Page 23: Hand-writing generation

Hand-writing generation

x_t = (x_{t,1}, x_{t,2}, s_t)

• x_{t,1}, x_{t,2}: spatial position
• s_t: pen state (0 = up, 1 = down)

Challenges
• Multi-dimensional output
• Multi-modal, correlated (x_{t,1}, x_{t,2})
• Cannot be represented by a simple output function:

y_t = σ(W_hy h_t + b_y)

Page 24: Solution: Mixture Density Network

Solution: Mixture Density Network

[Figure: the output vector defines a conditional probability distribution over the next point.]

→ The RNN predicts the parameters of a Gaussian Mixture Model (GMM).
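A hedged sketch of such a mixture density output layer for the pen data, continuing the toy setup: the hidden state is mapped to mixture weights, 2-D means and scales, plus a Bernoulli pen-state probability. The component count, diagonal covariances, and layer shapes are assumptions; Graves (2013) additionally models per-component correlations.

```python
K = 3                                             # mixture components (assumed)
W_mdn = rng.normal(scale=0.1, size=(5 * K + 1, H))
b_mdn = np.zeros(5 * K + 1)

def mdn_sample(h_t):
    """Sample one pen point (offset, pen state) from the GMM the RNN predicts."""
    o = W_mdn @ h_t + b_mdn
    pi = np.exp(o[:K]) / np.exp(o[:K]).sum()      # mixture weights via softmax
    mu = o[K:3 * K].reshape(K, 2)                 # component means (x_1, x_2)
    sigma = np.exp(o[3 * K:5 * K]).reshape(K, 2)  # positive scales via exp
    p_down = 1.0 / (1.0 + np.exp(-o[-1]))         # Bernoulli pen-down probability
    k = rng.choice(K, p=pi)                       # choose a component
    offset = rng.normal(mu[k], sigma[k])          # sample the 2-D position
    return offset, rng.random() < p_down
```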

Page 25: Samples

Samples

[Figure: generated handwriting samples, e.g. "More of national temperament".]

Page 26: Can we choose a writing style?

Can we choose a writing style?

Page 27: Conditional language model

Conditional language model

[Diagram: encoder-decoder as before (z1..z3 → h̃1..h̃3 feeding the decoder). A seed sequence pair ("I love RNNs" with its handwriting) primes the style for generating the target sequence pair ("He dismissed the idea").]

http://www.cs.toronto.edu/~graves/handwriting.html

Page 28: Same idea: Chinese characters

Same idea: Chinese characters

http://blog.otoro.net/2015/12/28/recurrent-net-dreams-up-fake-chinese-characters-in-vector-format-with-tensorflow/

Page 29: Polyphonic music modeling

Polyphonic music modeling

[Figure: piano roll; x is a binary matrix of notes over time.]

Page 30: Polyphonic music modeling

Polyphonic music modeling

[Figure: piano roll as before.]

Challenges
1. Correlation along time

Page 31: Polyphonic music modeling

Polyphonic music modeling

[Figure: piano roll as before.]

Challenges
1. Correlation along time
2. Correlation between notes
   • High-dimensional, multi-modal output

Page 32: Polyphonic music modeling

Polyphonic music modeling

[Figure: piano roll as before.]

Challenges
1. Correlation along time
2. Correlation between notes
   • High-dimensional, multi-modal output
3. Time-dependent, non-local factors of variation
   • Theme, tune, chord progression, …

Page 33: RNN: correlation along time

RNN: correlation along time

[Figure: an RNN models the correlation of the piano roll along the time axis.]

Page 34: RBM: correlation between notes

RBM: correlation between notes

[Figure: a restricted Boltzmann machine (RBM) models the correlation between simultaneous notes within one time slice.]

Page 35: RNN-RBM, 2013

RNN-RBM, 2013

[Figure: piano roll; an RNN (along time) combined with an RBM (across notes).]

Page 36: RBM 101

RBM 101

Likelihood

P(v) = Σ_h P(v, h) = (1/Z) Σ_h exp(−E(v, h))

• Conditional independence → inference of P(h | v) is easy
• Intractable partition function Z → sampling v ~ P(v) is intractable → Contrastive Divergence (CD) approximation
• Learning requires sampling
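A self-contained sketch of block Gibbs sampling and a single CD-1 parameter update for a binary RBM; sizes and learning rate are illustrative assumptions.

```python
import numpy as np

nv, nh = 6, 4                                  # visible / hidden sizes (illustrative)
rng = np.random.default_rng(1)
W = rng.normal(scale=0.1, size=(nh, nv))
b_v, b_h = np.zeros(nv), np.zeros(nh)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sample_h(v):                               # P(h | v) factorizes: easy inference
    p = sigmoid(W @ v + b_h)
    return p, (rng.random(nh) < p).astype(float)

def sample_v(h):                               # P(v | h) factorizes too
    p = sigmoid(W.T @ h + b_v)
    return p, (rng.random(nv) < p).astype(float)

def cd1(v0, lr=0.05):
    """One CD-1 update: contrast data statistics with a one-step Gibbs sample."""
    global W, b_v, b_h
    p_h0, h0 = sample_h(v0)
    _, v1 = sample_v(h0)                       # "negative" sample after one Gibbs step
    p_h1, _ = sample_h(v1)
    W += lr * (np.outer(p_h0, v0) - np.outer(p_h1, v1))
    b_v += lr * (v0 - v1)
    b_h += lr * (p_h0 - p_h1)
```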

Page 37: Sampling

Sampling

[Figure: RNN + RBM.]

1. RNN: update the hidden state
2. RNN: predict the RBM bias terms
3. Sample v(t) from the RBM via CD
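A hedged sketch of one such time step, reusing the RBM sketch above; the bias-prediction weights and the RNN state size are assumptions in the spirit of Boulanger-Lewandowski et al. (2012).

```python
nr = 5                                         # RNN hidden size (assumed)
W_uv = rng.normal(scale=0.1, size=(nv, nr))    # RNN state -> visible bias (assumed)
W_uh = rng.normal(scale=0.1, size=(nh, nr))    # RNN state -> hidden bias (assumed)

def rnn_rbm_step(u_t, gibbs_steps=25):
    """Steps 1-3: given RNN state u_t, predict biases, then Gibbs-sample v(t)."""
    global b_v, b_h
    b_v, b_h = W_uv @ u_t, W_uh @ u_t          # 2. time-dependent RBM biases
    v = (rng.random(nv) < 0.5).astype(float)   # arbitrary Gibbs initialization
    for _ in range(gibbs_steps):               # 3. block Gibbs sampling in the RBM
        _, h = sample_h(v)
        _, v = sample_v(h)
    return v                                   # v(t) ~ P(v | history)

v_t = rnn_rbm_step(rng.normal(size=nr))        # one sampled time slice
```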

Page 38: Learning

Learning

[Figure: RNN + RBM.]

1. Propagate h'(t−1)
2. Predict bias terms b_h(t) and b_v(t)
3. Sample v(t) from the RBM via CD

P(v) = Σ_h P(v, h) = (1/Z) Σ_h exp(−E(v, h))

4. Estimate gradients
5. Back-propagate via BPTT

Page 39: Results

Results

[Figure: results comparing RBM, RNN, RNN-RBM, and RNN-NADE.]

http://www-etud.iro.umontreal.ca/~boulanni/icml2012

Page 40: Big picture

Big picture

[Diagram: a prediction RNN (x_t, h_t, y_t as before) whose output parameterizes a likelihood function; different likelihood models plug into the same architecture:]

• RNN-GMM (Graves, 2013)
• RNN-RBM (Boulanger-Lewandowski, 2012)
• RNN-NADE (Boulanger-Lewandowski, 2012)
• RNN-DBN (Gan et al., 2015)

Page 41: More latent-variable RNNs

More latent-variable RNNs

• Bayer and Osendorfer, 2014: Learning Stochastic Recurrent Networks
  • Stochastic Gradient Variational Bayes (SGVB) to speed up training
• Krishnan, Shalit, and Sontag, 2015: Deep Kalman Filters
• Chung et al., 2015: A Recurrent Latent Variable Model for Sequential Data

Page 42: Conclusions

Conclusions

• Generative RNNs enable sequence modeling
• Different output functions make it possible to model data of different modalities
• Latent-variable RNNs can model highly structured data, at the cost of runtime

Page 43: References

References

• Graves, "Generating Sequences With Recurrent Neural Networks."
• Boulanger-Lewandowski, Bengio, and Vincent, "Modeling Temporal Dependencies in High-Dimensional Sequences."
• Boulanger-Lewandowski, Bengio, and Vincent, "High-Dimensional Sequence Transduction."
• Bayer and Osendorfer, "Learning Stochastic Recurrent Networks."
• Chung et al., "A Recurrent Latent Variable Model for Sequential Data."
• Gan et al., "Deep Temporal Sigmoid Belief Networks for Sequence Modeling."
• Bowman et al., "Generating Sentences from a Continuous Space."

Page 44: References

References (continued)

• Krishnan, Shalit, and Sontag, "Deep Kalman Filters."
• Gregor et al., "DRAW."
• Kingma and Welling, "Auto-Encoding Variational Bayes."
• Larochelle and Murray, "The Neural Autoregressive Distribution Estimator."
• Brakel, Stroobandt, and Schrauwen, "Training Energy-Based Models for Time-Series Imputation."
• Goel and Vohra, "Learning Temporal Dependencies in Data Using a DBN-BLSTM."
• Fabius and van Amersfoort, "Variational Recurrent Auto-Encoders."
• Jozefowicz, Zaremba, and Sutskever, "An Empirical Exploration of Recurrent Network Architectures."