Recurrent Neural Networks - Lecture 11, Part A
Yaniv Bronhaim 11/6/2018
Outline
- Feedforward networks revisit
- The structure of Recurrent Neural Networks (RNN)
- RNN Architectures
- Bidirectional RNNs and Deep RNNs
- Backpropagation through time (BPTT)
- Natural Language Processing example
- “The unreasonable effectiveness” of RNNs (Andrej Karpathy)
- RNN Interpretations - Neural science with RNNs
- Image captioning with ConvNets and RNNs
- Summary
Feedforward network
Input - representation of the valid inputs
Y - Prediction (classification/regression)
A[x] - Network state in hidden layer x
W[x] - Network parameters for hidden layer x
b[x] - Bias for layer x
Feedforward networks
What will David do tonight?
Feedforward networks
What will David do tonight?
Possible activities: Party, Sleep, Training
One-hot encodings: Party = [1, 0, 0], Sleep = [0, 1, 0], Training = [0, 0, 1]
Feedforward networks
Possible inputs: Sunny Day, Rainy Day, one-hot encoded as Sunny Day = [1, 0], Rainy Day = [0, 1]
F(Sunny Day) = Party: the network outputs scores [0.8, 0.2, 0.0]; the expected score is [1, 0, 0], and training over time pushes the output toward it
F(Rainy Day) = Sleep: the network outputs scores [0.2, 0.7, 0.1]; the expected score is [0, 1, 0]
Feedforward Neural Network Mission
f(x, W) = Wx
The learned weight matrix W maps the one-hot input x (e.g. Sunny Day = [1, 0]) to a vector of activity scores (e.g. [0.7, 0.2, 0.1]).
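As a minimal sketch of the mapping above (plain numpy, with made-up weight values that reproduce the scores on the slide):

```python
import numpy as np

# Illustrative weights only: rows = (Party, Sleep, Training),
# columns = the one-hot input (Sunny Day, Rainy Day).
W = np.array([[0.7, 0.2],
              [0.2, 0.7],
              [0.1, 0.1]])

def f(x, W):
    """The feedforward mission f(x, W) = Wx: map a one-hot day to activity scores."""
    return W @ x

sunny_day = np.array([1.0, 0.0])
print(f(sunny_day, W))  # [0.7 0.2 0.1] -> "Party" gets the highest score
```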
Let's look at sequential data
Every morning David decides what to do:
- Running in the gym
- Riding a bicycle
- Swimming
Every new day David does the next activity on his list, in order.
Let's look at sequential data
- Unless it's a rainy day. In that case David stays in and sleeps instead of doing his daily practice.
- When the sun comes out again, he does the next activity after the one from his last training day.
First solution - Using yesterday's data in a FFN
- Feed the network two inputs, whether today is a sunny day and yesterday's activity, and predict today's activity.

First solution - Problem
- This works as long as we know the activity of the last sunny day, but the network only ever sees yesterday.
- After a rainy day, yesterday's activity is "sleep", which says nothing about which practice comes next.
- What we need is a "Func" whose output also carries data from the past, fed back in as part of the next day's input.
Sequential data
- The input is a sequence x of vectors (Xt is the vector at time t), plus the output of the previous run, which carries the "history" (the hidden layer is looped back from the past into the future)
- The output is a softmax layer predicting the next activity
RNN Equations
Taken from: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf
The same function with the same parameters is applied at every timestep t.
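The equation image from the referenced cs231n slide is the vanilla RNN update: h_t = tanh(W_hh * h_{t-1} + W_xh * x_t), with output y_t = W_hy * h_t. A minimal numpy sketch with random, untrained weights (sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 4, 8, 4

# Untrained, randomly initialised parameters - the same W is reused at every timestep.
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))

def rnn_step(h_prev, x_t):
    """One timestep: h_t = tanh(W_hh h_{t-1} + W_xh x_t), y_t = W_hy h_t."""
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)
    y_t = W_hy @ h_t
    return h_t, y_t

h = np.zeros(hidden_size)
sequence = [rng.normal(size=input_size) for _ in range(5)]
for x_t in sequence:
    h, y = rnn_step(h, x_t)   # the hidden state carries the "history" forward
```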
Recurrent Neural Network Computational Graph
Reusing the same weight matrix at every time step
Recurrent Neural Network
- W is shared across time - reduces the number of parameters
- Hidden state == Memory
- Sequences can have an arbitrary "temporal size" (length)
RNN Architectures
Taken from: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf
Examples: Image Captioning, Sentiment Classification, Sequence to Sequence, POS Classification
RNN: many to one
RNN: one to many
RNN: many to many
Taken from: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf
Multi Layer RNNs
More learning capacity
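A minimal sketch of stacking, assuming PyTorch (not otherwise used in this lecture): the hidden-state sequence of layer 1 becomes the input sequence of layer 2. All sizes are illustrative.

```python
import torch
import torch.nn as nn

# Two stacked recurrent layers: the hidden states of layer 1
# feed layer 2 at every timestep.
rnn = nn.RNN(input_size=4, hidden_size=8, num_layers=2, batch_first=True)

x = torch.randn(1, 5, 4)   # (batch, time, features), toy data
out, h_n = rnn(x)
print(out.shape)           # (1, 5, 8): top-layer hidden state at each step
print(h_n.shape)           # (2, 1, 8): final hidden state of each layer
```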
Bidirectional RNNs
- "I want to go to school/college, I have a lesson at 8 o'clock"
- We might want to consider words that appear after the word in focus
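A minimal sketch, again assuming PyTorch: the bidirectional flag runs one pass left-to-right and one right-to-left and concatenates the two hidden states at each position, so each word's representation also depends on the words that come after it.

```python
import torch
import torch.nn as nn

# Bidirectional RNN over a toy "sentence" of 6 word vectors.
rnn = nn.RNN(input_size=4, hidden_size=8, bidirectional=True, batch_first=True)

sentence = torch.randn(1, 6, 4)   # illustrative data
out, _ = rnn(sentence)
print(out.shape)                  # (1, 6, 16): forward and backward states concatenated
```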
Backpropagation through time (BPTT)
Truncated Backpropagation through time (BPTT)
- Process only a chunk of the sequence at a time and backprop through it to update W
- Carry hidden states forward in time forever, but only backpropagate for some smaller number of steps
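A minimal sketch of truncated BPTT, assuming PyTorch; sizes, data, and the chunk length of 25 are illustrative. The hidden state is carried across chunks, but detaching it cuts the gradient history so backprop only covers the current chunk.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
readout = nn.Linear(16, 8)
opt = torch.optim.SGD(list(rnn.parameters()) + list(readout.parameters()), lr=0.01)

long_sequence = torch.randn(1, 1000, 8)   # toy data standing in for a real sequence
targets = torch.randn(1, 1000, 8)
chunk = 25                                # backpropagate through at most 25 steps

h = torch.zeros(1, 1, 16)                 # hidden state carried forward "forever"
for start in range(0, 1000, chunk):
    x = long_sequence[:, start:start + chunk]
    y = targets[:, start:start + chunk]
    out, h = rnn(x, h)
    loss = ((readout(out) - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()                       # gradients flow only within this chunk
    opt.step()
    h = h.detach()                        # keep the value, cut the gradient history
```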
Concrete example - Character-level language model
Vocabulary: "h", "e", "l", "o"
Taken from: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf
4 separate training examples:
1. "e" should be likely given the context of "h".
2. “l” should be likely in the context
of “he”.
3. “l” should also be likely given the
context of “hel”.
4. “o” should be likely given the
context of “hell”.
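Written out as a small sketch (plain Python/numpy; the h/e/l/o vocabulary order is an illustrative choice), the word "hello" yields exactly these four input-to-target pairs:

```python
import numpy as np

vocab = ['h', 'e', 'l', 'o']
char_to_ix = {c: i for i, c in enumerate(vocab)}

def one_hot(c):
    v = np.zeros(len(vocab))
    v[char_to_ix[c]] = 1.0
    return v

word = "hello"
# 4 training examples: input character -> character that should be likely next.
pairs = [(word[t], word[t + 1]) for t in range(len(word) - 1)]
for x_char, y_char in pairs:
    x, y = one_hot(x_char), one_hot(y_char)
    print(f"input {x_char} {x} -> target {y_char} {y}")
```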
Generating Sequences
- Feed the sampled character back into the model to generate a sentence
Taken from: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf
- Start from <START>
- Training this on a lot of sentences gives us a language model: a way to predict the next character
- Continue sampling until <END>
https://github.com/karpathy/char-rnn
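A sketch of that sampling loop in numpy with untrained random weights (so the output is gibberish; a real model such as char-rnn would load learned parameters, and this toy vocabulary has no <END> token, so we stop at a fixed length): softmax the scores, sample a character, feed it back as the next input.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ['h', 'e', 'l', 'o']
V, H = len(vocab), 8

# Untrained weights, for illustration only.
W_xh = rng.normal(scale=0.1, size=(H, V))
W_hh = rng.normal(scale=0.1, size=(H, H))
W_hy = rng.normal(scale=0.1, size=(V, H))

def sample(seed_char, length=10):
    h = np.zeros(H)
    x = np.zeros(V); x[vocab.index(seed_char)] = 1.0
    out = [seed_char]
    for _ in range(length):
        h = np.tanh(W_hh @ h + W_xh @ x)
        p = np.exp(W_hy @ h); p /= p.sum()   # softmax over the vocabulary
        ix = rng.choice(V, p=p)              # sample the next character
        x = np.zeros(V); x[ix] = 1.0         # feed the sample back as the next input
        out.append(vocab[ix])
    return "".join(out)

print(sample('h'))
```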
Center for Brains, Minds and Machines (CBMM)
Generated LaTeX notes
http://vision.stanford.edu/pdf/KarpathyICLR2016.pdf
Interpretation - Neural Science With RNN
What can we learn from the internal state of specific cells in the recurrent network?
Interpretation - Neural Science With RNN
As we process the text we pick a particular cell and visualize its activation, looking at the firing rate of that cell as we read the text.
Color scale: red = -1, white = 0, blue = +1
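A toy sketch of the procedure (random, untrained weights; Karpathy's figures come from a trained character-level model): run the network over the text and record the activation of one chosen hidden cell per character. The tanh keeps the value in [-1, 1], which is what gets colored red/white/blue.

```python
import numpy as np

rng = np.random.default_rng(0)
text = "the quick brown fox"
vocab = sorted(set(text))
V, H, cell = len(vocab), 16, 3          # "cell" is the hidden unit we watch

W_xh = rng.normal(scale=0.1, size=(H, V))
W_hh = rng.normal(scale=0.1, size=(H, H))

h = np.zeros(H)
for ch in text:
    x = np.zeros(V); x[vocab.index(ch)] = 1.0
    h = np.tanh(W_hh @ h + W_xh @ x)
    # h[cell] lies in [-1, 1]: -1 -> red, 0 -> white, +1 -> blue in the slides
    print(ch, round(float(h[cell]), 3))
```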
Interpretation - Neural Science With RNN
Generated C code - Trained on Linux kernel code
https://github.com/karpathy/char-rnn
Recurrent Neural Networks for Folk Music Generation
https://highnoongmt.wordpress.com/2015/05/22/lisls-stis-recurrent-neural-networks-for-folk-music-generation/
https://imgur.com/gallery/u76wY
Image captioning with ConvNets and RNNs
- Back to CNNs - how is an RNN integrated into applications related to image processing?
Image captioning with ConvNets and RNNs
- Convolutional Networks express a single
differentiable function from raw image pixel
values to class probabilities
“VGGNet” or “OxfordNet”
(5 conv layers and 4 pooling layers)
"Very Deep Convolutional Networks for Large-Scale Image Recognition" [Simonyan and Zisserman, 2014]
- We use the FC-4096 layer as the image representation and feed it into the RNN, which generates a sentence as we saw before
Image captioning with ConvNets and RNNs
- The first caption input is a constant token: X0 = <START>. The image representation is injected into the hidden state H0 through the weight matrix Wih.
- Generating the first word in the caption: the RNN outputs Y0 = "Man".
- We use Y0 as the input for the next iteration: X1 = "Man" produces H1 and Y1 = "with".
- We continue until Yt = <END>: Y2 = "a", Y3 = "Dog", Y4 = <END>, giving the caption "Man with a Dog".
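Putting those steps into a sketch (plain numpy, untrained weights, a hypothetical six-word vocabulary; a real system would use the trained VGG feature and learned RNN parameters): the FC-4096 feature conditions the hidden state via Wih, the first input is <START>, and each predicted word is fed back until <END>.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["<START>", "man", "with", "a", "dog", "<END>"]
V, H, D = len(vocab), 32, 4096

# Illustrative, untrained parameters - a real system would learn these.
W_ih = rng.normal(scale=0.01, size=(H, D))   # image feature  -> hidden (the "Wih" above)
W_xh = rng.normal(scale=0.01, size=(H, V))   # previous word  -> hidden
W_hh = rng.normal(scale=0.01, size=(H, H))   # hidden         -> hidden
W_hy = rng.normal(scale=0.01, size=(V, H))   # hidden         -> word scores

def caption(image_feature, max_len=10):
    h = np.tanh(W_ih @ image_feature)        # H0 conditioned on the CNN's FC-4096 vector
    word = "<START>"                          # X0 = <START>
    words = []
    for _ in range(max_len):
        x = np.zeros(V); x[vocab.index(word)] = 1.0
        h = np.tanh(W_hh @ h + W_xh @ x)
        word = vocab[int(np.argmax(W_hy @ h))]   # pick the most likely next word
        if word == "<END>":
            break
        words.append(word)
    return " ".join(words)

print(caption(rng.normal(size=D)))            # gibberish here; "man with a dog" once trained
```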
Summary
- RNN definition and architectures
- A bit about language processing and the effectiveness of RNNs
- Applications based on RNNs
- Integrating with ConvNets
- Next - more advanced memory with Long Short-Term Memory (LSTM) and many more RNN-based applications
References
- Stanford CS231n - Fei-Fei Li, Justin Johnson & Serena Yeung, Lecture 10
- https://deeplearning4j.org/lstm.html
- Coursera, Machine Learning course by Andrew Ng.
- https://karpathy.github.io/2015/05/21/rnn-effectiveness/
- https://arxiv.org/pdf/1406.6247.pdf
- Udacity - Deep learning, by Luis Serrano
- https://medium.com/syncedreview/a-brief-overview-of-attention-mechanism-13c578ba9129
- http://vision.stanford.edu/pdf/KarpathyICLR2016.pdf
- https://highnoongmt.wordpress.com/2015/05/22/lisls-stis-recurrent-neural-networks-for-folk-music-generation/
- NLP course (IDC) - Kfir Bar - NLM lecture
- https://cs.stanford.edu/people/karpathy/deepimagesent/
- https://arxiv.org/abs/1308.0850
- https://deeplearning4j.org/lstm.html#backpropagation
- https://arxiv.org/pdf/1312.6026.pdf
- https://www.safaribooksonline.com/library/view/neural-networks-and/9781492037354/ch04.html
- https://www.di.ens.fr/~lelarge/dldiy/slides/lecture_8