Recurrent Neural Networks Lecture 11 - Part A Yaniv Bronhaim 11/6/2018


Page 1

Recurrent Neural Networks
Lecture 11 - Part A

Yaniv Bronhaim 11/6/2018

Page 2

Outline

- Feedforward networks revisited

- The structure of Recurrent Neural Networks (RNN)

- RNN Architectures

- Bidirectional RNNs and Deep RNNs

- Backpropagation through time (BPTT)

- Natural Language Processing example

- “The unreasonable effectiveness” of RNNs (Andrej Karpathy)

- RNN Interpretations - Neural science with RNNs

- Image captioning with ConvNets and RNNs

- Summary

Page 3

Feedforward network

Input - representation of valid inputs
Y - prediction (classification/regression)
A[x] - network state in hidden layer x
W[x] - network parameters for hidden layer x
b[x] - bias for layer x

Page 4

Feedforward networks

What will David do tonight?

Page 5

Feedforward networks

What will David do tonight?

Possible activities (one-hot encoding):

Party    = [1, 0, 0]
Sleep    = [0, 1, 0]
Training = [0, 0, 1]

Page 6

Feedforward networks

Possible inputs (one-hot encoding):

Sunny day = [1, 0]
Rainy day = [0, 1]

Page 7

NN

F(Sunny day) = Party

Network output: [0.8, 0.2, 0.0]
Expected score: [1, 0, 0]

(training over time moves the output toward the expected score)

Page 8

NN

F(Rainy day) = Sleep

Network output: [0.2, 0.7, 0.1]
Expected score: [0, 1, 0]

Page 9

Feedforward Neural Network Mission

f(x, W) = Wx

A one-hot input vector x (e.g. [1, 0] for a sunny day) is multiplied by the learned weight matrix W to produce scores over the activities (e.g. [0.7, 0.2, 0.1]).
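
A minimal NumPy sketch of this mapping (the weight values below are hypothetical placeholders, not trained parameters):

    import numpy as np

    # One-hot encodings from the previous slides:
    # inputs = [sunny, rainy], activities = [party, sleep, training]
    x_sunny = np.array([1.0, 0.0])

    # Hypothetical weight matrix W (3 activities x 2 day types)
    W = np.array([[0.7, 0.2],
                  [0.2, 0.7],
                  [0.1, 0.1]])

    scores = W @ x_sunny        # f(x, W) = Wx
    print(scores)               # [0.7, 0.2, 0.1] -> "Party" gets the highest score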

Page 10

Let's look at sequential data

Every morning David decides what to do:

- Running in the gym
- Riding a bicycle
- Swimming

Every new day David does the next activity on his list, in order.

Page 11

Let's look at sequential data

- Unless it's a rainy day. In that case David stays in and sleeps instead of doing his daily practice.
- When the sun comes out again, he does the next activity after the one from his last sunny day.

Page 12

First solution - Using yesterday's data in a FFN

Sunny day + yesterday's activity -> today's activity

Page 13

First solution - Using yesterday's data in a FFN

Sunny day + yesterday's activity -> today's activity

Page 14

First solution - Problem

Sunny day + yesterday's activity -> today's activity

Page 15

[Timeline: 1st day, 2nd day, 3rd day, 4th day, 5th day, 6th day]

Page 16

[1st day -> Func -> 2nd day]

As long as we know the activity of the last sunny day.

Page 17

[2nd day -> Func -> 3rd day]

But we only know yesterday's activity.

Page 18

[2nd day -> Func -> 3rd day]

The "Func" output also includes data from the past.

Page 19

Sequential data

- The input is a sequence of vectors (x_t is the vector at time t), plus the output of the previous run carrying the "history" (how: the hidden layer is looped back from the past into the future)
- The output is a softmax layer predicting the next activity

Page 20
Page 21

RNN Equations

Taken from: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf

The same function and the same parameters are used at every timestep t.
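
In the notation of the referenced cs231n lecture, the update is h_t = tanh(W_hh h_{t-1} + W_xh x_t) with output y_t = W_hy h_t. A short NumPy sketch of one timestep (biases added for completeness):

    import numpy as np

    def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
        # The same weights W_xh, W_hh, W_hy are reused at every timestep.
        h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)   # new hidden state
        y_t = W_hy @ h_t + b_y                            # output scores
        return h_t, y_t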

Page 22

Recurrent Neural Network Computational Graph

The same weight matrix W is reused at every time step.

Page 23

Recurrent Neural Network

- W is shared across time, which reduces the number of parameters
- Hidden state == memory
- The sequence length sets the "temporal size" of the unrolled network

Page 24

RNN Architectures

Taken from: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf

Examples: image classification, image captioning, sentiment analysis, sequence to sequence, POS tagging

Page 25

Taken from: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf

RNN: many to one

Page 26

RNN: one to many

Page 27

RNN: many to many

Page 28

Taken from: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf

Page 29

Multi Layer RNNs

More learning capacity
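
A rough sketch of the stacking idea (reusing the rnn_step notation from the RNN-equations slide; parameter handling is simplified): the hidden state of layer l at time t becomes the input of layer l+1 at the same timestep.

    import numpy as np

    def deep_rnn_step(x_t, h_prevs, layer_params):
        # h_prevs[l] is layer l's previous hidden state;
        # layer_params[l] = (W_xh, W_hh, b_h) for that layer.
        inp, new_hs = x_t, []
        for h_prev, (W_xh, W_hh, b_h) in zip(h_prevs, layer_params):
            h = np.tanh(W_hh @ h_prev + W_xh @ inp + b_h)
            new_hs.append(h)
            inp = h                    # feed this layer's state up to the next layer
        return new_hs, inp             # inp is now the top layer's hidden state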

Page 30

Bidirectional RNNs

- "I want to go to school/college, I have a lesson at 8 o'clock"
- We might want to consider words that appear after the word in focus (see the sketch below)
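
A minimal sketch of the bidirectional idea (weight names are placeholders): run one RNN left-to-right and another right-to-left over the same sequence, then combine the two hidden states at each position, e.g. by concatenation.

    import numpy as np

    def bidirectional_rnn(xs, fwd_params, bwd_params, h0):
        # xs: list of input vectors; each *_params is (W_xh, W_hh, b_h)
        # for an independent RNN; h0 is the initial hidden state.
        def run(seq, params):
            W_xh, W_hh, b_h = params
            h, hs = h0, []
            for x in seq:
                h = np.tanh(W_hh @ h + W_xh @ x + b_h)
                hs.append(h)
            return hs

        h_fwd = run(xs, fwd_params)                  # left to right
        h_bwd = run(xs[::-1], bwd_params)[::-1]      # right to left, re-aligned
        return [np.concatenate([f, b]) for f, b in zip(h_fwd, h_bwd)]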

Page 31

Backpropagation through time (BPTT)

Page 32

Truncated Backpropagation through time (BPTT)

Process only a chunk of the sequence at a time and backprop through it to update W.

Page 33: Recurrent Neural Networks Lecture 11 - Part A · 2nd day 3th day 4th day 5th day 6th day. 1st day 2nd day Func Func As long as we know the activity of the last sunny day. 2nd day

Carry hidden states

forward in time

forever, but only

backpropagate for

some smaller

number of steps

Truncated Backpropagation through time (BPTT)
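
A hedged PyTorch sketch of this idea (the model, sizes, and data below are toy placeholders): the hidden state is carried across chunks, but gradients only flow within each chunk because the state is detached between them.

    import torch
    import torch.nn as nn

    seq_len, chunk, in_dim, hid_dim = 1000, 25, 10, 64
    xs = torch.randn(1, seq_len, in_dim)      # toy input sequence (batch of 1)
    ys = torch.randn(1, seq_len, in_dim)      # toy targets

    rnn = nn.RNN(in_dim, hid_dim, batch_first=True)
    head = nn.Linear(hid_dim, in_dim)
    opt = torch.optim.Adam(list(rnn.parameters()) + list(head.parameters()))
    loss_fn = nn.MSELoss()

    h = torch.zeros(1, 1, hid_dim)            # (num_layers, batch, hidden)
    for t in range(0, seq_len, chunk):
        x_c, y_c = xs[:, t:t + chunk], ys[:, t:t + chunk]
        out, h = rnn(x_c, h)
        loss = loss_fn(head(out), y_c)
        opt.zero_grad()
        loss.backward()                       # backprop only through this chunk
        opt.step()
        h = h.detach()                        # carry the state forward, cut the graph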

Page 34

Concrete example - Character-level language model

Vocabulary: "h", "e", "l", "o"

Taken from: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf

Page 35: Recurrent Neural Networks Lecture 11 - Part A · 2nd day 3th day 4th day 5th day 6th day. 1st day 2nd day Func Func As long as we know the activity of the last sunny day. 2nd day

Token from: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf

Concrete example - Character-level language model

4 separate training examples:

1. The probability of “e” should be

likely given the context of “h”.

2. “l” should be likely in the context

of “he”.

3. “l” should also be likely given the

context of “hel”.

4. “o” should be likely given the

context of “hell”.
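
A small sketch of how these four pairs could be encoded with one-hot vectors over the vocabulary {h, e, l, o}:

    import numpy as np

    vocab = ['h', 'e', 'l', 'o']
    char_to_ix = {c: i for i, c in enumerate(vocab)}

    def one_hot(c):
        v = np.zeros(len(vocab))
        v[char_to_ix[c]] = 1.0
        return v

    text = "hello"
    # 4 training examples: (input character, index of the character that should follow)
    pairs = [(one_hot(a), char_to_ix[b]) for a, b in zip(text, text[1:])]
    # h -> e,  e -> l,  l -> l,  l -> o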

Page 36

Generating Sequences

- Feed the sampled character back into the model to generate a sentence

Taken from: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf

Page 37

<START>

Training this on a lot of sentences would give us a language model. A way to predict:

Page 38

Continue until <END>..
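
A minimal sampling sketch (assuming a trained single-step function like the rnn_step from the RNN-equations slide; names here are placeholders): sample from the softmax over the scores, feed the sample back in as the next input, and stop when the <END> token is produced.

    import numpy as np

    def sample_sequence(step_fn, params, h0, start_ix, end_ix, vocab_size, max_len=100):
        h, ix, out = h0, start_ix, []
        for _ in range(max_len):
            x = np.zeros(vocab_size)
            x[ix] = 1.0                                    # one-hot of the previous token
            h, scores = step_fn(x, h, *params)
            probs = np.exp(scores) / np.exp(scores).sum()  # softmax
            ix = int(np.random.choice(vocab_size, p=probs))
            if ix == end_ix:                               # stop at <END>
                break
            out.append(ix)
        return out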

Page 40
Page 41
Page 42

Generated LaTeX notes

http://vision.stanford.edu/pdf/KarpathyICLR2016.pdf

Page 43

Generated LaTeX notes

http://vision.stanford.edu/pdf/KarpathyICLR2016.pdf

Page 44

RNN Bible

https://twitter.com/rnn_bible

Page 45

Interpretation - Neural Science With RNN

What can we learn from the internal state of specific cells in the recurrent network?

Page 46

Interpretation - Neural Science With RNN

Red = -1, White = 0, Blue = 1

As we process the text we pick a particular cell and visualize its activation, looking at the firing rate of the cell as we read the text.

Page 47

Interpretation - Neural Science With RNN

Page 48

Generated C code - Trained on Linux kernel code

https://github.com/karpathy/char-rnn

Page 49
Page 50
Page 51

Interpretation - Neural Science With RNN

Page 52

Interpretation - Neural Science With RNN

Page 53

Interpretation - Neural Science With RNN

Page 54

Recurrent Neural Networks for Folk Music Generation

https://highnoongmt.wordpress.com/2015/05/22/lisls-stis-recurrent-neural-networks-for-folk-music-generation/

Page 55

https://imgur.com/gallery/u76wY

Page 56

Image captioning with ConvNets and RNNs

- Back to CNNs - how is an RNN integrated into applications related to image processing?

Page 57
Page 58
Page 59
Page 60

Image captioning with ConvNets and RNNs

- Convolutional Networks express a single differentiable function from raw image pixel values to class probabilities

"VGGNet" or "OxfordNet" (VGG-16: 13 conv layers and 5 pooling layers)

"Very Deep Convolutional Networks for Large-Scale Image Recognition" [Simonyan and Zisserman, 2014]

Page 61

Image captioning with ConvNets and RNNs

- We use the FC-4096 layer as the image representation and push it into an RNN, which generates sentences as we saw before
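
A rough sketch of the wiring (names like cnn_feat and W_ih are placeholders, and the models in the referenced papers are more involved): the FC-4096 feature conditions the RNN, here by initializing its hidden state, and caption words are then generated one at a time as in the character-level example.

    import numpy as np

    def caption_image(cnn_feat, W_ih, rnn_params, embed, start_ix, end_ix, max_len=20):
        # cnn_feat: FC-4096 feature vector from the ConvNet.
        # W_ih projects it into the RNN hidden space; embed[i] is word i's input vector.
        h = np.tanh(W_ih @ cnn_feat)              # image conditions the initial hidden state
        ix, words = start_ix, []
        for _ in range(max_len):
            # rnn_step as sketched on the RNN-equations slide
            h, scores = rnn_step(embed[ix], h, *rnn_params)
            ix = int(np.argmax(scores))           # greedy choice of the next word
            if ix == end_ix:                      # stop at <END>
                break
            words.append(ix)
        return words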

Page 62

Image captioning with ConvNets and RNNs

- The first caption input is the constant string <START>

[Diagram: X0 = <START> feeds hidden state H0 via Wih]

Page 63

Image captioning with ConvNets and RNNs

- Generating the first word in the caption

[Diagram: X0 = <START> -> H0 -> Y0 = "Man"]

Page 64

Image captioning with ConvNets and RNNs

- We use Y0 as the input for the next iteration

[Diagram: X0 = <START> -> H0 -> Y0 = "Man"; X1 = "Man" -> H1 -> Y1 = "With"]

Page 65

Image captioning with ConvNets and RNNs

- Continue until Yt = <END>

[Diagram: ... X2 = "With" -> H2 -> Y2 = "a"]

Page 66

Image captioning with ConvNets and RNNs

- Continue until Yt = <END>

[Diagram: ... X3 = "a" -> H3 -> Y3 = "Dog"]

Page 67

Image captioning with ConvNets and RNNs

- Continue until Yt = <END>

[Diagram: X0 = <START> -> Y0 = "Man", X1 -> Y1 = "With", X2 -> Y2 = "a", X3 -> Y3 = "Dog", X4 -> Y4 = <END>]

Page 68

Page 69
Page 70
Page 71
Page 72

Summary

- RNN definition and architectures
- Bits about language processing and RNN effectiveness
- Applications based on RNNs
- Integrating with ConvNets
- Next: more advanced memory with Long Short-Term Memory (LSTM) and many more RNN-based applications

Page 73

References

- Stanford CS231n - Fei-Fei Li, Justin Johnson & Serena Yeung, Lecture 10

- https://deeplearning4j.org/lstm.html

- Coursera, Machine Learning course by Andrew Ng.

- https://karpathy.github.io/2015/05/21/rnn-effectiveness/

- https://arxiv.org/pdf/1406.6247.pdf

- Udacity - Deep learning, by Luis Serrano

- https://medium.com/syncedreview/a-brief-overview-of-attention-mechanism-13c578ba9129

- http://vision.stanford.edu/pdf/KarpathyICLR2016.pdf

- https://highnoongmt.wordpress.com/2015/05/22/lisls-stis-recurrent-neural-networks-for-folk-music-generation/

- NLP course (IDC) - Kfir Bar - NLM lecture

- https://cs.stanford.edu/people/karpathy/deepimagesent/

- https://arxiv.org/abs/1308.0850

- https://deeplearning4j.org/lstm.html#backpropagation

- https://arxiv.org/pdf/1312.6026.pdf

- https://www.safaribooksonline.com/library/view/neural-networks-and/9781492037354/ch04.html

- https://www.di.ens.fr/~lelarge/dldiy/slides/lecture_8

Page 74