Long Short Term Memory Networks
Brian Cheung
● Proposed by Hochreiter & Schmidhuber in 1997
● Originally trained with an approximate error gradient combining Real-Time Recurrent Learning and truncated backpropagation through time
● Used for processing long-range contextual information
Hochreiter & Schmidhuber 1997
LSTM
LSTMs reduce vanishing gradient problem
[Figure: sensitivity of hidden units to the first input over time — Standard Recurrent Network vs. LSTM Network]
● The darker the shade, the greater the sensitivity
● In the standard network, the sensitivity decays exponentially over time as new inputs overwrite the activation of the hidden unit and the network ‘forgets’ the first input
Graves et al 2013
LSTMs reduce vanishing gradient problem
● Memory cells and gating units allow information to be stored for long periods of time.
● Memory cells are additive in time
○ Gradients are also additive in time, which alleviates the vanishing gradient problem
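The additivity argument can be made concrete with a scalar toy comparison of the two recurrences (the values w = 0.9 and f = 0.98 are illustrative, not from the slides):

```python
import numpy as np

T = 100  # sequence length

# Vanilla RNN (scalar sketch): h_t = tanh(w * h_{t-1}).
# The gradient d h_T / d h_0 is a product of T factors w * (1 - h_t^2),
# each below 1, so it shrinks exponentially.
w, h = 0.9, 0.5
rnn_grad = 1.0
for _ in range(T):
    h = np.tanh(w * h)
    rnn_grad *= w * (1 - h**2)

# LSTM cell (scalar sketch): c_t = f_t * c_{t-1} + i_t * g_t is additive,
# so d c_T / d c_0 is just the product of the forget gates f_t.
# With forget gates near 1 the gradient survives across all T steps.
f = 0.98  # forget gate activation (illustrative)
lstm_grad = f ** T

print(rnn_grad, lstm_grad)  # the RNN gradient has vanished; the LSTM's has not
```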
Backpropagation through time
● Derivative of objective function, O, with respect to linear activations, a
● Gradient descent weight update
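As a sketch, these two quantities in the notation of Graves' BPTT derivation (assumptions: $b_i^t$ denotes the activation of unit $i$ at time $t$, $T$ the sequence length, and $\alpha$ the learning rate):

```latex
\delta_j^{\,t} \stackrel{\text{def}}{=} \frac{\partial O}{\partial a_j^{\,t}},
\qquad
\frac{\partial O}{\partial w_{ij}} = \sum_{t=1}^{T} \delta_j^{\,t}\, b_i^{\,t},
\qquad
\Delta w_{ij} = -\alpha\, \frac{\partial O}{\partial w_{ij}}
```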
Anatomy of an LSTM node
[Diagram: LSTM node with input x, input gate ι, forget gate φ, output gate ω, cell c, hidden state h, and input nonlinearity g]
● x: input to the LSTM node at a particular time point (a character, phoneme, or word)
● g: input nonlinearity of the memory cell (e.g. tanh, sigmoid)
● Input gate (ι): determines whether the input will be stored in the memory cell
● Forget gate (φ): determines whether the current contents of the memory cell will be forgotten (erased)
● Output gate (ω): determines whether the current memory contents will be output
● Cell state (c): the contents of the memory cell
● Hidden state (h): the output of the LSTM node
Anatomy of an LSTM node
[Diagram: LSTM node unrolled from time t-1 to t — the forget gate scales the previous cell state (*), the input gate scales the new input g (*), the two are summed (+) into the new cell state, and the output gate scales the cell state (*) to produce the hidden state]
LSTM Forward Propagation
[Diagram: the LSTM node from the previous slide, with the stage under discussion highlighted at each step]
1. Input Gate
2. Forget Gate
3. Cell State
4. Output Gate
5. Hidden State
6. Output
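The six forward-propagation steps above can be sketched in NumPy. This follows the standard LSTM formulation rather than code from the slides, and the names (`lstm_step`, `W`, `b`) are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM forward step. W maps the concatenated [x, h_prev] to the
    four pre-activations (input gate, forget gate, output gate, input g)."""
    n = h_prev.shape[0]
    z = np.concatenate([x, h_prev])
    a = W @ z + b                      # all four pre-activations at once
    i = sigmoid(a[0*n:1*n])            # 1. input gate
    f = sigmoid(a[1*n:2*n])            # 2. forget gate
    g = np.tanh(a[3*n:4*n])            # input nonlinearity
    c = f * c_prev + i * g             # 3. cell state: additive in time
    o = sigmoid(a[2*n:3*n])            # 4. output gate
    h = o * np.tanh(c)                 # 5. hidden state (6. the node's output)
    return h, c

rng = np.random.default_rng(0)
x_dim, h_dim = 3, 4
W = rng.standard_normal((4 * h_dim, x_dim + h_dim)) * 0.1
b = np.zeros(4 * h_dim)

h = np.zeros(h_dim)
c = np.zeros(h_dim)
for t in range(5):                     # unroll over a short input sequence
    x = rng.standard_normal(x_dim)
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)                # (4,) (4,)
```

Because the output gate and tanh both saturate below 1, every entry of h stays strictly inside (-1, 1), while the cell state c is unbounded.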
LSTM Backprop
[Diagram: the same LSTM node, with gradients propagated backward through each stage in turn]
1. Hidden State
2. Output Gate
3. Cell State
4. Forget Gate
5. Input Gate
6. Input
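The backward pass above can likewise be sketched. This self-contained NumPy example runs one forward step, backpropagates a toy loss L = sum(h) through the stages in the slide's order, and checks one weight gradient against a finite difference; all names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x, h_prev, c_prev, W, b):
    """Forward pass of one LSTM step (standard formulation), caching
    intermediates for the backward pass."""
    n = h_prev.shape[0]
    z = np.concatenate([x, h_prev])
    a = W @ z + b
    i, f, o = (sigmoid(a[k*n:(k+1)*n]) for k in range(3))
    g = np.tanh(a[3*n:4*n])
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, (z, i, f, o, g, c, c_prev)

def lstm_backward(dh, cache):
    """Backprop dh = dL/dh through one step, in the slides' order:
    hidden state -> output gate -> cell state -> forget/input gates."""
    z, i, f, o, g, c, c_prev = cache
    tc = np.tanh(c)
    do = dh * tc                        # 2. output gate
    dc = dh * o * (1 - tc**2)           # 3. cell state
    df = dc * c_prev                    # 4. forget gate
    di = dc * g                         # 5. input gate
    dg = dc * i                         # 6. input nonlinearity g
    da = np.concatenate([di * i * (1 - i), df * f * (1 - f),
                         do * o * (1 - o), dg * (1 - g**2)])
    return np.outer(da, z)              # gradient w.r.t. W

rng = np.random.default_rng(1)
x_dim, h_dim = 3, 2
W = rng.standard_normal((4 * h_dim, x_dim + h_dim)) * 0.5
b = np.zeros(4 * h_dim)
x, h0, c0 = (rng.standard_normal(d) for d in (x_dim, h_dim, h_dim))

h, cache = lstm_forward(x, h0, c0, W, b)
dW = lstm_backward(np.ones(h_dim), cache)    # toy loss L = sum(h)

# Finite-difference check on a single weight.
eps = 1e-5
Wp, Wm = W.copy(), W.copy()
Wp[0, 0] += eps
Wm[0, 0] -= eps
num = (lstm_forward(x, h0, c0, Wp, b)[0].sum()
       - lstm_forward(x, h0, c0, Wm, b)[0].sum()) / (2 * eps)
print(abs(num - dW[0, 0]))               # tiny if the gradient is correct
```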
LSTM Applications
● Image Captioning (Vinyals et al 2014; Donahue et al 2014; Kiros et al 2014)
● Handwriting Synthesis (Graves 2014)
● Speech recognition (Graves et al 2013)
● Neural Machine Translation (Sutskever et al 2014)
● Neural Turing Machine (Graves et al 2014)