Long Short Term Memory Networks
Brian Cheung
● Proposed by Hochreiter & Schmidhuber in 1997
● Originally trained with an approximate error gradient combining Real-Time Recurrent Learning and truncated backpropagation through time
● Used for processing long-range contextual information
Hochreiter & Schmidhuber 1997
LSTM
LSTMs reduce vanishing gradient problem
[Figure: sensitivity of hidden units to the first input over time — Standard Recurrent Network vs. LSTM Network]
● The darker the shade, the greater the sensitivity
● In the standard network, the sensitivity decays exponentially over time as new inputs overwrite the activation of the hidden unit and the network ‘forgets’ the first input
Graves et al 2013
LSTMs reduce vanishing gradient problem
● Memory cells and gating units allow information to be stored for long periods of time.
● Memory cells are additive in time
○ Gradients are also additive in time, which alleviates the vanishing gradient problem
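The additivity argument can be made concrete with a scalar toy comparison of the two recurrences (the values w = 0.9 and f = 0.98 are illustrative, not from the slides):

```python
import numpy as np

T = 100  # sequence length

# Vanilla RNN (scalar sketch): h_t = tanh(w * h_{t-1}).
# The gradient d h_T / d h_0 is a product of T factors w * (1 - h_t^2),
# each below 1, so it shrinks exponentially.
w, h = 0.9, 0.5
rnn_grad = 1.0
for _ in range(T):
    h = np.tanh(w * h)
    rnn_grad *= w * (1 - h**2)

# LSTM cell (scalar sketch): c_t = f_t * c_{t-1} + i_t * g_t is additive,
# so d c_T / d c_0 is just the product of the forget gates f_t.
# With forget gates near 1 the gradient survives across all T steps.
f = 0.98  # forget gate activation (illustrative)
lstm_grad = f ** T

print(rnn_grad, lstm_grad)  # the RNN gradient has vanished; the LSTM's has not
```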
Backpropagation through time
● Derivative of objective function, O, with respect to linear activations, a
● Gradient descent weight update
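As a sketch, these two quantities in the notation of Graves' BPTT derivation (assumptions: $b_i^t$ denotes the activation of unit $i$ at time $t$, $T$ the sequence length, and $\alpha$ the learning rate):

```latex
\delta_j^{\,t} \stackrel{\text{def}}{=} \frac{\partial O}{\partial a_j^{\,t}},
\qquad
\frac{\partial O}{\partial w_{ij}} = \sum_{t=1}^{T} \delta_j^{\,t}\, b_i^{\,t},
\qquad
\Delta w_{ij} = -\alpha\, \frac{\partial O}{\partial w_{ij}}
```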
Anatomy of an LSTM node
[Diagram: LSTM node with input x, input gate ι, forget gate φ, output gate ω, cell c, hidden state h, and input nonlinearity g]
● x: input to the LSTM node at a particular time point (a character, phoneme, or word)
● g: input nonlinearity of the memory cell (e.g. tanh, sigmoid)
● Input gate (ι): determines whether the input will be stored in the memory cell
● Forget gate (φ): determines whether the current contents of the memory cell will be forgotten (erased)
● Output gate (ω): determines whether the current memory contents will be output
● Cell state (c): the contents of the memory cell
● Hidden state (h): the output of the LSTM node
Anatomy of an LSTM node
[Diagram: LSTM node unrolled from time t-1 to t — the forget gate scales the previous cell state (*), the input gate scales the new input g (*), the two are summed (+) into the new cell state, and the output gate scales the cell state (*) to produce the hidden state]
LSTM Forward Propagation
[Diagram: the LSTM node from the previous slide, with the stage under discussion highlighted at each step]
1. Input Gate
2. Forget Gate
3. Cell State
4. Output Gate
5. Hidden State
6. Output
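The six forward-propagation steps above can be sketched in NumPy. This follows the standard LSTM formulation rather than code from the slides, and the names (`lstm_step`, `W`, `b`) are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM forward step. W maps the concatenated [x, h_prev] to the
    four pre-activations (input gate, forget gate, output gate, input g)."""
    n = h_prev.shape[0]
    z = np.concatenate([x, h_prev])
    a = W @ z + b                      # all four pre-activations at once
    i = sigmoid(a[0*n:1*n])            # 1. input gate
    f = sigmoid(a[1*n:2*n])            # 2. forget gate
    g = np.tanh(a[3*n:4*n])            # input nonlinearity
    c = f * c_prev + i * g             # 3. cell state: additive in time
    o = sigmoid(a[2*n:3*n])            # 4. output gate
    h = o * np.tanh(c)                 # 5. hidden state (6. the node's output)
    return h, c

rng = np.random.default_rng(0)
x_dim, h_dim = 3, 4
W = rng.standard_normal((4 * h_dim, x_dim + h_dim)) * 0.1
b = np.zeros(4 * h_dim)

h = np.zeros(h_dim)
c = np.zeros(h_dim)
for t in range(5):                     # unroll over a short input sequence
    x = rng.standard_normal(x_dim)
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)                # (4,) (4,)
```

Because the output gate and tanh both saturate below 1, every entry of h stays strictly inside (-1, 1), while the cell state c is unbounded.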
LSTM Backprop
[Diagram: the same LSTM node, with gradients propagated backward through each stage in turn]
1. Hidden State
2. Output Gate
3. Cell State
4. Forget Gate
5. Input Gate
6. Input
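The backward pass above can likewise be sketched. This self-contained NumPy example runs one forward step, backpropagates a toy loss L = sum(h) through the stages in the slide's order, and checks one weight gradient against a finite difference; all names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x, h_prev, c_prev, W, b):
    """Forward pass of one LSTM step (standard formulation), caching
    intermediates for the backward pass."""
    n = h_prev.shape[0]
    z = np.concatenate([x, h_prev])
    a = W @ z + b
    i, f, o = (sigmoid(a[k*n:(k+1)*n]) for k in range(3))
    g = np.tanh(a[3*n:4*n])
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, (z, i, f, o, g, c, c_prev)

def lstm_backward(dh, cache):
    """Backprop dh = dL/dh through one step, in the slides' order:
    hidden state -> output gate -> cell state -> forget/input gates."""
    z, i, f, o, g, c, c_prev = cache
    tc = np.tanh(c)
    do = dh * tc                        # 2. output gate
    dc = dh * o * (1 - tc**2)           # 3. cell state
    df = dc * c_prev                    # 4. forget gate
    di = dc * g                         # 5. input gate
    dg = dc * i                         # 6. input nonlinearity g
    da = np.concatenate([di * i * (1 - i), df * f * (1 - f),
                         do * o * (1 - o), dg * (1 - g**2)])
    return np.outer(da, z)              # gradient w.r.t. W

rng = np.random.default_rng(1)
x_dim, h_dim = 3, 2
W = rng.standard_normal((4 * h_dim, x_dim + h_dim)) * 0.5
b = np.zeros(4 * h_dim)
x, h0, c0 = (rng.standard_normal(d) for d in (x_dim, h_dim, h_dim))

h, cache = lstm_forward(x, h0, c0, W, b)
dW = lstm_backward(np.ones(h_dim), cache)    # toy loss L = sum(h)

# Finite-difference check on a single weight.
eps = 1e-5
Wp, Wm = W.copy(), W.copy()
Wp[0, 0] += eps
Wm[0, 0] -= eps
num = (lstm_forward(x, h0, c0, Wp, b)[0].sum()
       - lstm_forward(x, h0, c0, Wm, b)[0].sum()) / (2 * eps)
print(abs(num - dW[0, 0]))               # tiny if the gradient is correct
```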
LSTM Applications
● Image Captioning (Vinyals et al 2014; Donahue et al 2014; Kiros et al 2014)
● Handwriting Synthesis (Graves 2014)
● Speech recognition (Graves et al 2013)
● Neural Machine Translation (Sutskever et al 2014)
● Neural Turing Machine (Graves et al 2014)