LSTM: A Search Space Odyssey
Authors: Klaus Greff, Rupesh K. Srivastava, Jan Koutník, Bas R. Steunebrink, Jürgen Schmidhuber
Outline
• Introduction
• Long Short-Term Memory (LSTM) with peephole connections
• Experiment and discussion
• Conclusion
Definitions:
• Recurrent Neural Networks
• Importance and applications
• The gradient problem:
• Vanishing gradients
• Exploding gradients
• What is an LSTM?
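As a toy illustration of the vanishing/exploding gradient problem (a sketch, not from the paper: the function name and setup are illustrative), the snippet below backpropagates a gradient through the same recurrent Jacobian many times, as happens in backpropagation through time for a linear RNN:

```python
import numpy as np

def gradient_norm_through_time(w_scale, steps=50, size=8, seed=0):
    """Multiply a gradient vector by the same recurrent Jacobian
    `steps` times and return the resulting norm, mimicking
    backpropagation through time in a linear RNN."""
    rng = np.random.default_rng(seed)
    # Random recurrent matrix rescaled so its spectral radius is w_scale.
    R = rng.standard_normal((size, size))
    R *= w_scale / max(abs(np.linalg.eigvals(R)))
    grad = np.ones(size)
    for _ in range(steps):
        grad = R.T @ grad  # one step of backpropagation through time
    return np.linalg.norm(grad)

print(gradient_norm_through_time(0.5))  # spectral radius < 1: gradient vanishes
print(gradient_norm_through_time(1.5))  # spectral radius > 1: gradient explodes
```

With spectral radius below 1 the gradient shrinks exponentially with the number of steps; above 1 it grows exponentially. The LSTM's constant-error-carousel cell state was designed to avoid exactly this.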
Introduction LSTM with peephole connections Results and discussion Conclusion
LSTM History:
• LSTM was proposed in 1997 by Sepp Hochreiter and Jürgen Schmidhuber.
• In 1999, Felix Gers, Jürgen Schmidhuber, and Fred Cummins introduced the forget gate into the LSTM architecture.
• In 2000, Gers, Schmidhuber, and Cummins added peephole connections.
• In 2014, Kyunghyun Cho et al. put forward a simplified variant, the Gated Recurrent Unit (GRU).
Simple RNN
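The simple RNN on this slide computes its output from the current input and the previous output. A minimal sketch of one step (all names illustrative):

```python
import numpy as np

def rnn_step(x, y_prev, W, R, b):
    """One step of a simple (vanilla) RNN:
    y_t = tanh(W @ x_t + R @ y_{t-1} + b)."""
    return np.tanh(W @ x + R @ y_prev + b)

# Tiny usage example with M = 3 inputs and N = 2 hidden units.
rng = np.random.default_rng(0)
M, N = 3, 2
W = rng.standard_normal((N, M)) * 0.1   # input weights, R^(N x M)
R = rng.standard_normal((N, N)) * 0.1   # recurrent weights, R^(N x N)
b = np.zeros(N)
y = np.zeros(N)
for x in rng.standard_normal((5, M)):   # unroll over 5 time steps
    y = rnn_step(x, y, W, R, b)
print(y.shape)  # (2,)
```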
Block diagram
• Three gates: the input gate, forget gate, and output gate
• Two blocks: the block input and block output
• One cell state
Block Diagram
Block input:
z^t = g(W_z x^t + R_z y^(t-1) + b_z), with g = tanh
• W_z: input weights (ℝ^(N×M))
• R_z: recurrent weights (ℝ^(N×N))
• b_z: bias weights (ℝ^N)
• x^t: input vector at time t
• y^(t-1): block output at time t-1
Block Diagram
Input gate:
i^t = σ(W_i x^t + R_i y^(t-1) + p_i ⊙ c^(t-1) + b_i), with σ the logistic sigmoid
• W_i: input weights (ℝ^(N×M))
• R_i: recurrent weights (ℝ^(N×N))
• b_i: bias weights (ℝ^N)
• p_i: peephole weights (ℝ^N)
• c^(t-1): cell state at time t-1
• x^t: input vector at time t
• y^(t-1): output at time t-1
Block Diagram
Forget gate:
f^t = σ(W_f x^t + R_f y^(t-1) + p_f ⊙ c^(t-1) + b_f)
• W_f: input weights (ℝ^(N×M))
• R_f: recurrent weights (ℝ^(N×N))
• b_f: bias weights (ℝ^N)
• p_f: peephole weights (ℝ^N)
• c^(t-1): cell state at time t-1
• x^t: input vector at time t
• y^(t-1): output at time t-1
Block Diagram
Output gate:
o^t = σ(W_o x^t + R_o y^(t-1) + p_o ⊙ c^t + b_o)
• W_o: input weights (ℝ^(N×M))
• R_o: recurrent weights (ℝ^(N×N))
• b_o: bias weights (ℝ^N)
• p_o: peephole weights (ℝ^N); note the output gate peeks at the current cell state c^t
• x^t: input vector at time t
• y^(t-1): output at time t-1
Block Diagram
Cell state:
c^t = z^t ⊙ i^t + c^(t-1) ⊙ f^t
• z^t: block input at time t
• i^t: input gate at time t
• c^(t-1): cell state at time t-1
• f^t: forget gate at time t
Block Diagram
Block output:
y^t = o^t ⊙ h(c^t), with h = tanh
• o^t: output gate at time t
• c^t: cell state at time t
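The gate and state equations above can be combined into one forward step. The sketch below (illustrative names; parameters initialized randomly just for the demo) implements the vanilla LSTM with peephole connections:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x, y_prev, c_prev, p):
    """One LSTM step with peephole connections
    (g = h = tanh, sigma = logistic sigmoid)."""
    z = np.tanh(p["Wz"] @ x + p["Rz"] @ y_prev + p["bz"])                     # block input
    i = sigmoid(p["Wi"] @ x + p["Ri"] @ y_prev + p["pi"] * c_prev + p["bi"])  # input gate
    f = sigmoid(p["Wf"] @ x + p["Rf"] @ y_prev + p["pf"] * c_prev + p["bf"])  # forget gate
    c = z * i + c_prev * f                                                    # cell state
    o = sigmoid(p["Wo"] @ x + p["Ro"] @ y_prev + p["po"] * c + p["bo"])       # output gate peeks at new c
    y = np.tanh(c) * o                                                        # block output
    return y, c

# Tiny usage example: M = 3 inputs, N = 4 hidden units.
rng = np.random.default_rng(0)
M, N = 3, 4
p = {}
for g in "zifo":
    p["W" + g] = rng.standard_normal((N, M)) * 0.1
    p["R" + g] = rng.standard_normal((N, N)) * 0.1
    p["b" + g] = np.zeros(N)
    if g != "z":
        p["p" + g] = np.zeros(N)   # peephole weights act element-wise
y, c = np.zeros(N), np.zeros(N)
for x in rng.standard_normal((5, M)):   # unroll over 5 time steps
    y, c = lstm_step(x, y, c, p)
print(y.shape, c.shape)
```

Note that the input and forget gates peek at the previous cell state c^(t-1), while the output gate peeks at the freshly computed c^t, matching the equations above.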
LSTM Variants
• NIG (No Input Gate): i^t = 1
• NFG (No Forget Gate): f^t = 1
• NOG (No Output Gate): o^t = 1
• NIAF (No Input Activation Function): g(x) = x
• NOAF (No Output Activation Function): h(x) = x
• CIFG (Coupled Input and Forget Gate): f^t = 1 - i^t
• NP (No Peepholes)
• FGR (Full Gate Recurrence)
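The CIFG variant is worth a sketch, since the paper finds it performs on par with the full LSTM: the forget gate is tied to the input gate instead of having its own parameters (as in the GRU). A minimal peephole-free version, with illustrative names:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cifg_step(x, y_prev, c_prev, p):
    """One CIFG (coupled input/forget gate) LSTM step without peepholes:
    the forget gate is tied to the input gate via f_t = 1 - i_t."""
    z = np.tanh(p["Wz"] @ x + p["Rz"] @ y_prev + p["bz"])
    i = sigmoid(p["Wi"] @ x + p["Ri"] @ y_prev + p["bi"])
    f = 1.0 - i                      # CIFG: no separate forget-gate parameters
    c = z * i + c_prev * f
    o = sigmoid(p["Wo"] @ x + p["Ro"] @ y_prev + p["bo"])
    return np.tanh(c) * o, c

# Tiny usage example: M = 3 inputs, N = 4 hidden units.
rng = np.random.default_rng(0)
M, N = 3, 4
p = {"W" + g: rng.standard_normal((N, M)) * 0.1 for g in "zio"}
p.update({"R" + g: rng.standard_normal((N, N)) * 0.1 for g in "zio"})
p.update({"b" + g: np.zeros(N) for g in "zio"})
y, c = cifg_step(rng.standard_normal(M), np.zeros(N), np.zeros(N), p)
print(y.shape)
```

Coupling the gates removes one full set of W, R, and b parameters, which is why CIFG is an attractive simplification when it matches the vanilla LSTM's accuracy.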
Experiment setup
Datasets:
• TIMIT speech corpus
• IAM Online Handwriting Database
• JSB Chorales
Experiment setup
Features:
• TIMIT speech corpus: 12 MFCCs + energy, together with their first and second derivatives
• IAM Online Handwriting Database: pen position (x, y), time t, and the time of pen lifting
• JSB Chorales: each MIDI sequence transposed to C major or C minor and sampled at every quarter note
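The "first and second derivatives" in the TIMIT features are temporal deltas appended to each frame's feature vector. A sketch of that stacking step (the 12 MFCCs + energy would come from an audio toolkit; here a random placeholder matrix stands in for them):

```python
import numpy as np

def add_deltas(feats):
    """Append first and second temporal derivatives to a (T, D)
    feature matrix, e.g. 12 MFCCs + energy (D = 13) -> 39-dim frames."""
    d1 = np.gradient(feats, axis=0)    # first derivative over time
    d2 = np.gradient(d1, axis=0)       # second derivative over time
    return np.concatenate([feats, d1, d2], axis=1)

mfcc_energy = np.random.default_rng(0).standard_normal((100, 13))  # placeholder features
full = add_deltas(mfcc_energy)
print(full.shape)  # (100, 39)
```

(Speech toolkits typically compute deltas with a regression filter over a small window rather than `np.gradient`; the idea of tripling the feature dimension is the same.)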
Experiment setup
Network architectures and training:

| Dataset      | Network type       | Hidden layers | Output layer | Loss function       | Training |
| TIMIT        | Bidirectional LSTM | two           | softmax      | cross-entropy error | SGD      |
| IAM Online   | Bidirectional LSTM | two           | softmax      | CTC loss            | SGD      |
| JSB Chorales | LSTM               | one           | sigmoid      | cross-entropy error | SGD      |
Comparison of the Variants
• Test set performance for all 200 trials:
Comparison of the Variants
• Test set performance for the best 10% trials:
Impact of Hyperparameters
Interaction of Hyperparameters
Total marginal predicted performance
TIMIT:
Total marginal predicted performance
IAM Online:
Total marginal predicted performance
JSB Chorales:
Conclusion
• The most commonly used LSTM architecture performs reasonably well across the datasets.
• Coupling the input and forget gates (CIFG) or removing peephole connections (NP) simplifies the LSTM in these experiments without significantly decreasing performance.
• The forget gate and the output activation function are the most critical components of the LSTM block.
• The learning rate is the most crucial hyperparameter, followed by the network size.
• Hyperparameters are virtually independent of each other.
References:
• K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink and J. Schmidhuber, "LSTM: A
Search Space Odyssey," in IEEE Transactions on Neural Networks and Learning Systems, vol.
28, no. 10, pp. 2222-2232, Oct. 2017.
• https://www.youtube.com/watch?v=lycKqccytfU
• https://www.youtube.com/watch?v=lWkFhVq9-nc
• https://en.wikipedia.org/wiki/Long_short-term_memory