mlconf
Neural Turing Machines: Perils and Promise
Daniel Shank
Overview
1. Neural Turing Machines
2. Applications and Performance
3. Challenges and Recommendations
4. Differentiable Neural Computers
Neural Turing Machines
What’s a Turing Machine?
Model of a computer
Memory tape
Read and write heads
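To make the tape-and-head picture concrete, here is a minimal sketch of a classic (non-neural) Turing machine in Python. The `run_turing_machine` helper and the example `FLIP_BITS` transition table are hypothetical illustrations, not part of the talk. The example machine flips every bit on the tape and halts at the first blank cell.

```python
from collections import defaultdict


def run_turing_machine(tape, transitions, start="scan", halt="halt",
                       blank="_", max_steps=10_000):
    """Run a single-tape Turing machine and return the non-blank part of the tape.

    transitions maps (state, symbol_read) -> (symbol_to_write, move, next_state),
    where move is -1 (left), 0 (stay) or +1 (right).
    """
    cells = defaultdict(lambda: blank, enumerate(tape))  # unbounded memory tape
    head, state, steps = 0, start, 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("machine did not halt within max_steps")
        symbol = cells[head]                               # read head
        write, move, state = transitions[(state, symbol)]
        cells[head] = write                                # write head
        head += move                                       # move the head
        steps += 1
    used = [i for i, s in cells.items() if s != blank]
    if not used:
        return ""
    return "".join(cells[i] for i in range(min(used), max(used) + 1))


# Example machine: flip every bit, halt on the first blank cell.
FLIP_BITS = {
    ("scan", "0"): ("1", +1, "scan"),
    ("scan", "1"): ("0", +1, "scan"),
    ("scan", "_"): ("_", 0, "halt"),
}

print(run_turing_machine("1011", FLIP_BITS))  # prints 0100
```

Every step here is a hard, discrete choice: the machine reads exactly one cell and moves by exactly one position. Those are the "sharp" operations that a Neural Turing Machine has to replace.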
What’s a Neural Turing Machine?
Neural Network “Controller”
Memory
Learns from sequences
Graves et al. 2014, arXiv:1410.5401v2
Neural Turing Machines are Differentiable Turing Machines
‘Sharp’ functions made smooth
Can be trained with backpropagation
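A minimal sketch of what "sharp functions made smooth" means, written in plain Python under some assumptions:

- A hard read of one memory row becomes a weighted average over all rows.
- A hard write becomes the erase-then-add update described in Graves et al. 2014.
- The weights come from content-based addressing: cosine similarity, scaled by a key strength `beta`, then passed through a softmax.

The function names are our own. Memory is a plain list of lists for readability, not speed.

```python
import math


def cosine_similarity(u, v, eps=1e-8):
    """Cosine of the angle between u and v; eps guards against zero norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + eps)


def content_weights(memory, key, beta):
    """Softmax over rows of beta * cosine(key, row): a smooth 'which row?' choice."""
    scores = [beta * cosine_similarity(key, row) for row in memory]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def read(memory, weights):
    """r = sum_i w(i) * M(i): every row contributes, so gradients reach all of memory."""
    width = len(memory[0])
    return [sum(w * row[j] for w, row in zip(weights, memory)) for j in range(width)]


def write(memory, weights, erase, add):
    """M(i) <- M(i) * (1 - w(i) * e) + w(i) * a, applied elementwise per row."""
    return [
        [m * (1 - w * e) + w * a for m, e, a in zip(row, erase, add)]
        for row, w in zip(memory, weights)
    ]


memory = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
w = content_weights(memory, key=[1.0, 0.0], beta=50.0)
print(read(memory, w))  # very close to the first row, [1.0, 0.0]
```

Everything is built from sums, products and a softmax. That is why the whole read/write path is differentiable and can be trained with backpropagation. A large `beta` makes the weighting nearly one-hot, which approximates the sharp read of a classic Turing machine.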
Applications and Performance
Neural Turing Machines can…
Learn simple algorithms (copy, repeat, recognize simple formal languages...)
Generalize to sequences longer than those seen in training
Do well at language modeling
Do well on bAbI
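The copy task is easy to generate. The hypothetical helper below follows the general setup of Graves et al. 2014: random binary vectors, then a delimiter, after which the network must reproduce the sequence. The exact encoding is our own assumption:

- an extra input channel that flags the delimiter;
- blank inputs while the model answers;
- zero targets until the delimiter has been seen.

```python
import random


def make_copy_example(seq_len, width=8, rng=None):
    """Return (inputs, targets) for one copy-task episode.

    Each input step has width + 1 bits; the last channel marks the delimiter.
    Targets are all-zero until the delimiter, then the original sequence.
    """
    rng = rng or random.Random()
    seq = [[rng.randint(0, 1) for _ in range(width)] for _ in range(seq_len)]
    blank = [0] * (width + 1)
    delimiter = [0] * width + [1]
    inputs = [bits + [0] for bits in seq] + [delimiter] + [blank] * seq_len
    targets = [[0] * width for _ in range(seq_len + 1)] + seq
    return inputs, targets
```

One way to probe generalization is to generate longer sequences at test time than at training time, then check whether the model still reproduces them.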
Generalization on Copy/Repeat task
Graves et al. 2014
Neural Turing Machines Outperform LSTMs
Graves et al. 2014
Balanced Parentheses
Tristan Deleu https://medium.com/snips-ai/
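Labels for a balanced-parentheses task can come from a simple depth counter. This is a minimal sketch, assuming the task is framed as classifying strings of `(` and `)` as balanced or not. The `make_paren_example` generator is hypothetical and is not taken from the linked post. A network that solves the task must effectively learn to keep a running count in memory.

```python
import random


def is_balanced(s):
    """True if every ')' closes an earlier '(' and nothing is left open.

    Assumes s contains only '(' and ')'.
    """
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:  # closed more than we opened
            return False
    return depth == 0


def make_paren_example(length, rng=None):
    """A uniformly random parenthesis string and its balanced/unbalanced label."""
    rng = rng or random.Random()
    s = "".join(rng.choice("()") for _ in range(length))
    return s, is_balanced(s)
```

Uniformly random strings are rarely balanced. For length 10 only 42 of the 1024 strings are, so in practice you would also sample balanced strings directly to keep the classes even.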
bAbI dataset
1 Mary moved to the bathroom.
2 John went to the hallway.
3 Where is Mary? bathroom 1
4 Daniel went back to the hallway.
5 Sandra moved to the garden.
6 Where is Daniel? hallway 4
7 John moved to the office.
8 Sandra journeyed to the bathroom.
9 Where is Daniel? hallway 4
10 Mary moved to the hallway.
11 Daniel travelled to the office.
12 Where is Daniel? office 11
13 John went back to the garden.
14 John moved to the bedroom.
15 Where is Sandra? bathroom 8
1 Sandra travelled to the office.
2 Sandra went to the bathroom.
3 Where is Sandra? bathroom 2
Small vocabulary
Stories
Context
https://research.facebook.com/research/babi/
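A minimal parser for the bAbI story format shown above. It assumes the layout of the released text files:

- each line starts with an ID;
- IDs restart at 1 when a new story begins;
- question lines carry the question, the answer and the supporting-fact IDs, separated by tabs.

The function name and the returned dictionary layout are our own choices.

```python
def parse_babi(lines):
    """Yield one dict per question, with the story sentences seen so far."""
    story = {}
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        line_id, text = raw.split(" ", 1)
        line_id = int(line_id)
        if line_id == 1:  # IDs restart at 1 for each new story
            story = {}
        if "\t" in text:  # question \t answer \t supporting-fact IDs
            question, answer, support = text.split("\t")
            yield {
                "context": sorted(story.items()),  # [(line_id, sentence), ...]
                "question": question.strip(),
                "answer": answer,
                "supporting": [int(i) for i in support.split()],
            }
        else:  # plain statement: add it to the current story
            story[line_id] = text
```

The context keeps the original line IDs so that the supporting-fact IDs (e.g. `8` for "Where is Sandra?") still point at the right sentence. Questions are not added to the context, which is why the IDs can skip numbers.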
bAbI results
Empirical Study on Deep Learning Models for Question Answering, Yu et al. 2015
Challenges and Recommendations
Problems
Architecture dependent
Large number of parameters
Doesn’t benefit much from GPU acceleration
Hard to train
Hard to train
Numerical Instability
Using memory is hard
Needs smart optimization
Difficult to use in practice
Combating Numerical Instability: Gradient clipping
Limits how far parameters can move in a single update
Particularly helpful for learning long-range dependencies
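Two common forms of gradient clipping, sketched in plain Python:

- clamp each gradient component to a fixed range;
- rescale the whole gradient when its global L2 norm exceeds a threshold.

The talk does not say which form or threshold it used, so the limits in these helpers are placeholders.

```python
import math


def clip_by_value(grads, limit):
    """Clamp every gradient component into [-limit, limit]."""
    return [max(-limit, min(limit, g)) for g in grads]


def clip_by_global_norm(grads, max_norm):
    """Rescale the gradient vector so its L2 norm is at most max_norm.

    Unlike clip_by_value, this preserves the gradient's direction.
    """
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]
```

In practice the framework versions do this over all parameter tensors at once: `tf.clip_by_value` and `tf.clip_by_global_norm` in TensorFlow, `torch.nn.utils.clip_grad_value_` and `torch.nn.utils.clip_grad_norm_` in PyTorch.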
Loss clipping
Caps the total response to any single training batch
Helpful in addition to gradient clipping
Graves’ RMSprop
A variant of the RMSprop optimizer used to train the network
Used in many of Graves’ RNN papers
Similar to normalizing gradient updates by their variance, which matters for the NTM’s highly variable loss
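Graves’ variant differs from plain RMSprop in two ways:

- It also keeps a running mean of the gradient, so the denominator estimates the gradient's variance (E[g²] − E[g]²) rather than its raw second moment.
- It adds momentum.

A per-parameter sketch follows. The defaults (decay 0.95, momentum 0.9, learning rate 1e-4, epsilon 1e-4) are the values commonly attributed to Graves' "Generating Sequences With Recurrent Neural Networks". Treat them as a starting point, not a prescription. The class name and the list-based state are our own.

```python
import math
from dataclasses import dataclass, field


@dataclass
class GravesRMSprop:
    """Graves-style RMSprop over a flat list of parameters."""
    lr: float = 1e-4
    decay: float = 0.95       # averaging factor for both running means
    momentum: float = 0.9
    eps: float = 1e-4
    sq_avg: list = field(default_factory=list)    # running mean of g^2
    grad_avg: list = field(default_factory=list)  # running mean of g
    delta: list = field(default_factory=list)     # momentum buffer (last update)

    def step(self, params, grads):
        """Return updated parameters; optimizer state persists between calls."""
        if not self.sq_avg:
            self.sq_avg = [0.0] * len(params)
            self.grad_avg = [0.0] * len(params)
            self.delta = [0.0] * len(params)
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.sq_avg[i] = self.decay * self.sq_avg[i] + (1 - self.decay) * g * g
            self.grad_avg[i] = self.decay * self.grad_avg[i] + (1 - self.decay) * g
            # E[g^2] - E[g]^2 estimates the variance of the gradient.
            variance = self.sq_avg[i] - self.grad_avg[i] ** 2
            self.delta[i] = (self.momentum * self.delta[i]
                             - self.lr * g / math.sqrt(variance + self.eps))
            updated.append(p + self.delta[i])
        return updated
```

Dividing by the variance rather than the raw second moment means a consistently large gradient still produces a confident step. A gradient that swings wildly from batch to batch is damped.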
Adam Optimizer
Works well for many tasks
Built into most ML frameworks
Like Graves’ RMSprop, smooths updates using running averages of the gradients
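For comparison, a per-parameter Adam step: Kingma and Ba's update, including bias correction. Like Graves’ RMSprop, it keeps running averages of the gradient and of its square. Unlike it, Adam divides by the raw second moment rather than a variance estimate. In practice you would use the built-in optimizer, such as `torch.optim.Adam` or `tf.keras.optimizers.Adam`. This sketch, with our own `adam_step` helper, only shows the arithmetic.

```python
import math


def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. m and v are running-moment lists; t is the 1-based step count.

    Returns (new_params, new_m, new_v).
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g      # first moment: smoothed gradient
        v_i = beta2 * v_i + (1 - beta2) * g * g  # second moment
        m_hat = m_i / (1 - beta1 ** t)           # correct the bias from zero init
        v_hat = v_i / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v
```

On the first step the bias correction makes the update almost exactly `lr` times the sign of the gradient, whatever the gradient's magnitude.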
Attention to initialization
Memory initialization is extremely important
Poor initialization can prevent convergence
Pay particularly close attention to the starting value of the memory
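One concrete reason the starting value matters: content-based addressing compares a key with each memory row by cosine similarity, and that is undefined for an all-zero row. The sketch below contrasts an all-zero memory with a small constant initialization. Random or learned initial memory are other options. The `1e-6` value and the `init_memory` helper are illustrative assumptions, not the talk's recommendation.

```python
import math


def cosine(u, v):
    """Plain cosine similarity with no epsilon guard."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        return float("nan")  # undefined: one of the vectors is all zeros
    return sum(a * b for a, b in zip(u, v)) / norm


def init_memory(rows, width, value=1e-6):
    """Fill memory with a small non-zero constant so every row has a direction."""
    return [[value] * width for _ in range(rows)]


key = [0.3, -0.2, 0.9]
zero_memory = [[0.0] * 3 for _ in range(4)]
const_memory = init_memory(4, 3)

print([cosine(key, row) for row in zero_memory])   # all nan
print([cosine(key, row) for row in const_memory])  # finite, identical for every row
```

With the constant initialization every row starts out equally similar to any key, so the first weightings are uniform but well defined. Training can then break the symmetry through writes.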
Short sequences first (“Curriculum Learning”)
1) Feed in short training sequences
2) When the loss hits a target, increase the input length
3) Repeat
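The three-step schedule above can be sketched as a loop. `train_step` is a hypothetical callback that trains on one batch of the given sequence length and returns its loss. The target loss, starting length and step limits are placeholder values.

```python
from typing import Callable


def curriculum(train_step: Callable[[int], float],
               start_len: int = 1, max_len: int = 20,
               target_loss: float = 0.01, max_steps: int = 100_000) -> int:
    """Train on short sequences first, lengthening them whenever loss is low enough.

    Returns the sequence length reached when training stopped.
    """
    length = start_len
    for _ in range(max_steps):
        loss = train_step(length)   # 1) feed in training data of this length
        if loss < target_loss:      # 2) loss hit the target...
            if length == max_len:
                break               # ...at the longest length: done
            length += 1             # ...otherwise lengthen the input
    return length                   # 3) repeat until done or out of steps
```

A single batch loss is noisy. A common refinement is to compare a running average of recent losses against the target instead.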
Differentiable Neural Computers
Neural Turing Machines “V2”
Similar to NTMs, except…
No index-shift-based addressing
Can ‘allocate’ and ‘deallocate’ memory
Remembers recent memory use
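A sketch of how the DNC allocates and deallocates memory without index shifts, following the usage and allocation formulas in Graves et al. 2016. The three steps are:

1. Free gates and read weights give a retention vector, which says how much each slot survives deallocation.
2. Usage is updated from the previous write weights and that retention.
3. Slots are sorted from least to most used. Each slot's allocation weight is its own "freeness" (1 − usage) times the usage of every less-used slot.

Function names are our own. The temporal link matrix that tracks write order is omitted.

```python
def retention_vector(free_gates, read_weights):
    """psi(j) = prod_i (1 - f_i * w_read_i(j)): how much each slot survives freeing."""
    psi = [1.0] * len(read_weights[0])
    for f, w in zip(free_gates, read_weights):
        psi = [p * (1.0 - f * wj) for p, wj in zip(psi, w)]
    return psi


def update_usage(prev_usage, prev_write_weights, psi):
    """u(j) = (u_prev(j) + w_write(j) - u_prev(j) * w_write(j)) * psi(j)."""
    return [(u + w - u * w) * p for u, w, p in zip(prev_usage, prev_write_weights, psi)]


def allocation_weighting(usage):
    """Allocation weights from usages in [0, 1]; the least-used slots get the most weight.

    A fully used memory (all 1s) yields all-zero allocation.
    """
    order = sorted(range(len(usage)), key=lambda i: usage[i])  # least used first
    weights = [0.0] * len(usage)
    prod = 1.0  # product of the usages of the slots already visited
    for i in order:
        weights[i] = (1.0 - usage[i]) * prod
        prod *= usage[i]
    return weights
```

For example, with usage `[0.5, 1.0]`, reading slot 1 with its free gate fully open drops that slot's usage to 0. The next allocation then points entirely at slot 1.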
Architecture updates (1)
Graves et al. 2016
Architecture updates (2)
Graves et al. 2016
Differentiable Neural Computer Performance on Inference Tasks
Graves et al. 2016
Differentiable Neural Computer bAbI Results
Graves et al. 2016
References
Implementations:
TensorFlow: https://github.com/carpedm20/NTM-tensorflow
Go: https://github.com/fumin/ntm
Torch: https://github.com/kaishengtai/torch-ntm
Node.JS: https://github.com/gcgibson/NTM
Lasagne: https://github.com/snipsco/ntm-lasagne
Theano: https://github.com/shawntan/neural-turing-machines
Papers:
Graves et al. 2016 – Hybrid computing using a neural network with dynamic external memory
Graves et al. 2014 – Neural Turing Machines
Yu et al. 2015 – Empirical Study on Deep Learning Models for Question Answering
Rae et al. 2016 – Scaling Memory-Augmented Neural Networks with Sparse Reads and Writes
NTM operations
The Convolutional Shift parameter has proven to be one of the most problematic, if not the most problematic.
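A sketch of the NTM's location-based step, following Graves et al. 2014. The weighting is circularly convolved with a shift distribution `s` over integer offsets. It is then sharpened with an exponent `gamma` ≥ 1 and renormalized. Restricting offsets to −1, 0 and +1 is a common choice and is assumed in the usage below; the function names are our own. A shift distribution that is not sharp blurs the weighting a little at every step, which is one plausible reason this parameter is hard to train.

```python
def circular_shift(weights, shift_probs):
    """w'(i) = sum_j w(j) * s(i - j), with indices taken mod N.

    shift_probs maps an integer offset (e.g. -1, 0, +1) to its probability.
    """
    n = len(weights)
    out = [0.0] * n
    for j, w in enumerate(weights):
        for offset, p in shift_probs.items():
            out[(j + offset) % n] += w * p
    return out


def sharpen(weights, gamma):
    """w(i) = w(i)^gamma / sum_j w(j)^gamma; gamma >= 1 counteracts the blur."""
    powered = [w ** gamma for w in weights]
    total = sum(powered)
    return [p / total for p in powered]


focused = [0.0, 1.0, 0.0, 0.0]
blurred = circular_shift(focused, {-1: 0.1, 0: 0.8, 1: 0.1})
print(blurred)                # [0.1, 0.8, 0.1, 0.0]
print(sharpen(blurred, 3.0))  # most of the mass back on index 1
```

A one-hot shift such as `{1: 1.0}` moves the focus exactly one slot, wrapping around the end of memory. Anything softer spreads it, and `gamma` has to learn to undo that spread.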