Memory Advances in Neural Turing Machines

Truyen Tran, Deakin University
Hanoi, June 2019

@truyenoz
truyentran.github.io
letdataspeak.blogspot.com
goo.gl/3jJ1O0
[Diagram: deep learning (data-driven) vs. knowledge-based systems built with domain experts]
Can we learn from data a model that is as powerful as a Turing machine?
In other words, can we learn a (neural) program that learns to program from data?
Agenda
• Neural Turing Machine
• Variational memory
• Sparse read/write
• Program memory
• Outlook
Example: Electronic medical records

Modelling three interwoven processes:
• Disease progression
• Interventions & care processes
• Recording rules
[Figure: a patient timeline of visits/admissions with time gaps, a prediction point, and levels of abstraction. Source: medicalbillingcodings.org]
Memory is needed to handle thousands of events, compute complex healthcare “grammars”, support chains of reasoning, and switch rapidly between tasks.
Neural Turing machine (NTM)
A controller takes input/output and talks to an external memory module. The memory has read/write operations.
The main issues are where to write and how to update the memory state. All operations are differentiable.
https://rylanschaeffer.github.io/content/research/neural_turing_machine/main.html
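To ground these operations, here is a minimal numpy sketch of content-based addressing with a differentiable read and an erase/add write, in the spirit of the NTM; all names and shapes are illustrative assumptions, not a faithful reproduction of any published implementation.

```python
# Minimal sketch of differentiable NTM-style memory access
# (content-based addressing only; shapes are illustrative).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def address(M, key, beta):
    """Cosine similarity between a controller-emitted key and every
    memory row, sharpened by beta, normalized into attention weights."""
    sim = M @ key / (np.linalg.norm(M, axis=1) * np.linalg.norm(key) + 1e-8)
    return softmax(beta * sim)            # weights over N slots

def read(M, w):
    """Differentiable read: weighted sum of memory rows."""
    return w @ M

def write(M, w, erase, add):
    """Differentiable write: per-slot erase, then add."""
    M = M * (1 - np.outer(w, erase))      # erase in (0,1)^D
    return M + np.outer(w, add)

N, D = 8, 4                               # 8 slots, 4-dim contents
M = np.random.randn(N, D) * 0.1
w = address(M, key=np.random.randn(D), beta=5.0)
r = read(M, w)                            # returned to the controller
M = write(M, w, erase=np.full(D, 0.5), add=np.random.randn(D))
```

Because every step is a smooth function of the attention weights, gradients flow through reads and writes, which is what lets the whole machine be trained end-to-end.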
Agenda
• Neural Turing Machine
• Variational memory
• Sparse read/write
• Program memory
• Outlook
Motivation: Dialog system
A dialog system needs to maintain the chat history (which can span hours), so memory is needed.
Response generation needs to be flexible, adapting to variations in mood and style. Current techniques are mostly LSTM-based, leading to “stiff” default responses (e.g., “I see”).
There are many ways to express the same thought, so variational generative methods are needed.
Variational memory encoder-decoder (VMED)
[Diagram: Conditional Variational Auto-Encoder — latent variables, conditioned on the context, generate the response. VMED — the latent variables are additionally conditioned on memory reads.]
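A hedged sketch of the conditioning difference: in a CVAE the latent prior depends only on the context, while in VMED the latent variables also depend on memory reads. Below, each read parameterizes one Gaussian mode of a mixture prior; the identity projections and shapes are illustrative assumptions, standing in for learned networks.

```python
# Sketch: latent prior as a mixture of Gaussians, one mode per memory read.
import numpy as np

def reparameterize(mu, logvar, rng):
    """CVAE reparameterization trick: z = mu + sigma * eps."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)

rng = np.random.default_rng(0)
K, Dz = 3, 16                         # 3 memory reads, 16-dim latent
reads = rng.standard_normal((K, Dz))  # stand-ins for memory read vectors

# Each read parameterizes one Gaussian mode (identity maps stand in
# for the learned projection networks).
mus, logvars = reads, np.zeros((K, Dz))

k = rng.integers(K)                   # pick a mode
z = reparameterize(mus[k], logvars[k], rng)
# The decoder would condition on z (and the dialog context)
# to generate the next response.
```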
Sample response
Agenda
• Neural Turing Machine
• Variational memory
• Sparse read/write
• Program memory
• Outlook
Problems of current NTMs
Lack of theoretical analysis of optimal memory operations.
Previous works are based on intuitions: location-based reading/writing, temporal-linkage reading, least-used writing [Santoro et al., Graves et al.].
Sparse access over big memory [Rae et al.].
Very slow due to heavy memory read/write computations.
Cached Uniform Writing (CUW)
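Judging from the name and the ablation that follows, CUW writes to external memory only at uniform intervals, caching controller states locally in between. Below is a minimal sketch of that schedule; the cache attention, the query q, and the write rule are simplified placeholders rather than the paper’s exact equations.

```python
# Sketch of cached uniform writing: cache hidden states, commit one
# attention-weighted summary write to memory every `interval` steps.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cuw_step(t, h, cache, M, slot, interval, q):
    cache.append(h)                    # local cache of controller states
    if (t + 1) % interval == 0:        # uniform writing schedule
        C = np.stack(cache)
        a = softmax(C @ q)             # attend over the cached states
        M[slot % len(M)] = a @ C       # one summary write per interval
        cache.clear()
        slot += 1
    return cache, M, slot

D, T, interval = 4, 12, 4
M = np.zeros((8, D))                   # external memory
cache, slot = [], 0
q = np.random.randn(D)                 # hypothetical attention query
for t in range(T):
    h = np.random.randn(D)             # stand-in for the controller state
    cache, M, slot = cuw_step(t, h, cache, M, slot, interval, q)
```

The point of the schedule is speed: T timesteps incur only T/interval expensive memory writes instead of T.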
Ablation study: memory-augmented neural networks with/without Uniform Writing
Task: repeat the input sequence twice
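To make the ablation setup concrete, here is a tiny generator for the double-copy task; the binary encoding and sizes are assumptions in line with standard copy benchmarks, not the slide’s exact configuration.

```python
# Double-copy task: the target is the input sequence repeated twice.
import numpy as np

def double_copy_batch(seq_len=10, dim=8, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=(seq_len, dim)).astype(float)  # binary input
    y = np.concatenate([x, x], axis=0)                         # repeat twice
    return x, y

x, y = double_copy_batch()
assert y.shape[0] == 2 * x.shape[0]
```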
Synthetic tasks: memorize all
Synthetic tasks: memorize selectively
Synthetic sinusoidal generation: memorize featured points
Flattened MNIST classification
Document classification
Agenda
• Neural Turing Machine
• Variational memory
• Sparse read/write
• Program memory
• Outlook
Computing devices vs neural counterparts
FSM (1943) ↔ RNN (1982)
PDA (1954) ↔ Stack RNN (1993)
TM (1936) ↔ NTM (2014)
UTM/VNA (1936/1945) ↔ NUTM, ours (2019)
The missing piece: a memory to store programs → the Neural Stored-program Memory (NSM)
NUTM = NTM + NSM
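A minimal sketch of the stored-program idea, assuming the NSM holds key-value pairs whose values are (flattened) controller interface weights, retrieved by soft attention on a controller-emitted query; program_keys, program_vals, and load_program are hypothetical names, not the paper’s interface.

```python
# Sketch: soft retrieval of a "program" (weight vector) from an NSM.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

P, Dk, Dw = 4, 8, 32
program_keys = np.random.randn(P, Dk)   # NSM keys
program_vals = np.random.randn(P, Dw)   # NSM values: stored weight vectors

def load_program(query):
    """Attention over NSM keys returns a blend of stored programs."""
    w = softmax(program_keys @ query)
    return w @ program_vals

query = np.random.randn(Dk)             # emitted by the controller each step
weights = load_program(query)           # would parameterize the NTM's
                                        # read/write heads downstream
```

Because the program itself is switched step by step, one network can behave like several task-specific NTMs, mirroring how a universal Turing machine reads its program from the same tape as its data.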
Multi-level modelling. An analogy from hierarchical regression: if the input is clustered, clustering before regression helps.
A proof may be available in low dimensions; what about higher dimensions?
NSM is beneficial to NTM
Algorithmic single tasks
Sequencing tasks
Continual Learning
Few-shot learning
Question answering (bAbI dataset)
Agenda
• Neural Turing Machine
• Variational memory
• Sparse read/write
• Program memory
• Outlook
• Memory for graphs & relational structures
• Turing machines to design machine learning algorithms
• Memory-supported reasoning
• Imaginative memory
• Social memory: collective memory, theory of mind, memory of others
• Full cognitive architectures
• Theoretical analysis
https://twitter.com/nvidia/status/1010545517405835264
Towards AGI: Is the human brain a (super-)Turing machine?