Memory Advances in Neural Turing Machines

Truyen Tran, Deakin University
Hanoi, June 2019

@truyenoz
truyentran.github.io
letdataspeak.blogspot.com
goo.gl/3jJ1O0
[Diagram: deep learning (data-driven) vs. knowledge-based systems built with domain experts]
Can we learn from data a model that is as powerful as a Turing machine?
In other words, can we learn a (neural) program that learns to program from data?
Agenda
• Neural Turing Machine
• Variational memory
• Sparse read/write
• Program memory
• Outlook
Example: Electronic medical records

Modelling three interwoven processes:
• Disease progression
• Interventions & care processes
• Recording rules
[Figure: a patient timeline of visits/admissions with time gaps, a prediction point, and levels of abstraction. Source: medicalbillingcodings.org]
Memory is needed to handle thousands of events, compute complex healthcare “grammars”, support chains of reasoning, and switch rapidly between tasks.
Neural Turing machine (NTM)
A controller takes input/output and talks to an external memory module. The memory has read/write operations.
The main issues are where to write and how to update the memory state. All operations are differentiable.
https://rylanschaeffer.github.io/content/research/neural_turing_machine/main.html
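To ground these operations, here is a minimal numpy sketch of content-based addressing with a differentiable read and an erase/add write, in the spirit of the NTM; all names and shapes are illustrative assumptions, not a faithful reproduction of any published implementation.

```python
# Minimal sketch of differentiable NTM-style memory access
# (content-based addressing only; shapes are illustrative).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def address(M, key, beta):
    """Cosine similarity between a controller-emitted key and every
    memory row, sharpened by beta, normalized into attention weights."""
    sim = M @ key / (np.linalg.norm(M, axis=1) * np.linalg.norm(key) + 1e-8)
    return softmax(beta * sim)            # weights over N slots

def read(M, w):
    """Differentiable read: weighted sum of memory rows."""
    return w @ M

def write(M, w, erase, add):
    """Differentiable write: per-slot erase, then add."""
    M = M * (1 - np.outer(w, erase))      # erase in (0,1)^D
    return M + np.outer(w, add)

N, D = 8, 4                               # 8 slots, 4-dim contents
M = np.random.randn(N, D) * 0.1
w = address(M, key=np.random.randn(D), beta=5.0)
r = read(M, w)                            # returned to the controller
M = write(M, w, erase=np.full(D, 0.5), add=np.random.randn(D))
```

Because every step is a smooth function of the attention weights, gradients flow through reads and writes, which is what lets the whole machine be trained end-to-end.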
Agenda
• Neural Turing Machine
• Variational memory
• Sparse read/write
• Program memory
• Outlook
Motivation: Dialog system
A dialog system needs to maintain the chat history (which can span hours), so memory is needed.
Response generation needs to be flexible, adapting to variations in mood and style. Current techniques are mostly LSTM-based, leading to “stiff” default responses (e.g., “I see”).
There are many ways to express the same thought, so variational generative methods are needed.
Variational memory encoder-decoder (VMED)
[Diagram: Conditional Variational Auto-Encoder — latent variables, conditioned on the context, generate the response. VMED — the latent variables are additionally conditioned on memory reads.]
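A hedged sketch of the conditioning difference: in a CVAE the latent prior depends only on the context, while in VMED the latent variables also depend on memory reads. Below, each read parameterizes one Gaussian mode of a mixture prior; the identity projections and shapes are illustrative assumptions, standing in for learned networks.

```python
# Sketch: latent prior as a mixture of Gaussians, one mode per memory read.
import numpy as np

def reparameterize(mu, logvar, rng):
    """CVAE reparameterization trick: z = mu + sigma * eps."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)

rng = np.random.default_rng(0)
K, Dz = 3, 16                         # 3 memory reads, 16-dim latent
reads = rng.standard_normal((K, Dz))  # stand-ins for memory read vectors

# Each read parameterizes one Gaussian mode (identity maps stand in
# for the learned projection networks).
mus, logvars = reads, np.zeros((K, Dz))

k = rng.integers(K)                   # pick a mode
z = reparameterize(mus[k], logvars[k], rng)
# The decoder would condition on z (and the dialog context)
# to generate the next response.
```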
Sample response
Agenda
• Neural Turing Machine
• Variational memory
• Sparse read/write
• Program memory
• Outlook
Problems of current NTMs
Lack of theoretical analysis of optimal memory operations.
Previous works are based on intuitions: location-based reading/writing, temporal-linkage reading, least-used writing [Santoro et al., Graves et al.].
Sparse access over big memory [Rae et al.].
Very slow due to heavy memory read/write computations.
Cached Uniform Writing (CUW)
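Judging from the name and the ablation that follows, CUW writes to external memory only at uniform intervals, caching controller states locally in between. Below is a minimal sketch of that schedule; the cache attention, the query q, and the write rule are simplified placeholders rather than the paper’s exact equations.

```python
# Sketch of cached uniform writing: cache hidden states, commit one
# attention-weighted summary write to memory every `interval` steps.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cuw_step(t, h, cache, M, slot, interval, q):
    cache.append(h)                    # local cache of controller states
    if (t + 1) % interval == 0:        # uniform writing schedule
        C = np.stack(cache)
        a = softmax(C @ q)             # attend over the cached states
        M[slot % len(M)] = a @ C       # one summary write per interval
        cache.clear()
        slot += 1
    return cache, M, slot

D, T, interval = 4, 12, 4
M = np.zeros((8, D))                   # external memory
cache, slot = [], 0
q = np.random.randn(D)                 # hypothetical attention query
for t in range(T):
    h = np.random.randn(D)             # stand-in for the controller state
    cache, M, slot = cuw_step(t, h, cache, M, slot, interval, q)
```

The point of the schedule is speed: T timesteps incur only T/interval expensive memory writes instead of T.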
Ablation study: memory-augmented neural networks with/without Uniform Writing
Task: repeat the input sequence twice
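To make the ablation setup concrete, here is a tiny generator for the double-copy task; the binary encoding and sizes are assumptions in line with standard copy benchmarks, not the slide’s exact configuration.

```python
# Double-copy task: the target is the input sequence repeated twice.
import numpy as np

def double_copy_batch(seq_len=10, dim=8, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=(seq_len, dim)).astype(float)  # binary input
    y = np.concatenate([x, x], axis=0)                         # repeat twice
    return x, y

x, y = double_copy_batch()
assert y.shape[0] == 2 * x.shape[0]
```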
Synthetic tasks: memorize all
Synthetic tasks: memorize selectively
Synthetic sinusoidal generation: memorize featured points
Flattened MNIST classification
Document classification
Agenda
• Neural Turing Machine
• Variational memory
• Sparse read/write
• Program memory
• Outlook
Computing devices vs neural counterparts
FSM (1943) ↔ RNN (1982)
PDA (1954) ↔ Stack RNN (1993)
TM (1936) ↔ NTM (2014)
UTM/VNA (1936/1945) ↔ NUTM, ours (2019)
The missing piece: a memory to store programs → the Neural Stored-program Memory (NSM)
NUTM = NTM + NSM
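A minimal sketch of the stored-program idea, assuming the NSM holds key-value pairs whose values are (flattened) controller interface weights, retrieved by soft attention on a controller-emitted query; program_keys, program_vals, and load_program are hypothetical names, not the paper’s interface.

```python
# Sketch: soft retrieval of a "program" (weight vector) from an NSM.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

P, Dk, Dw = 4, 8, 32
program_keys = np.random.randn(P, Dk)   # NSM keys
program_vals = np.random.randn(P, Dw)   # NSM values: stored weight vectors

def load_program(query):
    """Attention over NSM keys returns a blend of stored programs."""
    w = softmax(program_keys @ query)
    return w @ program_vals

query = np.random.randn(Dk)             # emitted by the controller each step
weights = load_program(query)           # would parameterize the NTM's
                                        # read/write heads downstream
```

Because the program itself is switched step by step, one network can behave like several task-specific NTMs, mirroring how a universal Turing machine reads its program from the same tape as its data.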
Multi-level modelling. An analogy from hierarchical regression: if the input is clustered, clustering before regression helps.
A proof may be available in low dimensions; what about higher dimensions?
NSM is beneficial to NTM
Algorithmic single tasks
Sequencing tasks
Continual Learning
Few-shot learning
Question answering (bAbI dataset)
Agenda
• Neural Turing Machine
• Variational memory
• Sparse read/write
• Program memory
• Outlook
• Memory for graphs & relational structures
• Turing machines to design machine learning algorithms
• Memory-supported reasoning
• Imaginative memory
• Social memory: collective memory, theory of mind, memory of others
• Full cognitive architectures
• Theoretical analysis
https://twitter.com/nvidia/status/1010545517405835264
Towards AGI: Is the human brain a (super-)Turing machine?