Page 1

A Not that Comprehensive Introduction to Neural Programming

Lili [email protected]

Page 2

Outline

● Introduction
● Symbolic execution
  – (Fully supervised) Neural programmer-interpreter
  – (Weakly supervised) Neural symbolic machine
  – “Spurious programs” and inductive programming
● Learning semantic parsers from denotations
● DeepCoder
● More thoughts on “spurious programs”
● Distributed execution
  – Learning to execute
  – Neural enquirer
  – Differentiable neural computer
● Hybrid execution
  – Neural programmer
  – Coupling approach

Page 3

Outline (recap)

Pages 4–7

Page 8

Example

● Grade-school addition

Page 9

Example

● Grade-school addition

Add for one digit (a wrapper)

Page 10

Example

● Grade-school addition

Write “1” to the output slot

Page 11

Example

● Grade-school addition

Move pointers left

Page 12

Recursion?

Page 13

The Magic Here

● No magic
● Fully supervised
  – What “ADD” sees is:
    ● ADD1, LSHIFT, ADD, RETURN
  – What “ADD1” sees is:
    ● WRITE (parameters: OUT, 1)
    ● CARRY, RETURN
  – Etc.

Page 14

Design Philosophy

● Define a “local” context, which the current step of execution fully depends on.

● Manually program in terms of the above “language.”

● Generate execution sequences (a.k.a. program traces) with some (not many) examples.
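To make the trace format concrete, here is a minimal Python sketch that hand-programs grade-school addition in terms of the subprograms above (ADD, ADD1, WRITE, CARRY, LSHIFT) and records every call as a supervision step. The scratch-pad layout and the (caller, callee, args) tuple format are illustrative assumptions, not the exact NPI environment.

    # A minimal sketch of generating fully supervised NPI-style traces for
    # grade-school addition; each recorded step is what the caller "sees".
    def add_traces(a_digits, b_digits):
        """Return a list of (caller, callee, args) supervision steps."""
        trace = []
        n = max(len(a_digits), len(b_digits))
        a = [0] * (n - len(a_digits)) + list(a_digits)
        b = [0] * (n - len(b_digits)) + list(b_digits)
        carry = 0
        for col in range(n - 1, -1, -1):                       # rightmost column first
            trace.append(("ADD", "ADD1", None))                # ADD calls the one-digit wrapper
            s = a[col] + b[col] + carry
            trace.append(("ADD1", "WRITE", ("OUT", s % 10)))   # write the output digit
            carry = s // 10
            if carry:
                trace.append(("ADD1", "CARRY", None))          # mark the carry
            trace.append(("ADD1", "RETURN", None))
            trace.append(("ADD", "LSHIFT", None))              # move pointers left
        if carry:                                              # leading carry, e.g., 55 + 66
            trace.append(("ADD", "ADD1", None))
            trace.append(("ADD1", "WRITE", ("OUT", carry)))
            trace.append(("ADD1", "RETURN", None))
        trace.append(("ADD", "RETURN", None))
        return trace

    # Example: 25 + 48 yields the step-by-step supervision that ADD and ADD1 see.
    for step in add_traces([2, 5], [4, 8]):
        print(step)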

Page 15

Outline (recap)

Page 16

Page 17

Architecture

● Generally seq2seq

Page 18

Architecture

● But with tricks
  – Entities (and intermediate results) are embedded using the GRU output, forming a key-variable memory
  – Variables are retrieved with attention

Page 19

Training

● Generally REINFORCE

Page 20

Training

● But with more tricks
  – Beam search instead of sampling
  – Keep the “best” programs obtained so far and perform supervised training in addition to REINFORCE
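The sketch below illustrates this mixed objective: decode with beam search, reward programs that execute to the gold denotation, remember the best correct program per question, and combine a REINFORCE term with a supervised term on that pseudo-gold program. The callables (beam_search, execute, log_prob) are placeholders to be supplied by the model, not the paper's API.

    # A minimal sketch of the training tricks described above.
    best_program = {}  # question id -> best correct program found so far

    def mixed_loss(qid, question, gold, beam_search, execute, log_prob,
                   beam_size=32, alpha=0.5):
        programs = beam_search(question, beam_size)            # beam instead of sampling
        rewards = [1.0 if execute(p) == gold else 0.0 for p in programs]

        # Keep the shortest correct program seen so far as a pseudo-gold label.
        for p, r in zip(programs, rewards):
            if r > 0 and (qid not in best_program or len(p) < len(best_program[qid])):
                best_program[qid] = p

        # REINFORCE term, approximated over the beam.
        rl = -sum(r * log_prob(question, p) for p, r in zip(programs, rewards))
        rl /= max(len(programs), 1)

        # Supervised (maximum-likelihood) term on the pseudo-gold program.
        ml = -log_prob(question, best_program[qid]) if qid in best_program else 0.0

        return alpha * rl + (1 - alpha) * ml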

Page 21

Outline (recap)

Page 22

“Inferring Logical Forms from Denotations” (ACL-16)

● Spurious programs
● Fictitious worlds

Page 23

The Problem of “Spurious Programs”

Page 24

Solution: Generating Fictitious Data

● Equivalence classes of programs (grouped by their denotations):
  q1 = {z1, z2, z3}, q2 = {z4}, q3 = {z5, z6}
● Choose the worlds with the most information gain
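The sketch below captures the idea: programs that agree on every world fall into the same equivalence class, and a new world is useful to the extent that it splits the current candidates apart (the split count is a simple proxy for information gain, not the paper's exact criterion). execute(program, world) is a placeholder for whatever interpreter evaluates a logical form on a table/world.

    # A minimal sketch of equivalence classes and fictitious-world selection.
    from collections import defaultdict

    def equivalence_classes(programs, worlds, execute):
        classes = defaultdict(list)
        for z in programs:
            signature = tuple(execute(z, w) for w in worlds)   # denotation on every world
            classes[signature].append(z)
        return list(classes.values())

    def most_informative_world(programs, candidate_worlds, execute):
        def split_score(world):
            # Number of distinct answers the candidates give on this world:
            # more distinct answers means the world separates more programs.
            return len({execute(z, world) for z in programs})
        return max(candidate_worlds, key=split_score)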

Page 25

Outline (recap)

Pages 26–27

Page 28

Search

● DFS
  – DFS can opt to consider the functions as ordered by their predicted probabilities from the neural network
● “Sort and add” enumeration
● Sketch
● Not really “structured” prediction
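A minimal sketch of the “sort and add” idea: sort the DSL functions by the probability the network assigns them, search over a small active set, and add the next most likely function whenever the current set is exhausted. Here search(active_fns, examples, max_len) stands in for any enumerative/DFS program search and is a placeholder, not DeepCoder's actual implementation.

    def sort_and_add(dsl_functions, predicted_probs, examples, search,
                     initial_size=3, max_len=5):
        ranked = sorted(dsl_functions, key=lambda f: predicted_probs[f], reverse=True)
        size = initial_size
        while size <= len(ranked):
            active = ranked[:size]                    # most probable functions first
            program = search(active, examples, max_len)
            if program is not None:                   # consistent with all I/O examples
                return program
            size += 1                                 # add the next most likely function
        return None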

Page 29

Outline (recap)

Page 30

Outline (recap)

Page 31

Learning to Execute

● Essentially a seq2seq model: read the program as a character sequence and predict its printed output

Page 32

Outline (recap)

Page 33

Page 34

Three Modules

● Query encoder: BiRNN
● Table encoder
  – Concatenation of cell and field embeddings
  – Further processed by a multi-layer perceptron
● Executor
  – Column attention (soft selection)
  – Row annotation (distributed selection)

Page 35

Executor

● The result of one-step execution: a softmax attention over the columns and a distributed row selection
  – Column attention
  – Row annotation

Page 36

Details

● Last step’s execution information

● Current step
  – Column attention
  – Row annotation
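A minimal numpy sketch of one executor step in this style: score the columns against the query (column attention), softly select each row's cell along that attention, and update the per-row annotation with an MLP over [query, selected cell, previous annotation]. The exact feature set and weight shapes are assumptions for illustration, not the paper's precise layer.

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def executor_step(query, field_emb, cell_emb, prev_annot, W_att, W1, W2):
        # query: (d,)  field_emb: (C, d)  cell_emb: (R, C, d)  prev_annot: (R, h)
        col_scores = field_emb @ (W_att @ query)               # (C,) one score per column
        col_att = softmax(col_scores)                          # soft column selection
        selected = np.einsum('c,rcd->rd', col_att, cell_emb)   # (R, d) soft-selected cells

        R = prev_annot.shape[0]
        features = np.concatenate(
            [np.tile(query, (R, 1)), selected, prev_annot], axis=1)
        hidden = np.tanh(features @ W1)                        # W1: (2d + h, k)
        row_annot = np.tanh(hidden @ W2)                       # W2: (k, h) -> (R, h)
        return col_att, row_annot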

Page 37

Outline (recap)

Page 38

Page 39

Memory and its Access

● Memory
  – N: number of memory slots
  – W: width (dimension) of a slot
● Reader
  – R readers
  – Content addressing + temporal-linking addressing
● Writer
  – Exactly one writer
  – Content addressing + dynamic memory allocation

Page 40

Architecture

● Input
  – Current signal
  – Previous step’s memory read
● Output
  – Value
  – Control (so that the output is dependent on the read memory)

Page 41

Controller: Preliminary

● To obtain a gate: sigmoid
● To obtain a distribution: softmax
● To constrain a value to [1, ∞): “oneplus,” i.e., 1 + softplus
● A distribution
● No more than a distribution (the components sum to at most 1, the remaining mass going to a default “null” probability)
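As given in Graves et al. (2016) [8], the corresponding transforms are, with the last set being the “no more than a distribution” constraint:

    \sigma(x) = \frac{1}{1 + e^{-x}} \in (0, 1), \qquad
    \operatorname{oneplus}(x) = 1 + \log(1 + e^{x}) \in [1, \infty)

    \operatorname{softmax}(\mathbf{x})_i = \frac{e^{x_i}}{\sum_j e^{x_j}}, \qquad
    \Delta_N = \Big\{ \boldsymbol{\alpha} \in [0,1]^N : \sum_{i=1}^{N} \alpha_i \le 1 \Big\}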

Page 42

A Glance at the Control Signals

Page 43

Memory Reading and Writing

● Reader
  – Reading weights
  – Read vectors, where M is N × W and each w is N × 1
  – (No more than attention)
● Writer
  – Writing weight over the memory slots
  – Erasing vector over the memory dimensions (given by the controller)
  – Writing operation: E is the all-ones matrix; as w → 1 and e → 1, the old content is fully erased and M is determined by v
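As given in Graves et al. (2016) [8], reading and writing take the following form, where ∘ is element-wise multiplication and E is the all-ones N × W matrix:

    \mathbf{r}_t^i = M_t^\top \mathbf{w}_t^{r,i}

    M_t = M_{t-1} \circ \big(E - \mathbf{w}_t^{w} \mathbf{e}_t^\top\big) + \mathbf{w}_t^{w} \mathbf{v}_t^\top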

Page 44

Memory Addressing

● Content-based addressing
  – e.g., select 5
● Dynamic memory allocation
  – free(·), malloc(·), and write to it
● Temporal memory linkage
  – Move to the next element in a linked list
● Cf. CopyNet, Latent Predictor Networks

Page 45

Content-Based Addressing

● Output size: N (one weight per memory slot)
● The key k and the strength β are given by the controller
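As given in Graves et al. (2016) [8], with D(·, ·) denoting cosine similarity:

    \mathcal{C}(M, \mathbf{k}, \beta)[i] = \frac{\exp\{\beta\, D(\mathbf{k}, M[i,\cdot])\}}{\sum_{j} \exp\{\beta\, D(\mathbf{k}, M[j,\cdot])\}}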

Page 46

Dynamic Memory Allocation

● Memory usage vector
  – 1 = used, 0 = unused
● Memory retention vector
  – How much each location will not be freed by the free gates
  – The free gates are given by the controller
  – “Free” is in the sense of “malloc and free,” rather than “free-style”
  – Read more ⇒ retain less; larger f ⇒ retain less
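As given in Graves et al. (2016) [8], with free gates f_t^i ∈ [0, 1] and the last step’s read weightings w_{t-1}^{r,i}:

    \boldsymbol{\psi}_t = \prod_{i=1}^{R} \big(\mathbf{1} - f_t^i\, \mathbf{w}_{t-1}^{r,i}\big)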

Page 47

Dynamic Memory Allocation

● Memory usage vector recursion
  – 1 = used, 0 = unused
  – \mathbf{u}_t = \big(\mathbf{u}_{t-1} \circ (\mathbf{1} - \mathbf{w}_{t-1}^{w}) + \mathbf{w}_{t-1}^{w}\big) \circ \boldsymbol{\psi}_t
  – \mathbf{w}_{t-1}^{w}: last step’s writing weight
  – Write more ⇒ use more
● Free list
  – \phi_t: the memory slots sorted by usage (ascending)
  – \phi_t[1]: the least-used memory slot

Page 48

Dynamic Memory Allocation

● Allocation weighting
  – 1 − u: used more ⇒ allocated less
  – \phi_t: the sorted (free) list of memory usage
  – Least used ⇒ allocated more
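As given in Graves et al. (2016) [8], with φ_t the free list (slot indices in ascending order of usage):

    \mathbf{a}_t[\phi_t[j]] = \big(1 - \mathbf{u}_t[\phi_t[j]]\big) \prod_{i=1}^{j-1} \mathbf{u}_t[\phi_t[i]]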

Page 49

Write Weighting

● g^w: writing gate, deciding whether to write or not
● g^a: memory allocation gate for the write weighting
● Both are given by the controller
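As given in Graves et al. (2016) [8], the write weighting interpolates between the allocation weighting a_t and the content-based write weighting c_t^w, all gated by g_t^w:

    \mathbf{w}_t^{w} = g_t^{w}\big[\, g_t^{a}\, \mathbf{a}_t + (1 - g_t^{a})\, \mathbf{c}_t^{w} \,\big]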

Page 50

Read Weighting
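As given in Graves et al. (2016) [8], each read head mixes the backward weighting b_t^i, the content-based read weighting c_t^{r,i}, and the forward weighting f_t^i via its read mode π_t^i (a softmax over three options); b and f are defined on the temporal-linkage slides below:

    \mathbf{w}_t^{r,i} = \pi_t^i[1]\, \mathbf{b}_t^i + \pi_t^i[2]\, \mathbf{c}_t^{r,i} + \pi_t^i[3]\, \mathbf{f}_t^i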

Page 51

Temporal memory linkage

● Precedence weighting
  – The degree to which a slot was most recently written
  – Current writing weight plus p_{t-1}
  – p_{t-1} is discounted by the summed writing weight
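As given in Graves et al. (2016) [8], with p_0 = 0:

    \mathbf{p}_t = \Big(1 - \sum_i \mathbf{w}_t^{w}[i]\Big)\, \mathbf{p}_{t-1} + \mathbf{w}_t^{w}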

Page 52

Temporal memory linkage

● Linking matrix
  – L[i, j]: the degree to which slot i was written right after slot j
  – Slot i written at the current step and slot j written at the last step ⇒ L[i, j] → 1
  – Slot i not written ⇒ L[i, j] tends to remain
  – Slot j written ⇒ L[i, j] → 0 (i.e., refreshed)
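As given in Graves et al. (2016) [8], with L_0 = 0 and L_t[i, i] = 0:

    L_t[i,j] = \big(1 - \mathbf{w}_t^{w}[i] - \mathbf{w}_t^{w}[j]\big)\, L_{t-1}[i,j] + \mathbf{w}_t^{w}[i]\, \mathbf{p}_{t-1}[j]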

Page 53

Temporal memory linkage

● Backward / forward weighting
  – The linking matrix (or its transpose) times the last step’s reading weight
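As given in Graves et al. (2016) [8]:

    \mathbf{f}_t^i = L_t\, \mathbf{w}_{t-1}^{r,i}, \qquad \mathbf{b}_t^i = L_t^\top\, \mathbf{w}_{t-1}^{r,i}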

Page 54

Page 55

Outline (recap)

Page 56

Page 57

Idea

● Step-by-step fusion of different operators (and arguments)
● Trained by MSE
● Fully differentiable
● Drawbacks
  – Only numerical results (?)
  – Exponential number of combinatorial states

Page 58

Architecture

● Question RNN
● Selector (Controller)
● Operators
● History RNN

Page 59

Selector (Controller)

● Select an operator

● Select a column for processing
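A minimal numpy sketch of a selector in this style: at step t the controller combines the question representation q with the history state h_t and produces soft (softmax) choices over the operators and over the table columns. The exact parameterisation below is an assumption for illustration, not the paper's precise equations.

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def select(q, h_t, op_emb, col_emb, W_op, W_col):
        # q, h_t: (d,)   op_emb: (num_ops, d)   col_emb: (num_cols, d)
        # W_op, W_col: (d, 2d) projection matrices over [question; history].
        state_op = np.tanh(W_op @ np.concatenate([q, h_t]))
        state_col = np.tanh(W_col @ np.concatenate([q, h_t]))
        alpha_op = softmax(op_emb @ state_op)      # soft choice of operator
        alpha_col = softmax(col_emb @ state_col)   # soft choice of column
        return alpha_op, alpha_col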

Page 60

Operators

Page 61

Execution Output

Page 62

Text Matching

● Example: what is the sum of elements in column B whose field in column C is word:1 and field in column A is word:7?

● Text matching (no textual operation or output)

Page 63

Training Objective

● Scalar answer

● List answer

● Overall
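A minimal sketch of the kind of objective listed above: a Huber loss for scalar answers, a per-entry binary cross-entropy for list (lookup) answers, and an overall loss that switches between the two depending on the answer type. The exact weighting (lam, delta) is an assumption here, not the paper's reported values.

    import numpy as np

    def huber(pred, target, delta=10.0):
        a = abs(pred - target)
        return 0.5 * a ** 2 if a <= delta else delta * a - 0.5 * delta ** 2

    def lookup_loss(row_probs, target_rows, eps=1e-8):
        # row_probs[i]: probability that table entry i is part of the answer.
        loss = 0.0
        for i, p in enumerate(row_probs):
            y = 1.0 if i in target_rows else 0.0
            loss -= y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)
        return loss

    def overall_loss(example, lam=50.0):
        if example["type"] == "scalar":
            return huber(example["pred_scalar"], example["gold_scalar"]) / lam
        return lookup_loss(example["pred_row_probs"], example["gold_rows"])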

Page 64

Outline (recap)

Page 65

Overview

Page 66

Idea

● A fully neuralized model exhibits an (imperfect) symbolic interpretation
● Use the neural network’s intermediate results to learn an initial policy for the symbolic executor
● Improve the policy by REINFORCE or MML variants
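The sketch below illustrates this two-stage coupling: first train the symbolic executor's policy to imitate the step-by-step intermediate results (e.g., column attentions) of the distributed model, then fine-tune it with REINFORCE against the final denotation reward. All method names and helpers here are placeholders, not the paper's API.

    def couple(policy, neural_model, data, execute, epochs_imitate=5, epochs_rl=20):
        # Stage 1: supervised pretraining on the neural model's intermediate results.
        for _ in range(epochs_imitate):
            for question, table, answer in data:
                pseudo_steps = neural_model.intermediate_steps(question, table)
                policy.supervised_update(question, table, pseudo_steps)

        # Stage 2: REINFORCE on the end-to-end reward (does the program
        # execute to the correct answer?).
        for _ in range(epochs_rl):
            for question, table, answer in data:
                program, logp = policy.sample(question, table)
                reward = 1.0 if execute(program, table) == answer else 0.0
                policy.reinforce_update(logp, reward)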

Page 67

Thank you for listening

Q & A?

Page 68

References

1. Scott Reed, Nando de Freitas. “Neural programmer-interpreters.” In ICLR, 2016.

2. Jonathon Cai, Richard Shin, Dawn Song. “Making neural programming architectures generalize via recursion.” In ICLR, 2017.

3. Chen Liang, Jonathan Berant, Quoc Le, Kenneth D. Forbus, Ni Lao. “Neural symbolic machines: Learning semantic parsers on Freebase with weak supervision.” arXiv preprint arXiv:1611.00020, 2016.

4. Panupong Pasupat, Percy Liang. “Inferring logical forms from denotations.” In ACL, 2016.

5. Matej Balog, Alexander L. Gaunt, Marc Brockschmidt, Sebastian Nowozin, Daniel Tarlow. “DeepCoder: Learning to write programs.” In ICLR, 2017.

6. Wojciech Zaremba, Ilya Sutskever. “Learning to execute.” arXiv preprint arXiv:1410.4615, 2014.

7. Pengcheng Yin, Zhengdong Lu, Hang Li, Ben Kao. “Neural enquirer: Learning to query tables with natural language.” In IJCAI, 2016.

8. Alex Graves, et al. “Hybrid computing using a neural network with dynamic external memory.” Nature, 2016.

9. Arvind Neelakantan, Quoc Le, Ilya Sutskever. “Neural programmer: Inducing latent programs with gradient descent.” In ICLR, 2016.

10. Lili Mou, Zhengdong Lu, Hang Li, Zhi Jin. “Coupling distributed and symbolic execution for natural language queries.” arXiv preprint arXiv:1612.02741, 2016.