
Designing architectures by hand is hard

Change architecture

Run experiments on architecture

Analyze results (and bugs, training details, …)

McCulloch-Pitts Neuron: 1943

LSTM: 1997

Search architectures automatically

• speed up architecture search enormously

• remove the human prior

• perhaps reveal what makes a good architecture

Change architecture

Run experiments on architecture

Analyze results (and bugs, training details, …)

Controller

Performance

Reward

Boot up GPUs

Baker et al. 2016, Zoph and Le 2017

Recurrent Neural Networks (RNN)

RNN

$x_t$

$h_t$

Recurrent Neural Networks (RNN)

Commonly used: Long Short-Term Memory (LSTM)

$c_t$

$x_t$

$h_t$

$x_{t-1}$

$h_{t-1}$

Outline

1. Flexible language (DSL) to define architectures

2. Components: Ranking Function & Reinforcement Learning Generator

3. Experiments: Language Modeling & Machine Translation

Domain Specific Language (DSL), or how to define an architecture

Zoph and Le 2017

Domain Specific Language (DSL), or how to define an architecture

Tanh(Add(MM($x_t$), MM($h_{t-1}$)))
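
To illustrate how a DSL expression like this one could be represented and executed, here is a minimal sketch in plain Python. The nested-tuple encoding, the `evaluate` function, the weight names `W` and `U`, and the toy values are all assumptions made for this example; they are not the authors' implementation.

```python
import math

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def evaluate(node, env, weights):
    """Recursively evaluate a DSL expression given as nested tuples.

    node    -- a variable name (str) or a tuple (op, ...)
    env     -- maps variable names such as "x_t" to vectors
    weights -- maps hypothetical weight names to matrices
    """
    if isinstance(node, str):
        return env[node]
    op = node[0]
    if op == "MM":
        # MM nodes carry a weight name in this sketch: ("MM", name, child)
        _, weight_name, child = node
        return matvec(weights[weight_name], evaluate(child, env, weights))
    args = [evaluate(child, env, weights) for child in node[1:]]
    if op == "Add":
        return [a + b for a, b in zip(args[0], args[1])]
    if op == "Tanh":
        return [math.tanh(a) for a in args[0]]
    raise ValueError(f"unknown operator: {op}")

# Tanh(Add(MM(x_t), MM(h_{t-1}))), i.e. the vanilla RNN cell
rnn_cell = ("Tanh", ("Add", ("MM", "W", "x_t"), ("MM", "U", "h_prev")))

weights = {"W": [[1.0, 0.0], [0.0, 1.0]],   # identity, toy value
           "U": [[0.0, 0.0], [0.0, 0.0]]}   # zeros, toy value
env = {"x_t": [0.5, -0.5], "h_prev": [1.0, 1.0]}
h_t = evaluate(rnn_cell, env, weights)
```

With these toy weights the recurrent term vanishes, so `h_t` is simply the elementwise tanh of `x_t`.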

Domain Specific Language (DSL), or how to define an architecture

Core
• Variables $x_t$, $x_{t-1}$, $h_{t-1}$
• MM
• Sigmoid, Tanh, ReLU
• Add, Mult
• Gate3($x, y, f$) $= \sigma(f) \odot x + (1 - \sigma(f)) \odot y$
• Memory cell $c_t$

Expanded
• Sub, Div
• Sin, Cos, PosEnc
• LayerNorm
• SeLU
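
The Gate3 operator is the one core operator defined by a formula, so a small worked example may help. The sketch below applies it elementwise to plain Python lists; the function names and the sample vectors are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def gate3(x, y, f):
    """Elementwise Gate3(x, y, f) = sigmoid(f) * x + (1 - sigmoid(f)) * y."""
    return [sigmoid(fi) * xi + (1.0 - sigmoid(fi)) * yi
            for xi, yi, fi in zip(x, y, f)]

x = [1.0, 1.0, 1.0]
y = [0.0, 0.0, 0.0]
f = [-50.0, 0.0, 50.0]   # strongly closed, neutral, strongly open gate
mixed = gate3(x, y, f)
```

A very negative gate value passes `y` through, a very positive one passes `x` through, and a gate value of zero averages the two.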

Instantiable Framework

Architecture Generator

Given the current architecture, output the next operator

1. Random

2. REINFORCE
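
A minimal sketch of the random variant, assuming architectures are grown top-down as nested tuples and that every open slot is filled by a uniformly chosen operator or variable. The operator subset, the `max_depth` cutoff, and all function names are assumptions for this example; a REINFORCE generator would replace `random_next` with a learned policy that conditions on the partial architecture.

```python
import random

# Arity of each operator in a reduced version of the core DSL.
ARITY = {"MM": 1, "Sigmoid": 1, "Tanh": 1, "ReLU": 1,
         "Add": 2, "Mult": 2, "Gate3": 3}
VARIABLES = ["x_t", "h_prev"]

def random_next(depth, max_depth, rng):
    """Random generator: pick the next operator or variable for an open slot."""
    if depth >= max_depth:
        return rng.choice(VARIABLES)  # force a leaf so the tree terminates
    return rng.choice(list(ARITY) + VARIABLES)

def generate(rng, max_depth=4, depth=0):
    """Grow an architecture tree by repeatedly filling open slots."""
    choice = random_next(depth, max_depth, rng)
    if choice in VARIABLES:
        return choice
    children = [generate(rng, max_depth, depth + 1)
                for _ in range(ARITY[choice])]
    return (choice, *children)

def tree_depth(node):
    """Number of operator levels above the deepest variable."""
    if isinstance(node, str):
        return 0
    return 1 + max(tree_depth(child) for child in node[1:])

architecture = generate(random.Random(0))
```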

Reinforcement Learning Generator

[Diagram: agent-environment loop. The agent emits an action (example: ReLU); the environment returns an observation and a reward (example: Performance: 42).]

Ranking Function

Goal: predict performance of an architecture

Train with architecture-performance pairs
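
One way such a predictor might be used during search is to score many candidate architectures cheaply and train only the most promising ones. The sketch below shows just that selection step; `select_top_k`, the architecture names, and the predicted values are placeholders invented for illustration, and the ranking model itself is not shown.

```python
def select_top_k(candidates, predict, k):
    """Keep the k architectures with the best (lowest) predicted perplexity."""
    return sorted(candidates, key=predict)[:k]

# Placeholder predictions for three hypothetical candidates, NOT results.
predicted_perplexity = {"arch_a": 3.0, "arch_b": 1.0, "arch_c": 2.0}

to_train = select_top_k(predicted_perplexity, predicted_perplexity.get, k=2)
```

Each architecture that is then actually trained yields a new architecture-performance pair for the ranking function.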

Language Modeling

$P(w_i \mid w_1, w_2, \ldots, w_{i-1})$

“Why did the chicken cross the ___”

Performance measurement: perplexity
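
Perplexity is the exponential of the average negative log-probability the model assigns to each observed word, so lower is better. A minimal sketch, assuming the per-token probabilities are already available as a list (the function name and sample values are illustrative):

```python
import math

def perplexity(token_probs):
    """exp of the mean negative log-probability per token."""
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)

# A model that gives every observed word probability 0.25 is as
# uncertain as a uniform choice among four words.
ppl = perplexity([0.25, 0.25, 0.25, 0.25])
```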

Language Modeling (LM) with Random Search + Ranking Function

LM with Ranking Function: selected architectures improve

The BC3 cell

Weight matrices $W, U, V, X \in \mathbb{R}^{H \times H}$

LM with Ranking Function: improvement over many human architectures

Machine Translation

Test evaluation: BLEU score

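[Diagram: encoder-decoder model. The encoder embeds the source sentence “He loved to eat .”; the decoder receives the previous target tokens (NULL, Er) and predicts the next ones (Er, liebte) through a softmax layer.]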

Machine Translation with Reinforcement Learning Generator

Machine Translation (MT) with Reinforcement Learning Generator (RL)

• Generator = 3-layer NN (linear-LSTM-linear) outputting action scores

• Choose action with multinomial and epsilon-greedy strategy ($\epsilon = 0.05$)

• Train generator on soft priors first (use activations, …)

• Small dataset to evaluate an architecture in ~2 hours
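
The action-selection bullet can be read as follows: with probability $\epsilon$ pick an action uniformly at random, otherwise sample from the multinomial distribution given by the softmax of the generator's action scores. That reading is an assumption, as are the function names below; this is a sketch, not the authors' code.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def choose_action(scores, rng, epsilon=0.05):
    """Epsilon-greedy exploration on top of multinomial sampling."""
    if rng.random() < epsilon:
        return rng.randrange(len(scores))            # explore uniformly
    probs = softmax(scores)
    return rng.choices(range(len(scores)), weights=probs, k=1)[0]

rng = random.Random(0)
picks = [choose_action([0.0, 0.0, 10.0], rng) for _ in range(1000)]
```

Even when one action dominates the scores, the uniform branch keeps every operator reachable throughout the search.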

MT with RL: re-scale loss to reward great architectures more

[Plot: reward as a function of loss. x-axis: Loss (from ∞ to 0); y-axis: Reward (from 0)]

MT with RL: switch between exploration and exploitation

[Plot: x-axis: Epochs; y-axis: log(performance)]

MT with RL: good architectures found

MT with RL: many good architectures found

[Histogram: x-axis: Perplexity; y-axis: Number of architectures]

MT with RL: rediscovery of human architectures

• Add(Transformation($x_t$), $x_t$): variant of residual networks (He et al., 2016)

• Gate3(Transformation($x_t$), $x_t$, Sigmoid(…)): highway networks (Srivastava et al., 2015)

• Motifs found in multiple cells

MT with RL: novel operators only used after “it clicked”

[Plot: x-axis: Epochs]

MT with RL: novel operators contribute to successful architectures

Related work

• Hyper-parameter search: Bergstra et al. 2011, Snoek et al. 2012

• Neuroevolution: Stanley et al. 2009, Bayer et al. 2009, Fernando et al. 2016, Liu et al. 2017 (← also random search)

• RL search: Baker et al. 2016, Zoph and Le 2017

• Subgraph selection: Pham, Guan et al. 2018

• Weight prediction: Ha et al. 2016, Brock et al. 2018

• Optimizer search: Bello et al. 2017

Discussion

• Remove need for expert knowledge to a degree

• Cost of running these experiments
  • us: 5 days on 28 GPUs (best architecture after 40 hours)
  • Zoph and Le 2017: 4 days using 450 GPUs

• Hard to analyze the diversity of architectures (much more quantitative than qualitative)

• Definition of search space difficult

• We’re using a highly complex system to find other highly complex systems in a highly complex space

Contributions

1. Flexible language (DSL) to define architectures

2. Ranking Function (Language Modeling) and Reinforcement Learning Generator (Machine Translation)

3. Explore uncommon operators

Future Work

• Search architectures that correspond to biology

• Allow for more flexible search space

• Find architectures that do well on multiple tasks

Backup

Compilation: DSL → Model

• DSL is basically executable

• Traverse tree from source nodes towards final node $h_t$

• Produce code: initialization and forward call

• Collect all matrix multiplications on single source node and batch them
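
As a rough sketch of the "produce code" step, the function below walks a nested-tuple architecture and emits a forward expression as source text plus the list of weight matrices that an initialization step would need to create. The tuple encoding, the emitted helper names (`matmul`, `add`, `tanh`), and the `W0`, `W1` naming scheme are assumptions for illustration; batching of matrix multiplications is not shown.

```python
def compile_cell(node, weights=None):
    """Return (forward_source, weight_names) for a nested-tuple DSL tree."""
    if weights is None:
        weights = []
    if isinstance(node, str):
        return node, weights              # a source variable such as "x_t"
    op, *children = node
    # Children are compiled first, so code is emitted from the
    # source nodes towards the final node.
    args = [compile_cell(child, weights)[0] for child in children]
    if op == "MM":
        name = f"W{len(weights)}"         # one new weight matrix per MM
        weights.append(name)
        return f"matmul({name}, {args[0]})", weights
    return f"{op.lower()}({', '.join(args)})", weights

cell = ("Tanh", ("Add", ("MM", "x_t"), ("MM", "h_prev")))
forward_src, weight_names = compile_cell(cell)
```

Here `weight_names` plays the role of the initialization list and `forward_src` the body of the forward call.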

RL Generator

Maximize expected reward

REINFORCE

Zoph and Le 2017
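
For reference, the objective and its REINFORCE gradient estimate as formulated in Zoph and Le 2017 take roughly the following form. The notation is an assumption about what the slide showed: $\theta$ for the generator parameters, $a_{1:T}$ for the sequence of chosen operators, $R_k$ for the reward of the $k$-th sampled architecture, $m$ for the number of samples, and $b$ for a baseline that reduces variance.

```latex
J(\theta) = \mathbb{E}_{P(a_{1:T};\,\theta)}\left[ R \right]

\nabla_\theta J(\theta) \approx \frac{1}{m} \sum_{k=1}^{m} \sum_{t=1}^{T}
    \nabla_\theta \log P\left(a_t \mid a_{1:t-1};\, \theta\right) \left( R_k - b \right)
```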

Restrictions on generated architectures

• Gate3(…, …, Sigmoid(…))

• Have to use $x_t$, $h_{t-1}$

• Maximum 21 nodes, depth 8

• Prevent stacking two identical operations
  • MM(MM(x)) is mathematically identical to MM(x)
  • Sigmoid(Sigmoid(x)) is unlikely to be useful
  • ReLU(ReLU(x)) is redundant
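
These restrictions can be checked mechanically on a candidate before any training happens. The sketch below does so for nested-tuple architectures. How nodes and depth are counted (variables count as nodes here) and the choice to apply the stacking rule to every operator are assumptions of this example, as are all function names.

```python
def count_nodes(node):
    if isinstance(node, str):
        return 1
    return 1 + sum(count_nodes(c) for c in node[1:])

def depth(node):
    if isinstance(node, str):
        return 0
    return 1 + max(depth(c) for c in node[1:])

def variables(node):
    if isinstance(node, str):
        return {node}
    return set().union(*(variables(c) for c in node[1:]))

def stacks_identical_ops(node):
    """True if any operator is applied directly to its own output."""
    if isinstance(node, str):
        return False
    return any((not isinstance(c, str) and c[0] == node[0])
               or stacks_identical_ops(c) for c in node[1:])

def gate3_ok(node):
    """Every Gate3 must take a Sigmoid(...) as its third argument."""
    if isinstance(node, str):
        return True
    if node[0] == "Gate3":
        f = node[3]
        if isinstance(f, str) or f[0] != "Sigmoid":
            return False
    return all(gate3_ok(c) for c in node[1:])

def is_valid(node, max_nodes=21, max_depth=8):
    return (count_nodes(node) <= max_nodes
            and depth(node) <= max_depth
            and {"x_t", "h_prev"} <= variables(node)
            and not stacks_identical_ops(node)
            and gate3_ok(node))

rnn = ("Tanh", ("Add", ("MM", "x_t"), ("MM", "h_prev")))
rnn_is_valid = is_valid(rnn)
```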

How to define proper search space?

• Too small: will find nothing radically novel

• Too big: need Google computing resources

• Baseline experiment parameters restrict successful architectures

MT with RL: learned encoding very different

MT with RL: parent-child operator preference