Invertible Residual Networks
Jens Behrmann*, Will Grathwohl*, Ricky T. Q. Chen, David Duvenaud, Jörn-Henrik Jacobsen* (*equal contribution)


Page 1:

Invertible Residual Networks
Jens Behrmann*, Will Grathwohl*, Ricky T. Q. Chen, David Duvenaud, Jörn-Henrik Jacobsen*
(*equal contribution)

Page 2:

What are Invertible Neural Networks?

Invertible Neural Networks (INNs) are bijective function approximators which have a forward mapping $z = F(x)$ and an inverse mapping $x = F^{-1}(z)$.

[Figure: a non-invertible mapping vs. an invertible mapping]

Page 3:

Why Invertible Networks?

• Mostly known because of Normalizing Flows
  – Training via maximum likelihood and evaluation of likelihood

[Figure: generated samples from GLOW (Kingma et al. 2018)]


Presenter notes: you could possibly do this on 3 different slides, since we want to really motivate this here
Page 4:

Workshop: Invertible Networks and Normalizing Flows

Why Invertible Networks?

• Generative modeling via invertible mappings with exact likelihoods (Dinh et al. 2014, Dinh et al. 2016, Kingma et al. 2018, Ho et al. 2019)
  – Normalizing Flows
• Mutual information preservation

• Analysis and regularization of invariance (Jacobsen et al. 2019)

• Memory-efficient backprop (Gomez et al. 2017)

• Analyzing inverse problems (Ardizzone et al. 2019)


Presenter notes: you could possibly do this on 3 different slides, since we want to really motivate this here
Page 5:

Invertible Networks use Exotic Architectures

• Dimension partitioning and coupling layers (Dinh et al. 2014/2016, Gomez et al. 2017, Jacobsen et al. 2018, Kingma et al. 2018)

– Transforms one part of the input at a time
– Choice of partitioning is important
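As a hedged illustration of such a coupling layer (NICE-style additive coupling, Dinh et al. 2014), the sketch below splits the input into halves; the helper `net` and the half-and-half split are illustrative choices, not the slides' notation:

```python
import numpy as np

def coupling_forward(x, net):
    """Additive coupling: split the input, transform only the second half.

    Because the first half passes through unchanged, the layer is invertible
    no matter how complicated (or non-invertible) `net` itself is.
    """
    x1, x2 = np.split(x, 2)
    return np.concatenate([x1, x2 + net(x1)])

def coupling_inverse(y, net):
    """Exact inverse: recompute the shift from the untouched half and subtract it."""
    y1, y2 = np.split(y, 2)
    return np.concatenate([y1, y2 - net(y1)])

# Toy usage: any function of the first half works.
net = lambda h: np.sin(3.0 * h)
x = np.random.randn(6)
y = coupling_forward(x, net)
print(np.allclose(coupling_inverse(y, net), x))  # True
```

Which coordinates end up in each half is exactly the partitioning choice flagged above as important.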


Presenter notes: figure from https://hci.iwr.uni-heidelberg.de/vislearn/inverse-problems-invertible-neural-networks/
Page 6:

Invertible Networks use Exotic Architectures

• Dimension partitioning and coupling layers (Dinh et al. 2014/2016, Gomez et al. 2017, Jacobsen et al. 2018, Kingma et al. 2018)

– Transforms one part of the input at a time
– Choice of partitioning is important

• Invertible dynamics via Neural ODEs (Chen et al. 2018, Grathwohl et al. 2019)

– Requires numerical integration
– Hard to tune and often slow due to the need for an ODE solver


Presenter notes: figure from https://hci.iwr.uni-heidelberg.de/vislearn/inverse-problems-invertible-neural-networks/
Page 7:

Why do we move away from standard architectures?

• Partitioning, coupling layers, and ODE-based approaches move further away from standard architectures
  – Many new design choices are necessary and not yet well understood

• Why not use the most successful discriminative architecture?

• Use the connection between ResNets and Euler integration of ODEs (Haber et al. 2018)

ResNets: $x_{t+1} = x_t + g_t(x_t)$ (a residual layer as one Euler step of an ODE)

Page 8:

Making ResNets invertible

Theorem (sufficient condition for an invertible residual layer):
Let $F(x) = x + g(x)$ be a residual layer, then it is invertible if

$\mathrm{Lip}(g) < 1$,

where $\mathrm{Lip}(g) = \sup_{x_1 \neq x_2} \frac{\|g(x_1) - g(x_2)\|_2}{\|x_1 - x_2\|_2}$.


Page 9:

Making ResNets invertible

Theorem (sufficient condition for an invertible residual layer):
Let $F(x) = x + g(x)$ be a residual layer, then it is invertible if $\mathrm{Lip}(g) < 1$.

Invertible Residual Networks (i-ResNet)


Page 10:

i-ResNets: Constructive Proof

Theorem (invertible residual layer): Let $F(x) = x + g(x)$ be a residual layer, then it is invertible if $\mathrm{Lip}(g) < 1$.

Proof:
Features: $y = x + g(x)$
Fixed-point equation: $x = y - g(x)$


Page 11:

i-ResNets: Constructive Proof

Theorem (invertible residual layer): Let $F(x) = x + g(x)$ be a residual layer, then it is invertible if $\mathrm{Lip}(g) < 1$.

Proof:
Features: $y = x + g(x)$
Fixed-point equation: $x = y - g(x)$
Use fixed-point iteration: $x^{k+1} = y - g(x^k)$


Page 12:

i-ResNets: Constructive Proof

Theorem (invertible residual layer): Let $F(x) = x + g(x)$ be a residual layer, then it is invertible if $\mathrm{Lip}(g) < 1$.

Proof:
Features: $y = x + g(x)$
Fixed-point equation: $x = y - g(x)$
Use fixed-point iteration: $x^{k+1} = y - g(x^k)$

Guaranteed convergence to $x$ if $g$ is contractive (Banach fixed-point theorem)


Page 13:

Inverting i-ResNets

• Inversion method from the proof
• Fixed-point iteration:
  – Init: $x^0 = y$
  – Iteration: $x^{k+1} = y - g(x^k)$


Page 14:

Inverting i-ResNets

• Inversion method from the proof
• Fixed-point iteration:
  – Init: $x^0 = y$
  – Iteration: $x^{k+1} = y - g(x^k)$

• Rate of convergence depends on the Lipschitz constant
• In practice: the cost of the inverse is 5-10 forward passes
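A hedged sketch of this inversion in NumPy (not the authors' released code); the toy residual block `g` and the iteration budget `n_iter` are assumptions made for the example:

```python
import numpy as np

def invert_residual_layer(g, y, n_iter=10):
    """Invert F(x) = x + g(x) via the fixed-point iteration x^{k+1} = y - g(x^k).

    Convergence to the true preimage is guaranteed by the Banach fixed-point
    theorem whenever Lip(g) < 1; the error shrinks geometrically with rate Lip(g).
    """
    x = np.copy(y)            # init: x^0 = y
    for _ in range(n_iter):
        x = y - g(x)          # iteration: x^{k+1} = y - g(x^k)
    return x

# Toy contractive residual block: Lip(g) = 0.5 < 1, so F(x) = x + g(x) is invertible.
g = lambda x: 0.5 * np.tanh(x)
x_true = np.random.randn(4)
y = x_true + g(x_true)        # forward pass F(x_true)
x_rec = invert_residual_layer(g, y)
print(np.max(np.abs(x_rec - x_true)))  # small after ~10 iterations
```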


Page 15:

How to build i-ResNets

• Satisfy the Lipschitz condition via a data-independent upper bound:
  $\mathrm{Lip}(g) \leq \prod_i \|W_i\|_2$ for $g = W_L \circ \phi \circ \dots \circ \phi \circ W_1$ with 1-Lipschitz nonlinearities $\phi$


Page 16:

How to build i-ResNets

• Satisfy the Lipschitz condition via a data-independent upper bound:
  $\mathrm{Lip}(g) \leq \prod_i \|W_i\|_2$ for $g = W_L \circ \phi \circ \dots \circ \phi \circ W_1$ with 1-Lipschitz nonlinearities $\phi$

• Spectral normalization (Miyato et al. 2018, Gouk et al. 2018):
  $W_i \leftarrow c \cdot W_i / \hat{\sigma}(W_i)$ with $c < 1$, where $\hat{\sigma}(W_i)$ approximates the largest singular value via power iteration


Page 17:

How to build i-ResNets

• Satisfy the Lipschitz condition via a data-independent upper bound:
  $\mathrm{Lip}(g) \leq \prod_i \|W_i\|_2$ for $g = W_L \circ \phi \circ \dots \circ \phi \circ W_1$ with 1-Lipschitz nonlinearities $\phi$

• Spectral normalization (Miyato et al. 2018, Gouk et al. 2018):
  $W_i \leftarrow c \cdot W_i / \hat{\sigma}(W_i)$ with $c < 1$, where $\hat{\sigma}(W_i)$ approximates the largest singular value via power iteration
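The following is a minimal NumPy sketch of this construction, assuming a dense weight matrix and an illustrative scaling coefficient `coeff = 0.9`; real convolutional layers need the same idea applied to their linear operator:

```python
import numpy as np

def estimate_spectral_norm(W, n_iter=50):
    """Power iteration: approximate the largest singular value of W."""
    u = np.random.randn(W.shape[1])
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        u = W.T @ (W @ u)          # one step of power iteration on W^T W
        u /= np.linalg.norm(u)
    return np.linalg.norm(W @ u)    # estimate of sigma_max(W)

def spectrally_normalize(W, coeff=0.9, n_iter=50):
    """Rescale W so that its spectral norm is at most coeff < 1.

    If every linear layer in g satisfies this and the nonlinearities are
    1-Lipschitz, then Lip(g) < 1 and the residual layer x + g(x) is invertible.
    """
    sigma = estimate_spectral_norm(W, n_iter)
    return W * min(1.0, coeff / sigma)

W = np.random.randn(64, 32)
W_hat = spectrally_normalize(W)
print(np.linalg.norm(W_hat, 2))     # <= 0.9 up to power-iteration error
```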


Page 18:

Validation

• Reconstructions

[Figure: CIFAR10 data, reconstructions from an i-ResNet, and reconstructions from a standard ResNet]


Page 19:

Classification Performance

• Competitive performance
• But what do we get additionally?

Generative models via Normalizing Flows


Page 20:

Maximum-Likelihood Generative Modeling with i-ResNets

• We can define a simple generative model as
  $z \sim \mathcal{N}(0, I)$ (Gaussian distribution), $x = F^{-1}(z)$ (data distribution)

Page 21:

Maximum-Likelihood Generative Modeling with i-ResNets

• We can define a simple generative model as
  $z \sim \mathcal{N}(0, I)$ (Gaussian distribution), $x = F^{-1}(z)$ (data distribution)

• Maximization (and evaluation) of likelihood via change of variables:
  $\ln p_x(x) = \ln p_z(F(x)) + \ln \left|\det J_F(x)\right|$ ... if $F$ is invertible
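To make this objective concrete, here is a hedged NumPy sketch that evaluates $\ln p_x(x)$ for a single residual layer using an exact Jacobian log-determinant; the linear block $g(x) = Ax$ is a toy stand-in for a real network, where exact determinants would be too costly:

```python
import numpy as np

def log_likelihood(x, g, jac_g):
    """ln p_x(x) = ln p_z(F(x)) + ln|det J_F(x)|, with F(x) = x + g(x)
    and a standard-normal base density p_z."""
    d = x.shape[0]
    z = x + g(x)                                   # forward mapping z = F(x)
    log_pz = -0.5 * (d * np.log(2 * np.pi) + z @ z)
    J_F = np.eye(d) + jac_g(x)                     # J_F = I + J_g
    log_det = np.linalg.slogdet(J_F)[1]
    return log_pz + log_det

# Toy residual block g(x) = A x with spectral norm 0.5, so F is invertible.
A = np.random.randn(3, 3)
A *= 0.5 / np.linalg.norm(A, 2)
x = np.random.randn(3)
print(log_likelihood(x, lambda v: A @ v, lambda v: A))
```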

Page 22:

Maximum-Likelihood Generative Modeling with i-ResNets

• Maximization (and evaluation) of likelihood via change of variables:
  $\ln p_x(x) = \ln p_z(F(x)) + \ln \left|\det J_F(x)\right|$ ... if $F$ is invertible

• Challenges:
  – Flexible invertible models
  – Efficient computation of the log-determinant

Page 23:

Efficient Estimation of the Likelihood

• Likelihood with the log-determinant of the Jacobian:
  $\ln p_x(x) = \ln p_z(F(x)) + \ln \left|\det J_F(x)\right|$

• Previous approaches:
  – exact computation of the log-determinant via constraining the architecture to be triangular (Dinh et al. 2016, Kingma et al. 2018)
  – ODE solver and estimation of only the trace of the Jacobian (Grathwohl et al. 2019)

• We propose an efficient estimator for i-ResNets based on trace estimation and truncation of a power series
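The estimator combines two known facts: for $\mathrm{Lip}(g) < 1$, $\ln\det(I + J_g(x)) = \sum_{k \geq 1} \frac{(-1)^{k+1}}{k}\,\mathrm{tr}(J_g(x)^k)$, and each trace can be estimated with Hutchinson's trick $\mathrm{tr}(M) \approx v^\top M v$, $v \sim \mathcal{N}(0, I)$. Below is a hedged toy version (not the paper's implementation) that forms the full Jacobian only for clarity; in practice vector-Jacobian products suffice:

```python
import numpy as np

def log_det_estimate(J_g, n_terms=10, n_samples=100):
    """Estimate ln det(I + J_g) via a truncated power series with Hutchinson traces."""
    d = J_g.shape[0]
    total = 0.0
    for _ in range(n_samples):
        v = np.random.randn(d)
        w = v.copy()
        for k in range(1, n_terms + 1):
            w = J_g @ w                               # w = J_g^k v
            total += (-1) ** (k + 1) * (v @ w) / k    # Hutchinson term for tr(J_g^k)/k
    return total / n_samples

# Toy check against the exact value for a contractive linear block g(x) = A x.
A = np.random.randn(5, 5)
A *= 0.5 / np.linalg.norm(A, 2)                       # enforce ||A||_2 = 0.5 < 1
exact = np.linalg.slogdet(np.eye(5) + A)[1]
print(exact, log_det_estimate(A, n_terms=10, n_samples=2000))
```

The truncation error shrinks geometrically because the Jacobian's spectral norm is kept below one by construction.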


Page 24:

Generative Modeling Results

[Figure: data samples and samples from GLOW]

Page 25:

Generative Modeling Results

[Figure: data samples, samples from GLOW, and samples from i-ResNets]

Page 26:

Generative Modeling Results

[Figure: GLOW (Kingma et al. 2018), FFJORD (Grathwohl et al. 2019), i-ResNet]

Page 27:

i-ResNets Across Tasks

• i-ResNet is an architecture which works well in both discriminative and generative modeling
• i-ResNets are generative models which use the best discriminative architecture
• Promising for:
  – Unsupervised pre-training
  – Semi-supervised learning

Page 28:

Drawbacks

• Iterative inverse
  – Fast convergence in practice
  – Rate depends on the Lipschitz constant and not on the dimension
• Requires estimation of the log-determinant
  – Due to the free-form Jacobian
  – Properties of i-ResNets allow us to design an efficient estimator

Page 29:

Conclusion

• A simple modification makes ResNets invertible
• Stability is guaranteed by construction
• New class of likelihood-based generative models
  – without structural constraints
• Excellent performance in discriminative/generative tasks
  – with one unified architecture
• Promising approach for:
  – unsupervised pre-training
  – semi-supervised learning
  – tasks which require invertibility

Page 30:

See us at Poster #11 (Pacific Ballroom)


Paper:

Code:
Follow-up work: Residual Flows for Invertible Generative Modeling
Invertible Networks and Normalizing Flows, workshop on Saturday (contributed talk)