
Page 1: About generative aspects of Variational Autoencoders

About generative aspects of Variational Autoencoders

LOD'19 - The Fifth International Conference on Machine Learning, Optimization, and Data Science
September 10-13, 2019

Certosa di Pontignano, Siena, Tuscany, Italy

Andrea Asperti

DISI - Department of Informatics: Science and Engineering, University of Bologna

Mura Anteo Zamboni 7, 40127, Bologna, ITALY
[email protected]

Andrea Asperti - University of Bologna, DISI 1

Page 2: About generative aspects of Variational Autoencoders

Generative Models

Generative models are meant to learn rich data distributions, allowing the sampling of new data.

Two main classes of generative models

• Generative Adversarial Networks (GANs)

• Variational Autoencoders (VAEs)

At the current state of the art, GANs give better results.

What is the problem with VAEs?

Andrea Asperti - University of Bologna, DISI 2

Page 3: About generative aspects of Variational Autoencoders

Deterministic autoencoder

An autoencoder is a net trained to reconstruct input data out of a learned internal representation (e.g. minimizing the quadratic distance)

[Diagram: input → Encoder DNN → latent space → Decoder DNN → reconstruction]

Can we use the decoder to generate data by sampling in the latent space?
No, since we do not know the distribution of the latent variables.
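As a purely illustrative sketch (ours, not from the slides), a minimal deterministic autoencoder in PyTorch; the 784-dimensional flattened inputs, the 16-dimensional latent space and the names enc/dec are assumptions:

    import torch
    import torch.nn as nn

    # Hypothetical sizes: 784-dim flattened inputs, 16-dim latent space
    enc = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 16))
    dec = nn.Sequential(nn.Linear(16, 256), nn.ReLU(), nn.Linear(256, 784))

    def reconstruction_loss(x):
        z = enc(x)                         # learned internal representation
        x_hat = dec(z)                     # reconstruction of the input
        return ((x - x_hat) ** 2).mean()   # quadratic (MSE) distance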

Andrea Asperti - University of Bologna, DISI 3

Page 6: About generative aspects of Variational Autoencoders

Variational autoencoder

In a Variational Autoencoder (VAE) [9, 10, 6] we try to force latent variables to have a known distribution (e.g. a Normal distribution)

[Diagram: input → Encoder DNN → latent variable z ~ N(0,1) → Decoder DNN → reconstruction]

How can we do it? Is this actually working?

Andrea Asperti - University of Bologna, DISI 6

Page 8: About generative aspects of Variational Autoencoders

The encoding distribution Q(z|X)

[Figure: a data point X_1 and the region of the latent space covered by its encoding distribution Q(z|X_1)]

Andrea Asperti - University of Bologna, DISI 8

Page 9: About generative aspects of Variational Autoencoders

Estimate relevant statistics for Q(z|X)

[Figure: the latent space, with the encoding distribution Q(z|X_1) of a data point X_1]

Andrea Asperti - University of Bologna, DISI 9

Page 10: About generative aspects of Variational Autoencoders

Estimate relevant statistics for Q(z|X)

[Figure: the latent space, with the encoding of a data point X_1]

Q(z|X_1) = G(µ(X_1), σ(X_1))
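As an illustration (ours, not from the slides), a minimal PyTorch sketch of an encoder producing the two statistics; the layer sizes and the log-variance head are assumptions, the latter being a common way to keep σ²(X) positive:

    import torch
    import torch.nn as nn

    class GaussianEncoder(nn.Module):
        """Outputs µ(X) and log σ²(X), defining Q(z|X) = G(µ(X), σ(X))."""
        def __init__(self, d_in=784, d_latent=16):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(d_in, 256), nn.ReLU())
            self.mu = nn.Linear(256, d_latent)       # mean head
            self.log_var = nn.Linear(256, d_latent)  # log-variance head (keeps σ² > 0)

        def forward(self, x):
            h = self.body(x)
            return self.mu(h), self.log_var(h)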

Andrea Asperti - University of Bologna, DISI 10

Page 11: About generative aspects of Variational Autoencoders

Estimate relevant statistics for Q(z|X)

[Figure: the latent space, with the encodings of two data points X_1 and X_2]

Q(z|X_1) = G(µ(X_1), σ(X_1))
Q(z|X_2) = G(µ(X_2), σ(X_2))

Andrea Asperti - University of Bologna, DISI 11

Page 12: About generative aspects of Variational Autoencoders

Estimate relevant statistics for Q(z|X)

[Figure: the latent space, with the encodings of two data points X_1 and X_2]

We estimate the variance σ(X) around µ(X) by Gaussian sampling at training time.
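A minimal sketch of this sampling step (the usual reparameterization trick, written as code by us), assuming the encoder returns µ(X) and log σ²(X) as in the sketch above:

    import torch

    def sample_z(mu, log_var):
        # z = µ(X) + σ(X) * ε, with ε ~ N(0,1); sampling stays differentiable in µ and σ
        sigma = torch.exp(0.5 * log_var)
        eps = torch.randn_like(sigma)
        return mu + sigma * eps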

Andrea Asperti - University of Bologna, DISI 12

Page 13: About generative aspects of Variational Autoencoders

Kullback-Leibler regularization

[Figure: the latent space, with the encodings of X_1 and X_2 and the prior N(0,1)]

Minimize the Kullback-Leibler divergence between each Q(z|X) and a normal distribution:

KL(Q(z|X) || N(0,1))
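For a diagonal Gaussian Q(z|X) = G(µ(X), σ(X)) this term has the well-known closed form ½ Σ (µ² + σ² − log σ² − 1). A sketch (ours), with the same log-variance parameterization as above:

    import torch

    def kl_regularizer(mu, log_var):
        # KL(G(µ(X), σ(X)) || N(0,1)), summed over latent dimensions, averaged over the batch
        return 0.5 * (mu ** 2 + torch.exp(log_var) - log_var - 1).sum(dim=-1).mean()

In a full VAE this term would simply be added (possibly with a weighting factor) to the reconstruction loss sketched earlier.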

Andrea Asperti - University of Bologna, DISI 13

Page 14: About generative aspects of Variational Autoencoders

The marginal posterior

[Figure: the latent space, with the encodings of X_1 and X_2 and the prior N(0,1)]

The actual distribution of latent variables is the marginal (aka aggregate) distribution Q(z), hopefully resembling the prior P(z) = N(0,1)

Q(z) = ∑_X Q(z|X) ≈ N(0,1)
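A sketch (ours) of how Q(z) can be probed empirically: encode the dataset, draw one z from each Q(z|X), and compare the first two moments with those of N(0,1). It reuses the hypothetical GaussianEncoder and sample_z from the sketches above; data_loader is assumed to yield batches of flattened inputs:

    import torch

    @torch.no_grad()
    def marginal_moments(encoder, data_loader):
        zs = []
        for x in data_loader:
            mu, log_var = encoder(x)
            zs.append(sample_z(mu, log_var))  # one sample from each Q(z|X)
        z = torch.cat(zs)
        # per-dimension mean and variance of Q(z): close to 0 and 1 if Q(z) ≈ N(0,1)
        return z.mean(dim=0), z.var(dim=0)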

Andrea Asperti - University of Bologna, DISI 14

Page 15: About generative aspects of Variational Autoencoders

MNIST case

Disposition in the latent space of 100 MNIST digits after 10 epochs of training

It does indeed have a Gaussian shape... Why?

Andrea Asperti - University of Bologna, DISI 15

Page 16: About generative aspects of Variational Autoencoders

Why is KL-divergence working?

Many different answers ... relatively complex theory.

In this article, we investigate the marginal posterior distribution as a Gaussian Mixture Model (GMM), with one Gaussian for each data point.

Andrea Asperti - University of Bologna, DISI 16

Page 17: About generative aspects of Variational Autoencoders

The normalization idea

- For a neural network, it is relatively easy to perform an affine transformation of the latent space.

- The transformation can be compensated in the next layer of the network, keeping the loss invariant.

(same idea behind batch-normalization layers)

- This means we may assume the network is able to keep a fixed ratio ρ between the standard deviation σ(X) and the mean value µ(X) of each latent variable.
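A toy illustration (ours, not the author's code) of why such a rescaling leaves the loss unchanged: scale one latent unit by a factor a and divide the corresponding column of the next linear layer's weights by a; the composed function, and hence the loss, is identical:

    import torch
    import torch.nn as nn

    lin = nn.Linear(16, 256)   # the layer of the decoder that reads the latent space
    z = torch.randn(8, 16)     # a batch of latent vectors
    a = 3.0                    # arbitrary rescaling factor
    out_before = lin(z)

    # Rescale latent unit 0 and compensate in the corresponding weight column
    z_scaled = z.clone()
    z_scaled[:, 0] *= a
    with torch.no_grad():
        lin.weight[:, 0] /= a
    out_after = lin(z_scaled)

    print(torch.allclose(out_before, out_after, atol=1e-5))  # True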

Andrea Asperti - University of Bologna, DISI 17

Page 18: About generative aspects of Variational Autoencoders

Pushing ρ in KL-divergence

Substituting µ²(X) = σ²(X)/ρ² into the closed form of the KL-divergence

½ (µ²(X) + σ²(X) − log(σ²(X)) − 1)

we get the expression

½ (σ²(X) (1 + ρ²)/ρ² − log(σ²(X)) − 1)

which has a minimum precisely when

σ²(X) + µ²(X) = 1
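A small symbolic check (ours, with SymPy) of this minimization, under the assumption µ²(X) = σ²(X)/ρ²:

    import sympy as sp

    sigma2, rho = sp.symbols('sigma2 rho', positive=True)
    # KL closed form with µ² replaced by σ²/ρ²
    kl = sp.Rational(1, 2) * (sigma2 * (1 + rho**2) / rho**2 - sp.log(sigma2) - 1)
    sigma2_min = sp.solve(sp.diff(kl, sigma2), sigma2)[0]  # ρ²/(1 + ρ²)
    mu2_min = sigma2_min / rho**2                          # µ² = σ²/ρ² at the minimum
    print(sp.simplify(sigma2_min + mu2_min))               # prints 1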

Andrea Asperti - University of Bologna, DISI 18

Page 19: About generative aspects of Variational Autoencoders

Corollaries

- The variance law: averaging over all X, we expect that for each latent variable z

E_X[σ²_z(X)] + σ²_z = 1

(supposing the mean of µ_z(X) over the data is 0, so that the variance σ²_z of the latent variable equals E_X[µ²_z(X)]). A numerical check is sketched below.

- As an effect of the KL-divergence, the first two moments of the distribution of each latent variable should agree with those of a Normal N(0,1) distribution.

- What about the other moments? Hard to guess.
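A sketch (ours) of how the variance law in the first point could be checked numerically from the encoder outputs, again reusing the hypothetical GaussianEncoder and data_loader from the earlier sketches:

    import torch

    @torch.no_grad()
    def variance_law(encoder, data_loader):
        mus, variances = [], []
        for x in data_loader:
            mu, log_var = encoder(x)
            mus.append(mu)
            variances.append(torch.exp(log_var))
        mu = torch.cat(mus)
        var = torch.cat(variances)
        # E_X[σ²_z(X)] + Var_X(µ_z(X)): expected to be close to 1 for each latent variable
        return var.mean(dim=0) + mu.var(dim=0)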

Andrea Asperti - University of Bologna, DISI 19

Page 20: About generative aspects of Variational Autoencoders

Conclusion

For several years, the mediocre performance of VAEs has been imputed to the so-called over-pruning phenomenon [2, 11, 12].

Recent research suggests the problem is instead due to the mismatch between the latent distribution and the normal prior [4, 5, 1, 7].

Our contribution: we may reasonably expect the KL-divergence to force the first two moments of the latent distribution to agree with those of a Normal distribution, but we can hardly presume the same for the other moments.

Andrea Asperti - University of Bologna, DISI 20

Page 21: About generative aspects of Variational Autoencoders

Essential bibliography (1)

[1] Andrea Asperti. Variational Autoencoders and the Variable Collapse Phenomenon. Sensors & Transducers, V.234, N.3, pages 1-8, 2018.

[2] Yuri Burda, Roger B. Grosse, and Ruslan Salakhutdinov. Importance weighted autoencoders. CoRR, abs/1509.00519, 2015.

[3] Christopher P. Burgess, Irina Higgins, Arka Pal, Loic Matthey, Nick Watters, Guillaume Desjardins, and Alexander Lerchner. Understanding disentangling in β-VAE. 2018.

[4] Bin Dai, Yu Wang, John Aston, Gang Hua, and David P. Wipf. Connections with robust PCA and the role of emergent sparsity in variational autoencoder models. Journal of Machine Learning Research, 19, 2018.

[5] Bin Dai and David P. Wipf. Diagnosing and enhancing VAE models. In Seventh International Conference on Learning Representations (ICLR 2019), May 6-9, New Orleans, 2019.

[6] Carl Doersch. Tutorial on variational autoencoders. CoRR, abs/1606.05908, 2016.

Andrea Asperti - University of Bologna, DISI 21

Page 22: About generative aspects of Variational Autoencoders

Essential bibliography (2)

[7] Partha Ghosh, Mehdi S. M. Sajjadi, Antonio Vergari, Michael J. Black, and Bernhard Schölkopf. From Variational to Deterministic Autoencoders. CoRR, abs/1903.12436.

[8] Diederik P. Kingma, Tim Salimans, and Max Welling. Improving variational inference with inverse autoregressive flow. CoRR, abs/1606.04934, 2016.

[9] Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. CoRR, abs/1312.6114, 2013.

[10] Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. Stochastic backpropagation and approximate inference in deep generative models. In Proceedings of the 31st International Conference on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014, volume 32 of JMLR Workshop and Conference Proceedings, pages 1278-1286. JMLR.org, 2014.

[11] Serena Yeung, Anitha Kannan, and Yann Dauphin. Epitomic variational autoencoder. 2017.

[12] Serena Yeung, Anitha Kannan, Yann Dauphin, and Li Fei-Fei. Tackling over-pruning in variational autoencoders. CoRR, abs/1706.03643, 2017.

Andrea Asperti - University of Bologna, DISI 22