Modeling documents with Generative Adversarial Networks John Glover


Page 1: Modeling documents with Generative Adversarial Networks - John Glover

Modeling documents with Generative Adversarial Networks

John Glover

Page 2: Modeling documents with Generative Adversarial Networks - John Glover

Overview

Learning representations of natural language documents

A brief introduction to Generative Adversarial Networks

Energy-based Generative Adversarial Networks

An adversarial document model

Future work & conclusion

Page 3: Modeling documents with Generative Adversarial Networks - John Glover

Representation learning

- The ability to learn robust, reusable feature representations from unlabelled data has potential applications in a wide variety of machine learning tasks, such as data retrieval and classification.

- One way to create such representations is to train deep generative models that can learn to capture the complex distributions of real-world data.

Page 4: Modeling documents with Generative Adversarial Networks - John Glover

Representation learning

Page 5: Modeling documents with Generative Adversarial Networks - John Glover

Document representations: LDA

- The traditional approach to learning document representations is to use a topic model such as LDA.

- In LDA, documents consist of a mixture of topics, with each topic defining a probability distribution over the words in the vocabulary.

- Documents are represented by a vector of mixture weights over the associated topics.

Page 6: Modeling documents with Generative Adversarial Networks - John Glover

Document representations: LDA

[Figure: LDA plate diagram with nodes α, β, θ, z and w, and plates N and M.]

- α is the parameter of the Dirichlet prior on the per-document topic distributions, β is the parameter of the Dirichlet prior on the per-topic word distribution, θ_m is the topic distribution for document m, z_mn is the topic for the nth word in document m, and w_mn is the specific word.
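The generative process described by these variables can be sketched in a few lines of standard-library Python. This is only an illustrative sketch: the function names, the symmetric priors, the fixed document length and the toy sizes are assumptions made for the example, not part of the slides.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    """Draw an index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_corpus(num_docs, doc_len, num_topics, vocab_size, alpha, beta, seed=0):
    """Sample word ids for num_docs documents from the LDA generative process."""
    rng = random.Random(seed)
    # One word distribution per topic, each drawn from a symmetric Dirichlet(beta).
    topic_word = [sample_dirichlet([beta] * vocab_size, rng) for _ in range(num_topics)]
    corpus, thetas = [], []
    for _ in range(num_docs):                      # plate M: documents
        theta_m = sample_dirichlet([alpha] * num_topics, rng)
        words = []
        for _ in range(doc_len):                   # plate N: words in a document
            z_mn = sample_categorical(theta_m, rng)            # topic of word n
            w_mn = sample_categorical(topic_word[z_mn], rng)   # the word itself
            words.append(w_mn)
        corpus.append(words)
        thetas.append(theta_m)
    return corpus, thetas

# Toy sizes chosen only for illustration.
corpus, thetas = generate_corpus(num_docs=3, doc_len=8, num_topics=2,
                                 vocab_size=5, alpha=0.5, beta=0.5)
```

Each `theta_m` returned here is the mixture-weight vector that would serve as the document representation.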

Page 7: Modeling documents with Generative Adversarial Networks - John Glover

Document representations: beyond LDA

- Replicated softmax (Salakhutdinov and Hinton, 2009).

- DocNADE (Larochelle and Lauly, 2012).

Page 8: Modeling documents with Generative Adversarial Networks - John Glover

Generative models: recent trends

- Variational inference: Neural variational inference (Miao, Yu and Blunsom, 2016).

- Generative Adversarial Networks: ?

Page 9: Modeling documents with Generative Adversarial Networks - John Glover

Generative Adversarial Networks

- Generative Adversarial Networks (GANs) involve a min-max adversarial game between a generative model G and a discriminative model D.

- G(z) is a neural network that is trained to map samples z from a prior noise distribution p(z) to the data space.

- D(x) is another neural network that takes a data sample x as input and outputs a single scalar value representing the probability that x came from the data distribution rather than from G(z).

Page 10: Modeling documents with Generative Adversarial Networks - John Glover

Generative Adversarial Networks

source: https://ishmaelbelghazi.github.io/ALI

Page 11: Modeling documents with Generative Adversarial Networks - John Glover

Generative Adversarial Networks

- D is trained to maximise the probability of assigning the correct label to the input x.

- G is trained to maximally confuse D, using the gradient of D(x) with respect to x to update its parameters.

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p(z)}[\log(1 - D(G(z)))]$$
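As a rough illustration of the objective, the two expectations can be estimated by averaging over a batch of discriminator outputs. The function name `gan_value` and the example probabilities below are assumptions made for this sketch; no real networks are involved.

```python
import math

def gan_value(d_real, d_fake):
    """Monte Carlo estimate of E[log D(x)] + E[log(1 - D(G(z)))].

    d_real holds D(x) for real samples and d_fake holds D(G(z)) for
    generated samples; every value must lie strictly between 0 and 1.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that separates real from generated samples well ...
confident = gan_value(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])
# ... and one that is maximally confused, outputting 0.5 everywhere.
confused = gan_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
```

D tries to increase this value, while G tries to decrease it; the confused case corresponds to a value of -2 log 2.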

Page 12: Modeling documents with Generative Adversarial Networks - John Glover

GAN samples

Source: "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks" https://arxiv.org/abs/1511.06434v2

Page 13: Modeling documents with Generative Adversarial Networks - John Glover

GAN samples

Source: "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network" https://arxiv.org/abs/1609.04802

Page 14: Modeling documents with Generative Adversarial Networks - John Glover

Energy-based Generative Adversarial Networks

Source: Yann LeCun's slides on energy-based GANs, NIPS 2016.

- Energy function: outputs low values on the data manifold, higher values everywhere else.

Page 15: Modeling documents with Generative Adversarial Networks - John Glover

Energy-based Generative Adversarial Networks

Source: Yann LeCun's slides on energy-based GANs, NIPS 2016.

- Easy to push down energy of observed data via SGD.

- How to choose where to push energy up?

Page 16: Modeling documents with Generative Adversarial Networks - John Glover

Energy-based Generative Adversarial Networks

Source: Yann LeCun's slides on energy-based GANs, NIPS 2016.

- Generator learns to pick points where the energy should be increased.

- Can view D as a learned objective function.

Page 17: Modeling documents with Generative Adversarial Networks - John Glover

Energy-based Generative Adversarial Networks

- The energy function is trained to push down on the energy of real samples x, and to push up on the energy of generated samples x̂ (f_D is the value to be minimised at each iteration and m is a margin between positive and negative energies):

$$f_D(x, z) = D(x) + \max(0,\; m - D(G(z)))$$

- At each iteration, the generator G is trained adversarially against D to minimise f_G:

$$f_G(z) = D(G(z))$$
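The two objectives translate directly into code once the energies D(x) and D(G(z)) are available as numbers. The sketch below assumes they have already been computed; the function names and example values are illustrative only.

```python
def discriminator_loss(energy_real, energy_fake, margin):
    """f_D = D(x) + max(0, m - D(G(z))): lower real energy, raise fake energy up to the margin."""
    return energy_real + max(0.0, margin - energy_fake)

def generator_loss(energy_fake):
    """f_G = D(G(z)): the generator wants its samples to receive low energy."""
    return energy_fake

# A generated sample whose energy is below the margin still contributes to f_D ...
loss_inside_margin = discriminator_loss(energy_real=0.2, energy_fake=0.4, margin=1.0)
# ... while one whose energy already exceeds the margin contributes nothing.
loss_outside_margin = discriminator_loss(energy_real=0.2, energy_fake=1.5, margin=1.0)
```

The hinge means D stops pushing up on generated samples once their energy exceeds m, so only the real-sample term remains in the second case.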

Page 18: Modeling documents with Generative Adversarial Networks - John Glover

Energy-based Generative Adversarial Networks

- In practice, the energy-based GAN formulation seems to be easier to train.

- Empirical results in "Energy-based Generative Adversarial Network" (https://arxiv.org/abs/1609.03126) with more than 6500 experiments.

Page 19: Modeling documents with Generative Adversarial Networks - John Glover

An adversarial document model

- Can we use the GAN formulation to learn representations of natural language documents?

- Questions:
  1. How to represent documents? GANs require everything to be differentiable, but we need to deal with discrete text.
  2. How to get a representation? There is no explicit mapping back to the latent (z) space.

Page 20: Modeling documents with Generative Adversarial Networks - John Glover

An adversarial document model

[Figure: architecture diagram with components z, G, x, C, Enc, h, Dec, MSE and D.]

Using an Energy-Based GAN to learn document representations. G is the generator, Enc and Dec are DAE encoder and decoder networks, C is a corruption process (bypassed at test time) and D is the discriminator.

- Input to discriminator is the binary bag-of-words representation of a document: x ∈ {0,1}^V.

- Energy-based GAN with Denoising Autoencoder discriminator.
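A minimal sketch of these two ideas follows: building a binary bag-of-words vector, and scoring it with a reconstruction-error energy. Everything beyond that is an assumption made for illustration: the masking-noise corruption, the purely linear encoder and decoder, the random weights and all function names are hypothetical and are not the model from the talk.

```python
import random

def binary_bag_of_words(tokens, vocab):
    """Return x in {0,1}^V with a 1 wherever the vocabulary word occurs."""
    present = set(tokens)
    return [1.0 if word in present else 0.0 for word in vocab]

def corrupt(x, drop_prob, rng):
    """Masking noise (one possible choice for C): zero each entry with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else v for v in x]

def linear(weights, vec):
    """Matrix-vector product with weights given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def dae_energy(x, enc_w, dec_w, drop_prob=0.0, rng=None):
    """Energy = MSE between x and Dec(Enc(C(x))); also returns the code h."""
    # C is bypassed when drop_prob is 0, as at test time.
    x_in = corrupt(x, drop_prob, rng) if drop_prob > 0.0 else x
    h = linear(enc_w, x_in)          # document representation
    x_rec = linear(dec_w, h)         # reconstruction
    mse = sum((a - b) ** 2 for a, b in zip(x, x_rec)) / len(x)
    return mse, h

vocab = ["adversarial", "document", "energy", "model"]
x = binary_bag_of_words("an energy based document model".split(), vocab)

rng = random.Random(0)
enc_w = [[rng.uniform(-0.1, 0.1) for _ in vocab] for _ in range(2)]   # 2 x V
dec_w = [[rng.uniform(-0.1, 0.1) for _ in range(2)] for _ in vocab]   # V x 2

train_energy, _ = dae_energy(x, enc_w, dec_w, drop_prob=0.3, rng=rng)
test_energy, h = dae_energy(x, enc_w, dec_w)
```

In this sketch the hidden code `h` plays the role of the document representation, and the returned MSE is the scalar energy that the EBGAN objectives would operate on.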

Page 21: Modeling documents with Generative Adversarial Networks - John Glover

Document retrieval evaluation

[Figure: precision-recall plot. x-axis: Recall (0.0001 to 1.0); y-axis: Precision (0.0 to 0.6); curves: ADM, ADM (AE), DocNADE, DAE.]

Precision-recall curves for the document retrieval task on the 20 Newsgroups dataset. DocNADE is described in (Larochelle and Lauly, 2012), ADM is the adversarial document model, ADM (AE) is the adversarial document model with a standard Autoencoder as the discriminator (and so is similar to the Energy-Based GAN), and DAE is a Denoising Autoencoder.
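For readers unfamiliar with this kind of evaluation, the sketch below shows one common way to obtain precision-recall points for a single query. It assumes documents are ranked by cosine similarity between representation vectors and that a retrieved document counts as relevant when it shares the query's class label; the slides do not state the exact protocol, so treat both choices, and all names and toy vectors, as assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0

def precision_recall_points(query, query_label, docs, doc_labels):
    """Return (recall, precision) after each retrieved document, best match first."""
    order = sorted(range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True)
    relevant_total = sum(1 for label in doc_labels if label == query_label)
    points, hits = [], 0
    for k, i in enumerate(order, start=1):
        if doc_labels[i] == query_label:
            hits += 1
        recall = hits / relevant_total if relevant_total else 0.0
        points.append((recall, hits / k))
    return points

# Toy 2-d representations with two classes, for illustration only.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
doc_labels = ["a", "a", "b", "b"]
points = precision_recall_points([1.0, 0.05], "a", docs, doc_labels)
```

A full curve would average such points over many held-out query documents.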

Page 22: Modeling documents with Generative Adversarial Networks - John Glover

Qualitative evaluation: t-SNE plot

t-SNE visualizations of the document representations learned by the adversarial document model on the held-out test dataset of 20 Newsgroups. The documents belong to 20 different topics, which correspond to different coloured points in the figure.

Page 23: Modeling documents with Generative Adversarial Networks - John Glover

Future work

- Understanding why the DAE in the GAN discriminator appears to produce significantly better representations than a standalone DAE.

- Exploring the impact of applying additional constraints to the representation layer.

Page 24: Modeling documents with Generative Adversarial Networks - John Glover

Conclusion

- Showed that a variation on the recently proposed Energy-Based GAN can be used to learn document representations in an unsupervised setting.

- The current formulation is still short of the state of the art, but these are very early days for this line of research, so it is likely that we can push this a lot further.

- Suggested some interesting areas for future research.

Page 25: Modeling documents with Generative Adversarial Networks - John Glover

More information

- Introduction to GANs: http://blog.aylien.com/introduction-generative-adversarial-networks-code-tensorflow

- Paper: https://sites.google.com/site/nips2016adversarial/home/accepted-papers