Modeling documents with Generative Adversarial Networks
John Glover
Overview
Learning representations of natural language documents
A brief introduction to Generative Adversarial Networks
Energy-based Generative Adversarial Networks
An adversarial document model
Future work & conclusion
Representation learning
- The ability to learn robust, reusable feature representations from unlabelled data has potential applications in a wide variety of machine learning tasks, such as data retrieval and classification.
- One way to create such representations is to train deep generative models that can learn to capture the complex distributions of real-world data.
Representation learning
Document representations: LDA
- The traditional approach to learning document representations is to use something like LDA.
- In LDA, documents consist of a mixture of topics, with each topic defining a probability distribution over the words in the vocabulary.
- Documents are represented by a vector of mixture weights over the associated topics.
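The generative process behind those mixture weights can be sketched directly. A minimal simulation of LDA's generative story, where the toy vocabulary size, topic count, and Dirichlet hyperparameters are assumed values for illustration, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

V = 8          # vocabulary size (toy value)
K = 3          # number of topics (toy value)
alpha = 0.1    # Dirichlet prior on per-document topic mixtures
beta = 0.1     # Dirichlet prior on per-topic word distributions

# Each topic is a probability distribution over the V vocabulary words.
topics = rng.dirichlet([beta] * V, size=K)          # shape (K, V)

def generate_document(n_words):
    """Sample one document from the LDA generative process."""
    theta = rng.dirichlet([alpha] * K)              # per-document topic mixture
    words = []
    for _ in range(n_words):
        z = rng.choice(K, p=theta)                  # pick a topic for this word
        w = rng.choice(V, p=topics[z])              # pick a word from that topic
        words.append(w)
    return theta, words

theta, words = generate_document(20)
# theta is the document's representation: a vector of K mixture weights.
```

Inference in LDA runs this process in reverse, recovering theta for each observed document.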
Document representations: LDA
[Plate diagram of the LDA graphical model: M documents, N words per document, with latent variables θ and z and priors α and β.]

- α is the parameter of the Dirichlet prior on the per-document topic distributions, β is the parameter of the Dirichlet prior on the per-topic word distribution, θ_m is the topic distribution for document m, z_mn is the topic for the nth word in document m, and w_mn is the specific word.
Document representations: beyond LDA
- Replicated Softmax (Salakhutdinov and Hinton, 2009).
- DocNADE (Larochelle and Lauly, 2012).
Generative models: recent trends
- Variational inference: Neural Variational Inference (Miao, Yu, Blunsom, 2016).
- Generative Adversarial Networks: ?
Generative Adversarial Networks
- Generative Adversarial Networks (GANs) involve a min-max adversarial game between a generative model G and a discriminative model D.
- G(z) is a neural network that is trained to map samples z from a prior noise distribution p(z) to the data space.
- D(x) is another neural network that takes a data sample x as input and outputs a single scalar value representing the probability that x came from the data distribution rather than from G(z).
Generative Adversarial Networks
Source: https://ishmaelbelghazi.github.io/ALI
Generative Adversarial Networks
- D is trained to maximise the probability of assigning the correct label to the input x.
- G is trained to maximally confuse D, using the gradient of D(x) with respect to x to update its parameters.
min_G max_D  E_{x∼p_data}[log D(x)] + E_{z∼p(z)}[log(1 − D(G(z)))]
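The two terms of this objective can be evaluated numerically. A minimal sketch of each player's loss, assuming the discriminator outputs probabilities in (0, 1); the function name and toy values are illustrative, not from the slides:

```python
import numpy as np

def gan_losses(d_real, d_fake):
    """Loss values implied by the min-max objective.

    d_real = D(x)    : probabilities D assigns to real samples
    d_fake = D(G(z)) : probabilities D assigns to generated samples
    """
    # D maximises E[log D(x)] + E[log(1 - D(G(z)))]; its loss is the negative.
    d_loss = -(np.log(d_real).mean() + np.log(1.0 - d_fake).mean())
    # G minimises E[log(1 - D(G(z)))], as in the original objective above.
    g_loss = np.log(1.0 - d_fake).mean()
    return d_loss, g_loss

# A discriminator doing well (d_real near 1, d_fake near 0) has d_loss near 0.
d_loss, g_loss = gan_losses(np.array([0.9, 0.95]), np.array([0.05, 0.1]))
```

In practice G is often trained with the non-saturating variant, maximising log D(G(z)) instead, which gives stronger gradients early in training.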
GAN samples
Source: "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks", https://arxiv.org/abs/1511.06434v2
GAN samples
Source: "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network", https://arxiv.org/abs/1609.04802
Energy-based Generative Adversarial Networks
Source: Yann LeCun's slides on energy-based GANs, NIPS 2016.
- Energy function: outputs low values on the data manifold, and higher values everywhere else.
Energy-based Generative Adversarial Networks
Source: Yann LeCun's slides on energy-based GANs, NIPS 2016.
- Easy to push down the energy of observed data via SGD.
- How to choose where to push the energy up?
Energy-based Generative Adversarial Networks
Source: Yann LeCun's slides on energy-based GANs, NIPS 2016.
- The generator learns to pick points where the energy should be increased.
- Can view D as a learned objective function.
Energy-based Generative Adversarial Networks
- The energy function is trained to push down on the energy of real samples x, and to push up on the energy of generated samples x̂ (f_D is the value to be minimised at each iteration and m is a margin between positive and negative energies):

f_D(x, z) = D(x) + max(0, m − D(G(z)))
- At each iteration, the generator G is trained adversarially against D to minimise f_G:

f_G(z) = D(G(z))
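Both objectives can be computed directly from discriminator energies. A small sketch with assumed toy energy and margin values (the function name is illustrative):

```python
import numpy as np

def ebgan_losses(d_real, d_fake, margin):
    """EBGAN objectives from the slide, where D(x) is an energy (lower = more real).

    f_D pushes down the energy of real samples and pushes up the energy of
    generated samples, but only while a fake's energy is below the margin m.
    """
    f_d = d_real + np.maximum(0.0, margin - d_fake)   # discriminator objective
    f_g = d_fake                                      # generator objective
    return f_d, f_g

# Toy energies: a real sample with low energy, a fake sample below the margin.
f_d, f_g = ebgan_losses(d_real=0.2, d_fake=1.5, margin=2.0)
# f_d = 0.2 + max(0, 2.0 - 1.5) = 0.7; f_g = 1.5
```

Note the hinge: once a generated sample's energy exceeds m, it contributes nothing more to f_D, so D stops wasting capacity on already-rejected fakes.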
Energy-based Generative Adversarial Networks
- In practice, the energy-based GAN formulation seems to be easier to train.
- Empirical results in "Energy-based Generative Adversarial Network" (https://arxiv.org/abs/1609.03126), based on more than 6500 experiments.
An adversarial document model
- Can we use the GAN formulation to learn representations of natural language documents?
- Questions:
  1. How to represent documents? GANs require everything to be differentiable, but we need to deal with discrete text.
  2. How to get a representation? There is no explicit mapping back to the latent (z) space.
An adversarial document model
[Architecture diagram: noise z → generator G → discriminator D, where D is a denoising autoencoder: corruption process C, encoder Enc mapping to representation h, and decoder Dec trained with an MSE reconstruction loss.]

Using an Energy-Based GAN to learn document representations. G is the generator, Enc and Dec are DAE encoder and decoder networks, C is a corruption process (bypassed at test time), and D is the discriminator.
- The input to the discriminator is the binary bag-of-words representation of a document: x ∈ {0,1}^V.
- Energy-based GAN with a Denoising Autoencoder discriminator.
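The discriminator's input and corruption step can be sketched concretely. Below, the tiny vocabulary and noise level are assumed for illustration, and masking noise is one common corruption choice for a DAE, not necessarily the one used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = {"gan": 0, "topic": 1, "document": 2, "energy": 3, "network": 4}
V = len(vocab)

def binary_bow(tokens):
    """Binary bag-of-words vector x in {0,1}^V: 1 if the word occurs at all."""
    x = np.zeros(V, dtype=np.int8)
    for t in tokens:
        if t in vocab:
            x[vocab[t]] = 1
    return x

def corrupt(x, p=0.3):
    """Masking-noise corruption C: randomly zero out a fraction p of inputs."""
    mask = rng.random(x.shape) >= p
    return x * mask

x = binary_bow(["gan", "network", "network", "energy"])
x_tilde = corrupt(x)   # the DAE is trained to reconstruct x from x_tilde
```

Word counts are discarded on purpose: the binary representation sidesteps document-length effects and keeps the discriminator's input in a fixed {0,1}^V space.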
Document retrieval evaluation
[Precision-recall plot: recall on a log scale from 0.0001 to 1.0, precision from 0.0 to 0.6, with curves for ADM, ADM (AE), DocNADE, and DAE.]

Precision-recall curves for the document retrieval task on the 20 Newsgroups dataset. DocNADE is described in (Larochelle and Lauly, 2012), ADM is the adversarial document model, ADM (AE) is the adversarial document model with a standard Autoencoder as the discriminator (and so is similar to the Energy-Based GAN), and DAE is a Denoising Autoencoder.
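A retrieval evaluation of this kind works by ranking held-out documents by similarity of their representations to a query, then measuring precision at fixed recall levels. A minimal sketch, where the cosine-similarity choice, the helper name, and the toy data are assumptions standing in for the 20 Newsgroups setup:

```python
import numpy as np

def precision_at_recall(query_vec, doc_vecs, doc_labels, query_label, recall_levels):
    """Rank documents by cosine similarity to the query; report precision
    at the requested recall fractions (relevant = same topic label)."""
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec))
    order = np.argsort(-sims)                       # best matches first
    relevant = doc_labels[order] == query_label
    n_relevant = relevant.sum()
    precisions = []
    for r in recall_levels:
        k = int(np.ceil(r * n_relevant))            # relevant docs needed for recall r
        # smallest cutoff whose retrieved set contains k relevant documents
        cutoff = np.searchsorted(np.cumsum(relevant), k) + 1
        precisions.append(k / cutoff)
    return precisions

# Toy example: 4 document vectors, 2 sharing the query's topic label (0).
docs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1])
p = precision_at_recall(np.array([1.0, 0.0]), docs, labels, 0, [0.5, 1.0])
```

Averaging such curves over all held-out queries gives plots like the one above; a better representation keeps same-topic documents nearby, so precision stays high as recall increases.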
Qualitative evaluation: t-SNE plot
t-SNE visualizations of the document representations learned by the adversarial document model on the held-out test dataset of 20 Newsgroups. The documents belong to 20 different topics, which correspond to different coloured points in the figure.
Future work
- Understanding why the DAE in the GAN discriminator appears to produce significantly better representations than a standalone DAE.
- Exploring the impact of applying additional constraints to the representation layer.
Conclusion
- Showed that a variation on the recently proposed Energy-Based GAN can be used to learn document representations in an unsupervised setting.
- The current formulation still falls short of the state of the art, but it is very early days for this line of research, so it is likely that we can push this a lot further.
- Suggested some interesting areas for future research.
More information
- Introduction to GANs: http://blog.aylien.com/introduction-generative-adversarial-networks-code-tensorflow
- Paper: https://sites.google.com/site/nips2016adversarial/home/accepted-papers