Modeling documents with Generative Adversarial Networks
John Glover
Overview
Learning representations of natural language documents
A brief introduction to Generative Adversarial Networks
Energy-based Generative Adversarial Networks
An adversarial document model
Future work & conclusion
Representation learning
- The ability to learn robust, reusable feature representations from unlabelled data has potential applications in a wide variety of machine learning tasks, such as data retrieval and classification.
- One way to create such representations is to train deep generative models that can learn to capture the complex distributions of real-world data.
Representation learning
Document representations: LDA
- The traditional approach to learning document representations is to use something like LDA.
- In LDA, documents consist of a mixture of topics, with each topic defining a probability distribution over the words in the vocabulary.
- Documents are represented by a vector of mixture weights over the associated topics.
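The generative process behind those mixture weights can be sketched directly. A minimal simulation of LDA's generative story, where the toy vocabulary size, topic count, and Dirichlet hyperparameters are assumed values for illustration, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

V = 8          # vocabulary size (toy value)
K = 3          # number of topics (toy value)
alpha = 0.1    # Dirichlet prior on per-document topic mixtures
beta = 0.1     # Dirichlet prior on per-topic word distributions

# Each topic is a probability distribution over the V vocabulary words.
topics = rng.dirichlet([beta] * V, size=K)          # shape (K, V)

def generate_document(n_words):
    """Sample one document from the LDA generative process."""
    theta = rng.dirichlet([alpha] * K)              # per-document topic mixture
    words = []
    for _ in range(n_words):
        z = rng.choice(K, p=theta)                  # pick a topic for this word
        w = rng.choice(V, p=topics[z])              # pick a word from that topic
        words.append(w)
    return theta, words

theta, words = generate_document(20)
# theta is the document's representation: a vector of K mixture weights.
```

Inference in LDA runs this process in reverse, recovering theta for each observed document.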
Document representations: LDA
[Plate diagram of the LDA graphical model: M documents, N words per document, with latent variables θ and z and priors α and β.]

- α is the parameter of the Dirichlet prior on the per-document topic distributions, β is the parameter of the Dirichlet prior on the per-topic word distribution, θ_m is the topic distribution for document m, z_mn is the topic for the nth word in document m, and w_mn is the specific word.
Document representations: beyond LDA
- Replicated Softmax (Salakhutdinov and Hinton, 2009).
- DocNADE (Larochelle and Lauly, 2012).
Generative models: recent trends
- Variational inference: Neural Variational Inference (Miao, Yu, Blunsom, 2016).
- Generative Adversarial Networks: ?
Generative Adversarial Networks
- Generative Adversarial Networks (GANs) involve a min-max adversarial game between a generative model G and a discriminative model D.
- G(z) is a neural network that is trained to map samples z from a prior noise distribution p(z) to the data space.
- D(x) is another neural network that takes a data sample x as input and outputs a single scalar value representing the probability that x came from the data distribution rather than from G(z).
Generative Adversarial Networks
Source: https://ishmaelbelghazi.github.io/ALI
Generative Adversarial Networks
- D is trained to maximise the probability of assigning the correct label to the input x.
- G is trained to maximally confuse D, using the gradient of D(x) with respect to x to update its parameters.
min_G max_D  E_{x∼p_data}[log D(x)] + E_{z∼p(z)}[log(1 − D(G(z)))]
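The two terms of this objective can be evaluated numerically. A minimal sketch of each player's loss, assuming the discriminator outputs probabilities in (0, 1); the function name and toy values are illustrative, not from the slides:

```python
import numpy as np

def gan_losses(d_real, d_fake):
    """Loss values implied by the min-max objective.

    d_real = D(x)    : probabilities D assigns to real samples
    d_fake = D(G(z)) : probabilities D assigns to generated samples
    """
    # D maximises E[log D(x)] + E[log(1 - D(G(z)))]; its loss is the negative.
    d_loss = -(np.log(d_real).mean() + np.log(1.0 - d_fake).mean())
    # G minimises E[log(1 - D(G(z)))], as in the original objective above.
    g_loss = np.log(1.0 - d_fake).mean()
    return d_loss, g_loss

# A discriminator doing well (d_real near 1, d_fake near 0) has d_loss near 0.
d_loss, g_loss = gan_losses(np.array([0.9, 0.95]), np.array([0.05, 0.1]))
```

In practice G is often trained with the non-saturating variant, maximising log D(G(z)) instead, which gives stronger gradients early in training.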
GAN samples
Source: "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks", https://arxiv.org/abs/1511.06434v2
GAN samples
Source: "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network", https://arxiv.org/abs/1609.04802
Energy-based Generative Adversarial Networks
Source: Yann LeCun's slides on energy-based GANs, NIPS 2016.
- Energy function: outputs low values on the data manifold, and higher values everywhere else.
Energy-based Generative Adversarial Networks
Source: Yann LeCun's slides on energy-based GANs, NIPS 2016.
- Easy to push down the energy of observed data via SGD.
- How to choose where to push the energy up?
Energy-based Generative Adversarial Networks
Source: Yann LeCun's slides on energy-based GANs, NIPS 2016.
- The generator learns to pick points where the energy should be increased.
- Can view D as a learned objective function.
Energy-based Generative Adversarial Networks
- The energy function is trained to push down on the energy of real samples x, and to push up on the energy of generated samples x̂ (f_D is the value to be minimised at each iteration and m is a margin between positive and negative energies):

f_D(x, z) = D(x) + max(0, m − D(G(z)))
- At each iteration, the generator G is trained adversarially against D to minimise f_G:

f_G(z) = D(G(z))
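Both objectives can be computed directly from discriminator energies. A small sketch with assumed toy energy and margin values (the function name is illustrative):

```python
import numpy as np

def ebgan_losses(d_real, d_fake, margin):
    """EBGAN objectives from the slide, where D(x) is an energy (lower = more real).

    f_D pushes down the energy of real samples and pushes up the energy of
    generated samples, but only while a fake's energy is below the margin m.
    """
    f_d = d_real + np.maximum(0.0, margin - d_fake)   # discriminator objective
    f_g = d_fake                                      # generator objective
    return f_d, f_g

# Toy energies: a real sample with low energy, a fake sample below the margin.
f_d, f_g = ebgan_losses(d_real=0.2, d_fake=1.5, margin=2.0)
# f_d = 0.2 + max(0, 2.0 - 1.5) = 0.7; f_g = 1.5
```

Note the hinge: once a generated sample's energy exceeds m, it contributes nothing more to f_D, so D stops wasting capacity on already-rejected fakes.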
Energy-based Generative Adversarial Networks
- In practice, the energy-based GAN formulation seems to be easier to train.
- Empirical results in "Energy-based Generative Adversarial Network" (https://arxiv.org/abs/1609.03126), based on more than 6500 experiments.
An adversarial document model
- Can we use the GAN formulation to learn representations of natural language documents?
- Questions:
  1. How to represent documents? GANs require everything to be differentiable, but we need to deal with discrete text.
  2. How to get a representation? There is no explicit mapping back to the latent (z) space.
An adversarial document model
[Architecture diagram: noise z → generator G → discriminator D, where D is a denoising autoencoder: corruption process C, encoder Enc mapping to representation h, and decoder Dec trained with an MSE reconstruction loss.]

Using an Energy-Based GAN to learn document representations. G is the generator, Enc and Dec are DAE encoder and decoder networks, C is a corruption process (bypassed at test time), and D is the discriminator.
- The input to the discriminator is the binary bag-of-words representation of a document: x ∈ {0,1}^V.
- Energy-based GAN with a Denoising Autoencoder discriminator.
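The discriminator's input and corruption step can be sketched concretely. Below, the tiny vocabulary and noise level are assumed for illustration, and masking noise is one common corruption choice for a DAE, not necessarily the one used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = {"gan": 0, "topic": 1, "document": 2, "energy": 3, "network": 4}
V = len(vocab)

def binary_bow(tokens):
    """Binary bag-of-words vector x in {0,1}^V: 1 if the word occurs at all."""
    x = np.zeros(V, dtype=np.int8)
    for t in tokens:
        if t in vocab:
            x[vocab[t]] = 1
    return x

def corrupt(x, p=0.3):
    """Masking-noise corruption C: randomly zero out a fraction p of inputs."""
    mask = rng.random(x.shape) >= p
    return x * mask

x = binary_bow(["gan", "network", "network", "energy"])
x_tilde = corrupt(x)   # the DAE is trained to reconstruct x from x_tilde
```

Word counts are discarded on purpose: the binary representation sidesteps document-length effects and keeps the discriminator's input in a fixed {0,1}^V space.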
Document retrieval evaluation
[Precision-recall plot: recall on a log scale from 0.0001 to 1.0, precision from 0.0 to 0.6, with curves for ADM, ADM (AE), DocNADE, and DAE.]

Precision-recall curves for the document retrieval task on the 20 Newsgroups dataset. DocNADE is described in (Larochelle and Lauly, 2012), ADM is the adversarial document model, ADM (AE) is the adversarial document model with a standard Autoencoder as the discriminator (and so is similar to the Energy-Based GAN), and DAE is a Denoising Autoencoder.
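A retrieval evaluation of this kind works by ranking held-out documents by similarity of their representations to a query, then measuring precision at fixed recall levels. A minimal sketch, where the cosine-similarity choice, the helper name, and the toy data are assumptions standing in for the 20 Newsgroups setup:

```python
import numpy as np

def precision_at_recall(query_vec, doc_vecs, doc_labels, query_label, recall_levels):
    """Rank documents by cosine similarity to the query; report precision
    at the requested recall fractions (relevant = same topic label)."""
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec))
    order = np.argsort(-sims)                       # best matches first
    relevant = doc_labels[order] == query_label
    n_relevant = relevant.sum()
    precisions = []
    for r in recall_levels:
        k = int(np.ceil(r * n_relevant))            # relevant docs needed for recall r
        # smallest cutoff whose retrieved set contains k relevant documents
        cutoff = np.searchsorted(np.cumsum(relevant), k) + 1
        precisions.append(k / cutoff)
    return precisions

# Toy example: 4 document vectors, 2 sharing the query's topic label (0).
docs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1])
p = precision_at_recall(np.array([1.0, 0.0]), docs, labels, 0, [0.5, 1.0])
```

Averaging such curves over all held-out queries gives plots like the one above; a better representation keeps same-topic documents nearby, so precision stays high as recall increases.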
Qualitative evaluation: t-SNE plot
t-SNE visualizations of the document representations learned by the adversarial document model on the held-out test dataset of 20 Newsgroups. The documents belong to 20 different topics, which correspond to different coloured points in the figure.
Future work
- Understanding why the DAE in the GAN discriminator appears to produce significantly better representations than a standalone DAE.
- Exploring the impact of applying additional constraints to the representation layer.
Conclusion
- Showed that a variation on the recently proposed Energy-Based GAN can be used to learn document representations in an unsupervised setting.
- The current formulation still falls short of the state of the art, but it is very early days for this line of research, so it is likely that we can push this a lot further.
- Suggested some interesting areas for future research.
More information
- Introduction to GANs: http://blog.aylien.com/introduction-generative-adversarial-networks-code-tensorflow
- Paper: https://sites.google.com/site/nips2016adversarial/home/accepted-papers