Generative Adversarial Nets

Generative: able to produce or create something
Adversarial: involving people opposing or disagreeing with each other
Why study generative modeling?
• Training and sampling from generative models is an excellent test of our ability to represent and manipulate high-dimensional probability distributions
• Generative models can be incorporated into reinforcement learning
• Generative models of time-series data can be used to simulate possible futures
• Generative models can be trained with missing data and can provide predictions on inputs that are missing data
• Generative models, and GANs in particular, enable machine learning to work with multi-modal outputs
2017-08-11 GAN(2014) 2/26
How do GANs work?
• The basic idea of GANs is to set up a game between two players
• One of them is called the generator
• The generator creates samples that are intended to come from the same distribution
as the training data.
• The other player is the discriminator
• The discriminator examines samples to determine whether they are real or fake
• We can think of the generator as being like a counterfeiter, trying to make fake money, and the discriminator as being like police, trying to allow legitimate money and catch counterfeit money.
How do GANs work?
• In one scenario, training examples x are randomly sampled from the training set and used as input for the first player, the discriminator, represented by the function D
• In the second scenario, inputs z to the generator are randomly sampled from the model's prior over the latent variables
• The discriminator then receives input
G(z), a fake sample created by the generator
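These two scenarios can be sketched numerically. The 1-D Gaussian data, the linear generator G, and the logistic discriminator D below are all illustrative stand-ins (not from the paper), used only to show how a Monte Carlo estimate of the value function is formed from real and fake samples:

```python
import numpy as np

rng = np.random.default_rng(0)

# Scenario 1: real samples x drawn from the (here: Gaussian) data distribution
x_real = rng.normal(loc=2.0, scale=1.0, size=1000)

# Scenario 2: latent z drawn from the model's prior, pushed through the generator
z = rng.normal(size=1000)            # prior p_z(z)
G = lambda z: 0.5 * z                # toy generator: maps z to fake samples G(z)
x_fake = G(z)

# Toy discriminator: outputs the probability that its input is real
D = lambda x: 1.0 / (1.0 + np.exp(-(x - 1.0)))   # logistic, centered between the two

# Monte Carlo estimate of V(D,G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]
V = np.mean(np.log(D(x_real))) + np.mean(np.log(1.0 - D(x_fake)))
print(V)
```

Since D outputs probabilities strictly between 0 and 1, both logarithms are negative and the estimate of V is always below zero; the discriminator is trained to push it up, the generator to push it down.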
[Diagram: the discriminator network D and the generator network G]
two-player minimax game with value function V(G,D):

$\min_G \max_D V(D,G) = \mathbb{E}_{x\sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z\sim p_z(z)}[\log(1 - D(G(z)))]$
Ian J. Goodfellow et al.,(2014) Generative Adversarial Nets
[Figure 1: the discriminator D, the generative distribution $p_g$, and the data distribution $p_{data}$]
• The lower horizontal line is the domain from which z is sampled
• The horizontal line above is part of the domain of x
• The upward arrows show how the mapping x = G(z) imposes the non-uniform distribution $p_g$ on transformed samples
two-player minimax game with value function V(G,D):
• (a) Consider an adversarial pair near convergence: $p_g$ is similar to $p_{data}$ and D is a partially accurate classifier.
• (b) In the inner loop of the algorithm, D is trained to discriminate samples from data, converging to $D^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}$
• (c) After an update to G, the gradient of D has guided G(z) to flow to regions that are more likely to be classified as data.
• (d) After several steps of training, if G and D have enough capacity, they will reach a point at which both cannot improve
two-player minimax game with value function V(G,D):
• When D is trained to optimality: $D^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}$
• When G is trained, $p_g$ and $p_{data}$ become close, so $D^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)} \to \frac{1}{2}$
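As a sanity check, the optimal-discriminator formula can be evaluated for two explicit densities; the Gaussian choices below are illustrative, not from the paper:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-5, 5, 1001)               # grid that includes x = 0 exactly
p_data = gaussian_pdf(x, mu=1.0, sigma=1.0)
p_g = gaussian_pdf(x, mu=-1.0, sigma=1.0)

# Optimal discriminator for fixed G
d_star = p_data / (p_data + p_g)

# Where the two densities are equal (x = 0 by symmetry), D*(x) = 1/2
mid = np.argmin(np.abs(x))                 # index of x = 0
print(d_star[mid])

# If the generator matched the data exactly, D* would be 1/2 everywhere
d_matched = p_data / (p_data + p_data)
```

The second computation mirrors the slide's claim: once $p_g = p_{data}$, the best the discriminator can do is output 1/2 everywhere.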
two-player minimax game with value function V(G,D):
Fix G. The optimal discriminator is $D^* = \arg\max_D V(G,D)$:

$V(G,D) = \mathbb{E}_{x\sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z\sim p_z(z)}[\log(1 - D(G(z)))]$

$= \mathbb{E}_{x\sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{x\sim p_g(x)}[\log(1 - D(x))]$

$= \int_x p_{data}(x)\log D(x)\,dx + \int_x p_g(x)\log(1 - D(x))\,dx$  (for continuous x)

$= \int_x \big[\,p_{data}(x)\log D(x) + p_g(x)\log(1 - D(x))\,\big]\,dx$
two-player minimax game with value function V(G,D):
$D^* = \arg\max_D V(G,D)$ maximizes the integrand

$p_{data}(x)\log D(x) + p_g(x)\log(1 - D(x))$

pointwise. For fixed x, let $a = p_{data}(x)$, $b = p_g(x)$, $y = D(x)$, and maximize $a\log y + b\log(1-y)$.

Using $\frac{d}{dy}\log y = \frac{1}{y}$ and $\frac{d}{dy}\log f(y) = \frac{f'(y)}{f(y)}$:

$\frac{d}{dy}\big[a\log y + b\log(1-y)\big] = \frac{a}{y} - \frac{b}{1-y} = \frac{a - (a+b)y}{y(1-y)}$

Setting $\frac{a - (a+b)y}{y(1-y)} = 0$ gives $y = \frac{a}{a+b}$, so

$D^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}$
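The closed-form maximizer can be checked numerically: scanning $f(y) = a\log y + b\log(1-y)$ over a fine grid on $(0,1)$ should find its peak at $y = a/(a+b)$. The values of a and b below are arbitrary positive constants:

```python
import numpy as np

a, b = 0.7, 0.3                        # arbitrary stand-ins for p_data(x), p_g(x)
y = np.linspace(1e-6, 1 - 1e-6, 1_000_001)
f = a * np.log(y) + b * np.log(1 - y)  # the pointwise objective

y_best = y[np.argmax(f)]               # numerical maximizer
y_theory = a / (a + b)                 # closed form derived above
print(y_best, y_theory)
```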
two-player minimax game with value function V(G,D):

Substituting the optimal discriminator $D^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}$ back into V gives the generator's criterion:

$C(G) = \max_D V(G,D) = \mathbb{E}_{x\sim p_{data}}[\log D^*(x)] + \mathbb{E}_{x\sim p_g}[\log(1 - D^*(x))]$
two-player minimax game with value function V(G,D):
$KL(P\|Q) = \int_x P(x)\log\frac{P(x)}{Q(x)}\,dx$

$C(G) = \mathbb{E}_{x\sim p_{data}}\!\left[\log\frac{p_{data}(x)}{p_{data}(x)+p_g(x)}\right] + \mathbb{E}_{x\sim p_g}\!\left[\log\frac{p_g(x)}{p_{data}(x)+p_g(x)}\right]$

$= \int_x p_{data}(x)\log\frac{p_{data}(x)}{p_{data}(x)+p_g(x)}\,dx + \int_x p_g(x)\log\frac{p_g(x)}{p_{data}(x)+p_g(x)}\,dx$

$= KL(p_{data}\,\|\,p_{data}+p_g) + KL(p_g\,\|\,p_{data}+p_g)$
two-player minimax game with value function V(G,D):

$C(G) = KL(p_{data}\,\|\,p_{data}+p_g) + KL(p_g\,\|\,p_{data}+p_g)$

Since $\int_x p_{data}(x)\,dx = 1$, we have $\int_x p_{data}(x)\log 2\,dx = \log 2$ (and likewise for $p_g$), so we can add and subtract $\log 4$:

$C(G) = \int_x p_{data}(x)\log\frac{p_{data}(x)}{p_{data}(x)+p_g(x)}\,dx + \int_x p_g(x)\log\frac{p_g(x)}{p_{data}(x)+p_g(x)}\,dx + \log 4 - \log 4$

$= \int_x p_{data}(x)\!\left[\log\frac{p_{data}(x)}{p_{data}(x)+p_g(x)} + \log 2\right]dx + \int_x p_g(x)\!\left[\log\frac{p_g(x)}{p_{data}(x)+p_g(x)} + \log 2\right]dx - \log 4$

$= \int_x p_{data}(x)\log\frac{2\,p_{data}(x)}{p_{data}(x)+p_g(x)}\,dx + \int_x p_g(x)\log\frac{2\,p_g(x)}{p_{data}(x)+p_g(x)}\,dx - \log 4$

$= \int_x p_{data}(x)\log\frac{p_{data}(x)}{\frac{p_{data}(x)+p_g(x)}{2}}\,dx + \int_x p_g(x)\log\frac{p_g(x)}{\frac{p_{data}(x)+p_g(x)}{2}}\,dx - \log 4$
two-player minimax game with value function V(G,D):

$C(G) = \int_x p_{data}(x)\log\frac{p_{data}(x)}{\frac{p_{data}(x)+p_g(x)}{2}}\,dx + \int_x p_g(x)\log\frac{p_g(x)}{\frac{p_{data}(x)+p_g(x)}{2}}\,dx - \log 4$

$= KL\!\left(p_{data}\,\Big\|\,\frac{p_{data}+p_g}{2}\right) + KL\!\left(p_g\,\Big\|\,\frac{p_{data}+p_g}{2}\right) - \log 4$
two-player minimax game with value function V(G,D):

$KL\!\left(p_{data}\,\Big\|\,\frac{p_{data}+p_g}{2}\right) + KL\!\left(p_g\,\Big\|\,\frac{p_{data}+p_g}{2}\right)$
$= \frac{1}{2}KL\!\left(p_{data}\,\Big\|\,\frac{p_{data}+p_g}{2}\right) + \frac{1}{2}KL\!\left(p_g\,\Big\|\,\frac{p_{data}+p_g}{2}\right) + \frac{1}{2}KL\!\left(p_{data}\,\Big\|\,\frac{p_{data}+p_g}{2}\right) + \frac{1}{2}KL\!\left(p_g\,\Big\|\,\frac{p_{data}+p_g}{2}\right)$

The Jensen-Shannon divergence is defined as

$JSD(P\|Q) = \frac{1}{2}KL(P\|M) + \frac{1}{2}KL(Q\|M)$, where $M = \frac{1}{2}(P+Q)$

so the sum above is

$2\,JSD(p_{data}\|p_g) = 2\left[\frac{1}{2}KL(p_{data}\|M) + \frac{1}{2}KL(p_g\|M)\right]$, with $M = \frac{p_{data}+p_g}{2}$

and therefore $C(G) = -\log 4 + 2\,JSD(p_{data}\|p_g)$
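For discrete distributions, the identity $C(G) = -\log 4 + 2\,JSD(p_{data}\|p_g)$ is easy to verify directly; the two example distributions below are arbitrary:

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for discrete distributions (natural log)."""
    return float(np.sum(p * np.log(p / q)))

def jsd(p, q):
    """Jensen-Shannon divergence: average KL to the mixture M."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p_data = np.array([0.1, 0.2, 0.3, 0.4])   # arbitrary example distribution
p_g    = np.array([0.25, 0.25, 0.25, 0.25])

# C(G) evaluated with the optimal discriminator D* = p_data / (p_data + p_g)
d_star = p_data / (p_data + p_g)
c_g = np.sum(p_data * np.log(d_star)) + np.sum(p_g * np.log(1 - d_star))

identity = -np.log(4) + 2 * jsd(p_data, p_g)
print(c_g, identity)

# Global minimum: when p_g = p_data, JSD = 0 and C(G) = -log 4
c_min = -np.log(4) + 2 * jsd(p_data, p_data)
```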
two-player minimax game with value function V(G,D):
$0 \le JSD(P\|Q) \le \log 2$, with $JSD(P\|Q) = 0$ if and only if $P = Q$

When $p_{data} = p_g$, $C(G)$ attains its global minimum value, $-\log 4$
Results
Figure 2: Visualization of samples from the model. a) MNIST b) TFD c) CIFAR-10 (fully connected model) d) CIFAR-10 (convolutional discriminator and “deconvolutional” generator)
Does vanilla GAN have good performance? Really?
Does vanilla GAN have good performance? Really?
Minimizing $D_{KL}(P_{data}\|P_{model})$ is different from minimizing $D_{KL}(P_{model}\|P_{data})$. Maximum likelihood estimation performs the former; minimizing the Jensen-Shannon divergence is somewhat more similar to the latter.

We can think of $D_{KL}(P_{data}\|P_{model})$ as preferring to place high probability everywhere that the data occurs, and $D_{KL}(P_{model}\|P_{data})$ as preferring to place low probability wherever the data does not occur.

[Figure: a multimodal $P_{data}$ fitted by $P_{model}$ under the two KL directions]
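The asymmetry is easy to see with a bimodal discrete $P_{data}$ and two candidate models; all three distributions below are contrived for illustration. The forward KL punishes a model that misses a data mode, while the reverse KL punishes a model that puts mass where the data has almost none:

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for discrete distributions (natural log)."""
    return float(np.sum(p * np.log(p / q)))

# Bimodal "data" over 5 bins; small mass in the middle keeps the logs finite
p_data = np.array([0.48, 0.01, 0.02, 0.01, 0.48])

# Model A spreads over both modes (mode-covering)
model_a = np.array([0.30, 0.15, 0.10, 0.15, 0.30])
# Model B commits to a single mode (mode-seeking)
model_b = np.array([0.90, 0.04, 0.02, 0.02, 0.02])

# Forward KL, as in maximum likelihood: prefers covering every data mode
fwd_a, fwd_b = kl(p_data, model_a), kl(p_data, model_b)
# Reverse KL: prefers not placing mass off the data
rev_a, rev_b = kl(model_a, p_data), kl(model_b, p_data)

print(fwd_a, fwd_b)   # forward KL favors the mode-covering model A
print(rev_a, rev_b)   # reverse KL favors the mode-seeking model B
```

This is one common explanation of why GAN samples can look sharp yet drop modes of the data distribution.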
Does vanilla GAN have good performance? Really?
Any questions?
IDEAS using GAN
Is it Possible to Solve NLP Tasks with GANs?
Is it Possible to Solve NLP Tasks with GANs?
Hi there, this is Ian Goodfellow, inventor of GANs (verification: http://imgur.com/WDnukgP).

GANs have not been applied to NLP because GANs are only defined for real-valued data.

GANs work by training a generator network that outputs synthetic data, then running a discriminator network on the synthetic data. The gradient of the output of the discriminator network with respect to the synthetic data tells you how to slightly change the synthetic data to make it more realistic.

You can make slight changes to the synthetic data only if it is based on continuous numbers. If it is based on discrete numbers, there is no way to make a slight change. For example, if you output an image with a pixel value of 1.0, you can change that pixel value to 1.0001 on the next step. If you output the word "penguin", you can't change that to "penguin + .001" on the next step, because there is no such word as "penguin + .001". You have to go all the way from "penguin" to "ostrich".

Since all NLP is based on discrete values like words, characters, or bytes, no one really knows how to apply GANs to NLP yet. In principle, you could use the REINFORCE algorithm, but REINFORCE doesn't work very well, and no one has made the effort to try it yet as far as I know.

I see other people have said that GANs don't work for RNNs. As far as I know, that's wrong; in theory, there's no reason GANs should have trouble with RNN generators or discriminators. But no one with serious neural net credentials has really tried it yet either, so maybe there is some obstacle that comes up in practice.
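Goodfellow's point about discrete outputs can be illustrated with a finite-difference check; the tiny vocabulary, logits, and stand-in loss below are all made up for the illustration. A small perturbation of a continuous output changes the loss, but a small perturbation of logits that get arg-maxed into a word usually changes nothing, so the gradient signal through the sampled word is zero almost everywhere:

```python
import numpy as np

# Continuous case: nudging a pixel value slightly changes the loss
pixel = 1.0
loss = lambda v: (v - 2.0) ** 2          # made-up stand-in for the discriminator's signal
eps = 1e-4
grad_pixel = (loss(pixel + eps) - loss(pixel)) / eps   # finite difference, nonzero

# Discrete case: the generator emits argmax over a tiny made-up vocabulary
vocab = ["penguin", "ostrich", "cat"]
logits = np.array([2.0, 1.0, 0.5])
word = lambda l: vocab[int(np.argmax(l))]

before = word(logits)
after = word(logits + np.array([eps, 0.0, 0.0]))  # tiny nudge to the logits
# The emitted word did not move: "penguin + .001" does not exist,
# so the finite-difference "gradient" through the sampled word is zero.
print(grad_pixel, before, after)
```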
How about sentence2image?
Is it Possible to Solve NLP Tasks with GANs?