Generative Adversarial Text to Image Synthesis
Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran
[GitHub] [arXiv]
Slides by Víctor Garcia [GDoc]
Computer Vision Reading Group (30/09/2016)
Index
● Introduction
● State of the Art
● Method
  ○ Network Architecture
  ○ Losses
● Experiments
  ○ Qualitative Results
  ○ Sentence interpolation
  ○ Style Transfer
● Conclusions
Introduction
Text → Image
GANs
GANs
[Figure: the Generator maps noise z to a fake sample x' = G(z). The Discriminator D(·) receives either a real sample x ~ q(x) from the true world or a fake x', and outputs 1 (real) or 0 (fake).]
Discriminator: $\max_D \; \mathbb{E}_{x \sim q(x)}[\log D(x)] + \mathbb{E}_{z}[\log(1 - D(G(z)))]$
Generator: $\min_G \; \mathbb{E}_{z}[\log(1 - D(G(z)))]$
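The two objectives above can be evaluated on a batch of discriminator outputs. The following is a minimal NumPy sketch that assumes D already returns probabilities in (0, 1). The function names are illustrative and do not come from the paper's code.

```python
import numpy as np

def discriminator_objective(d_real, d_fake):
    """Value D tries to MAXIMIZE: E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: D's outputs on real samples x ~ q(x), values in (0, 1)
    d_fake: D's outputs on generated samples x' = G(z), values in (0, 1)
    """
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

def generator_objective(d_fake):
    """Value G tries to MINIMIZE: E[log(1 - D(G(z)))]."""
    return np.mean(np.log(1.0 - d_fake))

# A discriminator that cannot tell real from fake outputs 0.5 everywhere
d_real = np.full(4, 0.5)
d_fake = np.full(4, 0.5)
print(discriminator_objective(d_real, d_fake))  # 2 * log(0.5) ≈ -1.386
print(generator_objective(d_fake))              # log(0.5) ≈ -0.693
```

In practice G is often trained to maximize log D(G(z)) instead, because that version gives stronger gradients early in training, when D rejects fakes easily.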
GANs with Joint Distributions
How do we generate the image from text? Condition both networks on the text t.
[Figure: the Generator receives noise plus the text. The Discriminator receives either a real image + text pair, f(x, t), or a generated image + text pair, f(x', t), and outputs 1 (real) or 0 (fake).]
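One way to feed the text into both networks is sketched below in NumPy. To our understanding of the paper, the generator concatenates the noise with a compressed text embedding. The discriminator replicates the compressed embedding over the spatial grid of its image features and concatenates it along the channel axis. All dimensions, the random projection weights and the leaky-ReLU slope here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def generator_input(z, text_emb, W_proj):
    """Concatenate noise z with a compressed text embedding.

    z:        (batch, z_dim) noise
    text_emb: (batch, emb_dim) text embedding
    W_proj:   (emb_dim, proj_dim) hypothetical projection weights
    """
    t_proj = leaky_relu(text_emb @ W_proj)
    return np.concatenate([z, t_proj], axis=1)

def discriminator_features(img_feat, text_emb, W_proj):
    """Attach the text to image features by spatial replication.

    img_feat: (batch, channels, H, W) convolutional features of an image
    Returns an array of shape (batch, channels + proj_dim, H, W).
    """
    t_proj = leaky_relu(text_emb @ W_proj)                 # (batch, proj_dim)
    b, _, h, w = img_feat.shape
    t_map = np.broadcast_to(t_proj[:, :, None, None], (b, t_proj.shape[1], h, w))
    return np.concatenate([img_feat, t_map], axis=1)

batch, z_dim, emb_dim, proj_dim = 2, 10, 16, 4
z = rng.standard_normal((batch, z_dim))
text_emb = rng.standard_normal((batch, emb_dim))
W_g = rng.standard_normal((emb_dim, proj_dim))   # hypothetical generator projection
W_d = rng.standard_normal((emb_dim, proj_dim))   # hypothetical discriminator projection
img_feat = rng.standard_normal((batch, 8, 4, 4))

print(generator_input(z, text_emb, W_g).shape)                # (2, 14)
print(discriminator_features(img_feat, text_emb, W_d).shape)  # (2, 12, 4, 4)
```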
Text Embedding
In order to represent the text as a vector, a text encoder is learned by minimizing an objective:
[Equation lost in extraction: a MIN objective, followed by a WHERE clause defining its terms.]
The text encoder in this objective is recurrent.
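A recurrent encoder reads the text one symbol at a time and keeps a hidden state that ends up summarizing the whole sentence. The sketch below is a hypothetical, untrained character-level tanh RNN that uses its final hidden state as the text vector. It is not the paper's encoder, which is a character-level convolutional-recurrent network, and the class name and sizes are assumptions.

```python
import numpy as np

class TinyRNNEncoder:
    """Hypothetical recurrent text encoder: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""

    def __init__(self, vocab_size, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W_x = rng.standard_normal((hidden_dim, vocab_size)) * 0.1
        self.W_h = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1
        self.b = np.zeros(hidden_dim)
        self.vocab_size = vocab_size

    def encode(self, token_ids):
        h = np.zeros_like(self.b)
        for tok in token_ids:
            x = np.zeros(self.vocab_size)
            x[tok] = 1.0                               # one-hot input symbol
            h = np.tanh(self.W_x @ x + self.W_h @ h + self.b)
        return h                                       # final hidden state = text vector

text = "a small red bird"
vocab = sorted(set(text))                # character vocabulary
ids = [vocab.index(c) for c in text]
enc = TinyRNNEncoder(len(vocab), hidden_dim=8)
print(enc.encode(ids).shape)             # (8,)
```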
Network Architecture
Losses - CLS
Do the real images match the text content? The discriminator scores three kinds of pairs and maximizes the sum of these terms:
● True image + true text: log(D(x, t))
● Fake image + true text: log(1 - D(G(z, t), t))
● True image (i) + unmatched true text (j): log(1 - D(x_i, t_j))
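The three pairs above can be combined into one matching-aware discriminator objective. The NumPy sketch below averages the two "fake" terms, as in the paper's training algorithm. It builds mismatched captions by cyclically shifting the batch, which is an assumption made for illustration.

```python
import numpy as np

def gan_cls_d_objective(s_r, s_w, s_f):
    """Matching-aware objective that D maximizes.

    s_r: D(x, t)        real image, matching text    -> should be near 1
    s_w: D(x_i, t_j)    real image, mismatched text  -> should be near 0
    s_f: D(G(z, t), t)  fake image, matching text    -> should be near 0
    """
    return np.mean(np.log(s_r) + 0.5 * (np.log(1.0 - s_w) + np.log(1.0 - s_f)))

def mismatched_texts(text_emb):
    """Give image i the caption embedding of image i-1 (cyclic shift of the batch)."""
    return np.roll(text_emb, shift=1, axis=0)

text_emb = np.arange(6.0).reshape(3, 2)
wrong = mismatched_texts(text_emb)       # rows become [4, 5], [0, 1], [2, 3]

good = gan_cls_d_objective(np.array([0.9]), np.array([0.1]), np.array([0.1]))
unsure = gan_cls_d_objective(np.array([0.5]), np.array([0.5]), np.array([0.5]))
print(good > unsure)                     # True
```

On a real dataset, the shifted caption could still describe the same class as the image. This sketch ignores that case.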
Losses - INT
They also train the generator on interpolations between the embeddings of two different texts (t1 ~ t2).
This way the generator learns to fill gaps in the data manifold.
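An interpolated embedding is a weighted mix of two caption embeddings. No real image exists for it, so it enters only the generator's loss. In the term below, G is penalized when D scores the resulting images as fake. The sketch fixes β = 0.5, the value reported in the paper. The function names are illustrative.

```python
import numpy as np

def interpolate_embeddings(t1, t2, beta=0.5):
    """beta * t1 + (1 - beta) * t2: an embedding between two captions."""
    return beta * t1 + (1.0 - beta) * t2

def generator_int_term(d_on_interp):
    """Extra term G minimizes: E[log(1 - D(G(z, t_int), t_int))].

    d_on_interp: D's scores on images generated from interpolated embeddings.
    """
    return np.mean(np.log(1.0 - d_on_interp))

t1 = np.array([0.0, 2.0])
t2 = np.array([2.0, 0.0])
print(interpolate_embeddings(t1, t2))    # [1. 1.]
```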
Qualitative Results - Birds
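Sentence Interpolation
[Figure: generator outputs for noise z0 with Text1 and Text2, and for noise z1 with Text3 and Text4. Within each pair the noise is fixed and only the sentence changes.]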
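To produce an interpolation strip like the one in the figure, the noise is held fixed and the sentence embedding moves gradually from one caption to the other. The NumPy sketch below builds the generator inputs for such a strip. It concatenates z and the text embedding directly, which is a simplification, and the generator that would consume the rows is left out.

```python
import numpy as np

def interpolation_path(z, t_a, t_b, steps):
    """Generator inputs for a sentence interpolation with the noise held fixed.

    Returns a (steps, z_dim + emb_dim) array. Row k uses
    t = (1 - a_k) * t_a + a_k * t_b, with a_k evenly spaced in [0, 1].
    """
    alphas = np.linspace(0.0, 1.0, steps)[:, None]
    texts = (1.0 - alphas) * t_a + alphas * t_b
    zs = np.repeat(z[None, :], steps, axis=0)      # same z in every row
    return np.concatenate([zs, texts], axis=1)

z = np.array([0.3, -1.0])
t_a = np.array([1.0, 0.0, 0.0])   # hypothetical embedding of Text1
t_b = np.array([0.0, 0.0, 1.0])   # hypothetical embedding of Text2
path = interpolation_path(z, t_a, t_b, steps=5)
print(path.shape)                 # (5, 5)
```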
Disentangling style and content
[Figure: the Generator takes z + text as input.]
If ‘text’ describes the content, what does ‘z’ describe?
Style: pose, background, … So let’s extract ‘z’ from an image.
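Extracting z means training a style encoder S that inverts the generator, so that S(G(z, t)) ≈ z. In the paper, S is a convolutional network trained with the squared loss ||z - S(G(z, φ(t)))||². Style transfer then generates G(S(x), t_new). The NumPy sketch below replaces both networks with toy linear maps and fits S by least squares. The matrices A and B and the sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
z_dim, t_dim, x_dim, n = 3, 2, 6, 500

# Toy stand-in for a trained generator (hypothetical): linear in z and t
A = rng.standard_normal((x_dim, z_dim))
B = rng.standard_normal((x_dim, t_dim))

def G(z, t):
    return z @ A.T + t @ B.T

# 1) Sample (z, t), generate images, and fit a style encoder S(x) ≈ z
Z = rng.standard_normal((n, z_dim))
T = rng.standard_normal((n, t_dim))
X = G(Z, T)
W_s, *_ = np.linalg.lstsq(X, Z, rcond=None)   # minimizes ||Z - X W_s||^2

def S(x):
    return x @ W_s

style_loss = np.mean(np.sum((Z - S(X)) ** 2, axis=1))

# 2) Style transfer: keep the style (z) of an image, swap in a new caption embedding
x_query = G(Z[:1], T[:1])
t_new = rng.standard_normal((1, t_dim))
x_transfer = G(S(x_query), t_new)
print(x_transfer.shape)                        # (1, 6)
```

In this linear toy, z can be recovered exactly, so the style loss is essentially zero. With real networks S only approximates the inverse.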
Disentangling style and content
[Figure: results for noise vectors z0, z1, z2, z3, z4, z5.]
Qualitative Results - Flowers
Qualitative Results - MSCOCO
Conclusions
[Figure: the text-conditioned Discriminator outputs 1/0 for real pairs f(x, t) and generated pairs f(x', t), with x ~ t.]