Generative adversarial text to image synthesis


Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran

[GitHub] [arXiv]

Slides by Víctor Garcia [GDoc], Computer Vision Reading Group (30/09/2016)

https://github.com/paarthneekhara/text-to-image




https://docs.google.com/presentation/d/1pvyOFygLzhG3fifa8G1j46EavuUxsQELMIKxKlbydrU/edit?usp=sharing

https://imatge.upc.edu/web/teaching/computer-vision-reading-group


Index
● Introduction
● State of the Art
● Method
  ○ Network Architecture
  ○ Losses
● Experiments
  ○ Qualitative Results
  ○ Sentence interpolation
  ○ Style Transfer
● Conclusions

Introduction

Text → Image

GANs





GANs

[Figure: the Generator maps noise z to a fake sample x' = G(z); the Discriminator D(·) receives either a true sample x ~ q(x) from the real world or a fake x' and outputs 1 (true) or 0 (fake).]

The discriminator maximizes

$$\max_D \; \mathbb{E}_{x \sim q(x)}[\log D(x)] + \mathbb{E}_{z}[\log(1 - D(G(z)))]$$

while the generator minimizes

$$\min_G \; \mathbb{E}_{z}[\log(1 - D(G(z)))]$$
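To make the two objectives concrete, here is a minimal sketch that evaluates them from a batch of discriminator outputs, assuming D already returns probabilities strictly between 0 and 1. The function names are hypothetical and no networks are trained; the expectations are replaced by batch means.

```python
import math


def discriminator_objective(d_real, d_fake):
    """Value the discriminator maximizes:
    mean log D(x) over real samples + mean log(1 - D(G(z))) over fakes.
    Inputs are discriminator probabilities strictly between 0 and 1."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_objective(d_fake):
    """Value the generator minimizes: mean log(1 - D(G(z))) over fakes."""
    return sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)


if __name__ == "__main__":
    # A confident, correct discriminator gets a value close to 0 (its maximum).
    print(discriminator_objective([0.9, 0.95], [0.05, 0.1]))
    # When D outputs 1/2 everywhere, the value is 2 * log(1/2) = -log 4.
    print(discriminator_objective([0.5], [0.5]), -math.log(4))
    # The generator's value drops as D is fooled (D(G(z)) -> 1).
    print(generator_objective([0.1]), generator_objective([0.9]))
```

The value -log 4 at D = 1/2 is the equilibrium of the original GAN game, where the discriminator can no longer tell true from fake.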

GANs with Joint Distributions

How do we generate the image from text?

[Figure: the Generator receives noise plus the text and produces an image; the Discriminator receives either a real image + text pair, f(x, t), or a generated image + text pair, f(x', t), and outputs 1/0.]
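A minimal sketch of how the text could enter both networks, assuming conditioning by plain vector concatenation. `generator_input` and `discriminator_pairs` are hypothetical helpers, and images and embeddings are toy Python lists rather than tensors.

```python
import random


def generator_input(z, text_embedding):
    """Condition the generator by concatenating the noise z with the text embedding."""
    return list(z) + list(text_embedding)


def discriminator_pairs(real_image, fake_image, text_embedding):
    """Return the (image, text) pairs the discriminator scores, with target labels:
    the real image + text should be scored 1, the generated image + text 0."""
    return [
        ((real_image, text_embedding), 1),
        ((fake_image, text_embedding), 0),
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    z = [rng.gauss(0.0, 1.0) for _ in range(4)]  # toy 4-d noise vector
    t = [0.2, -0.1, 0.7]                         # toy 3-d text embedding
    print(len(generator_input(z, t)))            # 4 + 3 = 7 values
```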

Text Embedding

In order to represent the text in a vector...

MIN

WHERE

This is the recurrent text encoder.
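The slides only say that the encoder is recurrent. As an illustrative stand-in (not the encoder used in the paper), this toy Elman RNN with hand-set weights turns a sequence of token ids into a single vector by keeping the last hidden state; `rnn_encode`, the vocabulary and all weights are hypothetical.

```python
import math


def rnn_encode(token_ids, embed, w_in, w_rec):
    """Toy Elman RNN: h_k = tanh(W_in e(token_k) + W_rec h_{k-1}), h_0 = 0.
    The final hidden state is used as the sentence vector."""
    hidden_size = len(w_rec)
    h = [0.0] * hidden_size
    for tok in token_ids:
        e = embed[tok]
        # The comprehension reads the previous h before the name is rebound.
        h = [
            math.tanh(
                sum(w_in[i][j] * e[j] for j in range(len(e)))
                + sum(w_rec[i][k] * h[k] for k in range(hidden_size))
            )
            for i in range(hidden_size)
        ]
    return h


if __name__ == "__main__":
    vocab = {"a": 0, "red": 1, "bird": 2}
    embed = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # hypothetical word vectors
    w_in = [[0.5, -0.2], [0.1, 0.3]]              # hypothetical input weights
    w_rec = [[0.2, 0.0], [0.0, 0.2]]              # hypothetical recurrent weights
    ids = [vocab[w] for w in "a red bird".split()]
    print(rnn_encode(ids, embed, w_in, w_rec))
```

Because the hidden state is fed back at every step, word order changes the resulting vector, which a bag-of-words average would not capture.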





Network Architecture

Losses - CLS

Does the real image match the text content? Besides real and generated images, the discriminator D(x, t) also sees real images paired with the wrong text:

● True Image + True Text: $\log D(x, t)$
● Fake Image + True Text: $\log(1 - D(G(z, t), t))$
● True Image (i) + True Text (j), unmatched: $\log(1 - D(x_i, t_j))$
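A minimal sketch of the matching-aware (CLS) discriminator objective over a batch. Assumptions: mismatched texts are obtained by rotating the batch by one position, `d` is any callable returning a probability, and the hypothetical `neg_weight` scales the two negative terms (1.0 gives the plain sum of the three terms listed above).

```python
import math


def mismatched_texts(texts):
    """Give image i the text of sample i+1 (cyclically), so no image keeps its own text.
    Assumes a batch of at least two samples with different descriptions."""
    return texts[1:] + texts[:1]


def cls_discriminator_objective(d, images, texts, fakes, neg_weight=1.0):
    """Batch mean of the value the discriminator maximizes:
    log D(x, t) + neg_weight * (log(1 - D(x_i, t_j)) + log(1 - D(G(z, t), t))).
    `fakes[i]` is a generated image for texts[i]; `d` returns a probability."""
    total = 0.0
    for x, t, t_wrong, x_fake in zip(images, texts, mismatched_texts(texts), fakes):
        real = math.log(d(x, t))               # true image + true text
        wrong = math.log(1.0 - d(x, t_wrong))  # true image + unmatched text
        fake = math.log(1.0 - d(x_fake, t))    # fake image + true text
        total += real + neg_weight * (wrong + fake)
    return total / len(images)


if __name__ == "__main__":
    # Hypothetical discriminator: "matches" when image and text carry the same label.
    def toy_d(image, text):
        return 0.9 if image == text else 0.1

    images = ["bird", "flower"]
    texts = ["bird", "flower"]
    fakes = ["noise", "noise"]
    print(cls_discriminator_objective(toy_d, images, texts, fakes))
```

The unmatched term is what forces the discriminator to check the text: an image that looks real is still rejected when paired with the wrong description.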

Losses - INT

The generator is also trained on interpolations between pairs of text embeddings (t1 and t2).

In this way the generator learns to fill the gaps in the data manifold between the training sentences.
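The interpolation itself is a convex combination of two embeddings. This sketch assumes a fixed mixing weight `beta` (0.5 here, an assumed default) and random partners drawn from the same batch; `interpolate_embeddings` and `int_batch` are hypothetical names.

```python
import random


def interpolate_embeddings(t1, t2, beta=0.5):
    """Convex combination beta * t1 + (1 - beta) * t2 of two text embeddings."""
    return [beta * a + (1.0 - beta) * b for a, b in zip(t1, t2)]


def int_batch(texts, beta=0.5, seed=0):
    """Pair every embedding in the batch with a randomly chosen partner and
    interpolate, producing extra text conditions for the generator.
    A partner may occasionally be the embedding itself."""
    rng = random.Random(seed)
    partners = list(texts)
    rng.shuffle(partners)
    return [interpolate_embeddings(a, b, beta) for a, b in zip(texts, partners)]


if __name__ == "__main__":
    batch = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy text embeddings
    print(int_batch(batch))
```

No real image corresponds to an interpolated embedding, which is why these points are useful only as extra inputs to the generator.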





Qualitative Results - Birds

Sentence Interpolation

[Figure: the Generator is run with a fixed noise vector while the sentence embedding is interpolated: z0 with Text1 → Text2, and z1 with Text3 → Text4.]
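The interpolation figure can be reproduced in spirit by holding z fixed and sweeping the text embedding. This is a sketch under the assumption of linear interpolation with evenly spaced steps; the function name and toy vectors are hypothetical, and no generator is included.

```python
def sentence_interpolation_inputs(z, t_start, t_end, steps=5):
    """Hold the noise z fixed and move the text embedding linearly from
    t_start to t_end; returns one (z, t_alpha) generator input per step."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    inputs = []
    for k in range(steps):
        alpha = k / (steps - 1)  # 0.0 at the first step, 1.0 at the last
        t_alpha = [(1.0 - alpha) * a + alpha * b for a, b in zip(t_start, t_end)]
        inputs.append((z, t_alpha))
    return inputs


if __name__ == "__main__":
    z0 = [0.3, -1.2]                       # toy fixed noise vector
    text1, text2 = [1.0, 0.0], [0.0, 1.0]  # toy embeddings of two sentences
    for z, t in sentence_interpolation_inputs(z0, text1, text2, steps=3):
        print(z, t)
```

Keeping z fixed means any change across the row of generated images is attributable to the text alone.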

Disentangling style and content

[Figure: the Generator takes z + Text as input.]

If the text describes the content, what does z describe?

Style → pose, background…, so let's extract z.

[Figure: generated samples for style vectors z0, z1, z2, z3, z4, z5.]
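One way to "extract z" is to train an encoder S that maps a generated image back to the noise that produced it, then reuse S(x) as a style vector. This sketch assumes a squared-error objective and uses hypothetical toy networks (`toy_G`, `toy_S`) in which an "image" is just z stacked on the text embedding, so S can invert G exactly.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def style_loss(G, S, zs, texts):
    """Batch mean of ||z - S(G(z, t))||^2: how well the encoder S recovers
    the noise z from an image generated with that z."""
    return sum(squared_l2(z, S(G(z, t))) for z, t in zip(zs, texts)) / len(zs)


def transfer_style(G, S, query_image, new_text):
    """Extract the style vector of query_image with S and render it with new_text."""
    return G(S(query_image), new_text)


# Hypothetical toy networks: an "image" is 2*z stacked on top of the text embedding.
def toy_G(z, t):
    return [2.0 * v for v in z] + list(t)


def toy_S(x, z_dim=2):
    return [v / 2.0 for v in x[:z_dim]]


if __name__ == "__main__":
    zs, texts = [[1.0, -0.5]], [[0.3, 0.3]]
    print(style_loss(toy_G, toy_S, zs, texts))  # toy_S inverts toy_G exactly
    query = toy_G([1.0, -0.5], [0.3, 0.3])      # image with some style
    print(transfer_style(toy_G, toy_S, query, [0.9, 0.1]))
```

With real networks S would be trained to drive `style_loss` down; the transfer step then keeps the query's style while the new text sets the content.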

Qualitative Results - Flowers

Qualitative Results - MSCOCO

Conclusions

[Figure: the Discriminator outputs 1/0 for the pairs f(x, t) and f(x', t), i.e. it judges whether the image matches the text (x ~ t).]
