The power of two samples in generative adversarial networkssewoong/slide_pacgan... · 2019-01-15 · \This guy is in black trunks and swimming underwater." \A tennis player in a blue

$Page 1: The power of two samples in generative adversarial networkssewoong/slide_pacgan... · 2019-01-15 · \This guy is in black trunks and swimming underwater." \A tennis player in a blue$
The power of two samplesin generative adversarial networks

Sewoong Oh

Department of Industrial and Enterprise Systems EngineeringUniversity of Illinois at Urbana-Champaign

1 / 21

Generative models learn fundamental representations

= Zglasses

G�1( )

G�1( ) + Zglasses

Z space Image space

G

[DCGAN, Radford et al. 2015] 2 / 21

Generative Adversarial Networks (GAN)

Z

G(Z)

XD(X)real

1fake

0real

1real

1

Real data

Generator G(Z)

Discriminator D(X)

minG

maxD

V (G,D)

3 / 21

Challenges in training GAN

1. Instability: non-convergence of training loss

2. Evaluation: likelihood is not available

3. Mode collapse: loss of diversity

4 / 21

“Mode collapse” is a main challenge

Target samples Generated samples

5 / 21


Target samples Generated samples

5 / 21


“A man in a orange jacket with sunglasses and a hat ski down a hill.”

“This guy is in black trunks and swimming underwater.”

“A tennis player in a blue polo shirt is looking down at the greencourt.”

[“Generating interpretable images with controllable structure”, by Reed et al., 2016]6 / 21

Lack of diversity is easier to detectif the discriminator sees multiple sample jointly

”Improved Techniques for Training GANs”, Salimans, Goodfellow, Zaremba,Cheung, Radford, Chen, 2016

”Progressive Growing of GANs for Improved Quality, Stability, and Variation”,Karras, Aila, Laine, Lehtinen, 2017

”Distributional Adversarial Networks”, Li, Alvarez-Melis, Xu, Jegelka, Sra, 2017

7 / 21

New framework: PacGANlightweight overheadexperimental resultsprincipled

Z

G(Z)

X real 1

fake 0

real 1

real 1

⇥2

D(X1, X2)

8 / 21

Benchmark tests

Target GAN PacGAN2

Modes(Max 25)

GAN 17.3PacGAN2 23.8PacGAN3 24.6PacGAN4 24.8

9 / 21

Benchmark datasets from VeeGAN paperReal data DCGAN PacDCGAN2

Modes (Max 1000)

DCGAN 99.0ALI 16.0Unrolled GAN 48.7VeeGAN 150.0

PacDCGAN2 1000.0PacDCGAN3 1000.0PacDCGAN4 1000.0

10 / 21

Intuition behind packing via toy example

Generator Q1

with mode collapse

Generator Q2

without mode collapse

Target distribution P

1

1

P

1.25

0.2

1

1

Q1

1

1

0.6

1.4

0.5

Q2

dTV(P, Q1) = 0.2 dTV(P, Q2) = 0.2

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

1 2 3 4 5 6

dTV (P,Q1) = dTV (P,Q2) = 0.2

size of packing

11 / 21


Generator Q1

with mode collapse

Generator Q2



1

0.2 1 10.5

0.2

1

1

dTV(P ⇥ P, Q1 ⇥ Q1) = 0.36 dTV(P ⇥ P, Q2 ⇥ Q2) = 0.24

P ⇥ P

Q1 ⇥ Q1 Q2 ⇥ Q2

0.62

1.252

1.42

1.4 ⇥ 0.6

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

1 2 3 4 5 6

dTV(P 2, Q21)

dTV(P 2, Q22)

11 / 21


Generator Q1

with mode collapse

Generator Q2



1

0.2 1 10.5

0.2

1

1

dTV(P ⇥ P, Q1 ⇥ Q1) = 0.36 dTV(P ⇥ P, Q2 ⇥ Q2) = 0.24

P ⇥ P

Q1 ⇥ Q1 Q2 ⇥ Q2

0.62

1.252

1.42

1.4 ⇥ 0.6

m 0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

1 2 3 4 5 6

dTV(Pm, Qm2 )

dTV(Pm, Qm1 )

11 / 21

Evolution of TV distances

through the prism of packing

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

1 3 5 7 9 11

the size of packing m

Total variation

dTV(Pm, Qm)

Through packing, the target-generator pairs are expandedover the strengths of the mode collapse

12 / 21

Evolution of TV distancesthrough the prism of packing

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

1 3 5 7 9 11

the size of packing m

Total variation

dTV(Pm, Qm)

strongmode collapse

weakmode collapse

~www�Through packing, the target-generator pairs are expandedover the strengths of the mode collapse

12 / 21

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

1 3 5 7 9 11

dTV(Pm, Qm)

degree of packing m

maxP,Q

/minP,Q

dTV(P2, Q2)

subject to dTV(P,Q) = τ

we focus on m = 2 for this talk

this is easy, but we have a new proof technique

nothing to do with mode collapse, but we use it as proof technique

13 / 21

Intuition from Blackwell

Definition [mode collapse region]

We say a pair (P,Q) of a target distribution P and a generatordistribution Q has (ε, δ)-mode collapse if there exists a set S such that

P (S) ≥ δ , and Q(S) ≤ ε .

Generator Q1

with mode collapse


1

1

P

1.25

0.2

1

1

Q1

0

0.5

1

0 0.5 1

R(P, Q1)

"

�

14 / 21




P (S) ≥ δ , and Q(S) ≤ ε .

Generator Q1

with mode collapse


1

1

P

1.25

0.2

1

1

Q1

0

0.5

1

0 0.5 1

R(P, Q1)

"

�

14 / 21




P (S) ≥ δ , and Q(S) ≤ ε .

Generator Q1

with mode collapse


1

1

P

1.25

0.2

1

1

Q1

0

0.5

1

0 0.5 1

R(P, Q1)

"

�

?

�

"

14 / 21




P (S) ≥ δ , and Q(S) ≤ ε .

Generator Q1

with mode collapse


1

1

P

1.25

0.2

1

1

Q1

0

0.5

1

0 0.5 1

R(P, Q1)

"

�

?

�

"

14 / 21




P (S) ≥ δ , and Q(S) ≤ ε .

Generator Q1

with mode collapse


1

1

P

1.25

0.2

1

1

Q1

0

0.5

1

0 0.5 1

R(P, Q1)

"

�

?�

"

14 / 21




P (S) ≥ δ , and Q(S) ≤ ε .

Generator Q1

with mode collapse


1

1

P

1.25

0.2

1

1

Q1

0

0.5

1

0 0.5 1

R(P, Q1)

"

�?�

"

14 / 21




P (S) ≥ δ , and Q(S) ≤ ε .

Generator Q1

with mode collapse


1

1

P

1.25

0.2

1

1

Q1

0

0.5

1

0 0.5 1

R(P, Q1)

"

�

?

�

"

14 / 21




P (S) ≥ δ , and Q(S) ≤ ε .

Generator Q1

with mode collapse


1

1

P

1.25

0.2

1

1

Q1

0

0.5

1

0 0.5 1

R(P, Q1)

"

�

?

�

"

14 / 21




P (S) ≥ δ , and Q(S) ≤ ε .


1

1

P

"

�

Generator Q2


1

1

0.6

1.4

0.5

Q2

0

0.5

1

0 0.5 1

R(P, Q2)

14 / 21




P (S) ≥ δ , and Q(S) ≤ ε .


1

1

P

"

�

Generator Q2


1

1

0.6

1.4

0.5

Q2

0

0.5

1

0 0.5 1

R(P, Q2)

dTV(P, Q2) = 0.2

14 / 21

Upper bound

⌧

0

0.5

1

0 0.5 1

R(P, Q)

ε

δ

maxP,Q

dTV(P2, Q2)


R(P,Q) ⊆ Router(τ)

R(P 2, Q2) ⊆ R(P 2outer, Q

2outer)

dTV(P2, Q2) ≤ dTV(P

2outer, Q

2outer)︸︷︷︸

1−(1−τ)2

Blackwell’s theorem

R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)

15 / 21

Upper bound

⌧

0

0.5

1

0 0.5 1

R(P, Q)

Router(⌧)

ε

δ

maxP,Q

dTV(P2, Q2)




2outer)


2outer, Q

2outer)︸︷︷︸

1−(1−τ)2


R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)

15 / 21

Upper bound

⌧

0

0.5

1

0 0.5 1

R(P, Q)

Router(⌧)

1 � ⌧

1 � ⌧

⌧

⌧

0

0

Qouter(·)

Pouter(·)

ε

δ

maxP,Q

dTV(P2, Q2)




2outer)


2outer, Q

2outer)︸︷︷︸

1−(1−τ)2


R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)

15 / 21

Upper bound

⌧

0

0.5

1

0 0.5 1

R(P, Q)

Router(⌧)

1 � ⌧

1 � ⌧

⌧

⌧

0

0

Qouter(·)

Pouter(·)

ε

δ

maxP,Q

dTV(P2, Q2)




2outer)


2outer, Q

2outer)︸︷︷︸

1−(1−τ)2


R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)

15 / 21

Upper bound

⌧

0

0.5

1

0 0.5 1

R(P, Q)

Router(⌧)

1 � ⌧

1 � ⌧

⌧

⌧

0

0

Qouter(·)

Pouter(·)

ε

δ

maxP,Q

dTV(P2, Q2)




2outer)


2outer, Q

2outer)︸︷︷︸

1−(1−τ)2


R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)

15 / 21

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

1 3 5 7 9 11

dTV(Pm, Qm)

degree of packing m

maxP,Q

/minP,Q

dTV(P2, Q2)


16 / 21

PacGAN naturally penalizes mode collapse

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

1 3 5 7 9 11

without (ε0, δ0)-mode collapse

degree of packing m

dTV(Pm, Qm)

maxP,Q

dTV(P2, Q2)

s.t. dTV(P,Q) = τ

no (ε0, δ0)-mode collapse

17 / 21


0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

1 3 5 7 9 11


degree of packing m

dTV(Pm, Qm)

maxP,Q

dTV(P2, Q2)

s.t. dTV(P,Q) = τ


⌧

0

0.5

1

0 0.5 1

R(P, Q)

("0, �0)

Router(⌧, "0, �0,↵)

↵

17 / 21


0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

1 3 5 7 9 11


degree of packing m

dTV(Pm, Qm)

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

1 3 5 7 9 11

with (ε1, δ1)-mode collapse

degree of packing m

maxP,Q

dTV(P2, Q2)

s.t. dTV(P,Q) = τ


minP,Q

dTV(P2, Q2)

s.t. dTV(P,Q) = τ

(ε1, δ1)-mode collapse

18 / 21

Size of the discriminator

0

100

200

300

400

500

600

700

800

900

1000

0 500000 1x106 1.5x10

6 2x10

6 2.5x10

6 3x10

6 3.5x10

6 4x10

6 4.5x10

6

PacGAN4

PacGAN3

PacGAN2

DCGAN

# modes captured

# of parameters in D(·)

# modes captured

# of parameters in D(·)Mini-batch discrimination requires +38,748,557, PacGAN2 requires +54

19 / 21

0-1 loss (Total Variation) vs.cross entropy loss (Jensen-Shannon Divergence)

0.1

0.15

0.2

0.25

0.3

0.35

0.4

TV (P,Q) JS(P,Q)

strong mode collapse

weak mode collapse 0.15

0.2

0.25

0.3

0.35

0.4

TV (P,Q) JS(P,Q)

strong mode collapse

weak mode collapse

Jensen-Shannon is better measurefor detecting mode collapse

20 / 21

Our paper is available at:https://arxiv.org/abs/1712.04086

All codes for the experiments at:https://github.com/fjxmlzn/PacGAN

Zinan Lin (CMU) Ashish Khetan (UIUC) Giulia Fanti (CMU)

21 / 21

Documents

The power of two samples in generative adversarial networkssewoong/slide_pacgan... · 2019-01-15 · \This guy is in black trunks and swimming underwater." \A tennis player in a blue