54
The power of two samples for Generative Adversarial Networks Sewoong Oh Department of Industrial and Enterprise Systems Engineering University of Illinois at Urbana-Champaign joint work with Giulia Fanti, Ashish Khetan, Zinan Lin 1 / 25

The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

The power of two samples forGenerative Adversarial Networks

Sewoong Oh

Department of Industrial and Enterprise Systems EngineeringUniversity of Illinois at Urbana-Champaign

joint work with Giulia Fanti, Ashish Khetan, Zinan Lin

1 / 25

Page 2: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Generative models

Z ⇠ N(0, Ik⇥k)

Z1

Z50

......

.....................G(Z)12288

G(Z)1

G(Z) 2 R64⇥64⇥3

1

A generative model is a black box that takes a random vector Z ∈ Rkand produces a sample vector G(Z) ∈ Rn

It is a differentiable function and stochastic gradient descent is usedto train G

1[“Compressed Sensing using Generative Models”, A Bora, A Jalal, E Price, AGDimakis, 2017]

2 / 25

Page 3: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Generative models learn representation2

= Zglasses

G�1( )

G�1( ) + Zglasses

Z space Image space

G

2[DCGAN, Radford et al. 2015] 3 / 25

Page 4: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Generative Adversarial Networks (GAN)

Z

G(Z)

XD(X)real

1fake

0real

1real

1

Real data

Generator G(Z)

Discriminator D(X)

minG

maxD

V (G,D)

4 / 25

Page 5: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Mode collapse

Target distribution GAN

5 / 25

Page 6: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Mode collapse

“A man in a orange jacket with sunglasses and a hat ski down a hill.”3

“This guy is in black trunks and swimming underwater.”

“A tennis player in a blue polo shirt is looking down at the greencourt.”

3[“Generating interpretable images with controllable structure”, by Reed et al., 2016]6 / 25

Page 7: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

New framework: PacGANlightweight overheadexperimental resultsprincipled

Z

G(Z)

X real 1

fake 0

real 1

real 1

⇥2

D(X1, X2)

7 / 25

Page 8: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Benchmark datasetes from VeeGAN paper

Target GAN PacGAN2

Modes high quality(Max 25) samples

GAN 3.3 0.5 %ALI 15.8 1.6 %Unrolled GAN 23.6 16.0 %VeeGAN 24.6 40.0 %PacGAN2 24.6±0.9 65.8±13.4 %PacGAN3 24.9±0.3 71.4±13.8 %PacGAN4 25.0±0.0 76.0±7.1 %

8 / 25

Page 9: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Benchmark datasetes from VeeGAN paperReal data DCGAN PacDCGAN2

Modes (Max 1000) KL

DCGAN 99.0 3.40ALI 16.0 5.40Unrolled GAN 48.7 4.32VEEGAN 150.0 2.95

PacDCGAN2 1000.0±0.0 0.06±0.01PacDCGAN3 1000.0±0.0 0.06±0.01PacDCGAN4 1000.0±0.0 0.07±0.01

9 / 25

Page 10: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Benchmark from Unrolled GAN (small discriminator)Real data DCGAN PacDCGAN2

Modes (Max 1000) KL

DCGAN 30.6±20.73 5.99±0.42Unrolled GAN, 1 step 65.4±34.75 5.91±0.14Unrolled GAN, 5 steps 236.4±63.30 4.67±0.43Unrolled GAN, 10 steps 327.2±74.67 4.66±0.46

PacDCGAN2 370.8±244.34 3.33±1.02PacDCGAN3 534.3±103.68 2.11±0.52PacDCGAN4 557.7±101.37 2.06±0.61

10 / 25

Page 11: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Intuition behind packing via toy example

Generator Q1

with mode collapse

Generator Q2

without mode collapse

Target distribution P

1

1

P

1.25

0.2

1

1

Q1

1

1

0.6

1.4

0.5

Q2

dTV(P, Q1) = 0.2 dTV(P, Q2) = 0.2

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

1 2 3 4 5 6

11 / 25

Page 12: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Intuition behind packing via toy example

Generator Q1

with mode collapse

Generator Q2

without mode collapse

Target distribution P

1

0.2 1 10.5

0.2

1

1

dTV(P ⇥ P, Q1 ⇥ Q1) = 0.36 dTV(P ⇥ P, Q2 ⇥ Q2) = 0.24

P ⇥ P

Q1 ⇥ Q1 Q2 ⇥ Q2

0.62

1.252

1.42

1.4 ⇥ 0.6

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

1 2 3 4 5 6

dTV(P 2, Q21)

dTV(P 2, Q22)

11 / 25

Page 13: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Intuition behind packing via toy example

Generator Q1

with mode collapse

Generator Q2

without mode collapse

Target distribution P

1

0.2 1 10.5

0.2

1

1

dTV(P ⇥ P, Q1 ⇥ Q1) = 0.36 dTV(P ⇥ P, Q2 ⇥ Q2) = 0.24

P ⇥ P

Q1 ⇥ Q1 Q2 ⇥ Q2

0.62

1.252

1.42

1.4 ⇥ 0.6

m 0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

1 2 3 4 5 6

dTV(Pm, Qm2 )

dTV(Pm, Qm1 )

11 / 25

Page 14: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Evolution of TV distances

through the prism of packing

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

1 3 5 7 9 11

the size of packing m

Total variation

dTV(Pm, Qm)

Through packing, the target-generator pairs are expandedover the strengths of the mode collapse

12 / 25

Page 15: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Evolution of TV distancesthrough the prism of packing

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

1 3 5 7 9 11

the size of packing m

Total variation

dTV(Pm, Qm)

strongmode collapse

weakmode collapse

~www�Through packing, the target-generator pairs are expandedover the strengths of the mode collapse

12 / 25

Page 16: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

1 3 5 7 9 11

maxP,Q

/minP,Q

dTV(P2, Q2)

subject to dTV(P,Q) = τ

we focus on m = 2 for this talk

13 / 25

Page 17: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Intuition from Blackwell

Definition [mode collapse region]

We say a pair (P,Q) of a target distribution P and a generatordistribution Q has (ε, δ)-mode collapse if there exists a set S such that

P (S) ≥ δ , and Q(S) ≤ ε .

Generator Q1

with mode collapse

Target distribution P

1

1

P

1.25

0.2

1

1

Q1

0

0.5

1

0 0.5 1

R(P, Q1)

"

14 / 25

Page 18: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Intuition from Blackwell

Definition [mode collapse region]

We say a pair (P,Q) of a target distribution P and a generatordistribution Q has (ε, δ)-mode collapse if there exists a set S such that

P (S) ≥ δ , and Q(S) ≤ ε .

Generator Q1

with mode collapse

Target distribution P

1

1

P

1.25

0.2

1

1

Q1

0

0.5

1

0 0.5 1

R(P, Q1)

"

14 / 25

Page 19: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Intuition from Blackwell

Definition [mode collapse region]

We say a pair (P,Q) of a target distribution P and a generatordistribution Q has (ε, δ)-mode collapse if there exists a set S such that

P (S) ≥ δ , and Q(S) ≤ ε .

Generator Q1

with mode collapse

Target distribution P

1

1

P

1.25

0.2

1

1

Q1

0

0.5

1

0 0.5 1

R(P, Q1)

"

?

"

14 / 25

Page 20: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Intuition from Blackwell

Definition [mode collapse region]

We say a pair (P,Q) of a target distribution P and a generatordistribution Q has (ε, δ)-mode collapse if there exists a set S such that

P (S) ≥ δ , and Q(S) ≤ ε .

Generator Q1

with mode collapse

Target distribution P

1

1

P

1.25

0.2

1

1

Q1

0

0.5

1

0 0.5 1

R(P, Q1)

"

?

"

14 / 25

Page 21: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Intuition from Blackwell

Definition [mode collapse region]

We say a pair (P,Q) of a target distribution P and a generatordistribution Q has (ε, δ)-mode collapse if there exists a set S such that

P (S) ≥ δ , and Q(S) ≤ ε .

Generator Q1

with mode collapse

Target distribution P

1

1

P

1.25

0.2

1

1

Q1

0

0.5

1

0 0.5 1

R(P, Q1)

"

?�

"

14 / 25

Page 22: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Intuition from Blackwell

Definition [mode collapse region]

We say a pair (P,Q) of a target distribution P and a generatordistribution Q has (ε, δ)-mode collapse if there exists a set S such that

P (S) ≥ δ , and Q(S) ≤ ε .

Generator Q1

with mode collapse

Target distribution P

1

1

P

1.25

0.2

1

1

Q1

0

0.5

1

0 0.5 1

R(P, Q1)

"

�?�

"

14 / 25

Page 23: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Intuition from Blackwell

Definition [mode collapse region]

We say a pair (P,Q) of a target distribution P and a generatordistribution Q has (ε, δ)-mode collapse if there exists a set S such that

P (S) ≥ δ , and Q(S) ≤ ε .

Generator Q1

with mode collapse

Target distribution P

1

1

P

1.25

0.2

1

1

Q1

0

0.5

1

0 0.5 1

R(P, Q1)

"

?

"

14 / 25

Page 24: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Intuition from Blackwell

Definition [mode collapse region]

We say a pair (P,Q) of a target distribution P and a generatordistribution Q has (ε, δ)-mode collapse if there exists a set S such that

P (S) ≥ δ , and Q(S) ≤ ε .

Generator Q1

with mode collapse

Target distribution P

1

1

P

1.25

0.2

1

1

Q1

0

0.5

1

0 0.5 1

R(P, Q1)

"

?

"

14 / 25

Page 25: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Intuition from Blackwell

Definition [mode collapse region]

We say a pair (P,Q) of a target distribution P and a generatordistribution Q has (ε, δ)-mode collapse if there exists a set S such that

P (S) ≥ δ , and Q(S) ≤ ε .

Target distribution P

1

1

P

"

Generator Q2

without mode collapse

1

1

0.6

1.4

0.5

Q2

0

0.5

1

0 0.5 1

R(P, Q2)

14 / 25

Page 26: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Intuition from Blackwell

Definition [mode collapse region]

We say a pair (P,Q) of a target distribution P and a generatordistribution Q has (ε, δ)-mode collapse if there exists a set S such that

P (S) ≥ δ , and Q(S) ≤ ε .

Target distribution P

1

1

P

"

Generator Q2

without mode collapse

1

1

0.6

1.4

0.5

Q2

0

0.5

1

0 0.5 1

R(P, Q2)

dTV(P, Q2) = 0.2

14 / 25

Page 27: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Upper bound

0

0.5

1

0 0.5 1

R(P, Q)

ε

δ

maxP,Q

dTV(P2, Q2)

subject to dTV(P,Q) = τ

R(P,Q) ⊆ Router(τ)

R(P 2, Q2) ⊆ R(P 2outer, Q

2outer)

dTV(P2, Q2) ≤ dTV(P

2outer, Q

2outer)︸ ︷︷ ︸

1−(1−τ)2

Blackwell’s theorem

R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)

15 / 25

Page 28: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Upper bound

0

0.5

1

0 0.5 1

R(P, Q)

Router(⌧)

ε

δ

maxP,Q

dTV(P2, Q2)

subject to dTV(P,Q) = τ

R(P,Q) ⊆ Router(τ)

R(P 2, Q2) ⊆ R(P 2outer, Q

2outer)

dTV(P2, Q2) ≤ dTV(P

2outer, Q

2outer)︸ ︷︷ ︸

1−(1−τ)2

Blackwell’s theorem

R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)

15 / 25

Page 29: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Upper bound

0

0.5

1

0 0.5 1

R(P, Q)

Router(⌧)

1 � ⌧

1 � ⌧

0

0

Qouter(·)

Pouter(·)

ε

δ

maxP,Q

dTV(P2, Q2)

subject to dTV(P,Q) = τ

R(P,Q) ⊆ Router(τ)

R(P 2, Q2) ⊆ R(P 2outer, Q

2outer)

dTV(P2, Q2) ≤ dTV(P

2outer, Q

2outer)︸ ︷︷ ︸

1−(1−τ)2

Blackwell’s theorem

R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)

15 / 25

Page 30: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Upper bound

0

0.5

1

0 0.5 1

R(P, Q)

Router(⌧)

1 � ⌧

1 � ⌧

0

0

Qouter(·)

Pouter(·)

ε

δ

maxP,Q

dTV(P2, Q2)

subject to dTV(P,Q) = τ

R(P,Q) ⊆ Router(τ)

R(P 2, Q2) ⊆ R(P 2outer, Q

2outer)

dTV(P2, Q2) ≤ dTV(P

2outer, Q

2outer)︸ ︷︷ ︸

1−(1−τ)2

Blackwell’s theorem

R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)

15 / 25

Page 31: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Upper bound

0

0.5

1

0 0.5 1

R(P, Q)

Router(⌧)

1 � ⌧

1 � ⌧

0

0

Qouter(·)

Pouter(·)

ε

δ

maxP,Q

dTV(P2, Q2)

subject to dTV(P,Q) = τ

R(P,Q) ⊆ Router(τ)

R(P 2, Q2) ⊆ R(P 2outer, Q

2outer)

dTV(P2, Q2) ≤ dTV(P

2outer, Q

2outer)︸ ︷︷ ︸

1−(1−τ)2

Blackwell’s theorem

R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)

15 / 25

Page 32: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Lower bound

0

0.5

1

0 0.5 1

R(P, Q)

ε

δ

minP,Q

dTV(P2, Q2)

subject to dTV(P,Q) = τ

Rinner(τ, α) ⊆ R(P,Q)

R(P 2inner, Q

2inner) ⊆ R(P 2, Q2)

dTV(P2inner, Q

2inner)︸ ︷︷ ︸

τ

≤ dTV(P2, Q2)

Blackwell’s theorem

R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)

16 / 25

Page 33: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Lower bound

0

0.5

1

0 0.5 1

R(P, Q)

Rinner(⌧,↵)

ε

δ

minP,Q

dTV(P2, Q2)

subject to dTV(P,Q) = τ

Rinner(τ, α) ⊆ R(P,Q)

R(P 2inner, Q

2inner) ⊆ R(P 2, Q2)

dTV(P2inner, Q

2inner)︸ ︷︷ ︸

τ

≤ dTV(P2, Q2)

Blackwell’s theorem

R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)

16 / 25

Page 34: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Lower bound

0

0.5

1

0 0.5 1

Pinner(·)

Qinner(·)

1 � ↵

1 � ↵� ⌧↵ + ⌧

R(P, Q)

Rinner(⌧,↵)

ε

δ

minP,Q

dTV(P2, Q2)

subject to dTV(P,Q) = τ

Rinner(τ, α) ⊆ R(P,Q)

R(P 2inner, Q

2inner) ⊆ R(P 2, Q2)

dTV(P2inner, Q

2inner)︸ ︷︷ ︸

τ

≤ dTV(P2, Q2)

Blackwell’s theorem

R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)

16 / 25

Page 35: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Lower bound

0

0.5

1

0 0.5 1

Pinner(·)

Qinner(·)

1 � ↵

1 � ↵� ⌧↵ + ⌧

R(P, Q)

Rinner(⌧,↵)

ε

δ

minP,Q

dTV(P2, Q2)

subject to dTV(P,Q) = τ

Rinner(τ, α) ⊆ R(P,Q)

R(P 2inner, Q

2inner) ⊆ R(P 2, Q2)

dTV(P2inner, Q

2inner)︸ ︷︷ ︸

τ

≤ dTV(P2, Q2)

Blackwell’s theorem

R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)

16 / 25

Page 36: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Lower bound

0

0.5

1

0 0.5 1

Pinner(·)

Qinner(·)

1 � ↵

1 � ↵� ⌧↵ + ⌧

R(P, Q)

Rinner(⌧,↵)

ε

δ

minP,Q

dTV(P2, Q2)

subject to dTV(P,Q) = τ

Rinner(τ, α) ⊆ R(P,Q)

R(P 2inner, Q

2inner) ⊆ R(P 2, Q2)

dTV(P2inner, Q

2inner)︸ ︷︷ ︸

τ

≤ dTV(P2, Q2)

Blackwell’s theorem

R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)

16 / 25

Page 37: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

1 3 5 7 9 11

dTV(Pm, Qm)

m

maxP,Q

/minP,Q

dTV(P2, Q2)

subject to dTV(P,Q) = τ

17 / 25

Page 38: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

1 3 5 7 9 11

dTV(Pm, Qm)

m

maxP,Q

dTV(P2, Q2)

subject to dTV(P,Q) = τ

no (ε0, δ0)-mode collapse

17 / 25

Page 39: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Upper bound without (ε0, δ0)-mode collapse

0

0.5

1

0 0.5 1

R(P, Q)

("0, �0)

ε

δ

maxP,Q

dTV(P2, Q2)

subject to dTV(P,Q) = τ

no (ε0, δ0)-mode collapse

R(P,Q) ⊆ Router(τ, ε0, δ0, α)

R(P 2, Q2) ⊆ R(P 2outer, Q

2outer)

dTV(P2, Q2) ≤ dTV(P

2outer, Q

2outer)︸ ︷︷ ︸

simple to evaluate

Blackwell’s theorem

R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)

18 / 25

Page 40: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Upper bound without (ε0, δ0)-mode collapse

0

0.5

1

0 0.5 1

R(P, Q)

("0, �0)

Router(⌧, "0, �0,↵)

ε

δ

maxP,Q

dTV(P2, Q2)

subject to dTV(P,Q) = τ

no (ε0, δ0)-mode collapse

R(P,Q) ⊆ Router(τ, ε0, δ0, α)

R(P 2, Q2) ⊆ R(P 2outer, Q

2outer)

dTV(P2, Q2) ≤ dTV(P

2outer, Q

2outer)︸ ︷︷ ︸

simple to evaluate

Blackwell’s theorem

R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)

18 / 25

Page 41: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Upper bound without (ε0, δ0)-mode collapse

Qouter(·)

Pouter(·)

0

0.5

1

0 0.5 1

R(P, Q)

("0, �0)

Router(⌧, "0, �0,↵)

↵1 � ↵� ⌧

(1 � ↵� ⌧)(1 � ↵� �)

1 � ↵� ⌧ � "

ε

δ

maxP,Q

dTV(P2, Q2)

subject to dTV(P,Q) = τ

no (ε0, δ0)-mode collapse

R(P,Q) ⊆ Router(τ, ε0, δ0, α)

R(P 2, Q2) ⊆ R(P 2outer, Q

2outer)

dTV(P2, Q2) ≤ dTV(P

2outer, Q

2outer)︸ ︷︷ ︸

simple to evaluate

Blackwell’s theorem

R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)

18 / 25

Page 42: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Upper bound without (ε0, δ0)-mode collapse

Qouter(·)

Pouter(·)

0

0.5

1

0 0.5 1

R(P, Q)

("0, �0)

Router(⌧, "0, �0,↵)

↵1 � ↵� ⌧

(1 � ↵� ⌧)(1 � ↵� �)

1 � ↵� ⌧ � "

ε

δ

maxP,Q

dTV(P2, Q2)

subject to dTV(P,Q) = τ

no (ε0, δ0)-mode collapse

R(P,Q) ⊆ Router(τ, ε0, δ0, α)

R(P 2, Q2) ⊆ R(P 2outer, Q

2outer)

dTV(P2, Q2) ≤ dTV(P

2outer, Q

2outer)︸ ︷︷ ︸

simple to evaluate

Blackwell’s theorem

R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)

18 / 25

Page 43: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Upper bound without (ε0, δ0)-mode collapse

Qouter(·)

Pouter(·)

0

0.5

1

0 0.5 1

R(P, Q)

("0, �0)

Router(⌧, "0, �0,↵)

↵1 � ↵� ⌧

(1 � ↵� ⌧)(1 � ↵� �)

1 � ↵� ⌧ � "

ε

δ

maxP,Q

dTV(P2, Q2)

subject to dTV(P,Q) = τ

no (ε0, δ0)-mode collapse

R(P,Q) ⊆ Router(τ, ε0, δ0, α)

R(P 2, Q2) ⊆ R(P 2outer, Q

2outer)

dTV(P2, Q2) ≤ dTV(P

2outer, Q

2outer)︸ ︷︷ ︸

simple to evaluate

Blackwell’s theorem

R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)

18 / 25

Page 44: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

1 3 5 7 9 11

dTV(P2, Q2)

m

minP,Q

dTV(P2, Q2)

subject to dTV(P,Q) = τ

(ε1, δ1)-mode collapse

19 / 25

Page 45: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Lower bound with (ε1, δ1)-mode collapse

0

0.5

1

0 0.5 1

R(P, Q)

("1, �1)

ε

δ

minP,Q

dTV(P2, Q2)

subject to dTV(P,Q) = τ

(ε1, δ1)-mode collapse

Rinner(τ, α, ε1, δ1) ⊆ R(P,Q)

R(P 2inner, Q

2inner) ⊆ R(P 2, Q2)

dTV(P2inner, Q

2inner)︸ ︷︷ ︸

simple to evaluate

≤ dTV(P2, Q2)

Blackwell’s theorem

R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)

20 / 25

Page 46: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Lower bound with (ε1, δ1)-mode collapse

0

0.5

1

0 0.5 1

R(P, Q)

("1, �1)

Rinner(⌧,↵, "1, �1)

ε

δ

minP,Q

dTV(P2, Q2)

subject to dTV(P,Q) = τ

(ε1, δ1)-mode collapse

Rinner(τ, α, ε1, δ1) ⊆ R(P,Q)

R(P 2inner, Q

2inner) ⊆ R(P 2, Q2)

dTV(P2inner, Q

2inner)︸ ︷︷ ︸

simple to evaluate

≤ dTV(P2, Q2)

Blackwell’s theorem

R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)

20 / 25

Page 47: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Lower bound with (ε1, δ1)-mode collapse

0

0.5

1

0 0.5 1

R(P, Q)

("1, �1)

↵ + ⌧

Pinner(·)

Qinner(·)

�11 � ↵� �1

"1

1 � ↵� ⌧ � "1

Rinner(⌧,↵, "1, �1)

ε

δ

minP,Q

dTV(P2, Q2)

subject to dTV(P,Q) = τ

(ε1, δ1)-mode collapse

Rinner(τ, α, ε1, δ1) ⊆ R(P,Q)

R(P 2inner, Q

2inner) ⊆ R(P 2, Q2)

dTV(P2inner, Q

2inner)︸ ︷︷ ︸

simple to evaluate

≤ dTV(P2, Q2)

Blackwell’s theorem

R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)

20 / 25

Page 48: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Lower bound with (ε1, δ1)-mode collapse

0

0.5

1

0 0.5 1

R(P, Q)

("1, �1)

↵ + ⌧

Pinner(·)

Qinner(·)

�11 � ↵� �1

"1

1 � ↵� ⌧ � "1

Rinner(⌧,↵, "1, �1)

ε

δ

minP,Q

dTV(P2, Q2)

subject to dTV(P,Q) = τ

(ε1, δ1)-mode collapse

Rinner(τ, α, ε1, δ1) ⊆ R(P,Q)

R(P 2inner, Q

2inner) ⊆ R(P 2, Q2)

dTV(P2inner, Q

2inner)︸ ︷︷ ︸

simple to evaluate

≤ dTV(P2, Q2)

Blackwell’s theorem

R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)

20 / 25

Page 49: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Lower bound with (ε1, δ1)-mode collapse

0

0.5

1

0 0.5 1

R(P, Q)

("1, �1)

↵ + ⌧

Pinner(·)

Qinner(·)

�11 � ↵� �1

"1

1 � ↵� ⌧ � "1

Rinner(⌧,↵, "1, �1)

ε

δ

minP,Q

dTV(P2, Q2)

subject to dTV(P,Q) = τ

(ε1, δ1)-mode collapse

Rinner(τ, α, ε1, δ1) ⊆ R(P,Q)

R(P 2inner, Q

2inner) ⊆ R(P 2, Q2)

dTV(P2inner, Q

2inner)︸ ︷︷ ︸

simple to evaluate

≤ dTV(P2, Q2)

Blackwell’s theorem

R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)

20 / 25

Page 50: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Achievable TV distances for distributions

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

1 3 5 7 9 11

with (ε1, δ1)-mode collapse

dTV(Pm, Qm)

m

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

1 3 5 7 9 11

without (ε0, δ0)-mode collapse

m

with packing, the discriminator naturally penalizes (P,Q) with severemode collapses

21 / 25

Page 51: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Does larger m help?

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

1 3 5 7 9 11

H1H0

overlap

m

dTV(Pm, Qm)

22 / 25

Page 52: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Evolution of TV distancesthrough the prism of packing

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

1 3 5 7 9 11

the size of packing m

Total variation

dTV(Pm, Qm)

strongmode collapse

weakmode collapse

~www�Through the prism packing, the target-generator pairs are expandedover the spectrum of (the strengths of) mode collapse

23 / 25

Page 53: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Our paper is available at:https://arxiv.org/abs/1712.04086

24 / 25

Page 54: The power of two samples for Generative Adversarial Networks\This guy is in black trunks and swimming underwater." \A tennis player in a blue polo shirt is looking down at the green

Packing is a principled solution to mode collapseI Lightweight overheadI Excellent numerical resultsI Theoretical analysis

Exciting time for bridging the theory and practice of GAN• “Generalization and Equilibrium in Generative Adversarial Nets”, SanjeevArora, Rong Ge, Yingyu Liang, Tengyu Ma, Yi Zhang• “Approximation and Convergence Properties of Generative AdversarialLearning”, Shuang Liu, Olivier Bousquet, Kamalika Chaudhuri• “Compressed Sensing using Generative Models”, Ashish Bora, Ajil Jalal,Eric Price, Alexandros G. Dimakis

• “Towards Understanding the Dynamics of Generative Adversarial

Networks”, Jerry Li, Aleksander Madry, John Peebles, Ludwig Schmidt

Giulia Fanti (CMU) Ashish Khetan (UIUC) Zinan Lin (CMU) 25 / 25