Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
The power of two samples forGenerative Adversarial Networks
Sewoong Oh
Department of Industrial and Enterprise Systems EngineeringUniversity of Illinois at Urbana-Champaign
joint work with Giulia Fanti, Ashish Khetan, Zinan Lin
1 / 25
Generative models
Z ⇠ N(0, Ik⇥k)
Z1
Z50
......
.....................G(Z)12288
G(Z)1
G(Z) 2 R64⇥64⇥3
1
A generative model is a black box that takes a random vector Z ∈ Rkand produces a sample vector G(Z) ∈ Rn
It is a differentiable function and stochastic gradient descent is usedto train G
1[“Compressed Sensing using Generative Models”, A Bora, A Jalal, E Price, AGDimakis, 2017]
2 / 25
Generative models learn representation2
= Zglasses
G�1( )
G�1( ) + Zglasses
Z space Image space
G
2[DCGAN, Radford et al. 2015] 3 / 25
Generative Adversarial Networks (GAN)
Z
G(Z)
XD(X)real
1fake
0real
1real
1
Real data
Generator G(Z)
Discriminator D(X)
minG
maxD
V (G,D)
4 / 25
Mode collapse
Target distribution GAN
5 / 25
Mode collapse
“A man in a orange jacket with sunglasses and a hat ski down a hill.”3
“This guy is in black trunks and swimming underwater.”
“A tennis player in a blue polo shirt is looking down at the greencourt.”
3[“Generating interpretable images with controllable structure”, by Reed et al., 2016]6 / 25
New framework: PacGANlightweight overheadexperimental resultsprincipled
Z
G(Z)
X real 1
fake 0
real 1
real 1
⇥2
D(X1, X2)
7 / 25
Benchmark datasetes from VeeGAN paper
Target GAN PacGAN2
Modes high quality(Max 25) samples
GAN 3.3 0.5 %ALI 15.8 1.6 %Unrolled GAN 23.6 16.0 %VeeGAN 24.6 40.0 %PacGAN2 24.6±0.9 65.8±13.4 %PacGAN3 24.9±0.3 71.4±13.8 %PacGAN4 25.0±0.0 76.0±7.1 %
8 / 25
Benchmark datasetes from VeeGAN paperReal data DCGAN PacDCGAN2
Modes (Max 1000) KL
DCGAN 99.0 3.40ALI 16.0 5.40Unrolled GAN 48.7 4.32VEEGAN 150.0 2.95
PacDCGAN2 1000.0±0.0 0.06±0.01PacDCGAN3 1000.0±0.0 0.06±0.01PacDCGAN4 1000.0±0.0 0.07±0.01
9 / 25
Benchmark from Unrolled GAN (small discriminator)Real data DCGAN PacDCGAN2
Modes (Max 1000) KL
DCGAN 30.6±20.73 5.99±0.42Unrolled GAN, 1 step 65.4±34.75 5.91±0.14Unrolled GAN, 5 steps 236.4±63.30 4.67±0.43Unrolled GAN, 10 steps 327.2±74.67 4.66±0.46
PacDCGAN2 370.8±244.34 3.33±1.02PacDCGAN3 534.3±103.68 2.11±0.52PacDCGAN4 557.7±101.37 2.06±0.61
10 / 25
Intuition behind packing via toy example
Generator Q1
with mode collapse
Generator Q2
without mode collapse
Target distribution P
1
1
P
1.25
0.2
1
1
Q1
1
1
0.6
1.4
0.5
Q2
dTV(P, Q1) = 0.2 dTV(P, Q2) = 0.2
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
1 2 3 4 5 6
11 / 25
Intuition behind packing via toy example
Generator Q1
with mode collapse
Generator Q2
without mode collapse
Target distribution P
1
0.2 1 10.5
0.2
1
1
dTV(P ⇥ P, Q1 ⇥ Q1) = 0.36 dTV(P ⇥ P, Q2 ⇥ Q2) = 0.24
P ⇥ P
Q1 ⇥ Q1 Q2 ⇥ Q2
0.62
1.252
1.42
1.4 ⇥ 0.6
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
1 2 3 4 5 6
dTV(P 2, Q21)
dTV(P 2, Q22)
11 / 25
Intuition behind packing via toy example
Generator Q1
with mode collapse
Generator Q2
without mode collapse
Target distribution P
1
0.2 1 10.5
0.2
1
1
dTV(P ⇥ P, Q1 ⇥ Q1) = 0.36 dTV(P ⇥ P, Q2 ⇥ Q2) = 0.24
P ⇥ P
Q1 ⇥ Q1 Q2 ⇥ Q2
0.62
1.252
1.42
1.4 ⇥ 0.6
m 0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
1 2 3 4 5 6
dTV(Pm, Qm2 )
dTV(Pm, Qm1 )
11 / 25
Evolution of TV distances
through the prism of packing
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
1 3 5 7 9 11
the size of packing m
Total variation
dTV(Pm, Qm)
Through packing, the target-generator pairs are expandedover the strengths of the mode collapse
12 / 25
Evolution of TV distancesthrough the prism of packing
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
1 3 5 7 9 11
the size of packing m
Total variation
dTV(Pm, Qm)
strongmode collapse
weakmode collapse
~www�Through packing, the target-generator pairs are expandedover the strengths of the mode collapse
12 / 25
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
1 3 5 7 9 11
maxP,Q
/minP,Q
dTV(P2, Q2)
subject to dTV(P,Q) = τ
we focus on m = 2 for this talk
13 / 25
Intuition from Blackwell
Definition [mode collapse region]
We say a pair (P,Q) of a target distribution P and a generatordistribution Q has (ε, δ)-mode collapse if there exists a set S such that
P (S) ≥ δ , and Q(S) ≤ ε .
Generator Q1
with mode collapse
Target distribution P
1
1
P
1.25
0.2
1
1
Q1
0
0.5
1
0 0.5 1
R(P, Q1)
"
�
14 / 25
Intuition from Blackwell
Definition [mode collapse region]
We say a pair (P,Q) of a target distribution P and a generatordistribution Q has (ε, δ)-mode collapse if there exists a set S such that
P (S) ≥ δ , and Q(S) ≤ ε .
Generator Q1
with mode collapse
Target distribution P
1
1
P
1.25
0.2
1
1
Q1
0
0.5
1
0 0.5 1
R(P, Q1)
"
�
14 / 25
Intuition from Blackwell
Definition [mode collapse region]
We say a pair (P,Q) of a target distribution P and a generatordistribution Q has (ε, δ)-mode collapse if there exists a set S such that
P (S) ≥ δ , and Q(S) ≤ ε .
Generator Q1
with mode collapse
Target distribution P
1
1
P
1.25
0.2
1
1
Q1
0
0.5
1
0 0.5 1
R(P, Q1)
"
�
?
�
"
14 / 25
Intuition from Blackwell
Definition [mode collapse region]
We say a pair (P,Q) of a target distribution P and a generatordistribution Q has (ε, δ)-mode collapse if there exists a set S such that
P (S) ≥ δ , and Q(S) ≤ ε .
Generator Q1
with mode collapse
Target distribution P
1
1
P
1.25
0.2
1
1
Q1
0
0.5
1
0 0.5 1
R(P, Q1)
"
�
?
�
"
14 / 25
Intuition from Blackwell
Definition [mode collapse region]
We say a pair (P,Q) of a target distribution P and a generatordistribution Q has (ε, δ)-mode collapse if there exists a set S such that
P (S) ≥ δ , and Q(S) ≤ ε .
Generator Q1
with mode collapse
Target distribution P
1
1
P
1.25
0.2
1
1
Q1
0
0.5
1
0 0.5 1
R(P, Q1)
"
�
?�
"
14 / 25
Intuition from Blackwell
Definition [mode collapse region]
We say a pair (P,Q) of a target distribution P and a generatordistribution Q has (ε, δ)-mode collapse if there exists a set S such that
P (S) ≥ δ , and Q(S) ≤ ε .
Generator Q1
with mode collapse
Target distribution P
1
1
P
1.25
0.2
1
1
Q1
0
0.5
1
0 0.5 1
R(P, Q1)
"
�?�
"
14 / 25
Intuition from Blackwell
Definition [mode collapse region]
We say a pair (P,Q) of a target distribution P and a generatordistribution Q has (ε, δ)-mode collapse if there exists a set S such that
P (S) ≥ δ , and Q(S) ≤ ε .
Generator Q1
with mode collapse
Target distribution P
1
1
P
1.25
0.2
1
1
Q1
0
0.5
1
0 0.5 1
R(P, Q1)
"
�
?
�
"
14 / 25
Intuition from Blackwell
Definition [mode collapse region]
We say a pair (P,Q) of a target distribution P and a generatordistribution Q has (ε, δ)-mode collapse if there exists a set S such that
P (S) ≥ δ , and Q(S) ≤ ε .
Generator Q1
with mode collapse
Target distribution P
1
1
P
1.25
0.2
1
1
Q1
0
0.5
1
0 0.5 1
R(P, Q1)
"
�
?
�
"
14 / 25
Intuition from Blackwell
Definition [mode collapse region]
We say a pair (P,Q) of a target distribution P and a generatordistribution Q has (ε, δ)-mode collapse if there exists a set S such that
P (S) ≥ δ , and Q(S) ≤ ε .
Target distribution P
1
1
P
"
�
Generator Q2
without mode collapse
1
1
0.6
1.4
0.5
Q2
0
0.5
1
0 0.5 1
R(P, Q2)
14 / 25
Intuition from Blackwell
Definition [mode collapse region]
We say a pair (P,Q) of a target distribution P and a generatordistribution Q has (ε, δ)-mode collapse if there exists a set S such that
P (S) ≥ δ , and Q(S) ≤ ε .
Target distribution P
1
1
P
"
�
Generator Q2
without mode collapse
1
1
0.6
1.4
0.5
Q2
0
0.5
1
0 0.5 1
R(P, Q2)
dTV(P, Q2) = 0.2
14 / 25
Upper bound
⌧
0
0.5
1
0 0.5 1
R(P, Q)
ε
δ
maxP,Q
dTV(P2, Q2)
subject to dTV(P,Q) = τ
R(P,Q) ⊆ Router(τ)
R(P 2, Q2) ⊆ R(P 2outer, Q
2outer)
dTV(P2, Q2) ≤ dTV(P
2outer, Q
2outer)︸ ︷︷ ︸
1−(1−τ)2
Blackwell’s theorem
R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)
15 / 25
Upper bound
⌧
0
0.5
1
0 0.5 1
R(P, Q)
Router(⌧)
ε
δ
maxP,Q
dTV(P2, Q2)
subject to dTV(P,Q) = τ
R(P,Q) ⊆ Router(τ)
R(P 2, Q2) ⊆ R(P 2outer, Q
2outer)
dTV(P2, Q2) ≤ dTV(P
2outer, Q
2outer)︸ ︷︷ ︸
1−(1−τ)2
Blackwell’s theorem
R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)
15 / 25
Upper bound
⌧
0
0.5
1
0 0.5 1
R(P, Q)
Router(⌧)
1 � ⌧
1 � ⌧
⌧
⌧
0
0
Qouter(·)
Pouter(·)
ε
δ
maxP,Q
dTV(P2, Q2)
subject to dTV(P,Q) = τ
R(P,Q) ⊆ Router(τ)
R(P 2, Q2) ⊆ R(P 2outer, Q
2outer)
dTV(P2, Q2) ≤ dTV(P
2outer, Q
2outer)︸ ︷︷ ︸
1−(1−τ)2
Blackwell’s theorem
R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)
15 / 25
Upper bound
⌧
0
0.5
1
0 0.5 1
R(P, Q)
Router(⌧)
1 � ⌧
1 � ⌧
⌧
⌧
0
0
Qouter(·)
Pouter(·)
ε
δ
maxP,Q
dTV(P2, Q2)
subject to dTV(P,Q) = τ
R(P,Q) ⊆ Router(τ)
R(P 2, Q2) ⊆ R(P 2outer, Q
2outer)
dTV(P2, Q2) ≤ dTV(P
2outer, Q
2outer)︸ ︷︷ ︸
1−(1−τ)2
Blackwell’s theorem
R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)
15 / 25
Upper bound
⌧
0
0.5
1
0 0.5 1
R(P, Q)
Router(⌧)
1 � ⌧
1 � ⌧
⌧
⌧
0
0
Qouter(·)
Pouter(·)
ε
δ
maxP,Q
dTV(P2, Q2)
subject to dTV(P,Q) = τ
R(P,Q) ⊆ Router(τ)
R(P 2, Q2) ⊆ R(P 2outer, Q
2outer)
dTV(P2, Q2) ≤ dTV(P
2outer, Q
2outer)︸ ︷︷ ︸
1−(1−τ)2
Blackwell’s theorem
R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)
15 / 25
Lower bound
⌧
0
0.5
1
0 0.5 1
R(P, Q)
↵
ε
δ
minP,Q
dTV(P2, Q2)
subject to dTV(P,Q) = τ
Rinner(τ, α) ⊆ R(P,Q)
R(P 2inner, Q
2inner) ⊆ R(P 2, Q2)
dTV(P2inner, Q
2inner)︸ ︷︷ ︸
τ
≤ dTV(P2, Q2)
Blackwell’s theorem
R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)
16 / 25
Lower bound
⌧
0
0.5
1
0 0.5 1
↵
R(P, Q)
Rinner(⌧,↵)
ε
δ
minP,Q
dTV(P2, Q2)
subject to dTV(P,Q) = τ
Rinner(τ, α) ⊆ R(P,Q)
R(P 2inner, Q
2inner) ⊆ R(P 2, Q2)
dTV(P2inner, Q
2inner)︸ ︷︷ ︸
τ
≤ dTV(P2, Q2)
Blackwell’s theorem
R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)
16 / 25
Lower bound
⌧
0
0.5
1
0 0.5 1
↵
Pinner(·)
Qinner(·)
↵
1 � ↵
1 � ↵� ⌧↵ + ⌧
R(P, Q)
Rinner(⌧,↵)
ε
δ
minP,Q
dTV(P2, Q2)
subject to dTV(P,Q) = τ
Rinner(τ, α) ⊆ R(P,Q)
R(P 2inner, Q
2inner) ⊆ R(P 2, Q2)
dTV(P2inner, Q
2inner)︸ ︷︷ ︸
τ
≤ dTV(P2, Q2)
Blackwell’s theorem
R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)
16 / 25
Lower bound
⌧
0
0.5
1
0 0.5 1
↵
Pinner(·)
Qinner(·)
↵
1 � ↵
1 � ↵� ⌧↵ + ⌧
R(P, Q)
Rinner(⌧,↵)
ε
δ
minP,Q
dTV(P2, Q2)
subject to dTV(P,Q) = τ
Rinner(τ, α) ⊆ R(P,Q)
R(P 2inner, Q
2inner) ⊆ R(P 2, Q2)
dTV(P2inner, Q
2inner)︸ ︷︷ ︸
τ
≤ dTV(P2, Q2)
Blackwell’s theorem
R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)
16 / 25
Lower bound
⌧
0
0.5
1
0 0.5 1
↵
Pinner(·)
Qinner(·)
↵
1 � ↵
1 � ↵� ⌧↵ + ⌧
R(P, Q)
Rinner(⌧,↵)
ε
δ
minP,Q
dTV(P2, Q2)
subject to dTV(P,Q) = τ
Rinner(τ, α) ⊆ R(P,Q)
R(P 2inner, Q
2inner) ⊆ R(P 2, Q2)
dTV(P2inner, Q
2inner)︸ ︷︷ ︸
τ
≤ dTV(P2, Q2)
Blackwell’s theorem
R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)
16 / 25
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
1 3 5 7 9 11
dTV(Pm, Qm)
m
maxP,Q
/minP,Q
dTV(P2, Q2)
subject to dTV(P,Q) = τ
17 / 25
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
1 3 5 7 9 11
dTV(Pm, Qm)
m
maxP,Q
dTV(P2, Q2)
subject to dTV(P,Q) = τ
no (ε0, δ0)-mode collapse
17 / 25
Upper bound without (ε0, δ0)-mode collapse
⌧
0
0.5
1
0 0.5 1
R(P, Q)
("0, �0)
ε
δ
maxP,Q
dTV(P2, Q2)
subject to dTV(P,Q) = τ
no (ε0, δ0)-mode collapse
R(P,Q) ⊆ Router(τ, ε0, δ0, α)
R(P 2, Q2) ⊆ R(P 2outer, Q
2outer)
dTV(P2, Q2) ≤ dTV(P
2outer, Q
2outer)︸ ︷︷ ︸
simple to evaluate
Blackwell’s theorem
R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)
18 / 25
Upper bound without (ε0, δ0)-mode collapse
⌧
0
0.5
1
0 0.5 1
R(P, Q)
("0, �0)
Router(⌧, "0, �0,↵)
↵
ε
δ
maxP,Q
dTV(P2, Q2)
subject to dTV(P,Q) = τ
no (ε0, δ0)-mode collapse
R(P,Q) ⊆ Router(τ, ε0, δ0, α)
R(P 2, Q2) ⊆ R(P 2outer, Q
2outer)
dTV(P2, Q2) ≤ dTV(P
2outer, Q
2outer)︸ ︷︷ ︸
simple to evaluate
Blackwell’s theorem
R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)
18 / 25
Upper bound without (ε0, δ0)-mode collapse
Qouter(·)
Pouter(·)
⌧
0
0.5
1
0 0.5 1
R(P, Q)
("0, �0)
Router(⌧, "0, �0,↵)
↵
↵1 � ↵� ⌧
⌧
↵
(1 � ↵� ⌧)(1 � ↵� �)
1 � ↵� ⌧ � "
ε
δ
maxP,Q
dTV(P2, Q2)
subject to dTV(P,Q) = τ
no (ε0, δ0)-mode collapse
R(P,Q) ⊆ Router(τ, ε0, δ0, α)
R(P 2, Q2) ⊆ R(P 2outer, Q
2outer)
dTV(P2, Q2) ≤ dTV(P
2outer, Q
2outer)︸ ︷︷ ︸
simple to evaluate
Blackwell’s theorem
R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)
18 / 25
Upper bound without (ε0, δ0)-mode collapse
Qouter(·)
Pouter(·)
⌧
0
0.5
1
0 0.5 1
R(P, Q)
("0, �0)
Router(⌧, "0, �0,↵)
↵
↵1 � ↵� ⌧
⌧
↵
(1 � ↵� ⌧)(1 � ↵� �)
1 � ↵� ⌧ � "
ε
δ
maxP,Q
dTV(P2, Q2)
subject to dTV(P,Q) = τ
no (ε0, δ0)-mode collapse
R(P,Q) ⊆ Router(τ, ε0, δ0, α)
R(P 2, Q2) ⊆ R(P 2outer, Q
2outer)
dTV(P2, Q2) ≤ dTV(P
2outer, Q
2outer)︸ ︷︷ ︸
simple to evaluate
Blackwell’s theorem
R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)
18 / 25
Upper bound without (ε0, δ0)-mode collapse
Qouter(·)
Pouter(·)
⌧
0
0.5
1
0 0.5 1
R(P, Q)
("0, �0)
Router(⌧, "0, �0,↵)
↵
↵1 � ↵� ⌧
⌧
↵
(1 � ↵� ⌧)(1 � ↵� �)
1 � ↵� ⌧ � "
ε
δ
maxP,Q
dTV(P2, Q2)
subject to dTV(P,Q) = τ
no (ε0, δ0)-mode collapse
R(P,Q) ⊆ Router(τ, ε0, δ0, α)
R(P 2, Q2) ⊆ R(P 2outer, Q
2outer)
dTV(P2, Q2) ≤ dTV(P
2outer, Q
2outer)︸ ︷︷ ︸
simple to evaluate
Blackwell’s theorem
R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)
18 / 25
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
1 3 5 7 9 11
dTV(P2, Q2)
m
minP,Q
dTV(P2, Q2)
subject to dTV(P,Q) = τ
(ε1, δ1)-mode collapse
19 / 25
Lower bound with (ε1, δ1)-mode collapse
⌧
0
0.5
1
0 0.5 1
↵
R(P, Q)
("1, �1)
ε
δ
minP,Q
dTV(P2, Q2)
subject to dTV(P,Q) = τ
(ε1, δ1)-mode collapse
Rinner(τ, α, ε1, δ1) ⊆ R(P,Q)
R(P 2inner, Q
2inner) ⊆ R(P 2, Q2)
dTV(P2inner, Q
2inner)︸ ︷︷ ︸
simple to evaluate
≤ dTV(P2, Q2)
Blackwell’s theorem
R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)
20 / 25
Lower bound with (ε1, δ1)-mode collapse
⌧
0
0.5
1
0 0.5 1
↵
R(P, Q)
("1, �1)
Rinner(⌧,↵, "1, �1)
ε
δ
minP,Q
dTV(P2, Q2)
subject to dTV(P,Q) = τ
(ε1, δ1)-mode collapse
Rinner(τ, α, ε1, δ1) ⊆ R(P,Q)
R(P 2inner, Q
2inner) ⊆ R(P 2, Q2)
dTV(P2inner, Q
2inner)︸ ︷︷ ︸
simple to evaluate
≤ dTV(P2, Q2)
Blackwell’s theorem
R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)
20 / 25
Lower bound with (ε1, δ1)-mode collapse
⌧
0
0.5
1
0 0.5 1
↵
R(P, Q)
("1, �1)
↵
↵ + ⌧
Pinner(·)
Qinner(·)
�11 � ↵� �1
"1
1 � ↵� ⌧ � "1
Rinner(⌧,↵, "1, �1)
ε
δ
minP,Q
dTV(P2, Q2)
subject to dTV(P,Q) = τ
(ε1, δ1)-mode collapse
Rinner(τ, α, ε1, δ1) ⊆ R(P,Q)
R(P 2inner, Q
2inner) ⊆ R(P 2, Q2)
dTV(P2inner, Q
2inner)︸ ︷︷ ︸
simple to evaluate
≤ dTV(P2, Q2)
Blackwell’s theorem
R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)
20 / 25
Lower bound with (ε1, δ1)-mode collapse
⌧
0
0.5
1
0 0.5 1
↵
R(P, Q)
("1, �1)
↵
↵ + ⌧
Pinner(·)
Qinner(·)
�11 � ↵� �1
"1
1 � ↵� ⌧ � "1
Rinner(⌧,↵, "1, �1)
ε
δ
minP,Q
dTV(P2, Q2)
subject to dTV(P,Q) = τ
(ε1, δ1)-mode collapse
Rinner(τ, α, ε1, δ1) ⊆ R(P,Q)
R(P 2inner, Q
2inner) ⊆ R(P 2, Q2)
dTV(P2inner, Q
2inner)︸ ︷︷ ︸
simple to evaluate
≤ dTV(P2, Q2)
Blackwell’s theorem
R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)
20 / 25
Lower bound with (ε1, δ1)-mode collapse
⌧
0
0.5
1
0 0.5 1
↵
R(P, Q)
("1, �1)
↵
↵ + ⌧
Pinner(·)
Qinner(·)
�11 � ↵� �1
"1
1 � ↵� ⌧ � "1
Rinner(⌧,↵, "1, �1)
ε
δ
minP,Q
dTV(P2, Q2)
subject to dTV(P,Q) = τ
(ε1, δ1)-mode collapse
Rinner(τ, α, ε1, δ1) ⊆ R(P,Q)
R(P 2inner, Q
2inner) ⊆ R(P 2, Q2)
dTV(P2inner, Q
2inner)︸ ︷︷ ︸
simple to evaluate
≤ dTV(P2, Q2)
Blackwell’s theorem
R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)
20 / 25
Achievable TV distances for distributions
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
1 3 5 7 9 11
with (ε1, δ1)-mode collapse
dTV(Pm, Qm)
m
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
1 3 5 7 9 11
without (ε0, δ0)-mode collapse
m
with packing, the discriminator naturally penalizes (P,Q) with severemode collapses
21 / 25
Does larger m help?
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
1 3 5 7 9 11
H1H0
overlap
m
dTV(Pm, Qm)
22 / 25
Evolution of TV distancesthrough the prism of packing
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
1 3 5 7 9 11
the size of packing m
Total variation
dTV(Pm, Qm)
strongmode collapse
weakmode collapse
~www�Through the prism packing, the target-generator pairs are expandedover the spectrum of (the strengths of) mode collapse
23 / 25
Our paper is available at:https://arxiv.org/abs/1712.04086
24 / 25
Packing is a principled solution to mode collapseI Lightweight overheadI Excellent numerical resultsI Theoretical analysis
Exciting time for bridging the theory and practice of GAN• “Generalization and Equilibrium in Generative Adversarial Nets”, SanjeevArora, Rong Ge, Yingyu Liang, Tengyu Ma, Yi Zhang• “Approximation and Convergence Properties of Generative AdversarialLearning”, Shuang Liu, Olivier Bousquet, Kamalika Chaudhuri• “Compressed Sensing using Generative Models”, Ashish Bora, Ajil Jalal,Eric Price, Alexandros G. Dimakis
• “Towards Understanding the Dynamics of Generative Adversarial
Networks”, Jerry Li, Aleksander Madry, John Peebles, Ludwig Schmidt
Giulia Fanti (CMU) Ashish Khetan (UIUC) Zinan Lin (CMU) 25 / 25