Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
The power of two samplesin generative adversarial networks
Sewoong Oh
Department of Industrial and Enterprise Systems EngineeringUniversity of Illinois at Urbana-Champaign
1 / 21
Generative models learn fundamental representations
= Zglasses
G�1( )
G�1( ) + Zglasses
Z space Image space
G
[DCGAN, Radford et al. 2015] 2 / 21
Generative Adversarial Networks (GAN)
Z
G(Z)
XD(X)real
1fake
0real
1real
1
Real data
Generator G(Z)
Discriminator D(X)
minG
maxD
V (G,D)
3 / 21
Challenges in training GAN
1. Instability: non-convergence of training loss
2. Evaluation: likelihood is not available
3. Mode collapse: loss of diversity
4 / 21
“Mode collapse” is a main challenge
Target samples Generated samples
5 / 21
“Mode collapse” is a main challenge
Target samples Generated samples
5 / 21
“Mode collapse” is a main challenge
“A man in a orange jacket with sunglasses and a hat ski down a hill.”
“This guy is in black trunks and swimming underwater.”
“A tennis player in a blue polo shirt is looking down at the greencourt.”
[“Generating interpretable images with controllable structure”, by Reed et al., 2016]6 / 21
Lack of diversity is easier to detectif the discriminator sees multiple sample jointly
”Improved Techniques for Training GANs”, Salimans, Goodfellow, Zaremba,Cheung, Radford, Chen, 2016
”Progressive Growing of GANs for Improved Quality, Stability, and Variation”,Karras, Aila, Laine, Lehtinen, 2017
”Distributional Adversarial Networks”, Li, Alvarez-Melis, Xu, Jegelka, Sra, 2017
7 / 21
New framework: PacGANlightweight overheadexperimental resultsprincipled
Z
G(Z)
X real 1
fake 0
real 1
real 1
⇥2
D(X1, X2)
8 / 21
Benchmark tests
Target GAN PacGAN2
Modes(Max 25)
GAN 17.3PacGAN2 23.8PacGAN3 24.6PacGAN4 24.8
9 / 21
Benchmark datasets from VeeGAN paperReal data DCGAN PacDCGAN2
Modes (Max 1000)
DCGAN 99.0ALI 16.0Unrolled GAN 48.7VeeGAN 150.0
PacDCGAN2 1000.0PacDCGAN3 1000.0PacDCGAN4 1000.0
10 / 21
Intuition behind packing via toy example
Generator Q1
with mode collapse
Generator Q2
without mode collapse
Target distribution P
1
1
P
1.25
0.2
1
1
Q1
1
1
0.6
1.4
0.5
Q2
dTV(P, Q1) = 0.2 dTV(P, Q2) = 0.2
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
1 2 3 4 5 6
dTV (P,Q1) = dTV (P,Q2) = 0.2
size of packing
11 / 21
Intuition behind packing via toy example
Generator Q1
with mode collapse
Generator Q2
without mode collapse
Target distribution P
1
0.2 1 10.5
0.2
1
1
dTV(P ⇥ P, Q1 ⇥ Q1) = 0.36 dTV(P ⇥ P, Q2 ⇥ Q2) = 0.24
P ⇥ P
Q1 ⇥ Q1 Q2 ⇥ Q2
0.62
1.252
1.42
1.4 ⇥ 0.6
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
1 2 3 4 5 6
dTV(P 2, Q21)
dTV(P 2, Q22)
11 / 21
Intuition behind packing via toy example
Generator Q1
with mode collapse
Generator Q2
without mode collapse
Target distribution P
1
0.2 1 10.5
0.2
1
1
dTV(P ⇥ P, Q1 ⇥ Q1) = 0.36 dTV(P ⇥ P, Q2 ⇥ Q2) = 0.24
P ⇥ P
Q1 ⇥ Q1 Q2 ⇥ Q2
0.62
1.252
1.42
1.4 ⇥ 0.6
m 0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
1 2 3 4 5 6
dTV(Pm, Qm2 )
dTV(Pm, Qm1 )
11 / 21
Evolution of TV distances
through the prism of packing
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
1 3 5 7 9 11
the size of packing m
Total variation
dTV(Pm, Qm)
Through packing, the target-generator pairs are expandedover the strengths of the mode collapse
12 / 21
Evolution of TV distancesthrough the prism of packing
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
1 3 5 7 9 11
the size of packing m
Total variation
dTV(Pm, Qm)
strongmode collapse
weakmode collapse
~www�Through packing, the target-generator pairs are expandedover the strengths of the mode collapse
12 / 21
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
1 3 5 7 9 11
dTV(Pm, Qm)
degree of packing m
maxP,Q
/minP,Q
dTV(P2, Q2)
subject to dTV(P,Q) = τ
we focus on m = 2 for this talk
this is easy, but we have a new proof technique
nothing to do with mode collapse, but we use it as proof technique
13 / 21
Intuition from Blackwell
Definition [mode collapse region]
We say a pair (P,Q) of a target distribution P and a generatordistribution Q has (ε, δ)-mode collapse if there exists a set S such that
P (S) ≥ δ , and Q(S) ≤ ε .
Generator Q1
with mode collapse
Target distribution P
1
1
P
1.25
0.2
1
1
Q1
0
0.5
1
0 0.5 1
R(P, Q1)
"
�
14 / 21
Intuition from Blackwell
Definition [mode collapse region]
We say a pair (P,Q) of a target distribution P and a generatordistribution Q has (ε, δ)-mode collapse if there exists a set S such that
P (S) ≥ δ , and Q(S) ≤ ε .
Generator Q1
with mode collapse
Target distribution P
1
1
P
1.25
0.2
1
1
Q1
0
0.5
1
0 0.5 1
R(P, Q1)
"
�
14 / 21
Intuition from Blackwell
Definition [mode collapse region]
We say a pair (P,Q) of a target distribution P and a generatordistribution Q has (ε, δ)-mode collapse if there exists a set S such that
P (S) ≥ δ , and Q(S) ≤ ε .
Generator Q1
with mode collapse
Target distribution P
1
1
P
1.25
0.2
1
1
Q1
0
0.5
1
0 0.5 1
R(P, Q1)
"
�
?
�
"
14 / 21
Intuition from Blackwell
Definition [mode collapse region]
We say a pair (P,Q) of a target distribution P and a generatordistribution Q has (ε, δ)-mode collapse if there exists a set S such that
P (S) ≥ δ , and Q(S) ≤ ε .
Generator Q1
with mode collapse
Target distribution P
1
1
P
1.25
0.2
1
1
Q1
0
0.5
1
0 0.5 1
R(P, Q1)
"
�
?
�
"
14 / 21
Intuition from Blackwell
Definition [mode collapse region]
We say a pair (P,Q) of a target distribution P and a generatordistribution Q has (ε, δ)-mode collapse if there exists a set S such that
P (S) ≥ δ , and Q(S) ≤ ε .
Generator Q1
with mode collapse
Target distribution P
1
1
P
1.25
0.2
1
1
Q1
0
0.5
1
0 0.5 1
R(P, Q1)
"
�
?�
"
14 / 21
Intuition from Blackwell
Definition [mode collapse region]
We say a pair (P,Q) of a target distribution P and a generatordistribution Q has (ε, δ)-mode collapse if there exists a set S such that
P (S) ≥ δ , and Q(S) ≤ ε .
Generator Q1
with mode collapse
Target distribution P
1
1
P
1.25
0.2
1
1
Q1
0
0.5
1
0 0.5 1
R(P, Q1)
"
�?�
"
14 / 21
Intuition from Blackwell
Definition [mode collapse region]
We say a pair (P,Q) of a target distribution P and a generatordistribution Q has (ε, δ)-mode collapse if there exists a set S such that
P (S) ≥ δ , and Q(S) ≤ ε .
Generator Q1
with mode collapse
Target distribution P
1
1
P
1.25
0.2
1
1
Q1
0
0.5
1
0 0.5 1
R(P, Q1)
"
�
?
�
"
14 / 21
Intuition from Blackwell
Definition [mode collapse region]
We say a pair (P,Q) of a target distribution P and a generatordistribution Q has (ε, δ)-mode collapse if there exists a set S such that
P (S) ≥ δ , and Q(S) ≤ ε .
Generator Q1
with mode collapse
Target distribution P
1
1
P
1.25
0.2
1
1
Q1
0
0.5
1
0 0.5 1
R(P, Q1)
"
�
?
�
"
14 / 21
Intuition from Blackwell
Definition [mode collapse region]
We say a pair (P,Q) of a target distribution P and a generatordistribution Q has (ε, δ)-mode collapse if there exists a set S such that
P (S) ≥ δ , and Q(S) ≤ ε .
Target distribution P
1
1
P
"
�
Generator Q2
without mode collapse
1
1
0.6
1.4
0.5
Q2
0
0.5
1
0 0.5 1
R(P, Q2)
14 / 21
Intuition from Blackwell
Definition [mode collapse region]
We say a pair (P,Q) of a target distribution P and a generatordistribution Q has (ε, δ)-mode collapse if there exists a set S such that
P (S) ≥ δ , and Q(S) ≤ ε .
Target distribution P
1
1
P
"
�
Generator Q2
without mode collapse
1
1
0.6
1.4
0.5
Q2
0
0.5
1
0 0.5 1
R(P, Q2)
dTV(P, Q2) = 0.2
14 / 21
Upper bound
⌧
0
0.5
1
0 0.5 1
R(P, Q)
ε
δ
maxP,Q
dTV(P2, Q2)
subject to dTV(P,Q) = τ
R(P,Q) ⊆ Router(τ)
R(P 2, Q2) ⊆ R(P 2outer, Q
2outer)
dTV(P2, Q2) ≤ dTV(P
2outer, Q
2outer)︸ ︷︷ ︸
1−(1−τ)2
Blackwell’s theorem
R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)
15 / 21
Upper bound
⌧
0
0.5
1
0 0.5 1
R(P, Q)
Router(⌧)
ε
δ
maxP,Q
dTV(P2, Q2)
subject to dTV(P,Q) = τ
R(P,Q) ⊆ Router(τ)
R(P 2, Q2) ⊆ R(P 2outer, Q
2outer)
dTV(P2, Q2) ≤ dTV(P
2outer, Q
2outer)︸ ︷︷ ︸
1−(1−τ)2
Blackwell’s theorem
R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)
15 / 21
Upper bound
⌧
0
0.5
1
0 0.5 1
R(P, Q)
Router(⌧)
1 � ⌧
1 � ⌧
⌧
⌧
0
0
Qouter(·)
Pouter(·)
ε
δ
maxP,Q
dTV(P2, Q2)
subject to dTV(P,Q) = τ
R(P,Q) ⊆ Router(τ)
R(P 2, Q2) ⊆ R(P 2outer, Q
2outer)
dTV(P2, Q2) ≤ dTV(P
2outer, Q
2outer)︸ ︷︷ ︸
1−(1−τ)2
Blackwell’s theorem
R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)
15 / 21
Upper bound
⌧
0
0.5
1
0 0.5 1
R(P, Q)
Router(⌧)
1 � ⌧
1 � ⌧
⌧
⌧
0
0
Qouter(·)
Pouter(·)
ε
δ
maxP,Q
dTV(P2, Q2)
subject to dTV(P,Q) = τ
R(P,Q) ⊆ Router(τ)
R(P 2, Q2) ⊆ R(P 2outer, Q
2outer)
dTV(P2, Q2) ≤ dTV(P
2outer, Q
2outer)︸ ︷︷ ︸
1−(1−τ)2
Blackwell’s theorem
R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)
15 / 21
Upper bound
⌧
0
0.5
1
0 0.5 1
R(P, Q)
Router(⌧)
1 � ⌧
1 � ⌧
⌧
⌧
0
0
Qouter(·)
Pouter(·)
ε
δ
maxP,Q
dTV(P2, Q2)
subject to dTV(P,Q) = τ
R(P,Q) ⊆ Router(τ)
R(P 2, Q2) ⊆ R(P 2outer, Q
2outer)
dTV(P2, Q2) ≤ dTV(P
2outer, Q
2outer)︸ ︷︷ ︸
1−(1−τ)2
Blackwell’s theorem
R(P,Q) ⊆ R(P ′, Q′)⇒ R(P 2, Q2) ⊆ R(P ′2, Q′2)
15 / 21
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
1 3 5 7 9 11
dTV(Pm, Qm)
degree of packing m
maxP,Q
/minP,Q
dTV(P2, Q2)
subject to dTV(P,Q) = τ
16 / 21
PacGAN naturally penalizes mode collapse
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
1 3 5 7 9 11
without (ε0, δ0)-mode collapse
degree of packing m
dTV(Pm, Qm)
maxP,Q
dTV(P2, Q2)
s.t. dTV(P,Q) = τ
no (ε0, δ0)-mode collapse
17 / 21
PacGAN naturally penalizes mode collapse
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
1 3 5 7 9 11
without (ε0, δ0)-mode collapse
degree of packing m
dTV(Pm, Qm)
maxP,Q
dTV(P2, Q2)
s.t. dTV(P,Q) = τ
no (ε0, δ0)-mode collapse
⌧
0
0.5
1
0 0.5 1
R(P, Q)
("0, �0)
Router(⌧, "0, �0,↵)
↵
17 / 21
PacGAN naturally penalizes mode collapse
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
1 3 5 7 9 11
without (ε0, δ0)-mode collapse
degree of packing m
dTV(Pm, Qm)
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
1 3 5 7 9 11
with (ε1, δ1)-mode collapse
degree of packing m
maxP,Q
dTV(P2, Q2)
s.t. dTV(P,Q) = τ
no (ε0, δ0)-mode collapse
minP,Q
dTV(P2, Q2)
s.t. dTV(P,Q) = τ
(ε1, δ1)-mode collapse
18 / 21
Size of the discriminator
0
100
200
300
400
500
600
700
800
900
1000
0 500000 1x106 1.5x10
6 2x10
6 2.5x10
6 3x10
6 3.5x10
6 4x10
6 4.5x10
6
PacGAN4
PacGAN3
PacGAN2
DCGAN
# modes captured
# of parameters in D(·)
# modes captured
# of parameters in D(·)Mini-batch discrimination requires +38,748,557, PacGAN2 requires +54
19 / 21
0-1 loss (Total Variation) vs.cross entropy loss (Jensen-Shannon Divergence)
0.1
0.15
0.2
0.25
0.3
0.35
0.4
TV (P,Q) JS(P,Q)
strong mode collapse
weak mode collapse 0.15
0.2
0.25
0.3
0.35
0.4
TV (P,Q) JS(P,Q)
strong mode collapse
weak mode collapse
Jensen-Shannon is better measurefor detecting mode collapse
20 / 21
Our paper is available at:https://arxiv.org/abs/1712.04086
All codes for the experiments at:https://github.com/fjxmlzn/PacGAN
Zinan Lin (CMU) Ashish Khetan (UIUC) Giulia Fanti (CMU)
21 / 21