Deep Adversarial Gaussian Mixture Auto-Encoder for Clustering
Warith HARCHAOUI, Pierre-Alexandre MATTEI, Charles BOUVEYRON
Université Paris Descartes - MAP5
Oscaro.com - Research & Development
February 2017
Clustering
Clustering is grouping similar objects together!
Thesis
Representation Learning and Clustering operate in symbiosis
Gaussian Mixture Model
- Density Estimation applied to Clustering with K modes/clusters (a minimal fitting sketch follows this list)
- Linear complexity, suitable for large-scale problems
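To make the density-estimation view concrete, here is a minimal GMM clustering sketch using scikit-learn (an illustration under assumed toy data and hyper-parameters, not the authors' setup):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy data: three well-separated 2D clusters (placeholder, not MNIST).
X = np.vstack([rng.normal(loc=m, scale=0.5, size=(100, 2))
               for m in ([0, 0], [3, 3], [0, 3])])

gmm = GaussianMixture(n_components=3, covariance_type="full")
labels = gmm.fit_predict(X)          # EM scales linearly in the sample size
print(gmm.weights_, gmm.means_)      # (pi, mu); Sigma in gmm.covariances_
```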
Learning Representations
- Successful in a supervised context (Kernel SVM)
- Successful in an unsupervised context (Spectral Clustering)
Auto-Encoder
An auto-encoder is a neural network that consists of:
- an Encoder: E : R^D → R^d (compression)
- a Decoder: D : R^d → R^D (decompression)
with D ≫ d and D(E(x)) ≈ x
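As an illustration, a minimal auto-encoder sketch in PyTorch (an assumption: the slides do not prescribe this library, these layer sizes, or the dimensions D = 784, d = 10):

```python
import torch
import torch.nn as nn

D, d = 784, 10  # assumed dimensions (e.g. MNIST pixels -> a 10-dim code)

encoder = nn.Sequential(nn.Linear(D, 256), nn.ReLU(), nn.Linear(256, d))
decoder = nn.Sequential(nn.Linear(d, 256), nn.ReLU(), nn.Linear(256, D))

x = torch.rand(32, D)                    # a batch of inputs
x_hat = decoder(encoder(x))              # D(E(x)) ≈ x after training
loss = nn.functional.mse_loss(x_hat, x)  # reconstruction loss
loss.backward()
```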
Optimization Scheme
Figure: Global Optimization Scheme for DAC. [Diagram: the Encoder maps the Input Space to the Code Space and the Decoder maps back; in the code space, Gaussian clusters (π, µ, Σ) form a GMM, and a Discriminator compares codes against it.]
Adversarial Auto-Encoder
An adversarial auto-encoder is a neural network that consists of:
- an Encoder: E : R^D → R^d (compression)
- a Decoder: D : R^d → R^D (decompression)
- a Prior: P : R^d → R with ∫_{R^d} P = 1, associated with a random generator of distribution P (see the sampling sketch below)
- a Discriminator: A : R^d → [0, 1] ⊂ R that distinguishes fake data (from the random generator) and real data (from the encoder)
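In DAC the prior is a Gaussian mixture with parameters (π, µ, Σ), as in the scheme on slide 7. A minimal sampler for such a prior might look like this (a sketch; the parameter values are placeholders, not learned ones):

```python
import numpy as np

def sample_gmm_prior(pi, mu, sigma, n, rng=np.random.default_rng()):
    """Draw n codes from the mixture sum_k pi_k N(mu_k, Sigma_k)."""
    K, d = mu.shape
    ks = rng.choice(K, size=n, p=pi)  # pick one component per sample
    return np.stack([rng.multivariate_normal(mu[k], sigma[k]) for k in ks])

# Placeholder parameters: K = 3 components in a d = 2 code space.
pi = np.array([0.5, 0.3, 0.2])
mu = np.array([[0.0, 0.0], [3.0, 3.0], [0.0, 3.0]])
sigma = np.stack([np.eye(2)] * 3)
fake_codes = sample_gmm_prior(pi, mu, sigma, n=64)  # "fake" codes for A
```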
Optimizations
Three objectives, in three lines (a one-step training sketch follows the list):
- The encoder and decoder try to minimize the reconstruction loss
- The discriminator tries to distinguish fake codes (from the random generator associated with the prior) from real codes (from the encoder)
- The encoder also tries to fool the discriminator (opposite of the discriminator loss function)
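A hedged sketch of one training step implementing these three objectives in PyTorch (the networks, optimizers, labels, and binary cross-entropy losses are assumptions, not the paper's exact setup). It keeps the deck's convention that prior samples are the fake codes and encoder outputs the real ones:

```python
import torch
import torch.nn.functional as F

def train_step(x, encoder, decoder, discr, opt_ae, opt_disc, opt_enc, sample_prior):
    # discr ends with a sigmoid, so its outputs live in [0, 1];
    # sample_prior(n) returns n fake codes (e.g. from a GMM prior).

    # 1) Encoder and decoder minimize the reconstruction loss.
    opt_ae.zero_grad()
    F.mse_loss(decoder(encoder(x)), x).backward()
    opt_ae.step()

    # 2) Discriminator separates fake codes (prior, label 0)
    #    from real codes (encoder, label 1).
    opt_disc.zero_grad()
    z_fake, z_real = sample_prior(len(x)), encoder(x).detach()
    d_loss = (F.binary_cross_entropy(discr(z_fake), torch.zeros(len(x), 1))
              + F.binary_cross_entropy(discr(z_real), torch.ones(len(x), 1)))
    d_loss.backward()
    opt_disc.step()

    # 3) Encoder fools the discriminator: it wants its codes
    #    classified as prior samples (the opposite target).
    opt_enc.zero_grad()
    F.binary_cross_entropy(discr(encoder(x)), torch.zeros(len(x), 1)).backward()
    opt_enc.step()
```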
Results
Method                                                MNIST-70k  Reuters-10k  HHAR
DAC EC (Ensemble Clustering)                          96.50      73.34        81.24
DAC                                                   94.08      72.14        80.50
GMVAE                                                 88.54      -            -
DEC                                                   84.30      72.17        79.86
AE + GMM (full covariances, median acc. over 10 runs) 82.56      70.12        78.48
GMM                                                   53.73      54.72        60.34
KM                                                    53.47      54.04        59.98

Table: Experimental accuracy results (%, the higher, the better) based on the Hungarian method
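For context, clustering accuracy with the Hungarian method matches each predicted cluster to a true class by solving a linear assignment problem; a minimal sketch (an illustration, not the authors' evaluation code):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """Best cluster-to-class matching (Hungarian method), then accuracy."""
    K = max(y_true.max(), y_pred.max()) + 1
    cost = np.zeros((K, K), dtype=int)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1                        # contingency table
    rows, cols = linear_sum_assignment(-cost)  # maximize matched counts
    return cost[rows, cols].sum() / len(y_true)

# Labels are permuted but the grouping is perfect, so accuracy is 1.0:
print(clustering_accuracy(np.array([0, 0, 1, 1]), np.array([1, 1, 0, 0])))
```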
Visualizations
Figure: Confusion matrix for DAC on MNIST (rows: actual class, columns: predicted class, digits 0 to 9; the diagonal counts dominate). (best seen in color)
Visualizations
Figure: Generated digit images. From left to right: the ten classes found by DAC, ordered thanks to the Hungarian algorithm. From top to bottom (rows µk, µk + 0.5σ, µk + 1σ, ..., µk + 3.5σ): we go further and further in random directions from the centroids, the first row being the decoded centroids.
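The generation procedure behind this figure can be sketched as follows (an assumption about the exact mechanics: decode points at increasing distances t·σ from each centroid along a random unit direction; `decoder`, `mu`, and the per-cluster scale `sigma` stand for the trained decoder and mixture parameters):

```python
import numpy as np

def generate_grid(decoder, mu, sigma, steps=np.arange(0.0, 4.0, 0.5),
                  rng=np.random.default_rng()):
    """For each cluster k, decode mu_k + t * sigma_k * u for t in steps,
    where u is a random unit direction in the code space."""
    rows = []
    for t in steps:                       # one row per step t
        row = []
        for k in range(len(mu)):
            u = rng.standard_normal(mu.shape[1])
            u /= np.linalg.norm(u)        # random unit direction
            row.append(decoder(mu[k] + t * sigma[k] * u))
        rows.append(row)                  # row t=0 decodes the centroids
    return rows
```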
Visualizations
Figure: Principal Component Analysis rendering of the code space for MNIST at the end of the DAC optimization, with colors indicating the true labels. (best seen in color)
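Such a rendering can be reproduced along these lines (a sketch assuming `codes` holds the learned d-dimensional codes and `labels` the true classes):

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def plot_code_space(codes, labels):
    """Project codes to 2D with PCA and color points by true label."""
    xy = PCA(n_components=2).fit_transform(codes)
    plt.scatter(xy[:, 0], xy[:, 1], c=labels, cmap="tab10", s=5)
    plt.xlabel("PC 1")
    plt.ylabel("PC 2")
    plt.show()
```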
Conclusion
Representation Learning and Clustering operate in symbiosis
References I
Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672-2680, 2014.
References II
Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, and Ian Goodfellow. Adversarial autoencoders. arXiv preprint arXiv:1511.05644, 2015.

Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, and Pierre-Antoine Manzagol. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11(Dec):3371-3408, 2010.