23
GeneGAN Learning Object Transfiguration and Attribute Subspace from Unpaired Data Shuchang Zhou, Taihong Xiao, Yi Yang, Dieqiao Feng, Qinyao He, Weiran He {zsc, xiaotaihong, yangyi}@megvii.com, {glassices, he.qinyao.me}@gmail.com, [email protected] Sep. 2017

Subspace from Unpaired Data Learning Object ... · representation using adversarial training. NIPS 2016: 5041-5049 5. Xi Chen, Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya

  • Upload
    others

  • View
    8

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Subspace from Unpaired Data Learning Object ... · representation using adversarial training. NIPS 2016: 5041-5049 5. Xi Chen, Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya

GeneGANLearning Object Transfiguration and Attribute

Subspace from Unpaired Data

Shuchang Zhou, Taihong Xiao, Yi Yang, Dieqiao Feng, Qinyao He, Weiran Hezsc, xiaotaihong, [email protected],

glassices, [email protected], [email protected]. 2017

Page 2: Subspace from Unpaired Data Learning Object ... · representation using adversarial training. NIPS 2016: 5041-5049 5. Xi Chen, Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya

Conditional Image Generation Applications: Image Editing, Training Data Synthesis Photo-realistic modeling and rendering are difficult.

SURREAL dataset (CVPR‘17)3DMM-family methodshttp://cn.arxiv.org/pdf/1612.04904.pdf

Model Transform RenderImage Generated Image

Page 3: Subspace from Unpaired Data Learning Object ... · representation using adversarial training. NIPS 2016: 5041-5049 5. Xi Chen, Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya

Feature Space Transformation

Φ-1(Φ(A)) Φ-1(Φ(A)+½ v) Φ-1(Φ(A)+ v)Deep Feature Interpolation (2016)

Image Space

Feature Space

Φ Φ-1

Φ(A)

Φ(A)+v

A Φ-1(Φ(A)+v)

Transformation vector v as difference between feature cluster centers. Diversity limited by the number of clusters.

Project Transform Project back

Image Generated Image

Page 4: Subspace from Unpaired Data Learning Object ... · representation using adversarial training. NIPS 2016: 5041-5049 5. Xi Chen, Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya

Generation by Exemplars with Paired Training Data Using a pair of image for specifying the transformation

Increase diversity. But paired training data are hard to collect.

Deep Visual Analogy-Making, NIPS'15 Disentangling Factors of Variation, NIPS'16X1 and X1' are required to have the same label, i.e., s1 == s1'.

X1 X1' X2'

X2

PairedPaired

Page 5: Subspace from Unpaired Data Learning Object ... · representation using adversarial training. NIPS 2016: 5041-5049 5. Xi Chen, Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya

Feature Space Interpolation MethodsGeneration by Exemplar

Unpaired Training data

Exploits Cyclic Loss

Deep Feature Interpolation

InfoGAN

Visual Analogy-Making

Disentangling Factors of Variation

CycleGAN

GeneGAN

Page 6: Subspace from Unpaired Data Learning Object ... · representation using adversarial training. NIPS 2016: 5041-5049 5. Xi Chen, Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya

GeneGAN Training Data A positive set and a negative set

need not be paired

Glasses Hair Lighting Smiling

Positive Eyeglass/sun glasses

Bangs Side/Up/Down Smiling

Negative No glasses Bald/Receding Hairline

Frontal lighting Not smiling

Page 7: Subspace from Unpaired Data Learning Object ... · representation using adversarial training. NIPS 2016: 5041-5049 5. Xi Chen, Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya

GeneGAN components: Encoder and Decoder Encoder: disentangle the object (smiling) from the background

(face). Object can be abstract. Decoder: inverse of Encoder

A

uAu Au

Encoder Decoder

B

εBε Bε

Encoder Decoder

Have the Object (positive)

Have not the Object (negative)

Background

Object

Page 8: Subspace from Unpaired Data Learning Object ... · representation using adversarial training. NIPS 2016: 5041-5049 5. Xi Chen, Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya

GeneGAN Usage: Object Removal

Background vector "A"

"smiling" Object "u"

Au Aε

Encoder Decoder

Have the Object ("smiling")

εHave not the Object ("non-smiling")

Page 9: Subspace from Unpaired Data Learning Object ... · representation using adversarial training. NIPS 2016: 5041-5049 5. Xi Chen, Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya

GeneGAN Usage: Swapping Objects

A

u

Au Av

Encoder Decoder

v

B

Bv Bu

Encoder Decoder

Page 10: Subspace from Unpaired Data Learning Object ... · representation using adversarial training. NIPS 2016: 5041-5049 5. Xi Chen, Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya

How much smile?

How much smile?

Image A

Image B Reconstructed B

Au

Bε Reconstructed B

Underdetermined CycleGAN pattern

Information Preserving GeneGAN pattern

Smiling from A

Smiling from AAε

Bu

Page 11: Subspace from Unpaired Data Learning Object ... · representation using adversarial training. NIPS 2016: 5041-5049 5. Xi Chen, Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya

Shorten the Cycle to help Training Lift the grandchildren to be children

Au

Au

Bu

A

u

ε

B

Only one "generation", less distortions.

Au

Bε Bε

Au

Two generations

Page 12: Subspace from Unpaired Data Learning Object ... · representation using adversarial training. NIPS 2016: 5041-5049 5. Xi Chen, Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya

Au

Au

Bu

A

u

ε

B

DiscriminatorLoss

DiscriminatorLoss

Reconstruction Loss

Reconstruction Loss

Recombination in Disentangled Feature Space Nulling Loss ||ε||

GeneGAN Training Diagram

Page 13: Subspace from Unpaired Data Learning Object ... · representation using adversarial training. NIPS 2016: 5041-5049 5. Xi Chen, Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya

Mechanism

Constraints Discriminator loss used in Adversarial Training

The background output of encoder will not contain smiling information, as "Bε" is not smiling

"u" contains the smiling information. As "Bu" is smiling. Nulling loss

the object output of encoder will not contain background information, as "ε" can replace it without problem.

Reconstruction loss Decoder and Encoder are inverse to each other "A" contains background information, as Decoder can recreate "Au"

from "A" and "u"

Page 14: Subspace from Unpaired Data Learning Object ... · representation using adversarial training. NIPS 2016: 5041-5049 5. Xi Chen, Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya

Experiments: Diversity from ExemplarsExemplar objects

Novel instancesExemplar backgrounds

The same sunglasses

Page 15: Subspace from Unpaired Data Learning Object ... · representation using adversarial training. NIPS 2016: 5041-5049 5. Xi Chen, Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya

Swapping Attributes: Diversity of Smiles

Can tell a smile by the mouth, and sometimes by eyes.

Page 16: Subspace from Unpaired Data Learning Object ... · representation using adversarial training. NIPS 2016: 5041-5049 5. Xi Chen, Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya

Object Subspace Multidimensional representation of hair

Page 17: Subspace from Unpaired Data Learning Object ... · representation using adversarial training. NIPS 2016: 5041-5049 5. Xi Chen, Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya

Interpolation in Object SubspaceCheck the directions of the hairs.

Bi-linearly interpolated

ε instance

Page 18: Subspace from Unpaired Data Learning Object ... · representation using adversarial training. NIPS 2016: 5041-5049 5. Xi Chen, Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya

Conclusion & Future Work Disentangle the factors in feature space

Feature space = object space + background space

Only require unpaired training data Two unpaired image: positive and negative

Usage cases For single input, can output disentangled object code and background code. For two inputs that both contain objects, can swap the objects in them. The objects can be

null. Can interpolate the objects in feature space.

Futre work Investigate whether more complex crossbreeding patterns between more parents would allow

further improvements

Page 19: Subspace from Unpaired Data Learning Object ... · representation using adversarial training. NIPS 2016: 5041-5049 5. Xi Chen, Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya

References1. Yoshua Bengio, Grégoire Mesnil, Yann Dauphin, and Salah Rifai. Better mixing via deep representations. In Proceedings of the 30th

International Conference on Machine Learning, ICML 2013, Atlanta, GA, USA, 16-21 June 2013, pages 552–560, 2013. URL http://jmlr.org/proceedings/papers/v28/bengio13.html.

2. Scott E. Reed, Yi Zhang, Yuting Zhang, and Honglak Lee. Deep visual analogy- making. In Advances in Neural Information Processing Systems 28: Annual Con- ference on Neural Information Processing Systems 2015, December 7-12, 2015, Mon- treal, Quebec, Canada, pages 1252–1260, 2015. URL http://papers.nips. cc/paper/5845-deep-visual-analogy-making.

3. Paul Upchurch, Jacob R. Gardner, Kavita Bala, Robert Pless, Noah Snavely, and Kil- ian Q. Weinberger. Deep feature interpolation for image content changes. CoRR, abs/1611.05507, 2016. URL http://arxiv.org/abs/1611.05507.

4. Michaël Mathieu, Junbo Jake Zhao, Pablo Sprechmann, Aditya Ramesh, Yann LeCun: Disentangling factors of variation in deep representation using adversarial training. NIPS 2016: 5041-5049

5. Xi Chen, Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel: InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. NIPS 2016: 2172-2180

6. Gül Varol, Javier Romero, Xavier Martin, Naureen Mahmood, Michael J. Black, Ivan Laptev, Cordelia Schmid: Learning from Synthetic Humans. CoRR abs/1701.01370 (2017)

7. Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Unpaired image-to- image translation using cycle-consistent adversarial networks. CoRR, abs/1703.10593, 2017. URL http://arxiv.org/abs/1703.10593.

More in the paper.

Page 20: Subspace from Unpaired Data Learning Object ... · representation using adversarial training. NIPS 2016: 5041-5049 5. Xi Chen, Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya

Backup after this slide

Github: https://github.com/Prinsphield/GeneGAN

Page 21: Subspace from Unpaired Data Learning Object ... · representation using adversarial training. NIPS 2016: 5041-5049 5. Xi Chen, Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya

CycleGAN/DiscoGAN and Object Transfiguration

Pros Learn from Unpaired Data Exploits Cyclic loss to

stabilize training Cons

Backgrounds change when transforming objects

Under-determination problem non-smiling is well

defined. But smiling's have different levels and styles. Cyclic loss

Page 22: Subspace from Unpaired Data Learning Object ... · representation using adversarial training. NIPS 2016: 5041-5049 5. Xi Chen, Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya

Underdetermination Problem

How much smile?

How much smile?

GXY

GXYGYX

GYX

Assumed non-smiling faceParent Image A

Parent Image B Reconstructed Image B

Page 23: Subspace from Unpaired Data Learning Object ... · representation using adversarial training. NIPS 2016: 5041-5049 5. Xi Chen, Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya

Parallelogram loss

Recombination Recombination

Parallelogram LossL = ||xAu + xBε - xBu - xAε ||

Au

Bu

Induces Bu to have sunglasses.