
Deep Style: Using Variational Auto-encoders for Image Generation


Page 1: Deep Style: Using Variational Auto-encoders for Image Generation

Deep Style

TJ Torres Data Scientist, Stitch Fix

PyData NYC 2015

Using Variational Auto-encoders for Image Generation

Page 2: Deep Style: Using Variational Auto-encoders for Image Generation

Data Labs

Page 3: Deep Style: Using Variational Auto-encoders for Image Generation

Data Labs

Page 4: Deep Style: Using Variational Auto-encoders for Image Generation

Data Labs

Page 5: Deep Style: Using Variational Auto-encoders for Image Generation

Data Labs

Page 6: Deep Style: Using Variational Auto-encoders for Image Generation

MOTIVATION

Our goal at Stitch Fix:

[Pipeline diagram: Total Inventory → Recommendation Algo → Filtered Items → Stylists → Final Items Sent (5 items per Fix)]

Page 7: Deep Style: Using Variational Auto-encoders for Image Generation

COLD START PROBLEM

New Clients New Clothing

Page 8: Deep Style: Using Variational Auto-encoders for Image Generation

New Clients New Clothing

1. Get new clothing.

2. Get new clients.

3. ????????

4. PROFIT!!!

COLD START PROBLEM

Page 9: Deep Style: Using Variational Auto-encoders for Image Generation

New Clients New Clothing

1. Get new clothing.

2. Get new clients.

3. ????????

4. PROFIT!!!

Preemptive Modeling

COLD START PROBLEM

Page 10: Deep Style: Using Variational Auto-encoders for Image Generation

TURN TO IMAGES

• Style/fashion is primarily visual.

• We wish to use images for modeling purposes.

• Heuristics for how we process image data are unknown or quite complex.

• We don’t want to have to develop image features.

• Turn to deep learning to learn the feature extraction.

Page 11: Deep Style: Using Variational Auto-encoders for Image Generation

OUTLINE

1. Introduction to NNs
2. Unsupervised Deep Learning
3. Getting started with Chainer
4. Training a simple model

Page 12: Deep Style: Using Variational Auto-encoders for Image Generation

1. Introduction to NNs
2. Unsupervised Deep Learning
3. Getting started with Chainer
4. Training a simple model

OUTLINE

Page 13: Deep Style: Using Variational Auto-encoders for Image Generation

1. Introduction to NNs
2. Unsupervised Deep Learning
3. Getting started with Chainer
4. Training a simple model
5. Open source package!
6. Conclusions/Future (current) Directions

OUTLINE

Page 14: Deep Style: Using Variational Auto-encoders for Image Generation

NEURAL NETWORKS

http://www.wired.com/2013/02/three-awesome-tools-scientists-may-use-to-map-your-brain-in-the-future/

Page 15: Deep Style: Using Variational Auto-encoders for Image Generation

http://googleresearch.blogspot.com/2015/06/inceptionism-going-deeper-into-neural.html

Page 16: Deep Style: Using Variational Auto-encoders for Image Generation

Whoa Dude!

http://googleresearch.blogspot.com/2015/06/inceptionism-going-deeper-into-neural.html

Page 17: Deep Style: Using Variational Auto-encoders for Image Generation

http://arxiv.org/pdf/1502.04623v2.pdf

Page 18: Deep Style: Using Variational Auto-encoders for Image Generation

Begin with input:

[Diagram: input layer, nodes 1–6]

INTRO TO NEURAL NETS

Page 19: Deep Style: Using Variational Auto-encoders for Image Generation

Begin with input:

[Diagram: layer 1 (input, nodes 1–6) → layer 2]

$f^{(l)}_i(x) = \tanh\!\Big(\sum_j W^{(l)}_{ij}\, x^{(l-1)}_j + b^{(l)}\Big)$

INTRO TO NEURAL NETS

Page 20: Deep Style: Using Variational Auto-encoders for Image Generation

Begin with input:

[Diagram: layer 1 (input, nodes 1–6) → layer 2 → layer 3 (output)]

$f^{(l)}_i(x) = \tanh\!\Big(\sum_j W^{(l)}_{ij}\, x^{(l-1)}_j + b^{(l)}\Big)$

Transform data repeatedly with a non-linear function:

$f^{(1)} \circ \cdots \circ f^{(n)}(x)$

INTRO TO NEURAL NETS

Page 21: Deep Style: Using Variational Auto-encoders for Image Generation

Begin with input:

[Diagram: layer 1 (input, nodes 1–6) → layer 2 → layer 3 (output)]

$f^{(l)}_i(x) = \tanh\!\Big(\sum_j W^{(l)}_{ij}\, x^{(l-1)}_j + b^{(l)}\Big)$

Transform data repeatedly with a non-linear function:

$f^{(1)} \circ \cdots \circ f^{(n)}(x)$

Calculate loss function and update weights:

$L(x_{\text{out}}, y) = \overbrace{\tfrac{1}{m}\textstyle\sum_{k=1}^{m}(x_k - y_k)^2}^{\text{MSE}}$

INTRO TO NEURAL NETS

Page 22: Deep Style: Using Variational Auto-encoders for Image Generation

Begin with input:

[Diagram: layer 1 (input, nodes 1–6) → layer 2 → layer 3 (output)]

$f^{(l)}_i(x) = \tanh\!\Big(\sum_j W^{(l)}_{ij}\, x^{(l-1)}_j + b^{(l)}\Big)$

Transform data repeatedly with a non-linear function:

$f^{(1)} \circ \cdots \circ f^{(n)}(x)$

Calculate loss function and update weights:

$L(x_{\text{out}}, y) = \overbrace{\tfrac{1}{m}\textstyle\sum_{k=1}^{m}(x_k - y_k)^2}^{\text{MSE}}$

$W^{(l)*}_{ij} = W^{(l)}_{ij}\Big(1 - \alpha\,\frac{\partial L}{\partial W_{ij}}\Big)$

INTRO TO NEURAL NETS

Page 23: Deep Style: Using Variational Auto-encoders for Image Generation

Begin with input:

[Diagram: layer 1 (input, nodes 1–6) → layer 2 → layer 3 (output)]

$f^{(l)}_i(x) = \tanh\!\Big(\sum_j W^{(l)}_{ij}\, x^{(l-1)}_j + b^{(l)}\Big)$

Transform data repeatedly with a non-linear function:

$f^{(1)} \circ \cdots \circ f^{(n)}(x)$

Calculate loss function and update weights:

$L(x_{\text{out}}, y) = \overbrace{\tfrac{1}{m}\textstyle\sum_{k=1}^{m}(x_k - y_k)^2}^{\text{MSE}}$

$W^{(l)*}_{ij} = W^{(l)}_{ij}\Big(1 - \alpha\,\frac{\partial L}{\partial W_{ij}}\Big)$

$\frac{\partial L}{\partial W^{(l)}_{ij}} = \Big(\frac{\partial L}{\partial x_{\text{out}}}\Big)\Big(\frac{\partial x_{\text{out}}}{\partial f^{(n-1)}}\Big)\cdots\Big(\frac{\partial f^{(l)}}{\partial W^{(l)}_{ij}}\Big)$

INTRO TO NEURAL NETS
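To make the forward pass, MSE loss, and chain-rule update above concrete, here is a minimal numpy sketch (not from the talk): a single hidden tanh layer with made-up sizes and learning rate, updated with the standard additive rule W ← W − α ∂L/∂W.

import numpy as np

rng = np.random.RandomState(0)

# toy sizes (hypothetical): 6 inputs -> 3 hidden -> 2 outputs
x = rng.randn(6)
y = rng.randn(2)                       # target
W1, b1 = rng.randn(3, 6) * 0.1, np.zeros(3)
W2, b2 = rng.randn(2, 3) * 0.1, np.zeros(2)
alpha = 0.1                            # learning rate (assumed)

# forward pass: repeatedly transform with a non-linear function
h = np.tanh(W1 @ x + b1)
out = np.tanh(W2 @ h + b2)
loss = np.mean((out - y) ** 2)         # MSE loss from the slide

# backward pass: chain rule from the output back to each weight
dout = 2.0 * (out - y) / y.size
dz2 = dout * (1.0 - out ** 2)          # through tanh
dW2, db2 = np.outer(dz2, h), dz2
dh = W2.T @ dz2
dz1 = dh * (1.0 - h ** 2)
dW1, db1 = np.outer(dz1, x), dz1

# gradient-descent update of every parameter
for W, dW in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
    W -= alpha * dW

print(loss)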

Page 24: Deep Style: Using Variational Auto-encoders for Image Generation

WHY DEEP LEARNING?

1) With no hidden layers, a NN amounts to just a linear transformation (see the numerical check after this list).

2) Shallow networks approximate PCA.

3) Composing non-linear activation functions adds increasing non-linearity: $f^{(1)} \circ \cdots \circ f^{(n)}(x)$

4) Deep architectures therefore learn more complex/non-linear models.
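A quick numerical check of point 1 (my example, not from the deck): without the non-linearity, two stacked layers collapse into a single linear map, so depth only pays off once activations like tanh are composed.

import numpy as np

rng = np.random.RandomState(1)
x = rng.randn(4)
W1, W2 = rng.randn(5, 4), rng.randn(3, 5)

two_linear_layers = W2 @ (W1 @ x)          # no activation in between
one_linear_layer = (W2 @ W1) @ x           # a single equivalent matrix

print(np.allclose(two_linear_layers, one_linear_layer))  # True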

Page 25: Deep Style: Using Variational Auto-encoders for Image Generation
Page 26: Deep Style: Using Variational Auto-encoders for Image Generation

DL WITH SUPERVISION

Most deep learning methods rely on supervised training data.

MO: Feature Extraction w/ Deep Learning → Final Classification Layer(s)

http://parse.ele.tue.nl/education/cluster2

Page 27: Deep Style: Using Variational Auto-encoders for Image Generation

ISSUES FOR STYLE

PROBLEM: No reliable system of style labels for image data.

Page 28: Deep Style: Using Variational Auto-encoders for Image Generation

Thankfully, we can learn feature representations from unlabeled data.

The key is to compress the data with a nonlinear encoding process.

PROBLEM: No reliable system of style labels for image data.

ISSUES FOR STYLE

Page 29: Deep Style: Using Variational Auto-encoders for Image Generation

UNSUPERVISED DEEP LEARNING

Page 30: Deep Style: Using Variational Auto-encoders for Image Generation

UNSUPERVISED DEEP LEARNING

Page 31: Deep Style: Using Variational Auto-encoders for Image Generation

AUTO-ENCODERS

Two different processes combined into one:

1) Encoding (inferential)
2) Decoding (generative)

Page 32: Deep Style: Using Variational Auto-encoders for Image Generation

[Diagram: Original Image → Encode → Compressed Data → Decode → Reconstructed Image]

Two different processes combined into one:

1) Encoding (inferential)
2) Decoding (generative)

AUTO-ENCODERS

Page 33: Deep Style: Using Variational Auto-encoders for Image Generation

[Diagram: Original Image → Encode → Compressed Data → Decode → Reconstructed Image]

Training:

1) Initialize the layers to random weights.

AUTO-ENCODERS

Page 34: Deep Style: Using Variational Auto-encoders for Image Generation

[Diagram: Original Image → Encode → Compressed Data → Decode → Reconstructed Image]

Training:

1) Initialize the layers to random weights.
2) Full forward pass of a batch through encoding, then decoding of the encoded rep.

AUTO-ENCODERS

Page 35: Deep Style: Using Variational Auto-encoders for Image Generation

[Diagram: Original Image → Encode → Compressed Data → Decode → Reconstructed Image]

Training:

1) Initialize the layers to random weights.
2) Full forward pass of a batch through encoding, then decoding of the encoded rep.
3) Construct the loss via MSE between the original and reconstructed data.

AUTO-ENCODERS

Page 36: Deep Style: Using Variational Auto-encoders for Image Generation

[Diagram: Original Image → Encode → Compressed Data → Decode → Reconstructed Image]

Training:

1) Initialize the layers to random weights.
2) Full forward pass of a batch through encoding, then decoding of the encoded rep.
3) Construct the loss via MSE between the original and reconstructed data.
4) Calculate gradients and backprop through to train new weights.

AUTO-ENCODERS

Page 37: Deep Style: Using Variational Auto-encoders for Image Generation

[Diagram: Original Image → Encode → Compressed Data → Decode → Reconstructed Image]

Training:

1) Initialize the layers to random weights.
2) Full forward pass of a batch through encoding, then decoding of the encoded rep.
3) Construct the loss via MSE between the original and reconstructed data.
4) Calculate gradients and backprop through to train new weights.
5) Iterate.

AUTO-ENCODERS
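A minimal numpy sketch of those five steps (illustrative only; the sizes, data, and linear/tanh architecture are made up, and the talk's actual model below uses Chainer):

import numpy as np

rng = np.random.RandomState(0)

# toy data: 100 samples of 20-dimensional "images" (hypothetical)
X = rng.randn(100, 20)
n_hidden, alpha = 5, 0.01

# 1) initialize encode/decode layers to random weights
W_enc = rng.randn(20, n_hidden) * 0.1
W_dec = rng.randn(n_hidden, 20) * 0.1

for step in range(200):                    # 5) iterate
    # 2) forward pass: encode, then decode the encoded rep
    Z = np.tanh(X @ W_enc)
    X_rec = Z @ W_dec
    # 3) loss is the MSE of original vs. reconstructed data
    loss = np.mean((X_rec - X) ** 2)
    # 4) gradients and weight updates (backprop done by hand here)
    dX_rec = 2.0 * (X_rec - X) / X.size
    dW_dec = Z.T @ dX_rec
    dZ = dX_rec @ W_dec.T
    dW_enc = X.T @ (dZ * (1.0 - Z ** 2))
    W_enc -= alpha * dW_enc
    W_dec -= alpha * dW_dec
    if step % 50 == 0:
        print(step, loss)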

Page 38: Deep Style: Using Variational Auto-encoders for Image Generation

AUTO-ENCODER ISSUES

1) AEs will often overfit unless the amount of training data is large.

2) Gradients diminish quickly, so weight corrections are small “far away” from the output.

Page 39: Deep Style: Using Variational Auto-encoders for Image Generation

SOLUTION

1) AEs will often overfit unless the amount of training data is large.
→ Use a variational component to “regularize” training.

2) Gradients diminish quickly, so weight corrections are small “far away” from the output.
→ *Not covered:* Stack auto-encoders and train greedily (DBNs).

AUTO-ENCODER ISSUES

Page 40: Deep Style: Using Variational Auto-encoders for Image Generation

Passage

MANY deep learning frameworks!!!

Page 41: Deep Style: Using Variational Auto-encoders for Image Generation

Passage

Page 42: Deep Style: Using Variational Auto-encoders for Image Generation

Easy-to-use framework for training neural networks.

BASIC OBJECTS

Variables: wrappers around ndarrays.
Functions: operate on Variable objects.

Operations of Functions on Variables are memorized in sequence.

Backpropagation is done by automatic differentiation, moving backwards through the sequence of operations.

INTRO TO CHAINER

Page 43: Deep Style: Using Variational Auto-encoders for Image Generation

x = np.ones(1)*5
y = np.ones(1)*3
x = chainer.Variable(x)
y = chainer.Variable(y)
z = x**2 + y**2 + 2*y

INTRO TO CHAINER

Page 44: Deep Style: Using Variational Auto-encoders for Image Generation

x = np.ones(1)*5
y = np.ones(1)*3
x = chainer.Variable(x)
y = chainer.Variable(y)
z = x**2 + y**2 + 2*y

INTRO TO CHAINER

Page 45: Deep Style: Using Variational Auto-encoders for Image Generation

x = np.ones(1)*5
y = np.ones(1)*3
x = chainer.Variable(x)
y = chainer.Variable(y)
z = x**2 + y**2 + 2*y

In [3]: z.data
Out[3]: array([ 40.])

INTRO TO CHAINER

Page 46: Deep Style: Using Variational Auto-encoders for Image Generation

x = np.ones(1)*5
y = np.ones(1)*3
x = chainer.Variable(x)
y = chainer.Variable(y)
z = x**2 + y**2 + 2*y

In [3]: z.data
Out[3]: array([ 40.])

INTRO TO CHAINER

Page 47: Deep Style: Using Variational Auto-encoders for Image Generation

x = np.ones(1)*5
y = np.ones(1)*3
x = chainer.Variable(x)
y = chainer.Variable(y)
z = x**2 + y**2 + 2*y

In [3]: z.data
Out[3]: array([ 40.])

# calculate gradients
z.backward()

INTRO TO CHAINER
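Since z = x² + y² + 2y, the gradients that backward() fills in should be dz/dx = 2x = 10 and dz/dy = 2y + 2 = 8. A plain-numpy finite-difference check (my addition, independent of Chainer) confirms the same values:

import numpy as np

def f(x, y):
    return x**2 + y**2 + 2*y

x0, y0, eps = 5.0, 3.0, 1e-6
dz_dx = (f(x0 + eps, y0) - f(x0 - eps, y0)) / (2 * eps)   # ~ 2*x = 10
dz_dy = (f(x0, y0 + eps) - f(x0, y0 - eps)) / (2 * eps)   # ~ 2*y + 2 = 8
print(dz_dx, dz_dy)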

Page 48: Deep Style: Using Variational Auto-encoders for Image Generation

Steps to NN:

1. Define a model using chainer.FunctionSet
   - Contains all parametric functions.
   - A simple way to wrap computational elements into one object.

2. Design and code the forward network pass.

3. Set the optimizer: chainer.optimizers

4. Make a train script which iteratively passes batches forward through the network and updates the weights:

   loss.backward()
   optimizer.update()

INTRO TO CHAINER
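Putting the four steps together, a minimal script might look like the sketch below. It only uses the 2015-era Chainer calls that appear on these slides; the layer sizes and the random stand-in batches are assumptions for illustration, not the talk's model.

import numpy as np
import chainer
import chainer.functions as F
from chainer import optimizers

# 1) model: a FunctionSet holding all parametric functions (sizes made up)
model = chainer.FunctionSet(
    l1=F.Linear(784, 128),
    l2=F.Linear(128, 784),
)

# 3) optimizer
optimizer = optimizers.Adam()
optimizer.setup(model)

# 2) forward pass is plain Python over Variables
def forward(x_data):
    x = chainer.Variable(x_data)
    h = F.relu(model.l1(x))
    y = model.l2(h)
    return F.mean_squared_error(y, x)

# toy stand-in data: 100 random float32 batches (hypothetical)
batches = (np.random.randn(32, 784).astype(np.float32) for _ in range(100))

# 4) iterate: forward, backward, update
for batch in batches:
    optimizer.zero_grads()
    loss = forward(batch)
    loss.backward()
    optimizer.update()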

Page 49: Deep Style: Using Variational Auto-encoders for Image Generation

ADVANTAGES

1. Forward passes through networks are intuitive and easily debugged.

2. Can use arbitrary control flow statements.

3. Backpropagation is easily implemented through backwards traversal of the computational graph.

4. High level of readability.

INTRO TO CHAINER

Page 50: Deep Style: Using Variational Auto-encoders for Image Generation

BUILDING A SIMPLE AUTO-ENCODER

Page 51: Deep Style: Using Variational Auto-encoders for Image Generation

MODEL SETUP

# layer setup
layers = {}

# encoding layers
layers['encode0'] = F.Linear(img_size, n0)
layers['encode1'] = F.Linear(n0, 2*encoding_size)

# decoding layers
layers['decode0'] = F.Linear(encoding_size, n0)
layers['decode1'] = F.Linear(n0, img_size)

# model setup
model = chainer.FunctionSet(**layers)
optimizer = optimizers.Adam()
optimizer.setup(model)
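The snippet assumes img_size, n0, and encoding_size are defined elsewhere. As a hedged example, for the 100x200 RGB images discussed later they might be set roughly like this (the hidden width and latent size are guesses, not values from the talk):

img_size = 100 * 200 * 3     # flattened RGB image, as on the later slides
n0 = 4000                    # intermediate "step down" layer width (assumed)
encoding_size = 100          # latent dimension (assumed)

Note that encode1 outputs 2*encoding_size values because the latent vector is later split into a mean half and a (log-)variance half.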

Page 52: Deep Style: Using Variational Auto-encoders for Image Generation

ENCODING

# Encoder
input = chainer.Variable(input)

Page 53: Deep Style: Using Variational Auto-encoders for Image Generation

# Encoder
input = chainer.Variable(input)
input = F.relu(model.encode0(input))

ENCODING

Page 54: Deep Style: Using Variational Auto-encoders for Image Generation

# Encoder
input = chainer.Variable(input)
input = F.relu(model.encode0(input))
latent = F.relu(model.encode1(input))

ENCODING

Page 55: Deep Style: Using Variational Auto-encoders for Image Generation

VARIATIONAL STEP

Sample from the distribution $q_\phi(z) = \mathcal{N}\big(z; \mu^{(i)}, \sigma^{2(i)}\,I\big)$:

# Variational layer
mean, std = F.split_axis(latent, 2, 1)
noise = np.random.standard_normal(mean.data.shape)

Page 56: Deep Style: Using Variational Auto-encoders for Image Generation

VARIATIONAL STEP

# Variational layer
mean, std = F.split_axis(latent, 2, 1)
noise = np.random.standard_normal(mean.data.shape)
sampled = noise * F.exp(0.5 * std) + mean
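This is the reparameterization trick: z is written as a deterministic function of the encoder outputs plus independent noise, so gradients can flow back through mean and std even though a sample was drawn. Note that the variable the slide calls std evidently holds the log-variance, which is what both F.exp(0.5 * std) and the later F.gaussian_kl_divergence(mean, std) expect. A standalone numpy sketch of the same step (toy values, my example):

import numpy as np

rng = np.random.RandomState(0)

# toy values; in the model these come from splitting the encoder output
mean = np.array([0.5, -1.0])
ln_var = np.array([0.2, -0.4])                  # log-variance

noise = rng.standard_normal(mean.shape)
sampled = noise * np.exp(0.5 * ln_var) + mean   # z = mu + sigma * eps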

Page 57: Deep Style: Using Variational Auto-encoders for Image Generation

DECODING

# Decoder
output = F.relu(model.decode0(sampled))

Page 58: Deep Style: Using Variational Auto-encoders for Image Generation

DECODING

# Decoder
output = F.relu(model.decode0(sampled))
reconstruction = F.sigmoid(model.decode1(output))

Page 59: Deep Style: Using Variational Auto-encoders for Image Generation

UPDATE

# Reconstruction loss is just MSE
loss = F.mean_squared_error(reconstruction, input)

# "Regularize" the latent vector
loss += F.gaussian_kl_divergence(mean, std)

$L(x) = D_{KL}\big(q_\phi(z)\,\|\,\mathcal{N}(0, I)\big) + \mathrm{MSE}(x, y_{\text{out}})$

Page 60: Deep Style: Using Variational Auto-encoders for Image Generation

UPDATE

# Reconstruction loss is just MSE
loss = F.mean_squared_error(reconstruction, input)

# "Regularize" the latent vector
loss += F.gaussian_kl_divergence(mean, std)

# backprop
optimizer.zero_grads()
loss.backward()
optimizer.update()
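For reference, the encode / sample / decode / update snippets above can be consolidated into a single training step. This is just a sketch stitching the slide code together around the FunctionSet from the model-setup slide; it is not the packaged implementation (that lives in fauxtograph), and the batch is assumed to be a float32 array of flattened images.

import numpy as np
import chainer
import chainer.functions as F

def train_step(model, optimizer, batch):
    # one VAE update, using only calls shown on the slides
    x = chainer.Variable(batch)

    # encode
    h = F.relu(model.encode0(x))
    latent = F.relu(model.encode1(h))

    # variational layer: split into mean / log-variance and sample
    mean, std = F.split_axis(latent, 2, 1)
    noise = np.random.standard_normal(mean.data.shape)  # may need a float32 cast in practice
    z = noise * F.exp(0.5 * std) + mean

    # decode
    h_dec = F.relu(model.decode0(z))
    reconstruction = F.sigmoid(model.decode1(h_dec))

    # reconstruction MSE + KL regularizer, then backprop
    loss = F.mean_squared_error(reconstruction, x)
    loss += F.gaussian_kl_divergence(mean, std)
    optimizer.zero_grads()
    loss.backward()
    optimizer.update()
    return float(loss.data)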

Page 61: Deep Style: Using Variational Auto-encoders for Image Generation

AFTER TRAINING

Page 62: Deep Style: Using Variational Auto-encoders for Image Generation

RESULTS

Still testing the efficacy of modeling style with the encoded space.

Normally, the generative portion would be thrown out after training, but here we can use it to look at our style space.

Page 63: Deep Style: Using Variational Auto-encoders for Image Generation
Page 64: Deep Style: Using Variational Auto-encoders for Image Generation

TRY IT YOURSELF

https://github.com/stitchfix/fauxtograph

Page 65: Deep Style: Using Variational Auto-encoders for Image Generation

COMMAND LINE TOOL

$ pip install fauxtograph

$ fauxtograph download images/

$ fauxtograph train images/ models/model_out

$ fauxtograph generate models/model_out generated_images/

Page 66: Deep Style: Using Variational Auto-encoders for Image Generation
Page 67: Deep Style: Using Variational Auto-encoders for Image Generation

source: @genekogan

Page 68: Deep Style: Using Variational Auto-encoders for Image Generation

FUTURE DIRECTIONS

Issues with scaling to high resolution.

Page 69: Deep Style: Using Variational Auto-encoders for Image Generation

For a 100x200 RGB image:
100 x 200 x 3 = 60,000-node input layer
60,000 x 4,000 (step-down layer) = 240M weights
240M x 32 bits ≈ 960 MB

FUTURE DIRECTIONS

Issues with scaling to high resolution.
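The arithmetic checks out directly (pure Python, sizes from the slide):

input_nodes = 100 * 200 * 3              # 60,000
weights = input_nodes * 4000             # 240,000,000 weights into the step-down layer
megabytes = weights * 4 / 1e6            # 32-bit floats -> 960 MB
print(input_nodes, weights, megabytes)   # 60000 240000000 960.0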

Page 70: Deep Style: Using Variational Auto-encoders for Image Generation

Add Convolution Layers:

1) Reduce # of parameters.

2) Add translation robustness.

3) Hierarchical feature structure.

FUTURE DIRECTIONS

For a 100x200 RGB image:
100 x 200 x 3 = 60,000-node input layer
60,000 x 4,000 (step-down layer) = 240M weights
240M x 32 bits ≈ 960 MB

Issues with scaling to high resolution.

Page 71: Deep Style: Using Variational Auto-encoders for Image Generation

Add Convolution Layers:

1) Reduce # of parameters.

2) Add translation robustness.

3) Hierarchical feature structure.

FUTURE DIRECTIONS

For a 100x200 RGB image:
100 x 200 x 3 = 60,000-node input layer
60,000 x 4,000 (step-down layer) = 240M weights
240M x 32 bits ≈ 960 MB

Issues with scaling to high resolution.

COMING SOON
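To make point 1 concrete, compare the dense layer counted above with a hypothetical convolutional layer (my example, not a spec of the coming release): a bank of small kernels has a parameter count independent of image size.

# dense: every input value connects to every unit of a 4,000-wide layer
dense_params = (100 * 200 * 3) * 4000            # 240,000,000

# conv (assumed example): 64 filters of size 3x3 over 3 input channels
conv_params = 64 * 3 * 3 * 3 + 64                # weights + biases = 1,792
print(dense_params, conv_params)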

Page 72: Deep Style: Using Variational Auto-encoders for Image Generation

CONCLUSIONS

1) A style feature space would help resolve the cold-start problem for both clients and items.

2) Auto-encoders are useful for deducing a feature space in an unsupervised way.

3) Turn to VAEs for a drag-and-drop way to prevent overfitting.

4) Convolution is on its way. You can check out the branch: convolutional-vae

Page 73: Deep Style: Using Variational Auto-encoders for Image Generation

QUESTIONS?

Original VAE Paper: http://arxiv.org/abs/1312.6114

Blog Post: http://multithreaded.stitchfix.com/blog/2015/09/17/deep-style/

Page 74: Deep Style: Using Variational Auto-encoders for Image Generation

APPENDIX: VARIATIONAL INFERENCE

Want to solve for the posterior:

$p_\theta(z|x) = \dfrac{p_\theta(x|z)\, p_\theta(z)}{p_\theta(x)}$

But the posterior can be intractable to calculate efficiently.

Approximate:

$p_\theta(z|x) \approx q_\phi(z)$

Minimize the KL divergence:

$D_{KL}\big(q_\phi(z)\,\|\,p_\theta(z|x)\big) = \int dz\; q_\phi(z)\, \ln\!\left(\frac{q_\phi(z)}{p_\theta(z|x)}\right)$
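The reason minimizing this (itself intractable) KL term is workable is the standard identity, not spelled out on the slide, that ties it to the evidence lower bound:

$\log p_\theta(x) = D_{KL}\big(q_\phi(z)\,\|\,p_\theta(z|x)\big) + \underbrace{\mathbb{E}_{q_\phi(z)}\big[\log p_\theta(x|z)\big] - D_{KL}\big(q_\phi(z)\,\|\,p_\theta(z)\big)}_{\text{ELBO}}$

Since $\log p_\theta(x)$ does not depend on $q_\phi$, minimizing the KL to the true posterior is equivalent to maximizing the ELBO, which involves only the prior and the decoder likelihood.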

Page 75: Deep Style: Using Variational Auto-encoders for Image Generation

APPENDIX: VARIATIONAL AUTO-ENCODER

Auto-encoder learns/infers in the Bayesian sense too.

Learning the encoding is equivalent to maximizing the likelihood:

$\arg\max_z\, p_\theta(x|z)$

And generating the decoding by maximizing the posterior:

$\arg\max_x\, p_\theta(z|x)$

Apply variational inference at the decoding step to calculate posterior.

Page 76: Deep Style: Using Variational Auto-encoders for Image Generation

Auto-encoder now models distributions for latent space.

If we guess a normal form for our “variational distribution” …

APPENDIX: VARIATIONAL AUTO-ENCODER

Page 77: Deep Style: Using Variational Auto-encoders for Image Generation

$D_{KL}\big(q_\phi(z)\,\|\,p_\theta(z|x)\big) = \log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 - \sigma_2^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2}$

Auto-encoder now models distributions for latent space.

If we guess a normal form for our “variational distribution” …

APPENDIX: VARIATIONAL AUTO-ENCODER

Page 78: Deep Style: Using Variational Auto-encoders for Image Generation

$D_{KL}\big(q_\phi(z)\,\|\,p_\theta(z|x)\big) = \log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 - \sigma_2^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2}$

(the $(\mu_1 - \mu_2)^2$ term acts like an L2 loss)

Auto-encoder now models distributions for latent space.

If we guess a normal form for our “variational distribution” …

APPENDIX: VARIATIONAL AUTO-ENCODER

Page 79: Deep Style: Using Variational Auto-encoders for Image Generation

$D_{KL}\big(q_\phi(z)\,\|\,p_\theta(z|x)\big) = \log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 - \sigma_2^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2}$

(the $(\mu_1 - \mu_2)^2$ term acts like an L2 loss)

$= \sum_i \Big(\frac{1}{2}\big[\sigma_i^2 + \mu_i^2 - 1\big] - \log\sigma_i\Big)$

Auto-encoder now models distributions for latent space.

If we guess a normal form for our “variational distribution” …

APPENDIX: VARIATIONAL AUTO-ENCODER

Page 80: Deep Style: Using Variational Auto-encoders for Image Generation

$D_{KL}\big(q_\phi(z)\,\|\,p_\theta(z|x)\big) = \log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 - \sigma_2^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2}$

(the $(\mu_1 - \mu_2)^2$ term acts like an L2 loss)

$= \sum_i \Big(\frac{1}{2}\big[\sigma_i^2 + \mu_i^2 - 1\big] - \log\sigma_i\Big)$

Drop in this loss term to regularize the latent space!

Auto-encoder now models distributions for latent space.

If we guess a normal form for our “variational distribution” …

APPENDIX: VARIATIONAL AUTO-ENCODER
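As a sanity check on the closed form (my addition, not part of the original deck), the per-dimension expression ½(σ² + µ² − 1) − log σ can be compared against a Monte Carlo estimate of KL(N(µ, σ²) || N(0, 1)):

import numpy as np

rng = np.random.RandomState(0)
mu, sigma = 0.7, 1.3

# analytic per-dimension KL, as on the slide
analytic = 0.5 * (sigma**2 + mu**2 - 1.0) - np.log(sigma)

# Monte Carlo estimate of E_q[log q(z) - log p(z)], q = N(mu, sigma^2), p = N(0, 1)
z = rng.normal(mu, sigma, size=1_000_000)
log_q = -0.5 * ((z - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)
log_p = -0.5 * z ** 2 - 0.5 * np.log(2 * np.pi)
monte_carlo = np.mean(log_q - log_p)

print(analytic, monte_carlo)   # the two should agree closely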