106
@graphific Roelof Pieters Mul--Modal Embeddings: from Discrimina-ve to Genera-ve Models and Crea-ve AI 2 May 2016 KTH www.csc.kth.se/~roelof/ [email protected]

Multi-modal embeddings: from discriminative to generative models and creative ai

Embed Size (px)

Citation preview

Page 1: Multi-modal embeddings: from discriminative to generative models and creative ai

@graphificRoelof Pieters

Mul--ModalEmbeddings:fromDiscrimina-vetoGenera-veModelsand

Crea-veAI2May2016

KTH

www.csc.kth.se/~roelof/ [email protected]

Page 2: Multi-modal embeddings: from discriminative to generative models and creative ai

Modalities

2

Page 3: Multi-modal embeddings: from discriminative to generative models and creative ai

• AE/VAE

• DBN

• Latent Vectors/Manifold Walking

• RNN / LSTM /GRU

• Modded RNNs (ie Biaxial RNN)

• CNN + LSTM/GRU

• X + Mixture density network (MDN)

Common Generative Architectures

• DRAW• GRAN• DCGAN

• DeepDream & other CNN visualisations

• Splitting/Remixing NNs:• Image Analogy• Style Transfer • Semantic Style Transfer • Texture Synthesis

• Compositional Pattern-Producing Networks (CPPN)

• NEAT• CPPN w/ GAN+VAE.

Page 4: Multi-modal embeddings: from discriminative to generative models and creative ai

• AE/VAE

• DBN

• Latent Vectors/Manifold Walking

• RNN / LSTM /GRU

• Modded RNNs (ie Biaxial RNN)

• CNN + LSTM/GRU

• X + Mixture density network (MDN)

Common Generative Architectures

• DRAW• GRAN• DCGAN

• DeepDream & other CNN visualisations

• Splitting/Remixing NNs:• Image Analogy• Style Transfer • Semantic Style Transfer • Texture Synthesis

• Compositional Pattern-Producing Networks (CPPN)

• NEAT• CPPN w/ GAN+VAE.

what we’ll cover

Page 5: Multi-modal embeddings: from discriminative to generative models and creative ai

Text Generation

5

Page 6: Multi-modal embeddings: from discriminative to generative models and creative ai

Alex Graves (2014) Generating Sequences WithRecurrent Neural Networks

Wanna Play ?

Text Prediction (1)

Page 7: Multi-modal embeddings: from discriminative to generative models and creative ai

Alex Graves (2014) Generating Sequences WithRecurrent Neural Networks

Wanna Play ?

Text Prediction (1)

Page 8: Multi-modal embeddings: from discriminative to generative models and creative ai

Alex Graves (2014) Generating Sequences WithRecurrent Neural Networks

Wanna Play ?

Text Prediction (1)

Page 9: Multi-modal embeddings: from discriminative to generative models and creative ai

Alex Graves (2014) Generating Sequences WithRecurrent Neural Networks

Wanna Play ?

Text Prediction (2)

Page 10: Multi-modal embeddings: from discriminative to generative models and creative ai

Alex Graves (2014) Generating Sequences WithRecurrent Neural Networks

Wanna Play ?

Text Prediction (2)

Page 11: Multi-modal embeddings: from discriminative to generative models and creative ai

Alex Graves (2014) Generating Sequences WithRecurrent Neural Networks

Page 12: Multi-modal embeddings: from discriminative to generative models and creative ai

Alex Graves (2014) Generating Sequences WithRecurrent Neural Networks

Wanna Play ?

Handwriting Prediction

Page 13: Multi-modal embeddings: from discriminative to generative models and creative ai

Alex Graves (2014) Generating Sequences WithRecurrent Neural Networks

Wanna Play ?

Handwriting Prediction(we’re skipping the density mixture network details for now)

Page 14: Multi-modal embeddings: from discriminative to generative models and creative ai
Page 15: Multi-modal embeddings: from discriminative to generative models and creative ai

Wanna Play ?

Text generation

15

Karpathy (2015), The Unreasonable Effectiveness of Recurrent Neural Networks (blog)

Page 16: Multi-modal embeddings: from discriminative to generative models and creative ai

Wanna Play ?

Text generation

16

Karpathy (2015), The Unreasonable Effectiveness of Recurrent Neural Networks (blog)

Page 17: Multi-modal embeddings: from discriminative to generative models and creative ai
Page 18: Multi-modal embeddings: from discriminative to generative models and creative ai
Page 19: Multi-modal embeddings: from discriminative to generative models and creative ai

Karpathy (2015), The Unreasonable Effectiveness of Recurrent Neural Networks (blog)

Page 20: Multi-modal embeddings: from discriminative to generative models and creative ai

Andrej Karpathy, Justin Johnson, Li Fei-Fei (2015) Visualizing and Understanding Recurrent Networks

Page 21: Multi-modal embeddings: from discriminative to generative models and creative ai

Karpathy (2015), The Unreasonable Effectiveness

of Recurrent Neural Networks (blog)

Page 22: Multi-modal embeddings: from discriminative to generative models and creative ai

http://www.creativeai.net/posts/aeh3orR8g6k65Cy9M/generating-magic-cards-using-deep-recurrent-convolutional

Page 23: Multi-modal embeddings: from discriminative to generative models and creative ai

More…

more at:http://gitxiv.com/category/natural-language-

processing-nlp http://www.creativeai.net/?cat%5B0%5D=read-

write

Page 24: Multi-modal embeddings: from discriminative to generative models and creative ai

Image Generation

24

Page 25: Multi-modal embeddings: from discriminative to generative models and creative ai

Turn Convnet Around: “Deep Dream”

Image -> NN -> What do you (think) you see -> Whats the (text) label

Image -> NN -> What do you (think) you see -> feed back activations ->

optimize image to “fit” to the ConvNets “hallucination” (iteratively)

Page 26: Multi-modal embeddings: from discriminative to generative models and creative ai

Google, Inceptionism: Going Deeper into Neural Networks

Turn Convnet Around: “Deep Dream”

see also: www.csc.kth.se/~roelof/deepdream/

Page 27: Multi-modal embeddings: from discriminative to generative models and creative ai

Turn Convnet Around: “Deep Dream”

see also: www.csc.kth.se/~roelof/deepdream/ Google, Inceptionism: Going Deeper into Neural Networks

Page 29: Multi-modal embeddings: from discriminative to generative models and creative ai

https://www.flickr.com/photos/graphific/albums/72157657250972188

Single Units

Roelof Pieters 2015

Page 30: Multi-modal embeddings: from discriminative to generative models and creative ai

Multifaceted Feature Visualization

Anh Nguyen, Jason Yosinski, Jeff Clune (2016) Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks

Page 31: Multi-modal embeddings: from discriminative to generative models and creative ai

Multifaceted Feature Visualization

Page 32: Multi-modal embeddings: from discriminative to generative models and creative ai

Multifaceted Feature Visualization

Page 33: Multi-modal embeddings: from discriminative to generative models and creative ai

Preferred stimuli generation

Anh Nguyen, Alexey Dosovitskiy, Jason Yosinski, Thomas Brox, and Jeff Clune (2016) AI Neuroscience: Understanding Deep Neural Networks by Synthetically Generating the Preferred Stimuli for Each of Their Neurons

Page 34: Multi-modal embeddings: from discriminative to generative models and creative ai
Page 35: Multi-modal embeddings: from discriminative to generative models and creative ai

Inter-modal: Style Transfer (“Style Net” 2015)

Leon A. Gatys, Alexander S. Ecker, Matthias Bethge , 2015. A Neural Algorithm of Artistic Style (GitXiv)

Page 36: Multi-modal embeddings: from discriminative to generative models and creative ai
Page 37: Multi-modal embeddings: from discriminative to generative models and creative ai

Inter-modal: Image Analogies (2001)

A. Hertzmann, C. Jacobs, N. Oliver, B. Curless, D. Salesin.(2001) Image Analogies, SIGGRAPH 2001 Conference Proceedings.

A. Hertzmann (2001) Algorithms for Rendering in Artistic StylesPh.D thesis. New York University. May, 2001.

Page 38: Multi-modal embeddings: from discriminative to generative models and creative ai

Inter-modal: Image Analogies (2001)

Page 39: Multi-modal embeddings: from discriminative to generative models and creative ai

Inter-modal: Style Transfer (“Style Net” 2015)

Leon A. Gatys, Alexander S. Ecker, Matthias Bethge , 2015. A Neural Algorithm of Artistic Style (GitXiv)

Page 40: Multi-modal embeddings: from discriminative to generative models and creative ai

style layers: 3_1,4_1,5_1

Page 41: Multi-modal embeddings: from discriminative to generative models and creative ai

style layers: 3_2

Page 42: Multi-modal embeddings: from discriminative to generative models and creative ai

style layers: 5_1

Page 43: Multi-modal embeddings: from discriminative to generative models and creative ai

style layers: 5_2

Page 44: Multi-modal embeddings: from discriminative to generative models and creative ai

style layers: 5_3

Page 45: Multi-modal embeddings: from discriminative to generative models and creative ai

style layers: 5_4

Page 46: Multi-modal embeddings: from discriminative to generative models and creative ai

style layers: 5_1 + 5_2 + 5_3 + 5_4

Page 47: Multi-modal embeddings: from discriminative to generative models and creative ai

Gene Kogan, 2015. Why is a Raven Like a Writing Desk? (vimeo)

Page 48: Multi-modal embeddings: from discriminative to generative models and creative ai

Inter-modal: Style Transfer+MRF (2016)

Combining Markov Random Fields and Convolutional Neural Networks for Image Synthesis, 2016, Chuan Li, Michael Wand

Page 49: Multi-modal embeddings: from discriminative to generative models and creative ai

Inter-modal: Style Transfer+MRF (2016)

Combining Markov Random Fields and Convolutional Neural Networks for Image Synthesis, 2016, Chuan Li, Michael Wand

Page 50: Multi-modal embeddings: from discriminative to generative models and creative ai

Inter-modal: Pretrained Style Transfer (2016)

Texture Networks: Feed-forward Synthesis of Textures and Stylized Images, 2016, Dmitry Ulyanov, Vadim Lebedev, Andrea Vedaldi, Victor Lempitsky

500x speedup! (avg min loss from 10s to 20ms)

Page 51: Multi-modal embeddings: from discriminative to generative models and creative ai

Inter-modal: Pretrained Style Transfer #2 (2016)

Precomputed Real-Time Texture Synthesis with Markovian Generative Adversarial Networks, Chuan Li, Michael Wand, 2016

similarly 500x speedup

Page 52: Multi-modal embeddings: from discriminative to generative models and creative ai

Inter-modal: Perceptual Loss ST (2016)

Perceptual Losses for Real-Time Style Transfer and Super-Resolution, 2016, Justin Johnson, Alexandre Alahi, Li Fei-Fei

Page 53: Multi-modal embeddings: from discriminative to generative models and creative ai

Perceptual Losses for Real-Time Style Transfer and Super-Resolution, 2016, Justin Johnson, Alexandre Alahi, Li Fei-Fei

Page 54: Multi-modal embeddings: from discriminative to generative models and creative ai

54

Inter-modal: Style Transfer (“Style Net” 2015)

@DeepForger

Page 55: Multi-modal embeddings: from discriminative to generative models and creative ai

55

Inter-modal: Style Transfer (“Style Net” 2015)

@DeepForger

Page 56: Multi-modal embeddings: from discriminative to generative models and creative ai

+

+

=

Inter-modal: Semantic Style Transfer

Page 57: Multi-modal embeddings: from discriminative to generative models and creative ai

https://github.com/alexjc/neural-doodle

Semantic Style Transfer (“Neural Doodle”)

Page 58: Multi-modal embeddings: from discriminative to generative models and creative ai

Synthesize textures

Page 59: Multi-modal embeddings: from discriminative to generative models and creative ai

Synthesise textures (random weights)

- which activation function should i use?- pooling?- min nr of units?Experiment:

https://nucl.ai/blog/extreme-style-machines/

Random Weights (kinda like “Extreme Learning Machines”)

Page 60: Multi-modal embeddings: from discriminative to generative models and creative ai

Synthesise textures (random weights)totally random initialised weights:

https://nucl.ai/blog/extreme-style-machines/

Page 61: Multi-modal embeddings: from discriminative to generative models and creative ai

Synthesise textures (randim weights)

https://nucl.ai/blog/extreme-style-machines/

Activation Functions

Page 62: Multi-modal embeddings: from discriminative to generative models and creative ai

Synthesise textures (randim weights)

https://nucl.ai/blog/extreme-style-machines/

Activation Functions

Page 63: Multi-modal embeddings: from discriminative to generative models and creative ai

Synthesise textures (randim weights)

https://nucl.ai/blog/extreme-style-machines/

Activation Functions

Page 64: Multi-modal embeddings: from discriminative to generative models and creative ai

Synthesise textures (randim weights)

https://nucl.ai/blog/extreme-style-machines/

Down-sampling

Page 65: Multi-modal embeddings: from discriminative to generative models and creative ai

Synthesise textures (randim weights)

https://nucl.ai/blog/extreme-style-machines/

Down-sampling

Page 66: Multi-modal embeddings: from discriminative to generative models and creative ai

Synthesise textures (randim weights)

https://nucl.ai/blog/extreme-style-machines/

Down-sampling

Page 67: Multi-modal embeddings: from discriminative to generative models and creative ai

Synthesise textures (randim weights)

https://nucl.ai/blog/extreme-style-machines/

Nr of units

Page 68: Multi-modal embeddings: from discriminative to generative models and creative ai

Synthesise textures (randim weights)

https://nucl.ai/blog/extreme-style-machines/

Nr of units

Page 69: Multi-modal embeddings: from discriminative to generative models and creative ai

Synthesise textures (randim weights)

https://nucl.ai/blog/extreme-style-machines/

Nr of units

Page 70: Multi-modal embeddings: from discriminative to generative models and creative ai

• Image Analogies, 2001, A. Hertzmann, C. Jacobs, N. Oliver, B. Curless, D. Sales

• A Neural Algorithm of Artistic Style, 2015. Leon A. Gatys, Alexander S. Ecker, Matthias Bethge

• Combining Markov Random Fields and Convolutional Neural Networks for Image Synthesis, 2016, Chuan Li, Michael Wand

• Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artworks, 2016, Alex J. Champandard

• Texture Networks: Feed-forward Synthesis of Textures and Stylized Images, 2016, Dmitry Ulyanov, Vadim Lebedev, Andrea Vedaldi, Victor Lempitsky

• Perceptual Losses for Real-Time Style Transfer and Super-Resolution, 2016, Justin Johnson, Alexandre Alahi, Li Fei-Fei

• Precomputed Real-Time Texture Synthesis with Markovian Generative Adversarial Networks, 2016, Chuan Li, Michael Wand

• @DeepForger

70

“Style Transfer” papers

Page 71: Multi-modal embeddings: from discriminative to generative models and creative ai

“A stop sign is flying in blue skies.”

“A herd of elephants flying in the blue skies.”

Elman Mansimov, Emilio Parisotto, Jimmy Lei Ba, Ruslan Salakhutdinov, 2015. Generating Images from Captions with Attention (arxiv) (examples)

Caption -> Image generation

Page 72: Multi-modal embeddings: from discriminative to generative models and creative ai

Image Colorisation

Page 73: Multi-modal embeddings: from discriminative to generative models and creative ai

More…

more at:http://gitxiv.com/category/computer-vision http://www.creativeai.net/ (no category for

images just yet)

Page 74: Multi-modal embeddings: from discriminative to generative models and creative ai

Audio Generation

74

Page 75: Multi-modal embeddings: from discriminative to generative models and creative ai

Early LSTM music composition (2002)

Douglas Eck and Jurgen Schmidhuber (2002) Learning The Long-Term Structure of the Blues?

Page 76: Multi-modal embeddings: from discriminative to generative models and creative ai

Markov constraints

http://www.flow-machines.com/

Page 77: Multi-modal embeddings: from discriminative to generative models and creative ai

Markov constraints

http://www.flow-machines.com/

Page 78: Multi-modal embeddings: from discriminative to generative models and creative ai

78

Audio Generation: Midi

https://soundcloud.com/graphific/pyotr-lstm-tchaikovsky

A Recurrent Latent Variable Model for Sequential Data, 2016,

J. Chung, K. Kastner, L. Dinh, K. Goel, A. Courville, Y. Bengio

+ “modded VRNN:

Page 79: Multi-modal embeddings: from discriminative to generative models and creative ai

79

Audio Generation: Midi

https://soundcloud.com/graphific/neural-remix-net

A Recurrent Latent Variable Model for Sequential Data, 2016,

J. Chung, K. Kastner, L. Dinh, K. Goel, A. Courville, Y. Bengio

+ “modded VRNN:

Page 80: Multi-modal embeddings: from discriminative to generative models and creative ai

80

Audio Generation: Raw

Gated Recurrent Unit (GRU)stanford cs224d projectAran Nayebi, Matt Vitelli (2015) GRUV: Algorithmic Music Generation using Recurrent Neural Networks

Page 81: Multi-modal embeddings: from discriminative to generative models and creative ai

• LSTM improvements

• Recurrent Batch Normalization http://gitxiv.com/posts/MwSDm6A4wPG7TcuPZ/recurrent-batch-normalization

• also: hidden-to-hidden transition (earlier only input-to-hidden transformation of RNNs)

• faster convergence and improved generalization.

LSTM improvements

Page 82: Multi-modal embeddings: from discriminative to generative models and creative ai

• LSTM improvements

• weight normalisation http://gitxiv.com/posts/p9B6i9Kzbkc5rP3cp/weight-normalization-a-simple-reparameterization-to

LSTM improvements

Page 83: Multi-modal embeddings: from discriminative to generative models and creative ai

• LSTM improvements

• Associative Long Short-Term Memory http://gitxiv.com/posts/jpfdiFPsu5c6LLsF4/associative-long-short-term-memory

LSTM improvements

Page 84: Multi-modal embeddings: from discriminative to generative models and creative ai

• LSTM improvements

• Bayesian RNN dropout http://gitxiv.com/posts/CsCDjy7WpfcBvZ88R/bayesianrnn

LSTM improvements

Page 85: Multi-modal embeddings: from discriminative to generative models and creative ai

Chor-RNN

Continuous Generation

Luka Crnkovic-Friis & Louise Crnkovic-Friis (2016) Generative Choreography using Deep Learning

Page 86: Multi-modal embeddings: from discriminative to generative models and creative ai

• Mixture Density LSTM

Generative

1

2

3

4

5

6

Page 87: Multi-modal embeddings: from discriminative to generative models and creative ai

• Mixture Density LSTM

Generative

1

2

3

4

5

6

Page 88: Multi-modal embeddings: from discriminative to generative models and creative ai

• Mixture Density LSTM

Generative

1

2

3

4

5

6

Page 89: Multi-modal embeddings: from discriminative to generative models and creative ai

• Mixture Density LSTM

Generative

1

2

3

4

5

6

Page 90: Multi-modal embeddings: from discriminative to generative models and creative ai

• Mixture Density LSTM

Generative

1

2

3

4

5

6

Page 91: Multi-modal embeddings: from discriminative to generative models and creative ai

Wanna be Doing Deep Learning?

Page 92: Multi-modal embeddings: from discriminative to generative models and creative ai

python has a wide range of deep learning-related libraries available

Deep Learning with Python

Low level

High level

deeplearning.net/software/theano

caffe.berkeleyvision.org

tensorflow.org/

lasagne.readthedocs.org/en/latest

and of course:

keras.io

Page 93: Multi-modal embeddings: from discriminative to generative models and creative ai

Code & Papers?http://gitxiv.com/ #GitXiv

Page 94: Multi-modal embeddings: from discriminative to generative models and creative ai

Creative AI projects?http://www.creativeai.net/ #Crea-veAI

Page 95: Multi-modal embeddings: from discriminative to generative models and creative ai

Questions?love letters? existential dilemma’s? academic questions? gifts?

find me at:www.csc.kth.se/~roelof/

[email protected]

@graphific

Oh, and soon we’re looking for Creative AI enthusiasts !

- job- internship- thesis work

inAI (Deep Learning)

&Creativity

Page 96: Multi-modal embeddings: from discriminative to generative models and creative ai

https://medium.com/@ArtificialExperience/creativeai-9d4b2346faf3

Page 97: Multi-modal embeddings: from discriminative to generative models and creative ai

Creative AI > a “brush” > rapid experimentation

human-machine collaboration

Page 98: Multi-modal embeddings: from discriminative to generative models and creative ai

Creative AI > a “brush” > rapid experimentation

(YouTube, Paper)

Page 99: Multi-modal embeddings: from discriminative to generative models and creative ai

Creative AI > a “brush” > rapid experimentation

(YouTube, Paper)

Page 100: Multi-modal embeddings: from discriminative to generative models and creative ai

Creative AI > a “brush” > rapid experimentation

(Vimeo, Paper)

Page 101: Multi-modal embeddings: from discriminative to generative models and creative ai

101

Generative Adverserial Nets

Emily Denton, Soumith Chintala, Arthur Szlam, Rob Fergus, 2015. Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks (GitXiv)

Page 102: Multi-modal embeddings: from discriminative to generative models and creative ai

102

Generative Adverserial Nets

Alec Radford, Luke Metz, Soumith Chintala , 2015. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks (GitXiv)

Page 103: Multi-modal embeddings: from discriminative to generative models and creative ai

103

Generative Adverserial Nets

Alec Radford, Luke Metz, Soumith Chintala , 2015. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks (GitXiv)

Page 104: Multi-modal embeddings: from discriminative to generative models and creative ai

104

Generative Adverserial Nets

Alec Radford, Luke Metz, Soumith Chintala , 2015. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks (GitXiv)

”turn” vector created from four averaged samples of faces looking left vs looking right.

Page 105: Multi-modal embeddings: from discriminative to generative models and creative ai

walking through the manifold

Generative Adverserial Nets

Page 106: Multi-modal embeddings: from discriminative to generative models and creative ai

top: unmodified samplesbottom: same samples dropping out ”window” filters

Generative Adverserial Nets