Upload
others
View
6
Download
0
Embed Size (px)
Citation preview
Lecture 11 Recap
I2DL: Prof. Niessner, Dr. Dai 1
Transfer Learning
3I2DL: Prof. Niessner, Dr. Dai
P1 P2
Large dataset Small dataset
Distribution Distribution
Use what has been learned for another
setting
Transfer Learning
5I2DL: Prof. Niessner, Dr. Dai
Trained on ImageNet New dataset with C classes
TRAIN
FROZEN
[Donahue et al., ICML’14] DeCAF, [Razavian et al., CVPRW’14] CNN Features off-the-shelf
Source : http://cs231n.stanford.edu/slides/2016/winter1516_lecture11.pdf
Basic Structure of RNN• We want to have notion of “time” or “sequence”
7I2DL: Prof. Niessner, Dr. Dai
Hidden state
InputPrevious hidden state
𝑨𝑡 = 𝜽𝑐𝑨𝑡−1 + 𝜽𝑥𝒙𝑡
Source: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Long-Term Dependencies
8I2DL: Prof. Niessner, Dr. Dai
I moved to Germany … so I speak German fluently.Source: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Long-Short Term Memory Units(LSTM)
11I2DL: Prof. Niessner, Dr. Dai
Source: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Long-Short Term Memory Units• Key ingredients • Cell = transports the information through the unit
12I2DL: Prof. Niessner, Dr. Dai
Source: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
LSTM• Highway for the gradient to flow
13I2DL: Prof. Niessner, Dr. Dai
Source: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
RNNs in Computer Vision• Caption generation
14I2DL: Prof. Niessner, Dr. Dai
[Xu et al., PMLR’15] Neural Image Caption Generation
Autoencoders
I2DL: Prof. Niessner, Dr. Dai 15
Machine Learning
16
• Labels or target classes
• Goal: Learn a mapping from input to label
• Classification, regression
Supervised learning
I2DL: Prof. Niessner, Dr. Dai
DOG DOG
DOG
CAT
CAT
CAT
Machine Learning
17
Unsupervised learning Supervised learning
I2DL: Prof. Niessner, Dr. Dai
Unsupervised learning Supervised learning
Machine Learning
• No label or target class• Find out properties of
the structure of the data
• Clustering (k-means, PCA)
18
DOG DOG
DOG
CAT
CAT
CAT
I2DL: Prof. Niessner, Dr. Dai
DOG DOG
DOG
CAT
CAT
CAT
Machine Learning
19
Unsupervised learning Supervised learning
I2DL: Prof. Niessner, Dr. Dai
Machine Learning
20
Unsupervised learning Supervised learning
DOG DOG
DOG
CAT
CAT
CAT
I2DL: Prof. Niessner, Dr. Dai
Autoencoders• Unsupervised approach for learning a lower-
dimensional feature representation from unlabeled training data
21I2DL: Prof. Niessner, Dr. Dai
Source: https://hackernoon.com
Autoencoders• From an input image
to a feature representation (bottleneck layer)
• Encoder: a CNN in our case
22I2DL: Prof. Niessner, Dr. Dai
𝑥
Conv
𝑧
Input ImageSource: https://bit.ly/37dpsbQ
Autoencoders• Why do we need this dimensionality reduction?
• To capture the patterns, the most meaningful factors of variation in our data
• Other dimensionality reduction methods?
23I2DL: Prof. Niessner, Dr. Dai
Autoencoder Training
24
Conv Transpose Conv
Input Image Output Image
ReconstructionLoss (like L1, L2)
I2DL: Prof. Niessner, Dr. Dai
Source: https://bit.ly/37dpsbQ
Autoencoder Training
25
Latent space 𝑧dim 𝑧 < dim(𝑥)
Inp
ut 𝑥
Rec
onst
ruct
ion 𝑥′
Input images
Reconstructed images
I2DL: Prof. Niessner, Dr. Dai
26
Autoencoder Training• No labels
required
• We can use unlabeled data to first get its structure
I2DL: Prof. Niessner, Dr. Dai
Latent space 𝑧dim 𝑧 < dim(𝑥)
Inp
ut 𝑥
Rec
onst
ruct
ion 𝑥′
27
Autoencoder Use CasesEmbedding of
MNIST numbers
I2DL: Prof. Niessner, Dr. Dai
Source: https://lts2.epfl.ch/blog/perekres/2015/02/21/layer-by-layer-visualizations-of-mnist-dataset-feature-representations/
28
Autoencoder for Pre-Training• Test case: Medical applications based on CT images
– Large set of unlabeled data.– Small set of labeled data.
• We cannot take a network pre-trained on ImageNet. Why?
• The image features are different for CT vs natural images
I2DL: Prof. Niessner, Dr. Dai
29
Autoencoder for Pre-Training• Test case: medical applications based on CT images
– Large set of unlabeled data.– Small set of labeled data.
• We can pre-train our network using an autoencoder to “learn” the type of features present in CT images
I2DL: Prof. Niessner, Dr. Dai
30
Autoencoder for Pre-Training• Step 1: Unsupervised training with autoencoders
Input Reconstruction
I2DL: Prof. Niessner, Dr. Dai
Source: https://bit.ly/37dpsbQ
31
Autoencoder for Pre-Training• Step 2: Supervised training with the labeled data
Input Reconstruction
Throw away the decoder
I2DL: Prof. Niessner, Dr. Dai
Source: https://bit.ly/37dpsbQ
Autoencoder for Pre-Training• Step 2: Supervised training with the labeled data
32
Input
Ground truth labels for supervised learning
Loss
Backprop as always
I2DL: Prof. Niessner, Dr. Dai
𝑥 𝑦
𝑦∗
𝑧
Why use Autoencoders?• Pre-training, as mentioned before
– Image same image reconstructed– Use the encoder as “feature extractor”
• Use them to get pixel-wise predictions– Image semantic segmentation– Low-resolution image High-resolution image– Image Depth map
33I2DL: Prof. Niessner, Dr. Dai
Autoencoders for Pixel-wise Predictions
34I2DL: Prof. Niessner, Dr. Dai
Semantic Segmentation (FCN)• Recall the Fully Convolutional Networks
35[Long et al., CVPR’15] : Fully Convolutional Networks for Semantic Segmentation
Can we do better?
I2DL: Prof. Niessner, Dr. Dai
SegNet
36I2DL: Prof. Niessner, Dr. Dai[Badrinarayanan et al., TPAMI‘16] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
SegNet
37
Input
Ground Truth
SegNet
I2DL: Prof. Niessner, Dr. Dai[Badrinarayanan et al., TPAMI‘16] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
SegNet• Encoder: normal convolutional filters + pooling
• Decoder: Upsampling + convolutional filters
38I2DL: Prof. Niessner, Dr. Dai[Badrinarayanan et al., TPAMI‘16] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
SegNet• Encoder: Normal convolutional filters + pooling
• Decoder: Upsampling + convolutional filters
39I2DL: Prof. Niessner, Dr. Dai[Badrinarayanan et al., TPAMI‘16] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
SegNet• Encoder: normal convolutional filters + pooling
• Decoder: Upsampling + convolutional filters
• The convolutional filters in the decoder are learned using backprop and their goal is to refine the upsampling
40I2DL: Prof. Niessner, Dr. Dai[Badrinarayanan et al., TPAMI‘16] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
Generative Models• Given training data, how to generate new samples
from the same distribution
42I2DL: Prof. Niessner, Dr. Dai Source: https://openai.com/blog/generative-models/
Rea
l Im
ages
Gen
erat
ed Im
ages
Generative Models
43I2DL: Prof. Niessner, Dr. Dai
Explicit Density Implicit Density
Tractable Density Approximate Density
Variational Markov Chain
Markov Chain Direct
Variational Autoencoder Boltzmann Machine
GSN GANFully Visible Belief Nets
Figure copyright and adapted from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017
44
Variational Autoencoders
I2DL: Prof. Niessner, Dr. Dai
Autoencoders• Encode the input into a representation (bottleneck)
and reconstruct it with the decoder
45
Conv Transpose Conv
Encoder Decoder
I2DL: Prof. Niessner, Dr. Dai
𝑥 𝑥
𝑧
Autoencoders• Encode the input into a representation (bottleneck)
and reconstruct it with the decoder
46I2DL: Prof. Niessner, Dr. Dai
Source: https://bit.ly/37ctFMS
Latent space learnedby autoencoder on MNIST
𝑧
Variational Autoencoder
47
Conv Transpose Conv
Encoder Decoder
I2DL: Prof. Niessner, Dr. Dai
𝑥 𝑥𝜙 𝜃
𝑞𝜙 𝑧 𝑥 𝑝𝜃 𝑥 𝑧)
48
Variational Autoencoder
Conv Transpose Conv
Goal: Sample from the latent distribution to generate new outputs!
I2DL: Prof. Niessner, Dr. Dai
𝑧
𝑥 𝑥𝜙 𝜃
Variational Autoencoder
49
• Latent space is now a distribution• Specifically it is a Gaussian
Encoder Decoder
Sample
I2DL: Prof. Niessner, Dr. Dai
𝑥 𝜙 𝜃 𝑥
𝜇𝑧|𝑥
Σ𝑧|𝑥 𝑧
𝑧|𝑥 ∼ 𝒩(𝜇𝑧|𝑥, Σ𝑧|𝑥)
Variational Autoencoder
50
• Latent space is now a distribution• Specifically it is a Gaussian
EncoderMean
Diagonal covariance
I2DL: Prof. Niessner, Dr. Dai
𝑥 𝜙
𝜇𝑧|𝑥
Σ𝑧|𝑥
𝑧|𝑥 ∼ 𝒩(𝜇𝑧|𝑥, Σ𝑧|𝑥)
Variational Autoencoder
51
• Training
I2DL: Prof. Niessner, Dr. Dai
Encoder Decoder
Sample𝑥 𝜃 𝑥
𝜇𝑧|𝑥
Σ𝑧|𝑥 𝑧
𝑧|𝑥 ∼ 𝒩(𝜇𝑧|𝑥, Σ𝑧|𝑥)
𝜙
Variational Autoencoder• Sampling operation is not differentiable
-> We can‘t backpropagate through the latent space
52I2DL: Prof. Niessner, Dr. Dai
Encoder Decoder
Sample𝑥 𝜙 𝜃 𝑥
𝜇𝑧|𝑥
Σ𝑧|𝑥 𝑧
𝑧|𝑥 ∼ 𝒩(𝜇𝑧|𝑥, Σ𝑧|𝑥)
🚫
Reparametrization Trick• Now we only need to backpropagate through an
addition and a multiplication
53I2DL: Prof. Niessner, Dr. Dai
Encoder Decoder
Sample
𝑥 𝜙 𝜃 𝑥
𝜇𝑧|𝑥
Σ𝑧|𝑥 𝑧
𝒩(0,1)
* +
Variational Autoencoder
54
• Test: Sample from the latent space
I2DL: Prof. Niessner, Dr. Dai
Decoder
Sample 𝜃 𝑥
𝜇𝑧|𝑥
Σ𝑧|𝑥 𝑧
𝑧|𝑥 ∼ 𝒩(𝜇𝑧|𝑥, Σ𝑧|𝑥)
Autoencoder vs VAE
Autoencoder Variational Autoencoder Ground TruthSource: https://github.com/kvfrans/variational-autoencoder
55I2DL: Prof. Niessner, Dr. Dai
Autoencoder Overview• Autoencoders (AE)
– Reconstruct input– Unsupervised learning– Latent space features are useful
• Variational Autoencoders (VAE)– Probability distribution in latent space (e.g., Gaussian)– Interpretable latent space (head pose, smile)– Sample from model to generate output
56I2DL: Prof. Niessner, Dr. Dai
Generative Adversarial Networks (GANs)
57I2DL: Prof. Niessner, Dr. Dai
Generative Adversarial Networks (GANs)
58
Source: https://github.com/hindupuravinash/the-gan-zoo
I2DL: Prof. Niessner, Dr. Dai
Convolution and Deconvolution
Convolutionno padding, no stride
Source: https://github.com/vdumoulin/conv_arithmetic
Transposed convolutionno padding, no stride
Input
Output
Input
Output
I2DL: Prof. Niessner, Dr. Dai 59
Autoencoder
Conv DeconvI2DL: Prof. Niessner, Dr. Dai 60
Decoder as Generative Model
Latent space 𝑧dim 𝑧 < dim(𝑥)
Test time:-> reconstruction from
‘random’ vector
Output Image
ReconstructionLoss (often L2)
I2DL: Prof. Niessner, Dr. Dai 63
Decoder as Generative Model
Interpolation between two chair models
I2DL: Prof. Niessner, Dr. Dai 64
[Dosovitsky et al., ‘14] Learning to Generate Chairs
Decoder as Generative Model
Morphing betweenchair models
I2DL: Prof. Niessner, Dr. Dai 65
[Dosovitsky et al., ‘14] Learning to Generate Chairs
Decoder as Generative Model
Latent space zdim (z) < dim (x)
“Test time”:-> reconstruction from
‘random’ vector
Reconstruction Loss Often L2, i.e., sum of squared dist.-> L2 distributes error equally
-> mean is opt.-> res. Is blurry
Instead of L2, can we “learn” a loss function?
I2DL: Prof. Niessner, Dr. Dai 66
Generative Adversarial Networks (GANs)
[Goodfellow et al., NIPS‘14] Generative Adversarial Networks (slide from McGuinness)
67
𝑧𝐺
𝐺(𝑧)
𝐷
𝐷(𝐺(𝑧))
I2DL: Prof. Niessner, Dr. Dai
Generative Adversarial Networks (GANs)
68
𝑧𝐺
𝐺(𝑧)
𝐷
𝑥
𝐷(𝑥)
𝐷(𝐺(𝑧))
I2DL: Prof. Niessner, Dr. Dai
[Goodfellow et al., NIPS‘14] Generative Adversarial Networks (slide from McGuinness)
Generative Adversarial Networks (GANs)
real data fake data
I2DL: Prof. Niessner, Dr. Dai [Goodfellow, NIPS‘16] Tutorial: Generative Adversarial Networks 69
• Minimax Game:– G minimizes probability that D is correct– Equilibrium is saddle point of discriminator loss
• Discriminator loss
• Generator loss binary cross entropy
GANs: Loss Functions
• D provides supervision (i.e., gradients) for G
I2DL: Prof. Niessner, Dr. Dai[Goodfellow et al., NIPS‘14] Generative Adversarial Networks
𝐽 𝐷 = −1
2𝔼𝐱∼𝑝𝑑𝑎𝑡𝑎 log𝐷 𝒙 −
1
2𝔼𝒛 log 1 − 𝐷 𝐺 𝒛
𝐽(𝐺) = −𝐽 𝐷
70
• Heuristic Method (often used in practice)– G maximizes the log-probability of D being mistaken– G can still learn even when D rejects all generator samples
• Discriminator loss
GANs: Loss Functions
• Generator loss
I2DL: Prof. Niessner, Dr. Dai
𝐽 𝐷 = −1
2𝔼𝐱∼𝑝𝑑𝑎𝑡𝑎 log𝐷 𝒙 −
1
2𝔼𝒛 log 1 − 𝐷 𝐺 𝒛
𝐽(𝐺) = −1
2𝔼𝒛 log𝐷 𝐺 𝒛
71[Goodfellow et al., NIPS‘14] Generative Adversarial Networks
Alternating Gradient Updates• Step 1: Fix G, and perform gradient step to
• Step 2: Fix D, and perform gradient step to
72I2DL: Prof. Niessner, Dr. Dai
𝐽 𝐷 = −1
2𝔼𝐱∼𝑝𝑑𝑎𝑡𝑎 log𝐷 𝒙 −
1
2𝔼𝒛 log 1 − 𝐷 𝐺 𝒛
𝐽(𝐺) = −1
2𝔼𝒛 log𝐷 𝐺 𝒛
Training a GAN
74
Source: https://medium.com/ai-society/gans-from-scratch-1-a-deep-introduction-with-code-in-pytorch-and-tensorflow-cb03cdcdba0f
I2DL: Prof. Niessner, Dr. Dai
GANs: Loss FunctionsMinimax
Heuristic
I2DL: Prof. Niessner, Dr. Dai 75[Goodfellow et al., NIPS‘14] Generative Adversarial Networks
DCGAN: Generator
Generator of Deep Convolutional GANs
I2DL: Prof. Niessner, Dr. Dai 76[Radford et al., ICLR‘16] DCGAN : Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
DCGAN: Results
Results on MNIST
77I2DL: Prof. Niessner, Dr. Dai[Radford et al., ICLR‘16] DCGAN : Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
DCGAN: Results
Results on CelebA (200k relatively well aligned portrait photos)
I2DL: Prof. Niessner, Dr. Dai 78
[Radford et al., ICLR‘16] DCGAN : Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
Conditional Generative Adversarial
Networks (cGANs)79I2DL: Prof. Niessner, Dr. Dai
Pix2Pix: Image-to-Image Translation
I2DL: Prof. Niessner, Dr. Dai
[Isola et al., CVPR‘17] Pix2Pix : Image-to-Image Translation with Conditional Adversarial Networks
80
real or fake?
Discriminator
z G(z)
D
Generator
G
min𝐺
max𝐷
𝔼𝑧,𝑥 log 𝐷(𝐺 𝑧 ) + log(1 − 𝐷 𝑥 )
I2DL: Prof. Niessner, Dr. Dai
min𝐺
max𝐷
𝔼𝑥,𝑦 log 𝐷(𝐺 𝑥 ) + log(1 − 𝐷 𝑦 )
81
[Isola et al., CVPR‘17] Pix2Pix : Image-to-Image Translation with Conditional Adversarial Networks
G(x)x
real or fake?
Discriminator
D
Generator
G
I2DL: Prof. Niessner, Dr. Dai
min𝐺
max𝐷
𝔼𝑥,𝑦 log 𝐷(𝐺 𝑥 ) + log(1 − 𝐷 𝑦 )
82
[Isola et al., CVPR‘17] Pix2Pix : Image-to-Image Translation with Conditional Adversarial Networks
Real!
DiscriminatorGenerator
I2DL: Prof. Niessner, Dr. Dai
G(x)x
DG
min𝐺
max𝐷
𝔼𝑥,𝑦 log 𝐷(𝐺 𝑥 ) + log(1 − 𝐷 𝑦 )
83
[Isola et al., CVPR‘17] Pix2Pix : Image-to-Image Translation with Conditional Adversarial Networks
min𝐺
max𝐷
𝔼𝑥,𝑦 log 𝐷(𝐺 𝑥 ) + log(1 − 𝐷 𝑦 )
DiscriminatorGenerator
Real too!
I2DL: Prof. Niessner, Dr. Dai
G(x)x
DG
84
[Isola et al., CVPR‘17] Pix2Pix : Image-to-Image Translation with Conditional Adversarial Networks
min𝐺
max𝐷
𝔼𝑥,𝑦 log𝐷(𝑥, 𝐺 𝑥 ) + log(1 − 𝐷 𝑥, 𝑦 )
real or fake pair?
match joint distribution p G x , y ∼ p(x, y)
fake pair real pair
I2DL: Prof. Niessner, Dr. Dai
G
G(x)x
D
85[Isola et al., CVPR‘17] Pix2Pix : Image-to-Image Translation with Conditional Adversarial Networks
Pix2Pix
86I2DL: Prof. Niessner, Dr. Dai
Edges → ImagesInput Output Input Output Input Output
I2DL: Prof. Niessner, Dr. Dai
[Isola et al., CVPR‘17] Pix2Pix : Image-to-Image Translation with Conditional Adversarial NetworksEdges from [Xie et al., ICCV’15] Holistically-Nested Edge Detection]
87
Sketches → ImagesInput Output Input Output Input Output
Trained on Edges → Images
Data from [Eitz, Hays, Alexa, 2012]I2DL: Prof. Niessner, Dr. Dai 88
Input Output Ground Truth
I2DL: Prof. Niessner, Dr. Dai
Data from maps.google.com
89
[Isola et al., CVPR‘17] Pix2Pix : Image-to-Image Translation with Conditional Adversarial Networks
BW → ColorInput Output Input Output Input Output
I2DL: Prof. Niessner, Dr. Dai 90
Data from ImageNet [Isola et al., CVPR‘17] Pix2Pix : Image-to-Image Translation with Conditional Adversarial Networks
GAN Applications
91I2DL: Prof. Niessner, Dr. Dai
BigGAN: HD Image Generation
92I2DL: Prof. Niessner, Dr. Dai
[Brock et al., ICLR‘18] BigGAN : Large Scale GAN Training for High Fidelity Natural Image Synthesis
StyleGAN: Face Image Generation
93I2DL: Prof. Niessner, Dr. Dai
[Karras et al., ‘18] StyleGAN : A Style-Based Generator Architecture for Generative Adversarial Networks [Karras et al., ‘19] StyleGAN2 : Analyzing and Improving the Image Quality of StyleGAN
Cycle GAN: Unpaired Image-to-Image Translation
I2DL: Prof. Niessner, Dr. Dai[Zhu et al., ICCV‘17] Cycle GAN : Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
94
SPADE: GAN-Based Image Editing
95I2DL: Prof. Niessner, Dr. Dai[Park et al., CVPR‘19] SPADE : Semantic Image Synthesis with Spatially-Adaptive Normalization
Few-Shot-Vid2Vid: Single-Shot Video Generation
96I2DL: Prof. Niessner, Dr. Dai
• https://www.youtube.com/watch?v=APoB1u3kTOU• https://www.youtube.com/watch?v=kkA6CHRovKA
[Wang et al., NeurIPS‘19] Few-Shot Video-to-Video Synthesis
References for Further Reading• https://towardsdatascience.com/intuitively-
understanding-variational-autoencoders-1bfe67eb5daf
• https://phillipi.github.io/pix2pix/
• http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture13.pdf
I2DL: Prof. Niessner, Dr. Dai 97
Next Lecture
I2DL: Prof. Niessner, Dr. Dai 98
• Next lecture on 28th January: - Course outlook
• Reminder: Exercise 3 due tomorrow at 18:00
• Thursday exercise session on exercise 3 and exam tips
See you next time!
99I2DL: Prof. Niessner, Dr. Dai