Deep Learning with CNNs
University of Rome "La Sapienza", Dept. of Computer, Control and Management Engineering "A. Ruberti"
Valsamis Ntouskos, ALCOR Lab
Outline
• Introduction - Motivation
• Theoretical aspects
• Brief history of CNNs
• Evolution of CNNs for image classification
• Applications of CNNs in computer vision
Compositional Models
Learned End-to-End
Hierarchy of Representations
- vision: pixel, motif, part, object
- text: character, word, clause, sentence
- speech: audio, band, phone, word
(learning proceeds from concrete to abstract representations)
Slides from Caffe framework tutorial @ CVPR2015
Compositional Models
Learned End-to-End
Back-propagation jointly learns all of the model parameters to optimize the output for the task.
Slides from Caffe framework tutorial @ CVPR2015
Motivation
Up to now we have treated inputs as general feature vectors
In some cases inputs have special structure:
• Audio
• Images
• Videos
Signals: numerical representations of physical quantities
Deep learning can be applied directly to signals by using suitable operators
Motivation
. . . 0.0468 0.0468 0.0468 0.0390 0.0390 0.0390 0.0546 0.0625 0.0625 0.0390 0.0312 0.0468 0.0625 . . .
Audio: 1D data - (variable-length) vectors
Motivation
Images: 2D data - matrices
Video: a sequence of images sampled through time - 3D data
What is a CNN?
Some theory
Convolution
From Steve Seitz and Richard Szeliski's slides (https://courses.cs.washington.edu/courses/cse576/08sp/)
Interactive examples: http://setosa.io/ev/image-kernels/
Some theory
Convolution
• Image filtering is based on convolution with special kernels
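For concreteness, a minimal NumPy sketch of "valid" 2D convolution (illustrative only; the function and test data are ours, and real frameworks use heavily optimized implementations):

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D convolution (no padding, stride 1) of a grayscale image."""
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]  # convolution = correlation with a flipped kernel
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out

# Example: a 3x3 edge-enhancing (Laplacian-like) kernel on a random "image"
image = np.random.rand(8, 8)
kernel = np.array([[0., -1., 0.],
                   [-1., 4., -1.],
                   [0., -1., 0.]])
print(conv2d(image, kernel).shape)  # (6, 6)
```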
Some theory
Pooling
• Introduces subsampling by aggregating over local windows (e.g. taking the maximum)
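A minimal sketch of max pooling, one common pooling choice (the helper below is ours, not from the slides; average pooling works analogously):

```python
import numpy as np

def max_pool2d(x, size=2, stride=2):
    """Max pooling: keep the maximum of each size x size window."""
    oh = (x.shape[0] - size) // stride + 1
    ow = (x.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = x[i * stride:i * stride + size,
                          j * stride:j * stride + size].max()
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool2d(x))  # [[ 5.  7.] [13. 15.]] -- spatial size halved
```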
Some theory
Activation
Standard way to model a neuron
f(x) = tanh(x) or f(x) = (1 + e^(-x))^(-1)
Very slow to train (saturation)
Non-saturating nonlinearity (ReLU): f(x) = max(0, x)
Quick to train
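An illustrative comparison of these activations and their gradients (our own sketch; variable names are arbitrary):

```python
import numpy as np

x = np.linspace(-6.0, 6.0, 13)
tanh = np.tanh(x)                    # saturates at -1/+1
sigmoid = 1.0 / (1.0 + np.exp(-x))   # saturates at 0/1
relu = np.maximum(0.0, x)            # non-saturating for x > 0

# The gradients show why saturation slows training:
dtanh = 1.0 - tanh ** 2              # ~0 for large |x|: vanishing gradient
drelu = (x > 0).astype(float)        # stays 1 for all x > 0
```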
Some theory
Regularization
Dropout
• Applied to the fully-connected layers
• During training, drop nodes with probability α
• During testing, all nodes are kept and weighed by the keep probability (1 - α)
Image from Srivastava et al., "Dropout: A Simple Way to Prevent Neural Networks from Overfitting"
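A minimal sketch of the mechanism, assuming α denotes the drop probability (the helper is illustrative; modern frameworks implement the equivalent "inverted" variant that rescales during training instead):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, alpha=0.5, train=True):
    """Classic dropout with drop probability alpha: drop units during
    training, rescale at test time so expected activations match."""
    if train:
        mask = rng.random(h.shape) >= alpha  # keep each unit with prob 1 - alpha
        return h * mask
    return h * (1.0 - alpha)                 # scale by the keep probability
```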
Some theory
Every convolutional layer of a CNN transforms the 3D input volume to a 3D output volume of neuron activations.
A regular 3-layer Neural Network
Material from Fei-Fei’s group
Some theory
Each neuron is connected to a local region in the input volume spatially, but to all channels.
The neurons still compute a dot product of their weights with the input, followed by a non-linearity.
Material from Fei-Fei’s group
Terminology
• Kernel: matrix corresponding to convolution / filter
• Depth: number of feature maps / filters (d)
• Depth slice: a single feature map
• Padding: zero-filled rows/columns added around the border (p)
• Stride: step of the sliding kernel (s)
– e.g. stride 1 moves the kernel one pixel at a time
• Receptive field: 2D dimensions of the kernel (w_k × h_k)
• Weight or parameter sharing: parameters of the filter are shared across a depth slice
– i.e. parameters are the same for all units of the same feature map
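These quantities determine the spatial output size via floor((w - w_k + 2p) / s) + 1; a small illustrative helper:

```python
def conv_output_size(w, k, p=0, s=1):
    """Spatial output size of a convolution: floor((w - k + 2p) / s) + 1,
    with input width w, kernel width k, padding p and stride s."""
    return (w - k + 2 * p) // s + 1

# e.g. AlexNet's first layer: 227x227 input, 11x11 kernel, stride 4, no padding
print(conv_output_size(227, 11, p=0, s=4))  # 55
```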
Algorithms
• Each* neuron/layer is differentiable!
• Just apply backpropagation (chain-rule)
• Use standard gradient-based optimization algorithms (SGD, AdaGrad, …)
• The devil lies in the details though …
▪Choosing hyperparameters / loss-function
▪Exploding/Vanishing gradients – batch normalization
▪Overfitting – Regularization
▪Cost of performing experiments
▪Convergence
▪…
*what about max-pooling? (it is piecewise differentiable; the gradient is routed only through the max element)
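A hedged end-to-end sketch in PyTorch (one of the frameworks listed under Resources); the tiny architecture and hyperparameters are illustrative choices, not a prescribed recipe:

```python
import torch
import torch.nn as nn

# Hypothetical tiny CNN for 28x28 grayscale images, 10 classes
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                          # 28x28 -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                          # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),
)
loss_fn = nn.CrossEntropyLoss()               # the loss is a hyperparameter choice
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

x = torch.randn(8, 1, 28, 28)                 # dummy mini-batch
y = torch.randint(0, 10, (8,))
loss = loss_fn(model(x), y)
opt.zero_grad()
loss.backward()                               # backpropagation (chain rule)
opt.step()                                    # one SGD parameter update
```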
Kernels and Feature maps
Material from Fei-Fei’s group
Brief history of CNNs
Foundational work was done in the middle of the 20th century:
• 1940s-1960s: Cybernetics [McCulloch and Pitts 1943, Hebb 1949, Rosenblatt 1958]
• 1980s-mid 1990s: Connectionism [Rumelhart 1986, Hinton 1989]
• 1990s: modern convolutional networks [LeCun et al. 1998], LSTM [Hochreiter & Schmidhuber 1997], MNIST and other large datasets
Brief history of CNNs
Hubel & Wiesel [60s]: Simple & Complex cells architecture
Fukushima's Neocognitron [70s]
Yann LeCun's early CNNs [80s]
Brief history of CNNs
Convolutional Networks: 1989
LeNet: a layered model composed of convolution and subsampling operations followed by a holistic representation and ultimately a classifier for handwritten digits. [LeNet]
Recent success
• Parallel Computation (GPU)
• Larger training sets
• International Competitions
• Theoretical advancements
– Dropout
– ReLUs
– Batch Normalization
Recent success
Better Hardware – GPUs
• CUDA (Jetson TX1, TK1)
• Android lib, demo
• OpenCL branch
Recent success
ImageNet
• Over 15M labeled high-resolution images
• Roughly 22K categories
• Collected from the web and labeled by Amazon Mechanical Turk workers
Larger training sets
Recent success
ILSVRC
• Annual competition of image classification at large scale
• 1.2M images in 1K categories
• Classification: make 5 guesses about the image label (top-5 error)
Competitions
Example of fine-grained categories: Entlebucher vs. Appenzeller
Evolution of CNNs for image classification
AlexNet: a layered model composed of convolution, subsampling, and further operations followed by a holistic representation and all-in-all a landmark classifier on ILSVRC12. [AlexNet]
Convolutional Nets: 2012
AlexNet
Evolution of CNNs for image classification
Convolutional Nets: 2014
ILSVRC14 Winners: ~6.6% Top-5 error
- GoogLeNet: composition of multi-scale dimension-reduced modules
+ depth
+ data
+ dimensionality reduction
Evolution of CNNs for image classification
Convolutional Nets: 2014
ILSVRC14 Winners: ~6.6% Top-5 error
- VGG: 16 layers of 3x3 convolution interleaved with max pooling + 3 fully-connected layers
+ depth
+ data
+ dimensionality reduction
Evolution of CNNs for image classification
Convolutional Nets: 2015
ResNet
ILSVRC15 Winner: ~3.6% Top-5 error
Intuition: it is easier to learn the zero function than the identity; each block therefore learns a residual F(x) and outputs F(x) + x.
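A minimal sketch of a basic residual block in PyTorch (assuming equal input/output channels; the actual ResNet blocks also use strides and projection shortcuts):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: the stacked layers learn a residual F(x)
    and the block outputs F(x) + x, so representing the identity only
    requires pushing F towards zero."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.body(x) + x)  # the skip connection
```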
Evolution of CNNs for image classification
Reasonable questions
• Is this just for a particular dataset? – No!
Slides from ICCV 2015 Math of Deep Learning tutorial
Reasonable questions
Object Localization [R-CNN, HyperColumns, OverFeat, etc.]
Pose estimation [Tompson et al., CVPR'15]
• Is this just for a particular task? – No!
Slides from ICCV 2015 Math of Deep Learning tutorial
Reasonable questions
Semantic Segmentation [Pinheiro, Collobert, Dollár, ICCV'15]
• Is this just for a particular task? – No!
Slides from ICCV 2015 Math of Deep Learning tutorial
Fine Tuning
Dogs vs. Cats
top 10 in 10 minutes
Take a pre-trained model and fine-tune to new tasks [DeCAF] [Zeiler-Fergus] [OverFeat]
© kaggle.com
Pre-train where data is plentiful (ImageNet), then transfer to your task (e.g. style recognition).
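A minimal modern sketch of the recipe, using torchvision's pretrained ResNet-18 as an assumed stand-in for the models of the era:

```python
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained backbone
net = models.resnet18(weights="IMAGENET1K_V1")
for p in net.parameters():
    p.requires_grad = False                 # freeze the pre-trained features
net.fc = nn.Linear(net.fc.in_features, 2)  # new 2-class head (dogs vs. cats)
# Train only net.fc at first; optionally unfreeze top layers with a small
# learning rate for deeper fine-tuning.
```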
Pixelwise Prediction
Fully convolutional networks for pixel prediction, in particular semantic segmentation (see the sketch below)
- end-to-end learning
- efficient inference and learning: 100 ms per-image prediction
- multi-modal, multi-task
Applications
- semantic segmentation
- denoising
- depth estimation
- optical flow
Jon Long*, Evan Shelhamer*, Trevor Darrell. CVPR'15
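A toy sketch of the fully convolutional idea (our own illustrative backbone; FCN itself uses learned upsampling rather than the fixed bilinear interpolation shown here):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy convolutional backbone followed by a 1x1 "score" convolution
features = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
)
score = nn.Conv2d(128, 21, kernel_size=1)   # e.g. 21 PASCAL VOC classes

x = torch.randn(1, 3, 64, 64)
s = score(features(x))                      # coarse score map: 1 x 21 x 16 x 16
out = F.interpolate(s, size=x.shape[2:], mode="bilinear", align_corners=False)
print(out.shape)                            # per-pixel scores: 1 x 21 x 64 x 64
```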
Dealing with sequences
Recurrent Nets and Long Short-Term Memories (LSTM) are sequential models for
- video
- language
- dynamics
learned by backpropagation through time.
Recurrent Networks for Sequences
LRCN: Long-term Recurrent Convolutional Network
- activity recognition (sequence-in)
- image captioning (sequence-out)
- video captioning (sequence-to-sequence)
LRCN: recurrent + convolutional for visual sequences
Dealing with sequences
Visual Sequence Tasks
Jeff Donahue et al. CVPR’15
Based on Long short-term memory (LSTM) layers
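A hedged sketch of the sequence-in case, with an illustrative toy CNN and clip sizes:

```python
import torch
import torch.nn as nn

# Per-frame CNN features feed an LSTM; the last state classifies the clip
cnn = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),   # each frame -> 16-d feature
)
lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
classifier = nn.Linear(32, 5)                # e.g. 5 activity classes

video = torch.randn(2, 10, 3, 32, 32)        # 2 clips of 10 frames each
b, t = video.shape[:2]
feats = cnn(video.reshape(b * t, 3, 32, 32)).reshape(b, t, 16)
outputs, _ = lstm(feats)                     # run through time
logits = classifier(outputs[:, -1])          # sequence-in: predict at last step
print(logits.shape)                          # torch.Size([2, 5])
```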
What’s next?
Various questions/problems are still open:
• Learning with constraints / on manifolds
• Using high-level knowledge/structure
• Exploring the mathematics of the networks
– what types of functions can they represent?
– are these functions useful/interesting?
– convergence/efficiency
• Rotation invariance (group operators)
• CNNs can be easily 'fooled' (adversarial examples)
• …
Resources
Frameworks:
• Caffe/Caffe 2 (UC Berkeley) | C/C++, Python, Matlab
• TensorFlow (Google) | C/C++, Python, Java, Go
• Theano (U Montreal) | Python
• CNTK (Microsoft) | Python, C++, C#/.NET, Java
• Torch/PyTorch (Facebook) | Lua/Python
• MxNet (DMLC) | Python, C++, R, Perl, …
• Darknet (Redmon J.) | C
• …
Resources
High-level libraries:
• Keras | Backends: TensorFlow (TF), Theano
Models:
• Depends on the framework, e.g.
– https://github.com/BVLC/caffe/wiki/Model-Zoo (Caffe)
– https://github.com/tensorflow/models/tree/master/research (TF)
Interactive Interfaces:
• DIGITS (NVIDIA) | Caffe, TF, Torch
• TensorBoard (TF)
Tools:
• http://ethereon.github.io/netscope (for networks defined in protobuf)