
Deep Networks and Kernel Methods

Edgar Marca

edgarms@lncc.br

Grupo de Reconocimiento de Patrones e Inteligencia Artificial Aplicada (Pattern Recognition and Applied Artificial Intelligence Group) — PUC, Lima, Perú

June 18th, 2015


Table of Contents

▶ Image Classification Problem
▶ Deep Networks
  ▶ Convolutional Neural Networks
  ▶ Software
  ▶ How to start
▶ Kernel Methods
  ▶ SVM
  ▶ The Kernel Trick
  ▶ History of Kernel Methods
  ▶ Software
  ▶ How to start
▶ Kernels and Deep Learning
  ▶ Convolutional Kernel Networks
  ▶ Deep Fried Convnets

Image Classification Problem

Figure: http://www.image-net.org/

Deep Networks

Human-level control through deep reinforcement learning

▶ Volodymyr Mnih et al., Human-level control through deep reinforcement learning.


Convolutional Neural Networks


Software

▶ Torch7 — http://torch.ch/
▶ Caffe — http://caffe.berkeleyvision.org/
▶ Minerva — https://github.com/dmlc/minerva
▶ Theano — http://deeplearning.net/software/theano/
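As a quick taste of one of these libraries, here is a minimal Theano sketch (the expression and input values are illustrative, not from the slides): Theano builds a symbolic computation graph and compiles it into a callable function, which is the core workflow behind training deep networks with it.

```python
# Minimal Theano sketch: build a symbolic expression, compile it, run it.
# The expression is a toy example; real networks compose many such ops.
import theano
import theano.tensor as T

x = T.dvector('x')            # symbolic double-precision vector
y = T.tanh(T.dot(x, x))       # a toy symbolic expression
f = theano.function([x], y)   # compile the graph into a callable
print(f([0.1, 0.2]))          # evaluates tanh(0.05)
```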


How to start I

▶ Deep Learning Course by Nando de Freitas — https://www.youtube.com/watch?v=PlhFWT7vAEw&list=PLjK8ddCbDMphIMSXn-w1IjyYpHU3DaUYw
▶ Alex Smola Lecture on Deep Networks — https://www.youtube.com/watch?v=xZzZb7wZ6eE
▶ Convolutional Neural Networks for Visual Recognition — http://vision.stanford.edu/teaching/cs231n/
▶ Deep Learning, Spring 2015 — http://cilvr.cs.nyu.edu/doku.php?id=courses:deeplearning2015:start

How to start II

▶ Deep Learning for Natural Language Processing — http://cs224d.stanford.edu/
▶ Applied Deep Learning for Computer Vision with Torch — http://torch.ch/docs/cvpr15.html
▶ Deep Learning, an MIT Press book in preparation — http://www.iro.umontreal.ca/~bengioy/dlbook/
▶ Reading List — http://deeplearning.net/reading-list/

Kernel Methods


Linear Support Vector Machine

Figure: Linear Support Vector Machine. The decision surface ⟨w, x⟩ + b = 0 lies halfway between the margin hyperplanes ⟨w, x⟩ + b = 1 and ⟨w, x⟩ + b = −1; the distance between them is the margin.

Linear SVM — Primal Problem

Given a linearly separable training set

D = {(x1, y1), (x2, y2), …, (xl, yl)} ⊂ R^n × {+1, −1},

we can compute the maximum-margin decision surface ⟨w*, x⟩ = b* by solving the convex program

(P)  min_{w,b}  φ(w, b) = (1/2)⟨w, w⟩
     subject to  ⟨w, yi xi⟩ ≥ 1 + yi b,    (1)

where (xi, yi) ∈ D ⊂ R^n × {−1, +1}.

1. The objective function does not depend on b.
2. The offset b appears only in the constraints.
3. The number of constraints equals the number of training points.
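As a sketch of how (P) can be solved in practice, the snippet below sets it up for cvxopt's generic QP solver; the solver choice and all names are assumptions for illustration, since the slides do not prescribe an implementation. The optimization variable is z = (w, b).

```python
# Sketch: solving the primal (P) with cvxopt's QP solver.
# Solves min (1/2) z^T P z  subject to  G z <= h, where z = (w, b).
import numpy as np
from cvxopt import matrix, solvers

def linear_svm_primal(X, y):
    """X: (l, n) inputs, y: (l,) labels in {-1, +1}. Returns w*, b*."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    l, n = X.shape
    P = np.zeros((n + 1, n + 1))
    P[:n, :n] = np.eye(n)          # objective penalizes only w, not b
    q = np.zeros(n + 1)
    # <w, y_i x_i> >= 1 + y_i b  rewritten as  (-y_i x_i, y_i) z <= -1
    G = np.hstack([-y[:, None] * X, y[:, None]])
    h = -np.ones(l)
    sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))
    z = np.array(sol['x']).ravel()
    return z[:n], z[n]             # w*, b*
```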


Linear SVM — Dual Problem

(DP)  max_α  h(α) = Σ_{i=1}^{l} αi − (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} αi αj yi yj ⟨xi, xj⟩
      subject to  Σ_{i=1}^{l} αi yi = 0,
                  αi ≥ 0 for i = 1, …, l.

The offset b* is computed in terms of w*, as follows:

b+ = min { ⟨w*, x⟩ | (x, y) ∈ D, y = +1 }
b− = max { ⟨w*, x⟩ | (x, y) ∈ D, y = −1 }

Then b* = (b+ + b−)/2.

The training vectors associated with αi > 0 are called support vectors.
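A matching sketch for the dual (DP), again using cvxopt as an assumed solver. It recovers b* by the b+/b− rule above; the relation w* = Σ_{i=1}^{l} αi yi xi is the standard stationarity condition of the primal, which the slide uses implicitly.

```python
# Sketch: solving the dual (DP), then recovering w* and b*.
import numpy as np
from cvxopt import matrix, solvers

def linear_svm_dual(X, y):
    """X: (l, n) inputs, y: (l,) labels in {-1, +1}. Returns alpha, w*, b*."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    l = X.shape[0]
    Yx = y[:, None] * X                             # rows y_i x_i
    P = matrix(Yx @ Yx.T)                           # P_ij = y_i y_j <x_i, x_j>
    q = matrix(-np.ones(l))                         # maximize h == minimize -h
    G, h = matrix(-np.eye(l)), matrix(np.zeros(l))  # alpha_i >= 0
    A, b = matrix(y.reshape(1, -1)), matrix(0.0)    # sum_i alpha_i y_i = 0
    alpha = np.array(solvers.qp(P, q, G, h, A, b)['x']).ravel()
    w = (alpha * y) @ X                             # w* = sum_i alpha_i y_i x_i
    b_plus = (X[y == +1] @ w).min()                 # b+ over the positive class
    b_minus = (X[y == -1] @ w).max()                # b- over the negative class
    return alpha, w, (b_plus + b_minus) / 2         # b* = (b+ + b-) / 2
```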

Linear Support Vector Machine

f̂(x) = sign( Σ_{i=1}^{l} αi* yi ⟨xi, x⟩ − b* )
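Given the dual solution, evaluating f̂ is a one-liner; a minimal sketch with illustrative names:

```python
# Sketch: f_hat(x) = sign( sum_i alpha_i* y_i <x_i, x> - b* )
import numpy as np

def svm_decision(x, X, y, alpha, b_star):
    return np.sign((alpha * y) @ (X @ x) - b_star)
```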


The Kernel Trick


Motivation

▶ How can we separate data that is not linearly separable?
▶ How can we use algorithms that work only on linearly separable data and depend only on inner products?


R to R² Case
How to separate the two classes?

φ(x) = (x, x²)

Figure: Separating the two classes of points by transforming them into a higher-dimensional space where the data is separable.
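A small numeric sketch of this lift (the sample points are made up): a class living inside an interval of R is not linearly separable from the rest, but after φ it sits below the line x2 = 1 while the other class sits above it.

```python
# Sketch of phi(x) = (x, x^2): an interval class becomes linearly separable.
import numpy as np

x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
y = np.where(np.abs(x) < 1, 1, -1)     # class +1 lies inside (-1, 1)
phi = np.stack([x, x ** 2], axis=1)    # lift R -> R^2
# In R^2 the horizontal line x2 = 1 separates the two classes.
assert all(phi[y == 1][:, 1] < 1) and all(phi[y == -1][:, 1] > 1)
```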


R² to R³ Case

Figure: Data which is not linearly separable.


R² to R³ Case
A simulation

Figure: SVM with polynomial kernel visualization.


Idea


Figure: φ is a non-linear mapping from the input space to the feature space.


Non-Linear Support Vector Machine

Now we can use a non-linear function φ to map the data from the input space to a higher-dimensional space:

f̂(x) = sign( Σ_{i=1}^{l} αi* yi ⟨φ(xi), φ(x)⟩ − b* )


Definition 3.1 (Kernel)
Let X be a non-empty set. A function k : X × X → K is called a kernel on X if and only if there exist a Hilbert space H and a mapping Φ : X → H such that for all s, t ∈ X

k(t, s) := ⟨Φ(t), Φ(s)⟩_H    (2)

The function Φ is called a feature map and H a feature space of k.


Example 3.2
Consider X = R and the function k defined by

k(s, t) = st = ⟨ (s/√2, s/√2), (t/√2, t/√2) ⟩,

where the feature maps are Φ(s) = s and Φ̃(s) = (s/√2, s/√2), and the feature spaces are H = R and H̃ = R² respectively. A kernel may therefore have more than one feature map and feature space.
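The point of the example, that one kernel can have several feature maps, is easy to check numerically; a sketch with arbitrary test values:

```python
# Both feature maps from Example 3.2 realize the same kernel k(s, t) = s*t.
import numpy as np

def phi(s):                       # feature map into H = R
    return np.array([s])

def phi_tilde(s):                 # feature map into H~ = R^2
    return np.array([s, s]) / np.sqrt(2)

s, t = 1.7, -0.4                  # arbitrary test points
assert np.isclose(phi(s) @ phi(t), s * t)
assert np.isclose(phi_tilde(s) @ phi_tilde(t), s * t)
```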


Non-Linear Support Vector Machines
Using the kernel trick, we can replace ⟨φ(xi), φ(x)⟩ by a kernel k(xi, x):

f̂(x) = sign( Σ_{i=1}^{l} αi* yi k(xi, x) − b* )
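Putting the pieces together: the dual sketch from the linear case carries over with ⟨xi, xj⟩ replaced by k(xi, xj). The Gaussian kernel below is one common choice, not something the slides fix, and b* is recovered by the same b+/b− rule, with Σ_{i} αi yi k(xi, ·) standing in for ⟨w*, ·⟩.

```python
# Sketch: kernel SVM via the dual, reusing the QP setup from the linear case.
import numpy as np
from cvxopt import matrix, solvers

def gaussian_kernel(A, B, gamma=1.0):
    """k(a, b) = exp(-gamma * ||a - b||^2), computed for all pairs."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def kernel_svm_fit(X, y, kernel=gaussian_kernel):
    X, y = np.asarray(X, float), np.asarray(y, float)
    l = len(y)
    K = kernel(X, X)
    P = matrix(np.outer(y, y) * K)                  # P_ij = y_i y_j k(x_i, x_j)
    q = matrix(-np.ones(l))
    G, h = matrix(-np.eye(l)), matrix(np.zeros(l))  # alpha_i >= 0
    A, b = matrix(y.reshape(1, -1)), matrix(0.0)    # sum_i alpha_i y_i = 0
    alpha = np.array(solvers.qp(P, q, G, h, A, b)['x']).ravel()
    f0 = (alpha * y) @ K                            # <w*, phi(x_j)> per column j
    b_star = (f0[y == +1].min() + f0[y == -1].max()) / 2
    return alpha, b_star

def kernel_svm_predict(x, X, y, alpha, b_star, kernel=gaussian_kernel):
    k_x = kernel(X, np.atleast_2d(x)).ravel()       # k(x_i, x) for all i
    return np.sign((alpha * y) @ k_x - b_star)
```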


History of Kernel Methods


Timeline

Table: Timeline of Support Vector Machines Algorithm Development

1909 • Mercer's Theorem — James Mercer. "Functions of Positive and Negative Type, and their Connection with the Theory of Integral Equations".
1950 • Moore–Aronszajn Theorem — Nachman Aronszajn. "Theory of Reproducing Kernels".
1964 • Geometrical interpretation of kernels as inner products in a feature space — Aizerman, Braverman and Rozonoer. "Theoretical Foundations of the Potential Function Method in Pattern Recognition Learning".
1964 • Original SVM algorithm — Vladimir Vapnik and Alexey Chervonenkis. "A Note on One Class of Perceptrons".


Table: Timeline of Support Vector Machines Algorithm Development (continued)

1965 • Cover's Theorem — Thomas Cover. "Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition".
1992 • Support Vector Machines — Bernhard Boser, Isabelle Guyon and Vladimir Vapnik. "A Training Algorithm for Optimal Margin Classifiers".
1995 • Soft-Margin Support Vector Machines — Corinna Cortes and Vladimir Vapnik. "Support-Vector Networks".


Software

▶ LibSVM — https://www.csie.ntu.edu.tw/~cjlin/libsvm/
▶ SVMlight — http://svmlight.joachims.org/
▶ scikit-learn — http://scikit-learn.org/stable/modules/svm.html
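For comparison, the same kind of model via scikit-learn's SVC takes a few lines; the toy dataset and hyperparameter values below are illustrative only:

```python
# Minimal scikit-learn usage sketch: an RBF-kernel SVM on a toy dataset.
from sklearn import svm
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=200, noise=0.15, random_state=0)
clf = svm.SVC(kernel='rbf', C=1.0, gamma=2.0).fit(X, y)
print(clf.score(X, y))     # training accuracy
print(len(clf.support_))   # number of support vectors
```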


How to start

▶ Introduction to Support Vector Machines — https://beta.oreilly.com/learning/intro-to-svm
▶ Lutz H. Hamel, Knowledge Discovery with Support Vector Machines.
▶ John Shawe-Taylor and Nello Cristianini, Kernel Methods for Pattern Analysis.


Kernels and Deep Learning

▶ Julien Mairal et al., Convolutional Kernel Networks.
▶ Zichao Yang et al., Deep Fried Convnets.


Convolutional Kernel Networks


Deep Fried Convnets

▶ Quoc Viet Le et al., Fastfood: Approximate Kernel Expansions in Loglinear Time.


Questions?

Thanks
