
Deep Networks and Kernel Methods

Edgar Marca

edgarms@lncc.br

Grupo de Reconocimiento de Patrones e Inteligencia Artificial Aplicada (Pattern Recognition and Applied Artificial Intelligence Group) — PUC, Lima, Perú

June 18th, 2015


Table of Contents

▶ Image Classification Problem
▶ Deep Networks
  ▶ Convolutional Neural Networks
  ▶ Software
  ▶ How to start
▶ Kernel Methods
  ▶ SVM
  ▶ The Kernel Trick
  ▶ History of Kernel Methods
  ▶ Software
  ▶ How to start
▶ Kernels and Deep Learning
  ▶ Convolutional Kernel Networks
  ▶ Deep Fried Convnets

Image Classification Problem

Figure: http://www.image-net.org/

Deep Networks

Human-level control through deep reinforcement learning

▶ Volodymyr Mnih et al., Human-level control through deep reinforcement learning.


Convolutional Neural Networks


Software

▶ Torch7 — http://torch.ch/
▶ Caffe — http://caffe.berkeleyvision.org/
▶ Minerva — https://github.com/dmlc/minerva
▶ Theano — http://deeplearning.net/software/theano/
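As a quick taste of one of these libraries, here is a minimal Theano sketch (the expression and input values are illustrative, not from the slides): Theano builds a symbolic computation graph and compiles it into a callable function, which is the core workflow behind training deep networks with it.

```python
# Minimal Theano sketch: build a symbolic expression, compile it, run it.
# The expression is a toy example; real networks compose many such ops.
import theano
import theano.tensor as T

x = T.dvector('x')            # symbolic double-precision vector
y = T.tanh(T.dot(x, x))       # a toy symbolic expression
f = theano.function([x], y)   # compile the graph into a callable
print(f([0.1, 0.2]))          # evaluates tanh(0.05)
```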


How to start I

▶ Deep Learning Course by Nando de Freitas — https://www.youtube.com/watch?v=PlhFWT7vAEw&list=PLjK8ddCbDMphIMSXn-w1IjyYpHU3DaUYw
▶ Alex Smola Lecture on Deep Networks — https://www.youtube.com/watch?v=xZzZb7wZ6eE
▶ Convolutional Neural Networks for Visual Recognition — http://vision.stanford.edu/teaching/cs231n/
▶ Deep Learning, Spring 2015 — http://cilvr.cs.nyu.edu/doku.php?id=courses:deeplearning2015:start

How to start II

▶ Deep Learning for Natural Language Processing — http://cs224d.stanford.edu/
▶ Applied Deep Learning for Computer Vision with Torch — http://torch.ch/docs/cvpr15.html
▶ Deep Learning, an MIT Press book in preparation — http://www.iro.umontreal.ca/~bengioy/dlbook/
▶ Reading List — http://deeplearning.net/reading-list/

Kernel Methods


Linear Support Vector Machine

Figure: Linear Support Vector Machine. The decision surface ⟨w, x⟩ + b = 0 lies halfway between the margin hyperplanes ⟨w, x⟩ + b = 1 and ⟨w, x⟩ + b = −1; the distance between them is the margin.

Linear SVM — Primal Problem

Given a linearly separable training set

D = {(x1, y1), (x2, y2), …, (xl, yl)} ⊂ R^n × {+1, −1},

we can compute the maximum-margin decision surface ⟨w*, x⟩ = b* by solving the convex program

(P)  min_{w,b}  φ(w, b) = (1/2)⟨w, w⟩
     subject to  ⟨w, yi xi⟩ ≥ 1 + yi b,    (1)

where (xi, yi) ∈ D ⊂ R^n × {−1, +1}.

1. The objective function does not depend on b.
2. The offset b appears only in the constraints.
3. The number of constraints equals the number of training points.
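As a sketch of how (P) can be solved in practice, the snippet below sets it up for cvxopt's generic QP solver; the solver choice and all names are assumptions for illustration, since the slides do not prescribe an implementation. The optimization variable is z = (w, b).

```python
# Sketch: solving the primal (P) with cvxopt's QP solver.
# Solves min (1/2) z^T P z  subject to  G z <= h, where z = (w, b).
import numpy as np
from cvxopt import matrix, solvers

def linear_svm_primal(X, y):
    """X: (l, n) inputs, y: (l,) labels in {-1, +1}. Returns w*, b*."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    l, n = X.shape
    P = np.zeros((n + 1, n + 1))
    P[:n, :n] = np.eye(n)          # objective penalizes only w, not b
    q = np.zeros(n + 1)
    # <w, y_i x_i> >= 1 + y_i b  rewritten as  (-y_i x_i, y_i) z <= -1
    G = np.hstack([-y[:, None] * X, y[:, None]])
    h = -np.ones(l)
    sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))
    z = np.array(sol['x']).ravel()
    return z[:n], z[n]             # w*, b*
```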


Linear SVM — Dual Problem

(DP)  max_α  h(α) = Σ_{i=1}^{l} αi − (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} αi αj yi yj ⟨xi, xj⟩
      subject to  Σ_{i=1}^{l} αi yi = 0,
                  αi ≥ 0 for i = 1, …, l.

The offset b* is computed in terms of w*, as follows:

b+ = min { ⟨w*, x⟩ | (x, y) ∈ D, y = +1 }
b− = max { ⟨w*, x⟩ | (x, y) ∈ D, y = −1 }

Then b* = (b+ + b−)/2.

The training vectors associated with αi > 0 are called support vectors.
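A matching sketch for the dual (DP), again using cvxopt as an assumed solver. It recovers b* by the b+/b− rule above; the relation w* = Σ_{i=1}^{l} αi yi xi is the standard stationarity condition of the primal, which the slide uses implicitly.

```python
# Sketch: solving the dual (DP), then recovering w* and b*.
import numpy as np
from cvxopt import matrix, solvers

def linear_svm_dual(X, y):
    """X: (l, n) inputs, y: (l,) labels in {-1, +1}. Returns alpha, w*, b*."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    l = X.shape[0]
    Yx = y[:, None] * X                             # rows y_i x_i
    P = matrix(Yx @ Yx.T)                           # P_ij = y_i y_j <x_i, x_j>
    q = matrix(-np.ones(l))                         # maximize h == minimize -h
    G, h = matrix(-np.eye(l)), matrix(np.zeros(l))  # alpha_i >= 0
    A, b = matrix(y.reshape(1, -1)), matrix(0.0)    # sum_i alpha_i y_i = 0
    alpha = np.array(solvers.qp(P, q, G, h, A, b)['x']).ravel()
    w = (alpha * y) @ X                             # w* = sum_i alpha_i y_i x_i
    b_plus = (X[y == +1] @ w).min()                 # b+ over the positive class
    b_minus = (X[y == -1] @ w).max()                # b- over the negative class
    return alpha, w, (b_plus + b_minus) / 2         # b* = (b+ + b-) / 2
```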

Linear Support Vector Machine

f̂(x) = sign( Σ_{i=1}^{l} αi* yi ⟨xi, x⟩ − b* )
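Given the dual solution, evaluating f̂ is a one-liner; a minimal sketch with illustrative names:

```python
# Sketch: f_hat(x) = sign( sum_i alpha_i* y_i <x_i, x> - b* )
import numpy as np

def svm_decision(x, X, y, alpha, b_star):
    return np.sign((alpha * y) @ (X @ x) - b_star)
```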


The Kernel Trick


Motivation

▶ How can we separate data that is not linearly separable?
▶ How can we use algorithms that work only on linearly separable data and depend only on inner products?


R to R² Case
How to separate the two classes?

φ(x) = (x, x²)

Figure: Separating the two classes of points by transforming them into a higher-dimensional space where the data is separable.
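A small numeric sketch of this lift (the sample points are made up): a class living inside an interval of R is not linearly separable from the rest, but after φ it sits below the line x2 = 1 while the other class sits above it.

```python
# Sketch of phi(x) = (x, x^2): an interval class becomes linearly separable.
import numpy as np

x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
y = np.where(np.abs(x) < 1, 1, -1)     # class +1 lies inside (-1, 1)
phi = np.stack([x, x ** 2], axis=1)    # lift R -> R^2
# In R^2 the horizontal line x2 = 1 separates the two classes.
assert all(phi[y == 1][:, 1] < 1) and all(phi[y == -1][:, 1] > 1)
```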


R² to R³ Case

Figure: Data which is not linearly separable.


R² to R³ Case
A simulation

Figure: SVM with polynomial kernel visualization.


Idea


Figure: φ is a non-linear mapping from the input space to the feature space.


Non-Linear Support Vector Machine

Now we can use a non-linear function φ to map the data from the input space to a higher-dimensional space:

f̂(x) = sign( Σ_{i=1}^{l} αi* yi ⟨φ(xi), φ(x)⟩ − b* )


Definition 3.1 (Kernel)
Let X be a non-empty set. A function k : X × X → K is called a kernel on X if and only if there exist a Hilbert space H and a mapping Φ : X → H such that for all s, t ∈ X

k(t, s) := ⟨Φ(t), Φ(s)⟩_H    (2)

The function Φ is called a feature map and H a feature space of k.


Example 3.2
Consider X = R and the function k defined by

k(s, t) = st = ⟨ (s/√2, s/√2), (t/√2, t/√2) ⟩,

where the feature maps are Φ(s) = s and Φ̃(s) = (s/√2, s/√2), and the feature spaces are H = R and H̃ = R² respectively. A kernel may therefore have more than one feature map and feature space.
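The point of the example, that one kernel can have several feature maps, is easy to check numerically; a sketch with arbitrary test values:

```python
# Both feature maps from Example 3.2 realize the same kernel k(s, t) = s*t.
import numpy as np

def phi(s):                       # feature map into H = R
    return np.array([s])

def phi_tilde(s):                 # feature map into H~ = R^2
    return np.array([s, s]) / np.sqrt(2)

s, t = 1.7, -0.4                  # arbitrary test points
assert np.isclose(phi(s) @ phi(t), s * t)
assert np.isclose(phi_tilde(s) @ phi_tilde(t), s * t)
```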


Non-Linear Support Vector Machines
Using the kernel trick, we can replace ⟨φ(xi), φ(x)⟩ by a kernel k(xi, x):

f̂(x) = sign( Σ_{i=1}^{l} αi* yi k(xi, x) − b* )
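Putting the pieces together: the dual sketch from the linear case carries over with ⟨xi, xj⟩ replaced by k(xi, xj). The Gaussian kernel below is one common choice, not something the slides fix, and b* is recovered by the same b+/b− rule, with Σ_{i} αi yi k(xi, ·) standing in for ⟨w*, ·⟩.

```python
# Sketch: kernel SVM via the dual, reusing the QP setup from the linear case.
import numpy as np
from cvxopt import matrix, solvers

def gaussian_kernel(A, B, gamma=1.0):
    """k(a, b) = exp(-gamma * ||a - b||^2), computed for all pairs."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def kernel_svm_fit(X, y, kernel=gaussian_kernel):
    X, y = np.asarray(X, float), np.asarray(y, float)
    l = len(y)
    K = kernel(X, X)
    P = matrix(np.outer(y, y) * K)                  # P_ij = y_i y_j k(x_i, x_j)
    q = matrix(-np.ones(l))
    G, h = matrix(-np.eye(l)), matrix(np.zeros(l))  # alpha_i >= 0
    A, b = matrix(y.reshape(1, -1)), matrix(0.0)    # sum_i alpha_i y_i = 0
    alpha = np.array(solvers.qp(P, q, G, h, A, b)['x']).ravel()
    f0 = (alpha * y) @ K                            # <w*, phi(x_j)> per column j
    b_star = (f0[y == +1].min() + f0[y == -1].max()) / 2
    return alpha, b_star

def kernel_svm_predict(x, X, y, alpha, b_star, kernel=gaussian_kernel):
    k_x = kernel(X, np.atleast_2d(x)).ravel()       # k(x_i, x) for all i
    return np.sign((alpha * y) @ k_x - b_star)
```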


History of Kernel Methods


Timeline

Table: Timeline of Support Vector Machines Algorithm Development

1909 • Mercer's Theorem — James Mercer. "Functions of Positive and Negative Type, and their Connection with the Theory of Integral Equations".
1950 • Moore–Aronszajn Theorem — Nachman Aronszajn. "Theory of Reproducing Kernels".
1964 • Geometrical interpretation of kernels as inner products in a feature space — Aizerman, Braverman and Rozonoer. "Theoretical Foundations of the Potential Function Method in Pattern Recognition Learning".
1964 • Original SVM algorithm — Vladimir Vapnik and Alexey Chervonenkis. "A Note on One Class of Perceptrons".


Table: Timeline of Support Vector Machines Algorithm Development (continued)

1965 • Cover's Theorem — Thomas Cover. "Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition".
1992 • Support Vector Machines — Bernhard Boser, Isabelle Guyon and Vladimir Vapnik. "A Training Algorithm for Optimal Margin Classifiers".
1995 • Soft-Margin Support Vector Machines — Corinna Cortes and Vladimir Vapnik. "Support-Vector Networks".


Software

▶ LibSVM — https://www.csie.ntu.edu.tw/~cjlin/libsvm/
▶ SVMlight — http://svmlight.joachims.org/
▶ scikit-learn — http://scikit-learn.org/stable/modules/svm.html
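For comparison, the same kind of model via scikit-learn's SVC takes a few lines; the toy dataset and hyperparameter values below are illustrative only:

```python
# Minimal scikit-learn usage sketch: an RBF-kernel SVM on a toy dataset.
from sklearn import svm
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=200, noise=0.15, random_state=0)
clf = svm.SVC(kernel='rbf', C=1.0, gamma=2.0).fit(X, y)
print(clf.score(X, y))     # training accuracy
print(len(clf.support_))   # number of support vectors
```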


How to start

▶ Introduction to Support Vector Machines — https://beta.oreilly.com/learning/intro-to-svm
▶ Lutz H. Hamel, Knowledge Discovery with Support Vector Machines.
▶ John Shawe-Taylor and Nello Cristianini, Kernel Methods for Pattern Analysis.


Kernels and Deep Learning

▶ Julien Mairal et al., Convolutional Kernel Networks.
▶ Zichao Yang et al., Deep Fried Convnets.


Convolutional Kernel Networks


Deep Fried Convnets

▶ Quoc Viet Le et al., Fastfood: Approximate Kernel Expansions in Loglinear Time.


Questions?

Thanks
