
Neural Networks: Support Vector Machines


Page 1: Neural Networks: Support Vector machines

CHAPTER 06

SUPPORT VECTOR MACHINES

CSC445: Neural Networks

Prof. Dr. Mostafa Gadal-Haqq M. Mostafa

Computer Science Department

Faculty of Computer & Information Sciences

AIN SHAMS UNIVERSITY

(some of the figures in this presentation are copyrighted to Pearson Education, Inc.)

Page 2: Neural Networks: Support Vector machines


Outline

Introduction

Optimal Hyperplane for Linearly Separable Patterns

Quadratic Optimization for Finding the Optimal Hyperplane

Optimal Hyperplane for Nonseparable Patterns

Underlying Philosophy of SVM for Pattern Classification

SVM viewed as Kernel Machine

The XOR problem

Computer Experiment


Page 3: Neural Networks: Support Vector machines


Introduction

The main idea of the SVM may be summed up as follows:

“Given a training sample, the SVM constructs a hyperplane as the decision surface in such a way that the margin of separation between positive and negative examples is maximized.”

Page 4: Neural Networks: Support Vector machines


Linearly Separable Patterns

SVM is a binary learning machine.

Binary classification is the task of separating classes in feature space.

The separating hyperplane is w^T x + b = 0; points on one side satisfy w^T x + b > 0, and points on the other side satisfy w^T x + b < 0.

The decision function is g(x) = w^T x + b.
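As a quick illustration of this decision rule (the weight vector, bias, and test points below are made-up values, not taken from the lecture), the sign of g(x) decides the class:

    import numpy as np

    # Minimal sketch of the linear decision rule: classify a point by the
    # sign of g(x) = w^T x + b.  w, b, and the test points are made-up values.
    w = np.array([1.0, -2.0])
    b = 0.5

    def classify(x):
        g = w @ x + b
        return +1 if g > 0 else -1    # w^T x + b > 0 -> class +1, otherwise class -1

    for x in (np.array([2.0, 0.0]), np.array([0.0, 1.0])):
        print(x, "->", classify(x))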

Page 5: Neural Networks: Support Vector machines


Linearly Separable Patterns

Which of the linear separators is optimal?

Page 6: Neural Networks: Support Vector machines


Optimal Decision Boundary

The optimal decision boundary is the one that maximizes the margin of separation, ρ.

Page 7: Neural Networks: Support Vector machines


The Margin

Any point x can be expressed as x = x_P + r (w / ||w||), where x_P is the normal projection of x onto the separating hyperplane and r is the algebraic (signed) distance from x to the hyperplane.

Page 8: Neural Networks: Support Vector machines


The Margin

Substituting x = x_P + r (w / ||w||) into g(x) = w^T x + b gives

    g(x) = w^T x_P + b + r (w^T w) / ||w||.

Since w^T x_P + b = 0 (x_P lies on the hyperplane), this reduces to

    g(x) = r ||w||,   i.e.,   r = g(x) / ||w||.

Page 9: Neural Networks: Support Vector machines


The Margin

For a support vector x^(s) we have g(x^(s)) = w^T x^(s) + b = ±1, according to whether d^(s) = +1 or −1. The distance from a support vector to the optimal hyperplane is therefore

    r = g(x^(s)) / ||w|| =  +1 / ||w||   if d^(s) = +1
                            −1 / ||w||   if d^(s) = −1.

The two margin hyperplanes are w^T x + b = +1 and w^T x + b = −1, with the decision boundary w^T x + b = 0 midway between them. Then the margin of separation is given as:

    ρ = 2r = 2 / ||w||.
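As a small numeric illustration (the weight vector, bias, and point below are made-up values), the signed distance r and the margin ρ can be computed directly:

    import numpy as np

    # Illustrative values, not from the lecture.
    w = np.array([2.0, 1.0])
    b = -3.0
    x = np.array([3.0, 1.0])

    g = w @ x + b                      # g(x) = w^T x + b
    r = g / np.linalg.norm(w)          # signed distance from x to the hyperplane
    rho = 2.0 / np.linalg.norm(w)      # margin, assuming |g| = 1 at the support vectors

    print(f"g(x) = {g:.3f}, r = {r:.3f}, margin rho = {rho:.3f}")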

Page 10: Neural Networks: Support Vector machines


Optimal Decision Boundary

Let {x_1, ..., x_N} be our data set and let d_i ∈ {+1, −1} be the class label of x_i.

The decision boundary should classify all points correctly.

That is, we have a constrained optimization problem

Maximize ρ = 2r = 2 / ||w||, or equivalently minimize ||w||,

subject to d_i (w^T x_i + b) ≥ 1 for all i.


Page 11: Neural Networks: Support Vector machines


The Optimization Problem

Introduce Lagrange multipliers α_i ≥ 0, one per constraint. That is, form the Lagrangian function

    J(w, b, α) = ½ ||w||² − Σ_{i=1}^{N} α_i [ d_i (w^T x_i + b) − 1 ],

which is to be minimized with respect to w and b, i.e.,

    ∂J(w, b, α)/∂w = 0   and   ∂J(w, b, α)/∂b = 0.
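Carrying out these two differentiations (a standard intermediate step, shown here for completeness) gives the conditions that lead to the dual problem on the next slide:

    ∂J/∂w = w − Σ_{i=1}^{N} α_i d_i x_i = 0   ⇒   w = Σ_{i=1}^{N} α_i d_i x_i

    ∂J/∂b = − Σ_{i=1}^{N} α_i d_i = 0   ⇒   Σ_{i=1}^{N} α_i d_i = 0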


Page 12: Neural Networks: Support Vector machines


Solving the Optimization Problem

Need to optimize a quadratic function subject to linear constraints.

The solution involves constructing a dual problem in which a Lagrange multiplier α_i is associated with every constraint in the primal problem:

Find α_1, ..., α_N such that

    Q(α) = Σ_{i=1}^{N} α_i − ½ Σ_i Σ_j α_i α_j d_i d_j x_i^T x_j

is maximized, subject to

    (1) Σ_{i=1}^{N} α_i d_i = 0
    (2) α_i ≥ 0 for all i.
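As a rough sketch of how this dual could be solved numerically (the six data points below are made up, and a general-purpose optimizer is used instead of a dedicated QP solver), one might write:

    import numpy as np
    from scipy.optimize import minimize

    # Made-up, linearly separable toy data (three points per class).
    X = np.array([[2.0, 2.0], [2.5, 3.0], [3.0, 2.5],      # class +1
                  [0.0, 0.5], [0.5, 0.0], [1.0, 0.5]])     # class -1
    d = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
    N = len(d)

    # H[i, j] = d_i d_j x_i^T x_j
    H = (d[:, None] * d[None, :]) * (X @ X.T)

    def neg_Q(alpha):
        # Negative of Q(alpha); minimizing it maximizes the dual objective.
        return -(alpha.sum() - 0.5 * alpha @ H @ alpha)

    res = minimize(neg_Q, np.zeros(N),
                   bounds=[(0.0, None)] * N,                            # alpha_i >= 0
                   constraints={"type": "eq", "fun": lambda a: a @ d})  # sum_i alpha_i d_i = 0
    alpha = res.x

    # Recover w and b from the dual solution (see Page 13).
    w = (alpha * d) @ X                     # w = sum_i alpha_i d_i x_i
    k = int(np.argmax(alpha))               # index of a support vector (alpha_k > 0)
    b = d[k] - w @ X[k]                     # from d_k (w^T x_k + b) = 1
    print("alpha =", np.round(alpha, 3), " w =", np.round(w, 3), " b =", round(b, 3))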


Page 13: Neural Networks: Support Vector machines


The Optimization Problem

The solution has the form:

    w = Σ_{i=1}^{N} α_i d_i x_i   and   b = d_k − Σ_{i=1}^{N} α_i d_i x_i^T x_k   for any x_k such that α_k ≠ 0.

Each non-zero α_i indicates that the corresponding x_i is a support vector.

The classifying function will then have the form:

    g(x) = Σ_{i=1}^{N} α_i d_i x_i^T x + b

Notice that it relies on an inner product between the test point x and the support vectors x_i. Also keep in mind that solving the optimization problem involved computing the inner products x_i^T x_j between all training points!

Page 14: Neural Networks: Support Vector machines


The Optimization Problem

Support vectors are the samples that have non-zero α_i.

[Figure: ten training points from Class 1 and Class 2; only α_1 = 0.8, α_6 = 1.4, and α_8 = 0.6 are non-zero, so those three points are the support vectors; all other α_i = 0.]

Page 15: Neural Networks: Support Vector machines


Optimal Hyperplane for Nonseparable Patterns

Figure 6.3 Soft margin hyperplane. (a) Data point x_i (belonging to class C1, represented by a small square) falls inside the region of separation, but on the correct side of the decision surface. (b) Data point x_i (belonging to class C2, represented by a small circle) falls on the wrong side of the decision surface.


Page 16: Neural Networks: Support Vector machines


Optimal Hyperplane for Nonseparable Patterns

We allow “errors” ξ_i (slack variables) in the classification.

Page 17: Neural Networks: Support Vector machines


Soft Margin Hyperplane

The old formulation:

    Find w and b such that Φ(w) = ½ w^T w is minimized,
    subject to d_i (w^T x_i + b) ≥ 1 for all (x_i, d_i).

The new formulation, incorporating the slack variables ξ_i:

    Find w and b such that Φ(w, ξ) = ½ w^T w + C Σ_i ξ_i is minimized,
    subject to d_i (w^T x_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0 for all i.

The parameter C can be viewed as a way to control overfitting.
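A minimal sketch of the soft-margin behaviour using an off-the-shelf implementation (scikit-learn's SVC; the overlapping two-class data are synthetic and the C values are arbitrary):

    import numpy as np
    from sklearn.svm import SVC

    # Synthetic, overlapping two-class data (made up for illustration).
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2)),
                   rng.normal(loc=[2.0, 2.0], scale=1.0, size=(50, 2))])
    d = np.array([-1] * 50 + [1] * 50)

    # Larger C penalizes slack more heavily (fewer margin violations);
    # smaller C tolerates more slack (wider margin, more training errors).
    for C in (0.01, 1.0, 100.0):
        clf = SVC(kernel="linear", C=C).fit(X, d)
        print(f"C={C:>6}: {clf.n_support_.sum()} support vectors, "
              f"training accuracy = {clf.score(X, d):.2f}")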

Page 18: Neural Networks: Support Vector machines


Soft Margin Hyperplane

Again, xi with non-zero αi will be support vectors.

Solution to the dual problem is:

w = Σ_i α_i d_i x_i

and

b = d_i (1 − ξ_i) − w^T x_i


Page 19: Neural Networks: Support Vector machines


Extension to Non-linear Decision Boundary

Key idea: transform x_i to a higher-dimensional space.

Input space: the space the points x_i live in.

Feature space: the space of the transformed points φ(x_i).

[Figure: points x in the input space are mapped by φ(·) to points φ(x) in the feature space.]

Page 20: Neural Networks: Support Vector machines


Kernel Trick

The linear classifier relies on inner product between vectors:

K(x_i, x_j) = x_i^T x_j

If every datapoint is mapped into high-dimensional space via some transformation Φ: x → φ(x), the inner product becomes:

K(x_i, x_j) = φ(x_i)^T φ(x_j)

A kernel function is a function that corresponds to an inner product in some feature space.

K(x_i, x_j) needs to satisfy a technical condition (Mercer's condition) in order for the mapping φ(·) to exist.
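A small sketch of the trick in code (the kernel choices and parameter values are illustrative): the Gram matrix of kernel values is computed directly from the data, without ever forming φ(x):

    import numpy as np

    def polynomial_kernel(X, Y, degree=2, c=1.0):
        # K(x, y) = (x^T y + c)^degree
        return (X @ Y.T + c) ** degree

    def rbf_kernel(X, Y, sigma=1.0):
        # K(x, y) = exp(-||x - y||^2 / (2 sigma^2))
        sq = (np.sum(X**2, axis=1)[:, None]
              + np.sum(Y**2, axis=1)[None, :]
              - 2.0 * X @ Y.T)
        return np.exp(-sq / (2.0 * sigma**2))

    X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])   # made-up points
    print(polynomial_kernel(X, X))
    print(rbf_kernel(X, X))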


Page 21: Neural Networks: Support Vector machines


Mercer’s Theorem

The matrix K = [k(x_i, x_j)], for all i and j, has to be non-negative definite (positive semidefinite); that is, it must satisfy

    a^T K a ≥ 0   for every vector a.
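A rough numerical check of this condition (the data points below are arbitrary): the Gram matrix of a valid kernel should have no negative eigenvalues, up to round-off.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(20, 3))                  # arbitrary points

    K = (X @ X.T + 1.0) ** 2                      # Gram matrix of the polynomial kernel (x^T y + 1)^2
    eigvals = np.linalg.eigvalsh(K)               # K is symmetric, so eigvalsh applies
    print("smallest eigenvalue:", eigvals.min())  # >= ~0 indicates positive semidefiniteness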

Kernel functions that satisfy Mercer's condition include, for example, the polynomial kernel and the radial-basis-function (RBF) kernel.


Page 22: Neural Networks: Support Vector machines


The SVM Viewed as a Kernel Machine

Figure 6.5 Architecture of support vector machine, using a radial-basis function network.
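A minimal sketch of the decision function computed by this architecture, g(x) = Σ_i α_i d_i K(x_i, x) + b, with an RBF kernel (the support vectors, multipliers, bias, and kernel width below are made-up values):

    import numpy as np

    sv = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])   # support vectors x_i (made up)
    alpha = np.array([0.7, 0.7, 1.4])                      # multipliers alpha_i (made up)
    d = np.array([1.0, 1.0, -1.0])                         # labels d_i (made up)
    b = 0.1                                                # bias (made up)
    sigma = 1.0                                            # RBF width (made up)

    def K(xi, x):
        # Radial-basis-function kernel, matching the RBF network of Figure 6.5.
        return np.exp(-np.sum((xi - x) ** 2) / (2.0 * sigma**2))

    def g(x):
        # g(x) = sum_i alpha_i d_i K(x_i, x) + b
        return sum(a * di * K(xi, x) for a, di, xi in zip(alpha, d, sv)) + b

    print(g(np.array([0.5, 0.5])))   # the sign of g(x) gives the predicted class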


Page 23: Neural Networks: Support Vector machines


The XOR Problem

For two-dimensional vectors x = [x1, x2], define the following kernel:

    k(x, x_i) = (1 + x^T x_i)^2

We need to show that K(x_i, x_j) = φ(x_i)^T φ(x_j):

    K(x_i, x_j) = (1 + x_i^T x_j)^2
                = 1 + x_i1^2 x_j1^2 + 2 x_i1 x_j1 x_i2 x_j2 + x_i2^2 x_j2^2 + 2 x_i1 x_j1 + 2 x_i2 x_j2
                = [1, x_i1^2, √2 x_i1 x_i2, x_i2^2, √2 x_i1, √2 x_i2]^T [1, x_j1^2, √2 x_j1 x_j2, x_j2^2, √2 x_j1, √2 x_j2]
                = φ(x_i)^T φ(x_j),

where φ(x) = [1, x1^2, √2 x1 x2, x2^2, √2 x1, √2 x2]^T.
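A quick numerical sanity check of this identity on the four XOR input points (a small illustration, not part of the original derivation):

    import numpy as np

    def phi(x):
        x1, x2 = x
        return np.array([1.0, x1**2, np.sqrt(2) * x1 * x2, x2**2,
                         np.sqrt(2) * x1, np.sqrt(2) * x2])

    points = [np.array(p, dtype=float) for p in [(-1, -1), (-1, 1), (1, -1), (1, 1)]]
    for xi in points:
        for xj in points:
            assert np.isclose((1.0 + xi @ xj) ** 2, phi(xi) @ phi(xj))
    print("K(x_i, x_j) = phi(x_i)^T phi(x_j) holds on all XOR points.")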


Page 24: Neural Networks: Support Vector machines


The XOR Problem

This gives the optimal hyperplane as:

    −x1 x2 = 0

This yields:

Figure 6.6 (a) Polynomial machine for solving the XOR problem. (b) Induced images in the feature space due to the four data points of the XOR problem.

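Evaluating the resulting decision function g(x) = −x1 x2 on the four XOR inputs (a small sanity check; the class assignment follows the usual XOR convention that mixed-sign inputs form one class and equal-sign inputs the other):

    # g(x) = -x1 * x2 separates the XOR classes: it is +1 when the inputs
    # differ in sign and -1 when they agree.
    for x1, x2 in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
        print(f"x = ({x1:+d}, {x2:+d})  ->  g(x) = {-x1 * x2:+d}")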

Page 25: Neural Networks: Support Vector machines


Conclusion

SVMs are a useful alternative to neural networks.

Two key concepts of the SVM: maximizing the margin and the kernel trick.

Much active research is taking place in areas related to SVMs.

Many SVM implementations are available on the web for you to try on your data set!


Page 26: Neural Networks: Support Vector machines


Computer Experiment

Figure 6.7 Experiment on SVM for the double-moon of Fig. 1.8 with distance d = –6.


Page 27: Neural Networks: Support Vector machines


Computer Experiment

Figure 6.8 Experiment on SVM for the double-moon of Fig. 1.8 with distance d = –6.5.


Page 28: Neural Networks: Support Vector machines

Next Time: Principal Component Analysis (PCA)