Page 1:

Support Vector Machines

Piyush Kumar

Page 2:

Perceptrons revisited

Class 1 : (+1)

Class 2 : (-1)

Is this unique?

Page 3:

Which one is the best?

• Perceptron output: sign(w.x + b)

Page 4:

Perceptrons: What went wrong?

• Slow convergence
• Can overfit
• Can't do complicated functions easily
• Theoretical guarantees are not as strong

Page 5:

Perceptron: The first NN

• Proposed by Frank Rosenblatt in 1957.
• Neural-net researchers accused Rosenblatt of promising "too much".
• Numerous variants.
• Also helps in studying linear programming (LP).
• One of the simplest neural networks.

Page 6:

From Perceptrons to SVMs

• Margins
• Linearization
• Kernels

Page 7:

Support Vector Machines

Margin

Page 8:

Classification Margin

• The distance from an example x to the separator is r = w.x / ||w||.
• Examples closest to the hyperplane are support vectors.
• The margin ρ of the separator is the width of separation between classes.

Page 9:

Support Vector Machines

• Maximizing the margin is good according to intuition and PAC theory.
• It implies that only support vectors are important; the other training examples are ignorable.
• It leads to simple classifiers, and hence arguably better ones (simple = large margin).

Page 10:

Let’s start some math…

m samples: {(x1, y1), (x2, y2), …, (xm, ym)}, with x ∈ R^n and labels y = ±1.

Can we find a hyperplane w.x = 0 that separates the two classes (as labeled by y)? That is:

w.xj > 0 for all j such that yj = +1
w.xj < 0 for all j such that yj = -1

Page 11:

Further assumption 1 (which we will relax later!)

Let's assume that the hyperplane we are looking for passes through the origin.

Page 12:

Further assumption 2

• Let's assume that we are looking for a halfspace that contains a set of points: replace each xj by yj·xj, so that both classes become the single constraint w.xj > 0.

Relax now!!

Page 13:

Let's relax FA 1 now

• "Homogenize" the coordinates by adding a new coordinate to the input.
• Think of it as moving all the red and blue points into one dimension higher.
• From 2D to 3D, it is just the x-y plane shifted to z = 1. This takes care of the "bias", i.e., our assumption that the halfspace can pass through the origin.

Page 14:

Further Assumption 3

• Assume all points lie on a unit sphere!
• If they do not after applying the transformations for FA 1 and FA 2, rescale them so that they do.

Relax now!

Page 15:

What did we want?

• Maximize the margin.

• What does it mean in the new space?

Page 16:

What’s the new optimization problem?

• Max ρ
• subject to: xi.w >= ρ for all i
• (Note that we have gotten rid of the y's by mirroring around the origin.)
• Here w is a unit vector: ||w|| = 1.

Page 17:

Same Problem

• Min 1/ρ, subject to: xi.((1/ρ)w) >= 1
• Let v = (1/ρ) w.
• Then the constraint becomes xi.v >= 1.
• Objective: Min 1/ρ = Min ||(1/ρ)w|| = Min ||v||, which is the same as Min ||v||^2.

Page 18:

New formulation

Min ||v||^2, subject to: v.xi >= 1 for all i

Using MATLAB, this is a piece of cake to solve. Decision boundary: sign(v.x).

Only for support vectors is v.xi = 1.
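The slides rely on MATLAB for this step; as a minimal sketch (the toy data, and the choice of SciPy's generic constrained solver, are illustrative assumptions, not the course's method), the QP can be solved like this:

```python
import numpy as np
from scipy.optimize import minimize

# Toy 2-D data. Homogenizing (FA 1) folds the bias into v, and mirroring by
# the labels (FA 2) folds in the y's, so every constraint reads v.xi >= 1.
pts = np.array([[2.0, 1.0], [3.0, 2.0], [-1.0, -1.0], [-2.0, -1.0]])
labels = np.array([1.0, 1.0, -1.0, -1.0])
X = labels[:, None] * np.hstack([pts, np.ones((len(pts), 1))])

objective = lambda v: v @ v                             # Min ||v||^2
cons = {"type": "ineq", "fun": lambda v: X @ v - 1.0}   # v.xi - 1 >= 0

res = minimize(objective, x0=np.zeros(3), constraints=[cons])
print("v =", res.x)
print("v.xi =", X @ res.x)  # support vectors sit exactly at 1
```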

Page 19:

Support Vector Machines

• Linear learning machines, like perceptrons.
• Map non-linearly to a higher dimension to overcome the linearity constraint.
• Select between hyperplanes using the margin as the criterion (this is what perceptrons don't do); from learning theory, maximum margin is good.

Page 20:

Another Reformulation

Unlike perceptrons, SVMs have a unique solution, but they are harder to solve.

(The quadratic program: Min ||v||^2 subject to v.xi >= 1 for all i.)

Page 21:

Support Vector Machines

• There are very simple algorithms to solve SVMs (as simple as perceptrons).
• If you are interested in learning about them, come and talk to me. (Beyond the scope of this course.)

Page 22:

Another twist : Linearization

• If the data is separable with, say, a sphere, how would you use an SVM to separate it? (Ellipsoids?)

Page 23:

Linearization, a.k.a. Feature Expansion

Delaunay!? (This is the same lift used to compute Delaunay triangulations.)

Lift the points to a paraboloid in one higher dimension. For instance, if the data is in 2D: (x, y) -> (x, y, x^2 + y^2).
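A small sketch of this lift (the sample points are illustrative):

```python
import numpy as np

def lift(points):
    """Lift 2-D points (x, y) onto the paraboloid (x, y, x^2 + y^2)."""
    x, y = points[:, 0], points[:, 1]
    return np.column_stack([x, y, x ** 2 + y ** 2])

# A circle x^2 + y^2 = r^2 maps to the plane z = r^2, so points inside and
# outside the circle become linearly separable after the lift.
pts = np.array([[0.1, 0.2], [-0.3, 0.4], [1.5, 1.5], [2.0, -1.0]])
print(lift(pts))
```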

Page 24:

Linearization

• Note that replacing x by Φ(x) changes the decision boundary from w.x = 0 to w.Φ(x) = 0.
• This gives us non-linear separators (in the original space) when Φ is non-linear, as in the last example.
• Another feature expansion example:
  – (x, y) -> (x^2, xy, y^2, x, y)
  – What kind of separators are there? (General conic sections.)

Page 25:

Linearization

• The more features, the more power.
• There is a danger of overfitting.
• When there are a lot of features (sometimes even infinitely many), we can use the "kernel trick" to solve the optimization problem faster.
• Let's look back at optimization for a moment…

Page 26:

Lagrange Multipliers

Page 27:

Lagrangian function
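The formula on this slide is an image that did not survive the transcript. For Min ||v||^2 subject to v.xi >= 1, the standard Lagrangian (a reconstruction, not the original slide content) is:

L(v, α) = v.v - Σi αi (v.xi - 1), with multipliers αi >= 0.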

Page 28:

Page 29:

At optimum
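Reconstructing the lost formula: at the optimum, the gradient of L with respect to v vanishes,

∂L/∂v = 2v - Σi αi xi = 0.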

Page 30:

More precisely
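This slide presumably stated the KKT conditions. For this problem they read: αi >= 0, v.xi >= 1, and complementary slackness αi (v.xi - 1) = 0, so αi > 0 only for examples with v.xi = 1.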

Page 31:

The optimization Problem Revisited
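(That is, Min ||v||^2 subject to v.xi >= 1 for all i, now analyzed through its Lagrangian.)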

Page 32:

Removing v
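From the stationarity condition above, the lost formula is presumably v = (1/2) Σi αi xi.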

Page 33:

Support Vectors

v is a linear combination of "some examples", the support vectors. More than likely, if we see too many support vectors, we are overfitting. Simple and short classifiers are preferable.

Page 34:

Substitution
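Substituting v = (1/2) Σi αi xi back into the Lagrangian eliminates v and yields the standard dual (again a reconstruction):

Max Σi αi - (1/4) Σi Σj αi αj (xi.xj), subject to αi >= 0.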

Page 35:

Gram Matrix
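With Gij = xi.xj, the dual depends on the data only through the Gram matrix G: Max Σi αi - (1/4) α^T G α, subject to α >= 0 (a reconstruction consistent with the surrounding slides).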

Page 36:

The decision surface Recovered
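A reconstruction of the recovered surface: since v = (1/2) Σi αi xi, the classifier is f(x) = sign(v.x) = sign(Σi αi (xi.x)), which needs only dot products with the support vectors.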

Page 37:

What is the Gram Matrix reduction good for?

• The kernel trick.
• Even if the number of features is infinite, G is still only m × m, so the optimization problem stays solvable.
• We can sometimes compute G without ever computing the mapped points (by redefining the dot product in the feature space).

Page 38:

Recall

Page 39:

The kernel Matrix

• The trick the ML community uses for linearization is a function that redefines the dot product between points.
• Example: K(x, z) = exp(-||x - z||^2 / 2)
• The mapping no longer needs to be explicitly evaluated. As long as we can compute the dot product between two mapped points, it's enough.
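A sketch of this kernel and the resulting kernel (Gram) matrix; the sigma parameter is an added generalization (sigma = 1 recovers the slide's formula):

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 sigma^2)); sigma = 1 matches the slide."""
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

def gram(X, kernel=gaussian_kernel):
    """Kernel (Gram) matrix G with G[i, j] = K(x_i, x_j)."""
    m = len(X)
    return np.array([[kernel(X[i], X[j]) for j in range(m)] for i in range(m)])

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
print(gram(X))
```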

Page 40:

Example Kernels
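The slide's list is an image that did not survive; standard examples it likely showed include the linear kernel K(x, z) = x.z, the polynomial kernel K(x, z) = (x.z + 1)^d, and the Gaussian (RBF) kernel K(x, z) = exp(-||x - z||^2 / (2σ^2)).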

Page 41:

The decision Surface?
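Presumably this slide showed the kernelized classifier; the standard form is f(x) = sign(Σi αi K(xi, x)), i.e., the decision surface from before with dot products replaced by kernel evaluations.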

Page 42:

Page 43:

A demo using libsvm

• Some implementations of SVM:
  – libsvm
  – svmlight
  – svmtorch

Page 44:

Checkerboard Dataset
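The demo itself is not in the transcript; the hedged stand-in below generates a common checkerboard construction and fits an RBF-kernel SVM with scikit-learn's SVC, which wraps libsvm (the cell count, gamma, and C values are guesses):

```python
import numpy as np
from sklearn.svm import SVC

# Checkerboard data: label = parity of the unit cell containing the point
# (an assumption; the slide's exact dataset is not included).
rng = np.random.default_rng(0)
X = rng.uniform(0, 4, size=(1000, 2))
y = np.where((np.floor(X[:, 0]) + np.floor(X[:, 1])) % 2 == 0, 1, -1)

# RBF-kernel SVM via scikit-learn's SVC (a libsvm wrapper).
clf = SVC(kernel="rbf", gamma=10.0, C=10.0).fit(X, y)
print("train accuracy:", clf.score(X, y))
print("support vectors per class:", clf.n_support_)
```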

Page 45:

k-Nearest Neighbor Algorithm
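The k-NN result shown here can be approximated with a sketch like the following (the dataset construction and the choice of k are assumptions):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Same checkerboard construction as in the previous sketch.
rng = np.random.default_rng(0)
X = rng.uniform(0, 4, size=(1000, 2))
y = np.where((np.floor(X[:, 0]) + np.floor(X[:, 1])) % 2 == 0, 1, -1)

knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print("k-NN train accuracy:", knn.score(X, y))
```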

Page 46:

LSVM on Checkerboard

Page 47:

Conclusions

• SVMs are a step toward improving perceptrons.
• They use large margins for good generalization.
• To make large feature expansions affordable, we can use the Gram matrix formulation of the optimization problem (or use kernels).
• SVMs are popular classifiers because they achieve good accuracy on real-world data.