Introduction to Non-linear Support Vector Machine (SVM)
Author: Jean-Philippe Vert
Bioinformatics Center, Kyoto University, Japan
Advisor: Dr. Hsu
Graduate: Ching-Wen Hong
Outline
• 1. Linear SVM
• 2. Non-linear SVM
• 3. Training a SVM in the feature space
• 4. Kernel
• 5. Popular kernels
• 6. The approach for Non-linear SVM
• 7. Classification with a Polynomial kernel
• 8. Classification with a Gaussian kernel
• 9. Conclusion
Linear SVM
• Input: a training set S = {(x1,y1),…,(xN,yN)}, where each object xi ∈ Rⁿ and each label yi ∈ {−1,+1}.
• Output: a classifier f: Rⁿ → {−1,+1}, which predicts f(x) for any new object x ∈ Rⁿ.
• In computational biology, many problems can be considered as classification problems. For example, medical diagnosis: x is a set of features (age, sex, blood type, genome, …) and y indicates the risk.
• The optimal hyperplane: once αi, i = 1,…,N are found, we can find the optimal hyperplane.
• w is given by w = ∑αiyixi.
• The decision function is f(x) = w․x + b, i.e. f(x) = ∑αiyi(xi․x) + b.
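The dual-form linear decision function f(x) = ∑αiyi(xi․x) + b can be sketched numerically. The multipliers αi and bias b below are illustrative hand-picked values, not the output of a real solver:

```python
import numpy as np

def decision_function(x, X, y, alpha, b):
    """Dual-form linear SVM decision: f(x) = sum_i alpha_i * y_i * (x_i . x) + b."""
    return np.sum(alpha * y * (X @ x)) + b

# Toy linearly separable 2-D training set.
X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
alpha = np.array([0.5, 0.0, 0.5, 0.0])  # illustrative multipliers
b = 0.0

# The primal weight vector is recovered as w = sum_i alpha_i * y_i * x_i.
w = (alpha * y) @ X
```

Evaluating `decision_function` at any point gives the same value as w․x + b, which is exactly the equivalence the slide states.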
Non-linear SVM
• Consider the mapping Φ(x1,x2) = (x1², √2x1x2, x2²), which maps R² into R³.
• The decision function is f(x) = W․Φ(x) + b, where W ∈ R³ and b ∈ R.
• For W = (1,0,1): f(x) = (1,0,1)․(x1², √2x1x2, x2²) + b = x1² + x2² + b.
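The mapping above can be checked numerically. This sketch assumes a bias b = −1 (an illustrative choice) so that W = (1,0,1) yields f(x) = x1² + x2² − 1, i.e. a circular decision boundary in the input space that is a plain hyperplane in the feature space:

```python
import numpy as np

def phi(x1, x2):
    """Feature map R^2 -> R^3: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return np.array([x1**2, np.sqrt(2.0) * x1 * x2, x2**2])

W = np.array([1.0, 0.0, 1.0])
b = -1.0  # assumed bias: the boundary is the unit circle x1^2 + x2^2 = 1

def f(x1, x2):
    """Linear decision function in the feature space: W . phi(x) + b."""
    return W @ phi(x1, x2) + b
```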
Training a SVM in the feature space
• (1) Input: a training set S = {(x1,y1),…,(xN,yN)} that is not linearly separable.
• (2) A mapping Φ(xi) = (Φ1(xi),…,ΦM(xi)), i = 1,…,N.
• (3) The training set Φ(S) = {(Φ(x1),y1),…,(Φ(xN),yN)} can be linearly separable in the feature space.
• (4) The dual problem is to maximize LD = ∑αi − 1/2 ∑αiαjyiyjΦ(xi)․Φ(xj), s.t. 0 ≤ αi ≤ C, i = 1,…,N, and ∑αiyi = 0.
• (5) We can find the decision function f(x) = w․Φ(x) + b = ∑αiyiΦ(xi)․Φ(x) + b.
• K(x,x′) = Φ(x)․Φ(x′) is a Kernel function.
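The dual objective in step (4) can be written compactly with a Gram matrix Kij = Φ(xi)․Φ(xj). A minimal sketch of evaluating LD, with illustrative multipliers rather than an actual optimizer:

```python
import numpy as np

def dual_objective(alpha, y, K):
    """L_D = sum_i alpha_i - 1/2 * sum_ij alpha_i alpha_j y_i y_j K_ij."""
    return alpha.sum() - 0.5 * (alpha * y) @ K @ (alpha * y)

X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
K = X @ X.T                # linear-kernel Gram matrix: K_ij = x_i . x_j
alpha = np.full(4, 0.25)   # illustrative multipliers

# Dual feasibility: 0 <= alpha_i <= C and sum_i alpha_i y_i = 0.
C = 1.0
assert np.all((alpha >= 0) & (alpha <= C)) and abs((alpha * y).sum()) < 1e-12
```

Replacing `K` with any other kernel's Gram matrix leaves this code unchanged, which is the point of working in the dual.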
Kernel
• (1) Kernel: K(x,x′) = Φ(x)․Φ(x′), where (x,x′) are any two points in the input space and Φ(x) is a mapping to a feature space.
• (2) Example: consider the mapping Φ(x1,x2) = (x1², √2x1x2, x2²). Then K(x,x′) = Φ(x)․Φ(x′) = x1²x1′² + 2x1x2x1′x2′ + x2²x2′² = (x․x′)².
• (3) Kernels are usually simple to calculate, even if Φ(x) is not explicitly computed.
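The identity in point (2), K(x,x′) = Φ(x)․Φ(x′) = (x․x′)², is easy to verify numerically:

```python
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel on R^2."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2.0) * x1 * x2, x2**2])

def k(x, xp):
    """Same kernel computed directly in the input space: (x . x')^2."""
    return float(np.dot(x, xp)) ** 2

# The two computations agree on random points.
rng = np.random.default_rng(0)
for _ in range(100):
    x, xp = rng.normal(size=2), rng.normal(size=2)
    assert np.isclose(phi(x) @ phi(xp), k(x, xp))
```

Note that `k` never builds the 3-dimensional feature vectors, which is what "simple to calculate even if Φ(x) is not explicitly computed" means.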
Popular Kernels
• Popular Kernels are used in most SVM packages.
• (1) Polynomial Kernels: Kpoly(x,x′) = (x․x′)^d or Kpoly(x,x′) = (x․x′ + c)^d, where d is the degree of the polynomial and c is a constant.
• Example: d = 2, Kpoly(x,x′) = (x․x′)². The feature space is spanned by all products of 2 variables, i.e. Φ(x) = (x1², √2x1x2, x2²). For instance, a possible decision function: f(x) = x1² + x2² − R².
• Example: d = 2, Kpoly(x,x′) = (x․x′ + 1)². The feature space is spanned by all products of at most 2 variables, i.e. Φ(x) = (x1², √2x1x2, x2², √2x1, √2x2, 1). For instance, a possible decision function: f(x) = W․Φ(x) + b with W ∈ R⁶.
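The d = 2 kernel with constant c = 1 corresponds exactly to the six-dimensional feature map of all monomials of degree ≤ 2; a quick numerical check of that correspondence:

```python
import numpy as np

def phi_c(x):
    """Feature map for K(x,x') = (x.x' + 1)^2 on R^2: monomials of degree <= 2."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([x1**2, s * x1 * x2, x2**2, s * x1, s * x2, 1.0])

def k_poly(x, xp, c=1.0, d=2):
    """Polynomial kernel (x.x' + c)^d computed in the input space."""
    return (float(np.dot(x, xp)) + c) ** d

rng = np.random.default_rng(1)
for _ in range(100):
    x, xp = rng.normal(size=2), rng.normal(size=2)
    assert np.isclose(phi_c(x) @ phi_c(xp), k_poly(x, xp))
```

Expanding (x․x′ + 1)² = (x․x′)² + 2(x․x′) + 1 term by term recovers each coordinate of `phi_c`, including the √2 factors.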
Popular Kernels
• (2) Radial basis function Kernel: K(x,x′) = exp(−‖x − x′‖²/2σ²), where σ is a parameter. This is a typical example where the explicit mapping Φ cannot be computed, but the Kernel function is easy to compute. The decision function is f(x) = ∑αiyi exp(−‖x − xi‖²/2σ²) + b.
• (3) Sigmoid Kernel: K(x,x′) = tanh(κ(x․x′) + θ), where κ is a gain and θ is a threshold. The decision function is f(x) = ∑αiyi tanh(κ(xi․x) + θ) + b.
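Both kernels above are straightforward to implement directly; the parameter values σ, κ, θ here are illustrative defaults, not recommendations:

```python
import numpy as np

def k_rbf(x, xp, sigma=1.0):
    """Radial basis function kernel: exp(-||x - x'||^2 / (2 * sigma^2))."""
    d = np.asarray(x) - np.asarray(xp)
    return np.exp(-np.dot(d, d) / (2.0 * sigma**2))

def k_sigmoid(x, xp, kappa=1.0, theta=0.0):
    """Sigmoid kernel: tanh(kappa * (x . x') + theta)."""
    return np.tanh(kappa * np.dot(x, xp) + theta)
```

Note that `k_rbf(x, x)` is always 1 and decays toward 0 as the points move apart, which is why σ controls how local the resulting decision function is.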
The approach for Non-linear SVM
• The following steps:
• (1) Input a training set S = {(x1,y1),…,(xN,yN)}.
• (2) Choose a Kernel K(․,․).
• (3) Train a SVM in the feature space, i.e. find the decision function f(x) = ∑αiyiK(xi,x) + b.
• (4) Classify any new object and test efficiency on the research data.
• There is usually no automatic way to choose a Kernel and to adjust the corresponding parameters. Therefore we usually have to try different Kernels and parameters.
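The steps above can be sketched end to end. Step (3) normally requires a quadratic-programming solver; this sketch substitutes a simple kernelized perceptron update to obtain the coefficients αi — a stand-in for illustration, not the actual SVM training procedure:

```python
import numpy as np

def k_rbf(x, xp, sigma=1.0):
    """Step (2): the chosen kernel (RBF here)."""
    d = x - xp
    return np.exp(-np.dot(d, d) / (2.0 * sigma**2))

def train_kernel_perceptron(X, y, kernel, epochs=10):
    """Step (3) stand-in: learn alpha_i for f(x) = sum_i alpha_i y_i K(x_i, x)."""
    n = len(X)
    alpha = np.zeros(n)
    for _ in range(epochs):
        for i in range(n):
            f = sum(alpha[j] * y[j] * kernel(X[j], X[i]) for j in range(n))
            if y[i] * f <= 0:   # misclassified: strengthen this example
                alpha[i] += 1.0
    return alpha

# Step (1): a training set that is NOT linearly separable (XOR-like labels).
X = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

alpha = train_kernel_perceptron(X, y, k_rbf)

# Step (4): classify any new object via the kernel expansion.
def predict(x):
    return np.sign(sum(alpha[j] * y[j] * k_rbf(X[j], x) for j in range(len(X))))
```

With a linear kernel this data is hopeless, but the RBF kernel makes the XOR pattern separable in the feature space, which is the whole point of the non-linear approach.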
Classification with a Polynomial kernel
Classification with a Gaussian kernel
Conclusion
• Non-linear SVM is an extremely powerful learning algorithm for binary classification.
• Choosing a good Kernel is important but difficult.
• A principled way to choose Kernels would be a valuable development in machine learning.