
Classification Problem: 2-Category Linearly Separable Case


Page 1: Classification Problem (2-Category Linearly Separable Case)

[Figure: two linearly separable point sets, $A+$ (Benign) and $A-$ (Malignant), separated by the plane $x'w + b = 0$ and bounded by the parallel planes $x'w + b = +1$ and $x'w + b = -1$; $w$ is the normal vector.]

Page 2: Support Vector Machines: Maximizing the Margin between Bounding Planes

[Figure: the point sets $A+$ and $A-$ with the bounding planes $x'w + b = +1$ and $x'w + b = -1$; the distance between the two planes, $\frac{2}{\|w\|_2}$, is the margin.]

Page 3: Algebra of the Classification Problem (2-Category Linearly Separable Case)

Given $m$ points in the $n$-dimensional real space $\mathbb{R}^n$, represented by an $m \times n$ matrix $A$. The membership of each point $A_i$ in the classes $A-$ and $A+$ is specified by an $m \times m$ diagonal matrix $D$:

$$D_{ii} = -1 \ \text{if} \ A_i \in A- \qquad \text{and} \qquad D_{ii} = +1 \ \text{if} \ A_i \in A+$$

Separate $A-$ and $A+$ by two bounding planes such that:

$$A_i w + b \geq +1 \ \text{for} \ D_{ii} = +1; \qquad A_i w + b \leq -1 \ \text{for} \ D_{ii} = -1$$

More succinctly: $D(Aw + eb) \geq e$, where $e = [1, 1, \ldots, 1]' \in \mathbb{R}^m$.

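As a concrete check, here is a minimal NumPy sketch (the toy points and the hand-picked plane are illustrative assumptions, not from the slides) that builds $A$, $D$, and $e$ and tests the condition $D(Aw + eb) \geq e$ componentwise:

```python
import numpy as np

# Toy 2-D data: rows of A are the points, y holds the +1/-1 class labels.
A = np.array([[2.0, 3.0],
              [3.0, 3.5],
              [0.0, 0.5],
              [1.0, 0.0]])
y = np.array([1, 1, -1, -1])

D = np.diag(y)              # m x m diagonal label matrix
e = np.ones(len(y))         # vector of ones in R^m

# A candidate separating plane (w, b), chosen by hand for this toy set.
w = np.array([1.0, 1.0])
b = -3.0

# The separability condition D(Aw + eb) >= e, checked componentwise.
print(D @ (A @ w + e * b) >= e)   # all True => (w, b) separates the data
```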
Page 4: Support Vector Classification (Linearly Separable Case)

Let $S = \{(x^1, y_1), (x^2, y_2), \ldots, (x^l, y_l)\}$ be a linearly separable training sample, represented by the matrices

$$A = \begin{bmatrix} (x^1)' \\ (x^2)' \\ \vdots \\ (x^l)' \end{bmatrix} \in \mathbb{R}^{l \times n}, \qquad D = \begin{bmatrix} y_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & y_l \end{bmatrix} \in \mathbb{R}^{l \times l}$$

Page 5: Support Vector Classification (Linearly Separable Case, Primal)

The hyperplane $(w, b)$ that solves the minimization problem

$$\min_{(w,b) \in \mathbb{R}^{n+1}} \ \tfrac{1}{2}\|w\|_2^2 \quad \text{s.t.} \quad D(Aw + eb) \geq e$$

realizes the maximal margin hyperplane with geometric margin $\gamma = \frac{1}{\|w\|_2}$.

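The slides do not prescribe a solver, but the primal is a small QP that a modeling tool such as cvxpy can handle directly; a sketch, assuming the toy data above:

```python
import cvxpy as cp
import numpy as np

A = np.array([[2.0, 3.0], [3.0, 3.5], [0.0, 0.5], [1.0, 0.0]])
y = np.array([1, 1, -1, -1])
D = np.diag(y)
e = np.ones(len(y))

w = cp.Variable(2)
b = cp.Variable()

# Primal hard-margin SVM: minimize (1/2)||w||^2 s.t. D(Aw + eb) >= e.
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)),
                  [D @ (A @ w + b * e) >= e])
prob.solve()

gamma = 1.0 / np.linalg.norm(w.value)   # geometric margin 1/||w||_2
print(w.value, b.value, gamma)
```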
Page 6: Support Vector Classification (Linearly Separable Case, Dual Form)

The dual problem of the previous mathematical program:

$$\max_{\alpha \in \mathbb{R}^l} \ e'\alpha - \tfrac{1}{2}\alpha' D A A' D \alpha \quad \text{s.t.} \quad e'D\alpha = 0, \ \alpha \geq 0$$

Applying the KKT optimality conditions, we have $w = A'D\alpha$. But where is $b$?

Don't forget the complementarity condition: $0 \leq \alpha \perp D(Aw + eb) - e \geq 0$.

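A sketch of the dual under the same assumptions (cvxpy, toy data); the quadratic term $\alpha'DAA'D\alpha = \|A'D\alpha\|_2^2$ is written with sum_squares so the solver sees its concavity, and $b$ is then recovered from the complementarity condition via any support vector:

```python
import cvxpy as cp
import numpy as np

A = np.array([[2.0, 3.0], [3.0, 3.5], [0.0, 0.5], [1.0, 0.0]])
y = np.array([1, 1, -1, -1])
D = np.diag(y)
e = np.ones(len(y))

alpha = cp.Variable(len(y))

# Dual objective: e'alpha - (1/2) ||A'D alpha||^2.
objective = cp.Maximize(e @ alpha - 0.5 * cp.sum_squares(A.T @ (D @ alpha)))
cp.Problem(objective, [e @ (D @ alpha) == 0, alpha >= 0]).solve()

w = A.T @ D @ alpha.value     # KKT: w = A'D alpha*

# Complementarity: alpha_i > 0 forces the i-th constraint to be active,
# i.e. y_i (A_i w + b) = 1, so b = y_i - A_i w for any support vector i.
i = int(np.argmax(alpha.value))
b = y[i] - A[i] @ w
print(w, b)
```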
Page 7: Dual Representation of SVM (Key of Kernel Methods)

The hypothesis is determined by $(\alpha^*, b^*)$:

$$h(x) = \operatorname{sgn}\big(\langle x, A'D\alpha^* \rangle + b^*\big) = \operatorname{sgn}\Big(\sum_{i=1}^{l} y_i \alpha_i^* \langle x^i, x \rangle + b^*\Big) = \operatorname{sgn}\Big(\sum_{\alpha_i^* > 0} y_i \alpha_i^* \langle x^i, x \rangle + b^*\Big)$$

$$w = A'D\alpha^* = \sum_{i=1}^{l} y_i \alpha_i^* A_i'$$

Remember: $A_i' = x^i$.

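A minimal implementation of this decision rule (the function name and tolerance are illustrative assumptions):

```python
import numpy as np

def h(x, A, y, alpha, b):
    """Dual-form SVM decision rule: sgn(sum_i y_i alpha_i <x^i, x> + b).

    Only the support vectors (alpha_i > 0) contribute to the sum.
    """
    sv = alpha > 1e-8                    # tolerance for "alpha_i > 0"
    return np.sign((y[sv] * alpha[sv]) @ (A[sv] @ x) + b)
```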
Page 8: Compute the Geometric Margin via the Dual Solution

The geometric margin is $\gamma = \frac{1}{\|w^*\|_2}$, and $\langle w^*, w^* \rangle = (\alpha^*)' D A A' D \alpha^*$, hence we can compute $\gamma$ using $\alpha^*$. Use the KKT conditions again (in the dual)!

$$0 \leq \alpha^* \perp D(AA'D\alpha^* + b^* e) - e \geq 0$$

Don't forget $e'D\alpha^* = 0$. Then

$$\gamma = (e'\alpha^*)^{-\frac{1}{2}} = \Big(\sum_{\alpha_i^* > 0} \alpha_i^*\Big)^{-\frac{1}{2}}$$

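The step behind this formula can be filled in explicitly: take the inner product of the complementarity condition with $\alpha^*$ (the bracketed term vanishes by complementarity) and use $e'D\alpha^* = 0$:

$$\|w^*\|_2^2 = (\alpha^*)'DAA'D\alpha^* = \underbrace{(\alpha^*)'\big[D(AA'D\alpha^* + b^*e) - e\big]}_{=\,0} - b^*\underbrace{e'D\alpha^*}_{=\,0} + e'\alpha^* = e'\alpha^*$$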
Page 9: Soft Margin SVM (Nonseparable Case)

If the data are not linearly separable, the primal problem is infeasible and the dual problem is unbounded above. Introduce a slack variable $\xi_i$ for each training point:

$$y_i(w'x^i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0 \ \forall i$$

The relaxed inequality system is always feasible; e.g., $w = 0$, $b = 0$, and $\xi = e$ satisfy it.

Page 10: [Figure: a nonseparable data set with margin $\gamma$ on each side of the separating plane; the points $x^j$ and $o^i$ that fall past their bounding planes incur slacks $\xi_j$ and $\xi_i$.]

Page 11: Two Different Measures of Training Error

2-Norm Soft Margin:

$$\min_{(w,b,\xi) \in \mathbb{R}^{n+1+l}} \ \tfrac{1}{2}\|w\|_2^2 + \tfrac{C}{2}\|\xi\|_2^2 \quad \text{s.t.} \quad D(Aw + eb) + \xi \geq e$$

1-Norm Soft Margin:

$$\min_{(w,b,\xi) \in \mathbb{R}^{n+1+l}} \ \tfrac{1}{2}\|w\|_2^2 + C e'\xi \quad \text{s.t.} \quad D(Aw + eb) + \xi \geq e, \ \xi \geq 0$$

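Both programs fit in a few lines of cvxpy; this sketch (function name and interface are assumptions, not from the slides) shows that they differ only in the slack penalty and in the 1-norm case needing the explicit $\xi \geq 0$ constraint:

```python
import cvxpy as cp
import numpy as np

def soft_margin_svm(A, y, C=1.0, norm=2):
    """Solve the 1-norm or 2-norm soft margin primal with cvxpy."""
    l, n = A.shape
    D, e = np.diag(y), np.ones(l)
    w, b, xi = cp.Variable(n), cp.Variable(), cp.Variable(l)

    # The two formulations differ only in how the slack xi is penalized.
    if norm == 2:
        objective = 0.5 * cp.sum_squares(w) + (C / 2) * cp.sum_squares(xi)
        constraints = [D @ (A @ w + b * e) + xi >= e]  # xi >= 0 is redundant
    else:
        objective = 0.5 * cp.sum_squares(w) + C * cp.sum(xi)
        constraints = [D @ (A @ w + b * e) + xi >= e, xi >= 0]

    cp.Problem(cp.Minimize(objective), constraints).solve()
    return w.value, b.value, xi.value
```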
Page 12: 2-Norm Soft Margin Dual Formulation

The Lagrangian for the 2-norm soft margin:

$$L(w, b, \xi, \alpha) = \tfrac{1}{2}w'w + \tfrac{C}{2}\xi'\xi + \alpha'\big[e - D(Aw + eb) - \xi\big], \quad \text{where } \alpha \geq 0$$

Set the partial derivatives with respect to the primal variables to zero:

$$\frac{\partial L}{\partial w} = w - A'D\alpha = 0, \qquad \frac{\partial L}{\partial b} = -e'D\alpha = 0, \qquad \frac{\partial L}{\partial \xi} = C\xi - \alpha = 0$$

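Substituting $w = A'D\alpha$ and $\xi = \alpha/C$ back into $L$, and using $e'D\alpha = 0$ together with $D^2 = I$ (so $\alpha'\alpha = \alpha'D I D\alpha$), collapses the Lagrangian to the dual objective of the next page:

$$L = e'\alpha - \tfrac{1}{2}\alpha'DAA'D\alpha - \tfrac{1}{2C}\alpha'\alpha = e'\alpha - \tfrac{1}{2}\alpha'D\big(AA' + \tfrac{1}{C}I\big)D\alpha$$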
Page 13: Dual Maximization Problem for the 2-Norm Soft Margin

Dual:

$$\max_{\alpha \in \mathbb{R}^l} \ e'\alpha - \tfrac{1}{2}\alpha' D\big(AA' + \tfrac{1}{C}I\big)D\alpha \quad \text{s.t.} \quad e'D\alpha = 0, \ \alpha \geq 0$$

The corresponding KKT complementarity condition:

$$0 \leq \alpha \perp D(Aw + eb) + \xi - e \geq 0$$

Use the above conditions to find $b^*$.

Page 14: Linear Machine in Feature Space

Let $\phi: X \rightarrow F$ be a nonlinear map from the input space to some feature space. The classifier will be of the form (primal):

$$f(x) = \Big(\sum_i w_i \phi_i(x)\Big) + b$$

Make it into the dual form:

$$f(x) = \Big(\sum_{i=1}^{l} \alpha_i y_i \langle \phi(x^i) \cdot \phi(x) \rangle\Big) + b$$

Page 15: Classification Problem 2-Category Linearly Separable Case A- A+ Malignant Benign

K (x;z) =êþ(x) áþ(z)

ë

Kernel: Represent Inner Product in Feature Space

The classifier will become:

f (x) =ð P

i=1

lë iyiK (xi;x)

ñ+ b

Definition: A kernel is a functionK : X â X ! Rsuch thatfor all x;z 2 X

where þ : X ! F

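A quick numerical check of this definition for the homogeneous quadratic kernel on $\mathbb{R}^2$, whose explicit feature map $\phi(x) = (x_1^2, x_2^2, \sqrt{2}\,x_1 x_2)$ is known in closed form (the test points are arbitrary):

```python
import numpy as np

def K(x, z):
    """Homogeneous quadratic kernel on R^2: K(x, z) = (x'z)^2."""
    return (x @ z) ** 2

def phi(x):
    """Explicit feature map into R^3 realizing that kernel."""
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

# The kernel value and the feature-space inner product coincide.
print(K(x, z), phi(x) @ phi(z))   # both print 1.0
```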
Page 16: Introduce the Kernel into the Dual Formulation

Let $S = \{(x^1, y_1), (x^2, y_2), \ldots, (x^l, y_l)\}$ be a training sample that is linearly separable in the feature space implicitly defined by the kernel $K(x, z)$. The SV classifier is determined by the $\alpha^*$ that solves

$$\max_{\alpha \in \mathbb{R}^l} \ e'\alpha - \tfrac{1}{2}\alpha' D K(A, A') D\alpha \quad \text{s.t.} \quad e'D\alpha = 0, \ \alpha \geq 0$$

Page 17: Classification Problem 2-Category Linearly Separable Case A- A+ Malignant Benign

The value of kernel function represents the inner product in feature space

Kernel functions merge two steps 1. map input data from input space to feature space (might be infinite dim.) 2. do inner product in the feature space

Kernel TechniqueBased on Mercer’s Condition (1909)

Page 18: Mercer's Condition Guarantees the Convexity of the QP

Let $X = \{x^1, x^2, \ldots, x^n\}$ be a finite space and $k(x, z)$ a symmetric function on $X$. Then $k(x, z)$ is a kernel function if and only if the matrix $K \in \mathbb{R}^{n \times n}$, $K_{ij} = k(x^i, x^j)$, is positive semi-definite.

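This condition is directly checkable in code; a sketch (function names assumed) that builds the Gram matrix on a finite point set and tests symmetry and positive semi-definiteness:

```python
import numpy as np

def is_kernel(k, X, tol=1e-10):
    """Mercer check on a finite set: build the Gram matrix K_ij = k(x^i, x^j)
    and test that it is symmetric positive semi-definite."""
    K = np.array([[k(xi, xj) for xj in X] for xi in X])
    symmetric = np.allclose(K, K.T)
    psd = np.all(np.linalg.eigvalsh(K) >= -tol)
    return symmetric and psd

X = [np.array([0.0, 1.0]), np.array([1.0, 1.0]), np.array([2.0, 0.5])]
print(is_kernel(lambda x, z: (x @ z) ** 2, X))           # True: valid kernel
print(is_kernel(lambda x, z: -np.sum((x - z) ** 2), X))  # False: not PSD
```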
Page 19: Introduce the Kernel into the Dual Formulation for the 2-Norm Soft Margin

The feature space is implicitly defined by $k(x, z)$. Suppose $\alpha^*$ solves the QP problem:

$$\max_{\alpha \in \mathbb{R}^l} \ e'\alpha - \tfrac{1}{2}\alpha' D\big(K(A, A') + \tfrac{1}{C}I\big)D\alpha \quad \text{s.t.} \quad e'D\alpha = 0, \ \alpha \geq 0$$

Then the decision rule is defined by

$$h(x) = \operatorname{sgn}\big(K(x, A')D\alpha^* + b^*\big)$$

Use the above conditions to find $b^*$.

Page 20: Introduce the Kernel into the Dual Formulation for the 2-Norm Soft Margin (cont.)

$b^*$ is chosen so that

$$y_i\big[K(A_i, A')D\alpha^* + b^*\big] = 1 - \frac{\alpha_i^*}{C} \quad \text{for any } i \text{ with } \alpha_i^* \neq 0$$

Because:

$$0 \leq \alpha^* \perp D\big(K(A, A')D\alpha^* + eb^*\big) + \xi^* - e \geq 0 \quad \text{and} \quad \alpha^* = C\xi^*$$

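A sketch of this recovery step, assuming the Gram matrix $K(A, A')$ and the dual solution $\alpha^*$ are already in hand (variable names are illustrative):

```python
import numpy as np

def recover_b(K_gram, y, alpha, C):
    """Recover b* for the kernelized 2-norm soft margin.

    For any support vector i (alpha_i > 0), complementarity gives
    y_i (K_i' D alpha + b) = 1 - alpha_i / C; dividing by y_i = +/-1,
    b = y_i (1 - alpha_i / C) - K_i' D alpha.
    """
    i = int(np.argmax(alpha))                # a support vector index
    Ki_D_alpha = K_gram[i] @ (y * alpha)     # K(A_i, A') D alpha*
    return y[i] * (1 - alpha[i] / C) - Ki_D_alpha
```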
Page 21: Geometric Margin in Feature Space for the 2-Norm Soft Margin

The geometric margin in the feature space is defined by

$$\gamma = \frac{1}{\|w^*\|_2} = \Big(e'\alpha^* - \tfrac{1}{C}\|\alpha^*\|_2^2\Big)^{-\frac{1}{2}}$$

since

$$\|w^*\|_2^2 = (\alpha^*)' D K(A, A') D \alpha^* = \cdots = e'\alpha^* - \tfrac{1}{C}\|\alpha^*\|_2^2$$

Why is $e'\xi^* \geq \|\xi^*\|_2^2$?

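The elided step ($\cdots$) follows the same pattern as in the hard-margin case: take the inner product of the complementarity condition with $\alpha^*$ and use $e'D\alpha^* = 0$ and $\xi^* = \alpha^*/C$:

$$(\alpha^*)'DK(A,A')D\alpha^* = \underbrace{(\alpha^*)'\big[D(K(A,A')D\alpha^* + eb^*) + \xi^* - e\big]}_{=\,0} - b^*\underbrace{e'D\alpha^*}_{=\,0} - \underbrace{(\alpha^*)'\xi^*}_{=\,\frac{1}{C}\|\alpha^*\|_2^2} + e'\alpha^*$$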
Page 22: Discussion of C for the 2-Norm Soft Margin

The only difference between the "hard margin" and the 2-norm soft margin is the objective function of the optimization problem.

A larger C gives a smaller margin in the feature space.

A smaller C gives a better numerical condition: compare $K(A, A')$ with $K(A, A') + \tfrac{1}{C}I$.
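A small numerical illustration of the conditioning claim (the Gaussian kernel and the sample points are assumptions for the demo). Adding $\frac{1}{C}I$ to the symmetric matrix $K$ shifts every eigenvalue up by $\frac{1}{C}$, so the smaller $C$ is, the larger the shift and the smaller the condition number:

```python
import numpy as np

# Gram matrix of a Gaussian kernel on a few close-together points;
# such matrices are nearly singular, i.e. badly conditioned.
X = np.array([[0.0], [0.1], [0.2], [0.3]])
K = np.exp(-np.square(X - X.T))

for C in (0.1, 1.0, 100.0):
    # Compare cond(K) with cond(K + I/C) as C varies.
    print(C, np.linalg.cond(K), np.linalg.cond(K + np.eye(len(K)) / C))
```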