Page 1: Proximal Support Vector Machine Classifiers KDD 2001 San Francisco August 26-29, 2001

Proximal Support Vector Machine Classifiers
KDD 2001

San Francisco August 26-29, 2001

Glenn Fung & Olvi Mangasarian

Data Mining Institute

University of Wisconsin - Madison

Page 2: Support Vector Machines: Maximizing the Margin between Bounding Planes

[Figure: point sets A+ and A- separated by the two bounding planes $x'w = \gamma + 1$ and $x'w = \gamma - 1$; the margin between the planes, $\frac{2}{\|w\|_2}$, is measured along the normal $w$.]

Page 3: Proximal Vector Machines: Fitting the Data using two parallel Bounding Planes

[Figure: the same planes $x'w = \gamma + 1$ and $x'w = \gamma - 1$ now act as proximal planes, each clustered around one of the point sets A+ and A-, again separated by $\frac{2}{\|w\|_2}$.]

Page 4: Standard Support Vector Machine Formulation

The margin is maximized by minimizing $\frac{1}{2}\|(w,\gamma)\|_2^2$.

Solve the quadratic program for some $\nu > 0$:

$$\min_{w,\gamma,y}\; \frac{\nu}{2}\|y\|_2^2 + \frac{1}{2}\|(w,\gamma)\|_2^2 \quad \text{s.t.}\quad D(Aw - e\gamma) + y \ge e \qquad \text{(QP)}$$

where the diagonal matrix $D$ with $D_{ii} = \pm 1$ denotes A+ or A- membership.

Page 5: PSVM Formulation

We have from the QP SVM formulation, with the inequality constraint replaced by an equality:

$$\min_{w,\gamma,y}\; \frac{\nu}{2}\|y\|_2^2 + \frac{1}{2}\|(w,\gamma)\|_2^2 \quad \text{s.t.}\quad D(Aw - e\gamma) + y = e \qquad \text{(QP)}$$

This simple but critical modification changes the nature of the optimization problem tremendously!

Solving for $y$ in terms of $w$ and $\gamma$ gives the unconstrained problem:

$$\min_{w,\gamma}\; \frac{\nu}{2}\|e - D(Aw - e\gamma)\|_2^2 + \frac{1}{2}\|(w,\gamma)\|_2^2$$

Page 6: Advantages of New Formulation

The objective function remains strongly convex.

An explicit exact solution can be written in terms of the problem data.

The PSVM classifier is obtained by solving a single system of linear equations in the usually small-dimensional input space.

Exact leave-one-out correctness can be obtained in terms of the problem data.

Page 7: Linear PSVM

We want to solve:

$$\min_{w,\gamma}\; \frac{\nu}{2}\|e - D(Aw - e\gamma)\|_2^2 + \frac{1}{2}\|(w,\gamma)\|_2^2$$

Setting the gradient equal to zero gives a nonsingular system of linear equations.

Solution of the system gives the desired PSVM classifier.
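
The gradient step is compressed here; spelled out, with $z = (w,\gamma)$ and $H = [A\;\; -e]$ as defined on the next page, and using the fact that $D$ is diagonal with $\pm 1$ entries (so $\|e - DHz\|_2 = \|De - Hz\|_2$):

$$\begin{aligned} f(z) &= \frac{\nu}{2}\|De - Hz\|_2^2 + \frac{1}{2}\|z\|_2^2, \\ \nabla f(z) &= \nu H'(Hz - De) + z = 0 \;\Longrightarrow\; \left(\frac{I}{\nu} + H'H\right) z = H'De, \end{aligned}$$

which is exactly the system solved on the next page.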

Page 8: Linear PSVM Solution

Here, $H = [A\;\; -e]$, and

$$\begin{bmatrix} w \\ \gamma \end{bmatrix} = \left(\frac{I}{\nu} + H'H\right)^{-1} H'De$$

The linear system to solve depends on $H'H$, which is of size $(n+1)\times(n+1)$; $n$ is usually much smaller than $m$.

Page 9: Linear Proximal SVM Algorithm

Input $A, D$.
Define $H = [A\;\; -e]$.
Calculate $v = H'De$.
Solve $\left(\frac{I}{\nu} + H'H\right)\begin{bmatrix} w \\ \gamma \end{bmatrix} = v$.
Classifier: $\text{sign}(w'x - \gamma)$.
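
A minimal NumPy sketch of this algorithm (function and variable names are mine, not from the slides; the labels `d` are assumed to be the diagonal of $D$, i.e. $\pm 1$ per point):

```python
import numpy as np

def linear_psvm(A, d, nu=1.0):
    """Linear PSVM: solve (I/nu + H'H) [w; gamma] = H'De.

    A  : (m, n) data matrix, one point per row
    d  : (m,) labels in {+1, -1}, the diagonal of D
    nu : positive regularization parameter
    """
    m, n = A.shape
    e = np.ones(m)
    H = np.hstack([A, -e[:, None]])     # H = [A  -e], shape (m, n+1)
    v = H.T @ d                         # v = H'De  (note De = d)
    z = np.linalg.solve(np.eye(n + 1) / nu + H.T @ H, v)
    return z[:-1], z[-1]                # w, gamma

def linear_predict(X, w, gamma):
    # Classifier: sign(w'x - gamma), applied row-wise
    return np.sign(X @ w - gamma)
```

The only matrix to factor is $(n+1)\times(n+1)$, so the cost is governed by the input dimension $n$, not the number of points $m$.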

Page 10: Nonlinear PSVM Formulation

Linear PSVM (linear separating surface $x'w = \gamma$):

$$\min_{w,\gamma,y}\; \frac{\nu}{2}\|y\|_2^2 + \frac{1}{2}\|(w,\gamma)\|_2^2 \quad \text{s.t.}\quad D(Aw - e\gamma) + y = e \qquad \text{(QP)}$$

By QP "duality", $w = A'Du$. Maximizing the margin in the "dual space" gives:

$$\min_{u,\gamma}\; \frac{\nu}{2}\|e - D(AA'Du - e\gamma)\|_2^2 + \frac{1}{2}\|(u,\gamma)\|_2^2$$

Replace $AA'$ by a nonlinear kernel $K(A,A')$:

$$\min_{u,\gamma}\; \frac{\nu}{2}\|e - D(K(A,A')Du - e\gamma)\|_2^2 + \frac{1}{2}\|(u,\gamma)\|_2^2$$
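
The substitution deserves one spelled-out line (my gloss, not on the slide): once $w = A'Du$, the data matrix $A$ enters the problem only through the Gram matrix $AA'$, and the margin term is measured with $\|(u,\gamma)\|_2$ in place of $\|(w,\gamma)\|_2$. That is what licenses swapping in an arbitrary kernel:

$$D(Aw - e\gamma) = D(AA'Du - e\gamma), \qquad AA' \;\longrightarrow\; K(A,A').$$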

Page 11: The Nonlinear Classifier

The nonlinear classifier (separating surface): $K(x',A')Du = \gamma$, where

$$K(A,A'): R^{m\times n} \times R^{n\times m} \longmapsto R^{m\times m}$$

and $K$ is a nonlinear kernel, e.g. the Gaussian (radial basis) kernel:

$$K(A,A')_{ij} = \exp\!\left(-\mu\|A_i - A_j\|_2^2\right), \quad i,j = 1,\ldots,m$$

The $ij$-entry of $K(A,A')$ represents the "similarity" of data points $A_i$ and $A_j$.
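
A small NumPy sketch of this Gaussian kernel matrix (the parameter $\mu$ is from the slide; the function name and default value are mine):

```python
import numpy as np

def gaussian_kernel(A, B, mu=0.1):
    """K_ij = exp(-mu * ||A_i - B_j||^2) for rows A_i of A and B_j of B."""
    # ||a - b||^2 = ||a||^2 - 2 a'b + ||b||^2, computed for all pairs at once
    sq = (np.sum(A * A, axis=1)[:, None]
          - 2.0 * (A @ B.T)
          + np.sum(B * B, axis=1)[None, :])
    return np.exp(-mu * sq)
```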

Page 12: Nonlinear PSVM

Defining $H$ slightly differently: $H = [K(A,A')\;\; -e]$.

Similar to the linear case, setting the gradient equal to zero we obtain:

$$\begin{bmatrix} u \\ \gamma \end{bmatrix} = \left(\frac{I}{\nu} + H'H\right)^{-1} H'De$$

Here, the linear system to solve is of size $(m+1)\times(m+1)$. However, reduced kernel techniques (RSVM) can be used to reduce dimensionality.

Page 13: Nonlinear Proximal SVM Algorithm

Same steps as the linear algorithm of Page 9, with $K$ in place of $A$ and $u$ in place of $w$:

Input $A, D$.
Define $K = K(A,A')$ and $H = [K\;\; -e]$.
Calculate $v = H'De$.
Solve $\left(\frac{I}{\nu} + H'H\right)\begin{bmatrix} u \\ \gamma \end{bmatrix} = v$.
Classifier: $\text{sign}(K(x',A')u - \gamma)$, where this $u$ equals $Du$ of the Page 10 formulation (the diagonal $\pm 1$ matrix $D$ is absorbed into $u$).
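
A matching NumPy sketch of the nonlinear algorithm, reusing the `gaussian_kernel` helper sketched after Page 11 (again, names and defaults are mine):

```python
import numpy as np

def nonlinear_psvm(A, d, nu=1.0, mu=0.1):
    """Nonlinear PSVM: solve (I/nu + H'H) [u; gamma] = H'De
    with H = [K(A, A')  -e]; the solved u already absorbs D."""
    m = A.shape[0]
    e = np.ones(m)
    K = gaussian_kernel(A, A, mu)       # (m, m) kernel matrix
    H = np.hstack([K, -e[:, None]])     # H = [K  -e], shape (m, m+1)
    z = np.linalg.solve(np.eye(m + 1) / nu + H.T @ H, H.T @ d)
    return z[:-1], z[-1]                # u, gamma

def nonlinear_predict(X, A, u, gamma, mu=0.1):
    # Classifier: sign(K(x', A') u - gamma) for each test row x
    return np.sign(gaussian_kernel(X, A, mu) @ u - gamma)
```

Here the system is $(m+1)\times(m+1)$, which is why the slide points to reduced-kernel (RSVM) techniques for large $m$.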