Universum Support Vector Machine: A Generalized Approach

Junfeng He

with help from Professor Tony Jebara, Gerry Tesauro and Vladimir Naumovich Vapnik

SVM for Classification

$$\min_{w,b,\xi}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m}\xi_i$$

$$\text{s.t.}\quad y_i(w\cdot x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,m$$

Universum SVM for Classification

Idea: Contradiction on Universum

Universum SVM for Classification

Approximation: if $f_{w,b}(x_i^*)$ is close to zero, then a small change in $f_{w,b}$ will cause a contradiction on the universum point $x_i^*$.

Universum SVM for Classification

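$$\min_{w,b}\ \frac{1}{2}\|w\|^2 + \sum_{i=1}^{m} H_1[y_i f_{w,b}(x_i)] + \sum_{i=1}^{|U|} U[f_{w,b}(x_i^*)]$$

where $H_\theta(t) = \max(0,\ \theta - t)$, hence

$$H_1[y_i(w\cdot x_i + b)] = \max(1 - y_i(w\cdot x_i + b),\ 0)$$

To force $f_{w,b}(x_i^*)$ to be close to 0, choose

$$U(t) = H_{-\varepsilon}(t) + H_{-\varepsilon}(-t) \quad\text{or}\quad U(t) = t^2$$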
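As a minimal sketch of the losses on this slide, the hinge loss $H_\theta$ and the two candidate universum losses can be written in plain Python. The function names and the default `eps` value are illustrative assumptions, not taken from the slides.

```python
def hinge(t, theta=1.0):
    """H_theta(t) = max(0, theta - t)."""
    return max(0.0, theta - t)


def universum_eps_insensitive(t, eps=0.1):
    """U(t) = H_{-eps}(t) + H_{-eps}(-t), which equals max(0, |t| - eps)."""
    return hinge(t, -eps) + hinge(-t, -eps)


def universum_squared(t):
    """U(t) = t^2."""
    return t * t


if __name__ == "__main__":
    # Both universum losses are smallest when f(x*) is near zero.
    for t in (-0.5, 0.0, 0.05, 0.5):
        print(t, hinge(t), universum_eps_insensitive(t), universum_squared(t))
```

The ε-insensitive variant is exactly zero inside the tube $|t| \le \varepsilon$, while the squared variant penalizes every nonzero value.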

Universum SVM for Classification

Dual form (with U as the ε-insensitive function)

Problem

Forcing $f_{w,b}(x_i^*) \approx 0$ is only suitable for two-label classification.

Can we generalize universum SVM to both classification and regression?

Idea

View regression as many two-label classification problems: for any given $y$, is $f_{w,b}(x_i) - y > 0$ or $< 0$?

For this two-label classification problem, using the idea of universum SVM, the loss function should be:

$$\sum_{i=1}^{|U|} U[f_{w,b}(x_i^*) - y]$$

With all possible $y$, the total loss function on universum data is:

$$\sum_{i=1}^{|U|} \int U[f_{w,b}(x_i^*) - y]\, p(y)\, dy$$

Generalized Universum Support Vector Machine

$$\min_{w,b}\ \frac{1}{2}\|w\|^2 + \sum_{i=1}^{m} F[y_i - f_{w,b}(x_i)] + \sum_{i=1}^{|U|} \int G[y - f_{w,b}(x_i^*)]\, p(y)\, dy$$

$$F(t) = H_{-\varepsilon}(t) + H_{-\varepsilon}(-t),\qquad G(t) = t^2$$

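For two-class classification, i.e., $y \in \{+1, -1\}$, if $p(y=+1) = p(y=-1) = 0.5$, the universum term degenerates to that of Universum SVM:

$$\sum_{i=1}^{|U|} \int G[y - f_{w,b}(x_i^*)]\, p(y)\, dy = \sum_{i=1}^{|U|} \sum_{y \in \{+1,-1\}} 0.5\,(y - f_{w,b}(x_i^*))^2 = \sum_{i=1}^{|U|} \left(f_{w,b}(x_i^*)^2 + 1\right)$$

which differs from $\sum_{i=1}^{|U|} f_{w,b}(x_i^*)^2$ only by a constant.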
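The intermediate step can be made explicit. Writing $f$ for $f_{w,b}(x_i^*)$ (a shorthand introduced here only for this expansion), the two equally weighted labels contribute:

$$\tfrac{1}{2}(1 - f)^2 + \tfrac{1}{2}(-1 - f)^2 = \tfrac{1}{2}(1 - 2f + f^2) + \tfrac{1}{2}(1 + 2f + f^2) = f^2 + 1$$

The cross terms cancel because the two labels are equally likely, which leaves the squared universum loss $U(t) = t^2$ plus a constant.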

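$$\begin{aligned}
\int G[y - f_{w,b}(x_i^*)]\, p(y)\, dy
&= \int (y - w\cdot x_i^* - b)^2\, p(y)\, dy \\
&= \int y^2 p(y)\, dy - 2(w\cdot x_i^* + b)\int y\, p(y)\, dy + (w\cdot x_i^* + b)^2 \int p(y)\, dy \\
&= D - 2(w\cdot x_i^* + b)\int y\, p(y)\, dy + (w\cdot x_i^* + b)^2 \int p(y)\, dy \\
&= D - 2(w\cdot x_i^* + b)\,E + (w\cdot x_i^* + b)^2
\end{aligned}$$

where $D = \int y^2 p(y)\, dy$ and $E = \int y\, p(y)\, dy$.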
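The expansion above can be checked numerically for a discrete label distribution. This is a small sketch; the helper names and the 80/20 label distribution are hypothetical choices made for illustration.

```python
def moments(outcomes):
    """Return (E, D) = (sum of p*y, sum of p*y^2) over (y, p) pairs."""
    E = sum(p * y for y, p in outcomes)
    D = sum(p * y * y for y, p in outcomes)
    return E, D


def integrated_loss(f_value, outcomes):
    """Direct evaluation of sum_y p(y) * (y - f)^2."""
    return sum(p * (y - f_value) ** 2 for y, p in outcomes)


def closed_form(f_value, outcomes):
    """D - 2*f*E + f^2, the expanded form from the derivation."""
    E, D = moments(outcomes)
    return D - 2.0 * f_value * E + f_value ** 2


# Hypothetical label distribution: 80% of labels are +1, 20% are -1.
skewed = [(+1.0, 0.8), (-1.0, 0.2)]

for f_value in (-1.0, 0.0, 0.6, 1.0):
    print(f_value, integrated_loss(f_value, skewed), closed_form(f_value, skewed))
```

Because the part that depends on $f$, namely $f^2 - 2fE$, is minimized at $f = E$, the universum term pulls $f_{w,b}(x_i^*)$ toward the mean label $E$ rather than toward 0.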

Generalized Support Vector Machine

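$$\begin{aligned}
&\min_{w,b}\ \frac{1}{2}\|w\|^2 + \sum_{i=1}^{m} F[y_i - f_{w,b}(x_i)] + \sum_{i=1}^{|U|} \int G[y - f_{w,b}(x_i^*)]\, p(y)\, dy \\
=\ &\min_{w,b}\ \frac{1}{2}\|w\|^2 + \sum_{i=1}^{m} \left[H_{-\varepsilon}(y_i - w\cdot x_i - b) + H_{-\varepsilon}(-y_i + w\cdot x_i + b)\right] \\
&\qquad + \sum_{i=1}^{|U|} \left[(w\cdot x_i^* + b)^2 - 2(w\cdot x_i^* + b)\,E\right]
\end{aligned}$$

(the constant $D$ does not depend on $w$ or $b$ and is dropped).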
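To make the three terms concrete, here is a hedged NumPy sketch that evaluates this objective for a given linear model. The function names, the toy data, and the weights `C` and `C_u` (both defaulting to 1, which matches the unweighted form above) are assumptions for illustration only.

```python
import numpy as np


def eps_insensitive(t, eps):
    """F(t) = H_{-eps}(t) + H_{-eps}(-t) = max(0, |t| - eps), elementwise."""
    return np.maximum(0.0, np.abs(t) - eps)


def generalized_objective(w, b, X, y, X_univ, E, eps=0.1, C=1.0, C_u=1.0):
    """0.5*||w||^2 + C*sum F(y_i - f(x_i)) + C_u*sum(f(x_i*)^2 - 2*E*f(x_i*)).

    C and C_u are assumed trade-off weights; with both equal to 1 this is
    the unweighted objective.
    """
    f_train = X @ w + b          # predictions on labeled data
    f_univ = X_univ @ w + b      # predictions on universum data
    reg = 0.5 * float(w @ w)
    train_loss = float(np.sum(eps_insensitive(y - f_train, eps)))
    univ_loss = float(np.sum(f_univ ** 2 - 2.0 * E * f_univ))
    return reg + C * train_loss + C_u * univ_loss


# Toy data (hypothetical): two labeled points and one universum point.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, -1.0])
X_univ = np.array([[0.5, 0.5]])

print(generalized_objective(np.array([1.0, -1.0]), 0.0, X, y, X_univ, E=0.0))
print(generalized_objective(np.array([1.0, 0.0]), 0.0, X, y, X_univ, E=0.5))
```

Evaluating the objective this way is only a check on the formula; actually minimizing it requires a quadratic programming solver.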

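With slack variables and trade-off constants $C$ and $C_u$, this becomes:

$$\min_{w,b}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m}(\xi_i + \xi_i') + C_u\sum_{i=1}^{|U|}(v_i^2 - 2E\,v_i)$$

$$\text{s.t.}\quad w\cdot x_i + b - y_i \le \varepsilon + \xi_i,\quad y_i - w\cdot x_i - b \le \varepsilon + \xi_i',\quad \xi_i,\ \xi_i' \ge 0,\quad i = 1,\dots,m$$

$$w\cdot x_i^* + b = v_i,\quad i = 1,\dots,|U|$$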

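Dual form

$$y = \left(\sum_{i=1}^{m}(\alpha_i' - \alpha_i)\,x_i + \sum_{i=1}^{|U|}\beta_i\,x_i^*\right)\cdot x + b$$

Taking the Lagrangian of the primal problem gives:

$$\begin{aligned}
\min_{\alpha,\alpha',\beta}\ W(\alpha, \alpha', \beta) =\ &\varepsilon\sum_{i=1}^{m}(\alpha_i + \alpha_i') - \sum_{i=1}^{m} y_i(\alpha_i' - \alpha_i) - E\sum_{i=1}^{|U|}\beta_i + \frac{1}{4C_u}\sum_{i=1}^{|U|}\beta_i^2 \\
&+ \frac{1}{2}\left\|\sum_{i=1}^{m}(\alpha_i' - \alpha_i)\,x_i + \sum_{i=1}^{|U|}\beta_i\,x_i^*\right\|^2
\end{aligned}$$

$$\text{s.t.}\quad \sum_{i=1}^{m}(\alpha_i' - \alpha_i) + \sum_{i=1}^{|U|}\beta_i = 0,\quad 0 \le \alpha_i \le C,\quad 0 \le \alpha_i' \le C$$

Replacing $x_i\cdot x_j$ by $K(x_i, x_j)$, we get the kernel version.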

Property

Suitable for both classification and regression.

Without the universum part, it reduces to traditional SVR.

Sparse in training data, not sparse in universum data (because of the squared loss $G(t) = t^2$).

L2 version

$$F[y_i - f_{w,b}(x_i)] = (y_i - f_{w,b}(x_i))^2 = (y_i - w\cdot x_i - b)^2$$

$$G[y - f_{w,b}(x_i^*)] = (y - f_{w,b}(x_i^*))^2 = (y - w\cdot x_i^* - b)^2$$

in the generalized objective

$$\min_{w,b}\ \frac{1}{2}\|w\|^2 + \sum_{i=1}^{m} F[y_i - f_{w,b}(x_i)] + \sum_{i=1}^{|U|} \int G[y - f_{w,b}(x_i^*)]\, p(y)\, dy$$

L2 version

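$$\begin{aligned}
&\min_{w,b}\ \frac{1}{2}\|w\|^2 + \sum_{i=1}^{m} F[y_i - f_{w,b}(x_i)] + \sum_{i=1}^{|U|} \int G[y - f_{w,b}(x_i^*)]\, p(y)\, dy \\
=\ &\min_{w,b}\ \frac{1}{2}\|w\|^2 + \sum_{i=1}^{m} (y_i - w\cdot x_i - b)^2 + \sum_{i=1}^{|U|} \left[(w\cdot x_i^* + b)^2 - 2(w\cdot x_i^* + b)\,E\right]
\end{aligned}$$

With trade-off constants $C$ and $C_u$, this becomes:

$$\min_{w,b}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m}\xi_i^2 + C_u\sum_{i=1}^{|U|}(v_i^2 - 2E\,v_i)$$

$$\text{s.t.}\quad w\cdot x_i + b + \xi_i = y_i,\quad i = 1,\dots,m$$

$$w\cdot x_i^* + b = v_i,\quad i = 1,\dots,|U|$$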

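Dual form

$$A\begin{bmatrix} b \\ \alpha \\ \beta \end{bmatrix} = \begin{bmatrix} 0 \\ Y \\ E\,I_u \end{bmatrix},\quad\text{where}\quad A = \begin{bmatrix} 0 & I_l^T & I_u^T \\ I_l & K_{ll} + \frac{1}{2C}I & K_{lu} \\ I_u & K_{lu}^T & K_{uu} + \frac{1}{2C_u}I \end{bmatrix}$$

$$y = \left(\sum_{i=1}^{m}\alpha_i\,x_i + \sum_{i=1}^{|U|}\beta_i\,x_i^*\right)\cdot x + b$$

$$K_{ll}(i,j) = x_i\cdot x_j,\quad K_{lu}(i,j) = x_i\cdot x_j^*,\quad K_{uu}(i,j) = x_i^*\cdot x_j^*,\quad I_l = [1,\dots,1]^T$$

Here $I_l$ and $I_u$ are all-ones vectors of length $m$ and $|U|$, and $I$ is the identity matrix of matching size.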
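Because the L2 version reduces to a single linear system, it can be sketched in a few lines of NumPy. The following is a minimal illustration with a linear kernel; the function names, the toy data, and the default values of `C` and `C_u` are assumptions made here, not the implementation used for the experiments.

```python
import numpy as np


def fit_l2_universum(X, y, X_univ, E, C=1.0, C_u=1.0):
    """Solve A [b, alpha, beta]^T = [0, y, E*1]^T with a linear kernel."""
    m, u = X.shape[0], X_univ.shape[0]
    K_ll = X @ X.T
    K_lu = X @ X_univ.T
    K_uu = X_univ @ X_univ.T

    A = np.zeros((1 + m + u, 1 + m + u))
    A[0, 1:] = 1.0                                   # [0, 1_l^T, 1_u^T]
    A[1:, 0] = 1.0                                   # first column of ones
    A[1:1 + m, 1:1 + m] = K_ll + np.eye(m) / (2.0 * C)
    A[1:1 + m, 1 + m:] = K_lu
    A[1 + m:, 1:1 + m] = K_lu.T
    A[1 + m:, 1 + m:] = K_uu + np.eye(u) / (2.0 * C_u)

    rhs = np.concatenate(([0.0], y, np.full(u, E)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:1 + m], sol[1 + m:]


def predict(X_new, X, X_univ, b, alpha, beta):
    """f(x) = sum_i alpha_i x_i.x + sum_i beta_i x_i*.x + b."""
    return X_new @ X.T @ alpha + X_new @ X_univ.T @ beta + b


# Toy data (hypothetical): symmetric two-class problem with E = 0.
X = np.array([[2.0, 0.0], [1.0, 1.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
X_univ = np.array([[0.0, 0.5], [0.0, -0.5]])

b, alpha, beta = fit_l2_universum(X, y, X_univ, E=0.0)
print(np.sign(predict(X, X, X_univ, b, alpha, beta)))
```

For a nonlinear kernel, the three Gram matrices and the products in `predict` would be replaced by evaluations of $K(\cdot,\cdot)$.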

Property

Suitable for both regression and classification.

Without the universum part, it reduces to LS-SVM.

For classification with $y \in \{+1, -1\}$, if $E = 0$, it degenerates to Universum LS-SVM [Fabian Sinz 2007].

Property

Not sparse in training or universum data, because of the squared loss function.

It can be used for online learning: $A_{new}^{-1}$ can be computed based on $A_{old}^{-1}$, where

$$A_{new} = \begin{bmatrix} A_{old} & B \\ B^T & K \end{bmatrix}$$
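One standard way to obtain $A_{new}^{-1}$ from $A_{old}^{-1}$ is the block-inverse identity based on the Schur complement. The sketch below assumes that technique (the slides do not say which update rule was used) and assumes $A_{old}$ is symmetric, as the system matrix above is; the function name is hypothetical.

```python
import numpy as np


def block_inverse_update(A_old_inv, B, K):
    """Inverse of [[A_old, B], [B.T, K]] given A_old^{-1}.

    Uses the Schur complement S = K - B^T A_old^{-1} B and assumes
    A_old is symmetric, so that (A_old^{-1} B)^T = B^T A_old^{-1}.
    """
    AinvB = A_old_inv @ B
    S_inv = np.linalg.inv(K - B.T @ AinvB)
    top_left = A_old_inv + AinvB @ S_inv @ AinvB.T
    top_right = -AinvB @ S_inv
    return np.block([[top_left, top_right], [top_right.T, S_inv]])


# Check against a direct inverse on a random symmetric matrix.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A_new = M @ M.T + 5.0 * np.eye(5)
A_old, B, K = A_new[:3, :3], A_new[:3, 3:], A_new[3:, 3:]

updated = block_inverse_update(np.linalg.inv(A_old), B, K)
print(np.allclose(updated, np.linalg.inv(A_new)))
```

Only the small Schur complement is inverted, so adding a few new points costs far less than inverting the enlarged matrix from scratch.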

Experiments: male/female face classification

Yale Face Dataset

Training: male 250, female 168. Test: male 171, female 168.

Universum: 1700.

Created by: a * male + (1-a) * female
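A hedged sketch of this construction in NumPy follows. How male and female images were paired is not stated on the slide, so the random pairing, the function name, and the stand-in arrays below are all assumptions.

```python
import numpy as np


def make_universum(male, female, a, n_samples, seed=0):
    """Build universum points a * male + (1 - a) * female.

    Pairs are drawn uniformly at random with replacement; this pairing
    rule is an assumption, not something the slides specify.
    """
    rng = np.random.default_rng(seed)
    i = rng.integers(0, male.shape[0], size=n_samples)
    j = rng.integers(0, female.shape[0], size=n_samples)
    return a * male[i] + (1.0 - a) * female[j]


# Stand-in feature vectors with the training-set sizes from the slide.
male = np.ones((250, 4))
female = np.zeros((168, 4))

universum = make_universum(male, female, a=0.7, n_samples=1700)
print(universum.shape)
```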

Classification Error on Test Set

a     SVM      LS-SVM   Universum LS-SVM (i.e., E = 0)   Our result (L2 version)
0.5   0.2212   0.2094   0.2330                           0.1858 (E = 0.2)
0.7   0.2212   0.2094   0.3333                           0.1799 (E = 0.6)
0.1   0.2212   0.2094   0.4307                           0.2006 (E = -0.6)

More experiments

Coming soon…

Thank You! 谢谢!ありがとう ! Vielen Dank !

Kop Koon Ka! 謝謝!Merci beaucoup ! 감사합니다 !Spasiba ! Ευχαριστίες !

! شكور Grazias !

Köszönöm ! Obrigado !

Q & A?
