Universum Support Vector Machine: A Generalized Approach
Junfeng He
with help from Professor Tony Jebara, Gerry Tesauro and Vladimir Naumovich Vapnik
SVM for Classification
$$\min_{w,b}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m}\xi_i$$
$$\text{s.t.}\quad y_i(w\cdot x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,m$$
Universum SVM for Classification
Idea: Contradiction on Universum
Universum SVM for Classification
Approximation: if $f_{w,b}(x_i^*)$ is close to zero, then a small change in $f_{w,b}$ will cause a contradiction on the universum data $x_i^*$.
Universum SVM for Classification
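$$\min_{w,b}\ \frac{1}{2}\|w\|^2 + \sum_{i=1}^{m} H[y_i f_{w,b}(x_i)] + \sum_{i=1}^{|U|} U[f_{w,b}(x_i^*)]$$
$$H_\theta(t) = \max(\theta - t,\ 0),\quad \text{hence}\quad H[y_i(w\cdot x_i + b)] = H_1[y_i(w\cdot x_i + b)] = \max(1 - y_i(w\cdot x_i + b),\ 0)$$
$$U(t) = H_{-\varepsilon}(t) + H_{-\varepsilon}(-t) \quad\text{or}\quad U(t) = t^2$$
The universum term is there to force $f_{w,b}(x_i^*)$ to be close to 0.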
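The losses above can be written out directly. The following is a minimal Python sketch, not the author's code: the function names, the default `eps`, and the toy data are illustrative assumptions, and the objective is evaluated without trade-off constants, as on the slide.

```python
def hinge(t, theta=1.0):
    """H_theta(t) = max(theta - t, 0)."""
    return max(theta - t, 0.0)


def universum_loss(t, eps=0.1, squared=False):
    """U(t): either H_{-eps}(t) + H_{-eps}(-t) (epsilon-insensitive) or t**2."""
    if squared:
        return t * t
    # The two hinge terms add up to max(|t| - eps, 0).
    return hinge(t, -eps) + hinge(-t, -eps)


def decision(w, b, x):
    """f_{w,b}(x) = w . x + b for plain Python lists."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def universum_svm_objective(w, b, labeled, universum, eps=0.1):
    """0.5 ||w||^2 + sum_i H[y_i f(x_i)] + sum_i U[f(x_i^*)]."""
    reg = 0.5 * sum(wj * wj for wj in w)
    data = sum(hinge(y * decision(w, b, x)) for x, y in labeled)
    univ = sum(universum_loss(decision(w, b, xs), eps) for xs in universum)
    return reg + data + univ


if __name__ == "__main__":
    # Hypothetical toy data: two labeled points and two universum points.
    labeled = [([2.0, 0.0], 1), ([-0.5, 0.0], -1)]
    universum = [[0.05, 0.0], [0.6, 0.0]]
    print(universum_svm_objective([1.0, 0.0], 0.0, labeled, universum))
```

A universum point whose decision value lies inside the ε-tube around 0 contributes nothing, while one outside it is penalized linearly (or quadratically with `squared=True`).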
Universum SVM for Classification
Dual form (with U as the ε-insensitive function)
Problem
Only suitable for two-label classification, where the universum term drives $f_{w,b}(x_i^*)$ toward 0
Can we generalize universum SVM to both classification and regression?
Idea
View regression as many two-label classification problems: for any given y, is $f_{w,b}(x_i) - y \ge 0$ or $< 0$?
For this two-label classification problem, using the idea of universum SVM, the loss function should be:
$$\sum_{i=1}^{|U|} U[f_{w,b}(x_i^*) - y]$$
With all possible y, the total loss function on universum data is:
$$\sum_{i=1}^{|U|} \int U[f_{w,b}(x_i^*) - y]\, p(y)\, dy$$
Generalized Universum Support Vector Machine
$$\min_{w,b}\ \frac{1}{2}\|w\|^2 + \sum_{i=1}^{m} F[y_i - f_{w,b}(x_i)] + \sum_{i=1}^{|U|} \int G[y - f_{w,b}(x_i^*)]\, p(y)\, dy$$
$$F(t) = H_{-\varepsilon}(t) + H_{-\varepsilon}(-t),\qquad G(t) = t^2$$
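For two-class classification, i.e., $y \in \{+1, -1\}$, if $p(y=+1) = p(y=-1) = 0.5$, the universum term degenerates to that of Universum SVM with $U(t) = t^2$:
$$\sum_{i=1}^{|U|} \int G[y, f_{w,b}(x_i^*)]\, p(y)\, dy = \sum_{i=1}^{|U|} \sum_{y \in \{1,-1\}} 0.5\,\big(y - f_{w,b}(x_i^*)\big)^2 = \sum_{i=1}^{|U|} \big(f_{w,b}^2(x_i^*) + 1\big)$$
which differs from $\sum_{i=1}^{|U|} \big(f_{w,b}(x_i^*)\big)^2$ only by the constant $|U|$.
In general:
$$\begin{aligned}
\int G[y - f_{w,b}(x_i^*)]\, p(y)\, dy
&= \int (y - w\cdot x_i^* - b)^2\, p(y)\, dy \\
&= \int y^2 p(y)\, dy - 2(w\cdot x_i^* + b)\int y\, p(y)\, dy + (w\cdot x_i^* + b)^2 \int p(y)\, dy \\
&= D - 2(w\cdot x_i^* + b)\int y\, p(y)\, dy + (w\cdot x_i^* + b)^2 \int p(y)\, dy \\
&= D - 2(w\cdot x_i^* + b)\,E + (w\cdot x_i^* + b)^2
\end{aligned}$$
where $D = \int y^2 p(y)\, dy$ and $E = \int y\, p(y)\, dy$.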
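The expansion of the universum term can be checked numerically for a discrete label distribution. The following Python sketch is an illustration only: the function names are hypothetical, and a finite set of labels with probabilities stands in for $p(y)$.

```python
def expected_universum_loss(f, ys, ps):
    """Sum over y of p(y) * (y - f)^2 for a discrete label distribution."""
    return sum(p * (y - f) ** 2 for y, p in zip(ys, ps))


def closed_form(f, ys, ps):
    """D - 2 f E + f^2, with D the second moment and E the mean of y."""
    D = sum(p * y * y for y, p in zip(ys, ps))
    E = sum(p * y for y, p in zip(ys, ps))
    return D - 2.0 * f * E + f * f


if __name__ == "__main__":
    # Balanced two-class case: the loss should equal f^2 + 1.
    ys, ps = [1.0, -1.0], [0.5, 0.5]
    f = 0.3
    print(expected_universum_loss(f, ys, ps), closed_form(f, ys, ps), f * f + 1.0)
```

In the balanced two-class case $E = 0$ and $D = 1$, so only the $f^2$ term depends on $(w, b)$, which is why the model falls back to the squared-loss Universum SVM.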
Generalized Support Vector Machine
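$$\min_{w,b}\ \frac{1}{2}\|w\|^2 + \sum_{i=1}^{m} F[y_i, f_{w,b}(x_i)] + \sum_{i=1}^{|U|} \int G[y, f_{w,b}(x_i^*)]\, p(y)\, dy$$
$$\Longleftrightarrow\quad \min_{w,b}\ \frac{1}{2}\|w\|^2 + \sum_{i=1}^{m} \big[H_{-\varepsilon}(y_i - w\cdot x_i - b) + H_{-\varepsilon}(w\cdot x_i + b - y_i)\big] + \sum_{i=1}^{|U|} \big[(w\cdot x_i^* + b)^2 - 2(w\cdot x_i^* + b)\,E\big]$$
(dropping the constant $D = \int y^2 p(y)\, dy$; $E = \int y\, p(y)\, dy$)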
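$$\min_{w,b}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m} (\xi_i + \xi_i') + C_u\sum_{i=1}^{|U|} (v_i^2 - 2E v_i)$$
$$\text{s.t.}\quad w\cdot x_i + b - y_i \le \varepsilon + \xi_i,\quad y_i - w\cdot x_i - b \le \varepsilon + \xi_i',\quad i = 1,\dots,m$$
$$\xi_i \ge 0,\quad \xi_i' \ge 0,\quad w\cdot x_i^* + b = v_i,\quad i = 1,\dots,|U|$$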
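Dual form (multipliers $\alpha_i$ for $w\cdot x_i + b - y_i \le \varepsilon + \xi_i$, $\alpha_i'$ for $y_i - w\cdot x_i - b \le \varepsilon + \xi_i'$, and $\beta_i$ for the universum constraints):
$$y = \Big(\sum_{i=1}^{m} (\alpha_i' - \alpha_i)\, x_i + \sum_{i=1}^{|U|} \beta_i x_i^*\Big)\cdot x + b$$
$$\min_{\alpha,\alpha',\beta}\ W(\alpha, \alpha', \beta) = \frac{1}{2}\Big\|\sum_{i=1}^{m} (\alpha_i' - \alpha_i)\, x_i + \sum_{i=1}^{|U|} \beta_i x_i^*\Big\|^2 - \sum_{i=1}^{m} y_i(\alpha_i' - \alpha_i) + \varepsilon\sum_{i=1}^{m} (\alpha_i + \alpha_i') + \sum_{i=1}^{|U|} \Big(\frac{\beta_i^2}{4C_u} - E\beta_i\Big)$$
$$\text{s.t.}\quad \sum_{i=1}^{m} (\alpha_i' - \alpha_i) + \sum_{i=1}^{|U|} \beta_i = 0,\quad 0 \le \alpha_i \le C,\quad 0 \le \alpha_i' \le C$$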
Replacing $x_i \cdot x_j$ by $K(x_i, x_j)$, we get the kernel version.
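As a small illustration of the kernel substitution, the sketch below builds a Gram matrix of plain inner products and one for an RBF kernel. The RBF kernel and the parameter `gamma` are assumptions chosen for illustration; the slides do not say which kernel was used.

```python
import numpy as np


def linear_gram(P, Q):
    """Gram matrix of inner products: entry (i, j) is P[i] . Q[j]."""
    return P @ Q.T


def rbf_gram(P, Q, gamma=1.0):
    """RBF kernel matrix: entry (i, j) is exp(-gamma * ||P[i] - Q[j]||^2)."""
    sq = (P ** 2).sum(axis=1)[:, None] + (Q ** 2).sum(axis=1)[None, :] - 2.0 * (P @ Q.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq, 0.0))


if __name__ == "__main__":
    P = np.array([[0.0, 0.0], [1.0, 0.0]])
    print(linear_gram(P, P))
    print(rbf_gram(P, P, gamma=1.0))
```

Any routine that only touches the data through `linear_gram` can be switched to the kernel version by passing `rbf_gram` (or another valid kernel) instead.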
Property
Suitable for both classification and regression.
Without the universum part, it reduces to traditional SVR.
Sparse in training data, but not sparse in universum data (because of the squared loss function on the universum term).
L2 version
$$F[y_i, f_{w,b}(x_i)] = \big(y_i - f_{w,b}(x_i)\big)^2 = (y_i - w\cdot x_i - b)^2$$
$$G[y, f_{w,b}(x_i^*)] = \big(y - f_{w,b}(x_i^*)\big)^2 = (y - w\cdot x_i^* - b)^2$$
$$\min_{w,b}\ \frac{1}{2}\|w\|^2 + \sum_{i=1}^{m} F[y_i, f_{w,b}(x_i)] + \sum_{i=1}^{|U|} \int G[y, f_{w,b}(x_i^*)]\, p(y)\, dy$$
L2 version
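$$\min_{w,b}\ \frac{1}{2}\|w\|^2 + \sum_{i=1}^{m} F[y_i, f_{w,b}(x_i)] + \sum_{i=1}^{|U|} \int G[y, f_{w,b}(x_i^*)]\, p(y)\, dy$$
$$\Longleftrightarrow\quad \min_{w,b}\ \frac{1}{2}\|w\|^2 + \sum_{i=1}^{m} (y_i - w\cdot x_i - b)^2 + \sum_{i=1}^{|U|} \big[(w\cdot x_i^* + b)^2 - 2(w\cdot x_i^* + b)\,E\big]$$
In constrained form:
$$\min_{w,b}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m} \xi_i^2 + C_u\sum_{i=1}^{|U|} (v_i^2 - 2E v_i)$$
$$\text{s.t.}\quad w\cdot x_i + b - y_i = \xi_i,\quad i = 1,\dots,m;\qquad w\cdot x_i^* + b = v_i,\quad i = 1,\dots,|U|$$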
Dual form
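$$A \begin{bmatrix} b \\ \alpha \\ \beta \end{bmatrix} = \begin{bmatrix} 0 \\ Y \\ E\, I_u \end{bmatrix},\quad \text{where}\quad A = \begin{bmatrix} 0 & I_l^T & I_u^T \\ I_l & K_{ll} + \frac{1}{2C} I_{ll} & K_{lu} \\ I_u & K_{lu}^T & K_{uu} + \frac{1}{2C_u} I_{uu} \end{bmatrix}$$
$$y = \Big(\sum_{i=1}^{m} \alpha_i x_i + \sum_{i=1}^{|U|} \beta_i x_i^*\Big)\cdot x + b$$
$$K_{ll}(i,j) = x_i \cdot x_j,\quad K_{lu}(i,j) = x_i \cdot x_j^*,\quad K_{uu}(i,j) = x_i^* \cdot x_j^*,\quad I = [1,\dots,1]^T$$
Here $I_l$ and $I_u$ are all-ones vectors of length $m$ and $|U|$, and $I_{ll}$, $I_{uu}$ are identity matrices.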
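The L2 version reduces training to one linear solve. The following NumPy sketch assembles a system of the form above, with block rows ordered as bias, labeled points, universum points, and solves it. It is an illustration under assumptions, not the author's implementation: the function names, the default linear kernel, and the parameter defaults are all hypothetical.

```python
import numpy as np


def fit_l2_universum(X, y, X_univ, C=1.0, C_u=1.0, E=0.0, kernel=None):
    """Solve A [b; alpha; beta] = [0; y; E * ones] for the L2 version."""
    if kernel is None:
        kernel = lambda P, Q: P @ Q.T  # linear kernel by default (assumption)
    l, u = len(X), len(X_univ)
    A = np.zeros((1 + l + u, 1 + l + u))
    A[0, 1:] = 1.0                                    # [0, 1_l^T, 1_u^T]
    A[1:, 0] = 1.0                                    # first column of ones
    A[1:1 + l, 1:1 + l] = kernel(X, X) + np.eye(l) / (2.0 * C)
    A[1:1 + l, 1 + l:] = kernel(X, X_univ)
    A[1 + l:, 1:1 + l] = kernel(X, X_univ).T
    A[1 + l:, 1 + l:] = kernel(X_univ, X_univ) + np.eye(u) / (2.0 * C_u)
    rhs = np.concatenate(([0.0], y, np.full(u, E)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:1 + l], sol[1 + l:]          # b, alpha, beta


def predict(X_new, X, X_univ, b, alpha, beta, kernel=None):
    """f(x) = sum_i alpha_i K(x_i, x) + sum_i beta_i K(x_i^*, x) + b."""
    if kernel is None:
        kernel = lambda P, Q: P @ Q.T
    return kernel(X_new, X) @ alpha + kernel(X_new, X_univ) @ beta + b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(6, 2)), rng.normal(size=6)   # toy labeled data
    X_univ = rng.normal(size=(4, 2))                      # toy universum data
    b, alpha, beta = fit_l2_universum(X, y, X_univ, C=2.0, C_u=0.5, E=0.3)
    print(predict(X[:2], X, X_univ, b, alpha, beta))
```

Every labeled and universum point receives a coefficient from the solve, which matches the non-sparsity of the L2 version.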
Property
Suitable for both regression and classification.
Without the universum part, it reduces to LS-SVM.
For classification with y ∈ {+1, -1}, if E = 0, it degenerates to Universum LS-SVM [Fabian Sinz 2007].
Property
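Not sparse in training or universum data, because of the squared loss function.
It can be used for online learning: when new data extends the system matrix to
$$A_{new} = \begin{bmatrix} A_{old} & B \\ B^T & K \end{bmatrix},$$
$A_{new}^{-1}$ can be computed based on $A_{old}^{-1}$.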
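One standard way to obtain the new inverse from the old one is the block-inverse (Schur complement) identity. The slides do not say which update rule was used, so the NumPy sketch below is an assumption; `grow_inverse` is a hypothetical name, and `B` is taken to have one column per newly added point.

```python
import numpy as np


def grow_inverse(A_old_inv, B, K):
    """Inverse of [[A_old, B], [B^T, K]] given the inverse of A_old."""
    AB = A_old_inv @ B                        # A_old^{-1} B
    BA = B.T @ A_old_inv                      # B^T A_old^{-1}
    S_inv = np.linalg.inv(K - B.T @ AB)       # inverse of the Schur complement
    top_left = A_old_inv + AB @ S_inv @ BA
    return np.block([[top_left, -AB @ S_inv],
                     [-S_inv @ BA, S_inv]])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    G = rng.normal(size=(5, 5))
    M = G @ G.T + 5.0 * np.eye(5)             # toy symmetric system matrix
    A_old, B, K = M[:4, :4], M[:4, 4:], M[4:, 4:]
    A_new_inv = grow_inverse(np.linalg.inv(A_old), B, K)
    print(np.allclose(A_new_inv, np.linalg.inv(M)))
```

When only a few points arrive at a time, the matrix inverted explicitly is just the small Schur complement, so each update costs on the order of the square of the old system size rather than its cube.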
Experiments: male/female face classification (Yale Face Dataset)
Training: male 250, female 168. Test: male 171, female 168.
Universum: 1700.
Created by: a * male + (1-a) * female
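The mixing rule can be sketched as follows. How male and female images were paired is not stated on the slide, so random pairing, the function name `make_universum`, and the seed are assumptions made for illustration; images are represented as flat lists of pixel values.

```python
import random


def make_universum(male, female, a, n, seed=0):
    """Create n universum samples as a * male + (1 - a) * female.

    Pairs are drawn at random (an assumption; the pairing rule is not given).
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        m = rng.choice(male)
        f = rng.choice(female)
        samples.append([a * mi + (1.0 - a) * fi for mi, fi in zip(m, f)])
    return samples


if __name__ == "__main__":
    # Hypothetical 2-pixel "images" standing in for face vectors.
    male = [[1.0, 2.0], [3.0, 4.0]]
    female = [[0.0, 0.0]]
    print(make_universum(male, female, a=0.5, n=3))
```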
Classification Error on Test Set
a   | SVM    | LS-SVM | Universum LS-SVM (i.e., E = 0) | Our result (L2 version)
0.5 | 0.2212 | 0.2094 | 0.2330                         | 0.1858 (E = 0.2)
0.7 | 0.2212 | 0.2094 | 0.3333                         | 0.1799 (E = 0.6)
0.1 | 0.2212 | 0.2094 | 0.4307                         | 0.2006 (E = -0.6)
More experiments
Coming soon…
Thank You! 谢谢!ありがとう ! Vielen Dank !
Kop Koon Ka! 謝謝!Merci beaucoup ! 감사합니다 !Spasiba ! Ευχαριστίες !
! شكور Grazias !
Köszönöm ! Obrigado !
Q & A?