Universum Support Vector Machine -A generalized approach Junfeng He with help from Professor Tony...

Universum Support Vector Machine-A generalized approach

Junfeng He

with help from Professor Tony Jebara, Gerry Tesauro and Vladimir Naumovich Vapnik

SVM for Classification

$$\min_{w,b}\ \frac{1}{2}\|w\|^2 \quad \text{s.t.}\quad y_i\,(w \cdot x_i + b) \ge 1,\quad i = 1,\dots,m$$

Universum SVM for Classification

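Idea: Contradiction on Universum

Approximation: If $f_{w,b}(x_i^*)$ is close to zero on a universum sample $x_i^*$, then a small change in $f_{w,b}$ will cause a contradiction on universum data

$$\min_{w,b}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m} H[y_i f_{w,b}(x_i)] + C_u\sum_{i=1}^{|U|} U[f_{w,b}(x_i^*)]$$

$H_\theta(t) = \max(\theta - t,\ 0)$, hence $H[y_i f_{w,b}(x_i)] = H_1[y_i (w \cdot x_i + b)] = \max(1 - y_i (w \cdot x_i + b),\ 0)$

To force $f_{w,b}(x_i^*)$ to be close to 0: $U(t) = H_{-\varepsilon}(t) + H_{-\varepsilon}(-t)$ or $U(t) = t^2$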
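The loss functions on this slide can be written out in a few lines. The sketch below assumes the hinge family $H_\theta(t) = \max(\theta - t, 0)$ and the two universum losses named on the slide; the function names and the default `eps` value are illustrative choices, not part of the original slides.

```python
def hinge(t, theta=1.0):
    """H_theta(t) = max(theta - t, 0); theta = 1 gives the usual hinge loss."""
    return max(theta - t, 0.0)


def universum_eps_insensitive(t, eps=0.1):
    """U(t) = H_{-eps}(t) + H_{-eps}(-t): zero for |t| <= eps, linear outside."""
    return hinge(t, -eps) + hinge(-t, -eps)


def universum_squared(t):
    """Alternative universum loss U(t) = t^2."""
    return t * t
```

Both universum losses are smallest when the decision value on a universum sample is near zero, which is the behaviour the slide asks for.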

Dual form: (with U as the ε-insensitive function)

Problem

Only suitable for two-label classification

Can we generalize universum SVM to both classification and regression?

$f_{w,b}(x_i^*) \approx 0$

View regression as many two-label classification problems: for any given $y$, is

$$f_{w,b}(x_i) - y > 0 \quad\text{or}\quad f_{w,b}(x_i) - y < 0\ ?$$

For this two-label classification problem, using the idea of universum SVM, the loss function should be:

$$\sum_{i=1}^{|U|} U[f_{w,b}(x_i^*) - y]$$

With all possible $y$, the total loss function on universum data:

$$\sum_{i=1}^{|U|} \int U[f_{w,b}(x_i^*) - y]\, p(y)\, dy$$

Generalized Universum Support Vector Machine

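$$\min_{w,b}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m} F[y_i - f_{w,b}(x_i)] + C_u\sum_{i=1}^{|U|} \int G[y - f_{w,b}(x_i^*)]\, p(y)\, dy$$

$$F(t) = H_{-\varepsilon}(t) + H_{-\varepsilon}(-t) \ \text{(the $\varepsilon$-insensitive loss)}, \qquad G(t) = t^2$$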

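For two-class classification, i.e., $y \in \{+1,-1\}$, if $p(y=+1) = p(y=-1) = 0.5$, the universum term degenerates to that of Universum SVM with $U(t) = t^2$, up to a constant:

$$\sum_{i=1}^{|U|} \int G[y - f_{w,b}(x_i^*)]\, p(y)\, dy = \sum_{i=1}^{|U|} \sum_{y \in \{+1,-1\}} 0.5\,\big(y - f_{w,b}(x_i^*)\big)^2 = \sum_{i=1}^{|U|} \big(f_{w,b}(x_i^*)^2 + 1\big) = \sum_{i=1}^{|U|} f_{w,b}(x_i^*)^2 + |U|$$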

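$$\begin{aligned}
\int G[y - f_{w,b}(x^*)]\, p(y)\, dy &= \int \big(y - (w \cdot x^* + b)\big)^2 p(y)\, dy \\
&= \int y^2 p(y)\, dy - 2 (w \cdot x^* + b) \int y\, p(y)\, dy + (w \cdot x^* + b)^2 \int p(y)\, dy \\
&= D - 2 (w \cdot x^* + b) \int y\, p(y)\, dy + (w \cdot x^* + b)^2 \int p(y)\, dy \\
&= D - 2 (w \cdot x^* + b)\, E + (w \cdot x^* + b)^2
\end{aligned}$$

where $D = \int y^2 p(y)\, dy$ and $E = \int y\, p(y)\, dy$.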
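The expansion of the squared universum loss can be checked numerically. The sketch below assumes a discrete label distribution (so the integral becomes a sum) and uses hypothetical helper names; it compares the direct expectation with the form $D - 2fE + f^2$, where $D$ is the second moment and $E$ the mean of $y$.

```python
def expected_universum_loss(f, ys, ps):
    """Direct evaluation of sum_y p(y) * (y - f)^2 for a discrete p(y)."""
    return sum(p * (y - f) ** 2 for y, p in zip(ys, ps))


def expanded_universum_loss(f, ys, ps):
    """Same quantity via D - 2 f E + f^2, with D = E[y^2] and E = E[y]."""
    D = sum(p * y * y for y, p in zip(ys, ps))
    E = sum(p * y for y, p in zip(ys, ps))
    return D - 2.0 * f * E + f * f


# Balanced two-class case: y in {-1, +1} with equal probability.
ys, ps = [-1.0, 1.0], [0.5, 0.5]
f = 0.3
direct = expected_universum_loss(f, ys, ps)
expanded = expanded_universum_loss(f, ys, ps)
```

In the balanced two-class case $D = 1$ and $E = 0$, so both values equal $1 + f^2$, which is the squared universum loss plus a constant.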

Generalized Support Vector Machine

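$$\begin{aligned}
&\min_{w,b}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m} F[y_i - f_{w,b}(x_i)] + C_u\sum_{i=1}^{|U|} \int G[y - f_{w,b}(x_i^*)]\, p(y)\, dy \\
\Leftrightarrow\ &\min_{w,b}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m} \big[H_{-\varepsilon}(y_i - w \cdot x_i - b) + H_{-\varepsilon}(w \cdot x_i + b - y_i)\big] + C_u\sum_{i=1}^{|U|} \big[(w \cdot x_i^* + b)^2 - 2 (w \cdot x_i^* + b)\, E\big]
\end{aligned}$$

with $E = \int y\, p(y)\, dy$; the constant $D = \int y^2 p(y)\, dy$ does not depend on $w, b$ and is dropped. Equivalently:

$$\min_{w,b,\xi,\xi',v}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m} (\xi_i + \xi_i') + C_u\sum_{i=1}^{|U|} (v_i^2 - 2 E v_i)$$

$$\text{s.t.}\quad w \cdot x_i + b - y_i \le \varepsilon + \xi_i,\quad y_i - (w \cdot x_i + b) \le \varepsilon + \xi_i',\quad \xi_i, \xi_i' \ge 0,\quad i = 1,\dots,m$$

$$w \cdot x_i^* + b = v_i,\quad i = 1,\dots,|U|$$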
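To make the objective concrete, the sketch below evaluates it for a linear model using NumPy. It assumes the ε-insensitive loss on training residuals and the term $v^2 - 2Ev$ on universum outputs; the function name, argument names, and default hyperparameters are illustrative assumptions.

```python
import numpy as np


def generalized_usvm_objective(w, b, X, y, X_univ, C=1.0, C_u=1.0, eps=0.1, E=0.0):
    """0.5*||w||^2 + C * eps-insensitive training loss + C_u * sum(v^2 - 2*E*v)."""
    f_train = X @ w + b          # predictions on training samples
    v = X_univ @ w + b           # outputs on universum samples
    train_loss = np.maximum(np.abs(y - f_train) - eps, 0.0).sum()
    univ_loss = (v ** 2 - 2.0 * E * v).sum()
    return 0.5 * (w @ w) + C * train_loss + C_u * univ_loss


w = np.array([1.0, 0.0])
X = np.array([[1.0, 0.0], [2.0, 0.0]])
y = np.array([1.0, 3.0])
X_univ = np.array([[0.5, 0.0]])
value = generalized_usvm_objective(w, 0.0, X, y, X_univ)
```

The universum term is minimized when each universum output equals $E$, the mean of the label distribution.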

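Dual form

Writing $\alpha_i, \alpha_i'$ for the multipliers of the two $\varepsilon$-tube constraints and $\beta_i$ for those of the universum equality constraints, the prediction is

$$y = \sum_{i=1}^{m} (\alpha_i' - \alpha_i)\, x_i \cdot x + \sum_{i=1}^{|U|} \beta_i\, x_i^* \cdot x + b$$

$$\min_{\alpha,\alpha',\beta}\ \frac{1}{2}\Big\|\sum_{i=1}^{m} (\alpha_i' - \alpha_i)\, x_i + \sum_{i=1}^{|U|} \beta_i\, x_i^*\Big\|^2 - \sum_{i=1}^{m} y_i (\alpha_i' - \alpha_i) + \varepsilon \sum_{i=1}^{m} (\alpha_i + \alpha_i') + \frac{1}{4 C_u} \sum_{i=1}^{|U|} \beta_i^2 - E \sum_{i=1}^{|U|} \beta_i$$

$$\text{s.t.}\quad \sum_{i=1}^{m} (\alpha_i' - \alpha_i) + \sum_{i=1}^{|U|} \beta_i = 0,\qquad 0 \le \alpha_i, \alpha_i' \le C$$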

Replacing $x_i \cdot x_j$ by $K(x_i, x_j)$, we get the kernel version.
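As an illustration of the kernel substitution, the sketch below builds the Gram blocks among training and universum samples for a chosen kernel. The RBF kernel, the `gamma` parameter, and the helper names are assumptions made for this example; the slides do not say which kernel was used.

```python
import numpy as np


def linear_kernel(A, B):
    """Plain inner products: entry (i, j) is A[i] . B[j]."""
    return A @ B.T


def rbf_kernel(A, B, gamma=1.0):
    """exp(-gamma * ||a - b||^2) for every pair of rows of A and B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * (A @ B.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))


def gram_blocks(X, X_univ, kernel):
    """Training/training, training/universum and universum/universum blocks."""
    return kernel(X, X), kernel(X, X_univ), kernel(X_univ, X_univ)
```

Passing `linear_kernel` recovers the inner-product formulation, so the same solver code can serve both versions.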

Property

Suitable for both classification and regression.

Without the universum part, it reduces to traditional SVR.

Sparse in training data, not sparse in universum data (because of the squared loss function $G(t) = t^2$).

L2 version

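$$F[y_i - f_{w,b}(x_i)] = \big(y_i - (w \cdot x_i + b)\big)^2$$

$$G[y - f_{w,b}(x^*)] = \big(y - (w \cdot x^* + b)\big)^2$$

$$\min_{w,b}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m} F[y_i - f_{w,b}(x_i)] + C_u\sum_{i=1}^{|U|} \int G[y - f_{w,b}(x_i^*)]\, p(y)\, dy$$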

L2 version

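$$\begin{aligned}
&\min_{w,b}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m} F[y_i - f_{w,b}(x_i)] + C_u\sum_{i=1}^{|U|} \int G[y - f_{w,b}(x_i^*)]\, p(y)\, dy \\
\Leftrightarrow\ &\min_{w,b}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m} (y_i - w \cdot x_i - b)^2 + C_u\sum_{i=1}^{|U|} \big[(w \cdot x_i^* + b)^2 - 2 (w \cdot x_i^* + b)\, E\big]
\end{aligned}$$

with $E = \int y\, p(y)\, dy$. Equivalently:

$$\min_{w,b,\xi,v}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m} \xi_i^2 + C_u\sum_{i=1}^{|U|} (v_i^2 - 2 E v_i)$$

$$\text{s.t.}\quad w \cdot x_i + b - y_i = \xi_i,\quad i = 1,\dots,m$$

$$w \cdot x_i^* + b = v_i,\quad i = 1,\dots,|U|$$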

Dual form

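$$A \begin{bmatrix} b \\ \alpha \\ \beta \end{bmatrix} = \begin{bmatrix} 0 \\ Y \\ E\, I_u \end{bmatrix}, \quad\text{where}\quad A = \begin{bmatrix} 0 & I_l^T & I_u^T \\ I_l & K_{ll} + \frac{1}{2C}\,\mathrm{Id} & K_{lu} \\ I_u & K_{lu}^T & K_{uu} + \frac{1}{2C_u}\,\mathrm{Id} \end{bmatrix}$$

$$y = \sum_{i=1}^{m} \alpha_i\, x_i \cdot x + \sum_{i=1}^{|U|} \beta_i\, x_i^* \cdot x + b$$

$K_{ll}(i,j) = x_i \cdot x_j$, $K_{lu}(i,j) = x_i \cdot x_j^*$, $K_{uu}(i,j) = x_i^* \cdot x_j^*$, $I_l = [1,\dots,1]^T$ (length $m$), $I_u = [1,\dots,1]^T$ (length $|U|$), and $\mathrm{Id}$ is the identity matrix.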
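Because the L2 version has only equality constraints, training reduces to one linear solve. The sketch below is a minimal NumPy illustration with a linear kernel, assuming the primal $\frac{1}{2}\|w\|^2 + C\sum \xi_i^2 + C_u\sum (v_i^2 - 2Ev_i)$, which gives the diagonal terms $1/(2C)$ and $1/(2C_u)$; the function names and defaults are hypothetical.

```python
import numpy as np


def fit_generalized_ls_usvm(X, y, X_univ, C=1.0, C_u=1.0, E=0.0):
    """Solve the block linear system for (b, alpha, beta) with a linear kernel."""
    m, u = len(X), len(X_univ)
    K_ll = X @ X.T
    K_lu = X @ X_univ.T
    K_uu = X_univ @ X_univ.T
    A = np.zeros((1 + m + u, 1 + m + u))
    A[0, 1:] = 1.0                                   # row of ones: sum(alpha) + sum(beta) = 0
    A[1:, 0] = 1.0                                   # column of ones multiplying the bias b
    A[1:1 + m, 1:1 + m] = K_ll + np.eye(m) / (2.0 * C)
    A[1:1 + m, 1 + m:] = K_lu
    A[1 + m:, 1:1 + m] = K_lu.T
    A[1 + m:, 1 + m:] = K_uu + np.eye(u) / (2.0 * C_u)
    rhs = np.concatenate(([0.0], y, np.full(u, E)))  # right-hand side [0, Y, E * ones]
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:1 + m], sol[1 + m:]


def predict(X_new, X, X_univ, b, alpha, beta):
    """f(x) = sum_i alpha_i x_i.x + sum_j beta_j x_j*.x + b for each row of X_new."""
    return X_new @ X.T @ alpha + X_new @ X_univ.T @ beta + b
```

Every training and universum sample receives a coefficient here, which matches the non-sparsity noted for this version.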

Property

Suitable for both regression and classification.

Without the universum part, it reduces to LS-SVM.

For classification with $y \in \{+1,-1\}$, if $E = 0$, it degenerates to Universum LS-SVM [Fabian Sinz 2007].

Property

Not sparse in training or universum data, because of the squared loss function $t^2$.

It can be used for online learning: the updated $A^{-1}$ can be computed based on the previous $A^{-1}$.
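One standard way to reuse a previous inverse is the bordering (Schur complement) formula for a symmetric matrix that gains one row and column. The slides do not say which update was intended, so the sketch below is only an assumption about how such an online step could look; the function name is hypothetical.

```python
import numpy as np


def bordered_inverse(A_inv, c, d):
    """Inverse of [[A, c], [c^T, d]] given A_inv, for symmetric A and scalar d.

    Uses the Schur complement s = d - c^T A^{-1} c, which must be nonzero.
    """
    Ac = A_inv @ c
    s = d - c @ Ac
    n = A_inv.shape[0]
    out = np.empty((n + 1, n + 1))
    out[:n, :n] = A_inv + np.outer(Ac, Ac) / s   # updated top-left block
    out[:n, n] = -Ac / s                         # new last column
    out[n, :n] = -Ac / s                         # new last row (symmetric)
    out[n, n] = 1.0 / s
    return out
```

Each update costs quadratic rather than cubic time in the matrix size. Appending the new sample as the last row and column only reorders the unknowns, so the solution is unchanged.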

Experiments: male/female face classification (Yale Face Dataset)

Training: male 250, female 168. Test: male 171, female 168.

Universum: 1700.

Created by: a * male + (1-a) * female
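The mixing rule can be sketched as follows. The slides give only the formula, so the random pairing of one male and one female image per universum sample, the function name, and the `seed` argument are all assumptions for illustration.

```python
import numpy as np


def make_universum(male, female, a, n_samples, seed=0):
    """Build universum samples as a * male[i] + (1 - a) * female[j].

    male and female are 2-D arrays with one flattened image per row;
    the index pairs (i, j) are drawn at random, which is an assumption.
    """
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(male), size=n_samples)
    j = rng.integers(0, len(female), size=n_samples)
    return a * male[i] + (1.0 - a) * female[j]
```

With `a = 0.5` the samples are equal blends of the two classes; other values of `a` bias the blend toward one class.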

Classification Error on Test Set

| a | SVM | LS-SVM | Universum LS-SVM (i.e., E = 0) | Our result (L2 version) |
|---|---|---|---|---|
| 0.5 | 0.2212 | 0.2094 | 0.2330 | 0.1858 (E = 0.2) |
| 0.7 | 0.2212 | 0.2094 | 0.3333 | 0.1799 (E = 0.6) |
| 0.1 | 0.2212 | 0.2094 | 0.4307 | 0.2006 (E = -0.6) |

More experiments

Coming soon…

Thank You! 谢谢！ありがとう！ Vielen Dank ！

Kop Koon Ka! 謝謝！Merci beaucoup ！ 감사합니다 ！Spasiba ！ Ευχαριστίες !

！ شكور Grazias ！

Köszönöm ！ Obrigado ！

Q & A?
