
Mathematical Programming in Support Vector Machines

Olvi L. Mangasarian

University of Wisconsin - Madison

High Performance Computation for Engineering Systems Seminar

MIT October 4, 2000

What is a Support Vector Machine?

An optimally defined surface
Typically nonlinear in the input space
Linear in a higher dimensional space
Implicitly defined by a kernel function

What are Support Vector Machines Used For?

Classification
Regression & data fitting
Supervised & unsupervised learning

(Will concentrate on classification)

Example of Nonlinear Classifier: Checkerboard Classifier

Outline of Talk

Generalized support vector machines (SVMs)
Completely general kernel allows complex classification (no Mercer condition!)

Smooth support vector machines
Smooth & solve SVM by a fast Newton method

Lagrangian support vector machines
Very fast, simple iterative scheme; one matrix inversion: no LP, no QP

Reduced support vector machines
Handle large datasets with nonlinear kernels

Generalized Support Vector Machines: 2-Category Linearly Separable Case

[Figure: classes A+ and A− separated by the bounding planes x′w = γ + 1 and x′w = γ − 1, with normal w.]

Generalized Support Vector Machines: Algebra of the 2-Category Linearly Separable Case

Given m points in n-dimensional space, represented by an m×n matrix A. Membership of each point in class +1 or −1 is specified by an m×m diagonal matrix D with +1 and −1 entries.

Separate by two bounding planes, x′w = γ ± 1:

A_i w ≥ γ + 1, for D_ii = +1;
A_i w ≤ γ − 1, for D_ii = −1.

More succinctly:

D(Aw − eγ) ≥ e,

where e is a vector of ones.
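In MATLAB this test is one line; a minimal sketch (the label vector d in {−1,+1}^m and the candidate plane (w, γ) are assumed here for illustration):

% Build D from labels d and test the bounding-plane condition D(A*w - e*gamma) >= e
D = diag(d);
e = ones(size(A,1), 1);
separated = all(D*(A*w - gamma*e) >= e);   % true iff both classes are bounded correctly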

Generalized Support Vector Machines: Maximizing the Margin between Bounding Planes

[Figure: the bounding planes x′w = γ + 1 and x′w = γ − 1 for classes A+ and A−; the margin between them is 2/‖w‖₂.]

Generalized Support Vector Machines: The Linear Support Vector Machine Formulation

Solve the following mathematical program for some ν > 0:

min_{w,γ,y} νe′y + (1/2)‖w‖²₂
s.t. D(Aw − eγ) + y ≥ e,
y ≥ 0.

The nonnegative slack variable y is zero iff:

The convex hulls of A+ and A− do not intersect
ν is sufficiently large
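This program is a linearly constrained QP, so any QP solver applies; below is a hedged MATLAB sketch using the Optimization Toolbox's quadprog with the stacking x = [w; γ; y] (our own choice, not from the talk):

% Sketch: solve min nu*e'*y + 1/2*||w||^2 s.t. D(A*w - e*gamma) + y >= e, y >= 0.
% Assumes A (m-by-n) and labels d in {-1,+1}^m.
[m,n] = size(A); D = diag(d); e = ones(m,1); nu = 1;
H = blkdiag(eye(n), 0, zeros(m));        % quadratic term: 1/2*w'*w only
f = [zeros(n+1,1); nu*e];                % linear term: nu*e'*y
Aineq = [-D*A, D*e, -eye(m)];            % -(D(A*w - e*gamma) + y) <= -e
bineq = -e;
lb = [-inf(n+1,1); zeros(m,1)];          % w, gamma free; y >= 0
x = quadprog(H, f, Aineq, bineq, [], [], lb);
w = x(1:n); gamma = x(n+1);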

Breast Cancer Diagnosis Application
97% tenfold cross-validation correctness
780 samples: 494 benign, 286 malignant

Another Application: Disputed Federalist Papers (Bosch & Smith 1998)

56 Hamilton, 50 Madison, 12 Disputed

Generalized Support Vector Machine Motivation

(Nonlinear Kernel Without Mercer Condition)

Linear SVM:

min_{w,γ,y} νe′y + ‖w‖₁
s.t. D(Aw − eγ) + y ≥ e, y ≥ 0

Linear separating surface: x′w = γ

Set w = A′Du:

min_{u,γ,y} νe′y + ‖u‖₁
s.t. D(AA′Du − eγ) + y ≥ e, y ≥ 0

Resulting linear surface: x′A′Du = γ

Replace AA′ by an arbitrary nonlinear kernel K(A, A′):

min_{u,γ,y} νe′y + ‖u‖₁
s.t. D(K(A, A′)Du − eγ) + y ≥ e, y ≥ 0

Resulting nonlinear surface: K(x′, A′)Du = γ

SSVM: Smooth Support Vector Machine (SVM as Unconstrained Minimization Problem)

Changing to the 2-norm and measuring the margin in (w, γ) space gives the unconstrained minimization problem:

min_{w,γ} (ν/2)‖(e − D(Aw − eγ))₊‖²₂ + (1/2)‖(w, γ)‖²₂

Smoothing the Plus Function: Integrate the Sigmoid Function

SSVM: The Smooth Support Vector Machine
Smoothing the Plus Function

Integrating the sigmoid approximation to the step function:

s(x, α) = 1/(1 + ε^(−αx)),

gives a smooth, excellent approximation to the plus function:

p(x, α) = x + (1/α) log(1 + ε^(−αx)), α > 0.

(Here ε denotes the base of natural logarithms; e is reserved for the vector of ones.)
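A quick MATLAB check of how tightly p(x, α) tracks the plus function (a sketch; the grid and the value α = 5 are arbitrary):

% Compare (x)_+ = max(x,0) with its smoothing p(x,alpha) = x + log(1+exp(-alpha*x))/alpha
alpha = 5;
x = linspace(-2, 2, 401)';
plus_x = max(x, 0);
p = x + log(1 + exp(-alpha*x))/alpha;   % smooth approximation to the plus function
s = 1./(1 + exp(-alpha*x));             % sigmoid s(x,alpha); note p' = s
max(abs(p - plus_x))                    % uniform gap log(2)/alpha, shrinks as alpha grows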

Replacing the plus function in the nonsmooth SVM by the smooth approximation gives our SSVM:

min_{w,γ} Φ_α(w, γ) := (ν/2)‖p(e − D(Aw − eγ), α)‖²₂ + (1/2)‖(w, γ)‖²₂

Newton: Minimize a sequence of quadratic approximations to the strongly convex objective function, i.e., solve a sequence of linear equations in n+1 variables. (Small dimensional input space.)

Armijo: Shorten the distance between successive iterates so as to generate sufficient decrease in the objective function. (In computational reality, not needed!)

Global Quadratic Convergence: Starting from any point, the iterates are guaranteed to converge to the unique solution at a quadratic rate, i.e., errors get squared. (Typically 6 to 8 iterations, without an Armijo step.)
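A bare-bones MATLAB sketch of this Newton iteration on Φ_α, our own condensed version without the Armijo step (assumes A is m×n with labels d in {−1,+1}^m; written for clarity, not numerically hardened):

function [w, gamma] = ssvm_newton(A, d, nu, alpha, itmax)
% Sketch: minimize Phi(w,gamma) = nu/2*||p(e - D(A*w - e*gamma),alpha)||^2
%                                 + 1/2*(w'*w + gamma^2) by Newton's method
[m, n] = size(A);
DE = diag(d)*[A -ones(m,1)];            % D[A -e]; unknowns stacked as z = [w; gamma]
z = zeros(n+1, 1);
for it = 1:itmax
  r = 1 - DE*z;                         % e - D(A*w - e*gamma)
  s = 1./(1 + exp(-alpha*r));           % sigmoid; p'(r) = s
  p = r + log(1 + exp(-alpha*r))/alpha; % smoothed plus function
  g = -nu*DE'*(p.*s) + z;               % gradient of Phi
  Hd = s.^2 + alpha*p.*s.*(1-s);        % componentwise derivative of p.*s
  H = nu*DE'*spdiags(Hd,0,m,m)*DE + eye(n+1);   % Hessian, (n+1)-by-(n+1)
  z = z - H\g;                          % Newton step; Armijo rarely needed
end
w = z(1:n); gamma = z(n+1);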

SSVM with a Nonlinear Kernel: Nonlinear Separating Surface in Input Space

Examples of Kernels That Generate Nonlinear Separating Surfaces in Input Space

A ∈ R^(m×n), a ∈ R^m, μ ∈ R, d an integer.

Polynomial kernel: (AA′ + μaa′)^d, with the power taken componentwise

Neural network kernel: (AA′ + μaa′)∗, where the step function ∗ : R → {0, 1} is applied componentwise

Gaussian (radial basis) kernel: ε^(−μ‖A_i − A_j‖²), i, j = 1, …, m

LSVM: Lagrangian Support Vector Machine
Dual of the SVM

Taking the dual of the SVM formulation:

min_{w,γ,y} (ν/2)y′y + (1/2)(w′w + γ²)
s.t. D(Aw − eγ) + y ≥ e,

gives the following simple dual problem:

min_{0≤u∈R^m} (1/2)u′(I/ν + D(AA′ + ee′)D)u − e′u

The variables (w, γ, y) of SSVM are related to u by:

w = A′Du, y = u/ν, γ = −e′Du.

LSVM: Lagrangian Support Vector Machine
Dual SVM as a Symmetric Linear Complementarity Problem

Defining the two matrices:

H = D[A −e], Q = I/ν + HH′,

reduces the dual SVM to:

min_{0≤u∈R^m} f(u) := (1/2)u′Qu − e′u.

The optimality condition for this dual SVM is the LCP:

0 ≤ u ⊥ Qu − e ≥ 0,

which, by Implicit Lagrangian theory, is equivalent to:

Qu − e = ((Qu − e) − αu)₊, α > 0.

LSVM Algorithm
Simple & Linearly Convergent: One Small Matrix Inversion

u^(i+1) = Q^(−1)(e + ((Qu^i − e) − αu^i)₊), i = 0, 1, …,

where 0 < α < 2/ν.

Key idea: the Sherman-Morrison-Woodbury formula allows the inversion of an extremely large m×m matrix Q by merely inverting a much smaller (n+1)×(n+1) matrix, as follows:

(I/ν + HH′)^(−1) = ν(I − H(I/ν + H′H)^(−1)H′).
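A quick numerical check of this identity (a sketch with arbitrary random dimensions):

% Verify (I/nu + H*H')^(-1) = nu*(I - H*(I/nu + H'*H)^(-1)*H') numerically
m = 500; n = 10; nu = 0.1;
H = randn(m, n+1);                       % plays the role of D[A -e]
lhs = inv(eye(m)/nu + H*H');
rhs = nu*(eye(m) - H*inv(eye(n+1)/nu + H'*H)*H');
norm(lhs - rhs, 'fro')                   % ~1e-12: only an (n+1)x(n+1) inverse is needed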

LSVM Algorithm (Linear Kernel): 11 Lines of MATLAB Code

function [it, opt, w, gamma] = svml(A,D,nu,itmax,tol)
% lsvm with SMW for min 1/2*u'*Q*u-e'*u s.t. u=>0,
% Q=I/nu+H*H', H=D[A -e]
% Input: A, D, nu, itmax, tol; Output: it, opt, w, gamma
% [it, opt, w, gamma] = svml(A,D,nu,itmax,tol);
[m,n]=size(A);alpha=1.9/nu;e=ones(m,1);H=D*[A -e];it=0;
S=H*inv((speye(n+1)/nu+H'*H));
u=nu*(1-S*(H'*e));oldu=u+1;
while it<itmax & norm(oldu-u)>tol
  z=(1+pl(((u/nu+H*(H'*u))-alpha*u)-1));
  oldu=u;
  u=nu*(z-S*(H'*z));
  it=it+1;
end;
opt=norm(u-oldu);w=A'*D*u;gamma=-e'*D*u;

function pl = pl(x); pl = (abs(x)+x)/2;
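A small usage example, our own, on synthetic data (svml and pl as above; D is the diagonal label matrix):

% Call svml on synthetic two-class data
m = 1000; n = 10; nu = 1;
d = sign(randn(m,1)); d(d==0) = 1;       % labels in {-1,+1}
A = randn(m,n) + 0.5*d*ones(1,n);        % shift the two classes apart
D = spdiags(d, 0, m, m);
[it, opt, w, gamma] = svml(A, D, nu, 100, 1e-5);
acc = mean(sign(A*w - gamma) == d)       % training-set correctness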

LSVM Algorithm (Linear Kernel): Computational Results

2 million random points in 10-dimensional space
Classified in 6.7 minutes in 6 iterations to e-5 accuracy
250 MHz UltraSPARC II with 2 gigabytes of memory
CPLEX ran out of memory

32562 points in 123-dimensional space (UCI Adult dataset)
Classified in 141 seconds & 55 iterations to 85% correctness
400 MHz Pentium II with 2 gigabytes of memory
SVMlight classified in 178 seconds & 4497 iterations

LSVM (Nonlinear Kernel): Formulation

For the nonlinear kernel:

K(A, B) : R^(m×n) × R^(n×l) → R^(m×l),

the separating nonlinear surface is given by:

K([x′ −1], G′)Du = 0,

where u is the solution of the dual problem:

min_{0≤u∈R^m} f(u) := (1/2)u′Qu − e′u,

with Q redefined as:

G = [A −e], Q = I/ν + DK(G, G′)D.
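The LSVM iteration carries over unchanged, except that Q is now a full m×m matrix, so SMW no longer helps. A sketch (reusing the hypothetical gaussian_kernel helper above and labels d):

% Nonlinear LSVM sketch: u_{i+1} = Q^{-1}(e + ((Q*u_i - e) - alpha*u_i)_+)
G = [A -ones(m,1)]; D = diag(d); e = ones(m,1);
Q = eye(m)/nu + D*gaussian_kernel(G, G', mu)*D;   % full m-by-m: no SMW here
alpha = 1.9/nu; u = Q\e;                          % 0 < alpha < 2/nu
for it = 1:100
  u = Q\(e + max((Q*u - e) - alpha*u, 0));        % (x)_+ = max(x,0)
end
% separating surface: K([x' -1], G')*D*u = 0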

LSVM Algorithm (Nonlinear Kernel) Application: 100 Iterations, 58 Seconds on a Pentium II, 95.9% Accuracy

Reduced Support Vector Machines (RSVM)

Large Nonlinear Kernel Classification Problems

Key idea: use a rectangular kernel K(A, Ā′), where Ā′ is a small random sample of A′.

Typically Ā has 1% to 10% of the rows of A.

Two important consequences:
RSVM can solve very large problems
The nonlinear separator depends only on Ā

min_{ū,γ,y} (ν/2)y′y + (1/2)(ū′ū + γ²)
s.t. D(K(A, Ā′)D̄ū − eγ) + y ≥ e, y ≥ 0

Separating surface: K(x′, Ā′)D̄ū = γ

(Using the small square kernel K(Ā, Ā′) instead gives lousy results.)
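A sketch of the rectangular-kernel setup (reusing the hypothetical gaussian_kernel helper; the 5% sampling rate is one example within the 1% to 10% range above):

% Build the m-by-mbar rectangular kernel K(A, Abar') from a random row sample
mbar = ceil(0.05*m);                      % keep ~5% of the rows of A
idx = randperm(m); idx = idx(1:mbar);
Abar = A(idx, :); Dbar = diag(d(idx));
Kbar = gaussian_kernel(A, Abar', mu);     % rectangular: m rows, mbar columns
% then minimize nu/2*y'*y + 1/2*(ubar'*ubar + gamma^2)
% s.t. D(Kbar*Dbar*ubar - e*gamma) + y >= e, y >= 0, in the small unknowns (ubar, gamma)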

Conventional SVM Result on Checkerboard Using 50 Random Points Out of 1000

RSVM Result on Checkerboard Using SAME 50 Random Points Out of 1000

RSVM on Large Classification Problems
Standard error over 50 runs = 0.001 to 0.002
RSVM time = 1.24 × (random points time)

Conclusion

Mathematical Programming plays an essential role in SVMs

Theory
New formulations: generalized SVMs
New algorithm-generating concepts: smoothing (SSVM), implicit Lagrangian (LSVM)

Algorithms
Fast: SSVM
Massive: LSVM, RSVM

Future Research

Theory

Concave minimization

Concurrent feature & data selection

Multiple-instance problems

SVMs as complementarity problems

Algorithms

Multicategory classification algorithms

Kernel methods in nonlinear programming

Chunking for massive classification: 10^8

Talk & Papers Available on Web

www.cs.wisc.edu/~olvi
