
Locally Linear Support Vector Machines Ľubor Ladický Philip H.S. Torr


Page 1: Locally Linear Support Vector Machines Ľubor Ladický Philip H.S. Torr

Locally Linear Support Vector Machines

Ľubor Ladický        Philip H.S. Torr

Page 2: Locally Linear Support Vector Machines Ľubor Ladický Philip H.S. Torr

Given : { [x1, y1], [x2, y2], … [xn, yn] }

Binary classification task

x_i – feature vector, y_i ∈ {−1, 1} – label

The task is :

To design y = H(x) that predicts a label y for a new x

Page 3: Locally Linear Support Vector Machines Ľubor Ladický Philip H.S. Torr

Given : { [x1, y1], [x2, y2], … [xn, yn] }

x_i – feature vector, y_i ∈ {−1, 1} – label

The task is :

To design y = H(x) that predicts a label y for a new x

Several approaches proposed: Support Vector Machines, Boosting, Random Forests, Neural Networks, Nearest Neighbour, ..

Binary classification task

Page 4: Locally Linear Support Vector Machines Ľubor Ladický Philip H.S. Torr

Linear SVMs
• Fast training and evaluation
• Applicable to large scale data sets
• Low discriminative power

Kernel SVMs
• Slow training and evaluation
• Not feasible for very large data sets
• Much better performance on hard problems

Linear vs. Kernel SVMs

Page 5: Locally Linear Support Vector Machines Ľubor Ladický Philip H.S. Torr

The goal is to design an SVM with:
• A good trade-off between performance and speed
• Scalability to large scale data sets (solvable using SGD)

Motivation

Page 6: Locally Linear Support Vector Machines Ľubor Ladický Philip H.S. Torr

Points approximated as a weighted sum of anchor points:

x ≈ Σ_v γ_v(x) v,  with Σ_v γ_v(x) = 1

v – anchor points, γ_v(x) – local coordinates

Local codings

Page 7: Locally Linear Support Vector Machines Ľubor Ladický Philip H.S. Torr

Points approximated as a weighted sum of anchor points

Coordinates obtained using:
• Distance based methods (Gemert et al. 2008, Zhou et al. 2009)
• Reconstruction methods (Roweis et al. 2000, Yu et al. 2009, Gao et al. 2010)

v – anchor points, γ_v(x) – local coordinates

Local codings
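A distance-based local coding of the kind used later in the experiments (K-means anchors, kNN, weighted inverse distance) can be sketched as below; the function name and the eps smoothing term are illustrative, not the authors' exact implementation:

```python
import numpy as np

def local_coding(x, anchors, k=8, eps=1e-8):
    """Approximate x as a weighted sum of its k nearest anchor points.

    Coordinates gamma_v(x) use inverse-distance weighting over the k
    nearest anchors and are normalised to sum to 1; all other
    coordinates are zero. `anchors` is an (m, d) array, e.g. K-means centres.
    """
    d = np.linalg.norm(anchors - x, axis=1)   # distances to all anchors
    nn = np.argsort(d)[:k]                    # indices of the k nearest anchors
    w = 1.0 / (d[nn] + eps)                   # inverse-distance weights
    gamma = np.zeros(len(anchors))
    gamma[nn] = w / w.sum()                   # normalised local coordinates
    return gamma
```

The reconstruction x ≈ Σ_v γ_v(x) v is then `gamma @ anchors`.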

Page 8: Locally Linear Support Vector Machines Ľubor Ladický Philip H.S. Torr

For a normalised coding and any Lipschitz function f (Yu et al. 2009):

f(x) ≈ Σ_v γ_v(x) f(v)

v – anchor points, γ_v(x) – local coordinates

Local codings

Page 9: Locally Linear Support Vector Machines Ľubor Ladický Philip H.S. Torr

The form of the classifier:

H(x) = wᵀx + b

Weights w and bias b obtained as (Vapnik & Lerner 1963, Cortes & Vapnik 1995):

[w, b] = argmin_{w,b} λ‖w‖² + (1/S) Σ_k max(0, 1 − y_k(wᵀx_k + b))

[x_k, y_k] – training samples, λ – regularisation weight, S – number of samples

Linear SVMs
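The primal objective above can be minimised by stochastic subgradient descent; a minimal sketch with a fixed step size (the function name and schedule are illustrative, not the solver used in the paper):

```python
import numpy as np

def train_linear_svm(X, y, lam=1e-3, eta=0.1, epochs=100, seed=0):
    """Stochastic subgradient descent on
    lam*||w||^2 + (1/S) * sum_k max(0, 1 - y_k*(w.x_k + b))."""
    rng = np.random.default_rng(seed)
    S, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        for k in rng.permutation(S):
            w *= (1.0 - 2.0 * eta * lam)      # gradient of the regulariser
            if y[k] * (X[k] @ w + b) < 1:     # hinge loss is active
                w += eta * y[k] * X[k]        # subgradient step on the loss
                b += eta * y[k]
    return w, b
```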

Page 10: Locally Linear Support Vector Machines Ľubor Ladický Philip H.S. Torr

The decision boundary should be smooth
• Approximately linear in a sufficiently small region

Locally Linear SVMs


Page 13: Locally Linear Support Vector Machines Ľubor Ladický Philip H.S. Torr

The decision boundary should be smooth
• Approximately linear in a sufficiently small region
• Curvature is bounded
• Functions w_i(x) and b(x) are Lipschitz
• w_i(x) and b(x) can be approximated using a local coding

Locally Linear SVMs


Page 15: Locally Linear Support Vector Machines Ľubor Ladický Philip H.S. Torr

The classifier takes the form:

H(x) = Σ_v γ_v(x) (w_vᵀx + b_v) = γ(x)ᵀ W x + γ(x)ᵀ b

Weights W and biases b are obtained as:

[W, b] = argmin_{W,b} λ‖W‖² + (1/S) Σ_k max(0, 1 − y_k(γ(x_k)ᵀ W x_k + γ(x_k)ᵀ b))

where γ(x) are the local coordinates of x

Locally Linear SVMs

Page 16: Locally Linear Support Vector Machines Ľubor Ladický Philip H.S. Torr

Optimised using stochastic gradient descent (Bordes et al. 2005):

Locally Linear SVMs
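The evaluation and one SGD step can be sketched as follows; the array layout (one weight vector and one bias per anchor) and the fixed step size are assumptions for illustration, not the paper's exact implementation:

```python
import numpy as np

def llsvm_predict(gamma, x, W, b):
    """H(x) = gamma^T W x + gamma^T b: a linear SVM whose weight vector
    and bias are interpolated between anchors by the local coding.
    W has one weight vector per anchor (rows); b has one bias per anchor."""
    return gamma @ (W @ x + b)

def llsvm_sgd_step(gamma, x, y, W, b, lam=1e-3, eta=0.1):
    """One hinge-loss SGD step on the LL-SVM objective (a sketch)."""
    W *= (1.0 - 2.0 * eta * lam)                # gradient of the regulariser
    if y * llsvm_predict(gamma, x, W, b) < 1:   # margin violated
        W += eta * y * np.outer(gamma, x)       # update weighted by coordinates
        b += eta * y * gamma
    return W, b
```

With a one-hot γ the step reduces to an ordinary linear SVM update on a single anchor's weight vector.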

Page 17: Locally Linear Support Vector Machines Ľubor Ladický Philip H.S. Torr

• Generalisation of a linear SVM on x:
  – representable by W = (w w ..)ᵀ and b = (b’ b’ ..)ᵀ

• Generalisation of a linear SVM on γ:
  – representable by W = 0 and b = w

• Generalisation of the model-selecting Latent(MI)-SVM:
  – representable by γ = (0, 0, .., 1, .., 0)

Relation to other models

Page 18: Locally Linear Support Vector Machines Ľubor Ladický Philip H.S. Torr

The finite kernel classifier takes the form:

H(x) = γ(x)ᵀ W k(x) + γ(x)ᵀ b

where k(x)_j = K(x, z_j) over a finite set of points z_j and K(·, ·) is the kernel function.

Weights W and b are obtained as in the linear case, with x replaced by k(x).

Extension to finite kernels
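Assuming the finite-kernel extension replaces the raw feature x with its kernel evaluations against a finite point set, as described above, the prediction could be sketched as below (the names `Z` and `kernel` are illustrative):

```python
import numpy as np

def kernel_llsvm_predict(gamma, x, Z, W, b, kernel):
    """Finite-kernel LL-SVM sketch: the feature vector is the vector of
    kernel evaluations k(x)_j = kernel(x, Z[j]) against a finite set Z,
    and H(x) = gamma^T (W k(x) + b) as in the linear case."""
    k = np.array([kernel(x, z) for z in Z])   # finite kernel feature map
    return gamma @ (W @ k + b)
```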

Page 19: Locally Linear Support Vector Machines Ľubor Ladický Philip H.S. Torr

MNIST, LETTER & USPS datasets
• Anchor points obtained using K-means clustering
• Coordinates evaluated on kNN (k = 8) (slow part)
• Coordinates obtained using weighted inverse distance
• Raw data used

CALTECH-101 (15 training samples per class)
• Coordinates evaluated on kNN (k = 5)
• Approximated intersection kernel used (Vedaldi & Zisserman 2010)
• Spatial pyramid of BOW features
• Coordinates evaluated based on a histogram over the whole image

Experiments

Page 20: Locally Linear Support Vector Machines Ľubor Ladický Philip H.S. Torr

• MNIST

Experiments

Page 21: Locally Linear Support Vector Machines Ľubor Ladický Philip H.S. Torr

• UIUC / LETTER

• Caltech-101

Experiments

Page 22: Locally Linear Support Vector Machines Ľubor Ladický Philip H.S. Torr

We propose a novel Locally Linear SVM formulation with:
• A good trade-off between speed and performance
• Scalability to large scale data sets
• An easy implementation

Optimal way of learning anchor points is an open question

Questions?

Conclusions