Trading Convexity for Scalability


Marco A. Alvarez, CS7680

Department of Computer Science, Utah State University

Paper

Collobert, R., Sinz, F., Weston, J., and Bottou, L. 2006. Trading convexity for scalability. In Proceedings of the 23rd International Conference on Machine Learning (Pittsburgh, Pennsylvania, June 25-29, 2006). ICML '06, vol. 148. ACM Press, New York, NY, 201-208.

Introduction

Previously in Machine Learning

MLPs have a non-convex cost function: difficult to optimize, but they work efficiently.

SVMs are defined by a convex function: easier optimization (algorithms) and a unique solution (we can write theorems).

Goal of the paper

Sometimes non-convexity has benefits:

Faster training and testing (fewer support vectors).

Non-convex SVMs (faster and sparser).

Fast transductive SVMs.

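From SVM

Decision function

$\hat{y}(x) = w \cdot x + b$

Primal formulation

$\min_{w,b} \; \frac{1}{2}\|w\|^2 + C \sum_i H_1[y_i \, \hat{y}(x_i)]$

Minimize $\|w\|$ so that the margin is maximized.

w is a combination of a small number of data points (sparsity).

The decision boundary is determined by the support vectors.

Dual formulation

$\min_{\alpha} \; G(\alpha) = \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j \, x_i \cdot x_j - \sum_i y_i \alpha_i$

s.t. $\sum_i \alpha_i = 0$

$0 \le y_i \alpha_i \le C$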
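As a rough illustration of the decision function and the primal objective on this slide, the sketch below evaluates both on a tiny data set. The helper names (`decision_function`, `hinge`, `primal_objective`) and the toy data are assumptions made for this example. The hinge notation $H_s(z) = \max(0, s - z)$ follows the paper. This is not the authors' code.

```python
def decision_function(w, b, x):
    """Linear decision function w . x + b for one example x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def hinge(z, s=1.0):
    """Hinge loss H_s(z) = max(0, s - z)."""
    return max(0.0, s - z)


def primal_objective(w, b, X, y, C):
    """0.5 * ||w||^2 + C * sum_i H_1(y_i * (w . x_i + b))."""
    reg = 0.5 * sum(wi * wi for wi in w)
    loss = sum(hinge(yi * decision_function(w, b, xi)) for xi, yi in zip(X, y))
    return reg + C * loss


# Hypothetical toy data: the last point lies on the wrong side of the boundary.
X = [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0], [0.5, 0.0]]
y = [1.0, 1.0, -1.0, -1.0]
w, b = [1.0, 1.0], 0.0

print(primal_objective(w, b, X, y, C=1.0))
```

Only the misclassified point contributes to the loss term here. The three points with margin of at least 1 add nothing, which is the sparsity property the slide refers to.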

SVM problem

The number of support vectors increases linearly with the number of training examples L.

Cost attributed to one example (x, y):

From:

$C \, H_1[y \, \hat{y}(x)]$

Ramp Loss Function

Given: $z = y \, \hat{y}(x)$

$R_s(z) = H_1(z) - H_s(z)$

[Figure: ramp loss $R_s(z)$, with regions labeled "Outliers" and "Non SV"]
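The difference between the hinge loss and the ramp loss can be checked numerically with a small sketch. It assumes the paper's hinge notation $H_s(z) = \max(0, s - z)$. The default `s = -1.0` and the sample values of z are arbitrary choices for illustration.

```python
def hinge(z, s=1.0):
    """Hinge loss H_s(z) = max(0, s - z)."""
    return max(0.0, s - z)


def ramp(z, s=-1.0):
    """Ramp loss R_s(z) = H_1(z) - H_s(z), intended for s < 1."""
    return hinge(z, 1.0) - hinge(z, s)


# Compare both losses on a few margins z = y * (w . x + b).
for z in (-3.0, -1.0, 0.0, 1.0, 2.0):
    print(z, hinge(z), ramp(z))
```

For z below s the ramp loss stays flat at 1 - s while the hinge loss keeps growing. Outliers therefore stop adding to the cost, and the same flat region is why the ramp loss is not convex.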

Concave-Convex Procedure (CCCP)

Given a cost function $J(\theta)$:

Decompose it into a convex part and a concave part:

$J(\theta) = J_{vex}(\theta) + J_{cav}(\theta)$

The cost is guaranteed to decrease at each iteration.
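To make the procedure concrete, here is a minimal one-dimensional sketch. The toy cost $J(\theta) = \theta^4 - \theta^2$, its split into $J_{vex}(\theta) = \theta^4$ and $J_{cav}(\theta) = -\theta^2$, and the function names are assumptions, chosen so that the inner convex problem has a closed form. They are not taken from the paper. Each step replaces the concave part by its tangent at the current point and minimizes the resulting convex function.

```python
import math


def J(theta):
    """Toy non-convex cost: convex part theta**4 plus concave part -theta**2."""
    return theta ** 4 - theta ** 2


def cccp_step(theta_t):
    """One CCCP step: minimize theta**4 + J_cav'(theta_t) * theta.

    J_cav'(theta_t) = -2 * theta_t, so the minimizer solves
    4 * theta**3 = 2 * theta_t, i.e. theta = cbrt(theta_t / 2).
    """
    half = theta_t / 2.0
    # Signed cube root, so negative starting points are handled too.
    return math.copysign(abs(half) ** (1.0 / 3.0), half)


def cccp(theta0, iters=50):
    """Run CCCP from theta0 and return the list of iterates."""
    thetas = [theta0]
    for _ in range(iters):
        thetas.append(cccp_step(thetas[-1]))
    return thetas


thetas = cccp(2.0)
costs = [J(t) for t in thetas]
print(thetas[-1], costs[-1])
```

Starting from 2.0, the iterates approach $1/\sqrt{2}$, a local minimum of the toy cost, and the recorded costs do not increase from one iteration to the next. Starting from a negative value leads to the mirror-image minimum, which shows that the result depends on the initialization.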

Using the Ramp Loss

CCCP for Ramp Loss

Results

Speedup

Time and Number of SVs

Transductive SVMs

Loss Function

Cost to be minimized:

Balancing Constraint

Necessary for TSVMs.

Results

Training Time

Quadratic Fit
