Lec5 Class Margin


Linear Classification: Maximizing the Margin

Yufei Tao
Department of Computer Science and Engineering
Chinese University of Hong Kong


Consider that we are given a linearly separable dataset $P$ in $\mathbb{R}^d$. In other words, $P$ has points of two colors: red and blue; and we can find a plane $c_1 x_1 + c_2 x_2 + \ldots + c_d x_d = 0$ such that, given a point $p(x_1, \ldots, x_d)$ in $P$:

if $p$ is red, then $c_1 x_1 + c_2 x_2 + \ldots + c_d x_d > 0$;

if $p$ is blue, then $c_1 x_1 + c_2 x_2 + \ldots + c_d x_d < 0$.
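
To make the condition concrete, here is a minimal Python sketch that tests whether a coefficient vector separates a red/blue point set in this sense (the function name and data layout are ours, purely for illustration):

    import numpy as np

    def separates(c, red_points, blue_points):
        # The plane c . x = 0 separates the classes if every red point
        # has a strictly positive dot product with c, and every blue
        # point a strictly negative one.
        red = np.asarray(red_points, dtype=float)
        blue = np.asarray(blue_points, dtype=float)
        return bool(np.all(red @ c > 0) and np.all(blue @ c < 0))

    # Tiny 2D example: the plane x1 - x2 = 0 separates these points.
    c = np.array([1.0, -1.0])
    print(separates(c, [(3, 1), (2, 0.5)], [(1, 3), (0.5, 2)]))  # True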

There can be many such separation planes. So far we have been satisfied with finding any of them. In this lecture, we will introduce a metric to measure the quality of a plane, and then discuss how to find the best plane according to the metric.


Given a separation plane with respect to $P$, we define its margin to be its smallest distance to the points in $P$.
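
Since the distance from a point $p$ to the plane $c_1 x_1 + \ldots + c_d x_d = 0$ is $|\vec{c} \cdot \vec{p}| / |\vec{c}|$, the margin is $\min_{p \in P} |\vec{c} \cdot \vec{p}| / |\vec{c}|$ and can be computed directly. A minimal sketch (names and data layout are ours):

    import numpy as np

    def plane_margin(c, points):
        # Margin of the plane c . x = 0: the smallest point-to-plane
        # distance |c . p| / |c| over all points p.
        pts = np.asarray(points, dtype=float)
        return float(np.min(np.abs(pts @ c)) / np.linalg.norm(c))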

Example 1.

[Figure: two different separation planes on the same dataset, each shown with its margin.]

Which plane would you choose?


Definition 2.

Let $P$ be a linearly separable dataset in $\mathbb{R}^d$. The goal of the large margin separation problem is to find a separation plane with the maximum margin.

[Figure: the maximum-margin separation plane, with its margin indicated.]

Any algorithm solving this problem is called a support vector machine.


Next, we will discuss two methods to approach the problem. The first one gives the optimal solution, but is quite complicated and (often) computationally expensive. The second method, on the other hand, is much simpler and (often) much faster, but gives an approximate solution which is nearly optimal.


    Finding the Optimal Plane


We will model the problem as a quadratic programming problem. Towards that purpose, let us do some analysis. Consider an arbitrary separation plane $c_1 x_1 + c_2 x_2 + \ldots + c_d x_d = 0$. Imagine that there are two copies of the plane. One person moves a copy up, while another person moves the other copy down, both at the same speed. They stop as soon as one of the two copies hits a point in $P$.

[Figure: the separation plane and its two copies $\pi_1$ and $\pi_2$ in their final positions, with the margin indicated.]


Now, focus on the two copies of the plane in their final positions. If one copy has equation $c_1 x_1 + c_2 x_2 + \ldots + c_d x_d = c_{d+1}$, then the other copy must have equation $c_1 x_1 + c_2 x_2 + \ldots + c_d x_d = -c_{d+1}$. Here $c_{d+1}$ is a strictly positive value. Let $p(x_1, \ldots, x_d)$ be a point in $P$. We must have (think: why?):

if $p$ is red, then $c_1 x_1 + c_2 x_2 + \ldots + c_d x_d \ge c_{d+1}$;
if $p$ is blue, then $c_1 x_1 + c_2 x_2 + \ldots + c_d x_d \le -c_{d+1}$.

By dividing both sides of each inequality by $c_{d+1}$, we have:

if $p$ is red, then $w_1 x_1 + w_2 x_2 + \ldots + w_d x_d \ge 1$;
if $p$ is blue, then $w_1 x_1 + w_2 x_2 + \ldots + w_d x_d \le -1$

where $w_i = c_i / c_{d+1}$.
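
For a concrete instance of this rescaling (the numbers are ours, purely for illustration), take $d = 2$ and suppose the two copies settle at $2x_1 + 2x_2 = 4$ and $2x_1 + 2x_2 = -4$, i.e., $c_1 = c_2 = 2$ and $c_3 = 4$. Dividing by $c_3 = 4$ gives $w_1 = w_2 = 1/2$: every red point satisfies $\frac{1}{2}x_1 + \frac{1}{2}x_2 \ge 1$, and every blue point satisfies $\frac{1}{2}x_1 + \frac{1}{2}x_2 \le -1$.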


We will refer to the following plane as $\pi_1$:

$w_1 x_1 + w_2 x_2 + \ldots + w_d x_d = 1$

and the following plane as $\pi_2$:

$w_1 x_1 + w_2 x_2 + \ldots + w_d x_d = -1$

The margin of the original separation plane is exactly half of the distance between $\pi_1$ and $\pi_2$:

[Figure: planes $\pi_1$ and $\pi_2$, with the margin equal to half the distance between them.]


Lemma 3.

The distance between $\pi_1$ and $\pi_2$ is $2 / \sqrt{\sum_{i=1}^d w_i^2}$.

To prove the lemma, we will introduce the vector $\vec{w} = [w_1, w_2, \ldots, w_d]$. Also, given a point $p(x_1, x_2, \ldots, x_d)$, we define the vector $\vec{p} = [x_1, x_2, \ldots, x_d]$. Thus, the equation of $\pi_1$ is $\vec{w} \cdot \vec{p} = 1$, and that of $\pi_2$ is $\vec{w} \cdot \vec{p} = -1$. Note that $|\vec{w}| = \sqrt{\sum_{i=1}^d w_i^2}$.


Proof of Lemma 3.

Take an arbitrary point $p_1$ on $\pi_1$, and an arbitrary point $p_2$ on $\pi_2$. Hence, $\vec{w} \cdot \vec{p}_1 = 1$ and $\vec{w} \cdot \vec{p}_2 = -1$. It follows that $\vec{w} \cdot (\vec{p}_1 - \vec{p}_2) = 2$.

[Figure: planes $\pi_1$ and $\pi_2$, points $p_1$ and $p_2$, the vector $\vec{p}_1 - \vec{p}_2$, and the normal vector $\vec{w}$.]

The distance between the two planes is precisely $\frac{\vec{w}}{|\vec{w}|} \cdot (\vec{p}_1 - \vec{p}_2) = \frac{2}{|\vec{w}|}$.
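
As a quick numeric sanity check of Lemma 3 (the example values are ours), take $\vec{w} = (3, 4)$ in 2D, so that $|\vec{w}| = 5$:

    import numpy as np

    w = np.array([3.0, 4.0])    # pi1: w . x = 1, pi2: w . x = -1
    p1 = w / np.dot(w, w)       # a point on pi1, since w . p1 = 1
    p2 = -w / np.dot(w, w)      # a point on pi2, since w . p2 = -1

    # Project p1 - p2 onto the unit normal w/|w|; compare with 2/|w|.
    dist = np.dot(w / np.linalg.norm(w), p1 - p2)
    print(dist, 2 / np.linalg.norm(w))  # both equal 0.4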


In summary of the above, the essence of the large margin separation problem is the following problem. We want to find $w_1, \ldots, w_d$ to minimize $\sum_{i=1}^d w_i^2$ (i.e., maximizing $2 / \sqrt{\sum_{i=1}^d w_i^2}$) subject to the following constraints:

For each red point $p(x_1, \ldots, x_d)$ in $P$:
$w_1 x_1 + w_2 x_2 + \ldots + w_d x_d \ge 1$

For each blue point $p(x_1, \ldots, x_d)$ in $P$:
$w_1 x_1 + w_2 x_2 + \ldots + w_d x_d \le -1$.

This is an instance of quadratic programming.
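
To illustrate, here is a minimal sketch of this quadratic program in Python with the cvxpy package; the solver choice and data layout are our own assumptions, not part of the lecture:

    import cvxpy as cp
    import numpy as np

    def max_margin_plane(red, blue):
        # Minimize sum of w_i^2 subject to w . p >= 1 for red points
        # and w . p <= -1 for blue points.
        red = np.asarray(red, dtype=float)
        blue = np.asarray(blue, dtype=float)
        w = cp.Variable(red.shape[1])
        problem = cp.Problem(cp.Minimize(cp.sum_squares(w)),
                             [red @ w >= 1, blue @ w <= -1])
        problem.solve()
        return w.value

    w = max_margin_plane([(2, 2), (3, 1)], [(-1, -1), (-2, 0)])
    print(w, 1 / np.linalg.norm(w))  # the plane w . x = 0 and its margin

By Lemma 3, the margin of the returned plane $\vec{w} \cdot \vec{x} = 0$ is half of $2/|\vec{w}|$, i.e., $1/|\vec{w}|$, which is why minimizing $\sum_{i=1}^d w_i^2$ maximizes the margin.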


We will not go into the details of quadratic programming, except to mention:

The instance of quadratic programming on the previous slide is polynomial-time solvable. The algorithm, however, is exceedingly complex and falls outside the scope of the course.

There are standard packages online that will allow you to solve the instance within a reasonable amount of time, provided that the input set $P$ is not too large.


    Finding an Approximate Plane


Since quadratic programming is complex and expensive, next we trade precision for simplicity (and usually, also efficiency). Let $\gamma_{\text{opt}}$ be the margin of the optimal solution to the large margin separation problem. We say that a separation plane is $c$-approximate if its margin is at least $\gamma_{\text{opt}} / c$.

We will give a simple algorithm to find a 4-approximate separation plane. Here, the constant 4 is chosen to simplify our presentation. In fact, our algorithm can be easily extended to obtain a $(1 + \epsilon)$-approximate separation plane for any arbitrarily small $\epsilon > 0$.


Let us first assume that we know a value $\gamma$ satisfying $\gamma \le \gamma_{\text{opt}}$ (we will clarify how to find $\gamma$ later). Recall that a separation plane has an equation of the form $c_1 x_1 + c_2 x_2 + \ldots + c_d x_d = 0$. Define the vector $\vec{c} = [c_1, c_2, \ldots, c_d]$, and refer to the plane as the plane determined by $\vec{c}$. The goal is to find a good $\vec{c}$.

Our weapon is once again Perceptron. The difference from before is that we will now correct our $\vec{c}$ not only when a point falls on the wrong side of the plane determined by $\vec{c}$, but also when the point is too close to the plane. Specifically, we say that a point $p$ causes a violation in any of the following situations:

Its distance to the plane determined by $\vec{c}$ is less than or equal to $\gamma / 2$, regardless of the color.

$p$ is red but $\vec{c} \cdot \vec{p} < 0$.

$p$ is blue but $\vec{c} \cdot \vec{p} > 0$.
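
In code, the violation test could look as follows (a sketch under our own conventions; we treat the all-zero $\vec{c}$ as a violation so that the very first iteration always triggers an adjustment):

    import numpy as np

    def causes_violation(c, p, is_red, gamma):
        side = np.dot(c, p)
        norm = np.linalg.norm(c)
        if norm == 0 or abs(side) / norm <= gamma / 2:
            return True                           # too close to the plane
        return side < 0 if is_red else side > 0   # wrong side for its color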


Margin Perceptron

The algorithm starts with $\vec{c} = [0, 0, \ldots, 0]$, and then runs in iterations.

In each iteration, it simply checks whether any point $p \in P$ causes a violation. If so, the algorithm adjusts $\vec{c}$ as follows:

If $p$ is red, then $\vec{c} \leftarrow \vec{c} + \vec{p}$.

If $p$ is blue, then $\vec{c} \leftarrow \vec{c} - \vec{p}$.

As soon as $\vec{c}$ has been adjusted, the current iteration finishes, and a new iteration starts.

The algorithm finishes if no point causes any violation in the current iteration.
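
Putting the pieces together, a minimal runnable sketch of Margin Perceptron (the max_iters cap is our own addition, anticipating the incremental algorithm later; everything else follows the description above):

    import numpy as np

    def margin_perceptron(red, blue, gamma, max_iters=None):
        pts = np.vstack([red, blue]).astype(float)
        is_red = [True] * len(red) + [False] * len(blue)
        c = np.zeros(pts.shape[1])            # start with c = [0, ..., 0]
        iters = 0
        while max_iters is None or iters < max_iters:
            adjusted = False
            for p, r in zip(pts, is_red):
                side, norm = np.dot(c, p), np.linalg.norm(c)
                if (norm == 0 or abs(side) / norm <= gamma / 2
                        or (side < 0 if r else side > 0)):
                    c = c + p if r else c - p  # correct c using the violator
                    adjusted = True
                    break                      # a new iteration starts
            if not adjusted:
                return c, True                 # no violation: done
            iters += 1
        return c, False                        # stopped manually

The second return value distinguishes normal termination from a manual stop; the incremental algorithm later relies on this.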


Define:

$R = \max_{p \in P} \{ |\vec{p}| \}$

Namely, $R$ is the maximum distance from the origin to the points in $P$.

Theorem 4.

Margin Perceptron always terminates in at most $1 + 8R^2 / \gamma_{\text{opt}}^2$ iterations. At termination, it returns a separation plane with margin at least $\gamma / 2$.

    The proof can be found in the appendix.


Margin Perceptron requires a parameter $\gamma \le \gamma_{\text{opt}}$. Theorem 4 tells us that the larger $\gamma$ is, the better the quality guarantee is for the returned plane. Ideally, we would like to set $\gamma = \gamma_{\text{opt}}$, which would give us a 2-approximate plane. Unfortunately, we do not know the value of $\gamma_{\text{opt}}$.

Next, we present a strategy that allows us to obtain an estimate of $\gamma_{\text{opt}}$ that is off by at most a factor of 4. The strategy is to start with a low value of $\gamma$, and then gradually increase it until we are sure that it is already greater than $\gamma_{\text{opt}}$.


An Incremental Algorithm

1. $R \leftarrow$ the maximum distance from the origin to the points in $P$.
2. $\pi \leftarrow$ an arbitrary separation plane (obtained by, e.g., running the old Perceptron algorithm or linear programming). Let $\gamma_0$ be the margin of this plane.
3. $\gamma_1 \leftarrow 4\gamma_0$; $i \leftarrow 1$.
4. Run Margin Perceptron with parameter $\gamma_i$. If the algorithm does not terminate after $1 + 8R^2 / \gamma_i^2$ iterations, stop it manually. In this case, we return $\pi$ as our final answer.
5. Update $\pi$ to the plane returned by the algorithm. Let $\gamma'$ be the margin of $\pi$.
6. $\gamma_{i+1} \leftarrow 4\gamma'$.
7. $i \leftarrow i + 1$; go to Line 4.
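
A sketch of this driver, assuming the margin_perceptron and plane_margin functions from the earlier sketches are in scope (the data layout and the initial plane c_init are again illustrative assumptions):

    import numpy as np

    def incremental(red, blue, c_init):
        pts = np.vstack([red, blue]).astype(float)
        R = float(np.max(np.linalg.norm(pts, axis=1)))    # Line 1
        c = np.asarray(c_init, dtype=float)               # Line 2
        gamma_i = 4 * plane_margin(c, pts)                # Line 3
        while True:
            cap = int(np.ceil(1 + 8 * R**2 / gamma_i**2)) # budget of Line 4
            c_new, ok = margin_perceptron(red, blue, gamma_i, max_iters=cap)
            if not ok:
                return c            # manual stop: return the current plane
            c = c_new                                     # Line 5
            gamma_i = 4 * plane_margin(c, pts)            # Lines 6-7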


Lemma 5.

Consider the $i$-th call to Margin Perceptron. If manual termination occurs at Line 4, then $\gamma_i > \gamma_{\text{opt}}$. Otherwise, $\gamma_{i+1} \ge 2\gamma_i$.

Proof.

First consider the case of manual termination. If $\gamma_i \le \gamma_{\text{opt}}$, then by Theorem 4, we know that Margin Perceptron should have terminated in at most $1 + 8R^2 / \gamma_{\text{opt}}^2 \le 1 + 8R^2 / \gamma_i^2$ iterations. Contradiction.

Now consider that Margin Perceptron terminated normally. Then, by Theorem 4, $\gamma'$ (the margin of the plane returned) must be at least $\gamma_i / 2$. Hence, $\gamma_{i+1} = 4\gamma' \ge 2\gamma_i$.


Corollary 6.

The algorithm terminates after at most $1 + \log_2 \frac{\gamma_{\text{opt}}}{\gamma_0}$ calls to Margin Perceptron.

Proof.

From the previous lemma, we know that $\gamma_i \ge 2^i \gamma_0$. Suppose that the algorithm makes $k$ calls to Margin Perceptron. Then, $\gamma_{\text{opt}} \ge \gamma_{k-1} \ge 2^{k-1} \gamma_0$. Solving this inequality for $k$ gives the corollary.


Theorem 7.

Our incremental algorithm returns a separation plane with margin at least $\gamma_{\text{opt}} / 4$.

Proof.

Suppose that the algorithm terminates after $k$ calls to Margin Perceptron. Hence:

$\gamma_k > \gamma_{\text{opt}}$

The plane we return eventually has margin $\gamma_k / 4$ (recall that $\gamma_k = 4\gamma'$, where $\gamma'$ is the margin of the plane being returned), which is at least $\gamma_{\text{opt}} / 4$.


    Appendix


Proof of Theorem 4.

Let $\vec{u} \cdot \vec{x} = 0$ be the optimal separation plane with margin $\gamma_{\text{opt}}$. Without loss of generality, suppose that $|\vec{u}| = 1$. Hence:

$\gamma_{\text{opt}} = \min_{p \in P} \{ |\vec{p} \cdot \vec{u}| \}$

Recall that the perceptron algorithm adjusts $\vec{c}$ in each iteration. Let $\vec{c}_i$ ($i \ge 1$) be the $\vec{c}$ after the $i$-th iteration. Also, let $\vec{c}_0 = [0, \ldots, 0]$ be the initial $\vec{c}$ before the first iteration. Also, let $k$ be the total number of adjustments.


Proof (cont.).

We claim that, for any $i \ge 0$, $\vec{c}_{i+1} \cdot \vec{u} \ge \vec{c}_i \cdot \vec{u} + \gamma_{\text{opt}}$. Due to symmetry, we prove this only for the case where $\vec{c}_{i+1}$ was adjusted from $\vec{c}_i$ because of the violation of a red point $\vec{p}$.

In this case, $\vec{c}_{i+1} = \vec{c}_i + \vec{p}$; and hence, $\vec{c}_{i+1} \cdot \vec{u} = \vec{c}_i \cdot \vec{u} + \vec{p} \cdot \vec{u}$. From the definition of $\gamma_{\text{opt}}$, we know that $\vec{p} \cdot \vec{u} \ge \gamma_{\text{opt}}$. Therefore, $\vec{c}_{i+1} \cdot \vec{u} \ge \vec{c}_i \cdot \vec{u} + \gamma_{\text{opt}}$. It follows that

$|\vec{c}_k| \ge \vec{c}_k \cdot \vec{u} \ge k \gamma_{\text{opt}}. \qquad (1)$


Proof (cont.).

We also claim that, for any $i \ge 0$, $|\vec{c}_{i+1}| \le |\vec{c}_i| + \frac{R^2}{2|\vec{c}_i|} + \frac{\gamma_{\text{opt}}}{2}$. Due to symmetry, we will prove this only for the case where $\vec{c}_{i+1}$ was adjusted from $\vec{c}_i$ due to the violation of a red point $\vec{p}$.

[Figure: the origin $O$, the vector $\vec{c}_i$, and the decomposition of $\vec{p}$ into components $\vec{p}_1$ and $\vec{p}_2$.]

As shown above, $\vec{p} = \vec{p}_1 + \vec{p}_2$, where $\vec{p}_1$ is perpendicular to $\vec{c}_i$, and $\vec{p}_2$ is parallel to $\vec{c}_i$ (and hence, perpendicular to the plane determined by $\vec{c}_i$). Therefore, $\vec{c}_{i+1} = \vec{c}_i + \vec{p} = \vec{c}_i + \vec{p}_1 + \vec{p}_2$. The claim is true due to:

By the definition of violation, $\vec{p}_2$ either points opposite to $\vec{c}_i$, or has a norm $|\vec{p}_2| \le \gamma / 2 \le \gamma_{\text{opt}} / 2$.

Notice that $|\vec{p}_1| \le |\vec{p}| \le R$. Hence, $|\vec{c}_i + \vec{p}_1|^2 = |\vec{c}_i|^2 + |\vec{p}_1|^2 \le |\vec{c}_i|^2 + R^2 \le \left( |\vec{c}_i| + \frac{R^2}{2|\vec{c}_i|} \right)^2$. It thus follows that $|\vec{c}_i + \vec{p}_1| \le |\vec{c}_i| + \frac{R^2}{2|\vec{c}_i|}$.


Proof (cont.).

The claim on the previous slide implies that, when $|\vec{c}_i| \ge \frac{2R^2}{\gamma_{\text{opt}}}$, we have $|\vec{c}_{i+1}| \le |\vec{c}_i| + \frac{3\gamma_{\text{opt}}}{4}$ (because then $\frac{R^2}{2|\vec{c}_i|} \le \frac{\gamma_{\text{opt}}}{4}$). Therefore:

$|\vec{c}_k| \le \frac{2R^2}{\gamma_{\text{opt}}} + \frac{3k\gamma_{\text{opt}}}{4}$

Combining the above with (1) gives:

$k\gamma_{\text{opt}} \le \frac{2R^2}{\gamma_{\text{opt}}} + \frac{3k\gamma_{\text{opt}}}{4} \;\Longrightarrow\; k \le \frac{8R^2}{\gamma_{\text{opt}}^2}.$
