Lec5 Class Margin


Linear Classification: Maximizing the Margin

Yufei Tao
Department of Computer Science and Engineering
Chinese University of Hong Kong


Consider that we are given a linearly separable dataset $P$ in $\mathbb{R}^d$. In other words, $P$ has points of two colors: red and blue; and we can find a plane $c_1 x_1 + c_2 x_2 + \ldots + c_d x_d = 0$ such that, given a point $p(x_1, \ldots, x_d)$ in $P$:

if $p$ is red, then $c_1 x_1 + c_2 x_2 + \ldots + c_d x_d > 0$;

if $p$ is blue, then $c_1 x_1 + c_2 x_2 + \ldots + c_d x_d < 0$.
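
To make the condition concrete, here is a minimal Python sketch that tests whether a coefficient vector separates a red/blue point set in this sense (the function name and data layout are ours, purely for illustration):

    import numpy as np

    def separates(c, red_points, blue_points):
        # The plane c . x = 0 separates the classes if every red point
        # has a strictly positive dot product with c, and every blue
        # point a strictly negative one.
        red = np.asarray(red_points, dtype=float)
        blue = np.asarray(blue_points, dtype=float)
        return bool(np.all(red @ c > 0) and np.all(blue @ c < 0))

    # Tiny 2D example: the plane x1 - x2 = 0 separates these points.
    c = np.array([1.0, -1.0])
    print(separates(c, [(3, 1), (2, 0.5)], [(1, 3), (0.5, 2)]))  # True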

There can be many such separation planes. So far we have been satisfied with finding any of them. In this lecture, we will introduce a metric to measure the quality of a plane, and then discuss how to find the best plane according to the metric.


Given a separation plane with respect to $P$, we define its margin to be its smallest distance to the points in $P$.
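
Since the distance from a point $p$ to the plane $c_1 x_1 + \ldots + c_d x_d = 0$ is $|\vec{c} \cdot \vec{p}| / |\vec{c}|$, the margin is $\min_{p \in P} |\vec{c} \cdot \vec{p}| / |\vec{c}|$ and can be computed directly. A minimal sketch (names and data layout are ours):

    import numpy as np

    def plane_margin(c, points):
        # Margin of the plane c . x = 0: the smallest point-to-plane
        # distance |c . p| / |c| over all points p.
        pts = np.asarray(points, dtype=float)
        return float(np.min(np.abs(pts @ c)) / np.linalg.norm(c))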

Example 1.

[Figure: two different separation planes on the same dataset, each shown with its margin.]

Which plane would you choose?


Definition 2.

Let $P$ be a linearly separable dataset in $\mathbb{R}^d$. The goal of the large margin separation problem is to find a separation plane with the maximum margin.

[Figure: the maximum-margin separation plane, with its margin indicated.]

Any algorithm solving this problem is called a support vector machine.


Next, we will discuss two methods to approach the problem. The first one gives the optimal solution, but is quite complicated and (often) computationally expensive. The second method, on the other hand, is much simpler and (often) much faster, but gives an approximate solution which is nearly optimal.


    Finding the Optimal Plane


We will model the problem as a quadratic programming problem. Towards that purpose, let us do some analysis. Consider an arbitrary separation plane $c_1 x_1 + c_2 x_2 + \ldots + c_d x_d = 0$. Imagine that there are two copies of the plane. One person moves a copy up, while another person moves the other copy down, both at the same speed. They stop as soon as one of the two copies hits a point in $P$.

[Figure: the separation plane and its two copies $\pi_1$ and $\pi_2$ in their final positions, with the margin indicated.]


Now, focus on the two copies of the plane in their final positions. If one copy has equation $c_1 x_1 + c_2 x_2 + \ldots + c_d x_d = c_{d+1}$, then the other copy must have equation $c_1 x_1 + c_2 x_2 + \ldots + c_d x_d = -c_{d+1}$. Here $c_{d+1}$ is a strictly positive value. Let $p(x_1, \ldots, x_d)$ be a point in $P$. We must have (think: why?):

if $p$ is red, then $c_1 x_1 + c_2 x_2 + \ldots + c_d x_d \ge c_{d+1}$;
if $p$ is blue, then $c_1 x_1 + c_2 x_2 + \ldots + c_d x_d \le -c_{d+1}$.

By dividing both sides of each inequality by $c_{d+1}$, we have:

if $p$ is red, then $w_1 x_1 + w_2 x_2 + \ldots + w_d x_d \ge 1$;
if $p$ is blue, then $w_1 x_1 + w_2 x_2 + \ldots + w_d x_d \le -1$

where $w_i = c_i / c_{d+1}$.
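
For a concrete instance of this rescaling (the numbers are ours, purely for illustration), take $d = 2$ and suppose the two copies settle at $2x_1 + 2x_2 = 4$ and $2x_1 + 2x_2 = -4$, i.e., $c_1 = c_2 = 2$ and $c_3 = 4$. Dividing by $c_3 = 4$ gives $w_1 = w_2 = 1/2$: every red point satisfies $\frac{1}{2}x_1 + \frac{1}{2}x_2 \ge 1$, and every blue point satisfies $\frac{1}{2}x_1 + \frac{1}{2}x_2 \le -1$.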


We will refer to the following plane as $\pi_1$:

$w_1 x_1 + w_2 x_2 + \ldots + w_d x_d = 1$

and the following plane as $\pi_2$:

$w_1 x_1 + w_2 x_2 + \ldots + w_d x_d = -1$

The margin of the original separation plane is exactly half of the distance between $\pi_1$ and $\pi_2$:

[Figure: planes $\pi_1$ and $\pi_2$, with the margin equal to half the distance between them.]


Lemma 3.

The distance between $\pi_1$ and $\pi_2$ is $2 / \sqrt{\sum_{i=1}^d w_i^2}$.

To prove the lemma, we will introduce the vector $\vec{w} = [w_1, w_2, \ldots, w_d]$. Also, given a point $p(x_1, x_2, \ldots, x_d)$, we define the vector $\vec{p} = [x_1, x_2, \ldots, x_d]$. Thus, the equation of $\pi_1$ is $\vec{w} \cdot \vec{p} = 1$, and that of $\pi_2$ is $\vec{w} \cdot \vec{p} = -1$. Note that $|\vec{w}| = \sqrt{\sum_{i=1}^d w_i^2}$.


Proof of Lemma 3.

Take an arbitrary point $p_1$ on $\pi_1$, and an arbitrary point $p_2$ on $\pi_2$. Hence, $\vec{w} \cdot \vec{p}_1 = 1$ and $\vec{w} \cdot \vec{p}_2 = -1$. It follows that $\vec{w} \cdot (\vec{p}_1 - \vec{p}_2) = 2$.

[Figure: planes $\pi_1$ and $\pi_2$, points $p_1$ and $p_2$, the vector $\vec{p}_1 - \vec{p}_2$, and the normal vector $\vec{w}$.]

The distance between the two planes is precisely $\frac{\vec{w}}{|\vec{w}|} \cdot (\vec{p}_1 - \vec{p}_2) = \frac{2}{|\vec{w}|}$.
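
As a quick numeric sanity check of Lemma 3 (the example values are ours), take $\vec{w} = (3, 4)$ in 2D, so that $|\vec{w}| = 5$:

    import numpy as np

    w = np.array([3.0, 4.0])    # pi1: w . x = 1, pi2: w . x = -1
    p1 = w / np.dot(w, w)       # a point on pi1, since w . p1 = 1
    p2 = -w / np.dot(w, w)      # a point on pi2, since w . p2 = -1

    # Project p1 - p2 onto the unit normal w/|w|; compare with 2/|w|.
    dist = np.dot(w / np.linalg.norm(w), p1 - p2)
    print(dist, 2 / np.linalg.norm(w))  # both equal 0.4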


In summary of the above, the essence of the large margin separation problem is the following problem. We want to find $w_1, \ldots, w_d$ to minimize $\sum_{i=1}^d w_i^2$ (i.e., maximizing $2 / \sqrt{\sum_{i=1}^d w_i^2}$) subject to the following constraints:

For each red point $p(x_1, \ldots, x_d)$ in $P$:
$w_1 x_1 + w_2 x_2 + \ldots + w_d x_d \ge 1$

For each blue point $p(x_1, \ldots, x_d)$ in $P$:
$w_1 x_1 + w_2 x_2 + \ldots + w_d x_d \le -1$.

This is an instance of quadratic programming.
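
To illustrate, here is a minimal sketch of this quadratic program in Python with the cvxpy package; the solver choice and data layout are our own assumptions, not part of the lecture:

    import cvxpy as cp
    import numpy as np

    def max_margin_plane(red, blue):
        # Minimize sum of w_i^2 subject to w . p >= 1 for red points
        # and w . p <= -1 for blue points.
        red = np.asarray(red, dtype=float)
        blue = np.asarray(blue, dtype=float)
        w = cp.Variable(red.shape[1])
        problem = cp.Problem(cp.Minimize(cp.sum_squares(w)),
                             [red @ w >= 1, blue @ w <= -1])
        problem.solve()
        return w.value

    w = max_margin_plane([(2, 2), (3, 1)], [(-1, -1), (-2, 0)])
    print(w, 1 / np.linalg.norm(w))  # the plane w . x = 0 and its margin

By Lemma 3, the margin of the returned plane $\vec{w} \cdot \vec{x} = 0$ is half of $2/|\vec{w}|$, i.e., $1/|\vec{w}|$, which is why minimizing $\sum_{i=1}^d w_i^2$ maximizes the margin.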


We will not go into the details of quadratic programming, except to mention:

The instance of quadratic programming on the previous slide is polynomial-time solvable. The algorithm, however, is exceedingly complex and falls outside the scope of the course.

There are standard packages online that will allow you to solve the instance within a reasonable amount of time, provided that the input set $P$ is not too large.


    Finding an Approximate Plane


Since quadratic programming is complex and expensive, next we trade precision for simplicity (and usually, also efficiency). Let $\gamma_{\text{opt}}$ be the margin of the optimal solution to the large margin separation problem. We say that a separation plane is $c$-approximate if its margin is at least $\gamma_{\text{opt}} / c$.

We will give a simple algorithm to find a 4-approximate separation plane. Here, the constant 4 is chosen to simplify our presentation. In fact, our algorithm can be easily extended to obtain a $(1 + \epsilon)$-approximate separation plane for any arbitrarily small $\epsilon > 0$.


Let us first assume that we know a value $\gamma$ satisfying $\gamma \le \gamma_{\text{opt}}$ (we will clarify how to find $\gamma$ later). Recall that a separation plane has an equation of the form $c_1 x_1 + c_2 x_2 + \ldots + c_d x_d = 0$. Define the vector $\vec{c} = [c_1, c_2, \ldots, c_d]$, and refer to the plane as the plane determined by $\vec{c}$. The goal is to find a good $\vec{c}$.

Our weapon is once again Perceptron. The difference from before is that we will now correct our $\vec{c}$ not only when a point falls on the wrong side of the plane determined by $\vec{c}$, but also when the point is too close to the plane. Specifically, we say that a point $p$ causes a violation in any of the following situations:

Its distance to the plane determined by $\vec{c}$ is less than or equal to $\gamma / 2$, regardless of the color.

$p$ is red but $\vec{c} \cdot \vec{p} < 0$.

$p$ is blue but $\vec{c} \cdot \vec{p} > 0$.
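
In code, the violation test could look as follows (a sketch under our own conventions; we treat the all-zero $\vec{c}$ as a violation so that the very first iteration always triggers an adjustment):

    import numpy as np

    def causes_violation(c, p, is_red, gamma):
        side = np.dot(c, p)
        norm = np.linalg.norm(c)
        if norm == 0 or abs(side) / norm <= gamma / 2:
            return True                           # too close to the plane
        return side < 0 if is_red else side > 0   # wrong side for its color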


Margin Perceptron

The algorithm starts with $\vec{c} = [0, 0, \ldots, 0]$, and then runs in iterations.

In each iteration, it simply checks whether any point $p \in P$ causes a violation. If so, the algorithm adjusts $\vec{c}$ as follows:

If $p$ is red, then $\vec{c} \leftarrow \vec{c} + \vec{p}$.

If $p$ is blue, then $\vec{c} \leftarrow \vec{c} - \vec{p}$.

As soon as $\vec{c}$ has been adjusted, the current iteration finishes, and a new iteration starts.

The algorithm finishes if no point causes any violation in the current iteration.
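
Putting the pieces together, a minimal runnable sketch of Margin Perceptron (the max_iters cap is our own addition, anticipating the incremental algorithm later; everything else follows the description above):

    import numpy as np

    def margin_perceptron(red, blue, gamma, max_iters=None):
        pts = np.vstack([red, blue]).astype(float)
        is_red = [True] * len(red) + [False] * len(blue)
        c = np.zeros(pts.shape[1])            # start with c = [0, ..., 0]
        iters = 0
        while max_iters is None or iters < max_iters:
            adjusted = False
            for p, r in zip(pts, is_red):
                side, norm = np.dot(c, p), np.linalg.norm(c)
                if (norm == 0 or abs(side) / norm <= gamma / 2
                        or (side < 0 if r else side > 0)):
                    c = c + p if r else c - p  # correct c using the violator
                    adjusted = True
                    break                      # a new iteration starts
            if not adjusted:
                return c, True                 # no violation: done
            iters += 1
        return c, False                        # stopped manually

The second return value distinguishes normal termination from a manual stop; the incremental algorithm later relies on this.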


Define:

$R = \max_{p \in P} \{ |\vec{p}| \}$

Namely, $R$ is the maximum distance from the origin to the points in $P$.

Theorem 4.

Margin Perceptron always terminates in at most $1 + 8R^2 / \gamma_{\text{opt}}^2$ iterations. At termination, it returns a separation plane with margin at least $\gamma / 2$.

    The proof can be found in the appendix.


Margin Perceptron requires a parameter $\gamma \le \gamma_{\text{opt}}$. Theorem 4 tells us that the larger $\gamma$ is, the better the quality guarantee is for the returned plane. Ideally, we would like to set $\gamma = \gamma_{\text{opt}}$, which would give us a 2-approximate plane. Unfortunately, we do not know the value of $\gamma_{\text{opt}}$.

Next, we present a strategy that allows us to obtain an estimate of $\gamma_{\text{opt}}$ that is off by at most a factor of 4. The strategy is to start with a low value of $\gamma$, and then gradually increase it until we are sure that it is already greater than $\gamma_{\text{opt}}$.


An Incremental Algorithm

1. $R \leftarrow$ the maximum distance from the origin to the points in $P$.
2. $\pi \leftarrow$ an arbitrary separation plane (obtained by, e.g., running the old Perceptron algorithm or linear programming). Let $\gamma_0$ be the margin of this plane.
3. $\gamma_1 \leftarrow 4\gamma_0$; $i \leftarrow 1$.
4. Run Margin Perceptron with parameter $\gamma_i$. If the algorithm does not terminate after $1 + 8R^2 / \gamma_i^2$ iterations, stop it manually. In this case, we return $\pi$ as our final answer.
5. Update $\pi$ to the plane returned by the algorithm. Let $\gamma'$ be the margin of $\pi$.
6. $\gamma_{i+1} \leftarrow 4\gamma'$.
7. $i \leftarrow i + 1$; go to Line 4.
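
A sketch of this driver, assuming the margin_perceptron and plane_margin functions from the earlier sketches are in scope (the data layout and the initial plane c_init are again illustrative assumptions):

    import numpy as np

    def incremental(red, blue, c_init):
        pts = np.vstack([red, blue]).astype(float)
        R = float(np.max(np.linalg.norm(pts, axis=1)))    # Line 1
        c = np.asarray(c_init, dtype=float)               # Line 2
        gamma_i = 4 * plane_margin(c, pts)                # Line 3
        while True:
            cap = int(np.ceil(1 + 8 * R**2 / gamma_i**2)) # budget of Line 4
            c_new, ok = margin_perceptron(red, blue, gamma_i, max_iters=cap)
            if not ok:
                return c            # manual stop: return the current plane
            c = c_new                                     # Line 5
            gamma_i = 4 * plane_margin(c, pts)            # Lines 6-7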


Lemma 5.

Consider the $i$-th call to Margin Perceptron. If manual termination occurs at Line 4, then $\gamma_i > \gamma_{\text{opt}}$. Otherwise, $\gamma_{i+1} \ge 2\gamma_i$.

Proof.

First consider the case of manual termination. If $\gamma_i \le \gamma_{\text{opt}}$, then by Theorem 4, we know that Margin Perceptron should have terminated in at most $1 + 8R^2 / \gamma_{\text{opt}}^2 \le 1 + 8R^2 / \gamma_i^2$ iterations. Contradiction.

Now consider that Margin Perceptron terminated normally. Then, by Theorem 4, $\gamma'$ (the margin of the plane returned) must be at least $\gamma_i / 2$. Hence, $\gamma_{i+1} = 4\gamma' \ge 2\gamma_i$.


Corollary 6.

The algorithm terminates after at most $1 + \log_2 \frac{\gamma_{\text{opt}}}{\gamma_0}$ calls to Margin Perceptron.

Proof.

From the previous lemma, we know that $\gamma_i \ge 2^i \gamma_0$. Suppose that the algorithm makes $k$ calls to Margin Perceptron. Then, $\gamma_{\text{opt}} \ge \gamma_{k-1} \ge 2^{k-1} \gamma_0$. Solving this inequality for $k$ gives the corollary.


Theorem 7.

Our incremental algorithm returns a separation plane with margin at least $\gamma_{\text{opt}} / 4$.

Proof.

Suppose that the algorithm terminates after $k$ calls to Margin Perceptron. Hence:

$\gamma_k > \gamma_{\text{opt}}$

The plane we return eventually has margin $\gamma_k / 4$ (recall that $\gamma_k = 4\gamma'$, where $\gamma'$ is the margin of the plane being returned), which is at least $\gamma_{\text{opt}} / 4$.


    Appendix


Proof of Theorem 4.

Let $\vec{u} \cdot \vec{x} = 0$ be the optimal separation plane with margin $\gamma_{\text{opt}}$. Without loss of generality, suppose that $|\vec{u}| = 1$. Hence:

$\gamma_{\text{opt}} = \min_{p \in P} \{ |\vec{p} \cdot \vec{u}| \}$

Recall that the perceptron algorithm adjusts $\vec{c}$ in each iteration. Let $\vec{c}_i$ ($i \ge 1$) be the $\vec{c}$ after the $i$-th iteration. Also, let $\vec{c}_0 = [0, \ldots, 0]$ be the initial $\vec{c}$ before the first iteration. Also, let $k$ be the total number of adjustments.


Proof (cont.).

We claim that, for any $i \ge 0$, $\vec{c}_{i+1} \cdot \vec{u} \ge \vec{c}_i \cdot \vec{u} + \gamma_{\text{opt}}$. Due to symmetry, we prove this only for the case where $\vec{c}_{i+1}$ was adjusted from $\vec{c}_i$ because of the violation of a red point $\vec{p}$.

In this case, $\vec{c}_{i+1} = \vec{c}_i + \vec{p}$; and hence, $\vec{c}_{i+1} \cdot \vec{u} = \vec{c}_i \cdot \vec{u} + \vec{p} \cdot \vec{u}$. From the definition of $\gamma_{\text{opt}}$, we know that $\vec{p} \cdot \vec{u} \ge \gamma_{\text{opt}}$. Therefore, $\vec{c}_{i+1} \cdot \vec{u} \ge \vec{c}_i \cdot \vec{u} + \gamma_{\text{opt}}$. It follows that

$|\vec{c}_k| \ge \vec{c}_k \cdot \vec{u} \ge k \gamma_{\text{opt}}. \qquad (1)$


Proof (cont.).

We also claim that, for any $i \ge 0$, $|\vec{c}_{i+1}| \le |\vec{c}_i| + \frac{R^2}{2|\vec{c}_i|} + \frac{\gamma_{\text{opt}}}{2}$. Due to symmetry, we will prove this only for the case where $\vec{c}_{i+1}$ was adjusted from $\vec{c}_i$ due to the violation of a red point $\vec{p}$.

[Figure: the origin $O$, the vector $\vec{c}_i$, and the decomposition of $\vec{p}$ into components $\vec{p}_1$ and $\vec{p}_2$.]

As shown above, $\vec{p} = \vec{p}_1 + \vec{p}_2$, where $\vec{p}_1$ is perpendicular to $\vec{c}_i$, and $\vec{p}_2$ is parallel to $\vec{c}_i$ (and hence, perpendicular to the plane determined by $\vec{c}_i$). Therefore, $\vec{c}_{i+1} = \vec{c}_i + \vec{p} = \vec{c}_i + \vec{p}_1 + \vec{p}_2$. The claim is true due to:

By the definition of violation, $\vec{p}_2$ either points opposite to $\vec{c}_i$, or has a norm $|\vec{p}_2| \le \gamma / 2 \le \gamma_{\text{opt}} / 2$.

Notice that $|\vec{p}_1| \le |\vec{p}| \le R$. Hence, $|\vec{c}_i + \vec{p}_1|^2 = |\vec{c}_i|^2 + |\vec{p}_1|^2 \le |\vec{c}_i|^2 + R^2 \le \left( |\vec{c}_i| + \frac{R^2}{2|\vec{c}_i|} \right)^2$. It thus follows that $|\vec{c}_i + \vec{p}_1| \le |\vec{c}_i| + \frac{R^2}{2|\vec{c}_i|}$.


Proof (cont.).

The claim on the previous slide implies that, when $|\vec{c}_i| \ge \frac{2R^2}{\gamma_{\text{opt}}}$, we have $|\vec{c}_{i+1}| \le |\vec{c}_i| + \frac{3\gamma_{\text{opt}}}{4}$ (because then $\frac{R^2}{2|\vec{c}_i|} \le \frac{\gamma_{\text{opt}}}{4}$). Therefore:

$|\vec{c}_k| \le \frac{2R^2}{\gamma_{\text{opt}}} + \frac{3k\gamma_{\text{opt}}}{4}$

Combining the above with (1) gives:

$k\gamma_{\text{opt}} \le \frac{2R^2}{\gamma_{\text{opt}}} + \frac{3k\gamma_{\text{opt}}}{4} \;\Longrightarrow\; k \le \frac{8R^2}{\gamma_{\text{opt}}^2}.$
