Lecture 10: SVM and MIRA


Outline: margin, maximizing the margin, the norm, support vector machines (SVM), Margin Infused Relaxed Algorithm (MIRA)


Machine Learning for Language Technology, Lecture 10: SVM and MIRA

Marina Santini, Department of Linguistics and Philology, Uppsala University, Uppsala, Sweden

Autumn 2014

Acknowledgement: Thanks to Prof. Joakim Nivre for course design and materials


Margin

Maximizing Margin (i)

Maximizing Margin (ii)

Maximizing Margin (iii)

Max Margin = Min Norm

Maximizing the margin


•  The notion of margin: a way of predicting what will be a good separation on the test set.

•  Intuitively, if we make the margin between opposite groups as wide as possible, our chances of guessing correctly on the test set should increase.

•  The generalization error on unseen test data is proportional to the inverse of the margin: the larger the margin, the smaller the generalization error (see the sketch below).
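As a concrete illustration, here is a minimal sketch (my own example with a hypothetical toy dataset, not code from the lecture) of how the geometric margin of a given weight vector w can be computed: it is the smallest distance y_i (w · x_i) / ||w|| over the training points.

```python
import numpy as np

def geometric_margin(w, X, y):
    """Smallest signed distance from any training point to the
    hyperplane w . x = 0; it is positive only if w separates the data."""
    return np.min(y * (X @ w)) / np.linalg.norm(w)

# Hypothetical toy data: two linearly separable classes in the plane.
X = np.array([[2.0, 2.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

# Both vectors separate the data, but they achieve different margins;
# maximizing the margin means preferring the first kind of separator.
print(geometric_margin(np.array([1.0, 1.0]), X, y))  # ~2.12
print(geometric_margin(np.array([3.0, 1.0]), X, y))  # ~1.58
```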

Support Vector Machines (SVM) (i)

Support Vector Machines (SVM) (ii)

Margin Infused Relaxed Algorithm (MIRA)

MIRA

Perceptron vs. SVMs/MIRA


Perceptron: If the training set is separable by some margin, the Perceptron will find a weight vector that separates the data, but it will not necessarily pick the vector that maximizes the margin. If we are lucky, it will be a vector with the largest margin, but there is no guarantee.

SVMs/MIRA: These want the weight vector that maximizes the margin. Here the margin is normalized to 1: we put a constraint on the weight vector saying that every training example must be classified with a margin of at least 1. We keep the margin fixed and minimize the norm. That is, we want the smallest weight vector that gives us margin 1.
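In symbols, this is the constrained optimization problem below (my reconstruction of the standard hard-margin formulation, assuming a hyperplane through the origin and labels y_i in {-1, +1}):

```latex
\min_{\mathbf{w}} \; \|\mathbf{w}\|
\qquad \text{subject to} \qquad
y_i \,(\mathbf{w} \cdot \mathbf{x}_i) \ge 1 \quad \text{for all } i
```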

Strictly speaking, we do not minimize the norm: we minimize the norm squared divided by 2, which makes the math easier (trust the people who suggested this :-)).
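To make the contrast concrete, here is a minimal sketch of the two update rules for the binary case (my own illustration, not code from the lecture; it assumes labels y in {-1, +1}, a hyperplane through the origin, and the standard MIRA/passive-aggressive step size):

```python
import numpy as np

def perceptron_update(w, x, y):
    """Perceptron: if (x, y) is misclassified, add y * x to w.
    Any separating vector can result; there is no margin guarantee."""
    if y * np.dot(w, x) <= 0:
        w = w + y * x
    return w

def mira_update(w, x, y):
    """MIRA: make the smallest change to w (in Euclidean norm) such
    that (x, y) is classified with a margin of at least 1."""
    loss = max(0.0, 1.0 - y * np.dot(w, x))  # hinge loss on this example
    if loss > 0.0:
        tau = loss / np.dot(x, x)            # smallest step that fixes the margin
        w = w + tau * y * x
    return w

# Hypothetical toy run: a few passes over two separable points.
X = np.array([[2.0, 1.0], [-1.0, -1.5]])
y = np.array([1, -1])
w = np.zeros(2)
for _ in range(5):
    for xi, yi in zip(X, y):
        w = mira_update(w, xi, yi)
print(w, y * (X @ w))  # every example now has margin >= 1
```

Note how the MIRA step size tau is exactly large enough to reach margin 1 on the current example and no larger, which is the "relaxed" minimal-change idea behind the algorithm.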

Summary  

The  end  
