
Lecture 9 Perceptron


DESCRIPTION

Feature representation, Perceptron, Margin and Separability, Main Theorem.


Page 1: Lecture 9 Perceptron

Machine Learning for Language Technology, Lecture 9: Perceptron

Marina Santini, Department of Linguistics and Philology, Uppsala University, Uppsala, Sweden

Autumn 2014

Acknowledgement: Thanks to Prof. Joakim Nivre for course design and materials

Page 2: Lecture 9 Perceptron

Inputs and Outputs

Page 3: Lecture 9 Perceptron

Feature Representation

Page 4: Lecture 9 Perceptron

Features and Classes

Page 5: Lecture 9 Perceptron

Examples (i)

Page 6: Lecture 9 Perceptron

Examples (ii)

Page 7: Lecture 9 Perceptron

Block Feature Vectors

Page 8: Lecture 9 Perceptron

Representation


Pages 9-14: Lecture 9 Perceptron
Page 15: Lecture 9 Perceptron

Linear classifiers (atomic classes)


• Assumption: the data must be linearly separable

Page 16: Lecture 9 Perceptron

Perceptron  

Page 17: Lecture 9 Perceptron

Perceptron (i)

Page 18: Lecture 9 Perceptron

Perceptron Learning Algorithm

Page 19: Lecture 9 Perceptron

Separability and Margin (i)

Page 20: Lecture 9 Perceptron

Separability and Margin (ii)


• Given a training instance, let Ȳ_t ("Y bar t") be the set of all incorrect labels for that instance, i.e. the full set of labels minus the correct label.

• Then we say that a training set is separable with margin gamma (γ) if there exists a weight vector w with norm 1 such that:

the score that we get for the correct label when we use this vector w, minus the score of every incorrect label, is at least γ (as formalized below).
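In symbols, a standard formalization of this condition, assuming a joint feature function f(x, y) and score w · f(x, y) (the slides' exact notation may differ):

\|\mathbf{w}\| = 1 \quad \text{and} \quad \mathbf{w} \cdot \mathbf{f}(x_t, y_t) - \mathbf{w} \cdot \mathbf{f}(x_t, y') \;\geq\; \gamma \quad \text{for all } t \text{ and all } y' \in \bar{Y}_t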

Page 21: Lecture 9 Perceptron

Separability and Margin (iii)

• IMPORTANT: for every training instance, the score that we get for the correct label when we use the weight vector w, minus the score of every incorrect label, is at least a certain margin gamma (γ). That is, the margin γ is the smallest difference between the score of the correct class and the best score among the incorrect classes.

 

The larger the weights, the greater the norm, and we want the norm to be 1 (normalization).

There are different ways of measuring the length/magnitude of a vector, and they are known as norms. The Euclidean norm (or L2 norm) says: take all the values of the weight vector, square them, sum them up, and then take the square root.
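Written out for a weight vector w = (w_1, ..., w_n):

\|\mathbf{w}\|_2 = \sqrt{w_1^2 + w_2^2 + \cdots + w_n^2}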

Page 22: Lecture 9 Perceptron

Perceptron  


Page 23: Lecture 9 Perceptron

Perceptron Learning Algorithm


Page 24: Lecture 9 Perceptron

Main Theorem

Page 25: Lecture 9 Perceptron


Perceptron Theorem

• For any training set that is separable with some margin, we can prove that the number of mistakes made during training, if we keep iterating over the training set, is bounded by a quantity that depends on the size of the margin (see the proofs in the Appendix and the slides of Lecture 3).

• R depends on the norm of the largest difference you can have between feature vectors. The larger R, the more spread out the data and the more errors we can potentially make. Similarly, if gamma is larger, we will make fewer mistakes. The bound is written out below.
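In its standard form (symbols follow the formalization given above and are not copied from the slides), the bound is:

\text{number of mistakes} \;\leq\; \frac{R^2}{\gamma^2}, \qquad \text{where } R = \max_{t,\; y' \in \bar{Y}_t} \big\| \mathbf{f}(x_t, y_t) - \mathbf{f}(x_t, y') \big\|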

Page 26: Lecture 9 Perceptron

Summary  

Page 27: Lecture 9 Perceptron

Basically…  


.... if it is possible to find such a weight vector for some positive margin gamma, then the training set is separable.

So... if the training set is separable, Perceptron will eventually find a weight vector that separates the data. The time it takes depends on the properties of the data, but after a finite number of iterations the number of training errors will drop to 0.

However... although we find a weight vector that perfectly separates the training data, it might be the case that the classifier does not generalize well (do you remember the difference between empirical error and generalization error?).

So, with Perceptron, we have a fixed norm (= 1) and a variable margin (> 0).
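A minimal sketch of this training loop in Python, using the block feature vectors described earlier; it is an illustration under assumed names (feats, train_perceptron), not the slides' own pseudocode:

import numpy as np

def feats(x, y, n_classes):
    # Block feature vector: copy the input features into the block of class y,
    # leaving all other blocks zero (cf. the "Block Feature Vectors" slide).
    n_feats = len(x)
    phi = np.zeros(n_classes * n_feats)
    phi[y * n_feats:(y + 1) * n_feats] = x
    return phi

def train_perceptron(data, n_classes, max_epochs=100):
    # data: list of (x, y) pairs, x a NumPy feature vector, y an integer class label.
    n_feats = len(data[0][0])
    w = np.zeros(n_classes * n_feats)
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:
            # Predict the highest-scoring class under the current weights.
            scores = [w @ feats(x, k, n_classes) for k in range(n_classes)]
            y_hat = int(np.argmax(scores))
            if y_hat != y:
                # Perceptron update: promote the correct class, demote the predicted one.
                w += feats(x, y, n_classes) - feats(x, y_hat, n_classes)
                mistakes += 1
        if mistakes == 0:
            # A full pass with no errors: the training set is separated, so stop.
            break
    return w

For example, train_perceptron([(np.array([1.0, 0.0]), 0), (np.array([0.0, 1.0]), 1)], n_classes=2) converges on this toy two-class set after a couple of passes. If the data are not separable, the sketch simply stops after max_epochs passes; it says nothing about generalization, only about the training-set behaviour discussed above.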

Page 28: Lecture 9 Perceptron

Appendix: Proofs and Derivations

Pages 29-37: Lecture 9 Perceptron