21
10/3/14 1 UVA CS 4501 001 / 6501 – 007 Introduc8on to Machine Learning and Data Mining Lecture 12: Probability and Sta3s3cs Review Yanjun Qi / Jane University of Virginia Department of Computer Science 10/02/14 1 Yanjun Qi / UVA CS 450101650107 Where are we ? Five major sec3ons of this course Regression (supervised) Classifica3on (supervised) Unsupervised models Learning theory Graphical models 10/02/14 2 Yanjun Qi / UVA CS 450101650107

UVACS$4501$+$001$/$6501$–007 $ Introduc8on$to$Machine$Learning$and$Data ...€¦ · 10/3/14 2 Where&are&we&?&!&& Three&major&sec3ons&for&classificaon • We can divide the large

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: UVACS$4501$+$001$/$6501$–007 $ Introduc8on$to$Machine$Learning$and$Data ...€¦ · 10/3/14 2 Where&are&we&?&!&& Three&major&sec3ons&for&classificaon • We can divide the large

10/3/14  

1  

UVA  CS  4501  -­‐  001  /  6501  –  007  Introduc8on  to  Machine  Learning  and  Data  

Mining      

Lecture  12:  Probability  and  Sta3s3cs  Review  

Yanjun  Qi  /  Jane    

University  of  Virginia      

Department  of  Computer  Science      

10/02/14   1  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Where  are  we  ?  è    Five  major  sec3ons  of  this  course  

q   Regression  (supervised)  q   Classifica3on  (supervised)  q   Unsupervised  models    q   Learning  theory    q   Graphical  models    

10/02/14   2  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Page 2: UVACS$4501$+$001$/$6501$–007 $ Introduc8on$to$Machine$Learning$and$Data ...€¦ · 10/3/14 2 Where&are&we&?&!&& Three&major&sec3ons&for&classificaon • We can divide the large

10/3/14  

2  

Where  are  we  ?  è    Three  major  sec3ons  for  classifica3on

•  We can divide the large variety of classification approaches into roughly three major types

1. Discriminative - directly estimate a decision rule/boundary - e.g., support vector machine, decision tree 2. Generative: - build a generative statistical model - e.g., naïve bayes classifier, Bayesian networks 3. Instance based classifiers - Use observation directly (no models) - e.g. K nearest neighbors

10/02/14   3  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Today  :  Probability  Review  

•  The  big  picture  •  Events  and  Event  spaces  •  Random  variables  •  Joint  probability,  Marginaliza3on,  condi3oning,  chain  rule,  Bayes  Rule,  law  of  total  probability,  etc.  

10/02/14   4  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Page 3: UVACS$4501$+$001$/$6501$–007 $ Introduc8on$to$Machine$Learning$and$Data ...€¦ · 10/3/14 2 Where&are&we&?&!&& Three&major&sec3ons&for&classificaon • We can divide the large

10/3/14  

3  

The  Big  Picture  

Model    i.e.  Data  genera3ng  

process    

Observed  Data  

Probability  

Es3ma3on  /  learning  /  Inference  /  Data  mining  

But  how  to  specify  a  model?  

10/02/14   5  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Probability  as  frequency    

•  Consider  the  following  ques3ons:  – 1.  What  is  the  probability  that  when  I  flip  a  coin  it  is  “heads”?    

– 2.  why  ?    

– 3.  What  is  the  probability  of  Blue  Ridge  Mountains  to  have  an  erup3ng  volcano  in  the  near  future  ?    

10/02/14   6  

Message:  The  frequen*st  view  is  very  useful,  but  it  seems  that  we  also  use  domain  knowledge  to  come  up  with  probabili*es.  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

We  can  count  è  ~1/2  

è  could  not  count    

Adapt  from  Prof.  Nando  de  Freitas’s  review  slides  

Page 4: UVACS$4501$+$001$/$6501$–007 $ Introduc8on$to$Machine$Learning$and$Data ...€¦ · 10/3/14 2 Where&are&we&?&!&& Three&major&sec3ons&for&classificaon • We can divide the large

10/3/14  

4  

Probability  as  a  measure  of  uncertainty  

10/02/14   7  

•  Imagine  we  are  throwing  darts  at  a  wall  of  size  1x1  and  that  all  darts  are  guaranteed  to  fall  within  this  1x1  wall.    

•  What  is  the  probability  that  a  dart  will  hit  the  shaded  area?    

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Adapt  from  Prof.  Nando  de  Freitas’s  review  slides  

Probability  as  a  measure  of  uncertainty  

•  Probability  is  a  measure  of  certainty  of  an  event  taking  place.  

•  i.e.  in  the  example,  we  were  measuring  the  chances  of  hi?ng  the  shaded  area.    

10/02/14   8  

prob = #RedBoxes#Boxes

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Adapt  from  Prof.  Nando  de  Freitas’s  review  slides  

Page 5: UVACS$4501$+$001$/$6501$–007 $ Introduc8on$to$Machine$Learning$and$Data ...€¦ · 10/3/14 2 Where&are&we&?&!&& Three&major&sec3ons&for&classificaon • We can divide the large

10/3/14  

5  

Today  :  Probability  Review  

•  The  big  picture  •  Sample  space,  Event  and  Event  spaces  •  Random  variables  •  Joint  probability,  Marginaliza3on,  condi3oning,  chain  rule,  Bayes  Rule,  law  of  total  probability,  etc.  

10/02/14   9  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Probability    

10/02/14   10  

Ω:  Elementary  Event  “Throw  a  2”  

Ωdie  =  {1,2,3,4,5,6}  

Ωcoin  =  {H,T}  

The  elements  of  Ω  are  called  elementary  events.  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Page 6: UVACS$4501$+$001$/$6501$–007 $ Introduc8on$to$Machine$Learning$and$Data ...€¦ · 10/3/14 2 Where&are&we&?&!&& Three&major&sec3ons&for&classificaon • We can divide the large

10/3/14  

6  

Probability    

•  Probability  allows  us  to  measure  many  events.  •  The  events  are  subsets  of  the  sample  space  Ω  .  For  example,  for  a  die  we  may  consider  the  following  events:  e.g.,    

   •  Assign  probabili6es  to  these  events:  e.g.,    

10/02/14   11  

EVEN  =  {2,  4,  6}  GREATER  =  {5,  6}  

P(EVEN)    =  1/2  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Adapt  from  Prof.  Nando  de  Freitas’s  review  slides  

Sample  space  and  Events  

•  Ω : Sample  Space,  result  of  an  experiment  •  If  you  toss  a  coin  twice  Ω =

{ΗΗ,ΗΤ,ΤΗ,ΤΤ}

•  Event:  a  subset  of  Ω

•  First  toss  is  head  =  {HH,HT}  •  S:  event  space,  a  set  of  events:

•  Contains  the  empty  event  and  Ω  

10/02/14   12  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Page 7: UVACS$4501$+$001$/$6501$–007 $ Introduc8on$to$Machine$Learning$and$Data ...€¦ · 10/3/14 2 Where&are&we&?&!&& Three&major&sec3ons&for&classificaon • We can divide the large

10/3/14  

7  

Axioms  for  Probability  

•  Defined  over  (Ω,S) s.t.  •  1  >=  P(α)  >=  0  for  all  α  in  S  •  P(Ω)  =  1  •  If  Α, Β  are  disjoint,  then    

•  P(Α  U  Β)  =  p(Α)  +  p(Β)  

•  P(Ω)  =  

   

B1  

B2  B3  B4  

B5  

B6  B7  

P Bi( )∑

10/02/14   13  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

OR  opera3on  for  Probability  

•  We  can  deduce  other  axioms  from  the  above  ones  

• Ex:  P(Α  U  Β)  for  non-­‐disjoint  events  

10/02/14   14  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Page 8: UVACS$4501$+$001$/$6501$–007 $ Introduc8on$to$Machine$Learning$and$Data ...€¦ · 10/3/14 2 Where&are&we&?&!&& Three&major&sec3ons&for&classificaon • We can divide the large

10/3/14  

8  

A

10/02/14   15  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

A

B

10/02/14   16  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Page 9: UVACS$4501$+$001$/$6501$–007 $ Introduc8on$to$Machine$Learning$and$Data ...€¦ · 10/3/14 2 Where&are&we&?&!&& Three&major&sec3ons&for&classificaon • We can divide the large

10/3/14  

9  

Condi3onal  Probability  

A  

B  

10/02/14   17  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

from  Prof.  Nando  de  Freitas’s  review  

Condi3onal  Probability  /  Chain  Rule  

•  More  ways  to  write  out  chain  rule  …    

10/02/14   18  

P(Α,B)  =  p(B|A)p(Α)  P(Α,B)  =  p(A|B)p(Β)    

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Page 10: UVACS$4501$+$001$/$6501$–007 $ Introduc8on$to$Machine$Learning$and$Data ...€¦ · 10/3/14 2 Where&are&we&?&!&& Three&major&sec3ons&for&classificaon • We can divide the large

10/3/14  

10  

Rule  of  total  probability    =>  Marginaliza3on  

A  

B1  

B2  B3  B4  

B5  

B6  B7  

p A( ) = P Bi( )P A | Bi( )∑ WHY  ???    

10/02/14   19  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Today  :  Probability  Review  

•  The  big  picture  •  Events  and  Event  spaces  •  Random  variables  •  Joint  probability,  Marginaliza3on,  condi3oning,  chain  rule,  Bayes  Rule,  law  of  total  probability,  etc.  

10/02/14   20  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Page 11: UVACS$4501$+$001$/$6501$–007 $ Introduc8on$to$Machine$Learning$and$Data ...€¦ · 10/3/14 2 Where&are&we&?&!&& Three&major&sec3ons&for&classificaon • We can divide the large

10/3/14  

11  

From  Events  to  Random  Variable  

•  Concise  way  of  specifying  apributes  of  outcomes  •  Modeling  students  (Grade  and  Intelligence):  

•  Ω = all  possible  students  (sample  space)  •  What  are  events  (subset  of  sample  space)  

•  Grade_A  =  all  students  with  grade  A  •  Grade_B  =  all  students  with  grade  B  •  HardWorking_Yes  =  …  who  works  hard  

•  Very  cumbersome  

•  Need  “func3ons”  that  maps  from  Ω to  an  apribute  space  T.  •  P(H  =  YES)  =  P({student  ϵ  Ω : H(student)  =  YES})      10/02/14   21  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Random  Variables  (RV)  Ω  

Yes  

No  

A  

B   A+  

H:  hardworking  

G:Grade  

P(H  =  Yes)  =  P(  {all  students  who  is  working  hard  on  the  course})  

10/02/14   22  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Page 12: UVACS$4501$+$001$/$6501$–007 $ Introduc8on$to$Machine$Learning$and$Data ...€¦ · 10/3/14 2 Where&are&we&?&!&& Three&major&sec3ons&for&classificaon • We can divide the large

10/3/14  

12  

Nota3on  Digression  

•  P(A)  is  shorthand  for  P(A=true)  •  P(~A)  is  shorthand  for  P(A=false)  •  Same  nota3on  applies  to  other  binary  RVs:  P(Gender=M),  P(Gender=F)  

•  Same  nota3on  applies  to  mul*valued  RVs:  P(Major=history),  P(Age=19),  P(Q=c)  

•  Note:  upper  case  lepers/names  for  variables,  lower  case  lepers/names  for  values  

 10/02/14   23  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Discrete  Random  Variables  

•  Random  variables  (RVs)  which  may  take  on  only  a  countable  number  of  dis8nct  values  

•  X  is  a  RV  with  arity  k  if  it  can  take  on  exactly  one  value  out  of  {x1,  …,  xk}  

10/02/14   24  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Page 13: UVACS$4501$+$001$/$6501$–007 $ Introduc8on$to$Machine$Learning$and$Data ...€¦ · 10/3/14 2 Where&are&we&?&!&& Three&major&sec3ons&for&classificaon • We can divide the large

10/3/14  

13  

Probability  of  Discrete  RV  

•  Probability  mass  func3on  (pmf):  P(X  =  xi)  •  Easy  facts  about  pmf  

§  Σi  P(X  =  xi)  =  1  §  P(X  =  xi∩X  =  xj)  =  0  if  i  ≠  j  §  P(X  =  xi  U  X  =  xj)  =  P(X  =  xi)  +  P(X  =  xj)  if  i  ≠  j  §  P(X  =  x1  U  X  =  x2  U  …  U  X  =  xk)  =  1    

10/02/14   25  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Today  :  Probability  Review  

•  The  big  picture  •  Events  and  Event  spaces  •  Random  variables  •  Joint  probability,  Marginaliza3on,  condi3oning,  chain  rule,  Bayes  Rule,  law  of  total  probability,  etc.  

10/02/14   26  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Page 14: UVACS$4501$+$001$/$6501$–007 $ Introduc8on$to$Machine$Learning$and$Data ...€¦ · 10/3/14 2 Where&are&we&?&!&& Three&major&sec3ons&for&classificaon • We can divide the large

10/3/14  

14  

Condi3onal  Probability        

( ) ( )( )

P X YP X Y

P Yx y

x yy

= ∩ == = =

=

( ))(),(|

ypyxpyxP =

But  we  will  always  write  it  this  way:  

events  

X=x  

Y=y  

10/02/14   27  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Marginaliza3on        •  We  know  p(X,  Y),  what  is  P(X=x)?  

•  We  can  use  the  law  of  total  probability,  why?       ( ) ( )

( ) ( )∑

=

=

y

y

yxPyP

yxPxp

|

,

A  

B1  

B2  B3  B4  

B5  

B6  B7  

10/02/14   28  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Page 15: UVACS$4501$+$001$/$6501$–007 $ Introduc8on$to$Machine$Learning$and$Data ...€¦ · 10/3/14 2 Where&are&we&?&!&& Three&major&sec3ons&for&classificaon • We can divide the large

10/3/14  

15  

Marginaliza3on  Cont.        •  Another  example      

( ) ( )

( ) ( )∑

=

=

yz

zy

zyxPzyP

zyxPxp

,

,

,|,

,,

10/02/14   29  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Bayes  Rule  

•  We  know  that  P(rain)  =  0.5  •  If  we  also  know  that  the  grass  is  wet,  then  how  this  affects  our  belief  about  whether  it  rains  or  not?  

 €

P rain |wet( ) =P(rain)P(wet | rain)

P(wet)

P x | y( ) =P(x)P(y | x)

P(y)10/02/14   30  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Page 16: UVACS$4501$+$001$/$6501$–007 $ Introduc8on$to$Machine$Learning$and$Data ...€¦ · 10/3/14 2 Where&are&we&?&!&& Three&major&sec3ons&for&classificaon • We can divide the large

10/3/14  

16  

10/02/14   31  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

P(A = a1 | B) =P(B | A = a1)P(A = a1)P(B | A = ai )P(A = ai )

i∑

10/02/14   32  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Page 17: UVACS$4501$+$001$/$6501$–007 $ Introduc8on$to$Machine$Learning$and$Data ...€¦ · 10/3/14 2 Where&are&we&?&!&& Three&major&sec3ons&for&classificaon • We can divide the large

10/3/14  

17  

Bayes  Rule  cont.  

•  You  can  condi3on  on  more  variables  

( ))|(

),|()|(,|zyP

zxyPzxPzyxP =

10/02/14   33  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Condi3onal  Probability  Example  

10/02/14   34  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Adapt  from  Prof.  Nando  de  Freitas’s  review  slides  

Page 18: UVACS$4501$+$001$/$6501$–007 $ Introduc8on$to$Machine$Learning$and$Data ...€¦ · 10/3/14 2 Where&are&we&?&!&& Three&major&sec3ons&for&classificaon • We can divide the large

10/3/14  

18  

Condi3onal  Probability  Example  

10/02/14   35  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Condi3onal  Probability  Example  è  Matrix  Nota3on    

•  X_1:  random  variable  represen3ng  first  draw  •  X_2:  random  variable  represen3ng  second  draw  •  X  ==1  means  “red  ball”  ,  0  mean  “blue  ball”  

10/02/14   36  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Page 19: UVACS$4501$+$001$/$6501$–007 $ Introduc8on$to$Machine$Learning$and$Data ...€¦ · 10/3/14 2 Where&are&we&?&!&& Three&major&sec3ons&for&classificaon • We can divide the large

10/3/14  

19  

Condi3onal  Probability  Example  è  Matrix  Nota3on    

•  P(X1=0)  =  •  P(X1=1)  =  •  P(X2=0|X1=0)  =  •  P(X2=1|X1=0)  =  •  P(X2=0|X1=1)  =  •  P(X2=1|X1=1)  =  

•  è  P(X2=0)  •  è  P(X2=1)  10/02/14   37  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Condi3onal  Probability  Example  è  Matrix  Nota3on    

10/02/14   38  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Page 20: UVACS$4501$+$001$/$6501$–007 $ Introduc8on$to$Machine$Learning$and$Data ...€¦ · 10/3/14 2 Where&are&we&?&!&& Three&major&sec3ons&for&classificaon • We can divide the large

10/3/14  

20  

Condi3onal  Probability  Example  è  Matrix  Nota3on    

10/02/14   39  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Adapt  from  Prof.  Nando  de  Freitas’s  review  slides  

Today  :  Probability  Review  

•  The  big  picture  •  Sample  space,  Event  and  Event  spaces  •  Random  variables  •  Joint  probability,  Marginaliza3on,  condi3oning,  chain  rule,  Bayes  Rule,  law  of  total  probability,  etc.  

10/02/14   40  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07  

Page 21: UVACS$4501$+$001$/$6501$–007 $ Introduc8on$to$Machine$Learning$and$Data ...€¦ · 10/3/14 2 Where&are&we&?&!&& Three&major&sec3ons&for&classificaon • We can divide the large

10/3/14  

21  

References    

q   Prof.  Andrew  Moore’s  review  tutorial  q   Prof.  Nando  de  Freitas’s  review  slides  q   Prof.  Carlos  Guestrin  recita3on  slides    

10/02/14   41  

Yanjun  Qi  /  UVA  CS  4501-­‐01-­‐6501-­‐07