CS 461: Machine Learning, Lecture 5 (Winter 2008, 2/2/08). Dr. Kiri Wagstaff, kiri.wagstaff@calstatela.edu


Page 1:

CS 461: Machine Learning
Lecture 5

Dr. Kiri Wagstaff
kiri.wagstaff@calstatela.edu


Page 2:

Plan for Today

Midterm Exam Notes

Room change for 2/16: E&T A129
Sign up for post-midterm conferences
Questions on Homework 3?

Probability Axioms

Bayesian Learning: Classification, Bayes’s Rule, Bayesian Networks, Naïve Bayes Classifier, Association Rules

Page 3:

Review from Lecture 4

Support Vector Machines: non-separable data (the Kernel Trick), regression

Neural Networks: perceptrons, multilayer perceptrons, backpropagation

Page 4:

Probability

Appendix A

Page 5:

Background and Axioms of Probability

Random variable: X. Probability of X: the fraction of possible worlds in which X is true.

Axioms: Positivity, Excluded Middle, Conjunction (“and”), Disjunction (“or”)

Conditional probabilities
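For reference, these are the standard forms of the named axioms and of conditional probability, written in LaTeX; the slide lists the names, but the exact formulas did not survive extraction, so the statements below are the usual textbook versions rather than a transcript of the slide:

\begin{align*}
0 \le P(X) &\le 1 && \text{(Positivity)} \\
P(X) + P(\lnot X) &= 1 && \text{(Excluded Middle)} \\
P(X \land Y) &= P(X \mid Y)\,P(Y) && \text{(Conjunction, ``and'')} \\
P(X \lor Y) &= P(X) + P(Y) - P(X \land Y) && \text{(Disjunction, ``or'')} \\
P(X \mid Y) &= \frac{P(X \land Y)}{P(Y)} && \text{(Conditional probability)}
\end{align*}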

Page 6:

Bayesian Learning

Chapter 3Chapter 3

Page 7:

Making Inferences with Probability

Result of tossing a coin: random variable Coin ∈ {Heads, Tails}

Bernoulli: P{Coin = Heads} = p_o, P{Coin = Tails} = 1 − p_o

Sample: X = {x^t}, t = 1, ..., N

Estimation: p_o ≈ #{Heads} / #{Tosses}

Prediction of next toss: Heads if p_o > 1/2, Tails otherwise

[Alpaydin 2004 The MIT Press]
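A minimal Python sketch of this estimate-and-predict step; the sample of tosses is made up for illustration and is not data from the lecture:

# Estimate the Bernoulli parameter p_o from a sample of coin tosses
# and predict the next toss (illustrative data).
tosses = ["Heads", "Tails", "Heads", "Heads", "Tails", "Heads"]

p_o = sum(1 for t in tosses if t == "Heads") / len(tosses)  # #{Heads} / #{Tosses}
prediction = "Heads" if p_o > 0.5 else "Tails"

print(f"estimated p_o = {p_o:.2f}, predict next toss: {prediction}")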

Page 8:

Classification

Credit scoring: inputs are income and savings; output is low-risk vs. high-risk

Input: x = [x1, x2]^T. Output: C ∈ {0, 1}. Prediction:

[Alpaydin 2004 The MIT Press]

choose C = 1 if P(C = 1 | x1, x2) > 0.5, otherwise choose C = 0

or equivalently

choose C = 1 if P(C = 1 | x1, x2) > P(C = 0 | x1, x2), otherwise choose C = 0

Page 9:

Bayes’s Rule

P(C | x) = P(C) p(x | C) / p(x)

posterior = (prior × likelihood) / evidence

P(C = 0) + P(C = 1) = 1

p(x) = p(x | C = 1) P(C = 1) + p(x | C = 0) P(C = 0)

P(C = 0 | x) + P(C = 1 | x) = 1

[Alpaydin 2004 The MIT Press]
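A small numeric sketch of this rule in Python; the prior and class-conditional likelihood values are made up for illustration and are not the credit-scoring numbers from class:

# Bayes's rule for a two-class problem: posterior = prior * likelihood / evidence.
prior_c1 = 0.3                 # P(C = 1), assumed
prior_c0 = 1.0 - prior_c1      # P(C = 0)
lik_x_given_c1 = 0.70          # p(x | C = 1) for some observed x, assumed
lik_x_given_c0 = 0.20          # p(x | C = 0), assumed

evidence = lik_x_given_c1 * prior_c1 + lik_x_given_c0 * prior_c0   # p(x)
posterior_c1 = lik_x_given_c1 * prior_c1 / evidence                # P(C = 1 | x)

choice = 1 if posterior_c1 > 0.5 else 0
print(f"P(C=1|x) = {posterior_c1:.3f}, choose C = {choice}")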

Page 10:

Causes and Bayes’s Rule

Diagnostic inference: knowing that the grass is wet, what is the probability that rain is the cause?

P(R | W) = P(W | R) P(R) / P(W)
         = P(W | R) P(R) / [P(W | R) P(R) + P(W | ~R) P(~R)]
         = (0.9 × 0.4) / (0.9 × 0.4 + 0.2 × 0.6)
         = 0.75

[Alpaydin 2004 The MIT Press]
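The same computation in Python, using the values that appear in the slide's arithmetic (P(R) = 0.4, P(W | R) = 0.9, P(W | ~R) = 0.2):

# Diagnostic inference with Bayes's rule for the rain -> wet grass example.
p_r = 0.4
p_w_given_r = 0.9
p_w_given_not_r = 0.2

p_w = p_w_given_r * p_r + p_w_given_not_r * (1 - p_r)   # P(W) by total probability
p_r_given_w = p_w_given_r * p_r / p_w                    # P(R | W)

print(f"P(R | W) = {p_r_given_w:.2f}")   # 0.75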

Page 11:

Causal vs. Diagnostic Inference

Causal inference: If the sprinkler is on, what is the probability that the grass is wet?

Diagnostic inference: If the grass is wet, what is the probability that the sprinkler is on? P(S | W) = 0.35 > 0.2 = P(S), but P(S | R, W) = 0.21. Explaining away: knowing that it has rained decreases the probability that the sprinkler is on.

[Alpaydin 2004 The MIT Press]

P(W | S) = P(W | R, S) P(R | S) + P(W | ~R, S) P(~R | S)
         = P(W | R, S) P(R) + P(W | ~R, S) P(~R)
         = 0.95 × 0.4 + 0.9 × 0.6
         = 0.92
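A Python sketch of both directions of inference in the rain/sprinkler/wet-grass network. Only P(R) = 0.4, P(W | R, S) = 0.95, and P(W | ~R, S) = 0.9 are visible in the extracted slide text; the remaining table entries below (P(S) = 0.2, P(W | R, ~S) = 0.90, P(W | ~R, ~S) = 0.10) are assumed from Alpaydin's standard version of this example and reproduce the slide's answers of 0.92, 0.35, and 0.21:

# Rain (R) and Sprinkler (S) are independent causes of Wet grass (W).
p_r, p_s = 0.4, 0.2                                   # P(S) assumed
p_w = {(True, True): 0.95, (True, False): 0.90,        # P(W | R=r, S=s); the two
       (False, True): 0.90, (False, False): 0.10}      # ~S entries are assumed

# Causal inference: P(W | S) = sum_r P(W | r, S) P(r)
p_w_given_s = p_w[(True, True)] * p_r + p_w[(False, True)] * (1 - p_r)

# Evidence: P(W) = sum_{r,s} P(W | r, s) P(r) P(s)
p_w_total = sum(p_w[(r, s)] * (p_r if r else 1 - p_r) * (p_s if s else 1 - p_s)
                for r in (True, False) for s in (True, False))

# Diagnostic inference: P(S | W) and P(S | R, W)
p_s_given_w = p_w_given_s * p_s / p_w_total
p_w_given_r = p_w[(True, True)] * p_s + p_w[(True, False)] * (1 - p_s)   # P(W | R)
p_s_given_rw = p_w[(True, True)] * p_s / p_w_given_r

print(f"P(W | S) = {p_w_given_s:.2f}")      # 0.92
print(f"P(S | W) = {p_s_given_w:.2f}")      # 0.35
print(f"P(S | R, W) = {p_s_given_rw:.2f}")  # 0.21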

Page 12:

Bayesian Networks: Causes

Causal inference:

P(W | C) = P(W | R, S) P(R, S | C) + P(W | ~R, S) P(~R, S | C)
         + P(W | R, ~S) P(R, ~S | C) + P(W | ~R, ~S) P(~R, ~S | C)

using the fact that P(R, S | C) = P(R | C) P(S | C)

[Alpaydin 2004 The MIT Press]

Page 13:

Bayesian Networks: Classification

Diagnostic: P(C | x)

Bayes’s rule inverts the arc:

P(C | x) = P(C) p(x | C) / p(x)

[Alpaydin 2004 The MIT Press]

Page 14:

Naïve Bayes… why “naïve”?

Given C, xj are independent:

p(x|C) = p(x1|C) p(x2|C) ... p(xd|C)

[Alpaydin 2004 The MIT Press]

Page 15:

Naïve Bayes Classifier

The book’s formulation = maximum likelihood estimator (MLE)

More useful: maximum a-posteriori (MAP) classifier

C_predict = argmax_c P(X1 = u1, ..., Xm = um | C = c)        (MLE)

C_predict = argmax_c P(C = c | X1 = u1, ..., Xm = um)        (MAP)

[Copyright Andrew Moore]

Page 16:

MAP broken down

P(C = c | X1 = u1, ..., Xm = um)
  = P(X1 = u1, ..., Xm = um | C = c) P(C = c) / P(X1 = u1, ..., Xm = um)
  = P(X1 = u1, ..., Xm = um | C = c) P(C = c) / Σ_{j=1..k} P(X1 = u1, ..., Xm = um | C = c_j) P(C = c_j)

[Copyright Andrew Moore]

Page 17:

Naïve Bayes in action

Train / Test

P(C = c | X1 = u1, ..., Xm = um)
  = P(X1 = u1, ..., Xm = um | C = c) P(C = c) / Σ_{j=1..k} P(X1 = u1, ..., Xm = um | C = c_j) P(C = c_j)
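To make the train/test step concrete, here is a minimal categorical Naïve Bayes sketch in Python. The toy data, feature names, and the simple add-one smoothing are illustrative assumptions, not the example used in class:

from collections import Counter, defaultdict

# Toy training data: each row is (features, class label). Illustrative only.
train = [
    ({"outlook": "sunny", "windy": "no"},  "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "no"},  "play"),
    ({"outlook": "sunny", "windy": "no"},  "play"),
]

# Train: estimate P(C = c) and P(X_j = u | C = c) by counting.
class_counts = Counter(c for _, c in train)
feature_counts = defaultdict(Counter)          # (class, feature) -> Counter of values
for x, c in train:
    for feat, val in x.items():
        feature_counts[(c, feat)][val] += 1

def predict(x):
    """MAP prediction: argmax_c P(C = c) * prod_j P(X_j = u_j | C = c)."""
    best_c, best_score = None, -1.0
    for c, n_c in class_counts.items():
        score = n_c / len(train)               # prior P(C = c)
        for feat, val in x.items():
            counts = feature_counts[(c, feat)]
            # add-one smoothing so unseen values don't zero out the product
            score *= (counts[val] + 1) / (n_c + len(counts) + 1)
        if score > best_score:
            best_c, best_score = c, score
    return best_c

# Test on a new instance.
print(predict({"outlook": "rainy", "windy": "no"}))   # prints "play" on this toy data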

Page 18:

Association Rules

Association rule: X → Y

Support(X → Y):
P(X, Y) = #{customers who bought X and Y} / #{customers}

Confidence(X → Y):
P(Y | X) = P(X, Y) / P(X) = #{customers who bought X and Y} / #{customers who bought X}

[Alpaydin 2004 The MIT Press]
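A short Python sketch that computes support and confidence directly from these counting definitions; the basket of transactions and the rule milk → bread are toy data for illustration:

# Support and confidence for the rule X -> Y over a toy set of transactions.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk", "butter"},
    {"milk", "bread", "jam"},
]

X, Y = "milk", "bread"
n = len(transactions)
n_x = sum(1 for t in transactions if X in t)               # #{bought X}
n_xy = sum(1 for t in transactions if X in t and Y in t)   # #{bought X and Y}

support = n_xy / n        # P(X, Y)
confidence = n_xy / n_x   # P(Y | X) = P(X, Y) / P(X)

print(f"support({X} -> {Y}) = {support:.2f}, confidence = {confidence:.2f}")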

Page 19:

Summary: Key Points for Today

Probability Axioms

Bayesian Learning: Classification, Bayes’s Rule, Bayesian Networks, Naïve Bayes Classifier, Association Rules

Page 20:

Next Time

Parametric Methods (read Ch. 4.1-4.5, Mitchell pp. 177-179)

Questions to answer from the reading, posted on the website (calendar)

… if you find them useful?