COSC 522 – Machine Learning: Bayesian Decision Theory
Hairong Qi, Gonzalez Family Professor
Electrical Engineering and Computer Science, University of Tennessee, Knoxville
https://www.eecs.utk.edu/people/hairong-qi/
Email: [email protected]
Course Website: http://web.eecs.utk.edu/~hqi/cosc522/



Page 1

COSC 522 – Machine Learning

Bayesian Decision Theory

Hairong Qi, Gonzalez Family Professor
Electrical Engineering and Computer Science
University of Tennessee, Knoxville
https://www.eecs.utk.edu/people/hairong-qi/
Email: [email protected]

Course Website: http://web.eecs.utk.edu/~hqi/cosc522/

Page 2

Recap from Previous Lecture

• The differences among DL, ML, and AI
• The evolution of AI and ML – the three waves/eras
• The different ways of categorizing ML algorithms
  – Supervised vs. Unsupervised
  – Linear vs. Nonlinear
  – Parametric (model-based) vs. Nonparametric (model-free)

Page 3

Questions
• What is supervised learning (vs. unsupervised learning)?
• What is the difference between the training set and the test set?
• What is the difference between classification and regression?
• What are features and samples?
• What is dimension?
• What is a histogram?
• What is a pdf?
• What is Bayes' Formula?
• What is a conditional pdf?
• What is the difference between prior probability and posterior probability?
• What is the Bayesian decision rule, or MPP?
• What are decision regions?
• How do you calculate the conditional probability of error and the overall probability of error?
• What are the cost function (or objective function) and optimization method?

(The slide groups these questions under three labels: terminologies, the Formula, decision rule.)

Page 4

The Toy Example

• Student (taking COSC 522 in F21) gender classification
• Feature: height (1-D); if weight is added, then 2-D
• Data:
  – Training set: for half of the class, we take a measurement of their height and record the student's gender
  – Testing set: for the other half of the class, given the height information, determine the student's gender

Page 5

Terminologies

• Supervised learning:
  – Training data vs. testing data vs. validation data
  – Training: given input-output pairs
• Features (e.g., height)
• Samples
• Dimensions
• Classification vs. Regression

Page 6

Example 1 – 2-D Feature


height  weight  label
5'5"    140 lb  1
5'6"    150 lb  2
5'4"    130 lb  1
5'7"    160 lb  2

Page 7

(The question checklist from Page 3 is repeated here as a progress check, with the same annotations: terminologies, the Formula, decision rule.)

Page 8

Example 2 – 1-D Feature


height  label
5'5"    1
5'6"    2
5'4"    1
5'7"    2
5'5"    1
5'8"    2
5'6"    1
5'7"    2
5'7"    2

Page 9

From Histogram to Probability Density Function (pdf)

(Figure: histograms of the height feature and the probability density function they approximate, plotted over x.)
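The step from histogram to pdf is just normalization: dividing each bin count by the total count times the bin width makes the bars integrate to 1. A minimal sketch of that idea, using synthetic heights rather than the class measurements:

```python
import numpy as np

# Synthetic heights (inches) drawn from a Gaussian, for illustration only.
rng = np.random.default_rng(0)
heights = rng.normal(loc=67, scale=3, size=1000)

# Normalized histogram: counts / (n * bin_width) integrates to 1,
# so the bar heights approximate the pdf.
counts, edges = np.histogram(heights, bins=20)
bin_width = edges[1] - edges[0]
pdf_estimate = counts / (counts.sum() * bin_width)

print(np.sum(pdf_estimate * bin_width))  # ~1.0, as a pdf must integrate to 1
```

The same normalization is what `np.histogram(..., density=True)` does internally; as the bin width shrinks and the sample count grows, the estimate approaches the true pdf.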

Page 10

Q&A Session - Looking into Gaussians
• Two classes with one intersection?
• Two classes with no intersection?
• Two classes with two intersections?

Page 11

Bayes’ Formula (Bayes’ Rule)

$$P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\,P(\omega_j)}{p(x)}$$

• $P(\omega_j \mid x)$: posterior probability (a-posteriori probability)
• $p(x \mid \omega_j)$: conditional probability density function (likelihood)
• $P(\omega_j)$: prior probability (a-priori probability), from domain knowledge
• $p(x)$: normalization constant (evidence), where $p(x) = \sum_{j=1}^{c} p(x \mid \omega_j)\,P(\omega_j)$
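To make the formula concrete, here is a small numerical sketch of Bayes' Formula for the height setting. The Gaussian class-conditional models and their means, standard deviations, and equal priors are made-up assumptions for illustration, not values from the lecture:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Class-conditional likelihood p(x | omega_j), modeled as a 1-D Gaussian."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical two-class setup: P(omega_j) priors and (mean, std) of height in inches.
priors = {1: 0.5, 2: 0.5}
params = {1: (64.0, 2.5), 2: (69.0, 2.8)}

def posterior(x, j):
    """Bayes' Formula: P(omega_j | x) = p(x | omega_j) P(omega_j) / p(x)."""
    evidence = sum(gaussian_pdf(x, *params[k]) * priors[k] for k in priors)
    return gaussian_pdf(x, *params[j]) * priors[j] / evidence

x = 66.0
print(posterior(x, 1), posterior(x, 2))  # the two posteriors sum to 1
```

Note the evidence $p(x)$ plays no role in ranking the classes; it only rescales the numerators so the posteriors sum to 1.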

Page 12

Q&A Session

• How do you interpret prior probability in the toy example?


Page 13

(The question checklist from Page 3 is repeated here as a progress check, with the same annotations: terminologies, the Formula, decision rule.)

Page 14

Bayes Decision Rule

Maximum Posterior Probability (MPP): for a given $x$, if $P(\omega_1 \mid x) > P(\omega_2 \mid x)$, then $x$ belongs to class 1; otherwise, class 2.

With Bayes' Formula, $P(\omega_j \mid x) = p(x \mid \omega_j)\,P(\omega_j)/p(x)$, and since the evidence $p(x)$ is common to both classes, the rule reduces to comparing $p(x \mid \omega_1)\,P(\omega_1)$ with $p(x \mid \omega_2)\,P(\omega_2)$.

(Figure: the curves $p(x \mid \omega_1)\,P(\omega_1)$ and $p(x \mid \omega_2)\,P(\omega_2)$ plotted against $x$.)
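The MPP rule fits in a few lines: since the evidence is shared, comparing posteriors is the same as comparing prior-weighted likelihoods. The Gaussian class models and their parameters below are hypothetical, chosen only to make the sketch runnable:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Assumed class-conditional likelihood p(x | omega_j): a 1-D Gaussian."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical priors and (mean, std) per class, heights in inches.
priors = {1: 0.5, 2: 0.5}
params = {1: (64.0, 2.5), 2: (69.0, 2.8)}

def mpp_decide(x):
    """MPP rule: pick the class with the larger posterior. The evidence p(x)
    is the same for every class, so comparing p(x | omega_j) * P(omega_j)
    gives the same decision."""
    scores = {j: gaussian_pdf(x, *params[j]) * priors[j] for j in priors}
    return max(scores, key=scores.get)

print(mpp_decide(63.0))  # → 1 (closer to class 1's mean)
print(mpp_decide(70.0))  # → 2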

Page 15: COSC522–MachineLearning BaysianDecisionTheory

15

Decision Regions

The effect of any decision rule is to partition the feature space into c decision regions

cℜℜℜ ,,, 21

1ℜ 2ℜ

( ) ( )22| ωω Pxp

( ) ( )11| ωω Pxp

x1ℜ

Page 16: COSC522–MachineLearning BaysianDecisionTheory

Conditional Probability of Error

16€

P error | x( ) =P ω1 | x( ) if we decide ω2

P ω2 | x( ) if we decide ω1

# $ %

= min P(ω1 | x),P(ω2 | x)[ ]

( ) ( )22| ωω Pxp

( ) ( )11| ωω Pxp

( ) ( ) ( )( )xpPxp

xP jjj

ωωω

|| =

x

Page 17: COSC522–MachineLearning BaysianDecisionTheory

Overall Probability of Error

17

P error( ) = P ω2 | x( )p x( )dxℜ1

∫ + P ω1 | x( )p x( )dxℜ2

Or unconditional risk, unconditional probability of error

P error( ) = P(error,x)dx−∞

∫ = P(error | x)p(x)dx−∞

1ℜ 2ℜ

( ) ( )22| ωω Pxp

( ) ( )11| ωω Pxp

x1ℜ

= P (error|!2) + P (error|!1)

Page 18: COSC522–MachineLearning BaysianDecisionTheory

How Does It Work Altogether?

18

height label5’5” 15’6” 25’4” 15’7” 2

5’5” 15’8” 25’6” 15’7” 25’7” 2

Training Set:

Testing Sample:

Page 19: COSC522–MachineLearning BaysianDecisionTheory

Questions• What is supervised learning (vs. unsupervised learning)?• What is the difference between the training set and the test set?• What is the difference between classification and regression?• What are features and samples?• What is dimension?• What is histogram?• What is pdf?• What is Bayes’ Formula?• What is conditional pdf?• What is the difference between prior probability and posterior probability?• What is Baysian decision rule? or MPP?• What are decision regions?• How to calculate conditional probability of error and overall probability of

error?• What are cost function (or objective function) and optimization method?

19

Page 20: COSC522–MachineLearning BaysianDecisionTheory

Q&A Session

• What is the cost function?• What is the optimization approach we use to find

the optimal solution to the cost function?

20

Page 21: COSC522–MachineLearning BaysianDecisionTheory

Recap

21

( ) ( )2. otherwise, 1, class tobelongs then ,|| if ,given aFor 21

xxPxPx ωω >

( ) ( ) ( )( )xpPxp

xP jjj

ωωω

|| =

MaximumPosteriorProbability

P error( ) = P ω2 | x( )p x( )dxℜ1

∫ + P ω1 | x( )p x( )dxℜ2

∫Overall probability of error