Upload
others
View
8
Download
0
Embed Size (px)
Citation preview
COSC 522 – Machine Learning
Baysian Decision Theory
Hairong Qi, Gonzalez Family ProfessorElectrical Engineering and Computer ScienceUniversity of Tennessee, Knoxvillehttps://www.eecs.utk.edu/people/hairong-qi/Email: [email protected]
1
Course Website: http://web.eecs.utk.edu/~hqi/cosc522/
Recap from Previous Lecture
• The differences among DL, ML, and AI• The evolution of AI and ML – the three
waves/eras• The different ways of categorizing ML algorithms
– Supervised vs. Unsupervised– Linear vs. Nonlinear– Parametric (model-based) vs. Nonparametric
(modeless)
2
Questions• What is supervised learning (vs. unsupervised learning)?• What is the difference between the training set and the test set?• What is the difference between classification and regression?• What are features and samples?• What is dimension?• What is histogram?• What is pdf?• What is Bayes’ Formula?• What is conditional pdf?• What is the difference between prior probability and posterior
probability?• What is Baysian decision rule? or MPP?• What are decision regions?• How to calculate conditional probability of error and overall
probability of error?• What are cost function (or objective function) and optimization
method?3
terminologies
the Formula
decision rule
The Toy Example
• Student (taking COSC522 in F21) gender classification
• Feature: height (1-D), if adding weight, then 2-D• Data:
– Training set: For half of the class, we take measurement of their height and record the student’s gender
– Testing set: For the other half of the class, given height information, determine the student’s gender
4
Terminologies
• Supervised learning: – Training data vs. testing data vs. validation data– Training: given input-output pairs
• Features (e.g., height)• Samples• Dimensions• Classification vs. Regression
5
Example 1 – 2-D Feature
6
height weight label5’5” 140 lb 15’6” 150 lb 25’4” 130 lb 15’7” 160 lb 2
Questions• What is supervised learning (vs. unsupervised learning)?• What is the difference between the training set and the test set?• What is the difference between classification and regression?• What are features and samples?• What is dimension?• What is histogram?• What is pdf?• What is Bayes’ Formula?• What is conditional pdf?• What is the difference between prior probability and posterior
probability?• What is Baysian decision rule? or MPP?• What are decision regions?• How to calculate conditional probability of error and overall
probability of error?• What are cost function (or objective function) and optimization
method?7
terminologies
the Formula
decision rule
Example 2 – 1-D feature
8
height label5’5” 15’6” 25’4” 15’7” 2
5’5” 15’8” 25’6” 15’7” 25’7” 2
From Histogram to Probability Density Distribution (pdf)
9
x
Q&A Session - Looking into Gaussian• Two classes with one intersection?
• Two classes with no intersection?
• Two classes with two intersections?
10
Bayes’ Formula (Bayes’ Rule)
11
( ) ( ) ( )( )xpPxp
xP jjj
ωωω
|| =
posterior probability(a-posteriori probability)
prior probability(a-priori probability)Conditional probability density function
(likelihood)
normalization constant(evidence)
( ) ( ) ( )∑=
=c
jjj Pxpxp
1| ωω
From domain knowledge
Q&A Session
• How do you interpret prior probability in the toy example?
12
Questions• What is supervised learning (vs. unsupervised learning)?• What is the difference between the training set and the test set?• What is the difference between classification and regression?• What are features and samples?• What is dimension?• What is histogram?• What is pdf?• What is Bayes’ Formula?• What is conditional pdf?• What is the difference between prior probability and posterior
probability?• What is Baysian decision rule? or MPP?• What are decision regions?• How to calculate conditional probability of error and overall
probability of error?• What are cost function (or objective function) and optimization
method?13
the Formulaterminologies
decision rule
Bayes Decision Rule
14
( ) ( )22| ωω Pxp
( ) ( )11| ωω Pxp
( ) ( )2. otherwise, 1, class tobelongs then ,|| if ,given aFor 21
xxPxPx ωω >
( ) ( ) ( )( )xpPxp
xP jjj
ωωω
|| =
xMaximum Posterior Probability (MPP):
15
Decision Regions
The effect of any decision rule is to partition the feature space into c decision regions
cℜℜℜ ,,, 21
1ℜ 2ℜ
( ) ( )22| ωω Pxp
( ) ( )11| ωω Pxp
x1ℜ
Conditional Probability of Error
16€
P error | x( ) =P ω1 | x( ) if we decide ω2
P ω2 | x( ) if we decide ω1
# $ %
= min P(ω1 | x),P(ω2 | x)[ ]
( ) ( )22| ωω Pxp
( ) ( )11| ωω Pxp
( ) ( ) ( )( )xpPxp
xP jjj
ωωω
|| =
x
Overall Probability of Error
17
€
P error( ) = P ω2 | x( )p x( )dxℜ1
∫ + P ω1 | x( )p x( )dxℜ2
∫
Or unconditional risk, unconditional probability of error
€
P error( ) = P(error,x)dx−∞
∞
∫ = P(error | x)p(x)dx−∞
∞
∫
1ℜ 2ℜ
( ) ( )22| ωω Pxp
( ) ( )11| ωω Pxp
x1ℜ
= P (error|!2) + P (error|!1)
How Does It Work Altogether?
18
height label5’5” 15’6” 25’4” 15’7” 2
5’5” 15’8” 25’6” 15’7” 25’7” 2
Training Set:
Testing Sample:
Questions• What is supervised learning (vs. unsupervised learning)?• What is the difference between the training set and the test set?• What is the difference between classification and regression?• What are features and samples?• What is dimension?• What is histogram?• What is pdf?• What is Bayes’ Formula?• What is conditional pdf?• What is the difference between prior probability and posterior probability?• What is Baysian decision rule? or MPP?• What are decision regions?• How to calculate conditional probability of error and overall probability of
error?• What are cost function (or objective function) and optimization method?
19
Q&A Session
• What is the cost function?• What is the optimization approach we use to find
the optimal solution to the cost function?
20
Recap
21
( ) ( )2. otherwise, 1, class tobelongs then ,|| if ,given aFor 21
xxPxPx ωω >
( ) ( ) ( )( )xpPxp
xP jjj
ωωω
|| =
MaximumPosteriorProbability
€
P error( ) = P ω2 | x( )p x( )dxℜ1
∫ + P ω1 | x( )p x( )dxℜ2
∫Overall probability of error