Variations of Minimax Probability Machine
Huang, Kaizhu
2003-09-16
Overview
• Classification: types and problems
• Minimax Probability Machine
• Main work
  – Biased Minimax Probability Machine
  – Minimum Error Minimax Probability Machine
• Experiments
• Future work
Classification
[Figure: two data classes x and y separated by the hyperplane a^T z = b, with a^T x ≥ b on one side and a^T y ≤ b on the other]
Types of Classifiers
• Generative Classifiers
• Discriminative Classifiers
Classification—Generative Classifier
[Figure: class-conditional densities p1 and p2 fitted to the two classes, with the boundary a^T z = b drawn where they intersect]
A generative model assumes specific distributions on the two classes of data and uses these distributions to construct the classification boundary.
Problems of the Generative Model
• "All models are wrong, but some are useful" (Box)
• The distributional assumptions lack generality and are often invalid in real cases
It seems that a generative model should not assume a specific model on the data.
Classification—Discriminative Classifier: SVM
[Figure: the SVM decision hyperplane a^T z = b, with the margin determined by the support vectors]
Problems of SVM
[Figure: the SVM boundary is determined only by the support vectors and ignores how the rest of each class is distributed]
It seems that SVM should consider the distribution of the data.
SVM vs. Generative Models
• It seems that a generative model should not assume specific models on the data.
• It seems that SVM should consider the distribution of the data.
Minimax Probability Machine (MPM)
• Features:
  – With distribution considerations
  – With no specific distribution assumption
Minimax Probability Machine
• With distribution considerations
  – Assume the mean and covariance directly estimated from the data reliably represent the real mean and covariance
• Without a specific distribution assumption
  – Directly construct classifiers from the data
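The plug-in estimates the slides refer to can be computed directly with NumPy; the synthetic data and variable names below are illustrative, not from the slides.

```python
# Plug-in estimates of the per-class mean and covariance used by MPM.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(200, 2))    # class x samples
Y = rng.normal(loc=[-2.0, -2.0], scale=1.0, size=(200, 2))  # class y samples

x_bar, y_bar = X.mean(axis=0), Y.mean(axis=0)  # empirical means
Sigma_x = np.cov(X, rowvar=False)              # empirical covariances
Sigma_y = np.cov(Y, rowvar=False)
```

MPM then treats (x_bar, Sigma_x) and (y_bar, Sigma_y) as reliable stand-ins for the true class statistics.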
Minimax Probability Machine (Formulation)

    max_{α, a≠0, b}  α
    s.t.  inf_{x ~ (x̄, Σ_x)} Pr{a^T x ≥ b} ≥ α,
          inf_{y ~ (ȳ, Σ_y)} Pr{a^T y ≤ b} ≥ α

• Objective: maximize α, the worst-case classification accuracy over all distributions with the given means and covariances.
Minimax Probability Machine (Cont’d)
• The MPM problem leads to Second Order Cone Programming
• Dual problem:

    min_{a≠0}  sqrt(a^T Σ_x a) + sqrt(a^T Σ_y a)
    s.t.  a^T (x̄ − ȳ) = 1

  The optimal worst-case bound satisfies κ(α) = sqrt(α / (1 − α)) = 1 / (optimal value).
• Geometric interpretation: grow the ellipsoids {z : (z − x̄)^T Σ_x^{-1} (z − x̄) ≤ κ^2} and {z : (z − ȳ)^T Σ_y^{-1} (z − ȳ) ≤ κ^2} around the two means; the optimal κ is reached when the two ellipsoids first touch, and the optimal hyperplane is tangent to both at the touching point.
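As a rough numerical sketch of the dual above, the problem can be handed to a general-purpose solver; SLSQP here stands in for a dedicated second-order-cone solver, and the synthetic data and all names are illustrative.

```python
# Solve the MPM dual: min sqrt(a'Sx a) + sqrt(a'Sy a)  s.t.  a'(x_bar - y_bar) = 1.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
X = rng.normal([2.0, 0.0], 1.0, size=(300, 2))   # class x samples
Y = rng.normal([-2.0, 0.0], 1.0, size=(300, 2))  # class y samples
x_bar, y_bar = X.mean(axis=0), Y.mean(axis=0)
Sx, Sy = np.cov(X, rowvar=False), np.cov(Y, rowvar=False)

def dual_obj(a):
    # sqrt(a' Sx a) + sqrt(a' Sy a): the MPM dual objective
    return np.sqrt(a @ Sx @ a) + np.sqrt(a @ Sy @ a)

d = x_bar - y_bar
cons = {"type": "eq", "fun": lambda a: a @ d - 1.0}  # a'(x_bar - y_bar) = 1
res = minimize(dual_obj, x0=d / (d @ d), constraints=[cons])
a = res.x
kappa = 1.0 / dual_obj(a)            # optimal kappa
alpha = kappa**2 / (1 + kappa**2)    # worst-case accuracy lower bound
b = a @ x_bar - kappa * np.sqrt(a @ Sx @ a)  # threshold: classify as x if a'z >= b
```

The threshold b makes the x-class constraint tight, matching the symmetric role of the two classes in MPM.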
Minimax Probability Machine (Cont’d)
• Summary
  – Distribution-free
  – In the general case, the accuracy of classifying future data is bounded below by α
  – Demonstrated to achieve performance comparable with SVM
Problems of MPM
(Recall that in the MPM formulation both classes share the same lower bound α.)
1. In real cases the two classes are not always equally important, which implies the lower bound α for the two classes is not necessarily the same.
   – Motivates the Biased Minimax Probability Machine
2. On the other hand, no reason exists to require the two bounds to be equal. The derived model is thus non-optimal in this sense.
   – Motivates the Minimum Error Minimax Probability Machine
Biased Minimax Probability Machine
• Observation: in diagnosing a severe epidemic disease, misclassifying the positive class causes more serious consequences than misclassifying the negative class.
• A typical setting: as long as the classification accuracy of the less important class is maintained at an acceptable level (specified by the practitioners), the classification accuracy of the important class should be made as high as possible.
• Objective: maximize the worst-case accuracy α of the important class (α has the same meaning as before), subject to the other class keeping an acceptable accuracy level β0.
Biased Minimax Probability Machine (BMPM)

    max_{α, β, a≠0, b}  α
    s.t.  inf_{x ~ (x̄, Σ_x)} Pr{a^T x ≥ b} ≥ α,
          inf_{y ~ (ȳ, Σ_y)} Pr{a^T y ≤ b} ≥ β,
          β ≥ β0

• Equivalently, with κ(α) = sqrt(α / (1 − α)) and κ(β0) = sqrt(β0 / (1 − β0)):

    max_{α, a≠0}  κ(α)
    s.t.  a^T (x̄ − ȳ) ≥ κ(α) sqrt(a^T Σ_x a) + κ(β0) sqrt(a^T Σ_y a)
BMPM (Cont’d)
• Equivalently (a fractional program):

    max_{a≠0}  [a^T (x̄ − ȳ) − κ(β0) sqrt(a^T Σ_y a)] / sqrt(a^T Σ_x a)

• Equivalently, after normalizing a^T (x̄ − ȳ) = 1:

    max_{a≠0}  [1 − κ(β0) sqrt(a^T Σ_y a)] / sqrt(a^T Σ_x a)
    s.t.  a^T (x̄ − ȳ) = 1
BMPM (Cont’d)
• Parametric method
  1. Find a_n by solving

       max_{a≠0}  1 − κ(β0) sqrt(a^T Σ_y a) − λ_n sqrt(a^T Σ_x a)   s.t.  a^T (x̄ − ȳ) = 1,

     or equivalently

       min_{a≠0}  κ(β0) sqrt(a^T Σ_y a) + λ_n sqrt(a^T Σ_x a)   s.t.  a^T (x̄ − ȳ) = 1

  2. Update

       λ_{n+1} = [1 − κ(β0) sqrt(a_n^T Σ_y a_n)] / sqrt(a_n^T Σ_x a_n)

• Least-squares approach
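The two-step parametric iteration above can be sketched numerically as follows; SLSQP again stands in for a proper solver, and beta0, the synthetic data, and all names are illustrative assumptions.

```python
# Parametric (Dinkelbach-style) iteration for the BMPM fractional program.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
X = rng.normal([2.0, 0.0], 1.0, size=(300, 2))   # important class
Y = rng.normal([-2.0, 0.0], 1.0, size=(300, 2))  # less important class
x_bar, y_bar = X.mean(axis=0), Y.mean(axis=0)
Sx, Sy = np.cov(X, rowvar=False), np.cov(Y, rowvar=False)

beta0 = 0.6                            # acceptable accuracy level (illustrative)
kappa0 = np.sqrt(beta0 / (1.0 - beta0))  # kappa(beta0)
d = x_bar - y_bar
cons = {"type": "eq", "fun": lambda a: a @ d - 1.0}
a = d / (d @ d)                        # feasible starting point
lam = 0.0
for _ in range(50):
    # Step 1: min kappa0*sqrt(a'Sy a) + lam*sqrt(a'Sx a)  s.t.  a'(x_bar - y_bar) = 1
    res = minimize(lambda a: kappa0 * np.sqrt(a @ Sy @ a)
                   + lam * np.sqrt(a @ Sx @ a),
                   x0=a, constraints=[cons])
    a = res.x
    # Step 2: lam <- (1 - kappa0*sqrt(a'Sy a)) / sqrt(a'Sx a)
    new_lam = (1 - kappa0 * np.sqrt(a @ Sy @ a)) / np.sqrt(a @ Sx @ a)
    if abs(new_lam - lam) < 1e-6:
        break
    lam = new_lam

alpha = lam**2 / (1 + lam**2)  # at convergence lam = kappa(alpha)
b = a @ x_bar - lam * np.sqrt(a @ Sx @ a)  # threshold: classify as x if a'z >= b
```

At convergence the ratio λ equals the optimal fractional objective, i.e. κ(α) for the important class.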
Biased Minimax Probability Machine
[Figure: compared with the MPM hyperplane a^T z = b, the BMPM hyperplane a_bmpm^T z = b_bmpm shifts toward the less important class y, keeping that class at an acceptable accuracy level while raising the accuracy on the important class x]
Minimum Error Minimax Probability Machine

[Figure: two class densities p1 and p2 with the decision plane obtained when α = β]

MPM:

    max_{α, a≠0, b}  α
    s.t.  inf_{x ~ (x̄, Σ_x)} Pr{a^T x ≥ b} ≥ α,
          inf_{y ~ (ȳ, Σ_y)} Pr{a^T y ≤ b} ≥ α

MEMPM:

    max_{α, β, a≠0, b}  θα + (1 − θ)β
    s.t.  inf_{x ~ (x̄, Σ_x)} Pr{a^T x ≥ b} ≥ α,
          inf_{y ~ (ȳ, Σ_y)} Pr{a^T y ≤ b} ≥ β

[Figure: the same densities with the optimal decision plane, which need not equalize the two bounds]

The MEMPM achieves the distribution-free Bayes optimal hyperplane in the worst-case setting.
Minimum Error Minimax Probability Machine
• MEMPM achieves the Bayes optimal hyperplane when we assume a specific distribution, e.g. a Gaussian distribution, on the data.
Lemma: If the distribution of the normalized random variable is independent of a, the classifier derived by MEMPM exactly represents the real Bayes optimal hyperplane.
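As a toy check of the Gaussian case mentioned above: for two equal-prior, equal-variance 1-D Gaussians, the Bayes-optimal decision point is the midpoint of the means. The means and variance below are illustrative.

```python
# Locate the Bayes boundary of two equal-prior 1-D Gaussians numerically:
# Bayes picks the class with higher density, so the boundary is where the
# two densities cross.
import numpy as np
from scipy.stats import norm

m1, m2, s = 2.0, -2.0, 1.0
xs = np.linspace(-6, 6, 2001)
diff = norm.pdf(xs, m1, s) - norm.pdf(xs, m2, s)
crossing = xs[np.argmin(np.abs(diff))]
# crossing equals (m1 + m2) / 2 = 0 up to grid resolution
```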
MEMPM (Cont’d)
• Objective:

    max_{α, β, a≠0}  θα + (1 − θ)β
    s.t.  1 ≥ κ(α) sqrt(a^T Σ_x a) + κ(β) sqrt(a^T Σ_y a),
          a^T (x̄ − ȳ) = 1

• Equivalently, writing κ1 = κ(α), κ2 = κ(β) and using 1 − α = 1 / (1 + κ(α)^2):

    min_{(κ1, κ2), a≠0}  θ / (1 + κ1^2) + (1 − θ) / (1 + κ2^2)
    s.t.  1 ≥ κ1 sqrt(a^T Σ_x a) + κ2 sqrt(a^T Σ_y a),
          a^T (x̄ − ȳ) = 1
MEMPM (Cont’d)
• Objective (eliminating κ2 through the constraint):

    max_{κ1, a≠0}  θ κ1^2 / (1 + κ1^2) + (1 − θ) κ2^2 / (1 + κ2^2)
    s.t.  a^T (x̄ − ȳ) = 1,
    where  κ2 = [1 − κ1 sqrt(a^T Σ_x a)] / sqrt(a^T Σ_y a)

• Solved by a line search over κ1 combined with a sequential BMPM method
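The rewriting above leans on the bijection between an accuracy bound α and κ(α) = sqrt(α / (1 − α)); a quick numeric sanity check (function names are mine, for illustration):

```python
# alpha <-> kappa conversion used throughout the MEMPM derivation.
import math

def kappa(alpha):
    # kappa(alpha) = sqrt(alpha / (1 - alpha)), increasing in alpha
    return math.sqrt(alpha / (1.0 - alpha))

def alpha_of(k):
    # inverse mapping: alpha = kappa^2 / (1 + kappa^2)
    return k**2 / (1.0 + k**2)

assert abs(alpha_of(kappa(0.9)) - 0.9) < 1e-12
```

Because the map is monotone, maximizing κ(α) and maximizing α are interchangeable, which is what licenses the line search over κ1.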
Kernelized Version
• Map the data via φ: R^n → R^f (a mapping function):

    φ(x) ~ (x̄_φ, Σ_φx),   φ(y) ~ (ȳ_φ, Σ_φy)

• Kernelized BMPM: write a as an expansion over the mapped training points,

    a = Σ_{i=1}^{Nx} μ_i φ(x_i) + Σ_{j=1}^{Ny} υ_j φ(y_j),

  and solve

    max_{a≠0}  [1 − κ(β0) sqrt(a^T Σ_φy a)] / sqrt(a^T Σ_φx a)
    s.t.  a^T (x̄_φ − ȳ_φ) = 1

• where the plug-in estimates in feature space are

    x̄_φ = (1/Nx) Σ_{i=1}^{Nx} φ(x_i),   ȳ_φ = (1/Ny) Σ_{j=1}^{Ny} φ(y_j),
    Σ_φx = (1/Nx) Σ_{i=1}^{Nx} (φ(x_i) − x̄_φ)(φ(x_i) − x̄_φ)^T,
    Σ_φy = (1/Ny) Σ_{j=1}^{Ny} (φ(y_j) − ȳ_φ)(φ(y_j) − ȳ_φ)^T.
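The feature-space quantities above never need φ explicitly; everything reduces to Gram-matrix blocks, which can be sketched in NumPy (the Gaussian kernel, the synthetic data, and all names are illustrative assumptions):

```python
# Gram-matrix quantities for the kernelized formulation: z stacks the x- and
# y-samples, K is the full Gram matrix, k_x / k_y are the block-row averages,
# and Kx_t / Ky_t are the centered blocks.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(1.0, 1.0, size=(20, 2))
Y = rng.normal(-1.0, 1.0, size=(30, 2))
Z = np.vstack([X, Y])          # z_i = x_i for i <= Nx, then the y_j
Nx, Ny = len(X), len(Y)

def gauss_kernel(A, B, sigma=1.0):
    # K(u, v) = exp(-||u - v||^2 / (2 sigma^2))
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

K = gauss_kernel(Z, Z)         # (Nx+Ny) x (Nx+Ny) Gram matrix
Kx, Ky = K[:Nx, :], K[Nx:, :]  # block rows for each class
k_x = Kx.mean(axis=0)          # k~_x: averages over the x-block
k_y = Ky.mean(axis=0)          # k~_y: averages over the y-block
Kx_t = Kx - k_x                # K~_x = K_x - 1 k~_x^T
Ky_t = Ky - k_y                # K~_y = K_y - 1 k~_y^T
```

The centered blocks satisfy (1/Nx) w^T Kx_t^T Kx_t w = a^T Σ_φx a for the expansion coefficients w, which is exactly the substitution the kernelized BMPM uses.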
Kernelized Version (Cont’d)
• Kernelized BMPM, written in terms of the Gram matrix:

    max_{w≠0}  [1 − κ(β0) sqrt((1/Ny) w^T K̃_y^T K̃_y w)] / sqrt((1/Nx) w^T K̃_x^T K̃_x w)
    s.t.  w^T (k̃_x − k̃_y) = 1

• where w = [μ_1, …, μ_Nx, υ_1, …, υ_Ny]^T and, with z_i = x_i for i = 1, …, Nx and z_{Nx+j} = y_j for j = 1, …, Ny:

    K_ij = φ(z_i)^T φ(z_j),
    [k̃_x]_i = (1/Nx) Σ_{j=1}^{Nx} K(x_j, z_i),   [k̃_y]_i = (1/Ny) Σ_{j=1}^{Ny} K(y_j, z_i),
    K̃_x = K_x − 1_{Nx} k̃_x^T,   K̃_y = K_y − 1_{Ny} k̃_y^T,

  and K_x, K_y are the first Nx and last Ny rows of K.
• Decision function:

    f(z) = Σ_{i=1}^{Nx} w_i* K(x_i, z) + Σ_{j=1}^{Ny} w_{Nx+j}* K(y_j, z) + b*
Illustration of Kernel Methods
[Figure: a linear boundary vs. the boundary obtained after the kernel mapping]
Experimental Results (BMPM)
• Five benchmark datasets: Twonorm, Breast, Ionosphere, Pima, Sonar
• Procedure: 5-fold cross validation, with a linear and a Gaussian kernel
• Parameter setting (acceptable level β0): 20.0% for Pima, 60.0% for the others
Experimental results
Experiments for MEMPM
• Six benchmark datasets– Twonorm, Breast, Ionosphere, Pima, Heart, Vote
• Procedure – 10-fold cross validation– Linear
– Gaussian Kernel
Results for MEMPM
Conclusions and Future Work
• Conclusions
  – First quantitative method to analyze the biased classification task
  – Minimizes the classification error rate in the worst case
• Future work
  – Improve the efficiency of the algorithm, especially in the kernelized version (any decomposition method?)
  – Robust estimation
  – Relation between the VC bound in Support Vector Machines and the bound in MEMPM
  – Regression model?
References
• Popescu, I. and Bertsimas, D. (2001). Optimal inequalities in probability theory: A convex optimization approach. Technical Report TM62, INSEAD.
• Lanckriet, G. R. G., El Ghaoui, L., and Jordan, M. I. (2002). Minimax probability machine. In Advances in Neural Information Processing Systems (NIPS) 14, Cambridge, MA. MIT Press.
• Huang, K., Yang, H., King, I., Lyu, M. R., and Chan, L. (2003). Biased minimax probability machine.
• Huang, K., Yang, H., King, I., Lyu, M. R., and Chan, L. (2003). Minimum error minimax probability machine.