Probabilistic Classification using Fuzzy Support Vector Machines (PFSVM)
Marzieh Parandehgheibi, ORC - MIT
INFORMS DM-HI, 11/12/2011


Page 1

Probabilistic Classification using Fuzzy Support Vector Machines

(PFSVM)

Marzieh Parandehgheibi, ORC - MIT

11/12/2011

Page 2

Content

• Motivation
• Problem
• Methodology
• Simulation Results
• Conclusion


Page 3

Motivation

• Is cancer misdiagnosis more common than you thought? It is estimated that nearly 12 percent of all cancer diagnoses may be in error.

• When a positive cancer diagnosis is missed, the consequences can be deadly. For example, a woman whose breast cancer is diagnosed in its early stages has a much better chance of surviving at least five more years.

• Being misdiagnosed with cancer can be devastating. Patients who are misdiagnosed are often subjected to unnecessary, harmful, painful, and expensive treatments.

• A diagnosis can be confirmed through methods such as seeking second opinions, consulting specialists, getting further medical tests, and researching information about the medical condition.


Page 4

Motivation (cont.)

When can we trust a diagnosis? When do we need to have additional tests?

Page 5

Problem

• What we do: given the data, decide whether the cancer is benign or malignant.
• What we need to do: decide whether the given data is enough to determine the type of cancer.
  – YES: what is the type of cancer?
  – NO: do more tests.


Page 6

Problem/Solution


• Find the criteria under which most of the errors occur.
• Find the probability of error (Pe).
• If Pe > α, wait for more tests.


Page 7

PFSVM Methodology

Probabilistic Fuzzy Support Vector Machine (PFSVM) is a two-phase classification method that probabilistically assigns points to each of the classes.

1. Apply FSVM to the whole training data so that most of the uncertain points are placed in the margin, while the certain points are assigned to their appropriate classes.

2. Define a fuzzy membership function and an appropriate rule to classify the points located in the margin.

The result is that each uncertain point is assigned to the classes with a specific probability.
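To make the two phases concrete, here is a minimal Python sketch of how they could be wired together; the helpers `fit_fsvm` and `membership_probability`, and the confidence threshold `alpha`, are illustrative placeholders rather than anything defined on the slides:

```python
import numpy as np

def pfsvm_predict(X_train, y_train, X_test, fit_fsvm, membership_probability, alpha=0.9):
    """Two-phase PFSVM sketch: phase 1 classifies the certain points with a
    weighted (fuzzy) SVM; phase 2 handles the marginal (uncertain) points."""
    # Phase 1: fit the FSVM; points with |decision value| >= 1 lie outside
    # the margin and are classified with certainty.
    model = fit_fsvm(X_train, y_train)                  # placeholder FSVM trainer
    scores = model.decision_function(X_test)
    labels = np.where(scores > 0, 1, -1).astype(float)

    # Phase 2: for points inside the margin, use a fuzzy membership
    # probability; below the confidence threshold alpha, leave undecided.
    for i in np.where(np.abs(scores) < 1)[0]:
        p_class1 = membership_probability(X_test[i])    # placeholder
        if p_class1 >= alpha:
            labels[i] = 1
        elif 1.0 - p_class1 >= alpha:
            labels[i] = -1
        else:
            labels[i] = np.nan                          # undetermined: "do more tests"
    return labels
```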


Page 8

SVM


[Figure: a separating hyperplane $x^T\beta + \beta_0 = 0$ dividing the half-spaces $x^T\beta + \beta_0 > 0$ and $x^T\beta + \beta_0 < 0$]

Suppose the training data consists of N pairs (X1, Y1), …, (XN, YN) with Yi ∈ {-1, 1}.

Separable data: a separating hyperplane {X : f(X) = X^Tβ + β0 = 0} separates the data, and the classification rule is g(X) = sign(X^Tβ + β0).

The maximum-margin problem is

$$\max_{\beta,\ \beta_0,\ \|\beta\|=1}\ M \quad \text{s.t.}\quad y_i(x_i^T\beta + \beta_0)\ \ge\ M,\quad i = 1,\dots,N$$
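As an illustration (not the presenter's code), a linear SVM can be fit with scikit-learn; the classification rule g(x) = sign(x^Tβ + β0) and the margin width 2/‖β‖ then follow from the fitted coefficients. Using sklearn's copy of the Wisconsin data here is purely for convenience:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Illustration on sklearn's copy of the Wisconsin breast cancer data.
X, y = load_breast_cancer(return_X_y=True)
y = np.where(y == 1, 1, -1)                 # relabel the classes as {-1, +1}
X = StandardScaler().fit_transform(X)

svm = SVC(kernel="linear", C=1.0).fit(X, y)
beta, beta0 = svm.coef_.ravel(), svm.intercept_[0]

# Classification rule g(x) = sign(x^T beta + beta0); margin width = 2 / ||beta||.
g = np.sign(X @ beta + beta0)
print("training accuracy:", (g == y).mean())
print("margin width:", 2.0 / np.linalg.norm(beta))
```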

Page 9

SVM


[Figure: non-separable case; slack variables ξi mark points on the wrong side of their margin, and the margin has width M on each side of the hyperplane]

Non-separable data: the SVM maximizes the margin M between the training points of classes 1 and -1, but allows some points to be on the wrong side of the margin:

$$\min_{\beta,\ \beta_0,\ \xi}\ \tfrac{1}{2}\|\beta\|^2 + C\sum_{i=1}^{N}\xi_i
\quad\text{s.t.}\quad y_i(x_i^T\beta + \beta_0)\ \ge\ 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,N$$
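For intuition about the slack variables, each ξi can be recovered from a fitted soft-margin SVM as the hinge quantity max(0, 1 − yi f(xi)); a small hedged sketch, mirroring the setup of the previous snippet:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
y = np.where(y == 1, 1, -1)
X = StandardScaler().fit_transform(X)

svm = SVC(kernel="linear", C=1.0).fit(X, y)
f = svm.decision_function(X)                # f(x_i) = x_i^T beta + beta0

# Slack variables of the soft-margin problem: xi_i = max(0, 1 - y_i f(x_i)).
xi = np.maximum(0.0, 1.0 - y * f)
print("points violating the margin (xi > 0):", int((xi > 1e-8).sum()))
print("misclassified points (xi > 1):", int((xi > 1).sum()))
```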

Page 10

FSVM

• In many real-world applications, the training points do not all have the same effect; some training points are more important than others.

• A training point need not belong exactly to one of the two classes: it may belong 90% to one class and 10% to the other.

• A fuzzy membership 0 < si ≤ 1 is therefore associated with each training point Xi.

In the objective function, the total slack $\sum_{i=1}^{N}\xi_i$ is replaced with the weighted sum $\sum_{i=1}^{N} s_i\,\xi_i$:

$$\min_{\beta,\ \beta_0,\ \xi}\ \tfrac{1}{2}\|\beta\|^2 + C\sum_{i=1}^{N} s_i\,\xi_i$$
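One practical way to obtain such a weighted objective (an approximation, not necessarily the authors' implementation) is scikit-learn's `sample_weight` argument, which scales each point's slack penalty to C·si·ξi:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

# Fuzzy memberships s_i in (0, 1]: illustrative random values here.
s = rng.uniform(0.1, 1.0, size=len(y))

# C * s_i becomes the effective error cost of point i, as in the FSVM objective.
fsvm = SVC(kernel="linear", C=1.0)
fsvm.fit(X, y, sample_weight=s)
print("margin width:", 2.0 / np.linalg.norm(fsvm.coef_))
```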

Page 11

FSVM

• Suppose that out of N training points, N1 points are in class 1 and the remaining N2 points are in class 2. Define the weight of each point as

$$W(x_i) = \exp\!\left(-\sum_{j=1}^{P}\frac{(x_{ij}-\mu_{jk})^2}{2\sigma_{jk}^2}\right), \qquad x_i \in \text{Class } k,$$

where μjk and σjk are the mean and standard deviation of the jth feature over all points in class k, and xij is the jth feature value of the ith point.

• Normalize the weights so that they sum to N, which is the sum of the error costs in the classic SVM:

$$W_n(x_i) = \frac{N\,W(x_i)}{\sum_{i=1}^{N} W(x_i)}$$

• The weights show up in the objective function:

$$\min_{\beta,\ \beta_0,\ \xi}\ \tfrac{1}{2}\|\beta\|^2 + C\sum_{i=1}^{N} W_n(x_i)\,\xi_i$$
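A NumPy sketch of these weights under the reconstruction above (per-class feature means and standard deviations, followed by normalization so the weights sum to N); the function name is illustrative:

```python
import numpy as np

def fsvm_weights(X, y):
    """Gaussian distance-to-class-center weights, normalized to sum to N."""
    W = np.empty(len(y))
    for k in np.unique(y):
        idx = (y == k)
        mu = X[idx].mean(axis=0)            # mu_jk: per-feature class mean
        sigma = X[idx].std(axis=0) + 1e-12  # sigma_jk: per-feature class std
        z = ((X[idx] - mu) ** 2) / (2.0 * sigma ** 2)
        W[idx] = np.exp(-z.sum(axis=1))     # W(x_i) for x_i in class k
    return len(y) * W / W.sum()             # normalize so the weights sum to N
```

The resulting Wn(xi) values can then be passed as the `sample_weight` argument of the weighted SVM shown earlier.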

Page 12

FSVM (cont.)

Points near the center of each class receive higher weights than points farther away. As a result, points close to a class center are classified with certainty, while the points lying between the two classes, called uncertain points, end up inside the margin.

Page 13

PFSVM Methodology (recap)

PFSVM is a two-phase classification method that probabilistically assigns the uncertain points to each of the classes. Phase 1, the FSVM above, places most of the uncertain points in the margin and assigns the certain points to their classes. Phase 2, described next, defines a fuzzy membership function and a rule for classifying the points located in the margin, so that each uncertain point is assigned to the classes with a specific probability.


Page 14

Fuzzy Classification

• Apply a fuzzy classification to the marginal points.

• Define a Gaussian fuzzy membership function Aik for every test point Yi located in the margin:

$$A_{ik} = \exp\!\left(-\sum_{j=1}^{P}\frac{(y_{ij}-\mu'_{jk})^2}{2\sigma'^2_{jk}}\right), \qquad k = 1, 2,$$

where μ'jk and σ'jk are the mean and standard deviation of the jth feature over the training points of class k located in the margin.

• This membership measures the closeness of Yi to the center of the kth class. To measure the relative closeness of a point to both centers, a "membership probability" is defined for each marginal point:

$$P_{i,C_1} = \frac{A_{i,C_1}}{A_{i,C_1}+A_{i,C_2}}, \qquad P_{i,C_2} = 1 - P_{i,C_1}$$
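A hedged NumPy sketch of this membership probability, estimating μ' and σ' from the marginal training points of each class (function and variable names are illustrative):

```python
import numpy as np

def membership_probability(y_point, X_margin, labels_margin):
    """Gaussian membership A_ik to each class, converted into the probability
    of belonging to class 1 (the complementary probability goes to class 2)."""
    A = {}
    for k in (1, -1):
        Xk = X_margin[labels_margin == k]    # marginal training points of class k
        mu, sigma = Xk.mean(axis=0), Xk.std(axis=0) + 1e-12
        A[k] = np.exp(-(((y_point - mu) ** 2) / (2.0 * sigma ** 2)).sum())
    return A[1] / (A[1] + A[-1])             # P_{i,C1}; P_{i,C2} = 1 - P_{i,C1}
```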

Page 15

Fuzzy Classification (cont.)

Points whose membership probability for a class exceeds 90% are assigned to that class. Otherwise, the given information is not sufficient to make a decision.
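Expressed as a small sketch, with 0.9 as the confidence threshold from the slide:

```python
def classify_marginal_point(p_class1, threshold=0.9):
    """Assign a class only when the membership probability is decisive."""
    if p_class1 >= threshold:
        return 1                  # assign to class 1
    if 1.0 - p_class1 >= threshold:
        return -1                 # assign to class 2
    return None                   # undetermined: do more tests
```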

Page 16

DATA SET

• Wisconsin breast cancer diagnostic dataset

• 569 instances in two classes of Malignant (M) and Benign (B) with 32 features per instance.

• Reduce the number of features from 32 to 23 by keeping just one feature out of every set of features with pairwise correlation above 0.95.

• Determine the training and test sets by 10-fold cross-validation.
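A hedged sketch of this preprocessing with pandas and scikit-learn; note that sklearn's copy of the Wisconsin diagnostic data ships with 30 numeric features (the slide's count of 32 includes the ID and label columns), so the exact number of retained features may differ:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)

# Drop one feature out of every pair with |correlation| > 0.95.
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.95).any()]
X = df.drop(columns=to_drop).to_numpy()
y = np.where(data.target == 1, 1, -1)

# 10-fold cross-validation to form the train/test splits.
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # ... fit PFSVM on (X_train, y_train) and evaluate on (X_test, y_test) ...
```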


Page 17

Widening the Margin with FSVM


[Figure: SVM vs. FSVM decision boundaries. SVM width of margin: 0.895; FSVM width of margin: 1.931]

Page 18

Error Location in FSVM Methods


On average, more than 80% of errors are inside the margin

Page 19

Comparison of different classification methods

Method \ Run            1   2   3   4   5   6   7   8   9  10   Avg. (%)
SVM (errors)            1   1   5   3   4   1   2   4   0   1    3.86
FSVM (errors)           4   4   5   7   7   3   2   1   3   4    7.02
Fuzzy (errors)          3   3   5   8   4   6   5   3   3   1    7.19
PFSVM (errors)          1   1   0   0   3   0   0   1   2   0     --
PFSVM (undetermined)    1   1   1   2   0   1   2   1   0   0    1.58


Page 20

Comparison of different classification methods

Method \ Run            1    2   3   4    5    6    7   8   9    10    Avg. (%)
SVM (errors)            1    1   5   3    4    1    2   4   0     1     3.86
FSVM (errors)           4    4   5   7    7    3    2   1   3     4     7.02
Fuzzy (errors)          3    3   5   8    4    6    5   3   3     1     7.19
PFSVM (errors)          1+1  1   0   0+2  3+1  0+2  0   1   2+1   0+2   1.63
PFSVM (undetermined)    1    1   1   2    0    1    2   1   0     0     1.58


Page 21

Double Cost PFSVM

1) Misdiagnosis of a positive cancer case is deadly.
2) Most of the errors happen in positive cancer diagnoses.

Therefore, double the cost of error for a positive cancer diagnosis.

On average, more than 98% of errors are inside the margin
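In scikit-learn terms, doubling the error cost of the positive (malignant) class can be approximated with the `class_weight` argument, which multiplies the slack penalty of each class; this is a hedged stand-in for the presenter's modification of the FSVM costs, not the original implementation:

```python
from sklearn.svm import SVC

# Slack penalties become C * class_weight[y_i]: the positive (malignant) class (+1)
# is penalized twice as heavily as the benign class (-1). Fit as in the earlier
# snippets, optionally still passing the fuzzy weights via sample_weight.
double_cost_svm = SVC(kernel="linear", C=1.0, class_weight={1: 2.0, -1: 1.0})
```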

Page 22

Comparison of different classification methods

Method \ Run            1   2   3   4   5   6   7   8   9  10   Avg. (%)
SVM (errors)            4   2   4   2   1   2   3   1   2   3    4.29
FSVM (errors)           6   2   3   2   1   2   4   1   4   5    5.36
Fuzzy (errors)          7   2   5   4   1   3   6   3   4   4    6.96
PFSVM (errors)          0   2   1   0   0   2   0   0   0   1     --
PFSVM (undetermined)    1   0   3   1   0   1   1   1   3   1    2.14


Page 23

Comparison of different classification methods

Method \ Run            1   2   3   4   5   6   7   8   9    10   Avg. (%)
SVM (errors)            4   2   4   2   1   2   3   1   2     3    4.29
FSVM (errors)           6   2   3   2   1   2   4   1   4     5    5.36
Fuzzy (errors)          7   2   5   4   1   3   6   3   4     4    6.96
PFSVM (errors)          0   2   1   0   0   2   0   0   0+1   1    1.23
PFSVM (undetermined)    1   0   3   1   0   1   1   1   3     1    2.14


Page 24


QUESTIONS?