Outline: Introduction | Rough Set Regions and Gini Coefficients | Setting Objective Functions | Example | Conclusion

Determining Three-way Decision Regions with Gini Coefficients

Yan Zhang and JingTao Yao

Department of Computer Science, University of Regina

[zhang83y, jtyao]@cs.uregina.ca

June 16, 2014


Three-way decision regions

• Three-way decision rules can be constructed from rough set regions.
    • The rules of acceptance and rejection decisions can be induced from the positive and negative regions, respectively.
    • The non-commitment decisions can be made from the boundary region.

• Rough set regions can be viewed as the acceptance, rejection, and non-commitment decision regions in three-way classification.


Determining decision regions

• Interpretation and determination of decision regions are among the key issues of three-way decision and rough set theories.
    • Decision-theoretic rough sets (Yao 2007).
    • Information-theoretic rough sets (Deng & Yao 2012).
    • Game-theoretic rough sets (Herbert & Yao 2011).

Yao, Y.Y. (2007). Decision-theoretic rough set models. In: RSKT'07.
Deng, X.F., Yao, Y.Y. (2012). An information-theoretic interpretation of thresholds in PRS. In: RSCTC'12.
Herbert, J.P., Yao, J.T. (2011). Game-theoretic rough sets. Fundamenta Informaticae 108(3-4).


Approach

• Examine the relationship between changes in rough set regions and their impact on the Gini coefficients of the decision regions.

• Effective decision regions can be obtained by satisfying objective functions defined on the Gini coefficients of the decision regions.


Decision regions

• Acceptance, rejection, and non-commitment decision regions:

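$$\mathrm{POS}_{(\alpha,\beta)}(C) = \bigcup \{[x] \mid [x] \in U/E,\ \Pr(C \mid [x]) \ge \alpha\},$$

$$\mathrm{NEG}_{(\alpha,\beta)}(C) = \bigcup \{[x] \mid [x] \in U/E,\ \Pr(C \mid [x]) \le \beta\},$$

$$\mathrm{BND}_{(\alpha,\beta)}(C) = \bigcup \{[x] \mid [x] \in U/E,\ \beta < \Pr(C \mid [x]) < \alpha\}.$$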
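The three definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each equivalence class [x] is represented by a label and its conditional probability Pr(C|[x]), and the function name `three_way_regions`, the tie-breaking rule for α = β, and the sample probabilities are hypothetical rather than taken from the slides.

```python
def three_way_regions(cond_prob, alpha, beta):
    """Split equivalence classes into (POS, NEG, BND) for thresholds (alpha, beta).

    cond_prob maps a class label to Pr(C | [x]).
    """
    if not 0.0 <= beta <= alpha <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= beta <= alpha <= 1")
    # Acceptance: Pr(C | [x]) >= alpha
    pos = {x for x, p in cond_prob.items() if p >= alpha}
    # Rejection: Pr(C | [x]) <= beta.
    # Assumption: when alpha == beta, a class sitting exactly on the
    # threshold is kept in POS only, so the regions stay disjoint.
    neg = {x for x, p in cond_prob.items() if p <= beta} - pos
    # Non-commitment: everything strictly between the two thresholds
    bnd = set(cond_prob) - pos - neg
    return pos, neg, bnd


# Hypothetical classes and probabilities, for illustration only.
classes = {"X1": 0.95, "X2": 0.70, "X3": 0.40, "X4": 0.05}
pos, neg, bnd = three_way_regions(classes, alpha=0.8, beta=0.2)
print(sorted(pos), sorted(neg), sorted(bnd))
```

Lowering `alpha` moves classes from the boundary into the acceptance region, and raising `beta` moves them into the rejection region.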


Gini coefficient

• A kind of entropy calculation.

• An inequality measure of income distribution (Ceriani & Verme 2012).

• A measure of relative mean difference, i.e., the mean of the difference between every possible pair of individuals divided by the mean size (Ceriani & Verme 2012).

• An impurity-based criterion that measures the divergence between the probability distributions of the target attribute values (Breiman et al. 1984).

Ceriani, L., Verme, P. (2012). The Origins of the Gini Index: Extracts from Variabilità e Mutabilità (1912) by Corrado Gini. The Journal of Economic Inequality 10(3).
Breiman, L., Friedman, J., Stone, C. J., Olshen, R. A. (1984). Classification and Regression Trees. Chapman and Hall.


Gini coefficients

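• Attribute $A$ with $k$ possible values, $a_1, a_2, \ldots, a_k$. Partition $\pi_A = \{\sigma_{A=a_1}(S), \sigma_{A=a_2}(S), \ldots, \sigma_{A=a_k}(S)\}$.

• The probability distribution of a partition $\pi_A$ can be defined as:

$$P_{\pi_A} = \left( \frac{|\sigma_{A=a_1}(S)|}{|S|},\ \frac{|\sigma_{A=a_2}(S)|}{|S|},\ \ldots,\ \frac{|\sigma_{A=a_k}(S)|}{|S|} \right).$$

• The absolute Gini coefficient:

$$\mathrm{Gini}(S, \pi_A) = \sum_{i=1}^{k} \frac{|\sigma_{A=a_i}(S)|}{|S|} \times \left( 1 - \frac{|\sigma_{A=a_i}(S)|}{|S|} \right) = 1 - \sum_{i=1}^{k} \left( \frac{|\sigma_{A=a_i}(S)|}{|S|} \right)^{2}.$$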
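As a small numerical illustration of the absolute Gini coefficient, the sketch below evaluates the closed form 1 − Σ(|block|/|S|)² from the block sizes of a partition. The function name `absolute_gini` and the block sizes are assumptions made for this example.

```python
def absolute_gini(block_sizes):
    """Gini(S, pi_A) = 1 - sum((|block| / |S|)^2) over the blocks of pi_A."""
    total = sum(block_sizes)
    if total <= 0:
        raise ValueError("the partition must cover a non-empty set")
    return 1.0 - sum((n / total) ** 2 for n in block_sizes)


# Hypothetical partition of 10 objects into blocks of size 5, 3 and 2:
# 1 - (0.25 + 0.09 + 0.04) = 0.62
print(round(absolute_gini([5, 3, 2]), 2))
```

A partition with a single block gives 0, and two equally sized blocks give 0.5, the largest value possible with two blocks.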


Gini coefficients

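• Attribute $A$ with $k$ possible values, $a_1, a_2, \ldots, a_k$. Partition $\pi_A = \{\sigma_{A=a_1}(S), \sigma_{A=a_2}(S), \ldots, \sigma_{A=a_k}(S)\}$.

• Attribute $B$ with $m$ possible values, $b_1, b_2, \ldots, b_m$. Partition $\pi_B = \{\sigma_{B=b_1}(S), \sigma_{B=b_2}(S), \ldots, \sigma_{B=b_m}(S)\}$.

• The relative Gini coefficient:

$$\mathrm{Gini}(\sigma_{B=b_i}(S)) = \frac{|\sigma_{B=b_i}(S)|}{|S|} \times \mathrm{Gini}(\sigma_{B=b_i}(S), \pi_A) = \frac{|\sigma_{B=b_i}(S)|}{|S|} \times \left( 1 - \sum_{j=1}^{k} \left( \frac{|\sigma_{A=a_j}(S) \cap \sigma_{B=b_i}(S)|}{|\sigma_{B=b_i}(S)|} \right)^{2} \right).$$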
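The relative Gini coefficient weights the impurity of one block of $\pi_B$ by the share of $S$ that the block covers. A minimal sketch, assuming the block is described by its joint counts with each value of $A$; the name `relative_gini`, the treatment of an empty block, and the counts are illustrative choices, not part of the slides.

```python
def relative_gini(joint_counts, total):
    """Relative Gini coefficient of one block sigma_{B=b_i}(S).

    joint_counts[j] is the size of the intersection of the block with
    sigma_{A=a_j}(S); total is |S|.
    """
    block = sum(joint_counts)
    if block == 0:
        return 0.0  # assumption: an empty block contributes nothing
    purity = sum((n / block) ** 2 for n in joint_counts)
    return (block / total) * (1.0 - purity)


# Hypothetical case: |S| = 20, the block holds 8 objects split 6/2 over
# two values of A, so 0.4 * (1 - (0.5625 + 0.0625)) = 0.15.
print(round(relative_gini([6, 2], total=20), 4))
```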


Gini coefficients of decision regions

Two partitions:

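$$\pi_{(\alpha,\beta)} = \{\mathrm{POS}_{(\alpha,\beta)}(C),\ \mathrm{NEG}_{(\alpha,\beta)}(C),\ \mathrm{BND}_{(\alpha,\beta)}(C)\}, \qquad \pi_C = \{C, C^c\}.$$

The absolute Gini coefficients of the three decision regions:

$$\mathrm{Gini}(\mathrm{POS}_{(\alpha,\beta)}(C), \pi_C) = 1 - \Pr(C \mid \mathrm{POS}_{(\alpha,\beta)}(C))^2 - \Pr(C^c \mid \mathrm{POS}_{(\alpha,\beta)}(C))^2,$$

$$\mathrm{Gini}(\mathrm{NEG}_{(\alpha,\beta)}(C), \pi_C) = 1 - \Pr(C \mid \mathrm{NEG}_{(\alpha,\beta)}(C))^2 - \Pr(C^c \mid \mathrm{NEG}_{(\alpha,\beta)}(C))^2,$$

$$\mathrm{Gini}(\mathrm{BND}_{(\alpha,\beta)}(C), \pi_C) = 1 - \Pr(C \mid \mathrm{BND}_{(\alpha,\beta)}(C))^2 - \Pr(C^c \mid \mathrm{BND}_{(\alpha,\beta)}(C))^2.$$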
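Because $\pi_C$ has only two blocks, each of these coefficients depends on a single conditional probability. Writing $p = \Pr(C \mid R)$ for a region $R$ (the symbols $p$ and $R$ are shorthand introduced here, not notation from the slides), a short simplification shows how the coefficient behaves:

```latex
\mathrm{Gini}(R, \pi_C) = 1 - p^2 - (1 - p)^2 = 2p(1 - p),
\qquad 0 \le \mathrm{Gini}(R, \pi_C) \le \tfrac{1}{2}.
```

The coefficient is 0 when the region is pure ($p = 0$ or $p = 1$) and reaches its maximum of $1/2$ when $p = 1/2$.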


Gini coefficients of decision regions (cont.)

The conditional probabilities can be computed by:

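$$\Pr(C \mid \mathrm{POS}_{(\alpha,\beta)}(C)) = \frac{|C \cap \mathrm{POS}_{(\alpha,\beta)}(C)|}{|\mathrm{POS}_{(\alpha,\beta)}(C)|},$$

$$\Pr(C \mid \mathrm{NEG}_{(\alpha,\beta)}(C)) = \frac{|C \cap \mathrm{NEG}_{(\alpha,\beta)}(C)|}{|\mathrm{NEG}_{(\alpha,\beta)}(C)|},$$

$$\Pr(C \mid \mathrm{BND}_{(\alpha,\beta)}(C)) = \frac{|C \cap \mathrm{BND}_{(\alpha,\beta)}(C)|}{|\mathrm{BND}_{(\alpha,\beta)}(C)|}.$$

The conditional probabilities of $C^c$ can be computed in a similar way.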


Gini coefficients of decision regions (cont.)

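The relative Gini coefficients of the acceptance, rejection, and non-commitment decision regions are:

$$G_P(\alpha, \beta) = \Pr(\mathrm{POS}_{(\alpha,\beta)}(C)) \times \mathrm{Gini}(\mathrm{POS}_{(\alpha,\beta)}(C), \pi_C),$$

$$G_N(\alpha, \beta) = \Pr(\mathrm{NEG}_{(\alpha,\beta)}(C)) \times \mathrm{Gini}(\mathrm{NEG}_{(\alpha,\beta)}(C), \pi_C),$$

$$G_B(\alpha, \beta) = \Pr(\mathrm{BND}_{(\alpha,\beta)}(C)) \times \mathrm{Gini}(\mathrm{BND}_{(\alpha,\beta)}(C), \pi_C).$$

The probabilities of the three decision regions are:

$$\Pr(\mathrm{POS}_{(\alpha,\beta)}(C)) = \frac{|\mathrm{POS}_{(\alpha,\beta)}(C)|}{|U|}, \qquad \Pr(\mathrm{NEG}_{(\alpha,\beta)}(C)) = \frac{|\mathrm{NEG}_{(\alpha,\beta)}(C)|}{|U|}, \qquad \Pr(\mathrm{BND}_{(\alpha,\beta)}(C)) = \frac{|\mathrm{BND}_{(\alpha,\beta)}(C)|}{|U|}.$$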
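The relative coefficients can be evaluated directly from the probabilities of the equivalence classes. The sketch below assumes the sixteen classes X1 to X16 and their values Pr(Xi) and Pr(C|Xi) from the experimental data slide; the helper names `region_gini` and `relative_ginis` and the convention that an empty region contributes 0 are assumptions of this example.

```python
# Pr(X_i) and Pr(C | X_i) for X1..X16, transcribed from the example data.
PR_X = [0.093, 0.088, 0.093, 0.089, 0.069, 0.046, 0.019, 0.015,
        0.016, 0.02, 0.059, 0.04, 0.087, 0.075, 0.098, 0.093]
PR_C_GIVEN_X = [1, 0.978, 0.95, 0.91, 0.89, 0.81, 0.72, 0.61,
                0.42, 0.38, 0.32, 0.29, 0.2, 0.176, 0.1, 0]


def region_gini(members):
    """Relative Gini of a region given as (Pr(X_i), Pr(C | X_i)) pairs."""
    pr_region = sum(w for w, _ in members)
    if pr_region == 0:
        return 0.0  # assumption: an empty region contributes 0
    p = sum(w * c for w, c in members) / pr_region  # Pr(C | region)
    return pr_region * (1.0 - p ** 2 - (1.0 - p) ** 2)


def relative_ginis(alpha, beta):
    """Return (G_P, G_N, G_B) for thresholds with beta < alpha."""
    data = list(zip(PR_X, PR_C_GIVEN_X))
    pos = [(w, c) for w, c in data if c >= alpha]
    neg = [(w, c) for w, c in data if c <= beta]
    bnd = [(w, c) for w, c in data if beta < c < alpha]
    return region_gini(pos), region_gini(neg), region_gini(bnd)


g_p, g_n, g_b = relative_ginis(0.9, 0.1)
print(f"G_P={g_p:.4f}  G_N={g_n:.4f}  G_B={g_b:.4f}")
```

For (α, β) = (0.9, 0.1) this gives values that agree, to four decimals, with the corresponding cell of the table of Gini coefficients in the example.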


Minimizing the overall Gini coefficient

• The overall Gini coefficient of the three regions:

$$G_{sum}(\alpha, \beta) = G_P(\alpha, \beta) + G_N(\alpha, \beta) + G_B(\alpha, \beta).$$

• The aim is to minimize $G_{sum}(\alpha, \beta)$ to obtain the decision regions.

• The problem of finding optimal threshold pairs can be formulated as an optimization problem.

• Objective function:

$$(\alpha, \beta) = \{(\alpha, \beta) \mid \min(G_{sum}(\alpha, \beta))\}.$$
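One simple way to approach this objective is an exhaustive search over a coarse grid of thresholds. The sketch below is not the authors' search strategy; it assumes the sixteen equivalence classes from the experimental data slide and the α and β values used in the example table, and the helper names `gini_term` and `g_sum` are hypothetical.

```python
from itertools import product

# Pr(X_i) and Pr(C | X_i) for X1..X16, transcribed from the example data.
PR_X = [0.093, 0.088, 0.093, 0.089, 0.069, 0.046, 0.019, 0.015,
        0.016, 0.02, 0.059, 0.04, 0.087, 0.075, 0.098, 0.093]
PR_C = [1, 0.978, 0.95, 0.91, 0.89, 0.81, 0.72, 0.61,
        0.42, 0.38, 0.32, 0.29, 0.2, 0.176, 0.1, 0]


def gini_term(pairs):
    """Pr(region) * (1 - p^2 - (1 - p)^2); 0 for an empty region (assumption)."""
    mass = sum(w for w, _ in pairs)
    if mass == 0:
        return 0.0
    p = sum(w * c for w, c in pairs) / mass
    return mass * (1 - p * p - (1 - p) ** 2)


def g_sum(alpha, beta):
    """Overall Gini coefficient G_P + G_N + G_B for one threshold pair."""
    data = list(zip(PR_X, PR_C))
    pos = [d for d in data if d[1] >= alpha]
    neg = [d for d in data if d[1] <= beta]
    bnd = [d for d in data if beta < d[1] < alpha]
    return gini_term(pos) + gini_term(neg) + gini_term(bnd)


ALPHAS = [1.0, 0.9, 0.8, 0.7, 0.6]
BETAS = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]

# Exhaustive search; ties, if any, go to the first pair encountered.
best = min(product(ALPHAS, BETAS), key=lambda ab: g_sum(*ab))
print(best, g_sum(*best))
```

On this grid the search returns (0.7, 0.2) with an overall coefficient of about 0.208, the same pair as the one reported in the example. A finer grid or a heuristic search would be needed when the candidate thresholds are not fixed in advance.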


Minimizing the overall Gini coefficient (cont.)

• The relationship between the changes of decision regions and the Gini coefficients of the regions:

Thresholds           G_P    G_N    G_B    G_sum
(α, β) = (1, 0)      0      0      max    G_B
(α ↓, β)             ↗      0      ↘      G_P + G_B
(α, β ↑)             0      ↗      ↘      G_N + G_B
(α ↓, β ↑)           ↗      ↗      ↘      G_P + G_N + G_B
(α, β) = (γ, γ)      ↗      ↗      0      G_P + G_N

• More than one threshold pair may correspond to the minimal overall Gini coefficient.

• Search strategies.


Minimizing the difference between Gini coefficients

• Immediate decision regions: the acceptance and rejection decision regions.

• Immediate decision regions vs. the non-commitment decision region.

• The difference between the Gini coefficients of the immediate and non-commitment decision regions:

$$G_{diff}(\alpha, \beta) = (G_P(\alpha, \beta) + G_N(\alpha, \beta)) - G_B(\alpha, \beta).$$

• Objective function:

$$(\alpha, \beta) = \{(\alpha, \beta) \mid \min(|G_{diff}(\alpha, \beta)|)\}.$$
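The second objective can be sketched as a selection over precomputed coefficient triples. The code below is a minimal illustration, assuming the coefficients are already available as a mapping from (α, β) to (G_P, G_N, G_B); the function names and the three triples are hypothetical and are not values from the slides.

```python
def g_diff(g_p, g_n, g_b):
    """G_diff = (G_P + G_N) - G_B for one threshold pair."""
    return (g_p + g_n) - g_b


def closest_to_balance(table):
    """Return the (alpha, beta) key whose |G_diff| is smallest.

    table maps (alpha, beta) -> (G_P, G_N, G_B).
    """
    return min(table, key=lambda ab: abs(g_diff(*table[ab])))


# Hypothetical coefficient triples, for illustration only.
table = {
    (0.9, 0.1): (0.02, 0.03, 0.20),   # G_diff = -0.15
    (0.8, 0.2): (0.05, 0.06, 0.10),   # G_diff = +0.01
    (0.7, 0.3): (0.08, 0.09, 0.04),   # G_diff = +0.13
}
best = closest_to_balance(table)
print(best)
```

Taking the absolute value means that a pair is preferred when the immediate regions and the non-commitment region carry nearly equal relative impurity, regardless of which side is larger.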


Minimizing the difference between Gini coefficients (cont.)

• The relationship between the changes of decision regions and the Gini coefficients of the regions:

Thresholds           G_P + G_N    G_B    G_diff
(α, β) = (1, 0)      0            max    −G_B
(α ↓, β)             ↗            ↘      G_P − G_B
(α, β ↑)             ↗            ↘      G_N − G_B
(α ↓, β ↑)           ↗            ↘      (G_P + G_N) − G_B
(α, β) = (γ, γ)      ↗            0      G_P + G_N

• More than one pair can be obtained.

• Heuristic search strategies.


Setting limits for three Gini coefficients

• Treat the three decision regions individually.

• Try to keep the Gini coefficients of all three regions small simultaneously.

• Objective function:

$$(\alpha, \beta) = \{(\alpha, \beta) \mid G_P(\alpha, \beta) \le c_P \wedge G_N(\alpha, \beta) \le c_N \wedge G_B(\alpha, \beta) \le c_B\}.$$

• The specific limits $c_P$, $c_B$ and $c_N$ can be designated by users or experts, or evaluated from statistical results. Suitable limits are important for obtaining effective decision regions.
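The third objective is a feasibility test rather than a minimization, so it can return several pairs or none. A minimal sketch, assuming precomputed triples (G_P, G_N, G_B) per threshold pair; the function name `within_limits`, the triples, and the limits are hypothetical.

```python
def within_limits(table, c_p, c_n, c_b):
    """Return the threshold pairs whose three coefficients all respect the limits.

    table maps (alpha, beta) -> (G_P, G_N, G_B).
    """
    return [ab for ab, (g_p, g_n, g_b) in table.items()
            if g_p <= c_p and g_n <= c_n and g_b <= c_b]


# Hypothetical coefficient triples and limits, for illustration only.
table = {
    (0.9, 0.1): (0.02, 0.03, 0.20),
    (0.8, 0.2): (0.05, 0.06, 0.10),
    (0.7, 0.3): (0.08, 0.09, 0.04),
}
feasible = within_limits(table, c_p=0.06, c_n=0.07, c_b=0.12)
too_strict = within_limits(table, c_p=0.06, c_n=0.07, c_b=0.05)
print(feasible, too_strict)
```

With the looser limits only (0.8, 0.2) qualifies; tightening the limit on G_B leaves no feasible pair, which illustrates why the choice of limits matters.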


Setting limits for three Gini coefficients (cont.)

• The relationship between the changes of decision regions and the Gini coefficients of the regions:

Thresholds           G_P    G_N    G_B
(α, β) = (1, 0)      0      0      max
(α ↓, β)             ↗      0      ↘
(α, β ↑)             0      ↗      ↘
(α ↓, β ↑)           ↗      ↗      ↘
(α, β) = (γ, γ)      ↗      ↗      0

• Decreasing α increases the Gini coefficient of the acceptance region and decreases the Gini coefficient of the non-commitment region.

• Increasing β increases the Gini coefficient of the rejection region and decreases the Gini coefficient of the non-commitment region.


Summary of the experimental data

            X1      X2      X3      X4      X5      X6      X7      X8
Pr(Xi)      0.093   0.088   0.093   0.089   0.069   0.046   0.019   0.015
Pr(C|Xi)    1       0.978   0.95    0.91    0.89    0.81    0.72    0.61

            X9      X10     X11     X12     X13     X14     X15     X16
Pr(Xi)      0.016   0.02    0.059   0.04    0.087   0.075   0.098   0.093
Pr(C|Xi)    0.42    0.38    0.32    0.29    0.2     0.176   0.1     0


Gini coefficients of regions for different threshold pairs

Each cell lists G_P, G_B, G_N for the threshold pair (α, β).

α \ β    0.0                       0.1                       0.2
1.0      0.0000, 0.3995, 0.0000    0.0000, 0.3332, 0.0186    0.0000, 0.2014, 0.0716
0.9      0.0280, 0.2563, 0.0000    0.0280, 0.2199, 0.0186    0.0280, 0.1378, 0.0716
0.8      0.0579, 0.1617, 0.0000    0.0579, 0.1382, 0.0186    0.0579, 0.0811, 0.0716
0.7      0.0672, 0.1453, 0.0000    0.0672, 0.1233, 0.0186    0.0672, 0.0691, 0.0716
0.6      0.0773, 0.1336, 0.0000    0.0773, 0.1125, 0.0186    0.0773, 0.0599, 0.0716

α \ β    0.3                       0.4                       0.5
1.0      0.0000, 0.1658, 0.0902    0.0000, 0.0906, 0.1309    0.0000, 0.0757, 0.1407
0.9      0.0280, 0.1132, 0.0902    0.0280, 0.0572, 0.1309    0.0280, 0.0448, 0.1407
0.8      0.0579, 0.0634, 0.0902    0.0579, 0.0242, 0.1309    0.0579, 0.0150, 0.1407
0.7      0.0672, 0.0521, 0.0902    0.0672, 0.0155, 0.1309    0.0672, 0.0071, 0.1407
0.6      0.0773, 0.0432, 0.0902    0.0773, 0.0078, 0.1309    0.0773, 0.0000, 0.1407


Resulting decision regions

• Determining decision regions by minimizing the overall Gini coefficient: $G_{sum}(0.7, 0.2) = 0.2079$ is minimal, so $(\alpha, \beta) = (0.7, 0.2)$.

• Determining decision regions by minimizing the difference between the Gini coefficients of the immediate and non-commitment decision regions: $G_{diff}(0.7, 0.3) = 0.0291$ is minimal, so $(\alpha, \beta) = (0.7, 0.3)$.

• Determining decision regions by setting the specific limits $G_P(\alpha, \beta) \le 0.06$, $G_B(\alpha, \beta) \le 0.1$, and $G_N(\alpha, \beta) \le 0.08$: $(\alpha, \beta) = \{(0.8, 0.2), (0.8, 0.3)\}$.


Concluding remarks

• The Gini coefficient is used to measure the distribution of the three decision regions defined by the rough set model.

• Effective decision regions can be obtained by adjusting the Gini coefficients of the decision regions to satisfy defined objective functions.

• Three objective functions are discussed.

• Future work:
    • Search strategies and learning mechanisms for obtaining effective decision regions.
    • Determination of suitable specific limits for each decision region.


Questions?
