Transcript
Page 1: PhD Defense - Example-Dependent Cost-Sensitive Classification

EXAMPLE-DEPENDENT COST-SENSITIVE CLASSIFICATION

Applications in financial risk modeling and marketing analytics

September 15, 2015

Alejandro Correa Bahnsen

with

Djamila Aouada, SnT
Björn Ottersten, SnT

Page 2

Motivation

• Classification: predicting the class of a set of examples given their features.

• Standard classification methods aim at minimizing the number of errors.

• Such a traditional framework assumes that all misclassification errors carry the same cost.

• This is not the case in many real-world applications: credit card fraud detection, churn modeling, credit scoring, direct marketing.

(Figure: a two-class example over features x1 and x2.)

Page 3

Motivation

• Credit card fraud detection: failing to detect a fraudulent transaction may have an economic impact from a few to thousands of Euros, depending on the particular transaction and card holder.

• Credit scoring: accepting loans from bad customers does not carry the same economic loss for every customer, since customers have different credit lines and, therefore, different profit.

• Churn modeling: misidentifying a profitable or unprofitable churner has significantly different economic results.

• Direct marketing: wrongly predicting that a customer will not accept an offer when in fact he will may have a different financial impact, as not all customers generate the same profit.

Page 4

Agenda

• Motivation
• Background: cost-sensitive classification
• Real-world cost-sensitive applications: credit card fraud detection, churn modeling, credit scoring, direct marketing
• Proposed cost-sensitive algorithms: Bayes minimum risk, cost-sensitive logistic regression, cost-sensitive decision trees, ensembles of cost-sensitive decision trees
• Experiments: experimental setup, results
• Conclusions: contributions, future work

Page 5

Background - Binary classification

Binary classification: predict the class of a set of examples given their features,

f: S \to \{0, 1\}

where each element of S is composed of X_i = [x_i^1, x_i^2, \ldots, x_i^k].

A classifier is usually evaluated using traditional misclassification measures such as Accuracy, F1-Score, or AUC, among others. However, these measures assume that different misclassification errors carry the same cost.

Page 6

Background - Cost-sensitive evaluation

We define a cost measure based on the cost matrix [Elkan 2001]:

                                Actual Positive (y_i = 1)    Actual Negative (y_i = 0)
Predicted Positive (c_i = 1)    C_{TP_i}                     C_{FP_i}
Predicted Negative (c_i = 0)    C_{FN_i}                     C_{TN_i}

From it we calculate the Cost of applying a classifier f to a given set S:

Cost(f(S)) = \sum_{i=1}^{N} [ y_i ( c_i C_{TP_i} + (1 - c_i) C_{FN_i} ) + (1 - y_i) ( c_i C_{FP_i} + (1 - c_i) C_{TN_i} ) ]
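As a concrete illustration of this evaluation measure, the following Python sketch computes the total cost of a set of predictions; the function name and all cost numbers are invented for the example.

```python
# Example-dependent total cost: each example i is charged the entry of its
# own cost matrix selected by the true class y_i and the prediction c_i.
def total_cost(y, c, C_TP, C_FN, C_FP, C_TN):
    cost = 0.0
    for i in range(len(y)):
        if y[i] == 1:
            cost += C_TP[i] if c[i] == 1 else C_FN[i]
        else:
            cost += C_FP[i] if c[i] == 1 else C_TN[i]
    return cost

y    = [1, 0, 1, 0]          # true classes
c    = [1, 0, 0, 1]          # predicted classes
C_TP = [2, 2, 2, 2]          # e.g. a fixed administrative cost
C_FN = [250, 400, 75, 120]   # e.g. the amount of each transaction
C_FP = [2, 2, 2, 2]
C_TN = [0, 0, 0, 0]

# TP (2) + TN (0) + FN (75) + FP (2)
print(total_cost(y, c, C_TP, C_FN, C_FP, C_TN))  # 79.0
```

Note how each error is charged its own example-dependent amount: the missed positive costs its particular C_{FN_i}, not a global constant.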

Page 7

Background - Cost-sensitive evaluation

However, the total cost may not be easy to interpret. Therefore, we propose a Savings measure: the cost of the classifier compared with the cost of using no algorithm at all,

Savings(f(S)) = \frac{ Cost_l(S) - Cost(f(S)) }{ Cost_l(S) }

where Cost_l(S) is the cost of predicting the costless class,

Cost_l(S) = \min( Cost(f_0(S)), Cost(f_1(S)) )

and f_0 (f_1) denotes the trivial classifier that predicts every example as class 0 (class 1).
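The Savings measure can likewise be sketched in a few lines of Python (the numbers are illustrative):

```python
def savings(cost_f, cost_f0, cost_f1):
    """Savings of a classifier with total cost cost_f, relative to the
    costless baseline; cost_f0 / cost_f1 are the costs of predicting
    every example as class 0 / class 1."""
    cost_l = min(cost_f0, cost_f1)  # cost of the best trivial classifier
    return (cost_l - cost_f) / cost_l

# A classifier costing 79 on a set where predicting all negatives costs
# 325 and all positives costs 508 saves about 76% of the baseline cost.
print(round(savings(79, 325, 508), 3))  # 0.757
```

A value of 0.757 means the classifier removes about 76% of the cost of the best trivial strategy; a negative value would mean it performs worse than using no algorithm at all.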

Page 8

Background - State-of-the-art methods

Research in example-dependent cost-sensitive classification has been narrow, mostly because of the lack of publicly available datasets [Aodha and Brostow 2013].

Standard approaches consist in re-weighting the training examples based on their costs:

• Cost-proportionate rejection sampling [Zadrozny et al. 2003]
• Cost-proportionate oversampling [Elkan 2001]

Page 9

Agenda (repeated; see Page 4)

Page 10

Credit card fraud detection

Estimate the probability of a transaction being fraudulent, based on analyzing customer patterns and recent fraudulent behavior.

Issues when constructing a fraud detection system [Bolton et al., 2002]:
• Skewness of the data
• Cost-sensitivity
• Short time response of the system
• Dimensionality of the search space
• Feature preprocessing

Page 11

Credit card fraud detection

Credit card fraud detection is a cost-sensitive problem, as the cost of a false positive is different from the cost of a false negative.

• False positives: when a transaction is predicted as fraudulent when in fact it is not, the financial institution incurs an administrative cost.

• False negatives: when a fraud is not detected, the amount of that transaction is lost.

Moreover, it is not enough to assume a constant cost difference between false positives and false negatives, as the amount of the transactions varies quite significantly.

Page 12

Credit card fraud detection

Cost matrix:

                                Actual Positive (y_i = 1)    Actual Negative (y_i = 0)
Predicted Positive (c_i = 1)    C_{TP_i} = C_a               C_{FP_i} = C_a
Predicted Negative (c_i = 0)    C_{FN_i} = Amt_i             C_{TN_i} = 0

where C_a is the administrative cost and Amt_i the amount of transaction i.

A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, "Cost Sensitive Credit Card Fraud Detection Using Bayes Minimum Risk," in 2013 12th International Conference on Machine Learning and Applications. Miami, USA: IEEE, Dec. 2013, pp. 333-338.

Page 13

Credit card fraud detection

Raw features:

Attribute name     Description
Transaction ID     Transaction identification number
Time               Date and time of the transaction
Account number     Identification number of the customer
Card number        Identification of the credit card
Transaction type   e.g., Internet, ATM, POS, ...
Entry mode         e.g., chip and PIN, magnetic stripe, ...
Amount             Amount of the transaction in Euros
Merchant code      Identification of the merchant type
Merchant group     Merchant group identification
Country            Country of the transaction
Country 2          Country of residence
Type of card       e.g., Visa debit, Mastercard, American Express, ...
Gender             Gender of the card holder
Age                Card holder age
Bank               Issuer bank of the card

Page 14

Credit card fraud detection

Transaction aggregation strategy [Whitrow, 2008]

Raw features:

TrxId   Time        Type   Country   Amt
1       1/1 18:20   POS    Lux       250
2       1/1 20:35   POS    Lux       400
3       1/1 22:30   ATM    Lux       250
4       2/1 00:50   POS    Ger        50
5       2/1 19:18   POS    Ger       100
6       2/1 23:45   POS    Ger       150
7       3/1 06:00   POS    Lux        10

Aggregated features:

No Trx     Amt        No Trx last 24h         Amt last 24h
last 24h   last 24h   same type and country   same type and country
0          0          0                       0
1          250        1                       250
2          650        0                       0
3          900        0                       0
3          700        1                       50
2          150        2                       150
3          400        0                       0
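A minimal Python sketch of this aggregation strategy, using only the standard library; the data mirrors the toy table above, though the exact windowing conventions of [Whitrow, 2008] may differ slightly (e.g. in the last row), so the numbers are illustrative.

```python
from datetime import datetime, timedelta

# Toy transactions of one card: (time, type, country, amount)
trx = [
    (datetime(2015, 1, 1, 18, 20), "POS", "Lux", 250),
    (datetime(2015, 1, 1, 20, 35), "POS", "Lux", 400),
    (datetime(2015, 1, 1, 22, 30), "ATM", "Lux", 250),
    (datetime(2015, 1, 2,  0, 50), "POS", "Ger",  50),
    (datetime(2015, 1, 2, 19, 18), "POS", "Ger", 100),
    (datetime(2015, 1, 2, 23, 45), "POS", "Ger", 150),
    (datetime(2015, 1, 3,  6,  0), "POS", "Lux",  10),
]

def aggregate(trx):
    """For each transaction, count and sum the preceding 24 hours of
    activity, overall and restricted to the same type and country."""
    rows = []
    for i, (t, ty, co, amt) in enumerate(trx):
        prev = [p for p in trx[:i] if t - p[0] <= timedelta(hours=24)]
        same = [p for p in prev if p[1] == ty and p[2] == co]
        rows.append((len(prev), sum(p[3] for p in prev),
                     len(same), sum(p[3] for p in same)))
    return rows

for row in aggregate(trx):
    print(row)
```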

Page 15

Credit card fraud detection

Proposed periodic features

When is a customer expected to make a new transaction?

Considering a von Mises distribution with a period of 24 hours, such that

P(time) \sim vonMises(\mu, \sigma) = \frac{ e^{\sigma \cos(time - \mu)} }{ 2 \pi I_0(\sigma) }

where \mu is the mean, \sigma is the concentration parameter, and I_0 is the modified Bessel function of the first kind of order 0.
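The density above can be sketched as follows; this is a hand-rolled illustration in which the slide's σ plays the role of the von Mises concentration, and I_0 is evaluated by its power series.

```python
import math

def bessel_i0(x, terms=30):
    # Modified Bessel function of the first kind, order 0 (power series)
    return sum((x / 2.0) ** (2 * k) / math.factorial(k) ** 2
               for k in range(terms))

def von_mises_pdf(t, mu, kappa, period=24.0):
    """Density of a transaction time t (in hours) on a 24-hour circle,
    peaking at the customer's habitual time mu."""
    angle = 2.0 * math.pi * (t - mu) / period
    return math.exp(kappa * math.cos(angle)) / (2.0 * math.pi * bessel_i0(kappa))

# A customer who usually buys around 18:00 is far more likely to
# transact at 18:00 than at 06:00 (the opposite side of the circle).
print(von_mises_pdf(18, mu=18, kappa=2) > von_mises_pdf(6, mu=18, kappa=2))  # True
```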

Page 16

Credit card fraud detection

Proposed periodic features

A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, "Feature Engineering Strategies for Credit Card Fraud Detection," submitted to Expert Systems with Applications.

Page 17

Credit scoring

Classify which potential customers are likely to default on a contracted financial obligation, based on the customer's past financial experience.

It is a cost-sensitive problem, as the cost associated with approving a bad customer, i.e., a false negative, is quite different from the cost associated with declining a good customer, i.e., a false positive. Furthermore, the costs are not constant among customers, because loans have different credit line amounts, terms, and even interest rates.

Page 18

Credit scoring

Cost matrix:

                                Actual Positive (y_i = 1)          Actual Negative (y_i = 0)
Predicted Positive (c_i = 1)    C_{TP_i} = 0                       C_{FP_i} = r_i + C_{FP}^a
Predicted Negative (c_i = 0)    C_{FN_i} = Cl_i \cdot L_{gd}       C_{TN_i} = 0

where Cl_i is the credit line of customer i, L_{gd} the loss given default, and r_i the profit lost by declining a good customer.

A. Correa Bahnsen, D. Aouada, and B. Ottersten, "Example-Dependent Cost-Sensitive Logistic Regression for Credit Scoring," in 2014 13th International Conference on Machine Learning and Applications. Detroit, USA: IEEE, 2014, pp. 263-269.

Page 19

Churn modeling

Predict the probability of a customer defecting using historical, behavioral, and socioeconomic information.

This tool is of great benefit to subscription-based companies, allowing them to maximize the results of retention campaigns.

Page 20

Churn modeling

Churn management campaign [Verbraken, 2013]

(Diagram: new customers flow into the customer base; the churn model splits active customers into predicted churners and predicted non-churners, giving TP: actual churners, FP: actual non-churners, FN: actual churners, TN: actual non-churners; a contacted churner is retained with probability γ and leaves with probability 1 - γ, and effective churners flow out of the customer base.)

Page 21

Churn modeling

Proposed financial evaluation of a churn campaign

(Diagram: the same campaign flow, annotated with the financial outcome of each branch: 0, CLV, CLV + C_a, and C_o + C_a, where CLV is the customer lifetime value, C_o the cost of the retention offer, and C_a the cost of contacting the customer.)

A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, "A novel cost-sensitive framework for customer churn predictive modeling," Decision Analytics, vol. 2:5, 2015.

Page 22

Churn modeling

Cost matrix:

                                Actual Positive (y_i = 1)                                      Actual Negative (y_i = 0)
Predicted Positive (c_i = 1)    C_{TP_i} = \gamma_i C_{o_i} + (1 - \gamma_i)(CLV_i + C_a)      C_{FP_i} = C_{o_i} + C_a
Predicted Negative (c_i = 0)    C_{FN_i} = CLV_i                                               C_{TN_i} = 0

A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, "A novel cost-sensitive framework for customer churn predictive modeling," Decision Analytics, vol. 2:5, 2015.

Page 23

Direct marketing

Classify the customers who are more likely to have a certain response to a marketing campaign.

This problem is example-dependent cost-sensitive, as false positives carry the cost of contacting the client, and false negatives carry the cost of the income lost by failing to make the offer to the right customer.

Page 24

Direct marketing

Cost matrix:

                                Actual Positive (y_i = 1)    Actual Negative (y_i = 0)
Predicted Positive (c_i = 1)    C_{TP_i} = C_a               C_{FP_i} = C_a
Predicted Negative (c_i = 0)    C_{FN_i} = Int_i             C_{TN_i} = 0

where C_a is the cost of contacting the customer and Int_i the expected income from customer i.

A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, "Improving Credit Card Fraud Detection with Calibrated Probabilities," in Proceedings of the fourteenth SIAM International Conference on Data Mining. Philadelphia, USA, 2014, pp. 677-685.

Page 25

Agenda (repeated; see Page 4)

Page 26

Proposed cost-sensitive algorithms

• Bayes minimum risk (BMR)
A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, "Cost Sensitive Credit Card Fraud Detection Using Bayes Minimum Risk," in 2013 12th International Conference on Machine Learning and Applications. Miami, USA: IEEE, Dec. 2013, pp. 333-338.
A. Correa Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, "Improving Credit Card Fraud Detection with Calibrated Probabilities," in Proceedings of the fourteenth SIAM International Conference on Data Mining. Philadelphia, USA, 2014, pp. 677-685.

• Cost-sensitive logistic regression (CSLR)
A. Correa Bahnsen, D. Aouada, and B. Ottersten, "Example-Dependent Cost-Sensitive Logistic Regression for Credit Scoring," in 2014 13th International Conference on Machine Learning and Applications. Detroit, USA: IEEE, 2014, pp. 263-269.

• Cost-sensitive decision trees (CSDT)
A. Correa Bahnsen, D. Aouada, and B. Ottersten, "Example-Dependent Cost-Sensitive Decision Trees," Expert Systems with Applications, vol. 42:19, 2015.

• Ensembles of cost-sensitive decision trees (ECSDT)
A. Correa Bahnsen, D. Aouada, and B. Ottersten, "Ensemble of Example-Dependent Cost-Sensitive Decision Trees," IEEE Transactions on Knowledge and Data Engineering, under review, 2015.

Page 27

Bayes Minimum Risk

A decision model based on quantifying tradeoffs between various decisions, using probabilities and the costs that accompany such decisions.

Risk of classification:

R(c_i = 0 | x_i) = C_{TN_i} (1 - p_i) + C_{FN_i} \cdot p_i

R(c_i = 1 | x_i) = C_{FP_i} (1 - p_i) + C_{TP_i} \cdot p_i

where p_i is the estimated probability that example i is positive. Using the different risks, the prediction is made based on the following condition:

c_i = 0 if R(c_i = 0 | x_i) \le R(c_i = 1 | x_i), and c_i = 1 otherwise.
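The decision rule above is straightforward to sketch in Python (a toy illustration; the probability and cost numbers are made up):

```python
def bayes_minimum_risk(p, C_TP, C_FN, C_FP, C_TN):
    """Predict the class with the lower expected cost (risk) for one
    example; p is the (calibrated) estimated probability that y = 1."""
    risk_neg = C_TN * (1 - p) + C_FN * p   # risk of predicting c = 0
    risk_pos = C_FP * (1 - p) + C_TP * p   # risk of predicting c = 1
    return 0 if risk_neg <= risk_pos else 1

# Fraud-style costs: administrative cost 2, transaction amount 300.
print(bayes_minimum_risk(0.05, C_TP=2, C_FN=300, C_FP=2, C_TN=0))  # 1
```

With fraud-style costs, even a 5% fraud probability makes flagging the transaction the lower-risk decision, whereas a threshold of 0.5 would let it through.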

Page 28

Cost-Sensitive Logistic Regression

• Logistic regression model

p_i = P(y_i = 1 | x_i) = h_\theta(x_i) = g\left( \sum_{j=1}^{k} \theta^j x_i^j \right)

where g is the logistic sigmoid function.

• Cost function

J_i(\theta) = - y_i \log( h_\theta(x_i) ) - (1 - y_i) \log( 1 - h_\theta(x_i) )

• Cost analysis

J_i(\theta) \approx 0 if y_i \approx h_\theta(x_i), and J_i(\theta) \to \infty if y_i \approx 1 - h_\theta(x_i),

i.e., the standard logistic cost implicitly assumes C_{TP_i} = C_{TN_i} \approx 0 and C_{FP_i} = C_{FN_i} \approx \infty.

Page 29

Cost-Sensitive Logistic Regression

• Actual costs

J^c(\theta) should be approximately C_{TP_i} if y_i = 1 and h_\theta(x_i) \approx 1; C_{TN_i} if y_i = 0 and h_\theta(x_i) \approx 0; C_{FP_i} if y_i = 0 and h_\theta(x_i) \approx 1; and C_{FN_i} if y_i = 1 and h_\theta(x_i) \approx 0.

• Proposed cost-sensitive function

J^c(\theta) = \frac{1}{N} \sum_{i=1}^{N} [ y_i ( h_\theta(x_i) C_{TP_i} + (1 - h_\theta(x_i)) C_{FN_i} ) + (1 - y_i) ( h_\theta(x_i) C_{FP_i} + (1 - h_\theta(x_i)) C_{TN_i} ) ]

Page 30

Cost-Sensitive Decision Trees

• A decision tree is a classification model that iteratively creates binary decision rules (x^j, l_m^j) that maximize certain criteria (gain, entropy, ...), where (x^j, l_m^j) refers to making a rule using feature j on value m.

• Maximizing the accuracy is not the same as minimizing the cost.

• To solve this, some studies have proposed methods that introduce cost-sensitivity into the algorithms [Lomax 2013]. However, that research has focused on class-dependent methods.

• We propose:
  • an example-dependent cost-based impurity measure, and
  • an example-dependent cost-based pruning criterion.

Page 31

Cost-Sensitive Decision Trees

Proposed cost-based impurity measure

A splitting rule (x^j, l_m^j) divides the set S into

S^l = \{ x_i | x_i \in S \wedge x_i^j \le l_m^j \}    and    S^r = \{ x_i | x_i \in S \wedge x_i^j > l_m^j \}

• The impurity of each leaf is calculated using:

I_c(S) = \min( Cost(f_0(S)), Cost(f_1(S)) )

f(S) = 0 if Cost(f_0(S)) \le Cost(f_1(S)), and 1 otherwise

• Afterwards, the gain of applying a given rule to the set S is:

Gain_c(x^j, l_m^j) = I_c(\pi_1) - ( I_c(\pi_1^l) + I_c(\pi_1^r) )

where \pi_1 denotes the node containing S, and \pi_1^l, \pi_1^r the leaves resulting from the split.
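The impurity and gain can be sketched as follows (helper names are invented; mask[i] is True when example i satisfies the splitting rule x_i^j ≤ l_m^j):

```python
def cost_impurity(y, C_TP, C_FN, C_FP, C_TN):
    # I_c(S): cost of labelling the whole subset with the cheaper class
    cost_f0 = sum(C_FN[i] if y[i] == 1 else C_TN[i] for i in range(len(y)))
    cost_f1 = sum(C_TP[i] if y[i] == 1 else C_FP[i] for i in range(len(y)))
    return min(cost_f0, cost_f1)

def cost_gain(y, costs, mask):
    """Gain of a split: impurity of the parent minus the summed
    impurities of the two children."""
    C_TP, C_FN, C_FP, C_TN = costs
    def subset(idx):
        return ([y[i] for i in idx], [C_TP[i] for i in idx],
                [C_FN[i] for i in idx], [C_FP[i] for i in idx],
                [C_TN[i] for i in idx])
    left  = [i for i in range(len(y)) if mask[i]]
    right = [i for i in range(len(y)) if not mask[i]]
    parent = cost_impurity(y, C_TP, C_FN, C_FP, C_TN)
    return parent - (cost_impurity(*subset(left)) + cost_impurity(*subset(right)))

y = [1, 1, 0, 0]
costs = ([2, 2, 2, 2],       # C_TP: administrative cost
         [100, 80, 0, 0],    # C_FN: e.g. transaction amounts
         [2, 2, 2, 2],       # C_FP
         [0, 0, 0, 0])       # C_TN
# A split isolating the two positives removes the need to pay C_FP on
# the negatives: gain = I_c(parent) - (I_c(left) + I_c(right)) = 8 - 4
print(cost_gain(y, costs, mask=[True, True, False, False]))  # 4
```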

Page 32

Cost-Sensitive Decision Trees

Decision tree construction

• The rule that maximizes the gain is selected:

(best_x, best_l) = \arg\max_{j,m} Gain_c(x^j, l_m^j)

• The process is repeated recursively on each resulting subset until a stopping criterion is met.

Page 33

Cost-Sensitive Decision Trees

Proposed cost-sensitive pruning criterion

• Calculation of the tree savings and the pruned-tree savings:

PC_c = \frac{ Cost(f(S, Tree)) - Cost(f(S, EB(Tree, branch))) }{ |Tree| - |EB(Tree, branch)| }

where EB(Tree, branch) denotes the tree obtained by eliminating the branch from Tree, and |.| the number of nodes.

• After calculating the pruning criterion for all possible subtrees, the maximum improvement is selected and the tree is pruned.

• The process is then repeated until there is no further improvement.
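The pruning criterion itself reduces to a one-liner; a toy sketch with invented numbers:

```python
def pruning_criterion(cost_tree, cost_pruned, size_tree, size_pruned):
    """PC_c: cost difference per node removed when pruning a branch
    (size_* count the nodes of the full and pruned trees)."""
    return (cost_tree - cost_pruned) / (size_tree - size_pruned)

# Pruning a branch that removes 4 nodes and changes the training-set
# cost from 120 to 100 yields an improvement of 5 per node removed.
print(pruning_criterion(120, 100, 11, 7))  # 5.0
```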

Page 34

Ensembles of Cost-Sensitive Decision Trees

A typical ensemble is made by combining T different base classifiers, each trained by applying algorithm M to a random subset S_j of the training set:

M_j \leftarrow M(S_j) \quad \forall j \in \{1, \ldots, T\}

Page 35

Ensembles of Cost-Sensitive Decision Trees

The core principle in ensemble learning is to induce random perturbations into the learning procedure in order to produce several different base classifiers from a single training set, and then to combine the base classifiers in order to make the final prediction.

Page 36

Ensembles of Cost-Sensitive Decision Trees

(Figure: how the four random inducers draw from the training set. Bagging and pasting sample random subsets of examples, with and without replacement respectively; random forest additionally samples the features; random patches samples random subsets of both examples and features.)

Page 37

Ensembles of Cost-Sensitive Decision Trees

After the base classifiers are constructed, they are typically combined using one of the following methods:

• Majority voting

H(S) = f_{mv}(S, M) = \arg\max_{c \in \{0,1\}} \sum_{j=1}^{T} 1_c( M_j(S) )

• Proposed cost-sensitive weighted voting

H(S) = f_{wv}(S, M, \alpha) = \arg\max_{c \in \{0,1\}} \sum_{j=1}^{T} \alpha_j 1_c( M_j(S) )

where, instead of the standard accuracy-based weights

\alpha_j = \frac{ 1 - \epsilon( M_j(S_j^{oob}) ) }{ \sum_{j_1=1}^{T} ( 1 - \epsilon( M_{j_1}(S_{j_1}^{oob}) ) ) }

we use savings-based weights computed on each classifier's out-of-bag set S_j^{oob}:

\alpha_j = \frac{ Savings( M_j(S_j^{oob}) ) }{ \sum_{j_1=1}^{T} Savings( M_{j_1}(S_{j_1}^{oob}) ) }
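The proposed weighted voting can be sketched as follows; the predictions and out-of-bag savings are invented for the example.

```python
def weighted_vote(predictions, oob_savings):
    """Cost-sensitive weighted vote for one example: classifier j votes
    with weight alpha_j proportional to its out-of-bag savings."""
    total = sum(oob_savings)
    alphas = [s / total for s in oob_savings]
    score1 = sum(a for a, p in zip(alphas, predictions) if p == 1)
    score0 = sum(a for a, p in zip(alphas, predictions) if p == 0)
    return 1 if score1 > score0 else 0

# Two weak trees vote 0, but the tree with the highest out-of-bag
# savings votes 1 and dominates, flipping the plain majority vote.
print(weighted_vote([1, 0, 0], [0.6, 0.2, 0.2]))  # 1
```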

Page 38

Ensembles of Cost-Sensitive Decision Trees

• Proposed cost-sensitive stacking

H(S) = f_s(S, M, \beta) = \frac{1}{ 1 + e^{ - \sum_{j=1}^{T} \beta_j M_j(S) } }

Using the cost-sensitive logistic regression [Correa et al., 2014] model:

J(S, M, \beta) = \sum_{i=1}^{N} [ y_i ( f_s(S, M, \beta) ( C_{TP_i} - C_{FN_i} ) + C_{FN_i} ) + (1 - y_i) ( f_s(S, M, \beta) ( C_{FP_i} - C_{TN_i} ) + C_{TN_i} ) ]

Then the weights are estimated using \beta = \arg\min_\beta J(S, M, \beta).

Page 39

Agenda (repeated; see Page 4)

Page 40

Experimental setup - Datasets

Database           # Examples   % Positives   Cost (Euros)
Fraud              1,638,772    0.21%         860,448
Churn              9,410        4.83%         580,884
Kaggle Credit      112,915      6.74%         8,740,181
PAKDD09 Credit     38,969       19.88%        3,117,960
Direct Marketing   37,931       12.62%        59,507

Page 41

Experimental setup - Methods

• Cost-insensitive (CI):
  • Decision trees (DT)
  • Logistic regression (LR)
  • Random forest (RF)
  • Under-sampling (u)

• Cost-proportionate sampling (CPS):
  • Cost-proportionate rejection-sampling (r)
  • Cost-proportionate over-sampling (o)

• Bayes minimum risk (BMR)

• Cost-sensitive training (CST):
  • Cost-sensitive logistic regression (CSLR)
  • Cost-sensitive decision trees (CSDT)

Page 42

Experimental setup - Methods

• Ensembles of cost-sensitive decision trees (ECSDT):

  Random inducers:
  • Bagging (CSB)
  • Pasting (CSP)
  • Random forest (CSRF)
  • Random patches (CSRP)

  Combination:
  • Majority voting (mv)
  • Cost-sensitive weighted voting (wv)
  • Cost-sensitive stacking (s)

Page 43

Experimental setup

• Each experiment was carried out 50 times.

• The parameters of each algorithm were tuned by grid search.

• Results are measured by savings and F1-Score.

• Then the Friedman ranking is calculated for each method.
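The Friedman ranking used here can be sketched as an average of per-dataset ranks; this simple version ignores ties, and the scores are invented.

```python
def friedman_ranks(scores):
    """Average rank of each method across datasets (1 = best).
    scores[d][m] is the savings of method m on dataset d (higher is
    better); ties are not handled in this sketch."""
    n_methods = len(scores[0])
    ranks = [0.0] * n_methods
    for row in scores:
        order = sorted(range(n_methods), key=lambda m: -row[m])
        for r, m in enumerate(order, start=1):
            ranks[m] += r
    return [r / len(scores) for r in ranks]

# Toy example: 3 datasets x 3 methods; method 0 wins everywhere.
ranks = friedman_ranks([[0.7, 0.5, 0.2],
                        [0.6, 0.4, 0.5],
                        [0.3, 0.2, 0.1]])
print([round(r, 2) for r in ranks])  # [1.0, 2.33, 2.67]
```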

Page 44

Results

Highest savings achieved on each database:

Database    % Positives   Algorithm    Savings   Savings (Euros)
Fraud       0.21          CSRP-wv-t    0.73      628,127
Churn       4.83          CSRP-s-t     0.17      98,750
Credit1     6.74          CSRP-mv-t    0.52      4,544,894
Credit2     19.88         LR-t-BMR     0.31      966,568
Marketing   12.62         LR-t-BMR     0.51      30,349

Page 45

Results

Results of the Friedman rank of the savings (1 = best, 28 = worst):

Family   Algorithm    Rank        Family   Algorithm    Rank
ECSDT    CSRP-wv-t    2.6         CST      CSLR-t       14.4
ECSDT    CSRP-s-t     3.4         ECSDT    CSRF-mv-t    15.2
ECSDT    CSRP-mv-t    4           ECSDT    CSRF-s-t     16
ECSDT    CSB-wv-t     5.6         CI       RF-u         17.2
ECSDT    CSP-wv-t     7.4         CPS      LR-r         19
ECSDT    CSB-mv-t     8.2         BMR      DT-t-BMR     19
ECSDT    CSRF-wv-t    9.4         CPS      LR-o         21
BMR      RF-t-BMR     9.4         CPS      DT-r         22.6
ECSDT    CSP-s-t      9.6         CI       LR-u         22.8
ECSDT    CSP-mv-t     10.2        CPS      RF-o         22.8
ECSDT    CSB-s-t      10.2        CI       DT-u         24.4
BMR      LR-t-BMR     11.2        CPS      DT-o         25
CPS      RF-r         11.6        CI       DT-t         26
CST      CSDT-t       12.6        CI       RF-t         26.2

Page 46

Results

(Figure: Friedman rank of the savings, organized by family.)

Page 47

Results within the ECSDT family

(Figures: Friedman ranks by random inducer and by combination method.)

Page 48

Results

(Figure: comparison of the Friedman ranking of the savings and F1-Score, sorted by F1-Score ranking.)

Page 49

Agenda (repeated; see Page 4)

Page 50

Conclusions

• A new framework of example-dependent cost-sensitive classification.

• Using five databases from four real-world applications (credit card fraud detection, churn modeling, credit scoring, and direct marketing), we show that the proposed algorithms significantly outperform the state-of-the-art cost-insensitive and example-dependent cost-sensitive algorithms.

• We highlight the importance of using the real example-dependent financial costs associated with the real-world applications.

Page 51

Future research directions

• Multi-class example-dependent cost-sensitive classification
• Cost-sensitive calibration
• Stacking cost-sensitive decision trees
• Example-dependent cost-sensitive boosting
• Online example-dependent cost-sensitive classification

Page 52

Contributions - Papers

• July 2013. Cost Sensitive Credit Card Fraud Detection using Bayes Minimum Risk. IEEE International Conference on Machine Learning and Applications. Published.

• October 2013. Improving Credit Card Fraud Detection with Calibrated Probabilities. SIAM International Conference on Data Mining. Published.

• June 2014. Credit Scoring using Cost-Sensitive Logistic Regression. IEEE International Conference on Machine Learning and Applications. Published.

• October 2014. Example-Dependent Cost-Sensitive Decision Trees. Expert Systems with Applications. Published.

• January 2015. A novel cost-sensitive framework for customer churn predictive modeling. Decision Analytics. Published.

• March 2015. Ensemble of Example-Dependent Cost-Sensitive Decision Trees. IEEE Transactions on Knowledge and Data Engineering. Under review.

• March 2015. Feature Engineering Strategies for Credit Card Fraud Detection. Expert Systems with Applications. Under review.

• June 2015. Detecting Credit Card Fraud using Periodic Features. IEEE International Conference on Machine Learning and Applications. In press.

Page 53

Contributions - Costcla

costcla is a Python module for cost-sensitive classification built on top of scikit-learn and SciPy, and distributed under the 3-Clause BSD license.

In particular, it provides:

• A set of example-dependent cost-sensitive algorithms

• Several real-world example-dependent cost-sensitive datasets

Installation:

pip install costcla

Documentation: https://pythonhosted.org/costcla/

Page 54

Thank You!!

Alejandro Correa Bahnsen

Page 55

Contact information

Alejandro Correa Bahnsen

University of Luxembourg

albahnsen.com

[email protected]

http://www.linkedin.com/in/albahnsen

https://github.com/albahnsen/CostSensitiveClassification

