Copyright © 2014 SAS Institute Inc. All rights reserved. #analytics2014
Maximizing a Churn Campaign’s Profitability With Cost-Sensitive
Predictive Analytics
Alejandro Correa Bahnsen, Luxembourg University
Andres Felipe Gonzalez Montoya, DIRECTV
Agenda
• Churn modeling
• Evaluation Measures
• Offers
• Predictive modeling
• Cost-Sensitive Predictive Modeling
Cost Proportionate Sampling
Bayes Minimum Risk
CS – Decision Trees
• Conclusions
Churn Modeling
• Detect which customers are likely to abandon
Voluntary churn
Involuntary churn
Customer Churn Management Campaign
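[Figure: campaign flow diagram. The inflow of new customers joins the customer base of active customers. The churn model prediction splits the base into predicted churners (TP: actual churners, FP: actual non-churners) and predicted non-churners (FN: actual churners, TN: actual non-churners). A fraction $\gamma$ of the TP accepts the offer and stays, while $1 - \gamma$ of the TP and all of the FN leave as effective churners (outflow).]
*Verbraken et al. (2013). A novel profit maximizing metric for measuring classification performance of customer churn prediction models.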
Evaluation of a Campaign
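• Confusion Matrix
                             True class ($y_i$)
                             Churner ($y_i = 1$)    Non-churner ($y_i = 0$)
Predicted class ($c_i$)
  Churner ($c_i = 1$)        TP                     FP
  Non-churner ($c_i = 0$)    FN                     TN
• $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
• $\text{Recall} = \frac{TP}{TP + FN}$
• $\text{Precision} = \frac{TP}{TP + FP}$
• $\text{F1-Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$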
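The four measures can be computed directly from the confusion-matrix counts. The following is a minimal sketch in plain Python; the function names and the toy labels are illustrative assumptions, not part of the original study.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (1 = churner, 0 = non-churner)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, c in pairs if y == 1 and c == 1)
    fp = sum(1 for y, c in pairs if y == 0 and c == 1)
    fn = sum(1 for y, c in pairs if y == 1 and c == 0)
    tn = sum(1 for y, c in pairs if y == 0 and c == 0)
    return tp, fp, fn, tn


def campaign_metrics(y_true, y_pred):
    """Accuracy, recall, precision and F1-Score as defined on the slide."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against empty denominators (assumption: report 0.0 in that case).
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}


# Toy example (hypothetical labels): 3 churners among 8 customers.
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 0, 0, 1]
metrics = campaign_metrics(y_true, y_pred)
```

Note that none of these measures uses any customer-specific cost, which is the limitation the next slides address.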
Evaluation of a Campaign
• However, these measures assign the same weight to different errors
• This is not the case in a churn model, since failing to predict a churner carries a different cost than wrongly predicting a non-churner
• Churners also differ from one another in their financial impact
Financial Evaluation of a Campaign
[Figure: the same campaign flow diagram, annotated with the cost of each outcome. A TN costs 0; a FN costs $CLV$; a FP costs $C_o + C_a$; a TP who accepts the offer costs $C_o + C_a$, and a TP who declines it costs $CLV + C_a$.]
*Verbraken et al. (2013). A novel profit maximizing metric for measuring classification performance of customer churn prediction models.
Financial Evaluation of a Campaign
• Cost Matrix
                             True class ($y_i$)
                             Churner ($y_i = 1$)                                             Non-churner ($y_i = 0$)
Predicted class ($c_i$)
  Churner ($c_i = 1$)        $C_{TP_i} = \gamma_i C_{o_i} + (1 - \gamma_i) CLV_i + C_a$      $C_{FP_i} = C_{o_i} + C_a$
  Non-churner ($c_i = 0$)    $C_{FN_i} = CLV_i$                                              $C_{TN_i} = 0$
where:
$C_a$ = Administrative cost
$CLV_i$ = Client Lifetime Value of customer $i$
$C_{o_i}$ = Cost of the offer made to customer $i$
$\gamma_i$ = Probability that customer $i$ accepts the offer
Financial Evaluation of a Campaign
• Using the cost matrix, the total cost is calculated as:
$C = \sum_i \left[ y_i \left( c_i \cdot C_{TP_i} + (1 - c_i) C_{FN_i} \right) + (1 - y_i) \left( c_i \cdot C_{FP_i} + (1 - c_i) C_{TN_i} \right) \right]$
• Additionally, the savings are defined as:
$C_s = \frac{C_0 - C}{C_0}$
where $C_0$ is the cost when all the customers are predicted as non-churners
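The total cost and the savings can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function names are hypothetical, the customer values are invented for the example, and the true-positive cost is read as $\gamma_i C_{o_i} + (1 - \gamma_i) CLV_i + C_a$.

```python
def cost_matrix(clv, offer_cost, gamma, admin_cost):
    """Per-customer costs (C_TP, C_FP, C_FN, C_TN) following the cost matrix."""
    c_tp = gamma * offer_cost + (1 - gamma) * clv + admin_cost
    c_fp = offer_cost + admin_cost
    c_fn = clv
    c_tn = 0.0
    return c_tp, c_fp, c_fn, c_tn


def total_cost(y_true, y_pred, costs):
    """Sum the example-dependent cost of every prediction."""
    total = 0.0
    for y, c, (c_tp, c_fp, c_fn, c_tn) in zip(y_true, y_pred, costs):
        total += (y * (c * c_tp + (1 - c) * c_fn)
                  + (1 - y) * (c * c_fp + (1 - c) * c_tn))
    return total


def savings(y_true, y_pred, costs):
    """Relative cost reduction versus predicting everyone as a non-churner."""
    c0 = total_cost(y_true, [0] * len(y_true), costs)
    return (c0 - total_cost(y_true, y_pred, costs)) / c0


# Hypothetical customers: CLV of 100, 200 and 150; offer cost 10;
# acceptance probability 0.5; administrative cost 1.
costs = [cost_matrix(clv, 10.0, 0.5, 1.0) for clv in (100.0, 200.0, 150.0)]
y_true = [1, 0, 1]
y_pred = [1, 1, 0]
campaign_cost = total_cost(y_true, y_pred, costs)   # 56 + 11 + 150
campaign_savings = savings(y_true, y_pred, costs)   # (250 - 217) / 250
```

The savings are undefined when $C_0 = 0$ (no churners in the set); the sketch does not handle that case.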
Financial Evaluation of a Campaign
• Customer Lifetime Value
*Glady et al. (2009). Modeling churn using customer lifetime value.
Offers
• The same offer may not apply to all customers (e.g., some already have premium channels)
• An offer should be chosen to maximize both the probability of acceptance ($\gamma$) and the CLV
Offers clusters
Offers Analysis
Improve to HD DVR
Monthly Discount
Premium Channels
Evaluate Offers
Performance
Offers Analysis
[Figure: churn rate (left axis, 0.0% to 6.0%) and gamma (right axis, 88% to 100%) for Cluster 1 to Cluster 4.]
$\gamma$ = Probability that a customer accepts the offer
Predictive Modeling
• Use predictive analytics to detect the behavioral patterns of customers who defected in the past
Predictive Modeling
• Then check which of the current customers share the same patterns
Predictive Modeling
• Dataset
Dataset            N       Churn     $C_0$ (Euros)
Total              9410    4.83%     580,884
Training           3758    5.05%     244,542
Validation         2824    4.77%     174,171
Testing            2825    4.42%     162,171
Under-Sampling     374     50.80%    244,542
Predictive Modeling
• Algorithms
Decision Trees
Logistic Regression
Random Forest
Predictive Modeling - Results
[Figure: two bar charts, F1-Score (axis 0 to 0.14) and Savings (axis 0% to 8%), for Decision Trees, Logistic Regression and Random Forest trained on the Training and Under-Sampling sets.]
Predictive Modeling - SMOTE
• Synthetic Minority Over-sampling Technique
[Figure: scatter plot over Dim 1 and Dim 2 showing the synthetic samples.]
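The slide only names the technique, so the following is a generic sketch of the usual SMOTE idea rather than the implementation used in the study: each synthetic sample is placed at a random point on the segment between a minority-class example and one of its k nearest minority-class neighbours. The function name, the parameter defaults and the toy points are assumptions.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority examples.

    minority: list of equal-length numeric tuples (needs at least 2 points).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Random position on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a)
                               for a, b in zip(base, neighbour)))
    return synthetic


# Toy minority class in two dimensions (hypothetical values).
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority_points, n_synthetic=10, k=2)
```

Because every synthetic point is a convex combination of two existing points, it always falls inside the region already spanned by the minority class.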
Predictive Modeling - SMOTE
• Dataset
Dataset            N       Churn     $C_0$ (Euros)
Total              9410    4.83%     580,884
Training           3758    5.05%     244,542
Validation         2824    4.77%     174,171
Testing            2825    4.42%     162,171
Under-Sampling     374     50.80%    244,542
SMOTE              6988    48.94%    4,273,083
Predictive Modeling - SMOTE
[Figure: two bar charts, F1-Score (axis 0 to 0.14) and Savings (axis 0% to 8%), for Decision Trees, Logistic Regression and Random Forest trained on the Training, Under-Sampling and SMOTE sets.]
Predictive Modeling - SMOTE
• Sampling techniques help to improve the models' predictive power, but not necessarily the savings
• There is a need for methods that aim to increase savings
Cost-Sensitive Predictive Modeling
• Traditional methods assume the same cost for different errors
• Not the case in Churn modeling
• Some cost-sensitive methods assume a constant cost difference between errors
• Example-Dependent Cost-Sensitive Predictive Modeling
Cost-Sensitive Predictive Modeling
• Changing class distribution
Cost Proportionate Rejection Sampling
Cost Proportionate Over Sampling
• Direct Cost
Bayes Minimum Risk
• Modifying a learning algorithm
CS – Decision Tree
Cost Proportionate Sampling
• Normalized Cost weight
$w_i = \begin{cases} C_{FP_i} & \text{if } y_i = 0 \\ C_{FN_i} & \text{if } y_i = 1 \end{cases}$
$\bar{w}_i = \frac{w_i}{\max_j w_j}$
Cost Proportionate Sampling
• Cost Proportionate Over Sampling
Example    $y_i$    $w_i$
1          0        1
2          1        10
3          0        2
4          1        20
5          0        1
Initial Dataset
(1,0,1) (2,1,10) (3,0,2) (4,1,20) (5,0,1)
Cost Proportionate Dataset
(1,0,1)
(2,1,1), (2,1,1), …, (2,1,1)
(3,0,2), (3,0,2)
(4,1,1), (4,1,1), (4,1,1), …, (4,1,1), (4,1,1)
(5,0,1)
*Elkan, C. (2001). The Foundations of Cost-Sensitive Learning.
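The over-sampling step in the example can be sketched as follows. This is a simplified illustration that assumes positive integer weights, as in the table above, and gives every copy a unit weight; real costs would first need to be scaled and rounded. The function name is hypothetical.

```python
def cost_proportionate_oversample(examples):
    """Repeat each (id, y, w) example w times, each copy carrying weight 1.

    Assumes w is a positive integer, as in the slide's toy example.
    """
    resampled = []
    for example_id, y, w in examples:
        resampled.extend([(example_id, y, 1)] * int(w))
    return resampled


# Initial dataset from the slide: (example, y_i, w_i).
initial = [(1, 0, 1), (2, 1, 10), (3, 0, 2), (4, 1, 20), (5, 0, 1)]
oversampled = cost_proportionate_oversample(initial)
churn_share = sum(y for _, y, _ in oversampled) / len(oversampled)
```

The costliest examples dominate the new dataset, which is also why the over-sampled training set in the study is much larger than the original one.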
Cost Proportionate Sampling
• Cost Proportionate Rejection Sampling
Example    $y_i$    $w_i$    Normalized weight $\bar{w}_i$
1          0        1        0.05
2          1        10       0.5
3          0        2        0.1
4          1        20       1
5          0        1        0.05
Initial Dataset
(1,0,1) (2,1,10) (3,0,2) (4,1,20) (5,0,1)
Cost Proportionate Dataset
(2,1,1) (4,1,1) (4,1,1) (5,0,1)
*Zadrozny et al. (2003). Cost-sensitive learning by cost-proportionate example weighting.
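Rejection sampling keeps each example with a probability equal to its normalized weight. The sketch below is a minimal illustration of that idea; the function names and the fixed seed are assumptions, and the outcome depends on the random draws, so it will not necessarily reproduce the dataset shown on the slide.

```python
import random


def normalize_weights(weights):
    """Divide every weight by the largest one, so that the maximum is 1."""
    w_max = max(weights)
    return [w / w_max for w in weights]


def cost_proportionate_rejection_sample(examples, seed=0):
    """Keep each (id, y, w) example with probability w / max_j w_j."""
    rng = random.Random(seed)
    probabilities = normalize_weights([w for _, _, w in examples])
    return [(example_id, y, 1)
            for (example_id, y, _), p in zip(examples, probabilities)
            if rng.random() < p]


# Initial dataset from the slide: (example, y_i, w_i).
initial = [(1, 0, 1), (2, 1, 10), (3, 0, 2), (4, 1, 20), (5, 0, 1)]
normalized = normalize_weights([w for _, _, w in initial])
sample = cost_proportionate_rejection_sample(initial)
```

The example with the largest weight is always kept, while cheap examples are discarded most of the time, which explains the small size of the rejection-sampled training set.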
Cost Proportionate Sampling
• Dataset
Dataset                    N       Churn     $C_0$ (Euros)
Total                      9410    4.83%     580,884
Training                   3758    5.05%     244,542
Validation                 2824    4.77%     174,171
Testing                    2825    4.42%     162,171
Under-Sampling             374     50.80%    244,542
SMOTE                      6988    48.94%    4,273,083
CS – Rejection-Sampling    428     41.35%    231,428
CS – Over-Sampling         5767    31.24%    2,350,285
Cost Proportionate Sampling
[Figure: two bar charts, Savings (axis 0% to 25%) and F1-Score (axis 0 to 0.14), for Decision Trees, Logistic Regression and Random Forest trained on the Training, Under-Sampling, SMOTE, CS-Rejection and CS-Over sets.]
Bayes Minimum Risk
• Decision model based on quantifying tradeoffs between various decisions using probabilities and the costs that accompany such decisions
• Risk of classification
$R(c_i = 0 \mid x_i) = C_{TN_i} (1 - \hat{p}_i) + C_{FN_i} \cdot \hat{p}_i$
$R(c_i = 1 \mid x_i) = C_{FP_i} (1 - \hat{p}_i) + C_{TP_i} \cdot \hat{p}_i$
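Bayes Minimum Risk
• Using the different risks, the prediction is made based on the following condition:
$c_i = \begin{cases} 0 & \text{if } R(c_i = 0 \mid x_i) \le R(c_i = 1 \mid x_i) \\ 1 & \text{otherwise} \end{cases}$
• Example-dependent threshold
$t_{BMR_i} = \frac{C_{FP_i} - C_{TN_i}}{C_{FN_i} - C_{TN_i} - C_{TP_i} + C_{FP_i}}$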
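The decision rule and the threshold are two views of the same comparison: when the denominator of the threshold is positive, predicting churn has the lower risk exactly when the estimated churn probability exceeds the threshold. A minimal sketch, with hypothetical function names and invented costs:

```python
def bmr_threshold(c_tp, c_fp, c_fn, c_tn):
    """Example-dependent probability threshold of Bayes Minimum Risk."""
    return (c_fp - c_tn) / (c_fn - c_tn - c_tp + c_fp)


def bmr_predict(p_churn, c_tp, c_fp, c_fn, c_tn):
    """Return 0 or 1, whichever class has the lower expected cost (risk)."""
    risk_0 = c_tn * (1 - p_churn) + c_fn * p_churn
    risk_1 = c_fp * (1 - p_churn) + c_tp * p_churn
    return 0 if risk_0 <= risk_1 else 1


# Hypothetical customer: C_TP = 56, C_FP = 11, C_FN = 100, C_TN = 0.
customer_costs = (56.0, 11.0, 100.0, 0.0)
threshold = bmr_threshold(*customer_costs)        # 11 / 55
low_risk = bmr_predict(0.1, *customer_costs)      # below the threshold
high_risk = bmr_predict(0.5, *customer_costs)     # above the threshold
```

The probability `p_churn` is assumed to come from any already-trained classifier, so the rule can be applied on top of an existing model without retraining it.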
Bayes Minimum Risk
[Figure: bar chart of Savings (axis 0% to 35%) for Decision Trees, Logistic Regression and Random Forest, each without ("-") and with BMR, trained on the Training, Under-Sampling, SMOTE, CS-Rejection and CS-Over sets.]
Bayes Minimum Risk
[Figure: bar chart of F1-Score (axis 0 to 0.14) for Decision Trees, Logistic Regression and Random Forest, each without ("-") and with BMR, trained on the Training, Under-Sampling, SMOTE, CS-Rejection and CS-Over sets.]
Bayes Minimum Risk
• Bayes Minimum Risk increases the savings by using a cost-insensitive method and then introducing the costs
• Why not introduce the costs during the estimation of the methods?
CS – Decision Trees
• Decision trees
Classification model that iteratively creates binary decision rules $(x^j, l^j_m)$ that maximize certain criteria
where $(x^j, l^j_m)$ refers to making a rule using feature $j$ on value $m$
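CS – Decision Trees
• Decision trees – Construction
[Figure: the rule $(x^j, l^j_m)$ splits the set $S$ into $S_l$ and $S_r$.]
$S_l = \{ X_i \mid X_i \in S \wedge x^j_i \le l^j_m \}$, $S_r = \{ X_i \mid X_i \in S \wedge x^j_i > l^j_m \}$
• Then the impurity of each leaf is calculated using:
Misclassification: $I_m(\pi_1) = 1 - \max(\pi_1, 1 - \pi_1)$
Entropy: $I_e(\pi_1) = -\pi_1 \log \pi_1 - (1 - \pi_1) \log(1 - \pi_1)$
Gini: $I_g(\pi_1) = 2 \pi_1 (1 - \pi_1)$
$\pi_1$ is the percentage of positives.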
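CS – Decision Trees
• Decision trees – Construction
• Afterwards the gain of applying a given rule to the set $S$ is:
$Gain(x^j, l^j_m) = I(\pi_1) - \frac{|S_l|}{|S|} I(\pi_1^l) - \frac{|S_r|}{|S|} I(\pi_1^r)$
[Figure: the rule $(x^j, l^j_m)$ splits the set $S$ into $S_l$ and $S_r$.]
$S_l = \{ X_i \mid X_i \in S \wedge x^j_i \le l^j_m \}$, $S_r = \{ X_i \mid X_i \in S \wedge x^j_i > l^j_m \}$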
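The three impurity measures and the gain of a candidate rule can be sketched as below. The function names and the toy data are illustrative assumptions; the entropy uses the natural logarithm, since the slide does not state the base, and an empty split is given a gain of zero by convention.

```python
import math


def misclassification(pi1):
    return 1 - max(pi1, 1 - pi1)


def entropy(pi1):
    # A pure set has zero entropy; this also avoids evaluating log(0).
    if pi1 in (0.0, 1.0):
        return 0.0
    return -pi1 * math.log(pi1) - (1 - pi1) * math.log(1 - pi1)


def gini(pi1):
    return 2 * pi1 * (1 - pi1)


def positive_rate(labels):
    """Fraction of positive (churner) labels in a non-empty set."""
    return sum(labels) / len(labels)


def gain(labels, feature_values, threshold, impurity=gini):
    """Impurity reduction from splitting on feature value <= threshold."""
    left = [y for y, x in zip(labels, feature_values) if x <= threshold]
    right = [y for y, x in zip(labels, feature_values) if x > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    return (impurity(positive_rate(labels))
            - len(left) / n * impurity(positive_rate(left))
            - len(right) / n * impurity(positive_rate(right)))


# Toy set: the feature separates the two classes perfectly at 2.
labels = [0, 0, 1, 1]
feature = [1.0, 2.0, 3.0, 4.0]
perfect_gain = gain(labels, feature, threshold=2.0)
partial_gain = gain(labels, feature, threshold=1.0)
```

Evaluating `gain` for every feature and candidate value, and keeping the largest, gives the rule selected at each node.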
CS – Decision Trees
• Decision trees – Construction
• The rule that maximizes the gain is selected
$(best_x, best_l) = \arg\max_{(j,m)} Gain(x^j, l^j_m)$
• The process is repeated until a stopping criterion is met
[Figure: tree grown by recursively splitting each set $S$.]
CS – Decision Trees
• Decision trees – Pruning
• Calculation of the tree error $\epsilon(Tree)$ and the pruned tree error $\epsilon(EB(Tree, branch))$
• Pruning criterion for each candidate branch:
$\frac{\epsilon(EB(Tree, branch)) - \epsilon(Tree)}{|Tree| - |EB(Tree, branch)|}$
• After calculating the pruning criterion for all possible pruned trees, the one with the maximum improvement is selected and the tree is pruned.
• The process is then repeated until there is no further improvement.
[Figure: a tree and two successively pruned versions of it, each annotated with the pruning criterion.]
CS – Decision Trees
• Maximizing the accuracy is not the same as minimizing the cost
• To solve this, some studies have proposed methods that aim to introduce cost-sensitivity into the algorithms
• However, research has focused on class-dependent methods
• Instead we used:
Example-dependent cost based impurity measure
Example-dependent cost based pruning criteria
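CS – Decision Trees
• Cost based impurity measure
• The impurity of each leaf is calculated using:
$I_c(S) = \min\{C_0, C_1\}$
$f(S) = \begin{cases} 0 & \text{if } C_0 \le C_1 \\ 1 & \text{otherwise} \end{cases}$
(here $C_0$ and $C_1$ are taken to be the total costs of labeling every example in $S$ as a non-churner and as a churner, respectively)
[Figure: the rule $(x^j, l^j_m)$ splits the set $S$ into $S_l$ and $S_r$.]
$S_l = \{ X_i \mid X_i \in S \wedge x^j_i \le l^j_m \}$, $S_r = \{ X_i \mid X_i \in S \wedge x^j_i > l^j_m \}$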
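The cost-based impurity can be sketched as follows, assuming that $C_0$ and $C_1$ are the total example-dependent costs of labeling the whole set as non-churners or as churners. The function names and the cost values are hypothetical.

```python
def cost_of_constant_prediction(y_true, costs, label):
    """Total cost of assigning the same label to every example in the set.

    costs holds one (C_TP, C_FP, C_FN, C_TN) tuple per example.
    """
    total = 0.0
    for y, (c_tp, c_fp, c_fn, c_tn) in zip(y_true, costs):
        if label == 1:
            total += c_tp if y == 1 else c_fp
        else:
            total += c_fn if y == 1 else c_tn
    return total


def cost_impurity(y_true, costs):
    """Return (I_c(S), f(S)): the cheaper total cost and the label giving it."""
    c0 = cost_of_constant_prediction(y_true, costs, 0)
    c1 = cost_of_constant_prediction(y_true, costs, 1)
    return min(c0, c1), (0 if c0 <= c1 else 1)


# Hypothetical leaf: one churner and two non-churners with identical costs.
leaf_labels = [1, 0, 0]
leaf_costs = [(56.0, 11.0, 100.0, 0.0)] * 3
impurity, leaf_prediction = cost_impurity(leaf_labels, leaf_costs)
```

In this toy leaf the majority class is non-churner, yet the cheaper label is churner, because missing the single churner costs more than making two unnecessary offers.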
CS – Decision Trees
• Cost sensitive pruning
• New pruning criterion that evaluates the improvement in cost of eliminating a particular branch
$PC_c = \frac{C(EB(Tree, branch)) - C(Tree)}{|Tree| - |EB(Tree, branch)|}$
CS – Decision Trees
[Figure: bar chart of Savings (axis 0% to 50%) for Decision Trees and Cost-Sensitive Decision Trees with error pruning and cost pruning, trained on the Training, Under-Sampling, SMOTE, CS-Rejection and CS-Over sets.]
CS – Decision Trees
[Figure: bar chart of F1-Score (axis 0 to 0.3) for models trained on the Training, Under-Sampling, SMOTE, CS-Rejection and CS-Over sets.]
Comparison of Models
[Figure: bar chart of Savings and F1-Score (axis 0% to 50%) for Random Forest (Train), Logistic Regression (CS-Rejection), Logistic Regression with BMR (Train), Decision Tree with cost pruning (CS-Rejection) and CS-Decision Tree (Train).]
Conclusions
• Selecting models based on traditional statistics does not give the best results when measured by savings
• Incorporating the costs into the modeling helps to achieve higher savings
Other Applications
• Fraud Detection
Correa Bahnsen et al. (2013). Cost Sensitive Credit Card Fraud Detection using Bayes Minimum Risk.
Correa Bahnsen et al. (2014). Improving Credit Card Fraud Detection with Calibrated Probabilities.
• Credit Scoring
Correa Bahnsen et al. (2014). Example-Dependent Cost-Sensitive Credit Scoring using Bayes Minimum Risk.
• Direct Marketing
Correa Bahnsen et al. (2014). Example-Dependent Cost-Sensitive Decision Trees.
Contact Information
Alejandro Correa Bahnsen
University of Luxembourg
Luxembourg
http://www.linkedin.com/in/albahnsen
http://www.slideshare.net/albahnsen
Andres Gonzalez Montoya
DIRECTV
Colombia
Thank you!