Make every interaction count™
Decision Trees: Profiling and Segmentation
Sachin Chincholi, Professional Services
Portrait Software Copyright 2007 CUSTOMER CONFIDENTIAL
Decision Trees: Profiling and Segmentation
– Presenter: Sachin Chincholi, Professional Services
– Audience: Existing Quadstone Users
Decision Trees for insight
+ Transparent
  – Easily understandable by non-statisticians
  – Sanity check your modelling framework
    – Is your objective defined correctly?
    – Are the initial splits plausible?
+ Fast to build
  – Quick alert to possible contamination
Decision Trees for Modeling
+ Transparent
  – Easier to get buy-in from the business
  – Easy to code
+ Non-parametric
  – No assumptions about underlying distributions of Analysis Candidates
+ Non-linear
  – Allow easy discovery of non-linear patterns (age vs. income)
– ‘Unstable’
  – Different populations give very different trees
Interpreting a decision tree
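[Figure: example decision tree]
– Node #1 (root): Objective: Response match = 26.2% of 100000. This is the match rate for the objective over the entire population.
– The split at Age = 40 is the most predictive, giving the branches Age < 40 and Age ≥ 40.
– Nodes #2 and #3: 50.2% of 20302 and 20.1% of 79698.
– Further split labels shown in the tree: Age, Income.
– Color is used to show match rates.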
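The node figures in this example can be cross-checked against each other: the two child nodes should account for the whole population, and their size-weighted match rates should reproduce the root match rate. The short Python sketch below does that arithmetic with the numbers quoted on the slide; the variable names are illustrative only.

```python
# Child nodes from the example tree: (match rate, node size).
children = [(0.502, 20302), (0.201, 79698)]

# The child sizes should add up to the root population.
total_size = sum(size for _, size in children)

# Expected number of matches in each child, summed over both children.
total_matches = sum(rate * size for rate, size in children)

# Size-weighted match rate, comparable with the root node's figure.
overall_rate = total_matches / total_size

print(total_size)
print(round(overall_rate * 100, 1))
```

The weighted rate rounds to the 26.2% shown for the root node, which is a useful sanity check when reading any tree.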
Decision tree build process
– Given an objective, Decision Tree Builder will find the most predictive split among all possible splits, with all analysis candidates, given the current binnings
– The population is then split into two segments based on this
– The same method splits each of the two segments into two further segments
– This process continues until the tree is finished, as determined by the tree constraints
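The build process described above can be sketched as a greedy recursive procedure. This is a minimal illustration in Python, not the Decision Tree Builder implementation: the function names, the `max_depth` and `min_size` constraints, the record layout and the toy data are all assumptions made for the example, and the quality measure is a simple information gain for a binary objective.

```python
import math

def information(matches, total):
    """Sum of p(c) * log(p(c)) over the two classes (0 for a pure node)."""
    info = 0.0
    for count in (matches, total - matches):
        if 0 < count < total:
            p = count / total
            info += p * math.log(p)
    return info

def node_information(rows, objective):
    return information(sum(r[objective] for r in rows), len(rows))

def try_split(rows, field, threshold, objective):
    """Return (quality, left, right) for one candidate split, or None if a side is empty."""
    left = [r for r in rows if r[field] < threshold]
    right = [r for r in rows if r[field] >= threshold]
    if not left or not right:
        return None
    weighted = (len(left) * node_information(left, objective)
                + len(right) * node_information(right, objective)) / len(rows)
    return weighted - node_information(rows, objective), left, right

def build_tree(rows, binnings, objective, max_depth=3, min_size=2, depth=0):
    """Greedy build: pick the best split at each node, then recurse on both segments."""
    node = {"size": len(rows),
            "match_rate": sum(r[objective] for r in rows) / len(rows)}
    if depth >= max_depth:  # hypothetical tree constraint
        return node
    best = None
    for field, boundaries in binnings.items():  # all analysis candidates
        for threshold in boundaries:            # all splits allowed by the binning
            result = try_split(rows, field, threshold, objective)
            if result is None:
                continue
            quality, left, right = result
            if min(len(left), len(right)) < min_size:  # hypothetical constraint
                continue
            if best is None or quality > best[0]:
                best = (quality, field, threshold, left, right)
    if best is None or best[0] <= 0:
        return node
    _, field, threshold, left, right = best
    node["split"] = (field, threshold)
    node["children"] = [
        build_tree(left, binnings, objective, max_depth, min_size, depth + 1),
        build_tree(right, binnings, objective, max_depth, min_size, depth + 1),
    ]
    return node

# Toy data, invented for illustration only.
rows = [
    {"Age": 25, "Income": 20000, "Response": 1},
    {"Age": 30, "Income": 45000, "Response": 1},
    {"Age": 35, "Income": 30000, "Response": 1},
    {"Age": 45, "Income": 25000, "Response": 0},
    {"Age": 50, "Income": 60000, "Response": 0},
    {"Age": 60, "Income": 35000, "Response": 0},
]
binnings = {"Age": [30, 40, 50], "Income": [30000, 40000]}
tree = build_tree(rows, binnings, "Response")
print(tree["split"])
```

In this toy data the split at Age = 40 separates the responders completely, so it wins the ranking at the root and both resulting segments become leaves.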
Choice of a decision tree split
– Each possible split is assigned a quality value
– The splits are ranked:

Split                      Quality Value
Income < 40000?            0.205
MaritalStatus = Single?    0.201
Income < 30000?            0.199
…                          …

– The quality value depends on the tree type:
  – Binary outcome tree and classification tree: Information gain
  – Regression tree: R²
Choice of a decision tree split (2)
Objective: Response
Level: 1
[Figure: quality value of each candidate split, shown against the bin boundaries of each analysis candidate. The highest quality value shown is 0.205.]
– Age: 18, 20, 30, 40, 50, 60, 65
– Income: 0, 10000, 20000, 30000, 40000, 50000, 100000
– LoanAmount: 0, 2000, 10000, 20000, 30000, 50000, 100000
– MaritalStatus: Single, Married, Widow, Misc.
Splitting criterion
– Information = $\sum_{c=1}^{n} p(c) \log p(c)$
  – Sum of proportion(c) × log(proportion(c)) over all classes c
– Equivalent to likelihood-ratio test for comparing two populations
– Seeks to separate out classes, while minimising small nodes
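As an illustration of the criterion, the Python sketch below computes the Information value of a node from its class counts and turns it into a gain for a candidate split. The gain formula (size-weighted Information of the child nodes minus the Information of the parent) is the usual textbook construction and is an assumption here; the slide does not spell it out. The function names are hypothetical, and the counts are rounded from the example tree (26.2% of 100000 split into 50.2% of 20302 and 20.1% of 79698).

```python
import math

def information(class_counts):
    """Sum over classes of p(c) * log(p(c)); empty classes contribute nothing."""
    total = sum(class_counts)
    return sum((n / total) * math.log(n / total) for n in class_counts if n > 0)

def information_gain(parent_counts, children_counts):
    """Assumed definition: weighted child Information minus parent Information."""
    total = sum(parent_counts)
    weighted = sum(sum(child) / total * information(child)
                   for child in children_counts)
    return weighted - information(parent_counts)

# Class counts as [matches, non-matches], rounded from the example tree.
parent = [26211, 73789]
children = [[10192, 10110], [16019, 63679]]
gain = information_gain(parent, children)
print(gain)
```

With this sign convention Information is zero for a pure node and negative otherwise, so a split that separates the classes better gives a larger positive gain, and a split that leaves both segments with the parent's class mix gives a gain of zero.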
Is the decision tree any good (binary case)?
[Figure: Gini “curve”. Records are sorted by predicted propensity. Axes: proportion of actual non-matches and proportion of actual matches, each running from 0 to 1.]
Calculating the Gini value
Gini = A/B × 100%
[Figure: Gini “curve” with the areas A and B marked.]
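A hedged sketch of the calculation in Python follows. The slide's figure is not reproduced here, so the reading of A and B is an assumption: A is taken as the area between the Gini curve and the diagonal, and B as the same area for a perfect model (0.5 when both axes run from 0 to 1). The curve is assumed to plot the cumulative proportion of matches (vertical) against the cumulative proportion of non-matches (horizontal), with records sorted by descending predicted propensity. The function name is hypothetical.

```python
def gini_percent(actual, predicted):
    """Gini = A / B * 100 under the assumptions stated in the text above."""
    pairs = sorted(zip(predicted, actual), key=lambda p: -p[0])
    n_match = sum(1 for _, a in pairs if a)
    n_non = len(pairs) - n_match
    if n_match == 0 or n_non == 0:
        raise ValueError("need at least one match and one non-match")

    x = y = 0.0          # current point on the curve
    area_under = 0.0     # area under the curve, by the trapezium rule
    i = 0
    while i < len(pairs):
        # Records with equal predicted propensity are taken as one step.
        j = i
        step_match = step_non = 0
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1]:
                step_match += 1
            else:
                step_non += 1
            j += 1
        new_x = x + step_non / n_non
        new_y = y + step_match / n_match
        area_under += (new_x - x) * (y + new_y) / 2
        x, y = new_x, new_y
        i = j

    area_a = area_under - 0.5   # area between curve and diagonal (assumed A)
    area_b = 0.5                # area for a perfect model (assumed B)
    return area_a / area_b * 100

perfect = gini_percent([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
unpredictive = gini_percent([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])
print(perfect, unpredictive)
```

Under these assumptions a model that ranks every match above every non-match scores 100%, and a model that gives every record the same propensity scores 0%.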
Gini “curves”
[Figure: Gini “curves” for a perfect model and a totally unpredictive model.]
Overfitting
[Figure: predictive power plotted against complexity (relative to dataset size), with curves labelled “apparent” and “actual” and a region labelled “overfitting”.]
Best Practice
– Derive a Training-Test field
– Group “too small” categories
– Reduce number of categories
– Watch number of responses per node
– (Watch confidence intervals of prediction)
– Auto-pruning
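As a small illustration of the first item, a Training-Test field can be derived by assigning each record at random to one of two groups, so that the tree is built on one group and checked on the other. The Python sketch below is generic and does not use Quadstone's own derivation syntax; the function name, the field values, the 30% test fraction and the fixed seed are all assumptions chosen for the example.

```python
import random

def derive_training_test(n_records, test_fraction=0.3, seed=42):
    """Return one 'Training' or 'Test' label per record, assigned at random."""
    rng = random.Random(seed)  # fixed seed so the assignment is repeatable
    return ["Test" if rng.random() < test_fraction else "Training"
            for _ in range(n_records)]

labels = derive_training_test(1000)
test_share = labels.count("Test") / len(labels)
print(test_share)
```

Fixing the seed keeps the assignment stable between runs, so that a model audited later is compared against the same test records.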
[Figure: Confidence interval for 100 responses… Percent of mean (axis 0% to 140%) for populations of 1000, 10,000 and 100,000, with series Mean, Upper and Lower.]
Confidence intervals
Upper and lower confidence intervals, 95% confidence.
E.g. 50 responses out of N suggests that the true (population) mean is 95% likely to be between 75% and 130% of the observed (sample) mean.
[Figure: upper and lower confidence limits as a percent of observed mean (axis 20 to 200) against number of responses (axis 10 to 10000). Tick labels for number of responses: 10, 15, 25, 50, 100, 300, 500, 1000, 3000, 5000, 10000. Data labels: 162, 146, 131, 121, 112, 109, 106, 104, 103, 102.]
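The way the interval narrows as the number of responses grows can be approximated with a short calculation. The Python sketch below uses a normal approximation in which the relative standard error of a count of rare responses is about 1/√n; this is an assumption made for illustration, not the method behind the slide, and it will not reproduce the slide's figures exactly (for 50 responses it gives roughly 72% to 128% rather than 75% to 130%). The function name is hypothetical.

```python
import math

def ci_percent_of_mean(responses, z=1.96):
    """Approximate 95% interval, as percentages of the observed mean.

    Assumes the response count behaves like a Poisson count, so its
    relative standard error is about 1 / sqrt(responses).
    """
    half_width = z / math.sqrt(responses)
    return 100 * (1 - half_width), 100 * (1 + half_width)

for n in (10, 50, 100, 1000, 10000):
    lower, upper = ci_percent_of_mean(n)
    print(n, round(lower), round(upper))
```

The interval depends on the number of responses in a node rather than on the size of the population, which is why the best practice list asks you to watch the number of responses per node.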
What makes a good segment?
[Figure: response rate (axis 0.00% to 12.00%) by node (1, 2, 3), annotated “If this is the average…”, “Is this worth knowing?” and “Is this?”]
Possible splits scale exponentially
[Figure: number of splits (logarithmic axis, 1 to 1000000) against number of categories (1 to 20).]
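The growth can be made concrete with a short calculation. A field with k unordered categories can be divided into two non-empty groups in 2^(k−1) − 1 ways; the Python sketch below assumes that every such grouping counts as a possible split, which is a combinatorial fact about categories and not a statement about which splits the Decision Tree Builder actually evaluates. The function name is hypothetical.

```python
def binary_splits(n_categories):
    """Number of ways to divide n unordered categories into two non-empty groups."""
    if n_categories < 1:
        raise ValueError("need at least one category")
    return 2 ** (n_categories - 1) - 1

for k in (2, 3, 5, 10, 20):
    print(k, binary_splits(k))
```

Each extra category roughly doubles the count, reaching 524287 at 20 categories, which is one reason to group small categories and reduce the number of categories before building a tree.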
Reporting on your model
– Audit the model you build
– Monitor future ‘through the door’ populations
Where to find out more
– Quadstone System Support website: http://support.quadstone.com/info/releases/#qs5.3
– Documentation
  – What’s new in the Quadstone System 5.3 release notes
  – Updated Quadstone System help (F1)
  – Updated Quadstone System data-build command and TML reference
  – Updated Data Build Manager reference
  – Updated Quadstone System administration reference
  – Customer-specific release notes
– Quadstone System Support
  – Web Site: http://support.quadstone.com/
  – Email: [email protected]
  – Tel: US 1-800-335-3860; All +44 131 240 3140
Questions?

www.portraitsoftware.com

EMEA (Headquarters)
The Smith Centre, The Fairmile
Henley-on-Thames, Oxfordshire, RG9 6AB, United Kingdom
T: +44 (0)1491 416600
F: +44 (0)1491 416601

The Americas
125 Summer Street, 16th Floor
Boston MA 02110, USA
T: +1 617 457-5200
F: +1 617 457-5299

Asia Pacific
Level 7, 15-17 Young Street
Sydney NSW 2000, Australia
F: +61 2 8004 9600