
Make every interaction count™

Decision Trees: Profiling and Segmentation
Sachin Chincholi, Professional Services


USA: 1 866 793 4279
Austria: 0800 28 1673
Belgium: 0800 505 60
Canada: 1 866 270 8076
India: 000800 100 6558
Republic of Ireland: 1800 944 607
Netherlands: 0800 0233 593
Norway: 800 164 90
Spain: 900 801 508
Sweden: 0200 125 679
UK: 0808 109 1441
International: +44 20 8609 1476
Access code: 131716 #

Portrait Software Copyright 2007 CUSTOMER CONFIDENTIAL

How to ask a Question


Decision Trees: Profiling and Segmentation

– Presenter: Sachin Chincholi, Professional Services

– Audience: Existing Quadstone Users


Decision Trees for Insight

+ Transparent
  – Easily understandable by non-statisticians
  – Sanity check your modelling framework
    – Is your objective defined correctly?
    – Are the initial splits plausible?

+ Fast to build
  – Quick alert to possible contamination (analysis candidates that carry information about the objective itself)


Decision Trees for Modelling

+ Transparent
  – Easier to get buy-in from the business
  – Easy to code

+ Non-parametric
  – No assumptions about underlying distributions of Analysis Candidates

+ Non-linear
  – Allows easy discovery of non-linear patterns (e.g. age vs. income)

– ‘Unstable’
  – Different populations can give very different trees
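A tree is just a set of nested conditions, so a finished tree translates directly into scoring code. Below is a minimal sketch in Python. The fields, thresholds and leaf rates are hypothetical and were invented for illustration. They are not taken from any model in this deck.

```python
def score(customer: dict) -> float:
    """Return the predicted response rate for one customer.

    Hypothetical tree: every threshold and leaf rate below is invented
    for illustration. Each leaf returns the match rate of its training node.
    """
    if customer["age"] < 40:
        if customer["income"] < 30000:
            return 0.12
        return 0.25
    if customer["marital_status"] == "Single":
        return 0.55
    return 0.41


print(score({"age": 35, "income": 20000, "marital_status": "Married"}))  # 0.12
print(score({"age": 52, "income": 45000, "marital_status": "Single"}))   # 0.55
```

Each leaf returns the training match rate of its node, so the score is directly a predicted propensity.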


Interpreting a decision tree

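[Diagram: a decision tree. Root node #1 shows "Objective: Response match = 26.2% of 100000". It splits on Age into "< 40" and "≥ 40", giving nodes #2 (50.2% of 20302) and #3 (20.1% of 79698). These nodes have further splits on Age and Income below them.]

– The split at Age = 40 is the most predictive
– Color is used to show match rates
– Node #1 shows the match rate for the objective over the entire population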
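The node figures are internally consistent. Each child's match count is its size multiplied by its rate, and the two children add back up to the root. The minimal Python check below uses only the numbers on the slide, and the helper name `combine` is hypothetical. It also prints each node's lift (node rate divided by overall rate). Lift is a common way to read the node colours, but it is our addition rather than something the slide shows.

```python
# Node sizes and match rates read off the example tree (rates rounded to 0.1%).
children = {
    "#2": {"count": 20302, "match_rate": 0.502},
    "#3": {"count": 79698, "match_rate": 0.201},
}


def combine(nodes):
    """Return (count, match_rate) of the parent implied by its child nodes."""
    total = sum(node["count"] for node in nodes.values())
    matches = sum(node["count"] * node["match_rate"] for node in nodes.values())
    return total, matches / total


parent_count, parent_rate = combine(children)
print(parent_count, round(parent_rate, 3))  # 100000 0.262

for name, node in children.items():
    # Lift: the node's match rate as a multiple of the overall match rate.
    print(name, f"lift = {node['match_rate'] / parent_rate:.2f}")
```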


Decision tree build process

– Given an objective, Decision Tree Builder will find the most predictive split among all possible splits, with all analysis candidates, given the current binnings

– The population is then split into two segments based on this split
– The same method splits each of the two segments into two further segments
– This process continues until the tree is finished, as determined by the tree constraints
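Below is a minimal Python sketch of this greedy, recursive process. It rests on several assumptions:

- the objective is binary;
- the analysis candidates are numeric, and their current binnings are given as lists of bin edges;
- information gain is the split quality;
- two hypothetical tree constraints, `max_depth` and `min_node`, decide when the tree is finished.

It illustrates recursive partitioning in general. It does not show how Decision Tree Builder itself is implemented.

```python
import math
import random
from dataclasses import dataclass
from typing import Optional


def entropy(pos: int, n: int) -> float:
    """Binary entropy (bits) of a node with `pos` matches out of `n` records."""
    if n == 0 or pos == 0 or pos == n:
        return 0.0
    p = pos / n
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


@dataclass
class Node:
    count: int
    match_rate: float
    split: Optional[tuple] = None      # (field, threshold); left branch is value < threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def best_split(rows, target, binnings, min_node):
    """Most predictive split over every field and every bin edge (highest information gain)."""
    n = len(rows)
    pos = sum(r[target] for r in rows)
    best = None
    for fld, edges in binnings.items():
        for t in edges:
            left = [r for r in rows if r[fld] < t]
            nl, nr = len(left), n - len(left)
            if nl < min_node or nr < min_node:   # tree constraint: minimum node size
                continue
            pl = sum(r[target] for r in left)
            gain = entropy(pos, n) - nl / n * entropy(pl, nl) - nr / n * entropy(pos - pl, nr)
            if best is None or gain > best[0]:
                best = (gain, fld, t)
    return best


def grow(rows, target, binnings, depth=0, max_depth=3, min_node=50):
    """Recursively split the population until a tree constraint stops it."""
    node = Node(len(rows), sum(r[target] for r in rows) / len(rows))
    if depth >= max_depth:                       # tree constraint: maximum depth
        return node
    best = best_split(rows, target, binnings, min_node)
    if best is None or best[0] <= 0:
        return node
    _, fld, t = best
    node.split = (fld, t)
    node.left = grow([r for r in rows if r[fld] < t], target, binnings, depth + 1, max_depth, min_node)
    node.right = grow([r for r in rows if r[fld] >= t], target, binnings, depth + 1, max_depth, min_node)
    return node


# Synthetic data (hypothetical): response depends on age only.
rng = random.Random(0)
rows = []
for _ in range(2000):
    age = rng.randint(18, 65)
    income = rng.uniform(0, 100000)
    rate = 0.5 if age >= 40 else 0.2
    rows.append({"age": age, "income": income, "response": int(rng.random() < rate)})

binnings = {"age": [20, 30, 40, 50, 60], "income": [10000, 20000, 30000, 40000, 50000]}
tree = grow(rows, "response", binnings)
print(tree.split, tree.count, round(tree.match_rate, 3))
```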


Choice of a decision tree split

– Each possible split is assigned a quality value
– The splits are ranked:

  Split                      Quality value
  Income < 40000?            0.205
  MaritalStatus = Single?    0.201
  Income < 30000?            0.199
  …                          …
                             0.11

– The quality value depends on the tree type:
  – Binary outcome tree and classification tree: information gain
  – Regression tree: R²

Choice of a decision tree split (2)

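[Figure: quality values of candidate splits for the objective Response (level 1), shown at the bin boundaries of each analysis candidate under its current binning. The candidates are Age (18, 20, 30, 40, 50, 60, 65), Income (0, 10000, 20000, 30000, 40000, 50000, 100000), LoanAmount (0, 2000, 10000, 20000, 30000, 50000, 100000) and MaritalStatus (Single, Married, Widow, Misc.). Values range from about 0.10 to 0.205. The highest, 0.205, is the Income < 40000 split.]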
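Ranking every candidate split can be sketched in Python as follows. The sketch makes three assumptions. First, information gain is the quality value, as for a binary outcome tree. Second, numeric fields are split only at their current bin edges. Third, each categorical level is tested against all the others; this is a simplification, because a full search considers every grouping of levels. The data are synthetic, so the scores will not reproduce the slide's numbers. All function names are illustrative.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(k / n * math.log2(k / n) for k in Counter(labels).values())


def info_gain(labels, mask):
    """Information gain from splitting `labels` into mask==True / mask==False."""
    left = [y for y, m in zip(labels, mask) if m]
    right = [y for y, m in zip(labels, mask) if not m]
    if not left or not right:
        return 0.0
    n = len(labels)
    return entropy(labels) - len(left) / n * entropy(left) - len(right) / n * entropy(right)


def candidate_splits(rows, binnings, categoricals):
    """One candidate per bin edge of each numeric field, one per level of each categorical field."""
    for fld, edges in binnings.items():
        for t in edges:
            yield f"{fld} < {t}?", [r[fld] < t for r in rows]
    for fld in categoricals:
        for level in sorted({r[fld] for r in rows}):
            yield f"{fld} = {level}?", [r[fld] == level for r in rows]


def rank_splits(rows, target, binnings, categoricals):
    """All candidate splits, best first, scored by information gain."""
    labels = [r[target] for r in rows]
    scored = [(info_gain(labels, mask), name)
              for name, mask in candidate_splits(rows, binnings, categoricals)]
    return sorted(scored, reverse=True)


# Synthetic data (hypothetical): response depends on income only.
rng = random.Random(1)
rows = []
for _ in range(3000):
    income = rng.uniform(0, 100000)
    rows.append({
        "Age": rng.randint(18, 65),
        "Income": income,
        "MaritalStatus": rng.choice(["Single", "Married", "Widow"]),
        "Response": int(rng.random() < (0.4 if income < 40000 else 0.1)),
    })

ranking = rank_splits(
    rows, "Response",
    {"Age": [20, 30, 40, 50, 60], "Income": [10000, 20000, 30000, 40000, 50000]},
    ["MaritalStatus"],
)
for gain, name in ranking[:3]:
    print(f"{name:<25} {gain:.3f}")
```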


Splitting criterion

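– Information = $-\sum_{c=1}^{n} p(c)\,\log p(c)$

– The negative of the sum of proportion(c) × log(proportion(c)) over all n classes c

– Information gain of a split = information of the parent minus the size-weighted information of the two child segments

– Equivalent to the likelihood-ratio test for comparing the two populations created by the split (with natural logs, the G statistic equals 2N × information gain)

– Seeks to separate out classes, while minimising small nodes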
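The equivalence with the likelihood-ratio test can be checked numerically. For a split of N records, the G statistic equals 2N times the information gain when logs are natural. The Python sketch below uses approximate match counts for nodes #2 and #3 of the earlier example tree, reconstructed from its rounded percentages. The function names are illustrative.

```python
import math


def entropy_nats(counts):
    """-sum p(c) ln p(c) over the class counts of one population."""
    n = sum(counts)
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)


def information_gain(table):
    """table[s][c] = number of records of class c in segment s."""
    n = sum(map(sum, table))
    parent = [sum(col) for col in zip(*table)]
    weighted_children = sum(sum(row) / n * entropy_nats(row) for row in table)
    return entropy_nats(parent) - weighted_children


def g_statistic(table):
    """Likelihood-ratio (G) statistic comparing the segments' class distributions."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:
                expected = row_totals[i] * col_totals[j] / n
                g += observed * math.log(observed / expected)
    return 2 * g


# Approximate [matches, non-matches] for nodes #2 and #3 of the example tree,
# reconstructed from the rounded percentages on the slide.
table = [[10192, 10110], [16019, 63679]]
n = sum(map(sum, table))
ig = information_gain(table)
g = g_statistic(table)
print(f"information gain = {ig:.4f} nats, G = {g:.1f}, 2*N*IG = {2 * n * ig:.1f}")
```

Because N is fixed within a node, ranking that node's candidate splits by information gain gives the same order as ranking them by G.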


Is the decision tree any good (binary case)?

[Chart: the Gini “curve”. Records are sorted by predicted propensity. The curve plots the cumulative proportion of actual matches against the cumulative proportion of actual non-matches, each running from 0 to 1.]


Calculating the Gini value

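Gini = A / B × 100%

[Chart: the Gini “curve”, with areas A and B marked]

Here A is the area between the model's Gini curve and the diagonal. B is the corresponding area for a perfect model. A perfect model therefore scores 100%, and a model no better than random scores 0%.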
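Below is a sketch of the Gini calculation in Python. It assumes that the curve plots the cumulative proportion of actual matches (vertical) against the cumulative proportion of actual non-matches (horizontal), with records taken in descending order of predicted propensity. It also assumes that B is the area between the perfect model's curve and the diagonal, which is 0.5. Under those assumptions, Gini equals 2 × (area under the curve) - 1.

```python
from collections import defaultdict


def gini(scores, labels):
    """Gini = A / B for a binary objective (labels are 1 for a match, 0 otherwise).

    Assumed construction: take records in descending order of predicted
    propensity and trace the cumulative proportion of actual matches (y)
    against the cumulative proportion of actual non-matches (x).
    A = area between this curve and the diagonal; B = the same area for a
    perfect model, which is always 0.5.
    """
    n_match = sum(1 for label in labels if label)
    n_non = len(labels) - n_match
    # Records with equal scores are taken together, giving one straight step.
    groups = defaultdict(lambda: [0, 0])          # score -> [matches, non-matches]
    for score, label in zip(scores, labels):
        groups[score][0 if label else 1] += 1
    x = y = area = 0.0
    for score in sorted(groups, reverse=True):
        matches, non_matches = groups[score]
        x_next = x + non_matches / n_non
        y_next = y + matches / n_match
        area += (x_next - x) * (y + y_next) / 2   # trapezoid under the curve
        x, y = x_next, y_next
    a = area - 0.5                                # area between curve and diagonal
    b = 0.5                                       # same area for a perfect model
    return a / b


print(gini([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))  # 0.5
print(gini([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # 1.0 (perfect model)
print(gini([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # 0.0 (totally unpredictive)
```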


Gini “curves”

[Chart: Gini “curves” for a perfect model and for a totally unpredictive model]


Overfitting

[Chart: predictive power against complexity (relative to dataset size), comparing apparent and actual predictive power. The region where they diverge is marked as overfitting.]


Best Practice

– Derive a Training-Test field

– Group “too small” categories

– Reduce number of categories

– Watch number of responses per node

– (Watch confidence intervals of prediction)

– Auto-pruning
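One way to derive a Training-Test field is to hash a stable record ID, so that each record always lands in the same partition when the data is rebuilt. The Python sketch below takes that approach. The ID format, the 30% test share and the salt are assumptions for illustration, and the function name is hypothetical.

```python
import hashlib


def training_test_flag(customer_id: str, test_fraction: float = 0.3, salt: str = "tt-v1") -> str:
    """Assign a record to "Training" or "Test" deterministically from its ID.

    Hashing the ID (rather than drawing a fresh random number) means a record
    keeps the same flag every time the data is rebuilt. The 30% test fraction
    and the salt are illustrative choices.
    """
    digest = hashlib.sha256(f"{salt}:{customer_id}".encode("utf-8")).hexdigest()
    u = int(digest[:8], 16) / 2**32   # roughly uniform in [0, 1)
    return "Test" if u < test_fraction else "Training"


flags = [training_test_flag(f"CUST{i:06d}") for i in range(10000)]
print(flags.count("Test") / len(flags))
```

Build the tree on the Training records only. Then compare the predictive power on Test with the apparent power on Training; a large gap suggests overfitting.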




[Chart: “Confidence interval for 100 responses…”. The mean and the upper and lower limits are shown as a percentage of the mean (0% to 140%), for populations of 1000, 10,000 and 100,000.]


Upper and lower confidence intervals, 95% confidence. E.g. 50 responses out of N give a 95% confidence interval for the true (population) mean of roughly 75% to 130% of the observed (sample) mean.

[Chart: “Confidence intervals”. Upper and lower 95% limits are shown as a percentage of the observed mean, against number of responses (10 to 10,000, log scale). The upper limit falls from 162% at 10 responses to 102% at 10,000 responses.]
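Both charts can be approximated with the normal approximation to the binomial. Under that approximation, the relative half-width of a 95% interval is about 1.96 × √((1 - p) / r) for r responses at rate p. This is an assumed method, since the slides do not say how their intervals were computed. It does reproduce the 162% upper limit at 10 responses, and it gives roughly 72% to 128% for 50 responses, close to the quoted 75% to 130%. The function name is illustrative.

```python
import math


def relative_ci(responses: int, population: int, z: float = 1.96):
    """Approximate 95% confidence limits for a response rate, as multiples of the observed rate.

    Normal approximation to the binomial: SE(p) / p = sqrt((1 - p) / responses),
    where p = responses / population.
    """
    p = responses / population
    rel_se = math.sqrt((1 - p) / responses)
    return 1 - z * rel_se, 1 + z * rel_se


# Width depends mainly on the number of responses...
for r in (10, 50, 100, 1000, 10000):
    lo, hi = relative_ci(r, 10**9)
    print(f"{r:>6} responses: {lo:.0%} to {hi:.0%}")

# ...and hardly on population size, for a fixed 100 responses.
for n in (1000, 10000, 100000):
    lo, hi = relative_ci(100, n)
    print(f"100 of {n:>6}: {lo:.0%} to {hi:.0%}")
```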


What makes a good segment?

[Chart: response rate (0% to 12%) for nodes 1, 2 and 3, with the callouts “If this is the average…”, “Is this worth knowing?” and “Is this?”]
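Whether a segment is worth knowing depends on two things: its lift, and how sure we can be that it really differs from the average. The minimal Python sketch below uses the same normal approximation as for the confidence intervals. The two nodes and the 5% average are hypothetical. Both nodes have a lift of 1.2, but only the larger node has a 95% interval that excludes the average.

```python
import math


def segment_summary(responses: int, size: int, average_rate: float, z: float = 1.96):
    """Lift of a segment over the average, and whether its approx. 95% CI excludes the average."""
    rate = responses / size
    half_width = z * math.sqrt(rate * (1 - rate) / size)   # normal approximation
    lower, upper = rate - half_width, rate + half_width
    distinct = not (lower <= average_rate <= upper)
    return rate / average_rate, distinct


average = 0.05                                   # hypothetical overall response rate
for name, responses, size in [("A", 12, 200), ("B", 300, 5000)]:   # hypothetical nodes
    lift, distinct = segment_summary(responses, size, average)
    print(f"node {name}: lift {lift:.2f}, CI excludes average: {distinct}")
```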









Possible splits scale exponentially

[Chart: number of possible splits (log scale, 1 to 1,000,000) against number of categories (1 to 20)]
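A categorical analysis candidate with k categories can be divided into two groups in 2^(k-1) - 1 distinct ways. This is the exponential growth described on this slide, reaching over 500,000 possible splits at 20 categories. The Python sketch below counts and enumerates those splits. As one way to reduce the number of categories, it also groups categories with too few records into a single "Misc." level. The function names, the example categories and the minimum-count threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def n_binary_splits(k: int) -> int:
    """Number of ways to divide k categories into two non-empty groups."""
    return 2 ** (k - 1) - 1


def binary_splits(categories):
    """Yield every (left, right) two-way grouping of the categories exactly once."""
    cats = sorted(set(categories))
    first, rest = cats[0], cats[1:]
    # Keeping the first category on the left avoids counting mirror images twice.
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            left = {first, *combo}
            right = set(cats) - left
            if right:                      # both groups must be non-empty
                yield left, right


def group_small(values, min_count, other="Misc."):
    """Replace categories with fewer than `min_count` records by a single level."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


for k in (3, 10, 20):
    print(k, n_binary_splits(k))

marital = ["Single"] * 40 + ["Married"] * 55 + ["Widow"] * 3 + ["Divorced"] * 2  # hypothetical
grouped = group_small(marital, min_count=10)
print(sorted(Counter(grouped).items()))
print(len(list(binary_splits(marital))), len(list(binary_splits(grouped))))
```

Grouping the two rare levels cuts the candidate splits for this field from 7 to 3. It also avoids nodes built on a handful of records.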









Reporting on your model

– Audit the model you build

– Monitor future ‘through the door’ populations
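One common way to monitor a through-the-door population, not named in the deck, is the population stability index (PSI) over the tree's segments. It compares the share of new records falling into each node with the share at build time. The Python sketch below uses the node sizes from the example tree; the current counts are hypothetical and the function name is illustrative. A PSI of 0 means the distributions are identical, and larger values mean a bigger shift.

```python
import math


def population_stability_index(build_counts, current_counts, floor=1e-6):
    """PSI = sum over segments of (current% - build%) * ln(current% / build%)."""
    b_total = sum(build_counts.values())
    c_total = sum(current_counts.values())
    psi = 0.0
    for seg in set(build_counts) | set(current_counts):
        b = max(build_counts.get(seg, 0) / b_total, floor)   # floor avoids log(0)
        c = max(current_counts.get(seg, 0) / c_total, floor)
        psi += (c - b) * math.log(c / b)
    return psi


build = {"#2": 20302, "#3": 79698}     # node sizes from the example tree
current = {"#2": 3100, "#3": 6900}     # hypothetical through-the-door counts
print(f"PSI = {population_stability_index(build, current):.4f}")
```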


Where to find out more

– Quadstone System Support website:

http://support.quadstone.com/info/releases/#qs5.3

– Documentation
  – What’s new in the Quadstone System 5.3 release notes
  – Updated Quadstone System help (F1)
  – Updated Quadstone System data-build command and TML reference
  – Updated Data Build Manager reference
  – Updated Quadstone System administration reference
  – Customer-specific release notes

– Quadstone System Support
  – Web site: http://support.quadstone.com/
  – Email: [email protected]
  – Tel: US 1-800-335-3860; All +44 131 240 3140


Portrait Software Copyright 2008, www.portraitsoftware.com

Asia Pacific
Level 7, 15-17 Young Street
Sydney NSW 2000, Australia
F: +61 2 8004 9600

Questions?

EMEA (Headquarters)
The Smith Centre, The Fairmile
Henley-on-Thames, Oxfordshire, RG9 6AB, United Kingdom
T: +44 (0)1491 416600
F: +44 (0)1491 416601

The Americas
125 Summer Street, 16th Floor
Boston MA 02110, USA
T: +1 617 457-5200
F: +1 617 457-5299

