
1

Chapter 4: Predictive Modeling

4.1 Introduction to Predictive Modeling

4.2 Predictive Modeling Using Decision Trees

4.3 Predictive Modeling Using Logistic Regression

4.4 Churn Case Study

4.5 A Note about Model Management

4.6 Recommended Reading

2

Chapter 4: Predictive Modeling

4.1 Introduction to Predictive Modeling

4.2 Predictive Modeling Using Decision Trees

4.3 Predictive Modeling Using Logistic Regression

4.4 Churn Case Study

4.5 A Note about Model Management

4.6 Recommended Reading

3

Objectives

Explain the concepts of predictive modeling. Illustrate the modeling essentials of a predictive model. Explain the importance of data partitioning.

4

Catalog Case Study

Analysis Goal:

A mail-order catalog retailer wants to save money on mailing and increase revenue by targeting mailed catalogs to customers most likely to purchase in the future.

Data set: CATALOG2010

Number of rows: 48,356

Number of columns: 98

Contents: sales figures summarized across departments and quarterly totals for 5.5 years of sales

Targets: RESPOND (binary)

ORDERSIZE (continuous)

5

Where You’ve Been, Where You’re Going…

With basic descriptive modeling techniques (RFM), you identified customers who might be profitable. Sophisticated predictive modeling techniques can produce risk scores for current customers, profitable prospects from outside the customer database, cross-sell and up-sell lists, and much more.

Scoring techniques based on predictive models can be implemented in real-time data collection systems, automating the process of fact-based decision making.

6

Descriptive Modeling Tells You about Now

Descriptive statistics inform you about your sample. This information is important for reacting to things that have happened in the past.

[Diagram: Past Behavior → Fact-Based Reports → Current State of the Customer]

7

From Descriptive to Predictive Modeling

Predictive modeling techniques, paired with scoring and good model management, enable you to use your data about the past and the present to make good decisions for the future.

[Diagram: Past Behavior → Fact-Based Predictions]

8

Predictive Modeling Terminology

The observations in a training data set are known as training cases.

The variables are called inputs and targets.

inputs target

Training Data Set

9

Predictive Model

Predictive model: a concise representation of the input and target association

Training Data Set (inputs and target)

10

Predictive Model

predictions

Predictions: output of the predictive model given a set of input measurements

inputs

11

Modeling Essentials

Determine type of prediction.

Select useful inputs.

Optimize complexity.

12

Select useful inputs.

Optimize complexity.

Modeling Essentials

Determine type of prediction.

13

Three Prediction Types

rankings

estimates

decisions

[Diagram: inputs → prediction]

14

Decision Predictions

A predictive model uses input measurements to make the best decision for each case.

primary

secondary

secondary

primary

tertiary

inputs prediction

15

Ranking Predictions

A predictive model uses input measurements to optimally rank each case.

prediction

720

520

630

470

580

inputs

16

Estimate Predictions

A predictive model uses input measurements to optimally estimate the target value.

prediction

0.65

0.33

0.75

0.28

0.54

inputs

17

Idea Exchange

Think of two or three business problems that would require each of the three types of prediction. What would require a decision? How would you obtain information to help you in making a decision based on a model score?

What would require a ranking? How would you use this ranking information?

What would require an estimate? Would you estimate a continuous quantity, a count, a proportion, or some other quantity?

18

Select useful inputs.

Optimize complexity.

Modeling Essentials – Predict Review

Determine type of prediction. Decide, rank, and estimate.

19

Select useful inputs.

Determine type of prediction.

Optimize complexity.

Modeling Essentials

20

Input Reduction Strategies

[Diagrams: Redundancy — inputs x1 and x2 carry overlapping information; Irrelevancy — predictions (0.40 to 0.70) change with input x4 but not with input x3]

21

Input Reduction – Redundancy

[Diagram: inputs x1 and x2]

Input x2 has the same information as input x1.

Example: x1 is household income and x2 is home value.

22

Input Reduction – Irrelevancy

[Diagram: predictions (0.40 to 0.70) plotted against inputs x3 and x4]

Predictions change with input x4 but much less with input x3.

Example: Target is response to direct mail solicitation, x3 is religious affiliation, and x4 is response to previous solicitations.

23

Modeling Essentials – Select Review

Eradicate redundancies and irrelevancies.

Decide, rank, and estimate.

Select useful inputs.

Determine type of prediction.

Optimize complexity.

24

Select useful inputs.

Modeling Essentials

Determine type of prediction.

Optimize complexity.

25

Data Partitioning

Partition available data into training and validation sets.

The model is fit on the training data set, and model performance is evaluated on the validation data set.
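The partitioning step can be sketched outside SAS Enterprise Miner as well. Below is a minimal Python illustration (not part of the course software); the `catalog` data frame and its columns are hypothetical stand-ins for the CATALOG2010 inputs and the RESPOND target.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data frame with two inputs and the binary target RESPOND.
catalog = pd.DataFrame({
    "recency":   [1, 5, 2, 9, 3, 7, 4, 8],
    "frequency": [10, 2, 8, 1, 6, 2, 5, 1],
    "RESPOND":   [1, 0, 1, 0, 1, 0, 1, 0],
})

inputs = catalog.drop(columns="RESPOND")
target = catalog["RESPOND"]

# Hold out part of the data for validation; stratify so the target
# proportions stay similar in both partitions.
X_train, X_valid, y_train, y_valid = train_test_split(
    inputs, target, test_size=0.5, random_state=1, stratify=target)
```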

26

Predictive Model Sequence

Create a sequence of models with increasing complexity.

[Diagram: models 1–5 of increasing complexity, fit on the training data]

27

Model Performance Assessment

Rate model performance using validation data.

[Diagram: each model in the sequence receives a validation assessment]

28

Model Selection

Select the simplest model with the highest validation assessment.

[Diagram: model 3 is selected]

29

4.01 Multiple Choice Poll

The best model is the

a. simplest model with the best performance on the training data.

b. simplest model with the best performance on the validation data.

c. most complex model with the best performance on the training data.

d. most complex model with the best performance on the validation data.

30

4.01 Multiple Choice Poll – Correct Answer

The best model is the

a. simplest model with the best performance on the training data.

b. simplest model with the best performance on the validation data. (correct answer)

c. most complex model with the best performance on the training data.

d. most complex model with the best performance on the validation data.

31

Select useful inputs.

Modeling Essentials – Optimize Review

Determine type of prediction.

Optimize complexity.

Eradicate redundancies and irrelevancies.

Decide, rank, and estimate.

Tune models with validation data.

32

Chapter 4: Predictive Modeling

4.1 Introduction to Predictive Modeling

4.2 Predictive Modeling Using Decision Trees

4.3 Predictive Modeling Using Logistic Regression

4.4 Churn Case Study

4.5 A Note about Model Management

4.6 Recommended Reading

33

Objectives

Explain the concept of decision trees. Illustrate the modeling essentials of decision trees. Construct a decision tree predictive model in SAS Enterprise Miner.

34

Modeling Essentials – Decision Trees

Determine type of prediction.

Select useful inputs.

Optimize complexity.

35

Simple Prediction Illustration

[Scatter plot of the training data: input x2 versus input x1, with dots colored by target outcome]

Predict dot color for each x1 and x2.

36

Decision Tree Prediction Rules

[Diagram: the (x1, x2) input space partitioned by a decision tree. The root node splits on x2 at 0.63; for x2 < 0.63 an interior node splits on x1 at 0.52, and for x2 ≥ 0.63 an interior node splits on x1 at 0.51. Leaf nodes are labeled 55%, 60%, and 70%.]

37

Decision Tree Prediction Rules

[Diagram: the same tree; a new case is routed from the root node through an interior node to a leaf node]

Predict:

38

Decision Tree Prediction Rules

[Diagram: a case with x2 ≥ 0.63 and x1 < 0.51 reaches the leaf containing 70% primary outcomes; the other branch's leaves show 55%, 60%, and 40%]

Predict: Decision = primary outcome, Estimate = 0.70

39

Modeling Essentials – Decision Trees

Determine type of prediction: prediction rules.
Select useful inputs: split search.
Optimize complexity: pruning.

40

Decision Tree Split Search

[Plot: a candidate partition on x1 divides the training cases into left and right groups; a classification matrix tabulates target outcomes for each group]

Calculate the logworth of every partition on input x1.
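Logworth is based on a chi-square test of the split's classification matrix. The sketch below shows the idea in Python; it illustrates the statistic only and is not the exact SAS Enterprise Miner implementation (which also applies adjustments such as Bonferroni corrections). All data and names here are made up for illustration.

```python
import numpy as np
from scipy.stats import chi2_contingency

def logworth(x, y, cutoff):
    """Logworth of splitting cases at x < cutoff versus x >= cutoff.

    x: numeric input values, y: binary target (0/1).
    Logworth = -log10(p-value) of the chi-square test on the 2x2
    classification matrix (split side by target level).
    """
    left = x < cutoff
    table = np.array([
        [np.sum(left & (y == 0)), np.sum(left & (y == 1))],
        [np.sum(~left & (y == 0)), np.sum(~left & (y == 1))],
    ])
    _, p_value, _, _ = chi2_contingency(table, correction=False)
    return -np.log10(p_value)

# Evaluate every candidate cutoff on one input and keep the best one.
rng = np.random.default_rng(0)
x1 = rng.random(200)
y = (rng.random(200) < 0.3 + 0.4 * (x1 > 0.5)).astype(int)
cutoffs = np.unique(x1)[1:]          # split between consecutive values
best = max(cutoffs, key=lambda c: logworth(x1, y, c))
print(best, logworth(x1, y, best))
```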

41

Decision Tree Split Search

[Plot: the best partition on x1 is at x1 = 0.52, with max logworth(x1) = 0.95]

Select the partition with the maximum logworth.

42

Decision Tree Split Search

[Plot: the best x1 partition (max logworth 0.95) gives left and right groups with outcome proportions of 53%/47% and 42%/58%]

Repeat for input x2.

43

Decision Tree Split Search

[Plot: the best partition on x2 is at x2 = 0.63, with max logworth(x2) = 4.92, splitting the cases into bottom and top groups]

44

Decision Tree Split Search

Compare partition logworth ratings: max logworth(x2) = 4.92 versus max logworth(x1) = 0.95.

45

Decision Tree Split Search

Create a partition rule from the best partition across all inputs: x2 < 0.63 versus x2 ≥ 0.63.

46

Decision Tree Split Search

Repeat the process in each subset (x2 < 0.63 and x2 ≥ 0.63).

47

Decision Tree Split Search

[Plot: within one subset, the best partition on x1 is at x1 = 0.52, with max logworth(x1) = 5.72]

48

Decision Tree Split Search

[Plot: that x1 partition gives groups with outcome proportions of 61%/39% and 55%/45%; the best x2 partition in the subset is at 0.02, with max logworth(x2) = −2.01]

49

Decision Tree Split Search

[Plot: compare max logworth(x1) = 5.72 with max logworth(x2) = −2.01]

50

Decision Tree Split Search

[Plot: the x1 partition at 0.52 (max logworth 5.72) wins over the x2 partition (max logworth −2.01)]

51

Decision Tree Split Search

[Diagram: the tree now splits on x2 at 0.63 and then on x1 at 0.52 within one branch]

Create a second partition rule.

52

Decision Tree Split Search

Repeat to form a maximal tree.

[Plot: the input space partitioned into many small regions by the maximal tree]

53

4.02 Poll

The maximal tree is usually the tree that you use to score new data.

Yes

No

54

4.02 Poll – Correct Answer

The maximal tree is usually the tree that you use to score new data.

Yes

No (correct answer)

55

Modeling Essentials – Decision Trees

Determine type of prediction: prediction rules.
Select useful inputs: split search.
Optimize complexity.

56

Predictive Model Sequence

Create a sequence of models with increasing complexity.

[Diagram: trees of complexity 1–6, fit on the training data]

57

The Maximal Tree

A maximal tree is the most complex model in the sequence.

[Diagram: the maximal tree sits at the top of the complexity sequence]

58

The Maximal Tree

A maximal tree is the most complex model in the sequence.

[Diagram: the maximal tree and its simpler subtrees]

60

Pruning One Split

Each subtree's predictive performance is rated on validation data.

61

Pruning One Split

The subtree with the highest validation assessment is selected.

62

Pruning Two Splits

Similarly, this is done for subsequent models.

63

Pruning Two Splits

Prune two splits from the maximal tree,…

continued...

64

Pruning Two Splits

…rate each subtree using validation assessment, and…

continued...

65

Pruning Two Splits

…select the subtree with the best assessment rating.

66

Subsequent Pruning

Continue pruning until all subtrees are considered.

67

Selecting the Best Tree

Compare validation assessment between tree complexities.

68

Validation Assessment

Choose the simplest model with the highest validation assessment.

69

Validation Assessment

What are appropriate validation assessment ratings?

70

Assessment Statistics

Ratings depend on…
target measurement (binary, continuous, and so on)
prediction type (decisions, rankings, estimates)

71

Binary Targets

[Table: validation cases with inputs and a binary target; 1 indicates the primary outcome and 0 the secondary outcome]

72

Binary Target Predictions

[Table: for each case the model can output a decision (primary or secondary), a ranking (scores such as 720 and 520), or an estimate (such as 0.249)]

73

Decision Optimization

[Table: binary target values compared with decision predictions (primary or secondary)]

74

Decision Optimization – Accuracy

Maximize accuracy: agreement between outcome and prediction. A true positive is a primary outcome predicted as primary; a true negative is a secondary outcome predicted as secondary.

75

Decision Optimization – Misclassification

Minimize misclassification: disagreement between outcome and prediction. A false negative is a primary outcome predicted as secondary; a false positive is a secondary outcome predicted as primary.
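As a small illustration with hypothetical values (outside SAS Enterprise Miner), accuracy and misclassification can be computed directly from outcomes and decisions:

```python
import numpy as np

y_true     = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # validation targets
y_decision = np.array([1, 0, 0, 1, 0, 1, 1, 0])   # model decisions

accuracy = np.mean(y_decision == y_true)            # agreement
misclassification = np.mean(y_decision != y_true)   # disagreement
print(accuracy, misclassification)                  # 0.75 0.25
```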

76

Ranking Optimization

[Table: binary target values compared with ranking predictions (scores such as 720 and 520)]

77

Ranking Optimization – Concordance

Maximize concordance: proper ordering of primary and secondary outcomes (target = 0 cases receive low scores and target = 1 cases receive high scores).

78

Ranking Optimization – Discordance

Minimize discordance: improper ordering of primary and secondary outcomes (target = 0 cases receive high scores and target = 1 cases receive low scores).
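Concordance and discordance compare every primary/secondary pair of cases. A minimal Python sketch with hypothetical scores (an illustration of the idea, not the exact SAS computation):

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0])
scores = np.array([720, 520, 640, 480, 600])   # hypothetical ranking scores

# For every (primary, secondary) pair: concordant if the primary case
# outranks the secondary case, discordant if it ranks below it.
pos = scores[y_true == 1]
neg = scores[y_true == 0]
pairs = pos[:, None] - neg[None, :]
concordance = np.mean(pairs > 0)
discordance = np.mean(pairs < 0)
print(concordance, discordance)
```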

79

Estimate Optimization

[Table: binary target values compared with estimate predictions (such as 0.249)]

80

Estimate Optimization – Squared Error

Minimize squared error: the squared difference between target and prediction, (target − estimate)².
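A quick illustration with hypothetical estimates; the average squared error over the validation data is the usual summary:

```python
import numpy as np

y_true   = np.array([1, 0, 1, 1, 0])
estimate = np.array([0.80, 0.249, 0.65, 0.40, 0.10])  # predicted probabilities

average_squared_error = np.mean((y_true - estimate) ** 2)
print(average_squared_error)
```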

81

Complexity Optimization – Summary

decisions: accuracy / misclassification
rankings: concordance / discordance
estimates: squared error

82

4.03 Quiz

What are some target variables that you might encounter that would require optimizing on accuracy/misclassification? Concordance/discordance? Average squared error?

83

Statistical Graphs

ROC Curves

Gains and Lift Charts

84

Decision Matrix

              Predicted 0        Predicted 1
Actual 0      True Negative      False Positive
Actual 1      False Negative     True Positive

85

Sensitivity

Sensitivity = True Positives / Actual Positives: the proportion of actual positive cases that are predicted positive.

86

Positive Predicted Value

Positive predicted value = True Positives / Predicted Positives: the proportion of predicted positive cases that are actually positive.

87

Specificity

Specificity = True Negatives / Actual Negatives: the proportion of actual negative cases that are predicted negative.

88

Negative Predicted Value

Negative predicted value = True Negatives / Predicted Negatives: the proportion of predicted negative cases that are actually negative.
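All four rates come straight from the 2×2 decision matrix. A compact Python sketch with hypothetical predictions (illustrative only):

```python
import numpy as np

y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])

tp = np.sum((y_true == 1) & (y_pred == 1))
tn = np.sum((y_true == 0) & (y_pred == 0))
fp = np.sum((y_true == 0) & (y_pred == 1))
fn = np.sum((y_true == 1) & (y_pred == 0))

sensitivity = tp / (tp + fn)        # true positives / actual positives
specificity = tn / (tn + fp)        # true negatives / actual negatives
ppv = tp / (tp + fp)                # positive predicted value
npv = tn / (tn + fn)                # negative predicted value
print(sensitivity, specificity, ppv, npv)
```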

89

ROC Curve

90

Gains Chart

91

Catalog Case Study: Steps to Build a Decision Tree

1. Add the CATALOG2010 data source to the diagram.

2. Use the Data Partition node to split the data into training and validation data sets.

3. Use the Decision Tree node to select useful inputs.

4. Use the Model Comparison node to generate model assessment statistics and plots.

92

Constructing a Decision Tree Predictive Model

Catalog Case Study

Task: Construct a decision tree model.

93

Chapter 4: Predictive Modeling

4.1 Introduction to Predictive Modeling

4.2 Predictive Modeling Using Decision Trees

4.3 Predictive Modeling Using Logistic Regression

4.4 Churn Case Study

4.5 A Note about Model Management

4.6 Recommended Reading

94

Objectives

Explain the concepts of logistic regression. Discuss modeling strategies for building a predictive model. Fit a predictive logistic regression model in SAS Enterprise Miner.

95

Modeling Essentials – Regressions

Determine type of prediction.

Select useful inputs.

Optimize complexity.

97

Simple Linear Regression Model

Regression Best Fit Line

98

Linear Regression Prediction Formula

ŷ = β̂0 + β̂1·x1 + β̂2·x2

where ŷ is the prediction estimate, β̂0 is the intercept estimate, β̂1 and β̂2 are parameter estimates, and x1 and x2 are input measurements.

Choose the intercept and parameter estimates to minimize the squared error function

∑ (yi − ŷi)²

summed over the training data.
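A minimal sketch of fitting this formula by least squares in Python (illustrative only; in the course the models are fit in SAS Enterprise Miner, and the data here are simulated):

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.random(100)
x2 = rng.random(100)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.1, size=100)

# Design matrix with an intercept column; lstsq minimizes sum((y - yhat)^2).
X = np.column_stack([np.ones_like(x1), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat
print(beta_hat)                    # estimates of (beta0, beta1, beta2)
print(np.sum((y - y_hat) ** 2))    # squared error on the training data
```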

99

Binary Target

Linear regression does not work, because whatever the form of the equation, the results are generally unbounded.

Instead, you work with the probability p that the event will occur rather than a direct classification.

100

Odds Instead of Probability

Consider the probability p of an event (such as a horse losing a race) occurring. The probability of the event not occurring is 1 − p.

The odds of the event happening are p:(1 − p), although you more commonly express this as integers, such as a 19-to-1 long shot at the race track.

The ratio 19:1 means that the horse has one chance of winning for 19 chances of losing, or the probability of winning is 1/(19+1) = 5%.

odds = p / (1 − p)

101

Properties of Odds and Log Odds

Odds is not symmetric, varying from 0 to infinity.

Odds is 1 when the probability is 50%.

Log Odds is symmetric, going from minus infinity to positive infinity, like a line.

Log Odds is 0 when the probability is 50%.

It is highly negative for low probabilities and highly positive for high probabilities.

[Chart: Properties of Odds versus Log Odds — odds and log odds plotted against probability from 0% to 100%]

102

Logistic Regression Prediction Formula

log( p̂ / (1 − p̂) ) = β̂0 + β̂1·x1 + β̂2·x2

The left-hand side values are the logit scores.

103

Logit Link Function

The logit link function transforms probabilities (between 0 and 1) to logit scores (between −∞ and +∞):

logit( p̂ ) = log( p̂ / (1 − p̂) ) = β̂0 + β̂1·x1 + β̂2·x2

104

Logit Link Function

To obtain prediction estimates, the logit equation is solved for p̂:

logit( p̂ ) = log( p̂ / (1 − p̂) ) = β̂0 + β̂1·x1 + β̂2·x2

p̂ = 1 / (1 + e^(−logit( p̂ )))
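Converting between probabilities and logit scores is a one-line calculation; for example, in Python (illustrative):

```python
import numpy as np

def logit(p):
    """Probability -> logit score."""
    return np.log(p / (1.0 - p))

def inverse_logit(score):
    """Logit score -> probability."""
    return 1.0 / (1.0 + np.exp(-score))

p_hat = 0.249
score = logit(p_hat)
print(score)                   # about -1.10
print(inverse_logit(score))    # recovers 0.249
```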

105

4.04 Poll

Linear regression on a binary target is a problem because predictions can range outside of 0 and 1.

Yes

No

106

4.04 Poll – Correct Answer

Linear regression on a binary target is a problem because predictions can range outside of 0 and 1.

Yes (correct answer)

No

107

Simple Prediction Illustration – Regressions

[Scatter plot of the training data on inputs x1 and x2]

Predict dot color for each x1 and x2. This requires intercept and parameter estimates for

logit( p̂ ) = β̂0 + β̂1·x1 + β̂2·x2

108

Simple Prediction Illustration – Regressions

Find the parameter estimates by maximizing the log-likelihood function.

logit( p̂ ) = β̂0 + β̂1·x1 + β̂2·x2

109

Simple Prediction Illustration – Regressions

[Plot: contours of constant logit score (0.40, 0.50, 0.60, 0.70) over the input space]

Using the maximum likelihood estimates, the prediction formula assigns a logit score to each x1 and x2.
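A rough Python counterpart of this step (illustrative only; in the course the maximum likelihood fit is done by SAS Enterprise Miner, and the data below are simulated):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.random((500, 2))                        # inputs x1, x2 on [0, 1]
true_logit = -4.0 + 3.0 * X[:, 0] + 5.0 * X[:, 1]
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-true_logit))).astype(int)

# Very weak penalty (large C) so the fit is close to plain maximum likelihood.
model = LogisticRegression(C=1e6)
model.fit(X, y)
print(model.intercept_, model.coef_)            # estimates of beta0, (beta1, beta2)

# Logit score and probability estimate for a new case.
new_case = np.array([[0.3, 0.7]])
print(model.decision_function(new_case))        # logit score
print(model.predict_proba(new_case)[:, 1])      # estimated probability
```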

110

Regressions: Beyond the Prediction Formula

Manage missing values.

Interpret the model.

Account for nonlinearities.

Handle extreme or unusual values.

Use nonnumeric inputs.

111

Regressions: Beyond the Prediction Formula

Manage missing values.

Interpret the model.

Account for nonlinearities.

Handle extreme or unusual values.

Use nonnumeric inputs.

112

Missing Values and Regression Modeling

Problem 1: Training data cases with missing values on inputs used by a regression model are ignored.

113

Missing Values and Regression Modeling

Consequence: Missing values can significantly reduce your amount of training data for regression modeling!


114

Missing Values and the Prediction Formula

Predict: (x1, x2) = (0.3, ? )

Problem 2: Prediction formulas cannot score cases with missing values.

115

Missing Values and the Prediction Formula

Problem 2: Prediction formulas cannot score cases with missing values.

116

Missing Value Issues

Manage missing values.

Problem 2: Prediction formulas cannot score cases with missing values.

Problem 1: Training data cases with missing values on inputs used by a regression model are ignored.

117

Missing Value Causes

Manage missing values.

Non-applicable measurement

No match on merge

Non-disclosed measurement

118

Missing Value Remedies

Manage missing values.

xi = f(x1, …, xp)

Non-applicable measurement

No match on merge

Non-disclosed measurement
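A minimal sketch of the usual remedy: replace a missing input with a synthetic value such as the column mean (or a model-based value xi = f(x1, …, xp)). In Python with scikit-learn this might look like the following; it is an illustration, not the course's Impute node, and the values are made up.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[0.30, 1200.0],
              [0.70, np.nan],     # missing second input
              [np.nan, 950.0],    # missing first input
              [0.55, 1100.0]])

# Replace each missing entry with the column mean; a missing-value
# indicator column could also be added so missingness itself is an input.
imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)
print(X_filled)
```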

119

4.05 Poll

Observations with missing values should always be deleted from scoring because a predicted value cannot be determined.

Yes

No

120

4.05 Poll – Correct Answer

Observations with missing values should always be deleted from scoring because a predicted value cannot be determined.

Yes

No (correct answer)

121

Modeling Essentials – Regressions

Determine type of prediction: prediction formula.
Select useful inputs: sequential selection.
Optimize complexity: best model from sequence.

122

Variable Redundancy

123

[Diagram: inputs X1 through X10 grouped into clusters; one representative is kept from each cluster, for example X1, X3, X4, X6, X8, X9, and X10]

Variable Clustering

Inputs are selected by cluster representation, expert opinion, or target correlation.

124

Selection by 1 – R² Ratio

For input X2: R² with its own cluster = 0.90; R² with the next closest cluster = 0.01.

1 – R² ratio = (1 – R²own cluster) / (1 – R²next closest) = (1 – 0.90) / (1 – 0.01) = 0.101

125

Modeling Essentials – Regressions

Determine type of prediction: prediction formula.
Select useful inputs: sequential selection.
Optimize complexity: best model from sequence.

126

Sequential Selection – Forward

[Animation: inputs are added to the model one at a time; at each step the candidate input with the smallest p-value is entered, as long as that p-value falls below the entry cutoff]

132

Sequential Selection – Backward

[Animation: starting from the model with all inputs, inputs are removed one at a time; at each step the input with the largest p-value is dropped if that p-value exceeds the stay cutoff]

141

Sequential Selection – Stepwise

[Animation: inputs enter as in forward selection using the entry cutoff, but after each entry any input whose p-value rises above the stay cutoff can be removed]
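A bare-bones sketch of forward selection with an entry cutoff, written in Python with statsmodels. This is an illustrative pseudo-workflow, not the SAS Enterprise Miner Regression node, and the function name and data are made up.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_select(X, y, entry_cutoff=0.05):
    """Add inputs one at a time while the best candidate's p-value
    is below the entry cutoff (a simplified forward selection)."""
    selected, remaining = [], list(X.columns)
    while remaining:
        pvals = {}
        for col in remaining:
            design = sm.add_constant(X[selected + [col]])
            fit = sm.Logit(y, design).fit(disp=0)
            pvals[col] = fit.pvalues[col]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= entry_cutoff:
            break
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(2)
X = pd.DataFrame(rng.random((300, 4)), columns=["x1", "x2", "x3", "x4"])
y = (rng.random(300) < 1 / (1 + np.exp(-(-2 + 4 * X["x1"])))).astype(int)
print(forward_select(X, y))
```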

147

4.06 Poll

Different model selection methods often result in different candidate models. No one method is uniformly the best.

Yes

No

148

4.06 Poll – Correct Answer

Different model selection methods often result in different candidate models. No one method is uniformly the best.

Yes (correct answer)

No

149

Modeling Essentials – Regressions

Determine type of prediction: prediction formula.
Select useful inputs: variable clustering and selection.
Optimize complexity.

150

Model Fit versus Complexity

[Chart: model fit statistic versus model complexity (steps 1–6), for training and validation data]

151

Select Model with Optimal Validation Fit

[Chart: model fit statistic at each sequence step (1–6)]

Evaluate each sequence step.

152

Beyond the Prediction Formula

Manage missing values.

Interpret the model.

Account for nonlinearities.

Handle extreme or unusual values.

Use nonnumeric inputs.

153

Interpretation

A unit change in input x2 produces a β̂2 change in the logit, which corresponds to a 100·(exp(β̂2) − 1)% change in the odds.

154

Odds Ratio from a Logistic Regression Model

Estimated logistic regression model:

logit(p) = -.7567 + .4373*(gender)

Estimated odds ratio (females to males):

odds ratio = e^(-.7567 + .4373) / e^(-.7567) = e^.4373 = 1.55

An odds ratio of 1.55 means that females have 1.55 times the odds of having the outcome compared to males.
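The same arithmetic as a quick Python check, assuming (as the slide's ratio implies) that gender is coded 1 for females and 0 for males:

```python
import math

intercept, beta_gender = -0.7567, 0.4373

odds_female = math.exp(intercept + beta_gender)   # gender = 1 (females, assumed coding)
odds_male   = math.exp(intercept)                 # gender = 0 (males, assumed coding)
odds_ratio  = odds_female / odds_male             # equals exp(beta_gender)
print(round(odds_ratio, 2))                       # 1.55
```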

155

Properties of the Odds Ratio

[Diagram: the odds ratio scale. Below 1, the group in the denominator has higher odds of the event; above 1, the group in the numerator has higher odds; an odds ratio of 1 indicates no association.]

156

Beyond the Prediction Formula

Manage missing values.

Interpret the model.

Account for nonlinearities.

Handle extreme or unusual values.

Use nonnumeric inputs.

157

Extreme Distributions and Regressions

[Plots, original input scale: a skewed input distribution with high leverage points]

158

Extreme Distributions and Regressions

[Plots, original input scale: the skewed input and high leverage points, with the true association overlaid]

159

Extreme Distributions and Regressions

[Plots, original input scale: the standard regression fit compared with the true association]

160

Extreme Distributions and Regressions

[Plots: the original input scale versus a regularized scale with a more symmetric distribution]

161

Regularizing Input Transformations

[Plots: the standard regression fit on the original (skewed, high leverage) input scale and on the regularized, more symmetric scale]

162

Regularizing Input Transformations

[Plots: on the regularized scale, the regularized estimate compared with the true association and the standard regression fit]

163

Idea Exchange

What are examples of variables with unusual distributions that could produce problems in a regression model? Would you transform these variables? If so, what types of transformations would you entertain?

164

Beyond the Prediction Formula

Manage missing values.

Interpret the model.

Account for nonlinearities.

Handle extreme or unusual values.

Use nonnumeric inputs.

165

Nonnumeric Input Coding

Two-level variable:

Level   DA   DB
A       1    0
B       0    1

Coding redundancy: one indicator is enough, because DB = 1 − DA.

166

Nonnumeric Input Coding: Many Levels

[Table: a nine-level variable (A through I) coded with indicator variables DA through DI; each level has a 1 in its own indicator column and 0 in all others]

167

Coding Redundancy: Many Levels

[Table: the same indicator coding; with nine levels, one indicator (DI) is redundant because it is determined by the other eight]

168

Coding Consolidation

[Table: the full indicator coding for levels A through I, before consolidation]

169

Coding Consolidation

[Table: consolidated coding in which similar levels share indicators, for example DABCD for levels A through D, DEF for E and F, and DGH for G and H]
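In code, this kind of indicator coding is usually a single call; for example with pandas (illustrative, outside SAS Enterprise Miner). The level groupings used for the consolidated coding below follow the groupings suggested by the slide and are otherwise hypothetical.

```python
import pandas as pd

levels = pd.Series(list("ABCDEFGHI"), name="level")

# Full indicator coding: one dummy column per level.
full = pd.get_dummies(levels, prefix="D")

# Drop one redundant column (a reference level), as discussed above.
reduced = pd.get_dummies(levels, prefix="D", drop_first=True)

# Consolidated coding: map similar levels to shared groups first.
groups = levels.map({"A": "ABCD", "B": "ABCD", "C": "ABCD", "D": "ABCD",
                     "E": "EF", "F": "EF", "G": "GH", "H": "GH", "I": "I"})
consolidated = pd.get_dummies(groups, prefix="D")
print(consolidated)
```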

170

Beyond the Prediction Formula

Manage missing values.

Interpret the model.

Account for nonlinearities.

Handle extreme or unusual values.

Use nonnumeric inputs.

171

Standard Logistic Regression

log( p̂ / (1 − p̂) ) = ŵ0 + ŵ1·x1 + ŵ2·x2

[Scatter plot of the training data on inputs x1 and x2]

172

Polynomial Logistic Regression

log( p̂ / (1 − p̂) ) = ŵ0 + ŵ1·x1 + ŵ2·x2 + ŵ3·x1² + ŵ4·x2² + ŵ5·x1·x2

(quadratic and interaction terms)

[Plot: curved contours of constant logit score (0.30 to 0.80) over the input space]
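A short Python sketch of adding the quadratic and interaction terms before the logistic fit (illustrative only, not the SAS Enterprise Miner workflow; the data are simulated so that a curved boundary is needed):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(3)
X = rng.random((400, 2))                     # inputs x1, x2
# A target whose boundary is curved, so quadratic terms help.
y = ((X[:, 0] - 0.5) ** 2 + (X[:, 1] - 0.5) ** 2 < 0.1).astype(int)

# degree=2 adds x1^2, x2^2, and x1*x2 to the design matrix.
model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                      LogisticRegression(C=1e6))
model.fit(X, y)
print(model.score(X, y))                     # training accuracy
```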

173

Idea Exchange

What are some predictors that you can think of that would have a nonlinear relationship with a target? What do you think the functional form of the relationship is (for example, quadratic, exponential, …)?

174

Catalog Case Study

Analysis Goal:

A mail-order catalog retailer wants to save money on mailing and increase revenue by targeting mailed catalogs to customers most likely to purchase in the future.

Data set: CATALOG2010

Number of rows: 48,356

Number of columns: 98

Contents: sales figures summarized across departments and quarterly totals for 5.5 years of sales

Targets: RESPOND (binary)

ORDERSIZE (continuous)

175

Fitting a Logistic Regression Model

Catalog Case Study

Task: Build a logistic regression model in SAS Enterprise Miner.

176

Catalog Case Study: Steps to Build a Logistic Regression Model

1. Add the CATALOG2010 data source to the diagram.

2. Use the Data Partition node to split the data into training and validation data sets.

3. Use the Variable Clustering node to select relatively independent inputs.

4. Use the Regression node to select relevant inputs.

5. Use the Model Comparison node to generate model assessment statistics and plots.

In the previous example, you performed steps 1 and 2.

177

Chapter 4: Predictive Modeling

4.1 Introduction to Predictive Modeling

4.2 Predictive Modeling Using Decision Trees

4.3 Predictive Modeling Using Logistic Regression

4.4 Churn Case Study

4.5 A Note about Model Management

4.6 Recommended Reading

178

Objectives

Formulate an objective for predicting churn in a telecommunications example. Generate predictive models in SAS Enterprise Miner to predict churn. Score a customer database to target who is most likely to churn.

179

Telecommunications Company

Mobile (prepaid and postpaid) and fixed service provider. In recent years, a high percentage of high-revenue subscribers have churned. The company wants to target subscribers with a high churn probability for its customer retention program.

180

Churn Score

A churn propensity score measures the propensity for an active customer to churn. The score enables marketing managers to take proactive steps to retain targeted customers before churn occurs.

Churn scores are derived from analysis of the historical behavior of churned customers and existing customers who have not churned.

181

Possible Predictor Variables

Outstanding bill value
Outstanding balance period
Number of calls
Call duration (international, local, national calls)
Period as customer
Total dropped calls
Total failed calls

182

Model Implementation

inputs predictions

Predictions might be added to a data source inside or outside of SAS Enterprise Miner.

183

Churn Case Study

1. Examine the CHURN_TELECOM data set and add it to a diagram.
2. Partition the data into training and validation data sets.
3. Perform missing value imputation.
4. Recode nominal variables to combine class levels.
5. Reduce redundancy with variable clustering.
6. Reduce irrelevant inputs with a decision tree and a logistic regression. Compare results and select the final model based on validation error.
7. Score a data set to generate the list of churn risk customers.

184

Analyzing Churn Data

Churn Case Study

Task: Analyze churn data.

185

Chapter 4: Predictive Modeling

4.1 Introduction to Predictive Modeling

4.2 Predictive Modeling Using Decision Trees

4.3 Predictive Modeling Using Logistic Regression

4.4 Churn Case Study

4.5 A Note about Model Management

4.6 Recommended Reading

186

Objectives

Discuss the movement of analytics from the “back office” to the executive level and the reasons for these changes. Describe the three-way pull for model management. Explain why models must be maintained and reassessed over time.

187

Model Management and Business Analytics

Model management is the assessment, deployment, and continued modification of models. This is a critical business process: demonstrate that the model is well developed, verify that the model is working well, and perform outcomes analysis.

Model management requires a collaborative effort across the company: VP Decision Analysis and Support Group, Senior Modeling Analyst, Enterprise Architect, Internal Validation Compliance Analyst, Database Administrator.

188

Analytical Model Management Challenges

Proliferation of Data and Models
Largely Manual Processes Moving to Production
Increased Regulation (Sarbanes-Oxley, Basel II)
Actionable Inferences
Integrating with Operational Systems

189

Three-Way Pull for Model Management

Business Value
Governance Process
Production Process

190

Three-Way Pull for Model Management

Business Value: deployment of the “best” models; consistent model development and validation; understanding of model strategy and lifetime value.

Production Process: efficient deployment of models in a timely manner; effective deployment to minimize operational risk.

Governance Process: audit trails for compliance purposes; justification for management and shareholders.

191

Changes in the Analytical Landscape

[Diagram: Now, the stakeholders include analytical modelers, management, IT ops, data integrators, business, and governance. Operations such as customer service, retail, logistics, and promotions target customers, stockholders, suppliers, and employees.]

Model Management

As models proliferate, you need:

To be more diligent, but… there is not an established process to handle model deployment into production, model deployment is inefficient, and more individuals and groups in the organization must be involved in the process.

To be more vigilant, but… it is difficult to effectively manage existing models and track the model life cycle, and it is difficult to consistently provide appropriate internal and regulatory documentation.

193

Idea Exchange

How can you implement model management in your organization? Do you already have systems in place for continuous improvement and monitoring of models? For audit trails and compliance checks? Describe briefly how they operate.

194

Lessons Learned

Model management is a key part of good business analytics. Models should be evaluated before, during, and after deployment. New models replace old ones as dictated by the data over time.

195

Chapter 4: Predictive Modeling

4.1 Introduction to Predictive Modeling

4.2 Predictive Modeling Using Decision Trees

4.3 Predictive Modeling Using Logistic Regression

4.4 Churn Case Study

4.5 A Note about Model Management

4.6 Recommended Reading

196

Recommended Reading

Davenport, Thomas H., Jeanne G. Harris, and Robert Morison. 2010. Analytics at Work: Smarter Decisions, Better Results. Boston: Harvard Business Press. Chapters 7 and 8.

Chapters 7 and 8 focus on making analytics an integral part of a business. Systems, processes, and organizational culture must work together to move toward analytical leadership. The remaining three chapters of the book (9–11) are optional, self-study material.

197

Recommended Reading

May, Thornton. 2010. The New Know: Innovation Powered by Analytics. New York: Wiley. Chapter 1.

May’s book provides a counterpoint to the Davenport, et al. book, from the perspective of the role of analysts in the organization, and how organizations can make the best use of their analytical talent.

198

Recommended Reading

Morris, Michael. “Mining Student Data Could Save Lives.” The Chronicle of Higher Education. October 2, 2011. http://chronicle.com/article/Mining-Student-Data-Could-Save/129231/

This article discusses the mining of student data at colleges and universities to prevent large-scale acts of violence on campus. Mining of students’ data (including Internet usage and social networking data) would enhance the capacity of threat-assessment teams to protect the health and safety of the students.
