
1

Chapter 4: Predictive Modeling

4.1 Introduction to Predictive Modeling

4.2 Predictive Modeling Using Decision Trees

4.3 Predictive Modeling Using Logistic Regression

4.4 Churn Case Study

4.5 A Note about Model Management

4.6 Recommended Reading

2

Chapter 4: Predictive Modeling

4.1 Introduction to Predictive Modeling

4.2 Predictive Modeling Using Decision Trees

4.3 Predictive Modeling Using Logistic Regression

4.4 Churn Case Study

4.5 A Note about Model Management

4.6 Recommended Reading

3

Objectives

Explain the concepts of predictive modeling. Illustrate the modeling essentials of a predictive model. Explain the importance of data partitioning.

4

Catalog Case Study

Analysis Goal:

A mail-order catalog retailer wants to save money on mailing and increase revenue by targeting mailed catalogs to customers most likely to purchase in the future.

Data set: CATALOG2010

Number of rows: 48,356

Number of columns: 98

Contents: sales figures summarized across departments and quarterly totals for 5.5 years of sales

Targets: RESPOND (binary)

ORDERSIZE (continuous)

5

Where You’ve Been, Where You’re Going…

With basic descriptive modeling techniques (RFM), you identified customers who might be profitable. Sophisticated predictive modeling techniques can produce risk scores for current customers, profitable prospects from outside the customer database, cross-sell and up-sell lists, and much more.

Scoring techniques based on predictive models can be implemented in real-time data collection systems, automating the process of fact-based decision making.

6

Descriptive Modeling Tells You about Now

Descriptive statistics inform you about your sample. This information is important for reacting to things that have happened in the past.

[Diagram: Past Behavior → Fact-Based Reports → Current State of the Customer]

7

From Descriptive to Predictive Modeling

Predictive modeling techniques, paired with scoring and good model management, enable you to use your data about the past and the present to make good decisions for the future.

[Diagram: Past Behavior → Fact-Based Predictions]

8

Predictive Modeling Terminology

The observations in a training data set are known as training cases.

The variables are called inputs and targets.

inputs target

Training Data Set

9

Predictive Model

Predictive model: a concise representation of the input and target association

Training Data Set (inputs and target)

10

Predictive Model

predictions

Predictions: output of the predictive model given a set of input measurements

inputs

11

Modeling Essentials

Determine type of prediction.

Select useful inputs.

Optimize complexity.

12

Select useful inputs.

Optimize complexity.

Modeling Essentials

Determine type of prediction.

13

Three Prediction Types

rankings

estimates

decisions

[Diagram: inputs → prediction]

14

Decision Predictions

A predictive model uses input measurements to make the best decision for each case.

primary

secondary

secondary

primary

tertiary

inputs prediction

15

Ranking Predictions

A predictive model uses input measurements to optimally rank each case.

prediction

720

520

630

470

580

inputs

16

Estimate Predictions

A predictive model uses input measurements to optimally estimate the target value.

prediction

0.65

0.33

0.75

0.28

0.54

inputs

17

Idea Exchange

Think of two or three business problems that would require each of the three types of prediction. What would require a decision? How would you obtain information to help you in making a decision based on a model score?

What would require a ranking? How would you use this ranking information?

What would require an estimate? Would you estimate a continuous quantity, a count, a proportion, or some other quantity?

18

Select useful inputs.

Optimize complexity.

Modeling Essentials – Predict Review

Determine type of prediction. Decide, rank, and estimate.

19

Select useful inputs.

Determine type of prediction.

Optimize complexity.

Modeling Essentials

20

Input Reduction Strategies

[Diagrams: Redundancy — inputs x1 and x2 carry overlapping information; Irrelevancy — predictions (0.40 to 0.70) change with input x4 but not with input x3]

21

Input Reduction – Redundancy

[Diagram: inputs x1 and x2]

Input x2 has the same information as input x1.

Example: x1 is household income and x2 is home value.

22

Input Reduction – Irrelevancy

[Diagram: predictions (0.40 to 0.70) plotted against inputs x3 and x4]

Predictions change with input x4 but much less with input x3.

Example: Target is response to direct mail solicitation, x3 is religious affiliation, and x4 is response to previous solicitations.

23

Modeling Essentials – Select Review

Eradicate redundancies and irrelevancies.

Decide, rank, and estimate.

Select useful inputs.

Determine type of prediction.

Optimize complexity.

24

Select useful inputs.

Modeling Essentials

Determine type of prediction.

Optimize complexity.

25

Data Partitioning

Partition available data into training and validation sets.

The model is fit on the training data set, and model performance is evaluated on the validation data set.
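The partitioning step can be sketched outside SAS Enterprise Miner as well. Below is a minimal Python illustration (not part of the course software); the `catalog` data frame and its columns are hypothetical stand-ins for the CATALOG2010 inputs and the RESPOND target.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data frame with two inputs and the binary target RESPOND.
catalog = pd.DataFrame({
    "recency":   [1, 5, 2, 9, 3, 7, 4, 8],
    "frequency": [10, 2, 8, 1, 6, 2, 5, 1],
    "RESPOND":   [1, 0, 1, 0, 1, 0, 1, 0],
})

inputs = catalog.drop(columns="RESPOND")
target = catalog["RESPOND"]

# Hold out part of the data for validation; stratify so the target
# proportions stay similar in both partitions.
X_train, X_valid, y_train, y_valid = train_test_split(
    inputs, target, test_size=0.5, random_state=1, stratify=target)
```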

26

Predictive Model Sequence

Create a sequence of models with increasing complexity.

[Diagram: models 1–5 of increasing complexity, fit on the training data]

27

Model Performance Assessment

Rate model performance using validation data.

[Diagram: each model in the sequence receives a validation assessment]

28

Model Selection

Select the simplest model with the highest validation assessment.

[Diagram: model 3 is selected]

29

4.01 Multiple Choice Poll

The best model is the

a. simplest model with the best performance on the training data.

b. simplest model with the best performance on the validation data.

c. most complex model with the best performance on the training data.

d. most complex model with the best performance on the validation data.

30

4.01 Multiple Choice Poll – Correct Answer

The best model is the

a. simplest model with the best performance on the training data.

b. simplest model with the best performance on the validation data. (correct answer)

c. most complex model with the best performance on the training data.

d. most complex model with the best performance on the validation data.

31

Select useful inputs.

Modeling Essentials – Optimize Review

Determine type of prediction.

Optimize complexity.

Eradicate redundancies and irrelevancies.

Decide, rank, and estimate.

Tune models with validation data.

32

Chapter 4: Predictive Modeling

4.1 Introduction to Predictive Modeling

4.2 Predictive Modeling Using Decision Trees

4.3 Predictive Modeling Using Logistic Regression

4.4 Churn Case Study

4.5 A Note about Model Management

4.6 Recommended Reading

33

Objectives

Explain the concept of decision trees. Illustrate the modeling essentials of decision trees. Construct a decision tree predictive model in SAS Enterprise Miner.

34

Modeling Essentials – Decision Trees

Determine type of prediction.

Select useful inputs.

Optimize complexity.

35

Simple Prediction Illustration

[Scatter plot of the training data: input x2 versus input x1, with dots colored by target outcome]

Predict dot color for each x1 and x2.

36

Decision Tree Prediction Rules

[Diagram: the (x1, x2) input space partitioned by a decision tree. The root node splits on x2 at 0.63; for x2 < 0.63 an interior node splits on x1 at 0.52, and for x2 ≥ 0.63 an interior node splits on x1 at 0.51. Leaf nodes are labeled 55%, 60%, and 70%.]

37

Decision Tree Prediction Rules

[Diagram: the same tree; a new case is routed from the root node through an interior node to a leaf node]

Predict:

38

Decision Tree Prediction Rules

[Diagram: a case with x2 ≥ 0.63 and x1 < 0.51 reaches the leaf containing 70% primary outcomes; the other branch's leaves show 55%, 60%, and 40%]

Predict: Decision = primary outcome, Estimate = 0.70

39

Modeling Essentials – Decision Trees

Determine type of prediction: prediction rules.
Select useful inputs: split search.
Optimize complexity: pruning.

40

Decision Tree Split Search

[Plot: a candidate partition on x1 divides the training cases into left and right groups; a classification matrix tabulates target outcomes for each group]

Calculate the logworth of every partition on input x1.
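Logworth is based on a chi-square test of the split's classification matrix. The sketch below shows the idea in Python; it illustrates the statistic only and is not the exact SAS Enterprise Miner implementation (which also applies adjustments such as Bonferroni corrections). All data and names here are made up for illustration.

```python
import numpy as np
from scipy.stats import chi2_contingency

def logworth(x, y, cutoff):
    """Logworth of splitting cases at x < cutoff versus x >= cutoff.

    x: numeric input values, y: binary target (0/1).
    Logworth = -log10(p-value) of the chi-square test on the 2x2
    classification matrix (split side by target level).
    """
    left = x < cutoff
    table = np.array([
        [np.sum(left & (y == 0)), np.sum(left & (y == 1))],
        [np.sum(~left & (y == 0)), np.sum(~left & (y == 1))],
    ])
    _, p_value, _, _ = chi2_contingency(table, correction=False)
    return -np.log10(p_value)

# Evaluate every candidate cutoff on one input and keep the best one.
rng = np.random.default_rng(0)
x1 = rng.random(200)
y = (rng.random(200) < 0.3 + 0.4 * (x1 > 0.5)).astype(int)
cutoffs = np.unique(x1)[1:]          # split between consecutive values
best = max(cutoffs, key=lambda c: logworth(x1, y, c))
print(best, logworth(x1, y, best))
```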

41

Decision Tree Split Search

[Plot: the best partition on x1 is at x1 = 0.52, with max logworth(x1) = 0.95]

Select the partition with the maximum logworth.

42

Decision Tree Split Search

[Plot: the best x1 partition (max logworth 0.95) gives left and right groups with outcome proportions of 53%/47% and 42%/58%]

Repeat for input x2.

43

Decision Tree Split Search

[Plot: the best partition on x2 is at x2 = 0.63, with max logworth(x2) = 4.92, splitting the cases into bottom and top groups]

44

Decision Tree Split Search

Compare partition logworth ratings: max logworth(x2) = 4.92 versus max logworth(x1) = 0.95.

45

Decision Tree Split Search

Create a partition rule from the best partition across all inputs: x2 < 0.63 versus x2 ≥ 0.63.

46

Decision Tree Split Search

Repeat the process in each subset (x2 < 0.63 and x2 ≥ 0.63).

47

Decision Tree Split Search

[Plot: within one subset, the best partition on x1 is at x1 = 0.52, with max logworth(x1) = 5.72]

48

Decision Tree Split Search

[Plot: that x1 partition gives groups with outcome proportions of 61%/39% and 55%/45%; the best x2 partition in the subset is at 0.02, with max logworth(x2) = −2.01]

49

Decision Tree Split Search

[Plot: compare max logworth(x1) = 5.72 with max logworth(x2) = −2.01]

50

Decision Tree Split Search

[Plot: the x1 partition at 0.52 (max logworth 5.72) wins over the x2 partition (max logworth −2.01)]

51

Decision Tree Split Search

[Diagram: the tree now splits on x2 at 0.63 and then on x1 at 0.52 within one branch]

Create a second partition rule.

52

Decision Tree Split Search

Repeat to form a maximal tree.

[Plot: the input space partitioned into many small regions by the maximal tree]

53

4.02 Poll

The maximal tree is usually the tree that you use to score new data.

Yes

No

54

4.02 Poll – Correct Answer

The maximal tree is usually the tree that you use to score new data.

Yes

No (correct answer)

55

Modeling Essentials – Decision Trees

Determine type of prediction: prediction rules.
Select useful inputs: split search.
Optimize complexity.

56

Predictive Model Sequence

Create a sequence of models with increasing complexity.

[Diagram: trees of complexity 1–6, fit on the training data]

57

The Maximal Tree

A maximal tree is the most complex model in the sequence.

[Diagram: the maximal tree sits at the top of the complexity sequence]

58

The Maximal Tree

A maximal tree is the most complex model in the sequence.

[Diagram: the maximal tree and its simpler subtrees]

60

Pruning One Split

Each subtree's predictive performance is rated on validation data.

61

Pruning One Split

The subtree with the highest validation assessment is selected.

62

Pruning Two Splits

Similarly, this is done for subsequent models.

63

Pruning Two Splits

Prune two splits from the maximal tree,…

continued...

64

Pruning Two Splits

…rate each subtree using validation assessment, and…

continued...

65

Pruning Two Splits

…select the subtree with the best assessment rating.

66

Subsequent Pruning

Continue pruning until all subtrees are considered.

67

Selecting the Best Tree

Compare validation assessment between tree complexities.

68

Validation Assessment

Choose the simplest model with the highest validation assessment.

69

Validation Assessment

What are appropriate validation assessment ratings?

70

Assessment Statistics

Ratings depend on…
target measurement (binary, continuous, and so on)
prediction type (decisions, rankings, estimates)

71

Binary Targets

[Table: validation cases with inputs and a binary target; 1 indicates the primary outcome and 0 the secondary outcome]

72

Binary Target Predictions

[Table: for each case the model can output a decision (primary or secondary), a ranking (scores such as 720 and 520), or an estimate (such as 0.249)]

73

Decision Optimization

[Table: binary target values compared with decision predictions (primary or secondary)]

74

Decision Optimization – Accuracy

Maximize accuracy: agreement between outcome and prediction. A true positive is a primary outcome predicted as primary; a true negative is a secondary outcome predicted as secondary.

75

Decision Optimization – Misclassification

Minimize misclassification: disagreement between outcome and prediction. A false negative is a primary outcome predicted as secondary; a false positive is a secondary outcome predicted as primary.
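As a small illustration with hypothetical values (outside SAS Enterprise Miner), accuracy and misclassification can be computed directly from outcomes and decisions:

```python
import numpy as np

y_true     = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # validation targets
y_decision = np.array([1, 0, 0, 1, 0, 1, 1, 0])   # model decisions

accuracy = np.mean(y_decision == y_true)            # agreement
misclassification = np.mean(y_decision != y_true)   # disagreement
print(accuracy, misclassification)                  # 0.75 0.25
```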

76

Ranking Optimization

[Table: binary target values compared with ranking predictions (scores such as 720 and 520)]

77

Ranking Optimization – Concordance

Maximize concordance: proper ordering of primary and secondary outcomes (target = 0 cases receive low scores and target = 1 cases receive high scores).

78

Ranking Optimization – Discordance

Minimize discordance: improper ordering of primary and secondary outcomes (target = 0 cases receive high scores and target = 1 cases receive low scores).
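Concordance and discordance compare every primary/secondary pair of cases. A minimal Python sketch with hypothetical scores (an illustration of the idea, not the exact SAS computation):

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0])
scores = np.array([720, 520, 640, 480, 600])   # hypothetical ranking scores

# For every (primary, secondary) pair: concordant if the primary case
# outranks the secondary case, discordant if it ranks below it.
pos = scores[y_true == 1]
neg = scores[y_true == 0]
pairs = pos[:, None] - neg[None, :]
concordance = np.mean(pairs > 0)
discordance = np.mean(pairs < 0)
print(concordance, discordance)
```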

79

Estimate Optimization

[Table: binary target values compared with estimate predictions (such as 0.249)]

80

Estimate Optimization – Squared Error

Minimize squared error: the squared difference between target and prediction, (target − estimate)².
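A quick illustration with hypothetical estimates; the average squared error over the validation data is the usual summary:

```python
import numpy as np

y_true   = np.array([1, 0, 1, 1, 0])
estimate = np.array([0.80, 0.249, 0.65, 0.40, 0.10])  # predicted probabilities

average_squared_error = np.mean((y_true - estimate) ** 2)
print(average_squared_error)
```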

81

Complexity Optimization – Summary

decisions: accuracy / misclassification
rankings: concordance / discordance
estimates: squared error

82

4.03 Quiz

What are some target variables that you might encounter that would require optimizing on accuracy/misclassification? Concordance/discordance? Average squared error?

83

Statistical Graphs

ROC Curves

Gains and Lift Charts

84

Decision Matrix

              Predicted 0        Predicted 1
Actual 0      True Negative      False Positive
Actual 1      False Negative     True Positive

85

Sensitivity

Sensitivity = True Positives / Actual Positives: the proportion of actual positive cases that are predicted positive.

86

Positive Predicted Value

Positive predicted value = True Positives / Predicted Positives: the proportion of predicted positive cases that are actually positive.

87

Specificity

Specificity = True Negatives / Actual Negatives: the proportion of actual negative cases that are predicted negative.

88

Negative Predicted Value

Negative predicted value = True Negatives / Predicted Negatives: the proportion of predicted negative cases that are actually negative.
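All four rates come straight from the 2×2 decision matrix. A compact Python sketch with hypothetical predictions (illustrative only):

```python
import numpy as np

y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])

tp = np.sum((y_true == 1) & (y_pred == 1))
tn = np.sum((y_true == 0) & (y_pred == 0))
fp = np.sum((y_true == 0) & (y_pred == 1))
fn = np.sum((y_true == 1) & (y_pred == 0))

sensitivity = tp / (tp + fn)        # true positives / actual positives
specificity = tn / (tn + fp)        # true negatives / actual negatives
ppv = tp / (tp + fp)                # positive predicted value
npv = tn / (tn + fn)                # negative predicted value
print(sensitivity, specificity, ppv, npv)
```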

89

ROC Curve

90

Gains Chart

91

Catalog Case Study: Steps to Build a Decision Tree

1. Add the CATALOG2010 data source to the diagram.

2. Use the Data Partition node to split the data into training and validation data sets.

3. Use the Decision Tree node to select useful inputs.

4. Use the Model Comparison node to generate model assessment statistics and plots.

92

Constructing a Decision Tree Predictive Model

Catalog Case Study

Task: Construct a decision tree model.

93

Chapter 4: Predictive Modeling

4.1 Introduction to Predictive Modeling

4.2 Predictive Modeling Using Decision Trees

4.3 Predictive Modeling Using Logistic Regression

4.4 Churn Case Study

4.5 A Note about Model Management

4.6 Recommended Reading

94

Objectives

Explain the concepts of logistic regression. Discuss modeling strategies for building a predictive model. Fit a predictive logistic regression model in SAS Enterprise Miner.

95

Modeling Essentials – Regressions

Determine type of prediction.

Select useful inputs.

Optimize complexity.

97

Simple Linear Regression Model

Regression Best Fit Line

98

Linear Regression Prediction Formula

ŷ = β̂0 + β̂1·x1 + β̂2·x2

where ŷ is the prediction estimate, β̂0 is the intercept estimate, β̂1 and β̂2 are parameter estimates, and x1 and x2 are input measurements.

Choose the intercept and parameter estimates to minimize the squared error function

∑ (yi − ŷi)²

summed over the training data.
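A minimal sketch of fitting this formula by least squares in Python (illustrative only; in the course the models are fit in SAS Enterprise Miner, and the data here are simulated):

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.random(100)
x2 = rng.random(100)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.1, size=100)

# Design matrix with an intercept column; lstsq minimizes sum((y - yhat)^2).
X = np.column_stack([np.ones_like(x1), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat
print(beta_hat)                    # estimates of (beta0, beta1, beta2)
print(np.sum((y - y_hat) ** 2))    # squared error on the training data
```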

99

Binary Target

Linear regression does not work, because whatever the form of the equation, the results are generally unbounded.

Instead, you work with the probability p that the event will occur rather than a direct classification.

100

Odds Instead of Probability

Consider the probability p of an event (such as a horse losing a race) occurring. The probability of the event not occurring is 1 − p.

The odds of the event happening are p:(1 − p), although you more commonly express this as integers, such as a 19-to-1 long shot at the race track.

The ratio 19:1 means that the horse has one chance of winning for 19 chances of losing, or the probability of winning is 1/(19+1) = 5%.

odds = p / (1 − p)

101

Properties of Odds and Log Odds

Odds is not symmetric, varying from 0 to infinity.

Odds is 1 when the probability is 50%.

Log Odds is symmetric, going from minus infinity to positive infinity, like a line.

Log Odds is 0 when the probability is 50%.

It is highly negative for low probabilities and highly positive for high probabilities.

[Chart: Properties of Odds versus Log Odds — odds and log odds plotted against probability from 0% to 100%]

102

Logistic Regression Prediction Formula

log( p̂ / (1 − p̂) ) = β̂0 + β̂1·x1 + β̂2·x2

The left-hand side values are the logit scores.

103

Logit Link Function

The logit link function transforms probabilities (between 0 and 1) to logit scores (between −∞ and +∞):

logit( p̂ ) = log( p̂ / (1 − p̂) ) = β̂0 + β̂1·x1 + β̂2·x2

104

Logit Link Function

To obtain prediction estimates, the logit equation is solved for p̂:

logit( p̂ ) = log( p̂ / (1 − p̂) ) = β̂0 + β̂1·x1 + β̂2·x2

p̂ = 1 / (1 + e^(−logit( p̂ )))
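Converting between probabilities and logit scores is a one-line calculation; for example, in Python (illustrative):

```python
import numpy as np

def logit(p):
    """Probability -> logit score."""
    return np.log(p / (1.0 - p))

def inverse_logit(score):
    """Logit score -> probability."""
    return 1.0 / (1.0 + np.exp(-score))

p_hat = 0.249
score = logit(p_hat)
print(score)                   # about -1.10
print(inverse_logit(score))    # recovers 0.249
```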

105

4.04 Poll

Linear regression on a binary target is a problem because predictions can range outside of 0 and 1.

Yes

No

106

4.04 Poll – Correct Answer

Linear regression on a binary target is a problem because predictions can range outside of 0 and 1.

Yes (correct answer)

No

107

Simple Prediction Illustration – Regressions

[Scatter plot of the training data on inputs x1 and x2]

Predict dot color for each x1 and x2. This requires intercept and parameter estimates for

logit( p̂ ) = β̂0 + β̂1·x1 + β̂2·x2

108

Simple Prediction Illustration – Regressions

Find the parameter estimates by maximizing the log-likelihood function.

logit( p̂ ) = β̂0 + β̂1·x1 + β̂2·x2

109

Simple Prediction Illustration – Regressions

[Plot: contours of constant logit score (0.40, 0.50, 0.60, 0.70) over the input space]

Using the maximum likelihood estimates, the prediction formula assigns a logit score to each x1 and x2.
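A rough Python counterpart of this step (illustrative only; in the course the maximum likelihood fit is done by SAS Enterprise Miner, and the data below are simulated):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.random((500, 2))                        # inputs x1, x2 on [0, 1]
true_logit = -4.0 + 3.0 * X[:, 0] + 5.0 * X[:, 1]
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-true_logit))).astype(int)

# Very weak penalty (large C) so the fit is close to plain maximum likelihood.
model = LogisticRegression(C=1e6)
model.fit(X, y)
print(model.intercept_, model.coef_)            # estimates of beta0, (beta1, beta2)

# Logit score and probability estimate for a new case.
new_case = np.array([[0.3, 0.7]])
print(model.decision_function(new_case))        # logit score
print(model.predict_proba(new_case)[:, 1])      # estimated probability
```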

110

Regressions: Beyond the Prediction Formula

Manage missing values.

Interpret the model.

Account for nonlinearities.

Handle extreme or unusual values.

Use nonnumeric inputs.

111

Regressions: Beyond the Prediction Formula

Manage missing values.

Interpret the model.

Account for nonlinearities.

Handle extreme or unusual values.

Use nonnumeric inputs.

112

Missing Values and Regression Modeling

Problem 1: Training data cases with missing values on inputs used by a regression model are ignored.

113

Missing Values and Regression Modeling

Consequence: Missing values can significantly reduce your amount of training data for regression modeling!


114

Missing Values and the Prediction Formula

Predict: (x1, x2) = (0.3, ? )

Problem 2: Prediction formulas cannot score cases with missing values.

115

Missing Values and the Prediction Formula

Problem 2: Prediction formulas cannot score cases with missing values.

116

Missing Value Issues

Manage missing values.

Problem 2: Prediction formulas cannot score cases with missing values.

Problem 1: Training data cases with missing values on inputs used by a regression model are ignored.

117

Missing Value Causes

Manage missing values.

Non-applicable measurement

No match on merge

Non-disclosed measurement

118

Missing Value Remedies

Manage missing values.

xi = f(x1, …, xp)

Non-applicable measurement

No match on merge

Non-disclosed measurement
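A minimal sketch of the usual remedy: replace a missing input with a synthetic value such as the column mean (or a model-based value xi = f(x1, …, xp)). In Python with scikit-learn this might look like the following; it is an illustration, not the course's Impute node, and the values are made up.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[0.30, 1200.0],
              [0.70, np.nan],     # missing second input
              [np.nan, 950.0],    # missing first input
              [0.55, 1100.0]])

# Replace each missing entry with the column mean; a missing-value
# indicator column could also be added so missingness itself is an input.
imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)
print(X_filled)
```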

119

4.05 Poll

Observations with missing values should always be deleted from scoring because a predicted value cannot be determined.

Yes

No

120

4.05 Poll – Correct Answer

Observations with missing values should always be deleted from scoring because a predicted value cannot be determined.

Yes

No (correct answer)

121

Modeling Essentials – Regressions

Determine type of prediction: prediction formula.
Select useful inputs: sequential selection.
Optimize complexity: best model from sequence.

122

Variable Redundancy

123

[Diagram: inputs X1 through X10 grouped into clusters; one representative is kept from each cluster, for example X1, X3, X4, X6, X8, X9, and X10]

Variable Clustering

Inputs are selected by cluster representation, expert opinion, or target correlation.

124

Selection by 1 – R² Ratio

For input X2: R² with its own cluster = 0.90; R² with the next closest cluster = 0.01.

1 – R² ratio = (1 – R²own cluster) / (1 – R²next closest) = (1 – 0.90) / (1 – 0.01) = 0.101

125

Modeling Essentials – Regressions

Determine type of prediction: prediction formula.
Select useful inputs: sequential selection.
Optimize complexity: best model from sequence.

126

Sequential Selection – Forward

[Animation: inputs are added to the model one at a time; at each step the candidate input with the smallest p-value is entered, as long as that p-value falls below the entry cutoff]

132

Sequential Selection – Backward

[Animation: starting from the model with all inputs, inputs are removed one at a time; at each step the input with the largest p-value is dropped if that p-value exceeds the stay cutoff]

141

Sequential Selection – Stepwise

[Animation: inputs enter as in forward selection using the entry cutoff, but after each entry any input whose p-value rises above the stay cutoff can be removed]
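A bare-bones sketch of forward selection with an entry cutoff, written in Python with statsmodels. This is an illustrative pseudo-workflow, not the SAS Enterprise Miner Regression node, and the function name and data are made up.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_select(X, y, entry_cutoff=0.05):
    """Add inputs one at a time while the best candidate's p-value
    is below the entry cutoff (a simplified forward selection)."""
    selected, remaining = [], list(X.columns)
    while remaining:
        pvals = {}
        for col in remaining:
            design = sm.add_constant(X[selected + [col]])
            fit = sm.Logit(y, design).fit(disp=0)
            pvals[col] = fit.pvalues[col]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= entry_cutoff:
            break
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(2)
X = pd.DataFrame(rng.random((300, 4)), columns=["x1", "x2", "x3", "x4"])
y = (rng.random(300) < 1 / (1 + np.exp(-(-2 + 4 * X["x1"])))).astype(int)
print(forward_select(X, y))
```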

147

4.06 Poll

Different model selection methods often result in different candidate models. No one method is uniformly the best.

Yes

No

148

4.06 Poll – Correct Answer

Different model selection methods often result in different candidate models. No one method is uniformly the best.

Yes (correct answer)

No

149

Modeling Essentials – Regressions

Determine type of prediction: prediction formula.
Select useful inputs: variable clustering and selection.
Optimize complexity.

150

Model Fit versus Complexity

[Chart: model fit statistic versus model complexity (steps 1–6), for training and validation data]

151

Select Model with Optimal Validation Fit

[Chart: model fit statistic at each sequence step (1–6)]

Evaluate each sequence step.

152

Beyond the Prediction Formula

Manage missing values.

Interpret the model.

Account for nonlinearities.

Handle extreme or unusual values.

Use nonnumeric inputs.

153

Interpretation

A unit change in input x2 produces a β̂2 change in the logit, which corresponds to a 100·(exp(β̂2) − 1)% change in the odds.

154

Odds Ratio from a Logistic Regression Model

Estimated logistic regression model:

logit(p) = -.7567 + .4373*(gender)

Estimated odds ratio (females to males):

odds ratio = e^(-.7567 + .4373) / e^(-.7567) = e^.4373 = 1.55

An odds ratio of 1.55 means that females have 1.55 times the odds of having the outcome compared to males.
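The same arithmetic as a quick Python check, assuming (as the slide's ratio implies) that gender is coded 1 for females and 0 for males:

```python
import math

intercept, beta_gender = -0.7567, 0.4373

odds_female = math.exp(intercept + beta_gender)   # gender = 1 (females, assumed coding)
odds_male   = math.exp(intercept)                 # gender = 0 (males, assumed coding)
odds_ratio  = odds_female / odds_male             # equals exp(beta_gender)
print(round(odds_ratio, 2))                       # 1.55
```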

155

Properties of the Odds Ratio

[Diagram: the odds ratio scale. Below 1, the group in the denominator has higher odds of the event; above 1, the group in the numerator has higher odds; an odds ratio of 1 indicates no association.]

156

Beyond the Prediction Formula

Manage missing values.

Interpret the model.

Account for nonlinearities.

Handle extreme or unusual values.

Use nonnumeric inputs.

157

Extreme Distributions and Regressions

[Plots, original input scale: a skewed input distribution with high leverage points]

158

Extreme Distributions and Regressions

[Plots, original input scale: the skewed input and high leverage points, with the true association overlaid]

159

Extreme Distributions and Regressions

[Plots, original input scale: the standard regression fit compared with the true association]

160

Extreme Distributions and Regressions

[Plots: the original input scale versus a regularized scale with a more symmetric distribution]

161

Regularizing Input Transformations

[Plots: the standard regression fit on the original (skewed, high leverage) input scale and on the regularized, more symmetric scale]

162

Regularizing Input Transformations

[Plots: on the regularized scale, the regularized estimate compared with the true association and the standard regression fit]

163

Idea Exchange

What are examples of variables with unusual distributions that could produce problems in a regression model? Would you transform these variables? If so, what types of transformations would you entertain?

164

Beyond the Prediction Formula

Manage missing values.

Interpret the model.

Account for nonlinearities.

Handle extreme or unusual values.

Use nonnumeric inputs.

165

Nonnumeric Input Coding

Two-level variable:

Level   DA   DB
A       1    0
B       0    1

Coding redundancy: one indicator is enough, because DB = 1 − DA.

166

Nonnumeric Input Coding: Many Levels

[Table: a nine-level variable (A through I) coded with indicator variables DA through DI; each level has a 1 in its own indicator column and 0 in all others]

167

Coding Redundancy: Many Levels

[Table: the same indicator coding; with nine levels, one indicator (DI) is redundant because it is determined by the other eight]

168

Coding Consolidation

[Table: the full indicator coding for levels A through I, before consolidation]

169

Coding Consolidation

[Table: consolidated coding in which similar levels share indicators, for example DABCD for levels A through D, DEF for E and F, and DGH for G and H]
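In code, this kind of indicator coding is usually a single call; for example with pandas (illustrative, outside SAS Enterprise Miner). The level groupings used for the consolidated coding below follow the groupings suggested by the slide and are otherwise hypothetical.

```python
import pandas as pd

levels = pd.Series(list("ABCDEFGHI"), name="level")

# Full indicator coding: one dummy column per level.
full = pd.get_dummies(levels, prefix="D")

# Drop one redundant column (a reference level), as discussed above.
reduced = pd.get_dummies(levels, prefix="D", drop_first=True)

# Consolidated coding: map similar levels to shared groups first.
groups = levels.map({"A": "ABCD", "B": "ABCD", "C": "ABCD", "D": "ABCD",
                     "E": "EF", "F": "EF", "G": "GH", "H": "GH", "I": "I"})
consolidated = pd.get_dummies(groups, prefix="D")
print(consolidated)
```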

170

Beyond the Prediction Formula

Manage missing values.

Interpret the model.

Account for nonlinearities.

Handle extreme or unusual values.

Use nonnumeric inputs.

171

Standard Logistic Regression

log( p̂ / (1 − p̂) ) = ŵ0 + ŵ1·x1 + ŵ2·x2

[Scatter plot of the training data on inputs x1 and x2]

172

Polynomial Logistic Regression

log( p̂ / (1 − p̂) ) = ŵ0 + ŵ1·x1 + ŵ2·x2 + ŵ3·x1² + ŵ4·x2² + ŵ5·x1·x2

(quadratic and interaction terms)

[Plot: curved contours of constant logit score (0.30 to 0.80) over the input space]
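A short Python sketch of adding the quadratic and interaction terms before the logistic fit (illustrative only, not the SAS Enterprise Miner workflow; the data are simulated so that a curved boundary is needed):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(3)
X = rng.random((400, 2))                     # inputs x1, x2
# A target whose boundary is curved, so quadratic terms help.
y = ((X[:, 0] - 0.5) ** 2 + (X[:, 1] - 0.5) ** 2 < 0.1).astype(int)

# degree=2 adds x1^2, x2^2, and x1*x2 to the design matrix.
model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                      LogisticRegression(C=1e6))
model.fit(X, y)
print(model.score(X, y))                     # training accuracy
```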

173

Idea Exchange

What are some predictors that you can think of that would have a nonlinear relationship with a target? What do you think the functional form of the relationship is (for example, quadratic, exponential, …)?

174

Catalog Case Study

Analysis Goal:

A mail-order catalog retailer wants to save money on mailing and increase revenue by targeting mailed catalogs to customers most likely to purchase in the future.

Data set: CATALOG2010

Number of rows: 48,356

Number of columns: 98

Contents: sales figures summarized across departments and quarterly totals for 5.5 years of sales

Targets: RESPOND (binary)

ORDERSIZE (continuous)

175

Fitting a Logistic Regression Model

Catalog Case Study

Task: Build a logistic regression model in SAS Enterprise Miner.

176

Catalog Case Study: Steps to Build a Logistic Regression Model

1. Add the CATALOG2010 data source to the diagram.

2. Use the Data Partition node to split the data into training and validation data sets.

3. Use the Variable Clustering node to select relatively independent inputs.

4. Use the Regression node to select relevant inputs.

5. Use the Model Comparison node to generate model assessment statistics and plots.

In the previous example, you performed steps 1 and 2.

177

Chapter 4: Predictive Modeling

4.1 Introduction to Predictive Modeling

4.2 Predictive Modeling Using Decision Trees

4.3 Predictive Modeling Using Logistic Regression

4.4 Churn Case Study

4.5 A Note about Model Management

4.6 Recommended Reading

178

Objectives

Formulate an objective for predicting churn in a telecommunications example. Generate predictive models in SAS Enterprise Miner to predict churn. Score a customer database to target who is most likely to churn.

179

Telecommunications Company

Mobile (prepaid and postpaid) and fixed service provider. In recent years, a high percentage of high-revenue subscribers have churned. The company wants to target subscribers with a high churn probability for its customer retention program.

180

Churn Score

A churn propensity score measures the propensity for an active customer to churn. The score enables marketing managers to take proactive steps to retain targeted customers before churn occurs.

Churn scores are derived from analysis of the historical behavior of churned customers and existing customers who have not churned.

181

Possible Predictor Variables

Outstanding bill value
Outstanding balance period
Number of calls
Call duration (international, local, national calls)
Period as customer
Total dropped calls
Total failed calls

182

Model Implementation

inputs predictions

Predictions might be added to a data source inside or outside of SAS Enterprise Miner.

183

Churn Case Study

1. Examine the CHURN_TELECOM data set and add it to a diagram.
2. Partition the data into training and validation data sets.
3. Perform missing value imputation.
4. Recode nominal variables to combine class levels.
5. Reduce redundancy with variable clustering.
6. Reduce irrelevant inputs with a decision tree and a logistic regression. Compare results and select the final model based on validation error.
7. Score a data set to generate the list of churn risk customers.

184

Analyzing Churn Data

Churn Case Study

Task: Analyze churn data.

185

Chapter 4: Predictive Modeling

4.1 Introduction to Predictive Modeling

4.2 Predictive Modeling Using Decision Trees

4.3 Predictive Modeling Using Logistic Regression

4.4 Churn Case Study

4.5 A Note about Model Management

4.6 Recommended Reading

186

Objectives

Discuss the movement of analytics from the “back office” to the executive level and the reasons for these changes. Describe the three-way pull for model management. Explain why models must be maintained and reassessed over time.

187

Model Management and Business Analytics

Model management is the assessment, deployment, and continued modification of models. This is a critical business process: demonstrate that the model is well developed, verify that the model is working well, and perform outcomes analysis.

Model management requires a collaborative effort across the company: VP Decision Analysis and Support Group, Senior Modeling Analyst, Enterprise Architect, Internal Validation Compliance Analyst, Database Administrator.

188

Analytical Model Management Challenges

Proliferation of Data and Models
Largely Manual Processes Moving to Production
Increased Regulation (Sarbanes-Oxley, Basel II)
Actionable Inferences
Integrating with Operational Systems

189

Three-Way Pull for Model Management

Business Value
Governance Process
Production Process

190

Three-Way Pull for Model Management

Business Value: deployment of the “best” models; consistent model development and validation; understanding of model strategy and lifetime value.

Production Process: efficient deployment of models in a timely manner; effective deployment to minimize operational risk.

Governance Process: audit trails for compliance purposes; justification for management and shareholders.

191

Changes in the Analytical Landscape

[Diagram: Now, the stakeholders include analytical modelers, management, IT ops, data integrators, business, and governance. Operations such as customer service, retail, logistics, and promotions target customers, stockholders, suppliers, and employees.]

Model Management

As models proliferate, you need:

To be more diligent, but… there is not an established process to handle model deployment into production, model deployment is inefficient, and more individuals and groups in the organization must be involved in the process.

To be more vigilant, but… it is difficult to effectively manage existing models and track the model life cycle, and it is difficult to consistently provide appropriate internal and regulatory documentation.

193

Idea Exchange

How can you implement model management in your organization? Do you already have systems in place for continuous improvement and monitoring of models? For audit trails and compliance checks? Describe briefly how they operate.

194

Lessons Learned

Model management is a key part of good business analytics. Models should be evaluated before, during, and after deployment. New models replace old ones as dictated by the data over time.

195

Chapter 4: Predictive Modeling

4.1 Introduction to Predictive Modeling

4.2 Predictive Modeling Using Decision Trees

4.3 Predictive Modeling Using Logistic Regression

4.4 Churn Case Study

4.5 A Note about Model Management

4.6 Recommended Reading

196

Recommended Reading

Davenport, Thomas H., Jeanne G. Harris, and Robert Morison. 2010. Analytics at Work: Smarter Decisions, Better Results. Boston: Harvard Business Press. Chapters 7 and 8.

Chapters 7 and 8 focus on making analytics an integral part of a business. Systems, processes, and organizational culture must work together to move toward analytical leadership. The remaining three chapters of the book (9–11) are optional, self-study material.

197

Recommended Reading

May, Thornton. 2010. The New Know: Innovation Powered by Analytics. New York: Wiley. Chapter 1.

May’s book provides a counterpoint to the Davenport, et al. book, from the perspective of the role of analysts in the organization, and how organizations can make the best use of their analytical talent.

198

Recommended Reading

Morris, Michael. “Mining Student Data Could Save Lives.” The Chronicle of Higher Education. October 2, 2011. http://chronicle.com/article/Mining-Student-Data-Could-Save/129231/

This article discusses the mining of student data at colleges and universities to prevent large-scale acts of violence on campus. Mining of students’ data (including Internet usage and social networking data) would enhance the capacity of threat-assessment teams to protect the health and safety of the students.
