Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Trim Size: 6in x 9in Verbeke ffirs.tex V1 - 09/01/2017 7:42am Page viii�
� �
�
Trim Size: 6in x 9in Verbeke ffirs.tex V1 - 09/01/2017 7:42am Page i�
� �
�
Profit-DrivenBusiness Analytics
Trim Size: 6in x 9in Verbeke ffirs.tex V1 - 09/01/2017 7:42am Page ii�
� �
�
Wiley & SAS BusinessSeries
The Wiley & SAS Business Series presents books that help senior-level
managers with their critical management decisions.
Titles in the Wiley & SAS Business Series include:
Analytics: The Agile Way by Phil Simon
Analytics in a Big Data World: The Essential Guide to Data Science and its
Applications by Bart Baesens
A Practical Guide to Analytics for Governments: Using Big Data for Good
by Marie Lowman
Bank Fraud: Using Technology to Combat Losses by Revathi Subrama-
nian
Big Data Analytics: Turning Big Data into Big Money by Frank Ohlhorst
Big Data, Big Innovation: Enabling Competitive Differentiation through
Business Analytics by Evan Stubbs
Business Analytics for Customer Intelligence by Gert Laursen
Business Intelligence Applied: Implementing an Effective Information and
Communications Technology Infrastructure by Michael Gendron
Business Intelligence and the Cloud: Strategic Implementation Guide by
Michael S. Gendron
Business Transformation: A Roadmap for Maximizing Organizational
Insights by Aiman Zeid
Connecting Organizational Silos: Taking Knowledge Flow Management to
the Next Level with Social Media by Frank Leistner
Data-Driven Healthcare: How Analytics and BI are Transforming the
Industry by Laura Madsen
Delivering Business Analytics: Practical Guidelines for Best Practice by
Evan Stubbs
Demand-Driven Forecasting: A Structured Approach to Forecasting, Second
Edition by Charles Chase
Demand-Driven Inventory Optimization and Replenishment: Creating a
More Efficient Supply Chain by Robert A. Davis
Trim Size: 6in x 9in Verbeke ffirs.tex V1 - 09/01/2017 7:42am Page iii�
� �
�
Developing Human Capital: Using Analytics to Plan and Optimize Your
Learning and Development Investments by Gene Pease, Barbara Beres-
ford, and Lew Walker
The Executive’s Guide to Enterprise Social Media Strategy: How Social Net-
works Are Radically Transforming Your Business by David Thomas and
Mike Barlow
Economic and Business Forecasting: Analyzing and Interpreting Economet-
ric Results by John Silvia, Azhar Iqbal, Kaylyn Swankoski, Sarah
Watt, and Sam Bullard
Economic Modeling in the Post Great Recession Era: Incomplete Data,
Imperfect Markets by John Silvia, Azhar Iqbal, and Sarah Watt
House
Enhance Oil & Gas Exploration with Data-Driven Geophysical and Petro-
physical Models by Keith Holdaway and Duncan Irving
Foreign Currency Financial Reporting from Euros to Yen to Yuan: A Guide
to Fundamental Concepts and Practical Applications by Robert Rowan
Fraud Analytics Using Descriptive, Predictive, and Social Network Tech-
niques: A Guide to Data Science for Fraud Detection by Bart Baesens,
Veronique Van Vlasselaer, and Wouter Verbeke
Harness Oil and Gas Big Data with Analytics: Optimize Exploration and
Production with Data-Driven Models by Keith Holdaway
Health Analytics: Gaining the Insights to Transform Health Care by Jason
Burke
Heuristics in Analytics: A Practical Perspective of What Influences Our
Analytical World by Carlos Andre Reis Pinheiro and Fiona McNeill
Human Capital Analytics: How to Harness the Potential of Your Organi-
zation’s Greatest Asset by Gene Pease, Boyce Byerly, and Jac Fitz-enz
Implement, Improve and Expand Your Statewide Longitudinal Data Sys-
tem: Creating a Culture of Data in Education by Jamie McQuiggan and
Armistead Sapp
Intelligent Credit Scoring: Building and Implementing Better Credit Risk
Scorecards, Second Edition, by Naeem Siddiqi
Killer Analytics: Top 20 Metrics Missing from your Balance Sheet by Mark
Brown
Trim Size: 6in x 9in Verbeke ffirs.tex V1 - 09/01/2017 7:42am Page iv�
� �
�
Machine Learning for Marketers: Hold the Math by Jim Sterne
On-Camera Coach: Tools and Techniques for Business Professionals in a
Video-Driven World by Karin Reed
Predictive Analytics for Human Resources by Jac Fitz-enz and John
Mattox II
Predictive Business Analytics: Forward-Looking Capabilities to Improve
Business Performance by Lawrence Maisel and Gary Cokins
Profit Driven Business Analytics: A Practitioner’s Guide to Transforming
Big Data into Added Value by Wouter Verbeke, Cristian Bravo, and
Bart Baesens
Retail Analytics: The Secret Weapon by Emmett Cox
Social Network Analysis in Telecommunications by Carlos Andre Reis
Pinheiro
Statistical Thinking: Improving Business Performance, Second Edition by
Roger W. Hoerl and Ronald D. Snee
Strategies in Biomedical Data Science: Driving Force for Innovation by Jay
Etchings
Style & Statistic: The Art of Retail Analytics by Brittany Bullard
Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data
Streams with Advanced Analytics by Bill Franks
Too Big to Ignore: The Business Case for Big Data by Phil Simon
The Analytic Hospitality Executive by Kelly A. McGuire
The Value of Business Analytics: Identifying the Path to Profitability
by Evan Stubbs
The Visual Organization: Data Visualization, Big Data, and the Quest
for Better Decisions by Phil Simon
Using Big Data Analytics: Turning Big Data into Big Money by Jared
Dean
Winwith Advanced Business Analytics: Creating Business Value from Your
Data by Jean Paul Isson and Jesse Harriott
For more information on any of the above titles, please visit
www.wiley.com.
Trim Size: 6in x 9in Verbeke ffirs.tex V1 - 09/01/2017 7:42am Page v�
� �
�
Profit-DrivenBusiness AnalyticsA Practitioner’s Guide to Transforming
Big Data into Added Value
Wouter VerbekeBart Baesens
Cristián Bravo
Trim Size: 6in x 9in Verbeke ffirs.tex V1 - 09/01/2017 7:42am Page vi�
� �
�
Copyright © 2018 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey.Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, ortransmitted in any form or by any means, electronic, mechanical, photocopying,recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the1976 United States Copyright Act, without either the prior written permission of thePublisher, or authorization through payment of the appropriate per-copy fee to theCopyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978)750-8400, fax (978) 646-8600, or on the Web at www.copyright.com. Requests to thePublisher for permission should be addressed to the Permissions Department, JohnWiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201)748-6008, or online at www.wiley.com/go/permissions.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have usedtheir best efforts in preparing this book, they make no representations or warrantieswith respect to the accuracy or completeness of the contents of this book andspecifically disclaim any implied warranties of merchantability or fitness for a particularpurpose. No warranty may be created or extended by sales representatives or writtensales materials. The advice and strategies contained herein may not be suitable for yoursituation. You should consult with a professional where appropriate. Neither thepublisher nor author shall be liable for any loss of profit or any other commercialdamages, including but not limited to special, incidental, consequential, or otherdamages.
For general information on our other products and services or for technical support,please contact our Customer Care Department within the United States at (800)762-2974, outside the United States at (317) 572-3993, or fax (317) 572-4002.
Wiley publishes in a variety of print and electronic formats and by print-on-demand.Some material included with standard print versions of this book may not be includedin e-books or in print-on-demand. If this book refers to media such as a CD or DVDthat is not included in the version you purchased, you may download this material athttp://booksupport.wiley.com. For more information about Wiley products, visitwww.wiley.com.
Library of Congress Cataloging-in-Publication Data is Available:
ISBN 9781119286554 (Hardcover)ISBN 9781119286998 (ePDF)ISBN 9781119286981 (ePub)
Cover Design: WileyCover Image: © Ricardo Reitmeyer/iStockphoto
Printed in the United States of America.
10 9 8 7 6 5 4 3 2 1
Trim Size: 6in x 9in Verbeke ffirs.tex V1 - 09/01/2017 7:42am Page vii�
� �
�
To Luit, Titus, and Fien.
To my wonderful wife, Katrien, and kids Ann-Sophie, Victor,and Hannelore.
To my parents and parents-in-law.
To Cindy, for her unwavering support.
Trim Size: 6in x 9in Verbeke ffirs.tex V1 - 09/01/2017 7:42am Page viii�
� �
�
Trim Size: 6in x 9in Verbeke ftoc.tex V1 - 09/01/2017 7:42am Page ix�
� �
�
Contents
Foreword xv
Acknowledgments xvii
Chapter 1 A Value-Centric Perspective Towards Analytics 1Introduction 1Business Analytics 3
Profit-Driven Business Analytics 9Analytics Process Model 14Analytical Model Evaluation 17Analytics Team 19Profiles 19Data Scientists 20
Conclusion 23Review Questions 24Multiple Choice Questions 24Open Questions 25
References 25
Chapter 2 Analytical Techniques 28Introduction 28Data Preprocessing 29Denormalizing Data for Analysis 29Sampling 30Exploratory Analysis 31Missing Values 31Outlier Detection and Handling 32Principal Component Analysis 33
Types of Analytics 37Predictive Analytics 37Introduction 37Linear Regression 38Logistic Regression 39Decision Trees 45Neural Networks 52
ix
Trim Size: 6in x 9in Verbeke ftoc.tex V1 - 09/01/2017 7:42am Page x�
� �
�
x C O N T E N T S
Ensemble Methods 56Bagging 57Boosting 57Random Forests 58Evaluating Ensemble Methods 59
Evaluating Predictive Models 59Splitting Up the Dataset 59Performance Measures for Classification Models 63Performance Measures for Regression Models 67Other Performance Measures for Predictive AnalyticalModels 68
Descriptive Analytics 69Introduction 69Association Rules 69Sequence Rules 72Clustering 74
Survival Analysis 81Introduction 81Survival Analysis Measurements 83Kaplan Meier Analysis 85Parametric Survival Analysis 87Proportional Hazards Regression 90Extensions of Survival Analysis Models 92Evaluating Survival Analysis Models 93
Social Network Analytics 93Introduction 93Social Network Definitions 94Social Network Metrics 95Social Network Learning 97Relational Neighbor Classifier 98Probabilistic Relational Neighbor Classifier 99Relational Logistic Regression 100Collective Inferencing 102
Conclusion 102Review Questions 103Multiple Choice Questions 103Open Questions 108
Notes 110References 110
Chapter 3 Business Applications 114Introduction 114Marketing Analytics 114Introduction 114
Trim Size: 6in x 9in Verbeke ftoc.tex V1 - 09/01/2017 7:42am Page xi�
� �
�
C O N T E N T S xi
RFM Analysis 115Response Modeling 116Churn Prediction 118X-selling 120Customer Segmentation 121Customer Lifetime Value 123Customer Journey 129Recommender Systems 131
Fraud Analytics 134Credit Risk Analytics 139HR Analytics 141Conclusion 146Review Questions 146Multiple Choice Questions 146Open Questions 150
Note 151References 151
Chapter 4 Uplift Modeling 154Introduction 154The Case for Uplift Modeling: Response Modeling 155Effects of a Treatment 158
Experimental Design, Data Collection, and DataPreprocessing 161
Experimental Design 161Campaign Measurement of Model Effectiveness 164
Uplift Modeling Methods 170Two-Model Approach 172Regression-Based Approaches 174Tree-Based Approaches 183Ensembles 193Continuous or Ordered Outcomes 198
Evaluation of Uplift Models 199Visual Evaluation Approaches 200Performance Metrics 207
Practical Guidelines 210Two-Step Approach for Developing Uplift Models 210Implementations and Software 212
Conclusion 213Review Questions 214Multiple Choice Questions 214Open Questions 216
Note 217References 217
Trim Size: 6in x 9in Verbeke ftoc.tex V1 - 09/01/2017 7:42am Page xii�
� �
�
xii C O N T E N T S
Chapter 5 Profit-Driven Analytical Techniques 220Introduction 220Profit-Driven Predictive Analytics 221The Case for Profit-Driven Predictive Analytics 221Cost Matrix 222Cost-Sensitive Decision Making with Cost-InsensitiveClassification Models 228
Cost-Sensitive Classification Framework 231Cost-Sensitive Classification 234Pre-Training Methods 235During-Training Methods 247Post-Training Methods 253Evaluation of Cost-Sensitive Classification Models 255Imbalanced Class Distribution 256Implementations 259
Cost-Sensitive Regression 259The Case for Profit-Driven Regression 259
Cost-Sensitive Learning for Regression 260During Training Methods 260Post-Training Methods 261
Profit-Driven Descriptive Analytics 267Profit-Driven Segmentation 267Profit-Driven Association Rules 280
Conclusion 283Review Questions 284Multiple Choice Questions 284Open Questions 289
Notes 290References 291
Chapter 6 Profit-Driven Model Evaluationand Implementation 296
Introduction 296Profit-Driven Evaluation of Classification Models 298Average Misclassification Cost 298Cutoff Point Tuning 303ROC Curve-Based Measures 310Profit-Driven Evaluation with Observation-DependentCosts 334
Profit-Driven Evaluation of Regression Models 338Loss Functions and Error-Based Evaluation Measures 339REC Curve and Surface 341
Conclusion 345
Trim Size: 6in x 9in Verbeke ftoc.tex V1 - 09/01/2017 7:42am Page xiii�
� �
�
C O N T E N T S xiii
Review Questions 347Multiple Choice Questions 347Open Questions 350
Notes 351References 352
Chapter 7 Economic Impact 355Introduction 355Economic Value of Big Data and Analytics 355Total Cost of Ownership (TCO) 355Return on Investment (ROI) 357Profit-Driven Business Analytics 359
Key Economic Considerations 359In-Sourcing versus Outsourcing 359On Premise versus the Cloud 361Open-Source versus Commercial Software 362
Improving the ROI of Big Data and Analytics 364New Sources of Data 364Data Quality 367Management Support 369Organizational Aspects 370Cross-Fertilization 371
Conclusion 372Review Questions 373Multiple Choice Questions 373Open Questions 376
Notes 377References 377
About the Authors 378
Index 381
Trim Size: 6in x 9in Verbeke ftoc.tex V1 - 09/01/2017 7:42am Page xiv�
� �
�
Trim Size: 6in x 9in Verbeke flast.tex V1 - 09/01/2017 7:42am Page xv�
� �
�
ForewordSandra WilikensSecretary General, responsible for CSR and member of the ExecutiveCommittee, BNP Paribas Fortis
In today’s corporate world, strategic priorities tend to center oncustomer and shareholder value. One of the consequences is that ana-lytics often focuses too much on complex technologies and statisticsrather than long-term value creation. With their book Profit-DrivenBusiness Analytics, Verbeke, Bravo, and Baesens pertinently bringforward a much-needed shift of focus that consists of turning analyticsinto a mature, value-adding technology. It further builds on theextensive research and industry experience of the author team,making it a must-read for anyone using analytics to create valueand gain sustainable strategic leverage. This is even more true as weenter a new era of sustainable value creation in which the pursuit oflong-term value has to be driven by sustainably strong organizations.The role of corporate employers is evolving as civic involvement andsocial contribution grow to be key strategic pillars.
xv
Trim Size: 6in x 9in Verbeke flast.tex V1 - 09/01/2017 7:42am Page xvi�
� �
�
Trim Size: 6in x 9in Verbeke flast.tex V1 - 09/01/2017 7:42am Page xvii�
� �
�
Acknowledgments
It is a great pleasure to acknowledge the contributions and assistance ofvarious colleagues, friends, and fellow analytics lovers to the writing ofthis book. This book is the result of many years of research and teachingin business analytics. We first would like to thank our publisher, Wiley,for accepting our book proposal.
We are grateful to the active and lively business analytics commu-nity for providing various user fora, blogs, online lectures, and tutori-als, which proved very helpful.
We would also like to acknowledge the direct and indirectcontributions of the many colleagues, fellow professors, students,researchers, and friends with whom we collaborated during thepast years. Specifically, we would like to thank Floris Devriendt andGeorge Petrides for contributing to the chapters on uplift modelingand profit-driven analytical techniques.
Last but not least, we are grateful to our partners, parents, andfamilies for their love, support, and encouragement.
We have tried to make this book as complete, accurate, and enjoy-able as possible. Of course, what really matters is what you, the reader,think of it. Please let us know your views by getting in touch. Theauthors welcome all feedback and comments—so do not hesitate to letus know your thoughts!
Wouter VerbekeBart Baesens
Cristián BravoMay 2017
xvii
Trim Size: 6in x 9in Verbeke flast.tex V1 - 09/01/2017 7:42am Page xviii�
� �
�
Trim Size: 6in x 9in Verbeke c01.tex V1 - 09/01/2017 7:42am Page 1�
� �
�
C H A P T E R 1A Value-CentricPerspectiveTowards Analytics
INTRODUCTION
In this first chapter, we set the scene for what is ahead by broadlyintroducing profit-driven business analytics. The value-centric per-spective toward analytics proposed in this book will be positioned andcontrasted with a traditional statistical perspective. The implicationsof adopting a value-centric perspective toward the use of analytics inbusiness are significant: a mind shift is needed both from managersand data scientists in developing, implementing, and operating analyt-ical models. This, however, calls for deep insight into the underlyingprinciples of advanced analytical approaches. Providing such insight isour general objective in writing this book and, more specifically:
◾ We aim to provide the reader with a structured overview ofstate-of-the art analytics for business applications.
◾ We want to assist the reader in gaining a deeper practicalunderstanding of the inner workings and underlying principlesof these approaches from a practitioner’s perspective.
1
Trim Size: 6in x 9in Verbeke c01.tex V1 - 09/01/2017 7:42am Page 2�
� �
�
2 P R O F I T - D R I V E N B U S I N E S S A N A L Y T I C S
◾ We wish to advance managerial thinking on the use of advancedanalytics by offering insight into how these approaches may eithergenerate significant added value or lower operational costs byincreasing the efficiency of business processes.
◾ We seek to prosper and facilitate the use of analytical approaches thatare customized to needs and requirements in a business context.
As such, we envision that our book will facilitate organizationsstepping up to a next level in the adoption of analytics for decisionmaking by embracing the advanced methods introduced in thesubsequent chapters of this book. Doing so requires an investmentin terms of acquiring and developing knowledge and skills but, as isdemonstrated throughout the book, also generates increased profits.An interesting feature of the approaches discussed in this book isthat they have often been developed at the intersection of academiaand business, by academics and practitioners joining forces for tun-ing a multitude of approaches to the particular needs and problemcharacteristics encountered and shared across diverse business settings.
Most of these approaches emerged only after the millennium,which should not be surprising. Since the millennium, we have wit-nessed a continuous and pace-gaining development and an expandingadoption of information, network, and database technologies. Keytechnological evolutions include the massive growth and successof the World Wide Web and Internet services, the introduction ofsmart phones, the standardization of enterprise resource planningsystems, and many other applications of information technology. Thisdramatic change of scene has prospered the development of analyticsfor business applications as a rapidly growing and thriving branch ofscience and industry.
To achieve the stated objectives, we have chosen to adopt apragmatic approach in explaining techniques and concepts. We donot focus on providing extensive mathematical proof or detailedalgorithms. Instead, we pinpoint the crucial insights and underlyingreasoning, as well as the advantages and disadvantages, related to thepractical use of the discussed approaches in a business setting. Forthis, we ground our discourse on solid academic research expertise aswell as on many years of practical experience in elaborating industrialanalytics projects in close collaboration with data science professionals.Throughout the book, a plethora of illustrative examples and casestudies are discussed. Example datasets, code, and implementations
Trim Size: 6in x 9in Verbeke c01.tex V1 - 09/01/2017 7:42am Page 3�
� �
�
A V A L U E- C E N T R I C P E R S P E C T I V E T O W A R D S A N A L Y T I C S 3
are provided on the book’s companion website, www.profit-analytics.com, to further support the adoption of the discussed approaches.
In this chapter, we first introduce business analytics. Next, theprofit-driven perspective toward business analytics that will be elab-orated in this book is presented. We then introduce the subsequentchapters of this book and how the approaches introduced in thesechapters allow us to adopt a value-centric approach for maximizingprofitability and, as such, to increase the return on investment ofbig data and analytics. Next, the analytics process model is dis-cussed, detailing the subsequent steps in elaborating an analyticsproject within an organization. Finally, the chapter concludes bycharacterizing the ideal profile of a business data scientist.
Business Analytics
Data is the new oil is a popular quote pinpointing the increasing value ofdata and—to our liking—accurately characterizes data as rawmaterial.Data are to be seen as an input or basic resource needing furtherprocessing before actually being of use. In a subsequent section inthis chapter, we introduce the analytics process model that describesthe iterative chain of processing steps involved in turning data intoinformation or decisions, which is quite similar actually to an oil refineryprocess. Note the subtle but significant difference between the wordsdata and information in the sentence above. Whereas data fundamen-tally can be defined to be a sequence of zeroes and ones, informationessentially is the same but implies in addition a certain utility or valueto the end user or recipient. So, whether data are information dependson whether the data have utility to the recipient. Typically, for rawdata to be information, the data first need to be processed, aggregated,summarized, and compared. In summary, data typically need to beanalyzed, and insight, understanding, or knowledge should be addedfor data to become useful.
Applying basic operations on a dataset may already provide usefulinsight and support the end user or recipient in decision making. Thesebasic operations mainly involve selection and aggregation. Both selec-tion and aggregation may be performed in many ways, leading to aplentitude of indicators or statistics that can be distilled from raw data.The following illustration elaborates a number of sales indicators in aretail setting.
Providing insight by customized reporting is exactly what the fieldof business intelligence (BI) is about. Typically, visualizations arealso adopted to represent indicators and their evolution in time, ineasy-to-interpret ways. Visualizations provide support by facilitating
Trim Size: 6in x 9in Verbeke c01.tex V1 - 09/01/2017 7:42am Page 4�
� �
�
4 P R O F I T - D R I V E N B U S I N E S S A N A L Y T I C S
EXAMPL EFor managerial purposes, a retailer requires the development of real-time salesreports. Such a report may include a wide variety of indicators that summarizeraw sales data. Raw sales data, in fact, concern transactional data that can beextracted from the online transaction processing (OLTP) system that isoperated by the retailer. Some example indicators and the required selectionand aggregation operations for calculating these statistics are:
◾ Total amount of revenues generated over the last 24 hours: Select alltransactions over the last 24 hours and sum the paid amounts, withpaid meaning the price net of promotional offers.
◾ Average paid amount in online store over the last seven days: Selectall online transactions over the last seven days and calculate theaverage paid amount;
◾ Fraction of returning customers within one month: Select alltransactions over the last month and select customer IDs thatappear more than once; count the number of IDs.
Remark that calculating these indicators involves basic selectionoperations on characteristics or dimensions of transactions stored in thedatabase, as well as basic aggregation operations such as sum, count, andaverage, among others.
the user’s ability to acquire understanding and insight in the blink ofan eye. Personalized dashboards, for instance, are widely adopted inthe industry and are very popular with managers to monitor and keeptrack of business performance. A formal definition of business intelli-gence is provided by Gartner (http://www.gartner.com/it-glossary):
Business intelligence is an umbrella term that includes theapplications, infrastructure and tools, and best practicesthat enable access to and analysis of information toimprove and optimize decisions and performance.
Note that this definition explicitly mentions the required infras-tructure and best practices as an essential component of BI, whichis typically also provided as part of the package or solution offeredby BI vendors and consultants. More advanced analysis of data mayfurther support users and optimize decision making. This is exactlywhere analytics comes into play. Analytics is a catch-all term coveringa wide variety of what are essentially data-processing techniques.
Trim Size: 6in x 9in Verbeke c01.tex V1 - 09/01/2017 7:42am Page 5�
� �
�
A V A L U E- C E N T R I C P E R S P E C T I V E T O W A R D S A N A L Y T I C S 5
In its broadest sense, analytics strongly overlaps with data science,statistics, and related fields such as artificial intelligence (AI) andmachine learning. Analytics, to us, is a toolbox containing a varietyof instruments and methodologies allowing users to analyze data for adiverse range of well-specified purposes. Table 1.1 identifies a numberof categories of analytical tools that cover diverse intended uses or, inother words, allow users to complete a diverse range of tasks.
A first main group of tasks identified in Table 1.1 concerns pre-diction. Based on observed variables, the aim is to accurately estimateor predict an unobserved value. The applicable subtype of predictiveanalytics depends on the type of target variable, which we intend tomodel as a function of a set of predictor variables. When the targetvariable is categorical in nature, meaning the variable can only takea limited number of possible values (e.g., churner or not, fraudster ornot, defaulter or not), then we have a classification problem. Whenthe task concerns the estimation of a continuous target variable (e.g.,sales amount, customer lifetime value, credit loss), which can takeany value over a certain range of possible values, we are dealingwith regression. Survival analysis and forecasting explicitly accountfor the time dimension by either predicting the timing of events(e.g., churn, fraud, default) or the evolution of a target variable intime (e.g., churn rates, fraud rates, default rates). Table 1.2 providessimplified example datasets and analytical models for each type ofpredictive analytics for illustrative purposes.
The second main group of analytics comprises descriptive analyticsthat, rather than predicting a target variable, aim at identifyingspecific types of patterns. Clustering or segmentation aims at groupingentities (e.g., customers, transactions, employees, etc.) that aresimilar in nature. The objective of association analysis is to findgroups of events that frequently co-occur and therefore appear tobe associated. The basic observations that are being analyzed inthis problem setting consist of variable groups of events; for instance,transactions involving various products that are being bought by acustomer at a certain moment in time. The aim of sequence analysis
Table 1.1 Categories of Analytics from a Task-OrientedPerspective
Predictive Analytics Descriptive Analytics
ClassificationRegressionSurvival analysisForecasting
ClusteringAssociation analysisSequence analysis
Trim Size: 6in x 9in Verbeke c01.tex V1 - 09/01/2017 7:42am Page 6�
� �
�
6 P R O F I T - D R I V E N B U S I N E S S A N A L Y T I C S
Table 1.2 Example Datasets and Predictive Analytical Models
Example dataset Predictive analytical model
Classification
ID Recency Frequency Monetary Churn
C1 26 4.2 126 Yes
C2 37 2.1 59 No
C3 2 8.5 256 No
C4 18 6.2 89 No
C5 46 1.1 37 Yes
. . . . . . . . . . . . . . .
Decision tree classificationmodel:
Frequency < 5
Recency > 25
Churn = Yes
Churn = No
Churn = No
Yes
Yes
No
No
Regression
ID Recency Frequency Monetary CLV
C1 26 4.2 126 3,817
C2 37 2.1 59 4,31
C3 2 8.5 256 2,187
C4 18 6.2 89 543
C5 46 1.1 37 1,548
. . . . . . . . . . . . . . .
Linear regression model:
CLV = 260 + 11 ⋅ Recency
+ 6.1 ⋅ Frequency
+ 3.4 ⋅ Monetary
Survival analysis
ID RecencyChurnor Censored
Time of churnor Censoring
C1 26 Churn 181
C2 37 Censored 253
C3 2 Censored 37
C4 18 Censored 172
C5 46 Churn 98
. . . . . . . . . . . .
General parametric survivalanalysis model:
log(T) = 13 + 5.3 ⋅ Recency
Forecasting
Timestamp Demand
January 513
February 652
March 435
April 578
May 601
. . . . . .
Weighted moving averageforecasting model:
Demandt = 0.4 ⋅ Demandt−1
+ 0.3 ⋅ Demandt−2
+ 0.2 ⋅ Demandt−3
+ 0.1 ⋅ Demandt−4
Trim Size: 6in x 9in Verbeke c01.tex V1 - 09/01/2017 7:42am Page 7�
� �
�
A V A L U E- C E N T R I C P E R S P E C T I V E T O W A R D S A N A L Y T I C S 7
Table 1.3 Example Datasets and Descriptive Analytical Models
Data Descriptive analytical model
Clustering
ID Recency Frequency
C1 26 4.2
C2 37 2.1
C3 2 8.5
C4 18 6.2
C5 46 1.1
. . . . . . . . .
K-means clustering with K = 3:
0
10
Fre
quen
cy
9876543210
10 20 30Recency
40 50 60
Association analysis
ID Items
T1 beer, pizza, diapers, baby food
T2 coke, beer, diapers
T3 crisps, diapers, baby food
T4 chocolates, diapers, pizza, apples
T5 tomatoes, water, oranges, beer
. . . . . .
Association rules:
If baby food And diapers Then beerIf coke And pizza Then crisps
. . .
Sequence analysis
ID Sequential items
C1 <{3},{9}>
C2 <{1 2},{3},{4 6 7}>
C3 <{3 5 7}>
C4 <{3},{4 7},{9}>
C5 <{9}>
. . . . . .
Sequence rules:
Item 3
Item 3
Item 9
Item 4 & 7 Item 31
. . .
is similar to association analysis but concerns the detection of eventsthat frequently occur sequentially, rather than simultaneously as inassociation analysis. As such, sequence analysis explicitly accounts forthe time dimension. Table 1.3 provides simplified examples of datasetsand analytical models for each type of descriptive analytics.
Trim Size: 6in x 9in Verbeke c01.tex V1 - 09/01/2017 7:42am Page 8�
� �
�
8 P R O F I T - D R I V E N B U S I N E S S A N A L Y T I C S
Note that Tables 1.1 through 1.3 identify and illustrate categoriesof approaches that are able to complete a specific task from a technicalrather than an applied perspective. These different types of analyticscan be applied in quite diverse business and nonbusiness settings andconsequently lead to many specialized applications. For instance, pre-dictive analytics and,more specifically, classification techniquesmay beapplied for detecting fraudulent credit-card transactions, for predictingcustomer churn, for assessing loan applications, and so forth. From anapplication perspective, this leads to various groups of analytics suchas, respectively, fraud analytics, customer or marketing analytics, andcredit risk analytics. A wide range of business applications of analyt-ics across industries and business departments is discussed in detailin Chapter 3.
With respect to Table 1.1, it needs to be noted that these differ-ent types of analytics apply to structured data. An example of astructured dataset is shown in Table 1.4. The rows in such a datasetare typically called observations, instances, records, or lines, andrepresent or collect information on basic entities such as customers,transactions, accounts, or citizens. The columns are typically referredto as (explanatory or predictor) variables, characteristics, attributes,predictors, inputs, dimensions, effects, or features. The columnscontain information on a particular entity as represented by a rowin the table. In Table 1.4, the second column represents the age of acustomer, the third column the postal code, and so on. In this book weconsistently use the terms observation and variable (and sometimesmore specifically, explanatory, predictor, or target variable).
Because of the structure that is present in the dataset in Table 1.4and the well-defined meaning of rows and columns, it is much easierto analyze such a structured dataset compared to analyzing unstruc-tured data such as text, video, or networks, to name a few. Specializedtechniques exist that facilitate analysis of unstructured data—forinstance, text analytics with applications such as sentiment analysis,video analytics that can be applied for face recognition and incidentdetection, and network analytics with applications such as community
Table 1.4 Structured Dataset
Customer Age Income Gender Duration Churn
John 30 1,800 Male 620 Yes
Sarah 25 1,400 Female 12 No
Sophie 52 2,600 Female 830 No
David 42 2,200 Male 90 Yes
Trim Size: 6in x 9in Verbeke c01.tex V1 - 09/01/2017 7:42am Page 9�
� �
�
A V A L U E- C E N T R I C P E R S P E C T I V E T O W A R D S A N A L Y T I C S 9
mining and relational learning (see Chapter 2). Given the roughestimate that over 90% of all data are unstructured, clearly there is alarge potential for these types of analytics to be applied in business.
However, due to the inherent complexity of analyzing unstruc-tured data, as well as because of the often-significant developmentcosts that only appear to pay off in settings where adopting thesetechniques significantly adds to the easier-to-apply structuredanalytics, currently we see relatively few applications in businessbeing developed and implemented. In this book, we therefore focuson analytics for analyzing structured data, and more specifically thesubset listed in Table 1.1. For unstructured analytics, one may referto the specialized literature (Elder IV and Thomas 2012; Chakraborty,Murali, and Satish 2013; Coussement 2014; Verbeke, Martens andBaesens 2014; Baesens, Van Vlasselaer, and Verbeke 2015).
PROFIT-DRIVEN BUSINESS ANALYTICS
The premise of this book is that analytics is to be adopted in busi-ness for better decision making—“better” meaning optimal in termsof maximizing the net profits, returns, payoff, or value resultingfrom the decisions that are made based on insights obtained fromdata by applying analytics. The incurred returns may stem from again in efficiency, lower costs or losses, and additional sales, amongothers. The decision level at which analytics is typically adopted is theoperational level, where many customized decisions are to be madethat are similar and granular in nature. High-level, ad hoc decisionmaking at strategic and tactical levels in organizations also may benefitfrom analytics, but expectedly to a much lesser extent.
The decisions involved in developing a business strategy are highlycomplex in nature and do not match the elementary tasks enlisted inTable 1.1. A higher-level AI would be required for such purpose, whichis not yet at our disposal. At the operational level, however, there aremany simple decisions to be made, which exactly match with the taskslisted in Table 1.1. This is not surprising, since these approaches haveoften been developed with a specific application in mind. In Table 1.5,we provide a selection of example applications, most of which will beelaborated on in detail in Chapter 3.
Analytics facilitates optimization of the fine granular decision-making activities listed in Table 1.5, leading to lower costs or losses andhigher revenues and profits. The level of optimization depends on theaccuracy and validity of the predictions, estimates, or patterns derivedfrom the data. Additionally, as we stress in this book, the quality
Trim Size: 6in x 9in Verbeke c01.tex V1 - 09/01/2017 7:42am Page 10�
� �
�
10 P R O F I T - D R I V E N B U S I N E S S A N A L Y T I C S
Table 1.5 Examples of Business Decisions Matching Analytics
Decision Making with Predictive Analytics
Classification Credit officers have to screen loan applications and decide on whetherto accept or reject an application based on the involved risk. Based onhistorical data on the performance of past loan applications, aclassification model may learn to distinguish good from bad loanapplications using a number of well-chosen characteristics of theapplication as well as of the applicant. Analytics and, more specifically,classification techniques allow us to optimize the loan-granting processby more accurately assessing risk and reducing bad loan losses (VanGestel and Baesens 2009; Verbraken et al. 2014). Similar applicationsof decision making based on classification techniques, which arediscussed in more detail in Chapter 3 of this book, include customerchurn prediction, response modeling, and fraud detection.
Regression Regression models allow us to estimate a continuous target value andin practice are being adopted, for instance, to estimate customerlifetime value. Having an indication on the future worth in terms ofrevenues or profits a customer will generate is important to allowcustomization of marketing efforts, for pricing, etc. As is discussed indetail in Chapter 3, analyzing historical customer data allowsestimating the future net value of current customers using aregression model.Similar applications involve loss given default modeling as is discussedin Chapter 3, as well as the estimation of software development costs(Dejaeger et al. 2012).
Survival analysis Survival analysis is being adopted in predictive maintenanceapplications for estimating when a machine component will fail. Suchknowledge allows us to optimize decisions related to machinemaintenance—for instance, to optimally plan when to replace a vitalcomponent. This decision requires striking a balance between the costof machine failure during operations and the cost of the component,which is preferred to be operated as long as possible before replacing it(Widodo and Yang 2011).Alternative business applications of survival analysis involve theprediction of time to churn and time to default where, compared toclassification, the focus is on predicting when the event will occurrather than whether the event will occur.
Forecasting A typical application of forecasting involves demand forecasting, whichallows us to optimize production planning and supply chainmanagement decisions. For instance, a power supplier needs to be ableto balance electricity production and demand by the consumers andfor this purpose adopts forecasting or time-series modeling techniques.These approaches allow an accurate prediction of the short-termevolution of demand based on historical demand patterns (Hyndmanet al. 2008).