Stochastic Models of Noncontractual Consumer Relationships

[Cover illustration: timing patterns across the Calibration Period and the Validation Period]

Michael Platzer

Master Thesis at the Vienna University of Economics and Business Administration

Under the Supervision of Dr. Thomas Reutterer

November 2008

Dedicated to my Mom & Dad

Abstract

The primary goal of this master thesis is to evaluate several well-established probabilistic models for forecasting customer behavior in noncontractual settings on an individual level. This research has been carried out with the particular purpose of participating in a lifetime value competition that was organized by the Direct Marketing Educational Foundation in fall 2008.

First, an in-depth exploratory analysis of the provided contest data set is undertaken, with its key characteristics being displayed in several informative visualizations. Subsequently, the NBD (Ehrenberg, 1959), the Pareto/NBD (Schmittlein et al., 1987), the BG/NBD (Fader et al., 2005a) and the CBG/NBD (Hoppe and Wagner, 2007) models are applied to the data. Since the data seems to violate the Poisson assumption, which is a prevalent assumption regarding the random nature of the transaction timing process, the presented models produce rather mediocre results. This becomes apparent as we show that a simple linear regression model outperforms these probabilistic models for the contest data.

As a consequence, a new variant based on the CBG/NBD model, namely the CBG/CNBD-k model, is developed. This model is able to take a certain degree of regularity in the timing process into account by modeling Erlang-k intertransaction times, and thereby delivers considerably better predictions for the data set at hand. Out of 25 participating teams in the contest, the model finished in second place, only marginally behind the winning model. This result demonstrates that under certain conditions the newly developed variant is able to outperform numerous other existing models, in particular stochastic ones.

Keywords: marketing, consumer behavior, lifetime value, stochastic prediction models, customer base analysis, Pareto/NBD, regularity


Contents

Abstract

1 Introduction
  1.1 Background
  1.2 Problem Scope
  1.3 Discussed Models
  1.4 Usage Scenarios

2 DMEF Competition
  2.1 Contest Details
  2.2 Data Set
  2.3 Game Plan

3 Exploratory Data Analysis
  3.1 Key Summary
  3.2 Distribution of Individual Donation Behavior
  3.3 Trends on Aggregated Level
  3.4 Distribution of Intertransaction Times

4 Forecast Models
  4.1 NBD Model
  4.2 Pareto/NBD Model
  4.3 BG/NBD Model
  4.4 CBG/NBD Model

5 Model Comparison
  5.1 Parameter Interpretation
  5.2 Data Fit
  5.3 Forecast Accuracy
  5.4 Simple Forecast Benchmarks
  5.5 Error Composition

6 CBG/CNBD-k Model
  6.1 Regularity
  6.2 Assumptions
  6.3 Comparison of Models
  6.4 Final Model

7 Conclusion

A Derivation of CBG/CNBD-k
  A.1 Assumptions
  A.2 Erlang-k
  A.3 Individual Likelihood
  A.4 Aggregate Likelihood
  A.5 Parameter Estimation
  A.6 Probability Distribution of Purchase Frequencies
  A.7 Probability of Being Active
  A.8 Expected Number of Transactions
  A.9 Concluding Remarks

Bibliography


Chapter 1

Introduction

1.1 Background

Over 80% of the companies that participated in a German study on the usage of information instruments in retail controlling regarded the concept of customer lifetime value as useful (Schroder et al., 1999, p. 9). But fewer than 10% actually had a working implementation at that time. No other consumer-related information, for example customer satisfaction, penetration or sociodemographic variables, showed such a large discrepancy between assessed usefulness and actual usage. Therefore, accurate lifetime value models can be expected to become, despite but also because of their inherently challenging complexity, a crucial informational advantage in highly competitive markets.

Typical fundamental managerial questions that arise are (Schmittlein et al., 1987; Morrison and Schmittlein, 1988):

• How much is my current customer base worth?

• How many purchases, and which sales volume, can I expect from my clientele in the future?

• How many customers are still active customers? Who has already defected, and who will likely defect?

• Who will be my most, respectively my least profitable customers?

• Who should we target with a specific marketing activity?

• How much of the sales volume can be attributed to such a marketing activity?


A key part of finding answers to those questions is the accurate assessment of lifetime value on an aggregated as well as on an individual level.

Hardly any organization can afford to make budget plans for the upcoming period without making careful estimates of future sales. Such estimates on the aggregate level are therefore common, and numerous methods exist, ranging from simple managerial heuristics to advanced time series analyses. Considerably more challenging is the prediction of future sales broken down between trial and repeat customers. And, considering how little information we have on an individual level, an even more demanding task is accurate forecasting for each single client.

Nevertheless, the increasing prevalence of computerized transaction systems and the drop in data storage costs, which we have seen over the past decade, provide more and more companies with customer databases coupled with large records of transaction history (‘Who bought which product at what price at what time?’). But the data itself is of little use unless models and tools are implemented that condense the desired characteristics, trends and forecasts out of the data. Such tools are nowadays commonly provided as part of customer relationship management (CRM) software, which enables organizations to act and react individually to each customer. The heterogeneity in one’s customer base is thereby taken into account, which allows a further optimization of marketing activities and their efficiency.[1] One essential piece of information for CRM implementations is the (monetary) valuation of an individual customer (Rosset et al., 2003, p. 321).

1.2 Problem Scope

The primary focus of this thesis is the evaluation and implementation of several probabilistic models for forecasting customer behavior in noncontractual settings on an individual level. This research has been carried out with the main aim of participating in a lifetime value competition that was organized by the Direct Marketing Educational Foundation in fall 2008.

The limitations of the research scope in this thesis are fairly well defined by the main task of the competition, which is the estimation of the future purchase amount for an existing customer base on a disaggregated level based upon transaction history. Therefore, we will not provide a complete overview of existing lifetime value models (see Gupta et al. (2006) for such an overview) but will rather focus on models that can make accurate future predictions on an individual level.

[1] Clustering a customer base into segments can be seen as a first step in dealing with heterogeneity. But one-to-one marketing, as it is described here, is the logical continuation of this approach.

Due to the large number of one-time purchases and the long time span of the data, we have to use models that can also incorporate the defection of customers in addition to modeling the purchase frequency. Furthermore, we are faced with noncontractual consumer relationships, a characteristic that is very common but which unfortunately adds considerable complexity to the forecasting task (Reinartz and Kumar, 2000). The difficulty arises because no definite information regarding the status of a customer-firm relationship is available, neither now nor later. This means that it is impossible to tell whether a specific customer is still active or whether he/she has already defected. In contrast, in a contractual setting[2], such as the client base of a telecommunication service provider, it is known when a customer cancels his/her contract and is therefore lost for good.[3] In a noncontractual setting, such as retail shoppers, air carrier passengers or donors to a nonprofit organization (NPO), we cannot observe the current status of a customer-firm relationship (i.e. it is a latent variable), but rather have to rely on other data, such as the transaction history, to make proper judgments. Therefore we will limit our research to models that can handle this kind of uncertainty.

Further, because the data set only provides transaction records,[4] the emphasis is put on models that extract the most out of the transaction history and do not rely on incorporating other covariates, such as demographic variables, competition activity or other exogenous variables.

1.3 Discussed Models

Table 1.1 displays an overview of the probabilistic models that are evaluated and applied to the competition data within this thesis.

Model        Author(s)                            Year
NBD          Ehrenberg                            1959
Pareto/NBD   Schmittlein, Morrison, and Colombo   1987
BG/NBD       Fader, Hardie, and Lee               2005
CBG/NBD      Hoppe and Wagner                     2007
CBG/CNBD-k   Platzer                              2008

Table 1.1: Overview of Presented Models

Firstly, the seminal work by Ehrenberg, who proposed the negative binomial distribution (NBD) in 1959 as a model for repeated buying, is investigated in detail in section 4.1. Further, we will evaluate the well-known Pareto/NBD model (section 4.2) and two of its variants, the BG/NBD (section 4.3) and the CBG/NBD (section 4.4) model, which are all extensions of the NBD model but make additional assumptions regarding the defection process and its heterogeneity among customers. In order to get a feeling for the forecast accuracy of these probabilistic models, we will subsequently also benchmark them against a simple linear regression model.

[2] Also known as subscription-based setting.

[3] Models that explicitly model churn rates are, among others, logistic regression models and survival models. See Rosset et al. (2003) and Mani et al. (1999) for examples of the latter kind of models.

[4] Actually it also includes detailed records of direct marketing activities, but we neglect this data, as such data is not available for the target period. See section 2.3 for further reasoning.

Finally, the CBG/CNBD-k model, which is a new variant of the CBG/NBD model, will be introduced in chapter 6. This model makes differing assumptions regarding the timing of purchases; in particular, it considers a certain extent of regularity and thereby improves forecast quality considerably for the competition data set. Detailed derivations for this model are provided in appendix A.

1.4 Usage Scenarios

But before diving into the details of the presented models, we try to further increase the reader’s motivation by providing some common usage scenarios of noncontractual relations with repeated transactions. The following list contains usage scenarios which have already been studied in various articles and which should give an idea of the broad field of applications for such models.

• Customers of the online music store CDNOW (Fader et al., 2005a). This data set is also publicly available at http://brucehardie.com/notes/008/, and has been used in numerous other articles (Abe, 2008; Hoppe and Wagner, 2007; Batislam et al., 2007; Fader et al., 2005c; Fader and Hardie, 2001; Wubben and von Wangenheim, 2008) to benchmark the quality of various models.

• Clients of a financial service broker (Schmittlein et al., 1987).

• Members of a frequent shopper program at a department store in Japan (Abe, 2008).

• Consumers buying at a grocery store (Batislam et al., 2007). Individual data can be collected by providing client cards that are combined with some sort of loyalty program.

• Business customers of an office supply company (Schmittlein and Peterson, 1994).

• Clients of a catalog retailer (Hoppe and Wagner, 2007).

But, citing Wubben and von Wangenheim (2008, p. 82), whenever ‘a customer purchases from a catalog retailer, walks off an aircraft, checks out of a hotel, or leaves a retail outlet, the firm has no way of knowing whether and how often the customer will conduct business in the future’. As such, the usage scenarios are practically unlimited.

One other example from the author’s own business experience is the challenge of assessing the number of active users of a free web service, such as a blogging platform. Users can be uniquely identified by a permanent cookie stored in the browser client when they access the site. Each posting of a new blog entry could be seen as a transaction, and therefore these models could also provide answers to questions like ‘How many of the registered users are still active?’ and ‘How many blog entries will be posted within the next month by each one of them?’.

This thesis should shed some light on how to find accurate answers to questions of this kind.


Chapter 2

DMEF Competition

2.1 Contest Details

The Direct Marketing Educational Foundation[1] (DMEF) is a US-based nonprofit organization with the mission ‘to attract, educate, and place top college students by continuously improving and supporting the teaching of world-class direct / interactive marketing’[2]. The DMEF is an affiliate of the Direct Marketing Association Inc.[3] and it is also founder and publisher of the Journal of Interactive Marketing[4].

The DMEF organized a contest in 2008, with ‘the purpose [..] to compare and improve the estimation methods and applications for [lifetime value and customer equity modeling]’ which ‘have attracted widespread attention from marketing researchers [..] over the past 15 years’ (May, Austin, Bartlett, Malthouse, and Fader, 2008). The participating teams were provided with a data set from a leading US nonprofit organization, whose name remained undisclosed, containing detailed transaction and contact history of a cohort of 21,166 donors over a period of 4 years and 8 months. The transaction records included a unique donor ID, the timing, and the amount of each single donation together with a (rather cryptic) code for the type of contact. The contact data included records of each single contact together with the contacted donor, the timing, the type of contact, and the implied costs of that contact.

[1] cf. http://www.directworks.org/

[2] http://www.directworks.org/About/Default.aspx?id=386, retrieved on Oct. 9, 2008

[3] cf. http://www.the-dma.org/

[4] cf. https://www.directworks.org/Educators/Default.aspx?id=220

The first phase of the competition consisted of three separate estimation tasks for a target period of two years:

1. Estimate the donation sum on an aggregated level.

2. Estimate the donation sum on an individual level.

3. Estimate which donors, who have made their last donation before Sep. 1, 2004, will be donating at all during the target period.

An error measure for each of the 3 tasks was defined by the contest organizing committee in order to evaluate and compare the calculations submitted by the participating teams. Closeness on an aggregated level (task 1) was simply defined as the absolute deviation from the actual donation amount, and for task 3 it was the percentage of correctly classified cases. The error measure for task 2 was defined as the mean squared logarithmic error:

$$\mathrm{MSLE} = \sum_i \left( \log(y_i + 1) - \log(\hat{y}_i + 1) \right)^2 / 21{,}166,$$

with $y_i$ and $\hat{y}_i$ denoting the actual and the estimated donation sum of donor $i$, with the 1 added to avoid taking the logarithm of 0, and with 21,166 being the size of the cohort.
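The error measure for task 2 can be illustrated with a minimal sketch. It is written in Python rather than in the R environment used for this thesis, the function name `msle` and the sample values are purely illustrative, and the cohort size is taken from the length of the input instead of being fixed at 21,166.

```python
import math

def msle(actual, predicted):
    """Mean squared logarithmic error over paired donation sums."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [
        (math.log(y + 1) - math.log(y_hat + 1)) ** 2
        for y, y_hat in zip(actual, predicted)
    ]
    # Divide by the number of donors (21,166 in the contest cohort).
    return sum(squared_errors) / len(actual)

# Hypothetical donation sums of three donors over the target period.
actual = [0.0, 50.0, 125.0]
predicted = [5.0, 40.0, 125.0]
print(msle(actual, predicted))
```

Because the errors are taken on the logarithmic scale, predicting $5 for a donor who gives nothing is penalized far more heavily than missing a $50 donation by $10.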

The deadline for submitting calculations for phase 1 (tasks 1 to 3) was Sep. 15, 2008. The results for the participating teams were announced a couple of weeks afterwards and were discussed at the DMEF’s Research Summit in Las Vegas.[5]

2.2 Data Set

The data set contains records of 53,998 donations by 21,166 distinct donors, starting from Jan. 2, 2002, until Aug. 31, 2006. Each of these donors made their initial donation during the first half of 2002, as this is the criterion for a donor to be included in the cohort. The record of each donation contains a unique identifier of the donor, and the date and dollar amount of that donation. Additionally, the type of contact that can be linked with this transaction is given. See table 2.1 for a sample of the transaction records.

[5] cf. http://www.researchsummit.org/

id          date         amt   source
8128357     2002-02-22     5   02WMFAWUUU
9430679     2002-01-10    50   01ZKEKAPAU
9455908     2002-04-19    25   02WMHAWUUU
9652546     2002-04-02   100   01RYAAAPBA
9652546     2003-01-06   100   02DEKAAGBA
9652546     2004-01-05   100   04CHB1AGCB
..          ..            ..   ..
13192422    2005-02-11    50   05HCPAAICD
13192422    2005-02-16    50   05WMFAWUUU

Table 2.1: Transaction Records

Furthermore, detailed contact records with their related costs were provided. These 611,188 records range from Sep. 10, 1999, until Aug. 28, 2006. Each contact record contains an identifier of the contacted donor, the date of contact, the type of contact and the associated costs for the contact. See table 2.2 for a sample of these contact records.

id          date         source       cost
9652546     2000-07-20   00AKMIHA28   0.2800000
9430679     2000-07-07   00AXKKAPAU   0.3243999
9455908     2000-07-07   00AXKKAPAU   0.3243999
11303542    2000-07-07   00AXKKAPAU   0.3243999
11305422    2000-01-14   00CS31A489   0.2107999
11261005    2000-01-14   00CS31A489   0.2107999
..          ..           ..           ..
11335783    2005-09-01   06ZONAAMGE   0.4068198
11303930    2005-09-01   06ZONAAMGE   0.4068198

Table 2.2: Contact Records

According to May et al. (2008), ‘the full data set, including 1 million customers, 17 years of transaction and contact history, and contact costs, will be released for general research purposes’, and should become available at https://www.directworks.org/Educators/Default.aspx?id=632. The competition data set therefore represents only a small subset of the complete data that is made available by the NPO after the competition.

2.3 Game Plan

Before starting out with the model building, an in-depth exploratory analysis of the data set is performed in order to gain a deeper understanding of its key characteristics. Various visualizations provide a comprehensive overview of these characteristics and help to comprehend the outcomes of the modeling process.

As mentioned above, our main emphasis is on winning task 2, i.e. on finding the ‘best’ forecast model that will subsequently provide the lowest MSLE for the target period. But of course no data for the target period is available before the deadline of the competition, and therefore we have to split the provided data into a training period and a validation period. The training data is used for calibrating the model and its parameters, whereas the validation data enables us to compare the forecast accuracy among the models. By choosing several different lengths of training periods, as has also been done by Schmittlein and Peterson (1994), Batislam et al. (2007) and Hoppe and Wagner (2007), we can further improve the robustness of our choice. After picking a certain model for the competition, the complete provided data set is used for the final calibration of the model.
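The split into a training and a validation period can be sketched as follows. This is an illustration in Python, not the R code used for the thesis; the helper `split_by_cutoff` and the three cutoff dates are hypothetical, and the rows are a few records taken from table 2.1.

```python
from datetime import date

# A few rows in the layout of table 2.1: (donor id, donation date, amount).
transactions = [
    (9652546, date(2002, 4, 2), 100.0),
    (9652546, date(2003, 1, 6), 100.0),
    (9652546, date(2004, 1, 5), 100.0),
    (13192422, date(2005, 2, 11), 50.0),
    (13192422, date(2005, 2, 16), 50.0),
]

def split_by_cutoff(rows, cutoff):
    """Split transactions into a training part (before the cutoff date)
    and a validation part (on or after the cutoff date)."""
    training = [row for row in rows if row[1] < cutoff]
    validation = [row for row in rows if row[1] >= cutoff]
    return training, validation

# Hypothetical cutoffs, to try several lengths of the training period.
for cutoff in (date(2004, 9, 1), date(2005, 3, 1), date(2005, 9, 1)):
    training, validation = split_by_cutoff(transactions, cutoff)
    print(cutoff, len(training), len(validation))
```

A model would be calibrated on the training rows only and its forecasts compared against the validation rows, once per cutoff.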

Despite the fact that a strong causal relation between contacts and actual donations can be assumed, we will not include the contact data in our model building. The main reason is that such data is not available for the target period and also cannot be reliably estimated. Therefore, we implicitly assume that direct marketing activities will follow a similar pattern as in the past and simply disregard this information. The same assumption is made regarding all other possible exogenous influences, such as competition, advertisement, public opinion, and so forth, due to the absence of such information.

All the probabilistic models under investigation try to model the purchase opportunity as opposed to the actual purchase amount.[6] The amount per donor is estimated in a separate step and is simply multiplied by the estimated number of future purchases (see section 6.4.1). This approach is feasible if we assume independence between purchase amount and purchase rate, and between purchase amount and defection rate (Schmittlein and Peterson, 1994, p. 49).

An estimate for task 3 is directly derived from task 2. This is done by assuming that any customer with an estimated number of purchases of 0.5 or higher will actually make a purchase within the target period. Task 1 could be deduced from task 2 as well by simply summing over all individual estimates.
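How the three task estimates relate to each other can be sketched in a few lines of Python (again not the R code of the thesis). The function names and the per-donor numbers are hypothetical; only the multiplication of count and amount, the 0.5 threshold and the summation are taken from the text.

```python
def expected_donation_sum(expected_purchases, amount_per_purchase):
    """Task 2: estimated number of future purchases multiplied by the
    separately estimated amount per purchase."""
    return expected_purchases * amount_per_purchase

def will_donate(expected_purchases, threshold=0.5):
    """Task 3: a donor is classified as donating in the target period
    when the estimated number of purchases reaches the threshold."""
    return expected_purchases >= threshold

# Hypothetical estimates: donor -> (expected purchases, amount per purchase).
estimates = {
    "A": (0.2, 25.0),
    "B": (0.5, 10.0),
    "C": (3.0, 50.0),
}

individual = {d: expected_donation_sum(n, amt) for d, (n, amt) in estimates.items()}
active = {d: will_donate(n) for d, (n, _) in estimates.items()}
aggregate = sum(individual.values())  # Task 1 as the sum of individual estimates

print(individual, active, aggregate)
```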

All of our following calculations and visualizations are carried out with the statistical programming environment R (R Development Core Team, 2008), which is freely available, well documented, widely used in academic research, and which further provides a large repository of additional libraries. Unfortunately, the presented probabilistic models are not yet part of an existing library. Hence, we need to program these models ourselves. But thanks to the published estimates regarding the CDNOW data set[7] within the originating articles, we are able to verify the correctness of our implementations.

[6] Donations and purchases, as well as donors, consumers and clients, will be used synonymously within this thesis.

[7] http://brucehardie.com/notes/008/


Chapter 3

Exploratory Data Analysis

In this chapter an in-depth descriptive analysis of the contest data set is undertaken. Several key characteristics are outlined and concisely visualized. These findings will provide valuable insight for the subsequent model fitting process in chapter 4.

3.1 Key Summary

No. of donors                               21,166
Cohort time length                          6 months
Available time frame                        4 years 8 months
Available time units                        days
No. of zero repeaters: absolute; relative   10,626; 50.2%
No. of rep. donations: mean; sd; max        1.55; 2.93; 55
Donation amount: mean; sd; max              $39.31; $119.32; $10,000
Time between donations: mean; sd; max       296 days; 260 days; 1626 days
Time until last donation: mean; sd          460 days; 568 days

Table 3.1: Descriptive Statistics

The data set consists of a rather large, heterogeneous cohort of donors. Heterogeneity can be observed in the donation frequency, in the donation amount, in the time lapse between successive donations, and in the overall recorded lifetime.

On the one hand, the majority (50.2%) did not donate at all after their initial donation. On the other hand, some individuals donated very frequently, up to 55 times. The amount per transaction ranges from as little as a quarter of a dollar up to $10,000. And the observed standard deviation of the amount is 3 times larger than its mean. These simple statistics already make it clear that any model considered for fitting the data should be able to account for this kind of heterogeneity.
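The per-donor quantities behind table 3.1 can be derived directly from the transaction records. The following Python sketch is only meant to illustrate the definitions (number of repeat donations, time between donations, share of zero repeaters); the helper `summarize` is hypothetical, the rows are taken from table 2.1, and the thesis itself performs these calculations in R.

```python
from collections import defaultdict
from datetime import date

# Rows in the layout of table 2.1: (donor id, donation date).
rows = [
    (8128357, date(2002, 2, 22)),
    (9652546, date(2002, 4, 2)),
    (9652546, date(2003, 1, 6)),
    (9652546, date(2004, 1, 5)),
]

def summarize(rows):
    """Return the number of repeat donations and the intertransaction
    times (in days) for each donor."""
    dates_by_donor = defaultdict(list)
    for donor_id, day in rows:
        dates_by_donor[donor_id].append(day)
    repeats, gaps = {}, {}
    for donor_id, days in dates_by_donor.items():
        days.sort()
        repeats[donor_id] = len(days) - 1  # the initial donation is not a repeat
        gaps[donor_id] = [(b - a).days for a, b in zip(days, days[1:])]
    return repeats, gaps

repeats, gaps = summarize(rows)
zero_repeater_share = sum(1 for r in repeats.values() if r == 0) / len(repeats)
print(repeats, gaps, zero_repeater_share)
```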

It can also be noted that the time span covered by the records is considerably long (as is the target period of 2 years). This implies that people who are still active at the end of the 4-year-and-8-month period are rather loyal, long-term customers. But it also means that the assumption of stationarity regarding the underlying mechanism, and thereby regarding the model parameters, might not hold.

[Figure: Various Timing Patterns; x-axis: Time Scale (2002 to 2006); y-axis: Donor ID (10870988, 11259736, 11270451, 11281342, 11292547, 11303989, 11317401, 11329984, 11343894, 11359536, 11371770, 11382546)]

Figure 3.1: Timing Patterns for 12 Randomly Selected Donors

An important feature of the data set is that donation (as well as contact) records are given with their exact timing, and they are neither aggregated to longer time spans nor condensed to simple frequency numbers. Therefore the information on the exact timing of the donations can and should be used for our further analysis. A first ad-hoc visualization (see figure 3.1) of 12 randomly selected donors already displays some of the differing characteristic timing patterns. These patterns range from single-time donors (e.g. ID 11259736), through sporadic donors (e.g. ID 11359536), to regular donors who have already defected (see ID 10870988 at the bottom of the chart). Thus, the high number of single-time donors and the observed defection of regular donors suggest that models which can also account for such a defection process should be considered in particular.

3.2 Distribution of Individual Donation Behavior

[Figure: Distribution of Numbers of Donations; x-axis: # Donations (1, 2, 3, 4, 5, 6, 7, 8+); y-axis: # Donors; bar labels: 50.2%, 16.9%, 10.8%, 7.6%, 6.3%, 2.6%, 1.6%, 3.9%]

Figure 3.2: Histogram of Number of Donations per Donor

Figure 3.2 displays once more the aforementioned 50.2% of single-time donors, i.e. donors who have never made any additional transaction after their initial donation in the first half of 2002. Aside from these single-time donors, a further large share of donors must be considered ‘light’ users. In particular, 42% made more than one but fewer than 6 donations, which corresponds to an average frequency of about once a year or even less. And only as few as 8% of the customer base (in total 1733 people) can be considered frequent donors, with 6 or more donations. However, these 8% actually account for over half of the transactions (51.5%) in the last year of the observation period, and are therefore of great importance for our estimates into the future.

It is important to point out that a low number of recorded donations can result from two different causes. Either this low number really stems from a (very) low donation frequency, i.e. people just rarely donate. Or it stems from the fact that people defected, i.e. turned away from the NPO and will not donate at all anymore. An upcoming challenge will be to distinguish these two mechanisms within the data.

[Figure: Distribution of Donation Amounts; x-axis: Donation Amount, logarithmic scale (0.25 to 10000); y-axis: Relative Frequency (0.00 to 0.30); labeled peaks at 5, 10, 15, 20, 25, 50 and 100]

Figure 3.3: Histogram of Dollar Amount per Donation

Figure 3.3 plots the observed donation amounts. These amounts vary tremendously, and range from as low as a quarter of a dollar up to a single generous donation of $10,000. A visual inspection of the figure indicates that the overall distribution follows, at least to some extent, a log-normal distribution,[1] but with its values being restricted to certain integers. In particular, 89% of the 53,998 donations are accounted for by some very specific dollar amounts, namely $5, $10, $15, $20, $25, $50 and $100. The other donation amounts seem to play a minor role. However, special attention should be directed to the few large donations, because the 3% of donations that exceed $100 actually sum up to 30% of the overall donation sum.

In figure 3.4 a possible relation between the average amount of a single donation and the number of donations per individual is inspected.[2] As we can see, single-time donors as well as very active donors (7+) tend to spend a little less money per donation. This result seems plausible, as single-time donors rather ‘cautiously try out the product’ and heavy donors spread their overall donation over several transactions. Nevertheless, the observed correlation between these two variables is minimal and will be neglected in the following.

[Figure: Conditional Distribution of Donation Amounts; x-axis: # Donations (1, 2, 3, 4, 5, 6, 7, 8+); y-axis: Average Donation Amount (0 to 100)]

Figure 3.4: Distribution of Average Donation Amounts grouped by Number of Donations per Donor

[1] The dashed gray line in the chart represents a kernel density estimation with a broad bandwidth.

[2] Note: The widths of the drawn boxes in the chart are proportional to the square roots of the number of observations in the corresponding groups.

3.3 Trends on Aggregated Level

This section analyzes possible existing trends within the data on an aggregated level by examining time series. Most of the charts that are presented in the following share the same layout. The connected line represents the evolution of the particular figures for the quarters of a year, and the horizontal lines are the averages over 4 of these quarters at a time. The time series are aggregated to quarters instead of tracking the daily movements in order to reduce the noise within these figures and to help identify the long-term trends. The displayed percentage changes indicate the change from one year to the next, where each yearly average covers the second half of one year and the first half of the next year. This shifted year average has been chosen, since the covered time range of the competition data ends slightly after the second quarter in 2006.

[Figure: Donation Sum. x-axis: Time (2002 to 2007); y-axis: Donation Sum (0 to 4e+05); changes between the shifted yearly averages: +8%, −24%, −3%.]

Figure 3.5: Trend in Overall Donation Sum

Inspecting the evolution of overall donation sums (figure 3.5) directly reveals various interesting properties. First of all, it is apparent that donations show a sharp decline immediately after the second quarter in 2002. This observed drop is plausible, if we recall that our cohort has actually been built by definition of new donors from the first half of 2002 and that on average only a few following donations are being made. Further, it can be stated that the data shows a strong seasonal fluctuation with the third quarter being the weakest, and the fourth and first quarter being the strongest periods. About twice as many donations occur during each of these strong quarters as during the third quarter. It also seems that there is a downward trend in donation sums. But the speed of this trend remains ambiguous, if a look at the corresponding percentage changes is taken. At the beginning an increase of 8% is recorded, then a sharp drop of 24%, which is followed by a moderate decrease of 3% over the last year. Task 1 of the competition is the estimation of the future trend of these aggregated donation sums for the next two years. Considering the erratic movements this is quite a challenge.

The overall donation sum is the product of the number of donations and the average donation amount. Figure 3.6, which separates these two variables, provides some further insight into the decomposition of the overall trend. The time series for the number of donations also displays a strong seasonality, which has a peak around the Christmas holidays. The continuous downward trend (−13%, −15%, −14%) in the transaction numbers is considerably stable and hence predictable. A simple heuristic could, for example, assume a constant rate of decrease of 14% for the next two years. As has been noted in the preceding section, this downward trend can either be the result of a decreasing donation frequency for each donor or might stem from an ongoing defection process. Figure 3.7 indicates that the latter of these two effects is dominant. The number of active donors is steadily decreasing,³ whereas the average number of donations per active donor is slightly increasing.

[Figure, two panels. Left: # Donations over Time (2002 to 2006), y-axis 0 to 8000, yearly changes −13%, −15%, −14%. Right: Avg Donation Amount over Time (2002 to 2006), y-axis 0 to 50, yearly changes +24%, −10%, +12%.]

Figure 3.6: Trend in Number of Donations and Average Donation Amount
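The decomposition of the donation sum and the constant-decline heuristic mentioned above can be illustrated with a few lines of Python. This is only a sketch: the starting count of 10,000 donations and the average amount of 30 dollars are hypothetical placeholders and not figures from the DMEF data, and the function names are introduced here purely for illustration.

```python
def project_counts(last_count, annual_change=-0.14, years=2):
    """Project yearly donation counts under a constant relative change."""
    projections = []
    current = float(last_count)
    for _ in range(years):
        current *= 1.0 + annual_change
        projections.append(current)
    return projections


def donation_sum(n_donations, avg_amount):
    """Overall donation sum = number of donations x average donation amount."""
    return n_donations * avg_amount


# Hypothetical inputs (NOT taken from the competition data).
counts = project_counts(10_000)                  # two years at -14% per year
sums = [donation_sum(c, 30.0) for c in counts]   # assumes a flat average amount
```

Such a projection only covers the stable component, i.e. the number of donations. The average donation amount would still have to be forecast separately.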

[Figure, two panels. Left: Percentage of Donors who Have Donated Within that Year (2002 to 2005): 27.8%, 29.5%, 23.5%, 18.8%. Right: Average # Donations per Active Donor (2002 to 2005): 1.42, 1.46, 1.51, 1.55.]

Figure 3.7: Trend in Activity

Due to the stable decline of donation numbers it can be concluded that the erratic movement of the overall sum stems from the ups and downs in the average donation amounts. The chart on the right-hand side of figure 3.6 surprisingly also shows seasonal fluctuation, and has no clear overall trend at all, which makes it hard to make predictions into the future.

³ Note that we disregard the initial donation for this chart, as otherwise the share for 2002 would simply be 100%.


[Figure, four panels over Time (2002 to 2007), each annotated with yearly changes. Donation Sum: +8%, −24%, −3%. Contact Costs: +25%, −16%, −33%. # Contacts: −3%, −30%, −7%. Avg Contact Cost: +22%, +19%, −24%.]

Figure 3.8: Trend in Contacts

A possible explanation for the observed trends and movements might be contained in the contact records which have been provided by the organizing committee. Each donation is linked to a particular contact, but certainly not each contact resulted in a donation. Therefore, it seems logical that the number of contacts and the associated expenses have a strong influence on the donation sums. The displayed time series from figure 3.8 strongly support this assumption. And again, the same seasonal variations in the number of contacts as well as in their average costs can be detected as before. Furthermore, the increase in donation sums in 2003/2004 can now be linked to the tremendous increase of 25% in contact spending during that period. On the other hand, the NPO has been able to cut costs in 2005/2006 by 33% (mostly due to a 24% drop in average contact costs) without hurting the generated contributions.

Unfortunately, it is not possible to take any advantage of this detected relation between donations and contacts for the contest, because no information regarding the contact activities throughout the target period is available (see section 2.3 for the previous discussion).


3.4 Distribution of Intertransaction Times

[Figure: Overall Distribution of Intertransaction Times. x-axis: # Months in between Donations (0 to 51); y-axis: Count (0 to 4000); marked values at 1, 12 and 24 months.]

Figure 3.9: Histogram of Intertransaction Times in Months

The disaggregated availability of transaction data on a day-to-day basis allows an inspection of the observed intertransaction times, i.e. the lapsed time between two succeeding donations for an individual.⁴ Figure 3.9 depicts the overall distribution of this variable. The distribution contains two peaks: the first and also highest peak represents waiting times of one month, and the second peak represents one-year intervals. Further, we see that only very rarely (1.4%) do two donations occur within a single month. It seems that there is a dead period of one month, which marks the time until a donor is willing to make another transaction. It is also interesting to note that in 5% of the cases we have a waiting period of more than 24 months and that there are even values higher than 4 years. This is an indicator that some customers can remain inactive for a very long period and nevertheless can still possibly be persuaded to make another donation. This particular characteristic of the data set will make it hard to model the defection process correctly in the following, as some long-living customers just never actually defect but are rather ‘hibernating’ and can be reactivated at any time.⁵
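As an illustration of how such intertransaction times can be derived from a day-level transaction log, consider the following minimal sketch. The function name and the toy log are hypothetical, and the layout of the actual competition files is not assumed here.

```python
from collections import defaultdict
from datetime import date


def intertransaction_days(transactions):
    """Return, per donor, the gaps in days between succeeding donations.

    `transactions` is an iterable of (donor_id, date) pairs in any order.
    """
    dates_by_donor = defaultdict(list)
    for donor_id, day in transactions:
        dates_by_donor[donor_id].append(day)

    gaps = {}
    for donor_id, days in dates_by_donor.items():
        days.sort()
        # Difference between each donation and the one that follows it.
        gaps[donor_id] = [(b - a).days for a, b in zip(days, days[1:])]
    return gaps


# Hypothetical toy log (donor ids and dates are made up).
log = [
    ("A", date(2002, 3, 1)), ("A", date(2002, 4, 2)), ("A", date(2003, 4, 2)),
    ("B", date(2002, 5, 15)),
]
gaps = intertransaction_days(log)
```

Donors with a single donation, like donor B in the toy log, contribute no intertransaction time at all, which is why the histogram only covers repeat donors.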

Figure 3.10 shows that light and frequent donors have differing distributions of intertransaction times, with the former donating approximately every year, and the latter donating regularly each month. As we will see, this particular observed regularity will play a major role in the upcoming modeling phase.

⁴ Also commonly termed interpurchase times or interevent times.

⁵ Compare further the lost-for-good versus always-a-share discussion in Rust, Lemon, and Zeithaml (2004, p. 112).

[Figure, two panels; x-axis: # Days in between Donations; y-axis: Count. Top: Intertransaction Times for Light Donors (2, 3 or 4 Donations); 8814 Donors, 18352 Donations; annotation: Yearly Donations (~8%). Bottom: Interpurchase Times for Frequent Donors (5 or more Donations); 1733 Donors, 14480 Donations; annotation: Monthly Donations (~10%).]

Figure 3.10: Intertransaction Times Split by Frequency


Chapter 4

Forecast Models

4.1 NBD Model

4.1.1 Assumptions

As early as 1959, Andrew Ehrenberg¹ published his seminal article ‘The Pattern of Consumer Purchase’ (Ehrenberg, 1959), in which he suggested the negative binomial distribution (abbr. NBD) as a fit to aggregated count data of sales of non-durable consumer goods.² Since then Ehrenberg’s paper has been cited numerous times in the marketing literature and various models have been derived based upon his work, proving that his assumptions are reasonable and widely applicable.

Besides the sheer benefit that a well-fitting probability distribution is found, Ehrenberg further provides a logical justification for choosing that particular distribution. He argues that each consumer purchases according to a Poisson process and that the associated purchase rates vary across consumers according to a Gamma distribution.³ Now, the negative binomial distribution is exactly the theoretical distribution that arises from such a Gamma-Poisson mixture. Table 4.1 summarizes the postulated assumptions of Ehrenberg’s model.

¹ See http://www.marketingscience.info/people/Andrew.html for a brief summary of his major achievements in the field of marketing science.

² In other words, a discrete distribution is proposed that is supposed to fit the data displayed in figure 3.2 on page 13.

³ Actually, he assumed a χ²-distribution in Ehrenberg (1959) but this is simply a special case of the more general Gamma distribution.


A1 The number of transactions follows a Poisson process with rate λ.

A2 Heterogeneity in λ follows a Gamma distribution with shape parameter r and rate parameter α across customers.

Table 4.1: NBD Assumptions
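The claim that assumptions A1 and A2 together yield a negative binomial distribution can be checked numerically. The following sketch simulates the Gamma-Poisson mixture and compares the simulated shares with the closed-form NBD probabilities for a count over the interval (0, t]. The parameter values are illustrative only and are not estimates from the data; the function names are our own.

```python
import math
import random


def nbd_pmf(x, r, alpha, t=1.0):
    """P(X = x) for a Poisson count over (0, t] with Gamma(r, rate alpha) rates."""
    log_p = (math.lgamma(r + x) - math.lgamma(r) - math.lgamma(x + 1)
             + r * math.log(alpha / (alpha + t))
             + x * math.log(t / (alpha + t)))
    return math.exp(log_p)


def simulate_count(r, alpha, t, rng):
    """Draw lambda ~ Gamma(r, rate alpha), then count Poisson events in (0, t]."""
    lam = rng.gammavariate(r, 1.0 / alpha)   # gammavariate takes shape and SCALE
    count, clock = 0, rng.expovariate(lam)   # exponential waiting times (A1)
    while clock <= t:
        count += 1
        clock += rng.expovariate(lam)
    return count


rng = random.Random(42)
r, alpha, t = 2.0, 1.0, 1.0                  # illustrative values, not estimates
n = 50_000
draws = [simulate_count(r, alpha, t, rng) for _ in range(n)]
empirical = [draws.count(k) / n for k in range(5)]
theoretical = [nbd_pmf(k, r, alpha, t) for k in range(5)]
```

Note that the simulation generates the counts by accumulating exponential waiting times, which is exactly the timing process that corresponds to the Poisson count process.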

In order to support the reader’s understanding of the postulated assumptions, visualizations of the aforementioned distributions are provided in figures 4.1, 4.2 and 4.3 for various parameter constellations.

The Poisson distribution is characterized by the relation that its mean and also its variance are equal to the rate parameter λ. Further, it can be shown that assuming a Poisson distributed number of transactions is equivalent to assuming that the lapsed time between two succeeding transactions follows an exponential distribution. In other words, the Poisson process with rate λ is the respective count process for a timing process with independently and exponentially distributed waiting times with mean 1/λ (Chatfield and Goodhardt, 1973).

The exponential distribution itself is a special case of the Gamma distribution with its shape parameter being set to 1 (see the middle chart in figure 4.3). An important property of exponentially distributed random variables is that they are memoryless. This means that any provided information about the time since the last event does not change the probability of an event occurring within the immediate future:

P(T > s + t | T > s) = P(T > t) for all s, t ≥ 0.
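The memoryless property can be verified directly from the survival functions. The sketch below contrasts the exponential distribution with a Gamma distribution of shape 2, for which the property does not hold. The chosen values for the rate and for s and t are arbitrary illustrations, and the function names are our own.

```python
import math


def exp_survival(t, lam):
    """P(T > t) for an exponential waiting time with rate lam."""
    return math.exp(-lam * t)


def erlang2_survival(t, lam):
    """P(T > t) for a Gamma waiting time with shape 2 and rate lam."""
    return math.exp(-lam * t) * (1.0 + lam * t)


def conditional_survival(survival, s, t, lam):
    """P(T > s + t | T > s) for a given survival function."""
    return survival(s + t, lam) / survival(s, lam)


lam, s, t = 0.5, 3.0, 2.0                    # illustrative values
exp_cond = conditional_survival(exp_survival, s, t, lam)
erl_cond = conditional_survival(erlang2_survival, s, t, lam)
```

For the exponential case `exp_cond` coincides with `exp_survival(t, lam)`, whereas for shape 2 the conditional probability `erl_cond` is smaller than `erlang2_survival(t, lam)`: the longer the wait so far, the more likely an event becomes in the near future.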

For the mathematical calculations such a property might be appealing, because it simplifies some derivations. But applied to sales data, this implies that the timing of a purchase does not depend on how far in the past the last purchase took place. This conclusion is quite contrary to common intuition, which would rather suggest that nondurable consumer goods are purchased with a certain regularity. If a consumer buys, for example, a certain good, such as a package of detergent, he/she will wait with the next purchase until that package is nearly consumed. But the memoryless property even further implies that the most likely time for another purchase is immediately after a purchase has just occurred (Morrison and Schmittlein, 1988, p. 148).⁴

⁴ This can also be seen in the middle chart of figure 4.3, as the density function reaches its maximum at the value zero.

[Figure, three panels of the Negative Binomial Distribution over the counts 0 to 9: (r = 1, p = 0.4), (r = 1, p = 0.2), (r = 3, p = 0.5).]

Figure 4.1: Probability Mass Function of the Negative Binomial Distribution for Different Parameter Values

[Figure, three panels of the Poisson Distribution over the counts 0 to 9: lambda = 0.9, lambda = 2.5, lambda = 5.]

Figure 4.2: Probability Mass Function of the Poisson Distribution for Different Parameter Values

[Figure, three panels of the Gamma Distribution on the range 0 to 10: (shape = 0.5, rate = 0.5), (shape = 1, rate = 0.5), (shape = 2, rate = 0.5).]

Figure 4.3: Probability Density Function of the Gamma Distribution for Different Parameter Values

Nevertheless, the Poisson distribution has proven to be an accurate model for a wide range of applications, like the decay of radioactive particles, the occurrence of accidents or the arrival of customers in a queue. But in all these cases the memoryless property withstands basic face validity checks. It seems plausible, for example, that the particular arrival time of one customer in a queue is absolutely independent of the arrival of the next customer, as they do not interact with each other. The fact that a customer has just arrived does not influence the arrival time of the next one. Therefore, it can be argued that queuing arrivals are indeed a memoryless process.

But, as has been argued above, this is not the case for purchases of non-durable consumer goods for an individual customer. The regularity of consumption of a good does lead to a certain extent of regularity regarding its purchases. Ehrenberg was aware of this defect (Ehrenberg, 1959, p. 30) but simply required that the observed periods should not be ‘too short, so that the purchases made in one period do not directly affect those made in the next’ (ibid., p. 34).

Assumption A2 postulates a Gamma distribution for the distribution of purchase rates across customers, in order to account for heterogeneity. If the different possible shapes of this two-parameter continuous probability distribution are considered, then it is safe to state that such an assumption adds substantial flexibility to the model. But besides the added flexibility and its positive skewness, no behavioral story is provided in Ehrenberg (1959) in order to justify the choice of the Gamma distribution.

Nevertheless, Ehrenberg applies a powerful trick by explicitly modeling heterogeneity. He utilizes information of the complete customer base for modeling on an individual level. He thereby takes advantage of the well-established regression to the mean phenomenon. ‘[We] can better predict what the person will do next if we know not only what that person did before, but what other people did’ (Greene, 1982, p. 130 reprinted from Hoppe and Wagner, 2007, p. 80). Schmittlein et al. (1987, p. 5) similarly stated that ‘while there is not enough information to reliably estimate [the purchase rate] for each person, there will generally be enough to estimate the distribution of [it] over customers. [..] This approach, estimating a prior distribution from the available data, is usually called an empirical Bayes method’.

So, despite a possibly violated assumption A1⁵ and a somewhat arbitrary assumption A2, the negative binomial distribution proves to fit empirical market data very well (Dunn et al., 1983; Wagner and Taudes, 1987; Chatfield and Goodhardt, 1973).

⁵ See section 6.1 and also Herniter (1971) for some further empirical evidence.
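The empirical Bayes argument of this section can be made concrete with the standard conjugacy of the Gamma and Poisson distributions: if the purchase rate has a Gamma prior with shape r and rate α, and x events are observed during a time span T, the posterior is again a Gamma distribution with shape r + x and rate α + T. The sketch below uses hypothetical prior parameters, not the estimates of this thesis, and is meant as an illustration of the regression to the mean effect rather than as the exact computation behind the individual forecasts presented later.

```python
def posterior_rate(x, t_obs, r, alpha):
    """Posterior mean of lambda after x events in t_obs, given a Gamma(r, alpha) prior."""
    return (r + x) / (alpha + t_obs)


r, alpha = 0.5, 500.0        # hypothetical prior, NOT the estimates of this thesis
t_obs = 1000.0               # observed time span in days
prior_mean = r / alpha       # population-wide average rate

comparison = []
for x in (0, 3, 10):
    raw = x / t_obs                               # individual data only
    shrunk = posterior_rate(x, t_obs, r, alpha)   # pulled towards prior_mean
    comparison.append((x, raw, shrunk))
```

In every row the estimate `shrunk` lies between the individual's raw rate and the population mean, which is the regression to the mean described above: a donor without any observed repeat donation still receives a small positive rate, and a very active donor is expected to slow down somewhat.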

4.1.2 Empirical Results

In the following the NBD model is applied to the data set from the DMEF competition. First, we will estimate the parameters, then analyze how well the model fits the data on an aggregated level, and finally we will calculate individual estimates.⁶

Ehrenberg suggests an estimation method for the parameters α and r that only requires the mean number of purchases m and the share of non-buyers p0 (Ehrenberg, 1959). However, with modern computational power the calculation of a maximum likelihood estimate (abbr. MLE) does not pose a problem anymore. The MLE method tries to find those parameter values for which the likelihood of the observed data is maximized. It can be shown that this method yields an asymptotically unbiased, asymptotically efficient and asymptotically normal estimator.
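To give an impression of how an estimation from m and p0 alone can work, the following sketch solves the two NBD identities m = r·T/α and p0 = (1 + m/r)^(−r) numerically. It assumes that all customers share the same observation length T, which does not hold for the DMEF cohort, and it is our own illustration of the idea rather than a reproduction of the exact procedure in Ehrenberg (1959).

```python
import math


def fit_nbd_mean_zeros(m, p0, t=1.0):
    """Solve p0 = (1 + m/r)**(-r) for r by bisection, then alpha = r * t / m."""
    if not (math.exp(-m) < p0 < 1.0):
        raise ValueError("need exp(-m) < p0 < 1 for an NBD solution")

    def zero_share(r):
        # Share of non-buyers implied by shape r and mean m.
        return math.exp(-r * math.log1p(m / r))

    lo, hi = 1e-9, 1e9       # zero_share decreases from 1 to exp(-m) as r grows
    for _ in range(200):
        mid = math.sqrt(lo * hi)          # bisect on a logarithmic scale
        if zero_share(mid) > p0:
            lo = mid                      # too many zeros: r must be larger
        else:
            hi = mid
    r = math.sqrt(lo * hi)
    return r, r * t / m


# Synthetic check with known parameters r = 0.5 and alpha = 2 (T = 1).
r_hat, alpha_hat = fit_nbd_mean_zeros(m=0.25, p0=(2.0 / 3.0) ** 0.5)
```

The appeal of this approach is that it needs only two summary figures. Its drawback is that all information beyond the mean and the share of zeros is discarded, which is one reason to prefer the MLE method.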

The calculation of the likelihood for the NBD model requires two pieces of information per donor: the length of observed time T, and the number of transactions x within the time interval (0, T]. This time span differs from donor to donor, because the particular date of the first transaction varies across the cohort. It needs to be noted that x does not include the initial transaction, because that transaction occurred for each person of our cohort by definition. As we will see later on, the upcoming models will also require another piece of information for each donor, namely the recency, i.e. the timing tx of the last recorded transaction.⁷ The set of information consisting of recency, frequency and a monetary value is often referred to as RFM variables and is commonly (not only for probabilistic models) the condensed data base of many customer base analyses. The layout of the transformed data can be seen in table 4.2. The displayed information is read as follows: The donor with the ID 10458867 made no additional transactions throughout the observed period of 1605 days after his initial donation of 25.42 dollars. Further, donor 9791641 made five donations (one initial and four repeat ones) which sum up to 275 dollars during an observed time span of 1687 days, whereas the last donation occurred 1488 days after the initial one. That is, the donor did not donate anymore during the last 199 days (= T − tx = 1687 − 1488) of the observation period.

⁶ Again note that we only model the number of donations for now, and make an assessment for the amount per donation in a separate step in section 6.4.1.

⁷ With this notation we closely follow the variable conventions used in Schmittlein et al. (1987) and Fader et al. (2005a).

id          x    tx     T     amt
10458867    0     0  1605   25.42
10544021    1   728  1602  175.00
10581619    7  1339  1592   80.00
..         ..    ..    ..      ..
9455908     0     0  1595   25
9652546     4  1365  1612  450
9791641     4  1488  1687  275

Table 4.2: DMEF Data Converted to RFM
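The conversion of a raw transaction log into the layout of table 4.2 can be sketched as follows. The function name, the input format and the example donor are hypothetical; the definition of amt as the total over all donations including the initial one is inferred from the reading example given for table 4.2.

```python
from datetime import date


def to_rfm(donations, end_of_observation):
    """Condense one donor's (date, amount) records into x, tx, T and amt.

    x   : number of repeat donations (the initial one is excluded)
    tx  : days from the first to the last donation (0 if x == 0)
    T   : days from the first donation to the end of the observation
    amt : total amount over all donations, including the initial one
    """
    records = sorted(donations)
    first_day = records[0][0]
    last_day = records[-1][0]
    return {
        "x": len(records) - 1,
        "tx": (last_day - first_day).days,
        "T": (end_of_observation - first_day).days,
        "amt": sum(amount for _, amount in records),
    }


# Hypothetical donor with one initial and two repeat donations.
rfm = to_rfm(
    [(date(2002, 1, 1), 25.0), (date(2002, 7, 1), 15.0), (date(2002, 3, 1), 10.0)],
    end_of_observation=date(2002, 12, 31),
)
```

For a donor without any repeat donation the first and the last donation coincide, so tx is automatically 0, in line with the convention used in the table.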

Applying the MLE method to the transformed data results in the following parameter estimates

r = 0.475 = shape parameter, and

α = 498.5 = rate parameter,

for the DMEF data set, with both parameters being highly significant. The general shape of the resulting Gamma distribution can be seen in the left chart of figure 4.3, i.e. it is reversed J-shaped. This implies that the majority of donors have a very low donation frequency, with the mode being at zero, the median being 0.00042 and the mean being 0.00095 (= r/α) donations per day. In terms of average intertransaction times, which are simply the reciprocal values of the frequencies, this result implies an average time period of 1,048 days (= 2.9 years) between two succeeding donations, and that half of the donors are donating less often than every 2,406 days (= 6.6 years).⁸ If we consider that the majority of donors has not redonated at all during the observation period, these long intertransaction times are obviously a consequence of the overall low observed donation frequencies.
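These summary figures can be retraced from the two reported parameter estimates alone. The sketch below computes the mean analytically and approximates the median by simulation. Since the parameters are used in their rounded, printed form and the median depends on the random sample, the results can only be expected to match the reported values approximately.

```python
import random
import statistics

r, alpha = 0.475, 498.5                  # NBD estimates reported above

mean_rate = r / alpha                    # mean of Gamma(shape r, rate alpha)
mean_gap_days = alpha / r                # reciprocal of the mean rate

rng = random.Random(1)
# random.gammavariate expects shape and SCALE, hence 1 / alpha.
sample = [rng.gammavariate(r, 1.0 / alpha) for _ in range(200_000)]
median_rate = statistics.median(sample)  # simulated, hence only approximate
median_gap_days = 1.0 / median_rate
```

The large gap between `mean_gap_days` and `median_gap_days` reflects the strong skewness of a Gamma distribution with a shape parameter below 1.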

The next step is an analysis of the model’s capability to represent the data. For this purpose the actually observed numbers of donations are compared with their theoretical counterparts that are calculated by the NBD model. Table 4.3 contains the result.

As can be seen, a nearly perfect fit for the large share of non-repeaters is achieved. However, the deviations of the estimated group sizes increase for the more frequent donors, which indicates that the model is not fully able to explain the observed data.

⁸ The median of the Gamma distribution is approximated by generating a large random sample from the theoretical distribution and subsequently calculating the empirical median.

# Donations       0      1      2      3      4    5    6     7+
Actual       10,626  3,579  2,285  1,612  1,336  548  348    832
NBD          10,617  3,865  2,183  1,379    918  629  439  1,135

Table 4.3: Comparison of Actual vs. Theoretical Count Data

Attention is now turned to the predictive accuracy of the NBD model on an individual level. For this purpose the overall observation period of 4 years and 8 months needs to be split into a calibration period of 3.5 years and a validation period of 1 year. Due to the shorter time range for the calibration, the estimated parameters (r = 0.53, α = 501) are now slightly different compared to our results from above. Subsequently, a conditional estimate is calculated for each individual for a one-year period. These estimates take their respective observed frequencies x and time spans T into account. Table 4.4 displays a small subset of such estimates with x365 being the actual number and x365Nbd being the estimated number of transactions. For example, the donor with ID 10581619 donated 6 times within the first 3.5 years but only made a single donation in the following year, whereas the NBD model predicted approximately 2.5 donations during that period.⁹

id          x    tx       T  x365  x365Nbd
10458867    0     0  1179.5     0   0.0011
10544021    1   728  1176.5     0   0.4226
10581619    6  1079  1166.5     1   2.5303
..         ..    ..      ..    ..       ..
9455908     0     0  1169.5     0   0.0011
9652546     3  1001  1186.5     1   1.2657
9791641     3   777  1261.5     1   1.2657

Table 4.4: Individual NBD Forecasts for a Data Split of 3.5 Years to 1 Year

Table 4.5 contains these numbers in an aggregated form. It compares the actual with the average expected number of donations during the validation period, split by the associated number of donations during the calibration period. For example, those people that did not donate at all within the first 3.5 years donated on average 0.038 times in the following year, whereas the NBD model only predicted an average of 0.001 donations. On the other hand, as can also be seen in the table, the future donations of the frequent donors are being vastly overestimated. Overall, the NBD model estimates 11,088 donations for the 21,166 donors, which is nearly twice as many as the observed 6,047 donations during the validation period.

⁹ Note that the model estimates are not restricted to integer numbers.

# Donations      0     1     2     3     4     5     6    7+
Actual       0.038  0.20  0.43  0.69  0.75  1.06  1.54  2.44
NBD          0.001  0.42  0.84  1.27  1.69  2.11  2.53  4.68

Table 4.5: Comparison of Actual vs. Theoretical Average Number of Donations per Donor during the Validation Period

A possible explanation for the poor performance of the NBD model is the long overall time period, in combination with the assumption that all donors remain active. The upcoming section will present a model that explicitly takes a possible defection process into account.

4.2 Pareto/NBD Model

4.2.1 Assumptions

In 1987, Schmittlein, Morrison, and Colombo introduced the Pareto/NBD model to the marketing science community (Schmittlein et al., 1987). It is nowadays a well-known and well-studied stochastic purchase model for noncontractual settings and has furthermore ‘received growing attention among researchers and managers within recent years’ (Fader et al., 2005a, p. 275).

Schmittlein et al. explicitly try to tackle the problem of a nonobservable defection process. For various reasons existing customers may decide to quit a business relation, e.g. stop purchasing a product or buying at a shop. The reasons can range from a change in personal taste or attitudes, over changes in personal circumstances, such as marriages, newborns, illnesses, or moving to other places, to the very definitive form of defection, namely death. But regardless of the actual cause, the fundamental problem in a noncontractual customer relationship is that the organization will generally not be notified of that defection. Hence the organization relies on other indicators to assess the current activity status.

Building a stochastic model for a nonobservable dropout process on an individual level is a challenging task, especially if we consider that a dropout can only occur a single time per customer. And even then, it is still not possible to verify whether this event has really occurred. Looking at the various timing patterns (see figure 3.1 on page 12) gives an impression of the inherent difficulty of estimating which of these donors are still active after August 2006, let alone of building a stochastic parametric model.

But the Pareto/NBD succeeds in solving this dilemma. It uses the same smart technique that the NBD model already uses for modeling individual purchase frequencies (see end of section 4.1.1), and applies this trick to the defection process. In particular, it assumes some sort of individual stochastic dropout process, and at the same time makes assumptions regarding the form of heterogeneity across all customers. Thereby, the information of the complete customer base can be used for modeling the individual customer.

The assumptions of the Pareto/NBD regarding consumer behavior are summarized in table 4.6.¹⁰

A1 While active, the number of transactions follows a Poisson process with rate λ.

A2 Heterogeneity in λ follows a Gamma distribution with shape parameter r and rate parameter α across customers.

A3 Customer lifetime is exponentially distributed with death rate µ.

A4 Heterogeneity in µ follows a Gamma distribution with shape parameter s and rate parameter β across customers.

A5 The purchasing rate λ and the death rate µ are distributed independently of each other.

Table 4.6: Pareto/NBD Assumptions

A1 and A2 are identical to those of the already presented NBD model and hence the same concerns regarding these assumptions apply again (see section 4.1.1).

Assumption A3 now postulates an exponentially distributed lifetime with a certain ‘death’ rate µ for each customer. This assumption is justified by Schmittlein et al. because ‘the events that could trigger death (a move, a financial setback, a lifestyle change, etc.) may arrive in a Poisson manner’ (Schmittlein et al., 1987, p. 3). On the one hand, this seems entirely reasonable. On the other hand, it is also hard to verify because the event of defection is not observable. And even if the event was observable, defection just occurs a single time for a customer and therefore reveals hardly any information on the underlying death rate µ. But by making specific assumptions regarding the distribution of µ across customers (A4) an estimation of the model for the complete customer base becomes feasible. Heterogeneity is again assumed to follow the flexible Gamma distribution, but with two different parameters than for the purchase frequency. And because a Gamma-Exponential mixture results in the Pareto distribution, the overall model is termed Pareto/NBD model.

¹⁰ For consistency reasons the ordering and wording of the assumptions is changed compared to the originating paper in order to ease comparison with the other models presented within this chapter.
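The statement that a Gamma-Exponential mixture results in a Pareto distribution can be checked by simulation. Under assumptions A3 and A4 the share of customers whose lifetime exceeds t is (β/(β + t))^s, which is the survival function of a Pareto distribution of the second kind. The parameter values in the sketch below are illustrative only and are not the estimates obtained for the DMEF data; the function names are our own.

```python
import random


def pareto_survival(t, s, beta):
    """P(lifetime > t) when mu ~ Gamma(shape s, rate beta) and lifetime ~ Exp(mu)."""
    return (beta / (beta + t)) ** s


def simulate_lifetime(s, beta, rng):
    """Draw one lifetime from the Gamma-Exponential mixture."""
    mu = rng.gammavariate(s, 1.0 / beta)     # individual death rate (A4)
    return rng.expovariate(mu)               # exponential lifetime (A3)


rng = random.Random(7)
s, beta = 2.0, 1.0                           # illustrative values, not estimates
n = 100_000
lifetimes = [simulate_lifetime(s, beta, rng) for _ in range(n)]
horizon = 1.0
empirical = sum(life > horizon for life in lifetimes) / n
theoretical = pareto_survival(horizon, s, beta)
```

Although no individual death rate is ever observed, the two heterogeneity parameters s and β suffice to describe the share of the cohort that is still alive at any point in time.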

Finally, assumption A5 requires independence between frequency and lifetime. It is for example assumed that a heavy purchaser has neither a longer nor a shorter life expectancy than less frequent buyers. This assumption is necessary in order to simplify the fairly complex mathematical derivations of the model. Schmittlein et al. provide some reasoning for this assumption and Abe (2008, p. 19) presents some statistical evidence that λ and µ are indeed uncorrelated.

4.2.2 Empirical Results

Again, we will apply the presented model to the DMEF data set and subsequently evaluate its forecasting accuracy.

Several different methods for estimating the four parameters r, α, s and β of our model are available. A two-step estimation method which tries to fit the observed moments is suggested in Schmittlein et al. (1987) and described in detail in Schmittlein and Peterson (1994, appendix A2). Nevertheless, the MLE method seems to be more reliable for a wide range of data constellations. But despite the ongoing increase in computational power, the computational burden of calculating the maximum likelihood estimates is still challenging (Fader et al., 2005a, p. 275). The bottleneck is the evaluation of the Gaussian hypergeometric function, which is part of the likelihood function, and as such needs to be evaluated numerous times for each customer and for each step of the numerical optimization procedure. An efficient and fast implementation of that function is essential to make the estimation procedure complete in reasonable time.¹¹
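To indicate why this function is costly, the following sketch evaluates the Gaussian hypergeometric function by naively summing its defining power series. This is explicitly not the implementation used for the calculations in this thesis. It is only valid for |z| < 1 and becomes slow as |z| approaches 1, which is exactly where specialized routines pay off.

```python
import math


def hyp2f1_series(a, b, c, z, tol=1e-15, max_terms=10_000):
    """Gaussian hypergeometric function 2F1(a, b; c; z) by its power series.

    Only valid for |z| < 1; convergence is slow when |z| is close to 1.
    """
    if abs(z) >= 1.0:
        raise ValueError("the plain series needs |z| < 1")
    term, total = 1.0, 1.0
    for n in range(max_terms):
        # Ratio of consecutive series terms.
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
        total += term
        if abs(term) < tol * abs(total):
            break
    return total


# Sanity check against a known closed form: 2F1(1, 1; 2; z) = -ln(1 - z) / z.
value = hyp2f1_series(1.0, 1.0, 2.0, 0.5)
reference = -math.log(1.0 - 0.5) / 0.5
```

Since every likelihood evaluation calls this function for each customer, even a moderate number of series terms adds up quickly over the iterations of a numerical optimizer.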

Estimating the model parameters requires another piece of information compared to the NBD model, which is the actual timing of the last transaction tx.¹² Schmittlein et al. (1987) prove that tx is sufficient information for the model and that the actual timing of the preceding transactions (t1, .., tx−1) is not required for calculating the likelihood. This is due to the memoryless property of the assumed Poisson process.

The MLE method applied to the DMEF data set results in the following parameter estimates

r = 0.659, α = 514.651, and

s = 0.471, β = 766.603,

with all four parameters being highly significant. The shape parameters for both Gamma distributions (r and s) are well below 1 and therefore the resulting distributions of the purchase rate λ and the death rate µ again have the shape shown in the outer left chart of figure 4.3. The resulting average time between two transactions (α/r) is 781 days with a standard deviation (α/√r) of 634 days and a median of 1,395 days. The corresponding theoretical average lifetime (β/s) across the cohort is 1,629 days (= 4.5 years) with a standard deviation (β/√s) of 1,117 days and a median of 3,785 days (= over 10 years).

Comparing these numbers with the NBD results shows that due to the added defection possibility the average intertransaction time has dropped from 1,048 days to 781 days. In other words, most of the active donors wait over two years until they make another donation. Further, the average donor has a life expectancy of over 4 years, which is nearly as long as the provided time span. These estimates still seem too high in comparison with our findings from the exploratory data analysis. Assessing the theoretical standard deviations, it can further be concluded that the overall extent of heterogeneity is considerably high within the data set. In short, the estimated parameters suggest that we are dealing with a heterogeneous, long-living, rarely donating cohort of donors.

11 Many thanks go to Dr. Hoppe, who provided us with an R wrapper package for the impressively fast Fortran-77 implementation of the Gaussian Hypergeometric function developed by Zhang et al. (1996). See http://jin.ece.uiuc.edu/routines/routines.html for their source code. It was this contribution that made the calculations presented here feasible for us.

12 By convention, t_x is set to 0 if no (re-)purchase has occurred within the time span (0, T].


These conclusions indicate that the fitted model does not take full advantage of the dropout possibility. According to the estimated model, 38.2% of the donors are still active in mid-2006, which is a high number compared to the 18.8% that actually made a donation in 2005 (see figure 3.7). On the other hand, figure 3.9 indicates that there are indeed some donors with intertransaction times of four years and more. Separate calculations, which are not presented here, verified that this rather small group of long-living, ‘hibernating’, ‘always-a-share’ donors has a significant effect on the estimated parameter values. This occurs because the overall model tries to fit the complete cohort, including these outliers.13

But at what point does a customer finally defect? Maybe the postulated concept of activity, in which a customer is either active or lost for good, is too shortsighted and too simple for the data set. Alternative approaches that allow customers to switch back and forth between several states of activity, such as Markov Chain models (cf. Jain and Singh, 2002, p. 39 for an overview), might be more appropriate, especially when we consider the long time span of the observation period.

Figure 4.4 depicts the estimated distributions of the donation frequency λ and of the death rate µ. The axes on top of the charts display the related average intertransaction time and the average lifetime, respectively, both measured in days. The short vertical line segment at the top axis represents the corresponding mean value.

[Figure 4.4, two panels. Distribution of Purchase Frequency (shape = 0.66, rate = 515) and Distribution of Death Rate (shape = 0.47, rate = 767). Bottom axis: rate from 0.000 to 0.020; top axis: corresponding number of days (Inf, 250, 125, 83.3, 62.5, 50).]

Figure 4.4: Estimated Distribution of λ and µ across Donors

13 Nevertheless, for our final chosen model, the CBG/CNBD-k, these outliers no longer posed a relevant problem, and therefore we did not split up the data set in the following.


Despite the lack of plausibility of the estimated parameters, the question that matters most for our purpose is: How well does the Pareto/NBD predict future transactions for the DMEF data set? Did the forecast improve compared to the NBD model, or did we possibly overfit the training data?

For now, we will only reproduce the comparison on an aggregated level in table 4.7. These numbers reveal that for the large share of no-repeaters the Pareto/NBD surprisingly provides inferior results by making overly optimistic forecasts. But for all other groups the model succeeds in providing a much closer fit to the actual transaction counts.

             0      1     2     3     4     5     6     7+
Actual       0.038  0.20  0.43  0.69  0.75  1.06  1.54  2.44
NBD          0.001  0.42  0.84  1.27  1.69  2.11  2.53  4.68
Pareto/NBD   0.102  0.23  0.50  0.71  0.91  1.11  1.32  2.24

Table 4.7: Comparison of Actual vs. Theoretical Average Number of Donations per Donor during the Validation Period

All further assessments of this model’s accuracy are deferred to chapter 5, which provides a detailed comparative analysis of all presented models.

4.3 BG/NBD Model

4.3.1 Assumptions

Eighteen years after the introduction of the Pareto/NBD model, Fader, Hardie, and Lee (2005a) call attention to the discrepancy between the high scientific interest in that model, measured in terms of citations, and the small number of actual implementations. They argue that it is the inherent mathematical complexity and the computational burden of the Pareto/NBD that keep practitioners from applying it to real-world data.

As a solution, Fader et al. introduce an alternative model, which makes a slightly different assumption regarding the dropout process, and term it the Betageometric/NBD (abbr. BG/NBD) model. They succeed in simplifying the key mathematical expressions of the model and further demonstrate that an implementation is nowadays even possible with standard spreadsheet applications, such as MS Excel.14 Further, they show that despite this change in the assumptions, the accuracy of the resulting fit and the individual predictive strength are, for most of the possible scenarios, very similar to the Pareto/NBD results.

A1 While active, the number of transactions follows a Poisson process with rate λ.

A2 Heterogeneity in λ follows a Gamma distribution with shape parameter r and rate parameter α across customers.

A3 Directly after each purchase there is a constant probability p that the customer becomes inactive.

A4 Heterogeneity in p follows a Beta distribution with parameters a and b across customers.

A5 The transaction rate λ and the dropout probability p are distributed independently of each other.

Table 4.8: BG/NBD Assumptions

The assumed behavioral ‘story’ regarding the dropout process is modified by Fader et al. in the respect that an existing customer cannot defect at an arbitrary point in time but only right after a purchase has been made. This modification seems plausible to some extent, because the customer is most likely to have either a positive or a negative experience with the product or service right after the purchase, and this degree of satisfaction will have a strong influence on future purchase decisions.

14 The Microsoft Excel implementation of the BG/NBD model can be downloaded from http://www.brucehardie.com/notes/004/.

Assumption A3 claims that the probability p of such a dropout remains constant throughout an individual customer lifetime. As such, lifetime measured in number of ‘survived’ transactions follows a geometric distribution. This distribution can be seen as the discrete analogue of the continuous exponential distribution, since it is also memoryless. This means that the number of already ‘survived’ transactions does not affect the dropout probability p for the upcoming transaction. This assumption also seems reasonable, since it is possible to find arguments in favor of high early dropout probabilities (e.g. the customer is still trying out the product) as well as high dropout probabilities later on (e.g. the customer becomes tired of a certain product and is more likely to switch to something new).15
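The memoryless property of the geometric lifetime can be illustrated with a few lines of Python. This is only a sketch: the function names are hypothetical, and the value p = 0.11 is chosen purely for illustration.

```python
def survival_prob(p, n):
    """P(customer survives the next n dropout opportunities) = (1 - p)**n."""
    return (1.0 - p) ** n

def conditional_survival(p, k, n):
    """P(survives n more opportunities | already survived k opportunities)."""
    return survival_prob(p, k + n) / survival_prob(p, k)

p = 0.11  # illustrative dropout probability
# Memorylessness: the history k cancels out of the conditional probability.
print(conditional_survival(p, 0, 3), conditional_survival(p, 10, 3))
# Expected number of purchases up to and including the one followed by dropout.
print(1.0 / p)
```

Both conditional probabilities agree up to floating-point error, whatever number of transactions has already been ‘survived’.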

A4 is an assumption regarding the heterogeneity of the dropout probability. As opposed to the death rate µ, the constant dropout probability p is bounded between 0 and 1, and therefore the Beta distribution, which shares the same property, is considered. As can be seen in figure 4.5, this distribution is, like the Gamma distribution, fairly flexible and is defined by two shape parameters. Aside from this flexibility, no particular justification for the Beta distribution is provided. The resulting mixture distribution is generally referred to as the Betageometric distribution (BG).

[Figure 4.5, six panels under the common title ‘Beta Distribution’, each showing the density over the interval from 0.0 to 1.0 with a vertical axis from 0.0 to 2.5, for the parameter pairs (a = 0.5, b = 0.7), (a = 1, b = 3), (a = 2, b = 5), (a = 1, b = 1), (a = 1, b = 1.5) and (a = 1.5, b = 2).]

Figure 4.5: Probability Density Function of the Beta distribution for Different Parameter Values

15 Note that the previously made critical remarks regarding the memoryless property referred to the exponentially distributed intertransaction times and not to the exponentially distributed lifetimes of the Pareto/NBD model.

Assumption A5 requires independence between the dropout probability and the purchase frequency. But attention should be paid to the consequence that the actual lifetime, measured in days and not in number of survived purchases, is no longer independent of the purchase frequency, in contrast to the Pareto/NBD. The more frequently a customer purchases, the more opportunities to defect he/she will have, and because of the independence of p and λ, the sooner that customer will defect (Fader et al., 2005a, p. 278). Interestingly, this fundamentally different consequence of A5 does not seem to play an important role in the overall model accuracy.
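The dependence of the lifetime in days on the purchase frequency can be made visible with a small simulation. The sketch below is an illustration under the BG/NBD assumptions A1 and A3 for a single customer with fixed λ and p; the function names, the seed and the parameter values are made up for this example.

```python
import random

def simulate_lifetime(lam, p, rng):
    """Days until dropout: purchases arrive at rate lam, dropout w.p. p after each."""
    t = 0.0
    while True:
        t += rng.expovariate(lam)  # exponential waiting time until next purchase
        if rng.random() < p:       # dropout opportunity right after the purchase
            return t

def mean_lifetime(lam, p, n=20000, seed=1):
    """Average simulated lifetime in days over n independent runs."""
    rng = random.Random(seed)
    return sum(simulate_lifetime(lam, p, rng) for _ in range(n)) / n

p = 0.2
slow = mean_lifetime(lam=1 / 400, p=p)  # one purchase every 400 days on average
fast = mean_lifetime(lam=1 / 100, p=p)  # one purchase every 100 days on average
print(slow, fast)  # theory: 1 / (lam * p), i.e. about 2000 and 500 days
```

With the same dropout probability, the customer who purchases four times as often has an expected lifetime that is four times shorter.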

4.3.2 Empirical Results

The implementation of the BG/NBD model on top of R has indeed been fairly straightforward, in particular because of the MATLAB source code provided in Fader et al. (2005b), which simply had to be ‘translated’ from one statistical programming environment to another. The maximum likelihood estimation itself also finishes far faster than for the Pareto/NBD, because the Gaussian Hypergeometric function is no longer part of the optimized likelihood function.16

The MLE method produced the following parameter estimates:

r = 0.397, α = 331.8, and

a = 0.777, b = 6.262.

In accordance with the statements of Fader et al. (2005a), the overall characteristic of the distribution of the transaction frequency λ across donors is not much different from the Pareto/NBD model. For our estimated BG/NBD model, the corresponding mean intertransaction time (α/r) is slightly higher (836 days) and the standard deviation (α/√r) slightly lower (527 days).

The dropout probability p varies around its mean a/(a + b) of 11%. These 11% correspond to an average lifetime of 9.1 ‘survived’ donations. Considering that the average number of donations has been 1.55, the underlying data again seems to be represented rather poorly. Further, figure 4.6 depicts the estimated distributions of λ and p and reveals that hardly any of the donors has a lifetime of less than 5 donations. Again, this result is quite contrary to our findings from the exploratory analysis in chapter 3. It is likely that the same concerns regarding the problematic long-living customers that have already been raised in section 4.2 apply here too.

Additionally, the simulation results of Fader et al. (2005a, p. 279) show that the BG/NBD model has problems mimicking the Pareto/NBD model if the transaction rate is very low, as is the case for the DMEF data set. The next section presents a variant of the BG/NBD which fortunately resolves this issue.

16 It took about 15 seconds on the author’s personal laptop, which is powered by an Intel Centrino 1.6GHz chip, to complete the calculations for the DMEF data set of 21,166 donors.

[Figure 4.6, two panels. Distribution of Purchase Frequency (shape = 0.4, rate = 332): bottom axis from 0.000 to 0.020, top axis Inf, 250, 125, 83.3, 62.5, 50. Distribution of Drop Out Probability (a = 0.77, b = 6.26): bottom axis from 0.0 to 1.0, top axis Inf, 5, 2.5, 1.7, 1.2, 1.]

Figure 4.6: Estimated Distribution of λ and p across Donors

4.4 CBG/NBD Model

4.4.1 Assumptions

The CBG/NBD is a modified variant of the BG/NBD model and has been developed by Daniel Hoppe and Udo Wagner (Hoppe and Wagner, 2007). This variant makes similar assumptions as before but inserts an additional dropout opportunity at time zero. By doing so, it resolves the rather unrealistic implication of the BG/NBD model that all customers who have not (re-)purchased at all after time zero are still active. Hoppe and Wagner also show that their modification results in a slightly better fit to the publicly available CDNOW data set, which had already been used by Fader et al. (2005a) as a benchmark.

Aside from providing this new variant of the BG/NBD, Hoppe and Wagner contribute valuable insight by deriving their key mathematical expressions from counting processes instead of timing processes, which reduces the inherent complexity of the derivations significantly. For this reason, the article by Hoppe and Wagner (2007) is highly recommended reading, also in terms of gaining a deeper understanding of the BG/NBD model.

Around the same time as Hoppe and Wagner worked on their model, Batislam, Denizel, and Filiztekin developed the same modification of the BG/NBD and termed it MBG/NBD (Batislam et al., 2007), where the letter M stands for modified. Within this thesis we choose to use the abbreviation CBG/NBD instead of MBG/NBD when we refer to this variant, because the term CBG carries a deeper meaning: it abbreviates the central variant of the Betageometric distribution.

A1 While active, the number of transactions follows a Poisson process with rate λ.

A2 Heterogeneity in λ follows a Gamma distribution with shape parameter r and rate parameter α across customers.

A3 At time zero and directly after each purchase there is a constant probability p that the customer becomes inactive.

A4 Heterogeneity in p follows a Beta distribution with parameters a and b across customers.

A5 The transaction rate λ and the dropout probability p are distributed independently of each other.

Table 4.9: CBG/NBD Assumptions

As can be seen in table 4.9, assumptions A1, A2, A4, and A5 are identical to the corresponding assumptions of the BG/NBD model. Only assumption A3 is slightly modified. It now allows for the aforementioned immediate defection of a customer at time zero. The same constant probability p is used for this additional dropout opportunity.

4.4.2 Empirical Results

The BG/NBD assumptions imply that all single-time donors, who represent the majority of the data set, are still ‘active’ despite an inactivity period of over 4.5 years. Taking this implausible implication into account, it can be expected that the added dropout opportunity of the CBG/NBD model is necessary to fit our data structure appropriately.


Our implementation on top of R results in the following parameter estimates:

r = 1.113, α = 552.5, and

a = 0.385, b = 0.668.

The related estimated distributions of λ and p are shown in figure 4.7.

[Figure 4.7, two panels. Distribution of Purchase Frequency (shape = 1.11, rate = 552): bottom axis from 0.000 to 0.020, top axis Inf, 250, 125, 83.3, 62.5, 50. Distribution of Drop Out Probability (a = 0.38, b = 0.67): bottom axis from 0.0 to 1.0, top axis Inf, 5, 2.5, 1.7, 1.2, 1.]

Figure 4.7: Estimated Distribution of λ and p across Donors

Comparing this with figure 4.6 from the previous section, we notice the fundamentally different shape of the distribution of the dropout probability. It has one peak at 1, representing the single-time donors, and one peak at 0, representing those loyal, long-living donors who hardly defect at all. The mean number of repeat donations is now 2.7, which seems much more realistic than the estimate of 9.1 donations made by the BG/NBD model. On the other hand, the detected level of heterogeneity in lifetime, measured in terms of the standard deviation of p, increased from 0.11 to 0.34 for the CBG/NBD model.
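The quoted standard deviations of p and the mean lifetimes in donations follow from the standard moments of the Beta distribution. A minimal Python sketch, assuming that the lifetime figures in the text are the reciprocal of the mean dropout probability; the helper name `beta_mean_sd` is hypothetical.

```python
import math

def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)

bg_mean, bg_sd = beta_mean_sd(0.777, 6.262)    # BG/NBD estimates
cbg_mean, cbg_sd = beta_mean_sd(0.385, 0.668)  # CBG/NBD estimates

print(round(bg_sd, 2), round(cbg_sd, 2))              # sd of p per model
print(round(1 / bg_mean, 1), round(1 / cbg_mean, 1))  # 1 / E[p], in donations
```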

Further, the average intertransaction time has dropped from 836 to 496 days, with the standard deviation remaining at the high level of 524 days. This is a logical effect, since the single-time donors are now allowed to defect immediately and no longer bias the donation frequency. The same consequence, a higher mean purchase rate together with a higher dropout probability, has been diagnosed by Hoppe and Wagner (2007) for the CDNOW data set.

If we look at the estimates for the number of active donors at the end of the observation period, then the difference between these models becomes even more apparent. The Pareto/NBD states that 38.2% of the donors are active,17 the CBG/NBD produces a similar estimate of 34.4%, whereas the BG/NBD18 assumes that 94.7% (!) have still not defected in mid-2006.

After having analyzed the estimated parameters and their implications, we find that the CBG/NBD model is better able to explain the characteristics of the DMEF data set than the BG/NBD model. As such, the relatively new CBG/NBD model seems to be a valuable contribution to the domain of stochastic purchase models.

17 We consider a donor to be active if her conditional probability of being active is higher than 0.5.

18 A mathematical expression for the probability of a customer being active for the BG/NBD model is given in Hoppe and Wagner (2008, section 4).


Chapter 5

Model Comparison

This chapter provides an in-depth analysis of the performance of the previously presented models regarding the DMEF data set. First, we will assess the fit of these statistical models, and second, determine their forecast accuracies.1 Our ultimate aim is to identify the model which will most likely provide us with a minimal mean squared logarithmic error for the target period of the contest.

5.1 Parameter Interpretation

Table 5.1 provides a condensed overview of the calculated parameter estimates, together with their standard errors.2 All of the estimated parameters are highly significantly different from zero.

Since these values are just specific parameters of the assumed heterogeneity distributions, namely of the Gamma and the Beta distribution, a display of the key statistical moments of these distributions is essential for interpreting the results. Table 5.2 displays the distribution of average lifetimes,3 and table 5.3 the distribution of the mean intertransaction times across the cohort for each model. As has already been stated in section 4.4, the CBG/NBD model seems to be the only model that results in plausible parameter estimates which do not conflict with our findings from the exploratory data analysis phase.

1 Generally speaking, a good fit to the data does not automatically guarantee an ability to extrapolate to new data, i.e. to make forecasts into the future.

2 The standard error is returned by the MLE implementation mle2, which is part of the R package bbmle (Bolker, 2008).

3 sd abbreviates standard deviation, d stands for days and t for number of transactions.

          NBD           Pareto/NBD     BG/NBD        CBG/NBD
r (se)    0.48 (0.01)   0.66 (0.01)    0.40 (0.01)   1.11 (0.05)
α (se)    499 (10)      515 (11)       332 (8)       552 (19)
s (se)                  0.471 (0.03)
β (se)                  767 (69)
a (se)                                 0.78 (0.10)   0.38 (0.02)
b (se)                                 6.26 (1.00)   0.67 (0.04)

Table 5.1: Estimated Model Parameters

             mean      median    sd
NBD          ∞         ∞         -
Pareto/NBD   1,629 d   3,785 d   1,117 d
BG/NBD       9.1 t     13.3 t    9.0 t
CBG/NBD      2.7 t     3.8 t     3.0 t

Table 5.2: Statistical Summary of Fitted Life Times

             mean      median    sd
NBD          1,048 d   2,413 d   723 d
Pareto/NBD   781 d     1,395 d   634 d
BG/NBD       836 d     2,324 d   527 d
CBG/NBD      496 d     688 d     523 d

Table 5.3: Statistical Summary of Fitted Intertransaction Times

5.2 Data Fit

The models’ abilities to explain the observed transaction patterns are the subject of this section. This has already been done partially in chapter 4, but we will now provide a complete side-by-side comparison of all four models to gain an accurate overview.

Table 5.4 groups the cohort of 21,166 donors according to their number of transactions within the complete observed training period of over 4.5 years. The actual size of each of these groups is compared to the expected sizes that have been calculated by the distinct models. The closer these numbers are to the actual count data, the better the model fit, at least on an aggregated level.

             0        1       2       3       4       5     6     7+
Actual       10,626   3,579   2,285   1,612   1,336   548   348   832
NBD          10,617   3,865   2,183   1,379   918     629   439   1,135
Pareto/NBD   10,642   3,933   2,173   1,358   899     615   430   1,114
BG/NBD       10,461   4,248   2,231   1,338   858     574   395   1,060
CBG/NBD      10,647   3,939   2,186   1,368   905     617   429   1,075

Table 5.4: Comparison of Actual vs. Expected Count Data for the Complete Time Span

A first look at these numbers reveals that all models, with the exception of the BG/NBD model, fit the share of single-time donors nearly perfectly. Other than that, a fairly big mismatch regarding the other groups can be detected for all of the models. Interestingly, all models display a bias in the same direction. The number of donors that re-donate exactly once (1) and the number of frequent donors (5+) are overestimated by all models, whereas the remaining groups (2, 3, 4) are underestimated.

[Figure 5.1: chart ‘Actual vs Fitted Frequency of Repeat Transactions’. Horizontal axis: # Transactions (0, 1, 2, 3, 4, 5, 6, 7+); vertical axis: Frequency (0 to 10,000). Series: Observed, NBD, Pareto/NBD, BG/NBD, CBG/NBD. Reported statistics: χ² NBD = 366.1, χ² Pareto/NBD = 391.5, χ² BG/NBD = 487.2, χ² CBG/NBD = 363.7.]

Figure 5.1: Fitted Distributions

There are several possible causes for this phenomenon. Probably the most apparent one is that the actual group sizes do not decrease gradually. The drop in group size from 3 to 4 is only 17%, but from 4 to 5 it is 59%. The overly large number of people who donated 4 times can be explained by the existence of regular, yearly donors (see section 3.4) in combination with an overall time period of 4.66 years. And because none of the models accounts for any kind of regularity, none of them is capable of fitting this deviation.

Figure 5.1, which resembles figure 2 of Fader et al. (2005a, p. 281), visualizes the bias of our four models. Additionally, the chart includes the calculated χ² statistics, which can act as a measure of the fit to the actual distribution. According to the ranking of these values, the CBG/NBD model provides the best fit. To our surprise, though, the much simpler NBD model performs nearly as well as the CBG/NBD and clearly outperforms the Pareto/NBD and the BG/NBD models.
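The χ² values can be approximately recomputed from the group sizes in table 5.4. The following Python sketch assumes that the reported figure is the Pearson statistic over the eight groups; small differences to the reported values are to be expected because the expected counts in the table are rounded.

```python
def pearson_chi2(observed, expected):
    """Pearson statistic: sum over groups of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Group sizes for 0, 1, ..., 6, 7+ transactions, copied from table 5.4.
actual = [10626, 3579, 2285, 1612, 1336, 548, 348, 832]
fitted = {
    "NBD":        [10617, 3865, 2183, 1379, 918, 629, 439, 1135],
    "Pareto/NBD": [10642, 3933, 2173, 1358, 899, 615, 430, 1114],
    "BG/NBD":     [10461, 4248, 2231, 1338, 858, 574, 395, 1060],
    "CBG/NBD":    [10647, 3939, 2186, 1368, 905, 617, 429, 1075],
}
chi2 = {name: pearson_chi2(actual, exp) for name, exp in fitted.items()}
for name, value in chi2.items():
    print(f"{name:12s} {value:7.1f}")
```

The largest single contribution in every model comes from the group with 4 transactions, which is the group discussed above.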

Another assessment of the overall data fit can be made by comparing the calculated loglikelihood (abbr. LL) values. The higher this value, the better the model approximates the data. This method has the advantage that it operates on an individual and not on an aggregated level. Table 5.5 displays the results of this comparison.

Rank   Model        LL
I.     Pareto/NBD   -245,674.2
II.    CBG/NBD      -245,702.2
III.   BG/NBD       -245,833.0
IV.    NBD          -246,552.5

Table 5.5: Comparison of Calculated Loglikelihood Values

According to this measure, the ranking of the models differs from the previous one. The Pareto/NBD and the CBG/NBD show the best performance, whereas the BG/NBD is slightly behind and the NBD model finishes last in explaining the DMEF data.

5.3 Forecast Accuracy

In order to compare the forecast accuracy of several models, we need to split our data set into a calibration period and a validation period. The former is used to estimate the model parameters, and the latter is necessary for assessing the difference between the predicted and the actual values. Unless stated otherwise, we will choose a calibration period of 3.5 years and a validation period of 1 year in the following. In section 5.3.4 we will select different time splits in order to test the stability of our findings.

[Figure 5.2: timeline ‘Time Split’ over the years 2002 to 2006, divided into a Calibration Period followed by a Validation Period.]

Figure 5.2: Default Time Split

As there is no single ‘best’ method to assess the forecast accuracy, several different techniques and measures are presented. Ultimately, the error measure defined by the DMEF contest committee will be our decision criterion for the final submitted model.

5.3.1 Cumulative Repeat Transactions

One of the basic managerial questions stated in the introductory section of this thesis is ‘How many transactions can I expect from my clientele in the future?’. Although the strength of the investigated models lies in modeling on an individual level, the cumulative numbers are also expected to provide a good estimate of the overall transaction volume.

Table 5.6 provides a comparison of these numbers for each model and shows that on an aggregated level the CBG/NBD performs best but is still considerably off from the actual value (18.7% deviation). The next section will give some insight into the cause of this misfit of the cumulative estimate.

Actual   NBD      Pareto/NBD   BG/NBD   CBG/NBD
6,047    11,088   7,351        7,219    7,179
         +83.4%   +21.6%       +19.4%   +18.7%

Table 5.6: Comparison of Number of Overall Transactions Within Validation Period
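The percentages in table 5.6 are the relative deviations of the predicted totals from the actual total. A short Python sketch of this arithmetic, using the totals from the table; the variable names are chosen for this example only.

```python
actual_total = 6047  # actual number of transactions in the validation period
predicted_total = {"NBD": 11088, "Pareto/NBD": 7351, "BG/NBD": 7219, "CBG/NBD": 7179}

# Relative deviation of each forecast from the actual total, in percent.
deviation = {model: 100.0 * (value - actual_total) / actual_total
             for model, value in predicted_total.items()}
for model, pct in deviation.items():
    print(f"{model:12s} {pct:+.1f}%")
```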


5.3.2 Grouped by Transaction Count

Along the lines of Fader et al. (2005a, p. 281), a visualization of the conditional expectations is provided in figure 5.3, together with the associated data table 5.7. The cohort is grouped by the number of transactions during the 3.5 year calibration period. Thereafter, the average predicted number of transactions during the validation period is compared to the actual average number for each group. The closer the estimates are, the better the forecast ability of the model.

             0        1       2       3       4      5      6      7+
Actual       0.04     0.20    0.43    0.69    0.75   1.06   1.54   2.44
NBD          0.001    0.42    0.84    1.27    1.69   2.11   2.53   4.68
Pareto/NBD   0.10     0.29    0.50    0.71    0.91   1.11   1.32   2.24
BG/NBD       0.11     0.29    0.49    0.69    0.87   1.05   1.25   2.04
CBG/NBD      0.11     0.29    0.48    0.68    0.87   1.05   1.26   2.13
Group Size   10,988   3,910   2,683   1,730   731    392    239    493

Table 5.7: Comparison of Actual vs. Predicted avg. Number of Donations During the Validation Period

Despite the surprisingly good data fit of the NBD model that we observed in section 5.2, the model is not able to extrapolate into the future. Due to the lack of a defection process, the NBD model simply assumes that the past transaction frequencies can be applied to the future, and therefore the number of transactions is tremendously overestimated.

All the other models provide considerably better and quite similar4 results.

4 For this reason we do not reproduce the BG/NBD model within figure 5.3, as it would clutter the chart. As can be seen from the data table, its numbers are within close range of the Pareto/NBD and the CBG/NBD model.

Surprisingly, the deviations of all models again point in the same direction. This is a strong indicator of an underlying systematic mechanism that has not been taken into consideration by any of these models. First of all, the transactions of the large group of donors with no repeat donations are overestimated by a factor of nearly 3, i.e. 0.11 expected transactions versus 0.04 actual transactions. This presumably indicates that the defection process has not been modeled correctly, and that too many donors are still being considered active although they defected a long time ago. On the other hand, the number of transactions of the frequent donors (6+) is predicted 10% to 20% too low, indicating that the underestimated defection process goes hand in hand with an underestimated transaction frequency. This is the very same bias that we have already identified in the previous section.

[Figure 5.3: chart ‘Conditional Expectation of Future Transactions’. Horizontal axis: # Transactions in Training Period (0, 1, 2, 3, 4, 5, 6, 7+); vertical axis: Avg # Transactions in Validation Period (0 to 5). Series: Actual, NBD, Pareto/NBD, CBG/NBD.]

Figure 5.3: Conditional Expectations

Furthermore, the solid line representing the observed data in figure 5.3 reveals an unexpected slight bend at groups 3 and 4. The average number of future transactions for the group that donated 3 times seems slightly higher than expected, and that of group 4 slightly lower. A possible explanation might again lie in the detected regularity within the donation behavior. A person who consistently donates once per year will most likely fall into group 3. This is due to the chosen calibration length of 3.5 years in combination with the observed seasonality (see section 3.3), as the strong fourth quarter starts shortly after the calibration period ends. And such a regular donor will, unless he/she defects, make exactly one donation within the following year. On the other hand, someone who donated 4 times is probably not such a regular yearly donor but rather had a higher transaction frequency. If we additionally make the plausible assumption that an irregular donor is more likely to defect sooner rather than later, i.e. there is a negative correlation between regularity and dropout probability, then this slight bend in the curve is a logical consequence of the chosen time frame and the observed regularity.


5.3.3 Individual Level Forecasts

All presented models are capable of making conditional estimates for each of the 21,166 donors based upon their individual past transaction records. But for each one of them the estimates will likely deviate from the actual value to some extent. The question is, how do we aggregate these individual errors into a single overall figure?

Several measures are common in the referenced papers, each with its particular advantages. Probably the most basic form is the mean absolute error (abbr. MAE). The root mean squared error (abbr. RMSE), which takes the square root of the average of the squared individual errors, is also fairly simple and thus similarly popular. The main drawback of the RMSE is that it puts a strong emphasis on the proper fit of all data points, including any potential single outliers. Minimizing the RMSE therefore commonly results in a mediocre fit, because it is sensitive to these outliers and does not focus on the dominant patterns of the data. The median of squared errors (see for example Wubben and von Wangenheim, 2008, p. 88) resolves this issue and is robust to such outliers. Fader et al. (2005a, p. 282) suggested the correlation between estimated and actual data as a performance measure. The correlation measures the linear relation between two variables, and as such only provides information on whether two variables change in unison but not on whether their values are actually close together. Hoppe and Wagner (2007, p. 85) used the geometric mean relative absolute error (GMRAE) to evaluate different models. The GMRAE is a relative measure which compares a model with some other particular benchmark model. In their article the NBD model acted as such a benchmark.
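To make the differences between these measures tangible, the following Python sketch implements four of them with the standard library only. The function names and the toy data are made up for illustration and are not taken from the thesis; the GMRAE is omitted because it additionally requires the forecasts of a benchmark model.

```python
import math
from statistics import mean, median

def mae(actual, predicted):
    """Mean absolute error."""
    return mean(abs(p - a) for a, p in zip(actual, predicted))

def rmse(actual, predicted):
    """Root of the mean squared error."""
    return math.sqrt(mean((p - a) ** 2 for a, p in zip(actual, predicted)))

def median_squared_error(actual, predicted):
    """Median of the squared errors, robust to single outliers."""
    return median((p - a) ** 2 for a, p in zip(actual, predicted))

def correlation(actual, predicted):
    """Pearson correlation, written out to avoid version-specific helpers."""
    ma, mp = mean(actual), mean(predicted)
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    va = sum((a - ma) ** 2 for a in actual)
    vp = sum((p - mp) ** 2 for p in predicted)
    return cov / math.sqrt(va * vp)

# Made-up toy data: five donors, the last one is an outlier.
y = [0, 0, 1, 2, 10]
y_hat = [0.1, 0.2, 0.8, 1.5, 4.0]
print(mae(y, y_hat), rmse(y, y_hat), median_squared_error(y, y_hat), correlation(y, y_hat))
```

In this toy example the single outlier dominates the RMSE, leaves the median of squared errors untouched, and the correlation stays high even though the outlier is forecast far too low.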

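Nevertheless, the contest committee decided to use the mean squared logarithmic error (MSLE), which has been defined as follows, with ŷ_i denoting the estimate and y_i the actual value for donor i:5

\[
\begin{aligned}
\mathrm{MSLE} &= \sum_i \bigl(\log(\hat{y}_i + 1) - \log(y_i + 1)\bigr)^2 / 21{,}166 \qquad (5.1) \\
&= \sum_i \Bigl(\log\frac{\hat{y}_i + 1}{y_i + 1}\Bigr)^2 / 21{,}166 .
\end{aligned}
\]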

The MSLE takes the square of the logarithm of the relative error, as opposed to the absolute error. As such it puts much more emphasis on the accurate estimate of the dominant group of donors with low transaction volumes, and is less sensitive to large values.

⁵ Note that y denotes the actual donation amount in dollars. For now we assume that each transaction has the same amount of $1, and use this error measure also to assess the forecasting accuracy regarding the number of transactions.
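The different emphasis of the RMSE and the MSLE can be illustrated with a minimal sketch. The function names and the toy data are assumptions made for this illustration and are not taken from the contest data: one forecast is exact for the small donors but misses a single heavy donor, the other is exact for the heavy donor but off by one for every small donor.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def msle(actual, predicted):
    """Mean squared logarithmic error as in equation 5.1."""
    return sum((math.log(p + 1) - math.log(a + 1)) ** 2
               for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical toy data: five low-volume donors and one heavy donor.
actual = [0, 0, 1, 1, 2, 20]
good_on_small = [0, 0, 1, 1, 2, 10]   # exact for small donors, misses the heavy donor by 10
good_on_large = [1, 1, 2, 2, 3, 20]   # exact for the heavy donor, off by 1 elsewhere

for name, forecast in (("good_on_small", good_on_small), ("good_on_large", good_on_large)):
    print(name, mae(actual, forecast), rmse(actual, forecast), msle(actual, forecast))
```

In this toy case the RMSE prefers the second forecast, whereas the MSLE prefers the first one, which matches the emphasis on low-volume donors described above.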

In a separate simulation study, which generated artificial transaction records according to the assumptions of the BG/NBD model, we could show that the MSLE measure favors forecasts that systematically underestimate. In particular, the MSLE could be lowered by another 5% simply by reducing the individual estimates by 25%. This is a quite surprising result, especially as we know the exact data-generating mechanism in this simulation and can therefore exclude any systematic error. The same effect can also be identified for the estimates of the DMEF data set. Therefore, we take advantage of this finding and try to determine an optimal multiplication factor for our estimates in order to further minimize the MSLE.

One possible explanation for this effect might lie in the following numerical example: If there is a 50% chance of $y = 0$ transactions and a 50% chance of $y = 1$ transaction occurring, then the naive guess $\hat{y}$ for the outcome would naturally be $\hat{y} = 0.5 \cdot 0 + 0.5 \cdot 1 = 0.5$. This estimate also minimizes the expected RMSE. But, as can be shown by taking derivatives, the expected MSLE is minimal for $\hat{y} = \sqrt{2} - 1 = 0.414$, i.e. for a 17% lower estimate! For the competition we tried to take advantage of this particular characteristic of the MSLE, and applied an ‘optimal’ factor to our estimates (see section 6.4.2 for the final model).
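The numerical example can be checked with a short sketch. The function names and the grid resolution are assumptions made for this illustration. The sketch scans candidate guesses for the two-point outcome distribution and reports which guess minimizes the expected squared error and which one minimizes the expected squared logarithmic error. Analytically, the latter minimizer is the geometric mean of (y + 1) minus one.

```python
import math

def expected_squared_error(guess, outcomes):
    """Expected squared error of one guess over (value, probability) pairs."""
    return sum(p * (guess - y) ** 2 for y, p in outcomes)

def expected_squared_log_error(guess, outcomes):
    """Expected squared logarithmic error of one guess over (value, probability) pairs."""
    return sum(p * (math.log(guess + 1) - math.log(y + 1)) ** 2 for y, p in outcomes)

# 50% chance of 0 transactions, 50% chance of 1 transaction
outcomes = [(0, 0.5), (1, 0.5)]

# Candidate guesses 0.000, 0.001, ..., 1.000 (grid resolution is an arbitrary choice)
grid = [i / 1000 for i in range(1001)]

best_for_mse = min(grid, key=lambda g: expected_squared_error(g, outcomes))
best_for_msle = min(grid, key=lambda g: expected_squared_log_error(g, outcomes))

print("minimizer of expected squared error:", best_for_mse)
print("minimizer of expected squared log error:", best_for_msle)
print("analytical value sqrt(2) - 1:", math.sqrt(2) - 1)
```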

Table 5.8 provides a condensed overview of various error measures for the four presented models. The result of the best model regarding a specific measure is printed in bold figures. MSLEopt denotes the ‘optimal’ MSLE that can be achieved by applying a multiplication factor (ratio) to the calculated estimates. The optimal ratio is found a posteriori by simply calculating the MSLE for all ratios within the range (0, 2) at a precision of two decimal places.

| Model | MSLE | RMSE | MAE | Corr | MSLEopt (ratio) |
|---|---|---|---|---|---|
| NBD | 0.1587 | 0.849 | 0.415 | 0.597 | 0.0901 (0.37) |
| Pareto/NBD | 0.0977 | 0.653 | 0.359 | 0.628 | 0.0879 (0.66) |
| BG/NBD | 0.0963 | 0.651 | 0.362 | 0.640 | 0.0880 (0.68) |
| CBG/NBD | 0.0959 | 0.650 | 0.360 | 0.639 | 0.0878 (0.68) |

Table 5.8: Error Measures on Individual Level
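The a posteriori search for the optimal ratio can be sketched as a plain grid scan. This is a minimal illustration under stated assumptions: the function names and the toy data are hypothetical, and the grid 0.01, 0.02, ..., 1.99 is one reading of "all ratios within the range (0, 2) at a precision of two decimal places".

```python
import math

def msle(actual, predicted):
    """Mean squared logarithmic error as in equation 5.1."""
    return sum((math.log(p + 1) - math.log(a + 1)) ** 2
               for a, p in zip(actual, predicted)) / len(actual)

def optimal_ratio(actual, predicted):
    """Scan the ratios 0.01, 0.02, ..., 1.99 and return (best_ratio, best_msle)."""
    best_ratio, best_score = None, None
    for step in range(1, 200):
        ratio = step / 100
        score = msle(actual, [ratio * p for p in predicted])
        if best_score is None or score < best_score:
            best_ratio, best_score = ratio, score
    return best_ratio, best_score

# Hypothetical toy data: many donors without transactions, few with one or two.
actual = [0, 0, 0, 1, 0, 2, 0, 1]
predicted = [0.4, 0.3, 0.5, 0.9, 0.2, 1.6, 0.6, 0.8]

ratio, score = optimal_ratio(actual, predicted)
print("unscaled MSLE:", msle(actual, predicted))
print("optimal ratio:", ratio, "MSLEopt:", score)
```

Because the ratio is chosen with knowledge of the actual values, MSLEopt is a lower bound on what a scaling factor could achieve rather than an out-of-sample result.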

The table provides several insights. First of all, we have different rankings for different error measures. There is no single overall best model for the data set at hand. Regarding the MSLE and the RMSE, the CBG/NBD model performs best. But surprisingly, despite its puzzling parameter values, the BG/NBD performs only marginally worse with respect to the MSLE, and even outperforms all other models in terms of correlation. By multiplying our results by 0.68 we can further reduce the MSLE by 8%. All in all, the BG/NBD and the CBG/NBD produce very similar estimates.⁶ But since the CBG/NBD results in far more plausible parameter estimates, our top choice for the DMEF competition would currently be the CBG/NBD model (combined with a multiplication factor of 0.68).

5.3.4 Robustness

For the DMEF contest we will ultimately need to calibrate our model based upon the full length of 4.66 years and forecast the following target period of 2 years. In order to gain some confidence regarding our findings from the preceding section, we will now try out several different time splits. Table 5.9 contains the results on an individual level as well as on an aggregate level (SUM) for time splits of ‘3 years to 1 year’, ‘2.5 years to 1 year’ and ‘2.5 years to 2 years’. We only consider validation periods whose lengths are multiples of one year in order to diminish problems arising from the strong seasonal influence that has already been noted in section 3.3.

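| Time split | Model | MSLE | RMSE | MAE | Corr | MSLEopt (ratio) | SUM |
|---|---|---|---|---|---|---|---|
| 3 Years Calibration, 1 Year Validation | Pareto/NBD | 0.1132 | 0.648 | 0.409 | 0.606 | 0.0969 (0.59) | +35% |
| 3 Years Calibration, 1 Year Validation | BG/NBD | 0.1095 | 0.636 | 0.398 | 0.626 | 0.0949 (0.61) | +31% |
| 3 Years Calibration, 1 Year Validation | CBG/NBD | 0.1096 | 0.636 | 0.398 | 0.625 | 0.0949 (0.61) | +31% |
| 2.5 Years Calibration, 1 Year Validation | Pareto/NBD | 0.1157 | 0.672 | 0.425 | 0.610 | 0.1053 (0.67) | +19% |
| 2.5 Years Calibration, 1 Year Validation | BG/NBD | 0.1157 | 0.671 | 0.425 | 0.613 | 0.1053 (0.67) | +19% |
| 2.5 Years Calibration, 1 Year Validation | CBG/NBD | 0.1160 | 0.672 | 0.426 | 0.610 | 0.1055 (0.67) | +20% |
| 2.5 Years Calibration, 2 Year Validation | Pareto/NBD | 0.2319 | 1.189 | 0.740 | 0.622 | 0.1879 (0.56) | +28% |
| 2.5 Years Calibration, 2 Year Validation | BG/NBD | 0.2323 | 1.187 | 0.741 | 0.625 | 0.1880 (0.56) | +28% |
| 2.5 Years Calibration, 2 Year Validation | CBG/NBD | 0.2331 | 1.190 | 0.742 | 0.622 | 0.1882 (0.56) | +29% |

Table 5.9: Error Measures for several Time Splits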

Again, the results show neither a clear winner nor a clear loser. For all scenarios, the optimal adjustment factor is somewhere between 0.56 and 0.67, and in all cases it improves the MSLE substantially.

⁶ The correlation between these two estimates is actually 0.998.

5.4 Simple Forecast Benchmarks

So far, we have obtained an impression of the comparative performance of the presented stochastic models. But how good are these models really?

This section will benchmark the models against a very simple heuristic estimate and also against a basic linear regression model. As we will see, the results give a rather disillusioning answer to this question.

A basic heuristic estimate is to assign each donor the same number of transactions for the following year as in the preceding year, and to adjust this estimate by a factor that corresponds to the decrease in contact costs. Figure 3.8 from the exploratory data analysis shows that the contact costs decreased by 33% within the validation period.⁷

Additionally, we calibrate a linear regression model, which regresses the number of future transactions on the past number of transactions and on its interaction with recency ($= T - t_x$; see section 4.1.2). For this purpose we have to further split our calibration period of 3.5 years into a 2.5 year period for the input data and a 1 year period for the response variable. This yields the following model:

$$\hat{y} = 0.112 + 0.364 \cdot x - 0.0005 \cdot x \cdot (T - t_x),$$

where x denotes the number of transactions within the previous 2.5 years, and $\hat{y}$ is the estimated number of transactions for the following year. For those donors who did not donate at all within the past 2.5 years, we further assumed that they have defected and as such will not donate again.
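Both benchmarks are simple enough to be written down in a few lines. The following is a minimal sketch, not the original implementation: the function and argument names are hypothetical, the coefficients are the ones stated above, and the unit of the recency term is assumed to be the time unit in which the model was calibrated (the text does not restate it here).

```python
def heuristic_forecast(transactions_last_year, cost_decrease=0.33):
    """Same number of transactions as in the preceding year, scaled down by
    the decrease in contact costs (33% according to figure 3.8)."""
    return transactions_last_year * (1.0 - cost_decrease)

def linear_model_forecast(x, recency):
    """Fitted regression from the text.

    x       -- number of transactions within the previous 2.5 years
    recency -- T - t_x, in the time unit used for calibration (assumption)
    """
    if x == 0:
        # Donors without any transaction are assumed to have defected.
        return 0.0
    return 0.112 + 0.364 * x - 0.0005 * x * recency

# Hypothetical donors: (transactions last year, transactions in 2.5 years, recency)
for last_year, x, recency in [(0, 0, 900), (1, 2, 100), (4, 10, 20)]:
    print(heuristic_forecast(last_year), linear_model_forecast(x, recency))
```

Note that the regression can return negative values for large recency. The text does not say how such cases were handled, so the sketch leaves them untouched.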

Table 5.10 contains the surprising results for these very simple models. The linear model performs better than all other models regarding the MSLE, the RMSE, the correlation and the optimized MSLE. Our heuristic is also able to beat the Pareto/NBD model, at least regarding the MSLE and the MAE measure. However, the good performance of the heuristic is likely a result of the vastly underestimated transaction numbers and not of a good explanation of the overall data structure, as can be seen from the corresponding correlation and MSLEopt values.

⁷ We assume that managers can assess their contact costs a year ahead, and can therefore use this information for their managerial heuristic. But even if we did not know the exact decrease, we could have guessed that the downward trend in contact costs would continue.

| Model | MSLE | RMSE | MAE | Corr | MSLEopt (ratio) | SUM |
|---|---|---|---|---|---|---|
| Heuristic | 0.0962 | 0.661 | 0.258 | 0.615 | 0.0909 (0.70) | -22% |
| LM | 0.0863 | 0.642 | 0.262 | 0.644 | 0.0861 (0.93) | -31% |
| Pareto/NBD | 0.0977 | 0.653 | 0.359 | 0.628 | 0.0879 (0.66) | +22% |
| BG/NBD | 0.0963 | 0.651 | 0.362 | 0.640 | 0.0880 (0.68) | +19% |
| CBG/NBD | 0.0959 | 0.650 | 0.360 | 0.639 | 0.0878 (0.68) | +19% |

Table 5.10: Error Measures for Benchmark Models

Wubben and von Wangenheim (2008) recently published an interesting article, ‘Instant Customer Base Analysis: Managerial Heuristics Often “Get It Right”’, with results along the same lines. They demonstrate for several data sets that heuristic assessments by marketing experts can perform as well as the far more complex probabilistic models, especially when it comes to classifying the customers according to their activity status.

5.5 Error Composition

The conclusion that a simple linear regression model is able to outperform the far more complex probabilistic models motivates us to search for further improvements to our models. Some of the further approaches that we investigate are:

1. Separating the long-living but rarely donating customers from the cohort, in order to improve the validity of the estimated parameters. See the related remarks in section 4.2.

2. Removing the first and subsequently also the second year from the transaction records, in order to put a stronger emphasis on more recent data. All the models implicitly assume stationarity in the parameters, and this assumption might be violated for long histories. To cope with this issue, Schmittlein et al. (1987, p. 18) suggest using only two years of data, even if more data is available.

3. Scaling the time units from days to months, in order to remove some of the inherent noise in the data (compare figures 3.9 and 3.10).


All of these attempts succeed in improving the results of the CBG/NBD model with respect to MSLE. However, none of them is able to outperform the far simpler linear regression model.⁸

Possible room for improvement might lie in an analysis of the error structure. We need to find out why our models perform so poorly, and in particular detect those donors who cause the most problems for these models. It can be assumed that there is some underlying systematic mechanism within the error structure, and taking such a mechanism into account would help us improve our estimates.

We start out by charting the overall distribution of the errors across the cohort and plot a Lorenz curve for this purpose. Figure 5.4 displays the cumulated share of errors with respect to the MSLE against the cumulated share of donors. This provides an impression of the (in)equal distribution of the MSLE across the 21,166 donors.

[Figure 5.4: Lorenz Curve for Individual Errors of 1 Year BG/NBD Forecast. Horizontal axis: cumulative share of donors; vertical axis: cumulative share of error.]

The chart reveals that 50% of the donors account for over 90% of the cumulated errors and that 20% account for over 75% of the errors.⁹ Therefore, only a fraction of the cohort is responsible for the main part of the errors. The natural follow-up question is which donors are the ones for which the models perform so poorly. In order to answer that question, we display the timing patterns of the 10 worst underestimated and the 10 worst overestimated donors in figure 5.5.

⁸ Regarding the RMSE measure and the correlation, only the rescaling of time helps to surpass the regression model.

⁹ This is interestingly quite close to the well-known, far more general 80-20 Pareto principle.
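The concentration of errors reported for figure 5.4 can be computed along the following lines. This is a minimal sketch with hypothetical function names and toy data. It assumes that donors are ordered from the largest to the smallest individual squared logarithmic error, which is the ordering implied by statements such as "20% account for over 75% of the errors".

```python
def lorenz_points(errors):
    """Return (cumulative donor share, cumulative error share) points,
    with donors ordered from largest to smallest error (assumption)."""
    ordered = sorted(errors, reverse=True)
    total = sum(ordered)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, error in enumerate(ordered, start=1):
        running += error
        points.append((i / len(ordered), running / total))
    return points

def error_share_of_worst(errors, fraction):
    """Share of the total error caused by the worst `fraction` of donors."""
    ordered = sorted(errors, reverse=True)
    n_worst = round(fraction * len(ordered))
    return sum(ordered[:n_worst]) / sum(ordered)

# Hypothetical individual squared log errors for ten donors
errors = [0.0, 0.0, 0.1, 0.1, 0.2, 0.2, 0.4, 1.0, 3.0, 5.0]

print(error_share_of_worst(errors, 0.2))   # share caused by the worst 20% of donors
print(lorenz_points(errors)[-1])           # the curve ends at (1.0, 1.0)
```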


[Figure 5.5: Timing Patterns of Worst Estimates Regarding BG/NBD. Left panel: timing patterns for the 10 worst underestimated donors; right panel: timing patterns for the 10 worst overestimated donors. Each row shows the transactions of one donor across the calibration period and the validation period.]

The left chart displays those donors for whom the BG/NBD model made too pessimistic estimates regarding their number of transactions. These donors had a rather low transaction frequency throughout the calibration period, but then started to donate frequently (and also regularly). Unfortunately, the patterns themselves do not provide any hint for this change in behavior, and therefore there is not much that can be done to improve the estimates for this kind of pattern change.¹⁰

By contrast, the right chart, which contains those donors who have been overestimated, does reveal a highly interesting pattern in the transaction timings. Basically, all of the displayed donors stopped donating during the calibration period. But it seems that the stochastic model is not able to detect this defection, otherwise it would not have vastly overestimated the future number of transactions. This inability is even more astonishing since anybody who looks at this chart will conclude by simple intuition that these donors have very likely already defected by the end of the calibration period. The reason for this is the apparent regularity within the timings. One might assume that these overestimated donors had some kind of standing order with which the money was transferred each month, and that at some point the donors decided to cancel that order. Hence, for these donors, an inactive period of 32 days (i.e. one day more than the maximum length of a month) would already be a strong signal for a change in behavior.

But why are the models not able to detect this change if it is that obvious?

¹⁰ Possibly more insight can be gained by comparing these patterns with the corresponding contact records for these donors.


The answer to this question lies within the critical assumption A1, which is shared by all of the presented probabilistic models. A1 postulates that the number of transactions follows a Poisson process, which is equivalent to the statement that the intertransaction times are exponentially distributed. Thus, the modeled timing pattern contains no regularity at all and is completely random and memoryless. Therefore, these models interpret the gap of inactivity at the end of the calibration period for these regular donors as a ‘longer than average’ but still normal intertransaction period.
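The consequence of the memoryless property can be made concrete with a small sketch. The function names and the mean gap of 30 days are assumptions chosen for illustration (a donor who gives roughly once a month). Under exponential waiting times, a silence of 60 days still has a probability of exp(-2), about 13.5%, and the time already elapsed since the last donation does not change the outlook at all.

```python
import math

def prob_gap_exceeds(t, mean_gap):
    """P(gap > t) for exponentially distributed gaps with the given mean."""
    return math.exp(-t / mean_gap)

def prob_gap_exceeds_given_elapsed(t, elapsed, mean_gap):
    """P(gap > elapsed + t | gap > elapsed) for exponentially distributed gaps."""
    return prob_gap_exceeds(elapsed + t, mean_gap) / prob_gap_exceeds(elapsed, mean_gap)

mean_gap = 30.0  # hypothetical monthly donor, in days

# A two-month silence is not a rare event under the exponential assumption.
print(prob_gap_exceeds(60.0, mean_gap))

# Memorylessness: after 60 silent days the chance of 30 more silent days
# equals the chance of 30 silent days right after a donation.
print(prob_gap_exceeds_given_elapsed(30.0, 60.0, mean_gap))
print(prob_gap_exceeds(30.0, mean_gap))
```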

It is this particular inability of the presented models to account for any observed regularity which causes the poor estimates. Recency and frequency are two important pieces of information for assessing the critical activity status, but by additionally taking regularity into account, results could be vastly improved. This statement does not only hold true for stochastic models but can be generalized to all kinds of RF-based models that try to estimate the activity status for a given cohort.

The following chapter will present an effort to incorporate regularity into the CBG/NBD model.


Chapter 6

CBG/CNBD-k Model

6.1 Regularity

The following list summarizes the key findings regarding regularity that have been identified so far:

• The time between two succeeding donations cannot be considered totally random for the DMEF data set. It rather seems that the timing process follows, at least for some of the donors, a certain pattern. This result has been observed during our exploratory data analysis in chapter 3. Firstly, the plot of the observed intertransaction times (see figure 3.9) shows that there is a dead period of at least one month right after a donation, during which hardly anybody makes a subsequent donation. Secondly, the figure indicates that there are some donors who adhere to a monthly rhythm and some who follow an annual rhythm.

• In section 4.1.1, which investigated the NBD assumptions and their implications, we pointed out that modeling the negative binomial distribution is equivalent to assuming totally random transaction timings. Such an assumption seems to be violated for certain usage scenarios, in particular for purchase data for goods that are consumed with a certain regularity.

• And finally, it has been demonstrated that the presented models, which are all based upon the NBD model, are indeed unable to fit certain characteristics of the data set (see section 5.2). Additionally, they provide rather mediocre results when extrapolating into the future, as has been shown by comparison to some benchmark models (see sections 5.3 and 5.4). Section 5.5 identifies that it is in particular the regular donors who contribute the most to the forecast error.

These results justify directing special attention towards regularity, and attempting to incorporate some kind of regularity into stochastic models.

But what is regularity, and moreover, how can it be measured?

The observed timings can fall anywhere between totally random patterns (i.e. Poisson processes) and ‘clockwork-like’, deterministic patterns (Wheat and Morrison, 1990, p. 87). A regularity measure should therefore provide a single figure that indicates the location of a timing pattern between these two extremes.

A common method to assess the regularity is to fit a Gamma distribution to the observed intertransaction times and subsequently inspect the estimated shape parameters. Dunn, Reader, and Wrigley (1983, p. 252) reprint H.C.S. Thom’s approximation of the MLE of the shape parameter r as follows:

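$$\hat{r} = \frac{1}{4} Y^{-1} \left(1 + \sqrt{1 + \frac{4}{3} Y}\right), \quad \text{with} \qquad (6.1)$$

$$Y := \log\left(\frac{\text{arithmetic mean}}{\text{geometric mean}}\right).$$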
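Equation 6.1 translates directly into code. The following minimal sketch uses a hypothetical function name and simulated gaps instead of the DMEF data. It applies Thom's approximation to samples drawn from Gamma distributions with known shape, so that the estimate can be compared with the true value.

```python
import math
import random

def thom_shape_estimate(gaps):
    """Thom's approximation of the Gamma shape MLE (equation 6.1).

    gaps -- strictly positive intertransaction times of one individual
    """
    n = len(gaps)
    arithmetic_mean = sum(gaps) / n
    log_geometric_mean = sum(math.log(g) for g in gaps) / n
    y = math.log(arithmetic_mean) - log_geometric_mean
    if y <= 0.0:
        # All gaps identical (perfect clockwork): the shape is unbounded.
        return math.inf
    return (1.0 + math.sqrt(1.0 + 4.0 * y / 3.0)) / (4.0 * y)

rng = random.Random(1)  # fixed seed, chosen arbitrarily
exponential_gaps = [rng.gammavariate(1.0, 30.0) for _ in range(5000)]  # shape 1
erlang2_gaps = [rng.gammavariate(2.0, 15.0) for _ in range(5000)]      # shape 2

print(thom_shape_estimate(exponential_gaps))
print(thom_shape_estimate(erlang2_gaps))
```

The long simulated histories are used here only to show that the formula recovers the shape. With a handful of transactions per individual the estimate is far noisier.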

Additionally, Wagner and Taudes (1986, p. 243) provide a test statistic and an associated theoretical distribution which enable marketers to test whether an observed process is Poisson. If the estimated shape parameters for intertransaction timings are close to 1, then the Poisson assumption does not need to be rejected for these customers. This results directly from the fact that the exponential distribution equals the Gamma distribution with shape parameter 1.

But a problem arises when it comes to applying this measure to real world data, because a rather long history of at least 5 transactions is required for each individual, otherwise the estimates would be biased. Unfortunately, such long transaction records are commonly not available for each customer. Hoppe and Wagner (2007, p. 83), for example, applied this test for the Poisson assumption to purchase data from a catalog retailer and had to restrict the test to those 10% of the customers with at least 5 purchases. Their calculations showed that the Poisson assumption had to be rejected for only 5% of those frequent buyers (in absolute numbers: for 8 customers). Therefore, the test supported them in holding on to the NBD assumption.


The same test for the Poisson assumption is now applied to the DMEF data set. Only 8% of all donors had 5 or more donations. For these 1,728 donors the shape parameter r has been estimated according to equation 6.1. Its distribution across these donors is displayed in figure 6.1. As we can see, the median of r for these frequent donors is considerably higher than 2, which is once more a strong indicator of the already detected regularity within the data.

[Figure 6.1: Distribution of the Estimated Gamma Shape Parameters for the Intertransaction Times of Donors with 5 or more Donations. Horizontal axis: shape parameter r (0 to 10); reference marks: r = 1 ⇒ exponential IPTs, r = 2 ⇒ Erlang-2 IPTs.]

Wheat and Morrison (1990) introduce another regularity measure to the field of consumer behavior. This measure relaxes the problematic requirement of long transaction records and thereby allows statements regarding a larger share of the cohort. Wheat and Morrison also assume that the intertransaction times are distributed according to a Gamma distribution, but additionally assume that all customers share the same shape parameter r. They define the following simple measure M, which requires only two intertransaction times ($T_1$, $T_2$) for each individual:

$$M = \frac{T_1}{T_1 + T_2} \qquad (6.2)$$

They show that under these assumptions M follows a Beta(r, r) distribution. Hence, M is uniformly distributed within the interval (0, 1) in the case of exponentially distributed interevent times (r = 1). The actual estimate for r is then given by:

$$\hat{r} = \frac{1 - 4 \cdot \mathrm{var}(M)}{8 \cdot \mathrm{var}(M)} \qquad (6.3)$$

with var(M) being the estimated variance of M. This estimate of r again serves as a measure of the observed regularity, not on an individual level but for the complete cohort.
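Equations 6.2 and 6.3 can be sketched as follows. The function names and the simulated cohort are assumptions made for this illustration. Every simulated customer receives an individual scale, which mimics heterogeneous transaction rates, while all customers share the shape r = 2. Since M is a ratio, the individual scale cancels out, which is why a cohort-level estimate of r is possible at all.

```python
import random
import statistics

def regularity_m(t1, t2):
    """Wheat and Morrison's measure M (equation 6.2) for two consecutive gaps."""
    return t1 / (t1 + t2)

def shape_from_m(m_values):
    """Cohort-level estimate of the shape parameter r (equation 6.3)."""
    v = statistics.variance(m_values)
    return (1.0 - 4.0 * v) / (8.0 * v)

rng = random.Random(42)  # fixed seed, chosen arbitrarily
m_values = []
for _ in range(20000):
    scale = rng.uniform(10.0, 100.0)     # hypothetical individual time scale
    t1 = rng.gammavariate(2.0, scale)    # first gap, shape 2
    t2 = rng.gammavariate(2.0, scale)    # second gap, shape 2
    m_values.append(regularity_m(t1, t2))

print(shape_from_m(m_values))
```

As a plausibility check of the formula: a uniform distribution on (0, 1) has variance 1/12, and inserting this value into equation 6.3 yields exactly r = 1.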

Figure 6.2 depicts the smoothed histogram of the observed distribution of M for the DMEF data set. Additionally, two theoretical distributions for r = 1 and r = 2 are displayed to ease interpretation of the curve. This chart is able to take 33% of the cases into account, as only 3 donations are required per individual.

[Figure 6.2: Distribution of Regularity Measure M for the Intertransaction Times of Donors with 3 or more Donations. Horizontal axis: M (0 to 1); vertical axis: density (0 to 2.5); curves: actual distribution of M, distribution of M for r = 1, distribution of M for r = 2.]

Because the (smoothed) histogram is not uniform but displays a high peak at 0.5, it is once more shown that the observed data does not follow the Poisson assumption. Furthermore, equation 6.3 results in an estimate for r of 2.1.

Concluding our findings regarding regularity within the DMEF data set, we have to reject the assumption that the intertransaction times follow an exponential distribution. But both figures, 6.1 as well as 6.2, already suggest a possible alternative distribution, namely the Gamma distribution with a shape parameter of 2 instead of 1, a distribution that is commonly known as the Erlang-2 distribution.


The family of Erlang-k distributions is a special case of the Gamma distribution with the shape parameter being restricted to positive integers. The shape parameter r is then set to the value k. An Erlang-k distributed variable can be seen as the sum of k i.i.d.¹ variables that follow an exponential distribution. Another interpretation is that the interevent times are exponentially distributed, but only every k-th event is observed or counted. Therefore the term censored counting process is used for such models (Chatfield and Goodhardt, 1973, p. 829).

Figure 6.3 displays the Erlang-k distribution for several different values of k. The rate parameter has also been set to k for each row, as this results in an equal mean across all four examples and thereby eases comparison.

[Figure 6.3: Erlang-k Distributions. Four rows (Erlang-1, Erlang-2, Erlang-3, Erlang-100), each showing the density on the interval 0 to 5 (vertical axis 0.0 to 0.8) on the left and sampled timing patterns on the right.]
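The construction behind figure 6.3 can be reproduced with a few lines. This minimal sketch uses hypothetical names and an arbitrary seed. It draws Erlang-k waiting times as sums of k exponential variables with rate k, so that the mean stays at 1 for every k while the standard deviation shrinks like 1/sqrt(k).

```python
import random
import statistics

def sample_erlang(k, rate, rng):
    """One Erlang-k draw as the sum of k i.i.d. exponential variables."""
    return sum(rng.expovariate(rate) for _ in range(k))

rng = random.Random(7)  # fixed seed, chosen arbitrarily
summary = {}
for k in (1, 2, 3, 100):
    gaps = [sample_erlang(k, rate=k, rng=rng) for _ in range(5000)]
    summary[k] = (statistics.mean(gaps), statistics.pstdev(gaps))
    print(k, summary[k])
```

The shrinking spread for growing k is what turns the random-looking pattern of the first row into the nearly clockwork-like pattern of the Erlang-100 row.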

The chart gives an idea of the respective shapes for different values of k but also of the resulting timing patterns, which are drawn on the right hand side. As we can see, it is nearly impossible to distinguish the sample patterns of a Poisson process (first row) from the patterns resulting from an Erlang-2 distribution by visual inspection alone. Such a task would become even more difficult when we are faced with different rate parameters for each individual. Even the Erlang-3 samples look totally random despite the clear peak and the dead period of the theoretical distribution. This implies that the calculation of the aforementioned regularity measures is necessary to detect such light ‘hidden’ levels of regularity. The pattern of the Erlang-100, on the other hand, resembles the observed monthly donors that we encountered in figure 5.5 for the DMEF data set.

¹ i.i.d. = independent and identically distributed

For several reasons the family of Erlang-k distributions seems to be a good choice to incorporate regularity. Firstly, it is possible to model a specific degree of regularity by setting k according to the observed estimates (see equation 6.3). Secondly, the Erlang-k distribution is, due to its relation to the Poisson process, mathematically relatively easy to handle, as opposed to the Weibull or lognormal distribution (Chatfield and Goodhardt, 1973, p. 828). And finally, it is possible to describe the following plausible behavioral story that results in Erlang-k distributed interpurchase times: if a user consumes a certain good in a Poisson manner, i.e. at totally random times, but only every k-th consumption results in a purchase of a new package of that good, then the observed waiting time between two purchases will be distributed according to an Erlang-k distribution.

The following section will postulate a new model variant which assumes Erlang-k intertransaction times.

6.2 Assumptions

Table 6.1 displays the assumptions of the newly presented CBG/CNBD-k model. These assumptions differ from those of the CBG/NBD model only with respect to A1 and A6.

A1 now postulates the more general Erlang-k distribution for modeling intertransaction times, as opposed to the exponential waiting times that have been assumed for all other presented stochastic models so far. It is important to point out that the integer parameter k is not estimated by the model itself, but has to be determined a priori. For the special case of k = 1 the CBG/CNBD-k model is equivalent to the CBG/NBD model.

Assumption A6 needs to be added because the modeled timing process is not memoryless anymore and depends on the time elapsed since the last transaction. Therefore, the timing of the first event can be modeled more accurately if the timing of the previous one is known. The cohort of the DMEF data set consists in particular of those donors who made their first donation within the first half of 2002. Because this event is defined as time 0 for all individuals, we postulate A6 accordingly. If the cohort is built by some other criterion, e.g. by the date of first contact, and this date constitutes time 0, then we would need to adapt A6 accordingly. But it needs to be considered that such a change in assumption A6 would result in different mathematical derivations than those that are presented here.

A1  While active, transactions of customers occur with Erlang-k (rate parameter λ) distributed waiting times.
A2  Heterogeneity in λ follows a Gamma distribution with shape parameter r and rate parameter α across customers.
A3  At time zero and directly after each transaction there is a constant probability p that the customer becomes inactive.
A4  Heterogeneity in p follows a Beta distribution with parameters a and b across customers.
A5  The transaction rate λ and the dropout probability p are distributed independently of each other.
A6  The observation period of each individual starts out with a transaction at time zero.

Table 6.1: CBG/CNBD-k Assumptions
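The assumptions of table 6.1 can be read as a recipe for generating artificial transaction histories. The following is a minimal simulation sketch, not the estimation procedure of the model: the function names, the parameter values and the observation horizon are hypothetical, and the Erlang-k waiting time is drawn as the sum of k exponential variables with rate λ, following the wording of A1.

```python
import random

def simulate_customer(lam, p, k, horizon, rng):
    """Repeat-transaction times in (0, horizon] for one customer.

    A6: the history starts with a transaction at time zero (not returned).
    A3: at time zero and after each transaction the customer drops out
        with probability p.
    A1: while active, waiting times are Erlang-k with rate parameter lam.
    """
    times = []
    t = 0.0
    while True:
        if rng.random() < p:          # dropout check (A3)
            break
        t += sum(rng.expovariate(lam) for _ in range(k))  # Erlang-k gap (A1)
        if t > horizon:
            break
        times.append(t)
    return times

def simulate_cohort(n, r, alpha, a, b, k, horizon, seed=0):
    """Draw lam and p independently per customer (A2, A4, A5) and simulate."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        lam = rng.gammavariate(r, 1.0 / alpha)  # shape r, rate alpha (A2)
        p = rng.betavariate(a, b)               # dropout probability (A4)
        cohort.append(simulate_customer(lam, p, k, horizon, rng))
    return cohort

# Hypothetical parameter values, chosen only for illustration
cohort = simulate_cohort(n=5, r=1.0, alpha=50.0, a=1.0, b=3.0, k=2, horizon=365.0)
for times in cohort:
    print(len(times), [round(t) for t in times])
```

Setting k = 1 in this sketch yields exponential waiting times, i.e. the timing process of the CBG/NBD model.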

The idea of modeling Erlang-k interpurchase times is not new at all to the field of consumer behavior. In 1971, Herniter also observed dead periods within his histograms of observed interpurchase times. At that time he was the first to suggest the family of Erlang distributions for fitting such histograms appropriately. By analyzing the ratio of standard deviation to mean, also known as the coefficient of variation (CV), Herniter further concluded that an Erlang-2 provides the best fit for his data sets.

Two years later Chatfield and Goodhardt (1973) investigated this approach in detail. Firstly, they derived some basic results regarding the probability distribution of the counting process that corresponds to Erlang-2 interpurchase times. They coined the term condensed Poisson distribution for this resulting distribution. This naming reflects its close relationship to the Poisson distribution. But as opposed to the Poisson, its variance is smaller than its mean, hence the prefix condensed. Secondly, they followed Ehrenberg (1959) and also assumed a Gamma mixture of purchase frequencies across the customers. The derived distribution has consequently been termed the condensed negative binomial distribution (CNBD).

It needs to be taken into consideration that Chatfield and Goodhardt (1973) assumed an arbitrary starting time for the counting process, so that the condensed Poisson distribution is a so-called asynchronous counting distribution (abbreviated a.c.d.). By contrast, A6 postulates a so-called synchronous counting distribution (s.c.d.), which arises when the start of the counting coincides with an event (cf. Haight, 1965). Nevertheless, the name of the present model intentionally contains the term condensed for two reasons. On the one hand, the resulting counting distribution is ‘condensed’ just as well, if we examine its coefficient of variation. On the other hand, an asynchronous counting had to be assumed for the target period in order to keep mathematical complexity within limits.

But after Chatfield and Goodhardt applied the CNBD to several data sets with ‘more regular than random’ purchase patterns, they concluded that the gained improvement is hardly noticeable and further stated that the added complexity is not worth the effort for practical uses. This conclusion seems rather surprising, but is justified by the dominance of the Gamma heterogeneity in comparison to the variance of the individual Poisson distributions (Chatfield and Goodhardt, 1973, p. 834). The latter generally plays a minor role in explaining the variance. This dominance can be inspected numerically by decomposing the overall variance ($\alpha^2 r + \alpha r$) into the variance of the Gamma ($\alpha^2 r$) and the average variance of the Poisson distributions ($\alpha r$). For example, applied to the DMEF data set (see section 4.1), this calculation reveals that 99.8% of the variance is indeed contributed by the fitted Gamma distribution.
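The decomposition can be reduced to a single ratio. Taking the variance expression in the text at face value (which treats α as a scale-type parameter of the Gamma mixing distribution, an assumption about the parameterization used in section 4.1), the share contributed by the Gamma heterogeneity is

$$\frac{\alpha^2 r}{\alpha^2 r + \alpha r} = \frac{\alpha}{\alpha + 1}.$$

Under this reading the share does not depend on r and approaches 1 as α grows, so a large α alone is sufficient for the heterogeneity to dominate the individual-level Poisson variance.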

It can be assumed that it was Chatfield and Goodhardt’s pessimistic conclusion regarding the poor practicability that kept marketers from further applying the CNBD model. But Schmittlein and Morrison (1983) demonstrate that the CNBD is indeed able to outperform the NBD model significantly, in particular when the number of nonbuyers is large. Furthermore, Chatfield and Goodhardt’s conclusions were based upon the observed fit on an aggregated level, whereas our focus within this work is on the disaggregated level. And finally, the importance of an accurate timing model is considerably higher for forecasting noncontractual relations than for simply finding a good fit to the data. In section 5.5 we reasoned that any information regarding the regularity improves the judgments of the activity status, which are otherwise solely based upon recency and frequency. And because a defected customer will not make any further transactions at all, no matter how many times he used to purchase before, a misjudgment regarding the status results in a tremendous error of the predictions on an individual level.

To the best of our knowledge, this work is the first published attempt to join the CNBD with some sort of defection process. This gap is surprising because even Schmittlein et al. (1987, p. 18) themselves already pointed out a possible extension towards the CNBD. Theoretically, any of the NBD-based models can be extended to Erlang-k interevent times. We choose the CBG/NBD because it provides the best results regarding the contest data, but also because the model’s derivation is very well documented and traceable in Hoppe and Wagner (2008), which made the derivation of the CBG/CNBD-k feasible for us in the first place.

The key mathematical expressions of the CBG/CNBD-k model are provided in full detail together with their derivations in appendix A. These derivations closely follow the notation and argumentation used in Hoppe and Wagner (2008). Unfortunately, we do not succeed in deriving exact closed-form expressions for the decisive unconditional and conditional expectations of future transactions. Nevertheless, an approximation is suggested, which is known to be biased but which is still able to provide superior estimates, as the next section will show.

6.3 Comparison of Models

Following the same procedure as in chapter 5 ‘Model Comparison’, the performance of the CBG/CNBD-k is compared with other models by applying it to the DMEF data set. The estimate 6.3 of r from the previous section 6.1 suggests that a CBG/CNBD-2 should provide the best fit, i.e. Erlang-2 intertransaction times are being modeled. Additionally, a CBG/CNBD-3 is also fitted to the data to see how the results change when a stronger degree of regularity is assumed.

6.3.1 Parameter Interpretation

Table 6.2 contains the estimated parameters when applying the model variants on the full time range of the provided DMEF data set. The new model uses the same four parameters as the CBG/NBD and the BG/NBD model. The parameters r and α still describe the heterogeneity of the transaction frequency λ across the cohort, whereas λ is now the rate parameter of the Erlang-k distribution, so that the mean waiting time is k/λ and the implied transaction rate is λ/k. Hence, it is necessary to multiply the rate parameter α by the associated integer k if we want to make a direct comparison of the distribution of transaction frequencies.

          CBG/NBD        CBG/CNBD-2     CBG/CNBD-3
r (se)    1.11 (0.05)    1.83 (0.06)    1.93 (0.05)
α (se)    552 (19)       323 (9)        210 (5)
a (se)    0.38 (0.02)    0.62 (0.02)    0.71 (0.02)
b (se)    0.67 (0.04)    0.76 (0.03)    0.84 (0.03)

Table 6.2: Estimated Model Parameters
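The rescaling of α by k can be illustrated numerically. The sketch below assumes that a convenient summary of the fitted timing is the reciprocal of the mean transaction rate, i.e. 1/E[λ/k] = kα/r for λ following a Gamma distribution with shape r and rate α. This choice of summary is an assumption of the example and not necessarily the calculation behind the tables of this section.

```python
def mean_intertransaction_time(r: float, alpha: float, k: int) -> float:
    """Reciprocal of the mean transaction rate E[lambda] / k, lambda ~ Gamma(r, rate=alpha)."""
    mean_lambda = r / alpha       # mean of the Gamma-distributed Erlang rate parameter
    mean_rate = mean_lambda / k   # implied mean transaction rate
    return 1.0 / mean_rate        # equals k * alpha / r


# Parameter estimates (r, alpha, k) as reported in Table 6.2.
fits = {
    "CBG/NBD": (1.11, 552.0, 1),
    "CBG/CNBD-2": (1.83, 323.0, 2),
    "CBG/CNBD-3": (1.93, 210.0, 3),
}
for name, (r, alpha, k) in fits.items():
    print(name, round(mean_intertransaction_time(r, alpha, k), 1))
```

With the rounded estimates the three values come out at roughly 497, 353 and 326 days, which shows how the smaller α of the Erlang variants is offset by the factor k.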

              mean     median    sd
BG/NBD        9.1 t    13.3 t    9.0 t
CBG/NBD       2.7 t    3.8 t     3.0 t
CBG/CNBD-2    2.2 t    2.4 t     3.1 t
CBG/CNBD-3    2.2 t    2.3 t     3.2 t

Table 6.3: Statistical Summary of Fitted Life Times

              mean     median     sd
BG/NBD        836 d    2,324 d    527 d
CBG/NBD       496 d    688 d      523 d
CBG/CNBD-2    354 d    428 d      478 d
CBG/CNBD-3    327 d    392 d      454 d

Table 6.4: Statistical Summary of Fitted Intertransaction Times

Tables 6.3 and 6.4 provide the related properties for the modeled life times and intertransaction times. As can be seen, the expected life time drops even further to an average of 2.2 ‘survived’ transactions, resembling the observed average number of donations of 1.55 even more closely. Interestingly, the Erlang-2 and Erlang-3 assumptions result in very similar model parameters regarding a and b. Along with the drop in life time, a shorter expected intertransaction period is now being modeled compared to the previous models. The median waiting time is now close to a one-year period.

All in all, the estimated parameters seem to better represent the underlying data and its key characteristics.

6.3.2 Data Fit

Table 6.5 and figure 6.4 give an impression of the CBG/CNBD-k models’ ability to fit the DMEF data on an aggregated level.

              0         1        2        3        4        5      6      7+
Actual        10,626    3,579    2,285    1,612    1,336    548    348    832
CBG/NBD       10,647    3,939    2,186    1,368    905      617    429    1,075
CBG/CNBD-2    10,592    3,952    2,217    1,402    931      633    436    1,023
CBG/CNBD-3    10,570    3,998    2,228    1,401    927      628    431    983

Table 6.5: Comparison of Actual vs. Theoretical Count Data

[Figure: Actual vs Fitted Frequency of Repeat Transactions. x-axis: # Transactions (0 to 7+); y-axis: Frequency (0 to 10,000); series: Observed, CBG/NBD, CBG/CNBD-2, CBG/CNBD-3. Annotated test statistics: χ² (CBG/NBD) = 363.7, χ² (CBG/CNBD-2) = 302.9, χ² (CBG/CNBD-3) = 307.8.]

Figure 6.4: Fitted Distributions

The calculated χ² test statistics indicate an improvement in comparison to the CBG/NBD model. The drop of χ² from around 360 to nearly 300 is mainly due to the closer fit of those classes that previously showed the biggest relative offsets from the actual values. These are group 3, group 4 and the group of the heavy donors (7+). Nevertheless, the models are still not quite able to explain the large size of group 4, which likely stems from the regular yearly donors (see the discussion in section 5.2).
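As a rough check of these statistics, the following sketch computes a Pearson χ² value from the counts printed in Table 6.5. It assumes that the reported statistics are plain Pearson sums over the eight classes. Because the table shows rounded expected counts, the result can only land near the reported 363.7 for the CBG/NBD model, not exactly on it.

```python
def pearson_chi_square(observed, expected):
    """Pearson statistic: sum over classes of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Counts for the classes 0, 1, ..., 6, 7+ as printed in Table 6.5.
actual = [10626, 3579, 2285, 1612, 1336, 548, 348, 832]
cbg_nbd = [10647, 3939, 2186, 1368, 905, 617, 429, 1075]

chi2 = pearson_chi_square(actual, cbg_nbd)
contributions = [(o - e) ** 2 / e for o, e in zip(actual, cbg_nbd)]
largest_class = contributions.index(max(contributions))

print(round(chi2, 1), largest_class)
```

The per-class contributions make the argument of the text visible: group 4 alone accounts for more than half of the statistic.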

A comparison of the loglikelihood values is slightly more complex because the calculation requires the exact intertransaction times $(t_1 - t_0, t_2 - t_1, \dots, t_x - t_{x-1})$ as opposed to only the timing $t_x$ of the last transaction (see section A.4 in the appendix for the exact formulas). Fortunately, this information is available for the DMEF data set and thus the following ranking table can be provided.

Rank    Model         LL
I.      CBG/CNBD-2    -242,738.5
II.     CBG/CNBD-3    -243,924.0
III.    Pareto/NBD    -245,674.2
IV.     CBG/NBD       -245,702.2
V.      BG/NBD        -245,833.0
VI.     NBD           -246,552.5

Table 6.6: Comparison of Calculated Loglikelihood Values

Hence, the maximized loglikelihood values of the estimated CBG/CNBD-k models clearly surpass the related values of the classic models. Among the CBG/CNBD-k models, the CBG/CNBD-2 provides the best fit with respect to this measure, which is also the expected result given our assessment of r in the preceding section. In summary, these results show that the consideration of regularity does indeed provide a better fit to the present data set, and that the extra effort is thereby justified.

6.3.3 Forecast Accuracy

As has been demonstrated for the NBD model, a decent fit to the observed data does not necessarily imply that the model is capable of providing sound estimates for the future. Therefore, the crucial evaluation criterion in our context is again the capability of making such predictions.

Table 6.7 and figure 6.5 compare the models’ estimates with the actual data throughout the 1-year validation period.

The data table displays that the CBG/CNBD-2 makes a nearly perfect assessment for the large group of donors who have not made any repeat donations within the training period. The new model also outperforms all other models presented so far for groups 1 and 2. Unfortunately, the model is neither able to repair the notable underestimation of the frequent donors (6+) nor to explain the bend in the curve for group 4. In particular, the latter defect should have been fixed to some extent by incorporating regularity, because the bend in the curve can be attributed to the regular yearly donors. The reason is that the provided mathematical expressions for the future estimates are not exact. As can be seen in section A.8.5 of the appendix, some simplifying approximations need to be made in order to make the derivations feasible. Among others, the exact duration since the last recorded transaction of each donor is neglected, and thus the model is unable to simulate the yearly rhythm appropriately.

              0         1        2        3        4       5       6       7+
Actual        0.04      0.20     0.43     0.69     0.75    1.06    1.54    2.44
CBG/NBD       0.11      0.29     0.48     0.68     0.87    1.05    1.25    2.10
CBG/CNBD-2    0.04      0.17     0.38     0.60     0.78    0.95    1.19    2.05
CBG/CNBD-3    0.02      0.13     0.33     0.55     0.71    0.86    1.09    1.87
Group Size    10,988    3,910    2,683    1,730    731     392     239     493

Table 6.7: Comparison of Actual vs. Theoretical Average Number of Donations per Donor during the Validation Period

[Figure: Conditional Expectation of Future Transactions. x-axis: # Transactions in Training Period (0 to 7+); y-axis: Avg # Transactions in Validation Period (0.0 to 3.0); series: Actual, CBG/NBD, CBG/CNBD-2, CBG/CNBD-3.]

Figure 6.5: Conditional Expectations

Finally, table 6.8 contains the most important figures with respect to the DMEF contest.

Both CBG/CNBD-k models considerably outperform all other models with respect to the DMEF data set. Incorporating regularity therefore results in significantly improved estimations on a disaggregated level for the present case. This statement is affirmed by several different error measures, like the MSLE, the RMSE and the correlation. Additionally, separate calculations have shown that a modification of the chosen training and validation period lengths does not change the overall ranking either.

              MSLE      RMSE     MAE      Corr     MSLEopt          SUM
Heuristic     0.0962    0.661    0.258    0.615    0.0909 (0.70)    -22%
LM            0.0863    0.642    0.262    0.644    0.0861 (0.93)    -31%
Pareto/NBD    0.0977    0.653    0.359    0.628    0.0879 (0.66)    +22%
BG/NBD        0.0963    0.651    0.362    0.640    0.0880 (0.68)    +19%
CBG/NBD       0.0959    0.650    0.360    0.639    0.0878 (0.68)    +19%
CBG/CNBD-2    0.0831    0.632    0.293    0.660    0.0818 (0.84)    -11%
CBG/CNBD-3    0.0816    0.637    0.275    0.663    0.0814 (0.94)    -24%

Table 6.8: Error Measures

The deviation of the cumulative number of estimated transactions suggests that the CBG/CNBD-k models are likely to be biased and tend to underestimate the actual number. This can be explained by the simplifications that are made in the derivations. Nevertheless, the calculated optimized MSLE and the correlation numbers indicate that, regardless of this systematic underestimation, the CBG/CNBD-2 and CBG/CNBD-3 models are still more capable of modeling the number of future transactions for each donor.
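The error measures used in this comparison can be sketched as follows. The exact MSLE definition is not restated in this section, so the form mean((log(1 + y) − log(1 + ŷ))²) is an assumption of the example, as is the simple grid search for the optimal multiplication factor. The data are hypothetical toy values and not taken from the DMEF data set.

```python
import math


def msle(actual, predicted):
    """Mean squared logarithmic error, assumed as mean((log(1+y) - log(1+yhat))^2)."""
    return sum((math.log1p(y) - math.log1p(p)) ** 2
               for y, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


def best_ratio(actual, predicted, step=0.01, upper=2.0):
    """Grid search for the multiplication factor that minimizes the MSLE."""
    n_steps = int(round(upper / step))
    candidates = [i * step for i in range(n_steps + 1)]
    return min(candidates, key=lambda c: msle(actual, [c * p for p in predicted]))


# Hypothetical toy data: actual and predicted transaction counts per donor.
actual = [0, 0, 1, 0, 2, 0]
predicted = [0.3, 0.1, 0.8, 0.4, 1.5, 0.2]

ratio = best_ratio(actual, predicted)
scaled = [ratio * p for p in predicted]
print(ratio, msle(actual, predicted), msle(actual, scaled))
```

A ratio below one means that shrinking all estimates lowers the MSLE, which is the role of the bracketed optimal ratios in table 6.8.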

6.4 Final Model

The details and calculations of the final model, which was used for our contest submission, are presented within this section.

6.4.1 Estimation of Monetary Component

All the presented probabilistic models make predictions for the future number of transactions. A missing piece for the computation of customer lifetime values is therefore the estimation of the donation amounts.

In chapter 3, several characteristics of the observed donation amounts are identified. Firstly, donation amounts vary tremendously across donors. Secondly, donation amounts normally take certain even integer values. Thirdly, average donation amounts change over time, but it is impossible to detect a clear trend.

Several different approaches of estimating donation amounts are tried out and evaluated with respect to the resulting MSLE for the calibration period. Schmittlein and Peterson (1994, p. 56) propose a model which combines individual past amounts with the average over the complete cohort to make individual estimates. A much simpler method is to take the last, the average or the median of the observed donation amounts for each donor as an estimate for future transaction values. The calculations for the validation period indicate that the mean over the past donation amounts provides the best estimate with the lowest corresponding MSLE measure for the DMEF data set. Therefore, this simple assessment is combined with the CBG/CNBD-k model.

6.4.2 Submission to DMEF Competition

For our final model we choose the CBG/CNBD-2 model for the number of future transactions,² and take the past average dollar amounts as an estimate for each future donation. Additionally, an optimal multiplication factor is determined in order to minimize the MSLE (see the related discussion in section 5.3.3 and also the bracketed optimal ratios within table 6.8). With respect to the calibration period, the optimal ratio is set to 0.25.

The parameters r, α, a and b of the CBG/CNBD-2 model have of course been calibrated using the complete provided DMEF data set of 4 years and 8 months. Subsequently, the number of transactions within the target period of 2 years has been estimated for each donor based upon their past transaction records $(x, t_x, T)$. Then the number of transactions is multiplied by the corresponding average past donation amount in order to derive a lifetime value (for the target period) for each donor. This value is further multiplied by the determined optimal ratio (0.25) in order to produce an estimate that will hopefully minimize the MSLE. This results in our submitted estimates for task 2 of the contest.

In addition, we simply assumed that any donor with an estimated number of transactions of more than 0.5 will actually donate within the target period and that all others will not. This provides our estimates for task 3.

² The idea for the CBG/CNBD-k model emerged only two days prior to the submission deadline of the contest. In the remaining limited time span we therefore focused on the special case of Erlang-2 distributed intertransaction times. It was only after the contest that we succeeded in providing the necessary analytical results for the more general Erlang-k case. Nevertheless, later calculations showed that the CBG/CNBD-2 did indeed provide the best estimate among the family of CBG/CNBD-k models.

Finally, we simply guess the solution to task 1, which is an estimate of the cumulated donation amounts of all donors, by assessing the further trend in the donation sum in figure 3.5. The CBG/CNBD-k model is not used for this purpose because of the known overall bias, which can lead to poor estimates on an aggregated level.
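The task 2 and task 3 estimates described in this section amount to a short post-processing step on top of the model output. The following sketch shows that step under stated assumptions: the `Donor` class and its field names are hypothetical, the expected number of transactions is taken as given (it would come from the CBG/CNBD-2 model), and the donors are invented toy records.

```python
from dataclasses import dataclass
from statistics import mean

OPTIMAL_RATIO = 0.25     # multiplication factor stated in the text
ACTIVE_THRESHOLD = 0.5   # task 3 cut-off stated in the text


@dataclass
class Donor:
    donor_id: str                  # hypothetical field names
    past_amounts: list             # observed donation amounts
    expected_transactions: float   # model estimate for the target period


def task2_value(donor: Donor) -> float:
    """Expected transactions x average past amount x optimal ratio."""
    return donor.expected_transactions * mean(donor.past_amounts) * OPTIMAL_RATIO


def task3_will_donate(donor: Donor) -> bool:
    """A donor counts as donating if the estimated number of transactions exceeds 0.5."""
    return donor.expected_transactions > ACTIVE_THRESHOLD


# Invented toy records for illustration only.
donors = [
    Donor("A", [20.0, 30.0], 0.8),
    Donor("B", [10.0], 0.2),
]
for d in donors:
    print(d.donor_id, task2_value(d), task3_will_donate(d))
```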

Chapter 7

Conclusion

Within this thesis we provided a thorough analysis of several popular probabilistic purchase models for noncontractual consumer relationships. Their corresponding assumptions regarding the underlying behavior were presented and underwent a critical review. All of the presented existing models share the same problematic implications of the NBD model with respect to the randomness of the implied transaction timings. This lack of face validity has been disputed ever since Ehrenberg’s first introduction of the NBD model; nevertheless, numerous papers concluded that this model is indeed able to explain observed count patterns in real-world data very well. However, as has been argued in this thesis, the importance of an accurate timing model is much higher if forecasts are being made on a disaggregated level in a noncontractual setting. This is due to the fact that the current status of activity functions as a knock-out criterion for future transactions, and its estimate is therefore crucial for making accurate forecasts. As a consequence, we suggest incorporating the regularity within the observed timing patterns into the model building. In this way, the assessment of the significance of any observed frequency and especially recency information can be improved.

Subsequently, a new probabilistic model variant, the CBG/CNBD-k model, was outlined, which makes it possible to account for an arbitrary extent of regularity. We also succeeded in providing exact derivations for several key mathematical expressions, such as the likelihood, the probability distribution of purchase frequencies, and the crucial probability of being active. A closed-form expression of the expected number of transactions could not be deduced, however; instead, a heuristic approximation was suggested which made the calculations feasible.

This newly introduced model was subsequently applied to donation records of a US nonprofit organization. This data set was provided by the Direct Marketing Educational Foundation as part of a lifetime value contest. A detailed exploratory data analysis revealed, among other findings, the inherent regularity in the timing patterns. In particular, the presence of donors who make monthly donations and of donors who make yearly donations became apparent. After fitting all presented models to the provided data set, it could be concluded that the CBG/CNBD-k model is capable of considerably outperforming existing models with respect to parameter plausibility, data fitting, and forecasting accuracy.

This finding was further attested by the final outcome of the DMEF contest. Out of 25 participating teams, ranging from software and consulting companies to university institutions, the model introduced here finished in an exceptional second place regarding the forecast accuracy on a disaggregated level, only marginally behind the winning model. In particular, the CBG/CNBD-k was able to clearly exceed all other participating stochastic models.

The presented idea of extending the NBD to the CNBD model can theoretically be carried over to all other NBD-based models, so that a Pareto/CNBD-k as well as a BG/CNBD-k model is conceivable. Although the analytical complexity is significantly raised by this extension, it has been shown that even a simplified, biased model is able to improve the forecast quality. A further promising extension could be the modeling of a varying degree of regularity across the cohort, as has also been noticeable for the DMEF data set.

However, more generally speaking, we hope that we were able to make the case for taking regularity into consideration when modeling consumer behavior, not just for models of the stochastic kind. Furthermore, the inherent dynamics and patterns of the actual transaction timings potentially contain valuable information. Therefore it seems negligent to disregard such information by reducing given transaction data to simple recency and frequency statistics in the first place.

Appendix A

Derivation of CBG/CNBD-k

A.1 Assumptions

A1 While active, transactions of customers occur with Erlang-k (rate parameter λ) distributed waiting times.

A2 Heterogeneity in λ follows a Gamma distribution with shape parameter r and rate parameter α across customers.

A3 At time zero and directly after each transaction there is a constant probability p that the customer becomes inactive.

A4 Heterogeneity in p follows a Beta distribution with parameters a and b across customers.

A5 The transaction rate λ and the dropout probability p are distributed independently of each other.

A6 The observation period of each individual starts out with a transaction at time zero.

These assumptions differ from the CBG/NBD model only regarding the modified assumption A1 and the newly introduced assumption A6.

A.2 Erlang-k

The Erlang-k distribution with parameters k and λ is defined by the probability density

$$
f_\Gamma(t \mid k, \lambda) = \frac{\lambda^k t^{k-1} e^{-\lambda t}}{(k-1)!}
\qquad \forall\, t > 0;\; k \in \mathbb{N}^+,\; \lambda > 0.
\tag{A.1}
$$

The Erlang-k is a specialization of the more general Gamma distribution, with the restriction of k being an integer. If k = 1, then we are dealing with the exponential distribution again.
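A minimal sketch of the density (A.1) makes both remarks concrete: for k = 1 it coincides with the exponential density, and for k > 1 it has its peak at (k − 1)/λ rather than at zero. The function names and the numerical values are chosen for illustration only.

```python
import math


def erlang_pdf(t: float, k: int, lam: float) -> float:
    """Density (A.1): lam^k * t^(k-1) * exp(-lam * t) / (k-1)!  for t > 0."""
    if t <= 0 or k < 1 or lam <= 0:
        raise ValueError("requires t > 0, integer k >= 1 and lam > 0")
    return lam ** k * t ** (k - 1) * math.exp(-lam * t) / math.factorial(k - 1)


def exponential_pdf(t: float, lam: float) -> float:
    """Density of the exponential distribution with rate lam."""
    return lam * math.exp(-lam * t)


lam, t = 0.5, 1.7   # arbitrary illustrative values
same_as_exponential = math.isclose(erlang_pdf(t, 1, lam), exponential_pdf(t, lam))

# For k > 1 the density peaks at (k - 1) / lam instead of at zero.
k = 3
mode = (k - 1) / lam
peak_at_mode = (erlang_pdf(mode, k, lam) > erlang_pdf(0.5 * mode, k, lam)
                and erlang_pdf(mode, k, lam) > erlang_pdf(1.5 * mode, k, lam))

print(same_as_exponential, peak_at_mode)
```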

The Erlang-k distribution can also be seen as the distribution of the sum of k i.i.d. exponentially distributed random variables with parameter λ. Therefore, the corresponding counting process of events with Erlang-k distributed waiting times can be deduced from the Poisson process in a straightforward manner. Under the assumption that an event actually occurred at time zero, the probability of encountering x events until time t is

$$
P_k(X(t) = x) = \sum_{j=0}^{k-1} P_{\mathrm{P}}(X(t) = kx + j).
\tag{A.2}
$$

This result is straightforward if we take a look at figure A.1, which renders the relation between a Poisson process $(t'_0, t'_1, t'_2, \dots)$ and the timing of Erlang-k events $(t_0, t_1, t_2, \dots)$. We consider the occurrence of an event as the k-th realization of a corresponding exponentially distributed process ($t_x = t'_{kx}$). Therefore, the probability of encountering x events until time t is the sum of the probabilities of encountering kx, kx+1, ..., kx+k−1 Poisson events.
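The counting distribution (A.2) translates directly into code. The sketch below evaluates the Poisson probabilities on the log scale and sums the k relevant terms; the function names and the numerical values are illustrative assumptions.

```python
import math


def poisson_pmf(n: int, mu: float) -> float:
    """P(N = n) for a Poisson count with mean mu > 0, computed on the log scale."""
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))


def erlang_count_pmf(x: int, t: float, k: int, lam: float) -> float:
    """Synchronous counting (A.2): sum of Poisson probabilities for kx, ..., kx+k-1 events."""
    mu = lam * t
    return sum(poisson_pmf(k * x + j, mu) for j in range(k))


lam, t = 0.01, 365.0   # arbitrary illustrative values

# Example from figure A.1: two Erlang-3 events correspond to 6, 7 or 8 Poisson events.
p_two_events = erlang_count_pmf(2, t, 3, lam)

# The probabilities over all x partition the Poisson probabilities, so they sum to one.
total = sum(erlang_count_pmf(x, t, 3, lam) for x in range(200))
print(p_two_events, total)
```

For k = 1 the function reduces to the ordinary Poisson probability, which is the exponential-timing case.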

The remark that we start counting with an event at time zero is important, since we are no longer dealing with a memoryless process, as has been the case for exponentially distributed timings. Being memoryless implies that the chance of the event occurring within the near future remains constant and is independent of the time that has passed since the last event. On the other hand, the Erlang-k density clearly has a peak away from 0 (for k > 1). The absence of memorylessness is thus the reason why we had to postulate assumption A6 for our model.

Haight (1965) distinguished between counting processes that start out with an event at time zero and those that do not. He termed them synchronous and asynchronous counting processes, respectively. Chatfield and Goodhardt (1973) studied the asynchronous counting of Erlang-k events and termed the resulting process the condensed Poisson process.

[Figure: timeline of Poisson events $t'_0, \dots, t'_8$ together with the Erlang-3 events $t_0 = 0, t_1, t_2$ and the observation time t; three panels labeled 3·2, 3·2+1 and 3·2+2.]

Figure A.1: Illustration for Erlang-3 distributed interevent times. $P_3(X(t) = 2)$ is the probability of encountering 6, 7 or 8 Poisson events.

A.3 Individual Likelihood

The likelihood of parameters λ and p for a particular purchase pattern $(t_1, \dots, t_x, T)$ can be deduced analogously to the referred papers. It is the likelihood of the observed interevent periods $(t_1 - t_0, t_2 - t_1, \dots, t_x - t_{x-1})$, times the probability of having ‘survived’ time 0 and the first x−1 purchases, times the probability of seeing no transaction within $(t_x, T]$. The latter can result from a customer who defected immediately after the last purchase, or from a customer whose next transaction simply happens to be after time T.

$$
L(\lambda, p \mid t_1, \dots, t_x, T) = (1-p)\, f_\Gamma(t_1 \mid k, \lambda) \cdots (1-p)\, f_\Gamma(t_x - t_{x-1} \mid k, \lambda) \cdot \Big\{ p + (1-p)\, P(X(T - t_x) = 0 \mid k, \lambda) \Big\}
$$

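Inserting the Erlang-k pdf A.1 and our previous result A.2, it follows that

$$
\begin{aligned}
L(\lambda, p \mid t_1, \dots, t_x, T)
&= (1-p)^x \cdot \frac{\lambda^k t_1^{k-1} e^{-\lambda t_1}}{(k-1)!} \cdots \frac{\lambda^k (t_x - t_{x-1})^{k-1} e^{-\lambda (t_x - t_{x-1})}}{(k-1)!}
\cdot \Big\{ p + (1-p) \sum_{j=0}^{k-1} P_{\mathrm{P}}(X(T - t_x) = j \mid \lambda) \Big\} \\
&= (1-p)^x \lambda^{kx} e^{-\lambda t_x}
\underbrace{\Big(\tfrac{1}{(k-1)!}\Big)^{x} (t_x - t_{x-1})^{k-1} \cdots (t_1 - 0)^{k-1}}_{=:\, t}
\cdot \Big\{ p + (1-p)\, e^{-\lambda (T - t_x)} \sum_{j=0}^{k-1} \frac{\lambda^j (T - t_x)^j}{j!} \Big\} \\
&= t \cdot p (1-p)^x \lambda^{kx} e^{-\lambda t_x}
+ t \cdot (1-p)^{x+1} \lambda^{kx} e^{-\lambda T} \sum_{j=0}^{k-1} \frac{\lambda^j (T - t_x)^j}{j!}
\end{aligned}
\tag{A.3}
$$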

An important difference of this result from the likelihood functions of models with exponential timing is that we still have the actual timing of the transactions $t_1, \dots, t_x$ (which we subsumed into the variable t) in our final formula. $(x, t_x, T)$ is therefore no longer a sufficient statistic for the likelihood. But, as we will see shortly, we do not need these timings for the estimation of the parameters, and therefore do not actually impose any extra requirements regarding the input data.

A.4 Aggregate Likelihood

In order to take assumptions A2 and A4 regarding the distribution of λ and p into account, we need to mix in the gamma and beta distributions by means of integration.

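$$
\begin{aligned}
L(r, \alpha, a, b \mid t_1, \dots, t_x, T)
&= t \cdot \int_0^1 \!\! \int_0^\infty p (1-p)^x \lambda^{kx} e^{-\lambda t_x} f_\Gamma(\lambda \mid r, \alpha)\, f_B(p \mid a, b) \, d\lambda \, dp \\
&\quad + t \cdot \int_0^1 \!\! \int_0^\infty (1-p)^{x+1} \Big( \sum_{j=0}^{k-1} \frac{(T - t_x)^j \lambda^j}{j!} \Big) \lambda^{kx} e^{-\lambda T} f_\Gamma(\lambda \mid r, \alpha)\, f_B(p \mid a, b) \, d\lambda \, dp
\end{aligned}
\tag{A.4}
$$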

Due to assumption A5 we can solve these integrals separately and will for this purpose use the following definitions and results from Hoppe and Wagner (2008, section 2):

$$
I_\Gamma(i, j, r, \alpha) := \int_0^\infty \lambda^i e^{-\lambda j} f_\Gamma(\lambda \mid r, \alpha) \, d\lambda = \frac{\alpha^r \cdot (r)_i}{(j + \alpha)^{r+i}}
\tag{A.5}
$$

$$
I_B(i, j, a, b) := \int_0^1 p^i (1-p)^j f_B(p \mid a, b) \, dp = \frac{B(a + i, b + j)}{B(a, b)}
\tag{A.6}
$$

B(a, b) denotes the Beta function, and $(r)_x$ the Pochhammer symbol:

$$
B(a, b) = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a + b)}
\tag{A.7}
$$

$$
(r)_x = \frac{\Gamma(r + x)}{\Gamma(r)}
\tag{A.8}
$$

Furthermore, by considering $\Gamma(a+1) = a\,\Gamma(a)$, we can easily see that

$$
B(a + 1, b + x) = \frac{a}{b + x} \cdot B(a, b + x + 1)
\tag{A.9}
$$

$$
(r)_{x+y} = (r + x)_y \cdot (r)_x
\tag{A.10}
$$

hold. Therefore:

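$$
\begin{aligned}
L(r, \alpha, a, b \mid t_1, \dots, t_x, T)
&= t \cdot I_B(1, x, a, b) \cdot I_\Gamma(kx, t_x, r, \alpha) \\
&\quad + t \cdot I_B(0, x + 1, a, b) \cdot \Big( \sum_{j=0}^{k-1} \frac{(T - t_x)^j}{j!}\, I_\Gamma(kx + j, T, r, \alpha) \Big)
\end{aligned}
\tag{A.11}
$$

$$
= t \cdot \frac{(b)_{x+1}}{(a + b)_{x+1}} \cdot \alpha^r (r)_{kx}
\cdot \Big( \frac{a}{b + x} \Big( \frac{1}{\alpha + t_x} \Big)^{r + kx}
+ \sum_{j=0}^{k-1} \frac{(T - t_x)^j}{j!} \frac{(r + kx)_j}{(\alpha + T)^{r + kx + j}} \Big)
\tag{A.12}
$$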

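For the Erlang-2 case this is

$$
\begin{aligned}
L(r, \alpha, a, b \mid t_1, \dots, t_x, T)
= t \cdot \frac{(b)_{x+1}}{(a + b)_{x+1}} \cdot \alpha^r (r)_{2x}
\cdot \Big( & \frac{a}{b + x} \Big( \frac{1}{\alpha + t_x} \Big)^{r + 2x}
+ \Big( \frac{1}{\alpha + T} \Big)^{r + 2x} \\
& + (T - t_x)(r + 2x) \Big( \frac{1}{\alpha + T} \Big)^{r + 2x + 1} \Big)
\end{aligned}
\tag{A.13}
$$

with t being $t_1 \cdot (t_2 - t_1) \cdots (t_x - t_{x-1})$.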
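For numerical work it is convenient to evaluate (A.12) on the log scale. The sketch below is one possible implementation using only the Python standard library; the function names are illustrative, and the factor t is deliberately left out because it does not depend on the parameters (r, α, a, b).

```python
import math


def log_pochhammer(r: float, n: float) -> float:
    """log of the Pochhammer symbol (r)_n = Gamma(r + n) / Gamma(r)."""
    return math.lgamma(r + n) - math.lgamma(r)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list of values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(x: int, t_x: float, T: float, k: int,
                   r: float, alpha: float, a: float, b: float) -> float:
    """log of (A.12) without the factor t."""
    log_front = (log_pochhammer(b, x + 1) - log_pochhammer(a + b, x + 1)
                 + r * math.log(alpha) + log_pochhammer(r, k * x))
    # Term for a customer who dropped out right after the last transaction.
    terms = [math.log(a) - math.log(b + x) - (r + k * x) * math.log(alpha + t_x)]
    # Terms for a customer who is still active at time T.
    for j in range(k):
        if j > 0 and T <= t_x:
            break   # (T - t_x)^j vanishes for j >= 1 when T equals t_x
        log_power = j * math.log(T - t_x) if j > 0 else 0.0
        terms.append(log_power - math.lgamma(j + 1)
                     + log_pochhammer(r + k * x, j)
                     - (r + k * x + j) * math.log(alpha + T))
    return log_front + log_sum_exp(terms)


# Illustrative call with the CBG/CNBD-2 estimates of Table 6.2 and a hypothetical donor.
print(log_likelihood(x=3, t_x=400.0, T=1000.0, k=2, r=1.83, alpha=323.0, a=0.62, b=0.76))
```

Summing this function over all customers gives the objective that a numerical optimizer would maximize.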

A.5 Parameter Estimation

A well-known parameter estimation method, which is asymptotically optimal (i.e. unbiased and efficient) under fairly general conditions, is maximum likelihood estimation (MLE). This method tries to find a parameter set (r, α, a, b) at which the likelihood reaches its global maximum for some given data $(t_{i,1}, \dots, t_{i,x}, T_i)_{i=1..N}$.

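$$
\begin{aligned}
(r, \alpha, a, b)
&= \arg\max_{r, \alpha, a, b} \; L\big(r, \alpha, a, b \mid (t_{i,1}, \dots, t_{i,x}, T_i)_{i=1..N}\big) \\
&= \arg\max_{r, \alpha, a, b} \; \prod_{i=1}^{N} L(r, \alpha, a, b \mid t_{i,1}, \dots, t_{i,x}, T_i)
\end{aligned}
$$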

As can be seen, we can now simply disregard the cumulative term $t_i$ for the exact timing patterns, since this multiplicative factor has no effect on the location of the maximum, i.e. on the estimated parameters. Therefore, we can stick to $(x, t_x, T)$ as input data for our further calculations.

To circumvent problems with numerical precision, it is common to actually optimize the logarithm of the likelihood, which transforms the multiplication (of very small numbers) into a sum.

$$
(r, \alpha, a, b) = \arg\max_{r, \alpha, a, b} \; \sum_{i=1}^{N} \log\big( L(r, \alpha, a, b \mid t_{i,1}, \dots, t_{i,x}, T_i) \big)
\tag{A.14}
$$
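The numerical motivation for (A.14) is easy to demonstrate. In the sketch below the per-customer likelihood values are hypothetical placeholders; the point is only that their product underflows in double precision while the sum of their logarithms stays finite.

```python
import math

# Hypothetical per-customer likelihood values; real ones would come from (A.12).
likelihoods = [1e-120, 3e-150, 2e-90, 5e-110]

product = 1.0
for value in likelihoods:
    product *= value   # underflows to 0.0 in double precision

log_sum = sum(math.log(value) for value in likelihoods)   # stays finite

print(product, log_sum)
```

An optimizer working on the raw product would see a flat objective of zero, whereas the log-likelihood still distinguishes between parameter sets.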

A.6 Probability Distribution of Purchase Frequencies

We now try to deduce an expression for $P(X(t) = x \mid r, \alpha, a, b)$, i.e. the probability distribution of the purchase frequencies conditional on the (estimated) parameters, and will again closely follow the mathematical derivation from Hoppe and Wagner (2008, section 3.3).

For a single customer (with given λ and p) the probability of encountering x transactions until time t can be split into two cases. Either the customer simply had x transactions and is still active at time t, or he/she would have had more than x transactions but defected immediately after the x-th purchase.

$$
P(X(t) = x \mid \lambda, p) = (1-p)^{x+1} P(X(t) = x) + p (1-p)^x P(X(t) \ge x)
\tag{A.15}
$$

Using $P(X(t) \ge x) = 1 - P(X(t) < x)$ and result A.2, we derive

$$
P(X(t) = x \mid \lambda, p) = (1-p)^{x+1} \sum_{j=kx}^{kx+k-1} P_{\mathrm{P}}(X(t) = j)
+ p (1-p)^x \Big( 1 - \delta_{x>0} \sum_{j=0}^{kx-1} P_{\mathrm{P}}(X(t) = j) \Big).
\tag{A.16}
$$

Note that we added the Kronecker delta $\delta_{x>0}$, which is 1 for x > 0 and 0 otherwise, to correctly handle the case x = 0, for which the second summand simply becomes the dropout probability p at time zero.

Again we mix in our heterogeneity assumptions:

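$$
\begin{aligned}
P(X(t) = x \mid r, \alpha, a, b)
&= \int_0^1 \!\! \int_0^\infty P(X(t) = x \mid \lambda, p)\, f_\Gamma(\lambda \mid r, \alpha)\, f_B(p \mid a, b) \, d\lambda \, dp \\
&= \int_0^1 (1-p)^{x+1} f_B \, dp \int_0^\infty \Big( \sum_{j=kx}^{kx+k-1} \frac{(\lambda t)^j}{j!} e^{-\lambda t} \Big) f_\Gamma \, d\lambda \\
&\quad + \int_0^1 p (1-p)^x f_B \, dp \int_0^\infty \Big( 1 - \delta_{x>0} \sum_{j=0}^{kx-1} \frac{(\lambda t)^j}{j!} e^{-\lambda t} \Big) f_\Gamma \, d\lambda
\end{aligned}
\tag{A.17}
$$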

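and apply the results A.6 and A.5:

$$
\begin{aligned}
P(X(t) = x \mid r, \alpha, a, b)
&= I_B(0, x + 1, a, b) \cdot \Big( \sum_{j=kx}^{kx+k-1} \frac{t^j}{j!}\, I_\Gamma(j, t, r, \alpha) \Big) \\
&\quad + I_B(1, x, a, b) \cdot \Big( 1 - \delta_{x>0} \sum_{j=0}^{kx-1} \frac{t^j}{j!}\, I_\Gamma(j, t, r, \alpha) \Big) \\
&= \frac{B(a, b + x + 1)}{B(a, b)} \cdot \Big( \sum_{j=kx}^{kx+k-1} \frac{t^j}{j!} \frac{\alpha^r (r)_j}{(\alpha + t)^{r+j}} \Big) \\
&\quad + \frac{B(a + 1, b + x)}{B(a, b)} \cdot \Big( 1 - \delta_{x>0} \sum_{j=0}^{kx-1} \frac{t^j}{j!} \frac{\alpha^r (r)_j}{(\alpha + t)^{r+j}} \Big)
\end{aligned}
\tag{A.18}
$$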

Considering the probability mass function of the negative binomial distribution

$$
P_{\mathrm{NBD}}(X(t) = j) = \frac{t^j}{j!} \frac{\alpha^r (r)_j}{(\alpha + t)^{r+j}},
\tag{A.19}
$$

we can also write

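$$
\begin{aligned}
P(X(t) = x \mid r, \alpha, a, b)
&= \frac{B(a, b + x + 1)}{B(a, b)} \cdot \Big( \sum_{j=kx}^{kx+k-1} P_{\mathrm{NBD}}(X(t) = j) \Big) \\
&\quad + \frac{B(a + 1, b + x)}{B(a, b)} \cdot \Big( 1 - \delta_{x>0} \sum_{j=0}^{kx-1} P_{\mathrm{NBD}}(X(t) = j) \Big).
\end{aligned}
\tag{A.20}
$$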

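Thus, for the Erlang-2 case this expression is

$$
\begin{aligned}
P(X(t) = x \mid r, \alpha, a, b)
&= \frac{B(a, b + x + 1)}{B(a, b)} \cdot \big( P_{\mathrm{NBD}}(X(t) = 2x) + P_{\mathrm{NBD}}(X(t) = 2x + 1) \big) \\
&\quad + \frac{B(a + 1, b + x)}{B(a, b)} \cdot \Big( 1 - \delta_{x>0} \sum_{j=0}^{2x-1} P_{\mathrm{NBD}}(X(t) = j) \Big).
\end{aligned}
\tag{A.21}
$$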
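Expression (A.20) can be evaluated with a few helper functions. The sketch below is an illustrative implementation based on the standard library's log-gamma function. The function names are assumptions, and the parameter values in the example are the CBG/CNBD-2 estimates of Table 6.2 combined with an arbitrary horizon of 365 days.

```python
import math


def log_beta(a: float, b: float) -> float:
    """log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def nbd_pmf(j: int, t: float, r: float, alpha: float) -> float:
    """NBD probability (A.19), evaluated on the log scale."""
    log_p = (j * math.log(t) - math.lgamma(j + 1) + r * math.log(alpha)
             + math.lgamma(r + j) - math.lgamma(r) - (r + j) * math.log(alpha + t))
    return math.exp(log_p)


def purchase_pmf(x: int, t: float, k: int,
                 r: float, alpha: float, a: float, b: float) -> float:
    """P(X(t) = x | r, alpha, a, b) following (A.20)."""
    still_active = math.exp(log_beta(a, b + x + 1) - log_beta(a, b))
    dropped_out = math.exp(log_beta(a + 1, b + x) - log_beta(a, b))
    in_window = sum(nbd_pmf(j, t, r, alpha) for j in range(k * x, k * x + k))
    below = sum(nbd_pmf(j, t, r, alpha) for j in range(k * x))   # empty sum for x = 0
    return still_active * in_window + dropped_out * (1.0 - below)


params = dict(t=365.0, k=2, r=1.83, alpha=323.0, a=0.62, b=0.76)
probabilities = [purchase_pmf(x, **params) for x in range(150)]
print(probabilities[:4], sum(probabilities))
```

Because the empty sum for x = 0 is zero, the indicator $\delta_{x>0}$ does not need separate handling in code.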

A.7 Probability of Being Active

As Schmittlein et al. (1987) pointed out, one of the key expressions of models of this kind is the probability of a single customer still being active at the end of the observation period, based on his past transaction history. That is, we ask for $P(\tau > T \mid t_1, \dots, t_x, T, r, \alpha, a, b)$ with τ being the unobserved customer lifetime.

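$$
\begin{aligned}
P(\tau > T \mid t_1, \dots, t_x, T, \lambda, p)
&= 1 - P(\tau \le T \mid t_1, \dots, t_x, T, \lambda, p) \\
&= 1 - \frac{p}{P(X(T - t_x) = 0)} \\
&= 1 - \frac{p}{p + (1-p) \sum_{j=0}^{k-1} P_{\mathrm{P}}(X(T - t_x) = j)} \\
&= \frac{(1-p) \sum_{j=0}^{k-1} P_{\mathrm{P}}(X(T - t_x) = j)}{p + (1-p) \sum_{j=0}^{k-1} P_{\mathrm{P}}(X(T - t_x) = j)}
\end{aligned}
$$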

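By expanding this term with $t (1-p)^x \lambda^{kx} e^{-\lambda t_x}$, and comparing the denominator with equation A.3, it follows that

$$
P(\tau > T \mid t_1, \dots, t_x, T, \lambda, p)
= \frac{t (1-p)^{x+1} \lambda^{kx} e^{-\lambda T} \sum_{j=0}^{k-1} \frac{\lambda^j (T - t_x)^j}{j!}}{L(\lambda, p \mid t_1, \dots, t_x, T)}
\tag{A.22}
$$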

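Building the double integral

$$
P(\tau > T \mid t_1, \dots, t_x, T, r, \alpha, a, b)
= \int_0^1 \!\! \int_0^\infty P(\tau > T \mid t_1, \dots, t_x, T, \lambda, p)\, f_\Gamma(\lambda \mid r, \alpha)\, f_B(p \mid a, b) \, d\lambda \, dp
\tag{A.23}
$$

and using the following result from Hoppe and Wagner (2008, section 3.2.3)

$$
f(\lambda, p \mid t_1, \dots, t_x, T)
= \frac{L(\lambda, p \mid t_1, \dots, t_x, T)\, f_\Gamma(\lambda \mid r, \alpha)\, f_B(p \mid a, b)}{L(r, \alpha, a, b \mid t_1, \dots, t_x, T)},
\tag{A.24}
$$

yields

$$
\begin{aligned}
P(\tau > T \mid t_1, \dots, t_x, T, r, \alpha, a, b)
&= \frac{t}{L(r, \alpha, a, b \mid t_1, \dots, t_x, T)} \cdot \int_0^1 (1-p)^{x+1} f_B(p \mid a, b) \, dp \\
&\qquad \cdot \int_0^\infty \lambda^{kx} e^{-\lambda T} \sum_{j=0}^{k-1} \frac{(T - t_x)^j}{j!} \lambda^j f_\Gamma(\lambda \mid r, \alpha) \, d\lambda \\
&= t \cdot I_B(0, x + 1, a, b) \cdot \sum_{j=0}^{k-1} \frac{(T - t_x)^j}{j!}\, I_\Gamma(kx + j, T, r, \alpha) \Big/ L(r, \alpha, a, b \mid t_1, \dots, t_x, T).
\end{aligned}
\tag{A.25}
$$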

Comparing this with equation A.11, we can see that the numerator is actually one of the summation terms of the aggregated likelihood function in the denominator. Considering $\frac{A}{A+B} = \left(1 + \frac{B}{A}\right)^{-1}$, the fraction can be reduced to

$$
P(\tau > T \mid t_1, \dots, t_x, T, r, \alpha, a, b)
= \left( 1 + \frac{t \cdot I_B(1, x, a, b) \cdot I_\Gamma(kx, t_x, r, \alpha)}{t \cdot I_B(0, x + 1, a, b) \cdot \sum_{j=0}^{k-1} \frac{(T - t_x)^j}{j!}\, I_\Gamma(kx + j, T, r, \alpha)} \right)^{-1}
\tag{A.26}
$$

Fortunately, the term t cancels out and therefore we still do not require the information on the exact timing of the transactions for carrying out our calculations. We resolve the integral functions, extract common terms, use the relation $(r)_{kx+j} = (r)_{kx} \cdot (r + kx)_j$ and obtain

$$
\begin{aligned}
P(\tau > T \mid x, t_x, T, r, \alpha, a, b)
&= \left( 1 + \frac{B(a + 1, b + x)}{B(a, b + x + 1)} \cdot \frac{\alpha^r (r)_{kx}}{(\alpha + t_x)^{r + kx}} \cdot \frac{(\alpha + T)^{r + kx}}{\alpha^r (r)_{kx}}
\Big/ \sum_{j=0}^{k-1} \frac{(T - t_x)^j}{j!} \frac{(r + kx)_j}{(\alpha + T)^j} \right)^{-1} \\
&= \left( 1 + \frac{a}{b + x} \Big( \frac{\alpha + T}{\alpha + t_x} \Big)^{r + kx}
\Big/ \sum_{j=0}^{k-1} \frac{(T - t_x)^j}{j!} \frac{(r + kx)_j}{(\alpha + T)^j} \right)^{-1}.
\end{aligned}
\tag{A.27}
$$

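Thus, for Erlang-2:

$$
P(\tau > T \mid x, t_x, T, r, \alpha, a, b)
= \left( 1 + \frac{a}{b + x} \Big( \frac{\alpha + T}{\alpha + t_x} \Big)^{r + 2x}
\Big/ \Big( 1 + (r + 2x) \frac{T - t_x}{\alpha + T} \Big) \right)^{-1}
\tag{A.28}
$$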
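Since (A.27) only needs $(x, t_x, T)$ and the four parameters, it is cheap to evaluate per customer. The sketch below implements the general Erlang-k form and the explicit Erlang-2 form side by side; the function names and the example donors are illustrative assumptions, and the parameters are the CBG/CNBD-2 estimates of Table 6.2.

```python
import math


def p_active(x: int, t_x: float, T: float, k: int,
             r: float, alpha: float, a: float, b: float) -> float:
    """P(tau > T | x, t_x, T, r, alpha, a, b) following (A.27)."""
    series = 0.0
    pochhammer = 1.0                  # (r + kx)_j, starting with j = 0
    for j in range(k):
        series += (T - t_x) ** j / math.factorial(j) * pochhammer / (alpha + T) ** j
        pochhammer *= r + k * x + j   # update to (r + kx)_{j+1}
    ratio = a / (b + x) * ((alpha + T) / (alpha + t_x)) ** (r + k * x) / series
    return 1.0 / (1.0 + ratio)


def p_active_erlang2(x: int, t_x: float, T: float,
                     r: float, alpha: float, a: float, b: float) -> float:
    """Explicit Erlang-2 case (A.28)."""
    series = 1.0 + (r + 2 * x) * (T - t_x) / (alpha + T)
    ratio = a / (b + x) * ((alpha + T) / (alpha + t_x)) ** (r + 2 * x) / series
    return 1.0 / (1.0 + ratio)


params = dict(r=1.83, alpha=323.0, a=0.62, b=0.76)
# Two hypothetical donors with three repeat donations each, observed for 1000 days.
recent = p_active(x=3, t_x=900.0, T=1000.0, k=2, **params)
lapsed = p_active(x=3, t_x=300.0, T=1000.0, k=2, **params)
print(recent, lapsed)
```

The donor whose last donation is recent receives a much higher probability of being active than the donor with a long silence, which is the recency effect discussed in chapter 6.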

A.8 Expected Number of Transactions

In order to arrive at a closed-form solution for the predicted number of transactions for a single customer with given purchase history, $E(Y(T, T+t) \mid x, t_x, T, r, \alpha, a, b)$, we try to follow the same steps as in Hoppe and Wagner (2008, section 3.5). Unfortunately, we do not succeed. Nevertheless, we come up with a heuristic approximation and provide some reasoning for our simplifications. As the calculations for the DMEF competition have shown, such an approach can still outperform existing models which assume a Poisson process.

A.8.1 Unconditional Expectation for Condensed Poisson

The expected number of transactions for an active customer with exponentially distributed interevent times is known to be $E(X(t) \mid \lambda) = \lambda t$.

The asynchronous counting process for Erlang-2 waiting times has an expectation of $E(X(t) \mid \lambda) = \lambda t/2$ (Chatfield and Goodhardt, 1973, p. 829). Similarly, we will now prove that the generalization for Erlang-k, $E(X(t) \mid \lambda) = \lambda t/k$, also holds true. Let us recall that asynchronous counting for Erlang-k can also be seen as a censored counting of a Poisson process, where every k-th event is counted. As we start the counting independently of any particular event, the recording of $r$ censored events can arise from $rk - k + 1, \dots, rk - 1, rk, rk + 1, \dots, rk + k - 1$ uncensored events. Or, looking at it the other way around, $rk + j$ ($0 \le j \le k$) uncensored events result in either $r$ (with probability $\frac{k-j}{k}$) or $r + 1$ (with probability $\frac{j}{k}$) censored events being counted. Therefore

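\begin{align*}
E(X(t) \mid \lambda) &= \sum_{r=1}^{\infty} r\, P_C(r) \\
&= \sum_{r=1}^{\infty} r \left( \sum_{j=-k+1}^{k-1} \frac{k-|j|}{k}\, P_P(kr+j) \right) \\
&= \frac{1}{k} \sum_{r=1}^{\infty} kr\, P_P(kr) + \sum_{j=1}^{k-1} \underbrace{\left( \frac{j}{k^2} \sum_{r=1}^{\infty} kr\, P_P(kr-k+j) + \frac{k-j}{k^2} \sum_{r=1}^{\infty} kr\, P_P(kr+j) \right)}_{=:\, T_j}
\end{align*}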

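$T_j$ can be reduced to (all sums run over $r = 1, \dots, \infty$)

\begin{align*}
T_j &= \frac{j}{k^2} \left( \sum_{r} (kr-k+j)\, P_P(kr-k+j) + \sum_{r} (k-j)\, P_P(kr-k+j) \right) \\
&\quad + \frac{k-j}{k^2} \left( \sum_{r} (kr+j)\, P_P(kr+j) - \sum_{r} j\, P_P(kr+j) \right) \\
&= \frac{j}{k^2} \left( \sum_{r} (kr-k+j)\, P_P(kr-k+j) + \sum_{r} (k-j)\, P_P(kr-k+j) \right) \\
&\quad + \frac{k-j}{k^2} \left( \sum_{r} (kr-k+j)\, P_P(kr-k+j) - j P_P(j) - \sum_{r} j\, P_P(kr-k+j) + j P_P(j) \right) \\
&= \frac{1}{k} \sum_{r} (kr-k+j)\, P_P(kr-k+j),
\end{align*}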

and we obtain our previously stated result for the unconditional expected number for asynchronous counting:

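\begin{align*}
E(X(t) \mid \lambda) &= \frac{1}{k} \sum_{r=1}^{\infty} kr\, P_P(kr) + \sum_{j=1}^{k-1} \frac{1}{k} \sum_{r=1}^{\infty} (kr-k+j)\, P_P(kr-k+j) \\
&= \frac{1}{k} \sum_{r=1}^{\infty} r\, P_P(r) = \frac{\lambda t}{k} \tag{A.29}
\end{align*}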
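The censoring argument behind A.29 can be checked numerically. The sketch below builds the condensed probabilities $P_C(r)$ from Poisson probabilities with the weights $(k-|j|)/k$ and sums $r\,P_C(r)$ over a truncated range. The function names and the truncation bound `r_max` are assumptions of this example and not part of the original derivation.

```python
import math


def poisson_pmf(n, mu):
    """P_P(n): Poisson probability of n uncensored events, mean mu = lambda * t > 0."""
    if n < 0:
        return 0.0
    return math.exp(-mu + n * math.log(mu) - math.lgamma(n + 1))


def condensed_pmf(r, mu, k):
    """P_C(r): probability of counting r censored events (asynchronous counting)."""
    # kr + j uncensored events contribute to r censored events with weight (k - |j|) / k.
    return sum(
        (k - abs(j)) / k * poisson_pmf(k * r + j, mu)
        for j in range(-k + 1, k)
    )


def condensed_mean(mu, k, r_max=400):
    """Truncated evaluation of E(X(t) | lambda) = sum_r r * P_C(r)."""
    return sum(r * condensed_pmf(r, mu, k) for r in range(1, r_max + 1))


if __name__ == "__main__":
    mu = 7.3  # hypothetical value of lambda * t
    for k in (1, 2, 3, 4):
        print(k, condensed_mean(mu, k), mu / k)
```

The two printed columns should agree up to truncation and rounding error, in line with $E(X(t) \mid \lambda) = \lambda t / k$.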


A.8.2 Unconditional Expectation for Grouped Poisson

For a synchronous counting process with Erlang-k waiting times the derivation of the expectation is more difficult. Using result A.2, we can deduce

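\begin{align*}
E(X(t) \mid \lambda) &= \sum_{r=1}^{\infty} r\, P_G(r) = \sum_{r=1}^{\infty} r \sum_{j=0}^{k-1} P_P(rk+j) \\
&= \frac{1}{k} \sum_{j=0}^{k-1} \left( \sum_{r=1}^{\infty} rk\, P_P(rk+j) \right) \\
&= \frac{1}{k} \sum_{j=0}^{k-1} \Bigg( \sum_{r=1}^{\infty} (rk+j)\, P_P(rk+j) - j \underbrace{\sum_{r=1}^{\infty} P_P(rk+j)}_{\sum_{r=0}^{\infty} P_P(rk+j) - P_P(j)} \Bigg) \\
&= \frac{1}{k} \left( \sum_{r=0}^{\infty} r\, P_P(r) - \sum_{r=0}^{k-1} r\, P_P(r) - \sum_{j=0}^{k-1} j \left( \sum_{r=0}^{\infty} P_P(rk+j) - P_P(j) \right) \right) \\
&= \frac{1}{k} \left( \lambda t - \sum_{j=1}^{k-1} j \sum_{r=0}^{\infty} P_P(rk+j) \right).
\end{align*}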

For $k = 2$ it is possible to find a simple closed form for the unconditional expected number for synchronous counting.

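\begin{align*}
E(X(t) \mid \lambda) &= \frac{1}{2} \left( \lambda t - \sum_{r=0}^{\infty} P_P(2r+1) \right) \\
&= \frac{1}{2} \left( \lambda t - e^{-\lambda t} \sum_{r=0}^{\infty} \frac{(\lambda t)^{2r+1}}{(2r+1)!} \right) \\
&= \frac{\lambda t}{2} - \frac{1}{2} e^{-\lambda t} \sinh(\lambda t) \tag{A.30}
\end{align*}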

The result for the synchronous counting process (A.30) differs from the asynchronous result (A.29) only by an additional subtraction term, which for Erlang-2 converges to 1/4 for $t \to \infty$. Hence, for a long time horizon we can assess the error that we make if we use the simpler formula A.29.
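To make the size of this subtraction term concrete, the following sketch evaluates the grouped (synchronous) expectation by direct summation and compares it with the closed form A.30 for $k = 2$. The function names, the truncation bound `r_max`, and the example values of $\lambda t$ are assumptions of this illustration.

```python
import math


def poisson_pmf(n, mu):
    """P_P(n): Poisson probability of n uncensored events, mean mu = lambda * t > 0."""
    return math.exp(-mu + n * math.log(mu) - math.lgamma(n + 1))


def grouped_mean(mu, k, r_max=400):
    """E(X(t) | lambda) = sum_r r * P_G(r), with P_G(r) = sum_{j=0}^{k-1} P_P(rk + j)."""
    return sum(
        r * sum(poisson_pmf(r * k + j, mu) for j in range(k))
        for r in range(1, r_max + 1)
    )


def grouped_mean_erlang2(mu):
    """Closed form (A.30) for k = 2."""
    return mu / 2 - 0.5 * math.exp(-mu) * math.sinh(mu)


if __name__ == "__main__":
    for mu in (0.5, 2.0, 10.0):
        numeric = grouped_mean(mu, 2)
        closed = grouped_mean_erlang2(mu)
        # Gap between the asynchronous result (A.29) and the synchronous one.
        print(f"lambda*t={mu}: numeric={numeric:.6f} closed={closed:.6f} gap={mu / 2 - closed:.6f}")
```

Since $e^{-\lambda t}\sinh(\lambda t) = \tfrac{1}{2}(1 - e^{-2\lambda t})$, the gap equals $\tfrac{1}{4}(1 - e^{-2\lambda t})$ and approaches 1/4 quickly as $\lambda t$ grows.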

A.8.3 Expectations for Condensed NBD

Schmittlein and Morrison (1983) published some findings regarding the condensed negative binomial distribution, but only considered the Erlang-2 case. They state formulas for the moments of the unconditional distribution, in particular

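\begin{align*}
E(X \mid r, \alpha) &= \frac{r}{2\alpha}, \text{ and} \tag{A.31} \\
\operatorname{Var}(X \mid r, \alpha) &= \frac{r}{4\alpha} + \frac{1}{8} \left[ 1 - \left( \frac{\alpha}{\alpha+2} \right)^{r} \right] + \frac{r}{4\alpha^2}, \tag{A.32}
\end{align*}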

but also derived a formula for the conditional expectation. Due to its complexity, we will not reproduce this result here, but rather point out two important characteristic differences from the NBD that Schmittlein and Morrison noted. First, the expected number of future transactions is no longer linear in the observed number of transactions, and second, the result now depends on any elapsed time between the observation and the prediction period. Both of these statements already indicate that deriving a formula for the conditional expectation of the CBG/CNBD-k model will be anything but trivial.
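The two unconditional moments can be cross-checked by brute force. The sketch below assumes a unit observation period, a gamma mixing distribution with shape $r$ and rate $\alpha$, and asynchronous Erlang-2 counting (an odd number of uncensored events is rounded up or down with probability 1/2 each). These modelling details and all function names are assumptions of this example rather than statements taken from Schmittlein and Morrison (1983).

```python
import math


def nbd_pmf(n, r, alpha):
    """NBD probability of n uncensored events in a unit period (gamma-mixed Poisson)."""
    log_p = (
        math.lgamma(r + n) - math.lgamma(r) - math.lgamma(n + 1)
        + r * math.log(alpha / (alpha + 1.0)) - n * math.log(alpha + 1.0)
    )
    return math.exp(log_p)


def condensed_nbd_moments(r, alpha, n_max=2000):
    """Mean and variance of the Erlang-2 condensed count, by direct summation."""
    mean = second = 0.0
    for n in range(n_max + 1):
        p = nbd_pmf(n, r, alpha)
        m, odd = divmod(n, 2)
        # n = 2m events give m counts; n = 2m + 1 give m or m + 1, each with probability 1/2.
        ex = m + 0.5 * odd
        ex2 = m * m if not odd else 0.5 * (m * m + (m + 1) ** 2)
        mean += p * ex
        second += p * ex2
    return mean, second - mean ** 2


def closed_form_moments(r, alpha):
    """Mean and variance as given in (A.31) and (A.32)."""
    var = r / (4 * alpha) + (1 - (alpha / (alpha + 2)) ** r) / 8 + r / (4 * alpha ** 2)
    return r / (2 * alpha), var


if __name__ == "__main__":
    print(condensed_nbd_moments(1.5, 2.0))
    print(closed_form_moments(1.5, 2.0))
```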

A.8.4 Unconditional Expectation for CBG/CNBD-k

Unfortunately, we did not succeed in deriving a closed form for the expression $E(X(t) \mid r, \alpha, a, b)$. We could derive a (rather complex) expression for $E(X(t) \mid \lambda, p)$ for $k = 2$, but subsequently incorporating heterogeneity would have required solving double integrals of the form

\[
\int_0^1 \int_0^\infty p^{v_4} (1-p)^{v_5} \lambda^{v_1} e^{-\lambda (v_3 + v_2 \sqrt{1-p})\, t} \, d\lambda \, dp. \tag{A.33}
\]

Nevertheless, we proceed with our calculations by using some simple heuristic modifications to the results of Hoppe and Wagner (2007). They define

\[
G(v_1, v_2, v_3, v_4 \mid \alpha, t) := 1 - \left( \frac{v_4}{v_4 + t} \right)^{v_1} {}_2F_1\!\left( v_1, v_2 + 1; v_3 + a; \frac{t}{v_4 + t} \right) \tag{A.34}
\]

with ${}_2F_1$ being the Gaussian hypergeometric function, and stated

\[
E(X(t) \mid r, \alpha, a, b) = \frac{b}{a-1} \cdot G(r, b, b, \alpha \mid \alpha, t) \tag{A.35}
\]

for the unconditional expected number of transactions until time $t$ for their CBG/NBD model.


Recalling our findings that the expectation for asynchronous counting is simply $1/k$ of that of the corresponding Poisson process (see equation A.29), and that synchronous counting differs only by a term that becomes constant for a long time horizon, we simply approximate the expected number of transactions for the CBG/CNBD-k model with

\[
E(X(t) \mid r, \alpha, a, b) = \frac{1}{k} \cdot \frac{b}{a-1} \cdot G(r, b, b, \alpha \mid \alpha, t). \tag{A.36}
\]
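A minimal Python sketch of A.34 to A.36 follows. It evaluates ${}_2F_1$ by its Gauss series, which converges here because the argument $t/(v_4+t)$ lies in $[0, 1)$. The helper names are assumptions of this example, and passing `a` explicitly to `G` reflects one reading of the conditioning arguments in A.34. The formula requires $a > 1$.

```python
def hyp2f1(a, b, c, z, tol=1e-12, max_terms=100000):
    """Gauss series for 2F1(a, b; c; z); intended for 0 <= z < 1."""
    term = total = 1.0
    for n in range(max_terms):
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
        total += term
        if abs(term) < tol * abs(total):
            break
    return total


def G(v1, v2, v3, v4, a, t):
    """Helper function (A.34); the beta parameter a is passed explicitly."""
    z = t / (v4 + t)
    return 1.0 - (v4 / (v4 + t)) ** v1 * hyp2f1(v1, v2 + 1.0, v3 + a, z)


def expected_transactions(t, r, alpha, a, b, k=1):
    """Heuristic unconditional expectation (A.36); k = 1 gives the CBG/NBD result (A.35)."""
    return b / (a - 1.0) / k * G(r, b, b, alpha, a, t)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to exercise the functions.
    for k in (1, 2):
        print(k, expected_transactions(t=10.0, r=0.8, alpha=5.0, a=2.5, b=2.0, k=k))
```

A library routine for the hypergeometric function could be substituted for the series; the hand-written version is used only to keep the sketch free of dependencies.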

A.8.5 Conditional Expectation for CBG/CNBD-k

But even if we came up with a proper solution for the unconditional expectation, the next hurdle would be to calculate the expected number of future transactions based on a given purchase history. Because the Erlang-k distribution, as opposed to the exponential distribution, is not memoryless, we cannot use the relation

\[
E(Y(T, T+t) \mid x, t_x, T, r, \alpha, a, b) = E(X(t) \mid \tau > T, \lambda, p) \cdot P(\tau > T \mid x, t_x, T, \lambda, p), \tag{A.37}
\]

as is the case for the CBG/NBD model. Recency $(T - t_x)$ actually does influence the expected number of future transactions (i.e. the first multiplication term), and not just the probability of still being active. Assuming that the customer has survived the last transaction, a longer time period since the last transaction actually makes it more likely that the next transaction will take place soon. Therefore, we will systematically underestimate future transactions if we still use this relation for CBG/CNBD-k.
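The effect of the missing memorylessness can be illustrated with the Erlang-k survival function $S(s) = e^{-\lambda s} \sum_{j=0}^{k-1} (\lambda s)^j / j!$. The sketch below computes the probability that the next transaction occurs within a window `delta`, given that `elapsed` time units have passed since the last one. The function names and the example values are assumptions of this illustration.

```python
import math


def erlang_survival(s, lam, k):
    """P(waiting time > s) for an Erlang-k distribution with rate lam."""
    return math.exp(-lam * s) * sum((lam * s) ** j / math.factorial(j) for j in range(k))


def prob_next_within(delta, elapsed, lam, k):
    """P(next event within delta | no event during the last `elapsed` time units)."""
    return 1.0 - erlang_survival(elapsed + delta, lam, k) / erlang_survival(elapsed, lam, k)


if __name__ == "__main__":
    for elapsed in (0.0, 1.0, 5.0):
        exponential = prob_next_within(1.0, elapsed, lam=1.0, k=1)
        erlang2 = prob_next_within(1.0, elapsed, lam=1.0, k=2)
        print(f"elapsed={elapsed}: k=1 -> {exponential:.4f}, k=2 -> {erlang2:.4f}")
```

For $k = 1$ the probability does not depend on the elapsed time, whereas for $k = 2$ it increases with it, which is the source of the systematic underestimation described above.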

Nevertheless, we proceed with our heuristic simplifications, and again adapt the findings of Hoppe and Wagner. They derived

\begin{align*}
E(Y(T, T+t) \mid x, t_x, T, r, \alpha, a, b) = {} & \frac{a+b+x}{a-1} \cdot G(r+x, b+x, b+x, \alpha+T \mid \alpha, t) \\
& \cdot P(\tau > T \mid x, t_x, T, r, \alpha, a, b) \tag{A.38}
\end{align*}

for the CBG/NBD model. In their erratum (Wagner and Hoppe, 2008) to Batislam et al. (2007) they note that it is possible to derive the result for the forecast by updating the parameters $(r, \alpha, a, b)$ to $(r + x, \alpha + T, a, b + x)$.

We use our exact derivation (A.27) for $P(\tau > T \mid x, t_x, T, r, \alpha, a, b)$ and combine this with our approximation for the expectation from the previous section. Additionally, we update the parameters from $(r, \alpha, a, b)$ to $(r + kx, \alpha + T, a, b + x)$, since we encountered $kx$ uncensored events within $(0, T]$. Hence, we conclude:

\begin{align*}
E(Y(T, T+t) \mid x, t_x, T, r, \alpha, a, b) = {} & \frac{1}{k} \cdot \frac{a+b+x}{a-1} \cdot G(r+kx, b+x, b+x, \alpha+T \mid \alpha, t) \\
& \cdot P(\tau > T \mid x, t_x, T, r, \alpha, a, b) \tag{A.39}
\end{align*}
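Putting A.27, A.34 and A.39 together gives a forecast for a single customer. The following self-contained Python sketch is only an illustration under the same assumptions as the formulas above (in particular $a > 1$); the function names, the series evaluation of ${}_2F_1$, and the example parameter values are assumptions of this example and not the code used for the thesis.

```python
import math


def hyp2f1(a, b, c, z, tol=1e-12, max_terms=100000):
    """Gauss series for 2F1(a, b; c; z); intended for 0 <= z < 1."""
    term = total = 1.0
    for n in range(max_terms):
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
        total += term
        if abs(term) < tol * abs(total):
            break
    return total


def G(v1, v2, v3, v4, a, t):
    """Helper function (A.34); the beta parameter a is passed explicitly."""
    return 1.0 - (v4 / (v4 + t)) ** v1 * hyp2f1(v1, v2 + 1.0, v3 + a, t / (v4 + t))


def p_alive(x, t_x, T, r, alpha, a, b, k=1):
    """P(tau > T | x, t_x, T, r, alpha, a, b) as in (A.27)."""
    series, poch = 0.0, 1.0  # poch tracks the rising factorial (r + kx)_j
    for j in range(k):
        series += (T - t_x) ** j / math.factorial(j) * poch / (alpha + T) ** j
        poch *= r + k * x + j
    ratio = a / (b + x) * ((alpha + T) / (alpha + t_x)) ** (r + k * x)
    return 1.0 / (1.0 + ratio / series)


def conditional_expected_transactions(t, x, t_x, T, r, alpha, a, b, k=1):
    """Heuristic forecast (A.39) for the period (T, T + t]; k = 1 gives (A.38)."""
    g = G(r + k * x, b + x, b + x, alpha + T, a, t)
    return (a + b + x) / (a - 1.0) / k * g * p_alive(x, t_x, T, r, alpha, a, b, k)


if __name__ == "__main__":
    # Same purchase count, but a recent versus a long-past last transaction.
    for t_x in (29.0, 5.0):
        print(t_x, conditional_expected_transactions(
            t=10.0, x=3, t_x=t_x, T=30.0, r=0.8, alpha=5.0, a=2.5, b=2.0, k=2))
```

With $x = 0$ and $t_x = T = 0$ the forecast collapses to the unconditional approximation A.36, which serves as a consistency check between the two formulas.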

A.9 Concluding Remarks

Despite the fact that we are only able to derive a biased approximation, we demonstrate in the main part of this thesis that this formula is still able to outperform classic models based on the Poisson assumption regarding individual forecasts. We assume that the crucial part of a correct prediction is a proper assessment of whether a customer is still active or not (in particular when faced with rather long prediction periods). It seems that the error introduced by approximating the expected number of transactions is smaller than the precision gained in assessing whether a customer is still active.


Bibliography

M. Abe. Counting Your Customers One by One: A Hierarchical Bayes Extension to the Pareto/NBD Model. Marketing Science, forthcoming, 2008.

E.P. Batislam, M. Denizel, and A. Filiztekin. Empirical validation and comparison of models for customer base analysis. International Journal of Research in Marketing, 24(3):201–209, 2007.

Ben Bolker. bbmle: Tools for general maximum likelihood estimation, 2008. Version 0.8.9; based on stats4 by the R Development Core Team.

C. Chatfield and G.J. Goodhardt. A Consumer Purchasing Model with Erlang Inter-Purchase Time. Journal of the American Statistical Association, 68(344):828–835, December 1973.

R. Dunn, S. Reader, and N. Wrigley. An Investigation of the Assumptions of the NBD Model as Applied to Purchasing at Individual Stores. Applied Statistics, 32(3):249–259, 1983.

A.S.C. Ehrenberg. The Pattern of Consumer Purchases. Applied Statistics, 8(1):26–41, 1959.

P. Fader and B. Hardie. Forecasting Repeat Sales at CDNOW: A Case Study. Interfaces, 31(4):94–107, 2001.

P. Fader, B. Hardie, and K.L. Lee. Counting Your Customers the Easy Way: An Alternative to the Pareto/NBD Model. Marketing Science, 24:275–284, 2005a.

P. Fader, B. Hardie, and K.L. Lee. A Note on Implementing the Pareto/NBD Model in MATLAB. March 2005b. URL http://brucehardie.com/notes/008/.

P. Fader, B. Hardie, and K.L. Lee. RFM and CLV: Using Iso-Value Curves for Customer Base Analysis. Journal of Marketing Research, 42:415–430, 2005c.


J.D. Greene. Consumer behavior models for non-statisticians: the river of time. Praeger, 1982.

S. Gupta, D. Hanssens, B. Hardie, W. Kahn, V. Kumar, N. Lin, N. Ravishanker, and S. Sriram. Modeling Customer Lifetime Value. Journal of Service Research, 9(2):139, 2006.

F.A. Haight. Counting distributions for renewal processes. Biometrika, 52(3-4):395–403, 1965.

J. Herniter. A Probabilistic Market Model of Purchase Timing and Brand Selection. Management Science, 18(4):102–112, 1971.

D. Hoppe and U. Wagner. Customer Base Analysis: The Case for a Central Variant of the Betageometric/NBD Model. Marketing - Journal of Research and Management, 2:75–90, 2007.

D. Hoppe and U. Wagner. Supplementary Appendix to “Customer Base Analysis: The Case for a Central Variant of the Betageometric/NBD Model”. Appendix with detailed mathematical derivations, provided by the authors upon request, 2008.

D. Jain and S.S. Singh. Customer Lifetime Value Research in Marketing: A Review and Future Directions. Journal of Interactive Marketing, 16(2):34–46, 2002.

D.R. Mani, J. Drew, A. Betz, and P. Datta. Statistics and data mining techniques for lifetime value modeling. In Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 94–103. ACM New York, NY, USA, 1999.

L. May, D. Austin, T.L. Bartlett, E. Malthouse, and P. Fader. Lifetime Value and Customer Equity Modeling Competition, 2008. URL http://www.the-dma.org/dmef/2008DMEFDKContestAnnouncement.pdf.

D.G. Morrison and D.C. Schmittlein. Generalizing the NBD Model for Customer Purchases: What Are the Implications and Is It Worth the Effort? Reply. Journal of Business and Economic Statistics, 6(2):165–66, 1988.

R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2008. URL http://www.R-project.org. ISBN 3-900051-07-0; Version 2.7.2.


W.J. Reinartz and V. Kumar. On the Profitability of Long-Life Customers in a Noncontractual Setting: An Empirical Investigation and Implications for Marketing. Journal of Marketing, 64(4):17–35, 2000.

S. Rosset, E. Neumann, U. Eick, and N. Vatnik. Customer Lifetime Value Models for Decision Support. Data Mining and Knowledge Discovery, 7(3):321–339, 2003.

R.T. Rust, K.N. Lemon, and V.A. Zeithaml. Return on Marketing: Using Customer Equity to Focus Marketing Strategy. Journal of Marketing, 68(1):109–127, 2004.

D.C. Schmittlein and D.G. Morrison. Prediction of Future Random Events With the Condensed Negative Binomial Distribution. Journal of the American Statistical Association, 78(382):449–456, 1983.

D.C. Schmittlein and R.A. Peterson. Customer Base Analysis: An Industrial Purchase Process Application. Marketing Science, 13(1):41–67, 1994.

D.C. Schmittlein, D.G. Morrison, and R. Colombo. Counting your customers: who are they and what will they do next? Management Science, 33(1):1–24, 1987.

H. Schroder, M. Feller, and M. Großweischede. Kundenorientierung im Category Management. December 1999. URL http://cm.uni-essen.de/praxis/publikationen/download/MH Publikationen 1999 ECR-Studie.pdf.

U. Wagner and D. Hoppe. Erratum on the MBG/NBD Model. International Journal of Research in Marketing, 2008.

U. Wagner and A. Taudes. A Multivariate Polya Model of Brand Choice and Purchase Incidence. Marketing Science, 5(3):219–244, 1986.

U. Wagner and A. Taudes. Stochastic models of consumer behaviour. North-Holland, 1987.

R.D. Wheat and D.G. Morrison. Estimating Purchase Regularity with Two Interpurchase Times. Journal of Marketing Research, 27(1):87–93, 1990.

M. Wubben and F. von Wangenheim. Instant Customer Base Analysis: Managerial Heuristics Often “Get It Right”. Journal of Marketing, 72:82–93, May 2008.

S. Zhang, J. Jin, and R.E. Crandall. Computation of Special Functions. Wiley-Interscience, 1996. ISBN 0-471119-63-6.