

www.elsevier.com/locate/intmar

Available online at www.sciencedirect.com

ScienceDirect

Journal of Interactive Marketing 27 (2013) 172–184

Donor Segmentation: When Summary Statistics Don't Tell the Whole Story

Elizabeth J. Durango-Cohen a,⁎ & Ramón L. Torres b & Pablo L. Durango-Cohen b

a Stuart School of Business, Illinois Institute of Technology, Chicago, IL 60661, USA
b Department of Civil and Environmental Engineering, Northwestern University, Evanston, IL 60208, USA

Available online 10 July 2013

Abstract

Funding pressures amidst the slow economic recovery from the late-2000s recession have forced universities, as well as other not-for-profit organizations, to increase the volume and sophistication of their direct marketing activities. The efficiency of direct marketing strategies is linked to an organization's ability to effectively target individuals. In this paper, we present a finite-mixture model framework to segment the alumni population of a university in the midwestern United States.

Much of the research on customer segmentation summarizes response data (e.g., purchase and contribution histories) via recency, frequency and monetary value (RFM) statistics. Individuals sharing similar RFM characteristics are grouped together, the rationale being that the best predictor of future behavior is past behavior. Summary statistics such as RFM, however, introduce aggregation bias that masks the dynamics of purchase/contribution behavior. Accordingly, we implement latent-class segmentation models where alumni are classified based on how an individual's contribution sequence compares to those of other individuals. The framework's capability to process contribution sequences, i.e., longitudinal data, provides fundamental new insights into donor contribution behavior, and provides a rigorous mechanism to infer and segment the population based on unobserved heterogeneities (as well as based on other observable characteristics). Specifically, we analyze Markov mixture models to segment alumni based on contribution-behavior patterns, under the assumption of serially-dependent contribution sequences. We use the expectation–maximization algorithm to obtain parameter estimates for each segment. Through an extensive empirical study, we highlight the substantive insights gained through the processing of the full contribution sequences, and establish the presence of three distinct classes of alumni in the population (each with a discernible contribution pattern). The proposed framework, collectively, provides a basis to tailor direct marketing policies to optimize specific performance criteria (e.g., profits).

© 2013 Direct Marketing Educational Foundation, Inc. Published by Elsevier Inc. All rights reserved.

Keywords: Customer segmentation; Non-profit fundraising; Direct mail; Expectation–maximization (EM) algorithm; Customer-based analysis

Introduction

Of the $28 billion raised by colleges and universities in 2010, 43% ($12.02 billion) came from contributions by individuals (Council for Aid to Education 2011). Understanding the dynamics that drive individuals' contributions is vital to university fundraising efforts. This is particularly important in the current economic environment. In 2009 and 2010, colleges and universities in the United States saw record decreases in alumni contributions in 50 years of record-keeping, with drops of 6.2% and 11.9%, respectively (Hall and Joslyn 2011). Motivated, in part, by sharp declines in endowment income as a result of the global economic crisis, and increased competition from other nonprofits for the same pool of dollars, universities have worked to increase the volume and sophistication of their fundraising efforts.

Studies on university fundraising/alumni giving have typically aimed to identify (statistically significant) traits associated with giving, such as demographic and socio-economic characteristics (e.g., Okunade 1996), and to examine alumni/donor motivations to give, such as awareness of the need for financial support and prestige/recognition (e.g., Gaier 2005; Holmes 2009). Moreover, most models in this literature study the contemporaneous relationship between dependent variables and one or more independent variables (i.e., they are static), and thus ignore the dynamics of alumni contribution behavior. In the related direct marketing optimization literature, studies consider the dynamics of consumer response/purchase behavior (e.g., Gonul and Shi 1998; Jonker, Piersma, and Van den Poel 2004; Simester, Sun, and Tsitsiklis 2006) with the aim of improving mailing strategies to different segments. These dynamic segmentation models are usually formulated as latent change models that describe how segment size and membership evolve over time, with segments defined based on RFM statistics.

⁎ Corresponding author.
E-mail addresses: [email protected] (E.J. Durango-Cohen), [email protected] (R.L. Torres), [email protected] (P.L. Durango-Cohen).

1094-9968/$ - see front matter © 2013 Direct Marketing Educational Foundation, Inc. Published by Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.intmar.2013.04.002

Table 1
Hypothetical sequences of contributions with identical RFM statistics: recency (R) = 1, frequency (F) = 5, average monetary value (M) = $120. Entries are annual contribution amounts.

| Donor | 2008 | 2009 | 2010 | 2011 | 2012 |
|-------|------|------|------|------|------|
| A     | $120 | $120 | $120 | $120 | $120 |
| B     | $200 | $160 | $140 | $80  | $20  |
| C     | $50  | $75  | $130 | $160 | $185 |
| D     | $80  | $140 | $60  | $180 | $140 |

1 In latent-class models, alumni are assumed to belong to one of a set of classes, but each individual's class membership is unknown and must be inferred.

This study contributes to the fundraising/alumni-giving literature by analyzing dynamic models of contribution behavior, and to the direct marketing segmentation literature by formulating manifest change models where segment membership is assumed to be stable, and where manifested changes, i.e., variability in the individual contribution sequences, are attributed to the characteristics that define the stable segments. This supports the identification of systematic, but unobserved, differences between individuals, and enables optimization of direct marketing policies for the ensuing segments. Finally, the segmentation models we develop provide a rigorous framework to exploit longitudinal data, which addresses the aggregation bias introduced when using RFM statistics to segment the population. More importantly, these models allow for insights about the evolution of contributions.

Specifically, we employ finite mixture models as a framework to analyze alumni contribution behavior. The underlying assumption in this framework is that the population is composed of a finite set of latent classes in unknown proportions. Each class/segment is characterized by a stochastic model defining the conditional density of the observations (i.e., the population is made of a mixture of segments, with year-to-year contributions being governed by a segment-specific, finite-state Markov chain). As with other post-hoc segmentation models, the number of segments and the segment characteristics are inferred based on alumni response/contribution data (Wedel and Kamakura 2000). This is in contrast to à priori segmentation models, as generally found in university fundraising models, where the number and types of segments are determined in advance by the researcher (e.g., Leslie and Ramey (1988) study giving behavior in four segments: alumni, nonalumni, business and other donors). To illustrate the use of the proposed approach, we analyze the contribution behavior of the alumni population of a Ph.D.-granting research university in the midwestern United States, from 2000 to 2009. The university wanted to identify more responsive donor segments within the existing alumni population that could be targeted via direct mail solicitations.

The remainder of the paper is organized as follows: in the next section we position our work with respect to the literature; we then present the general mixture modeling framework, along with an introduction to the expectation–maximization (EM) algorithm that is used to estimate the associated segmentation model parameters. An overview of the alumnus data used in our study is then presented. This is followed by a presentation of the notation and assumptions needed to formulate Markov chain mixture models. We then report on an extensive empirical study based on data from the university and discuss how the results are useful for direct marketing. Concluding remarks summarize the paper's findings.

Background and Literature Review

In this section, we position our work with respect to the non-profit (university) fundraising and market segmentation literature, and contrast our work to segmentation models developed as part of integrated frameworks to support optimization of direct marketing activities, paying special attention to dynamic segmentation models. Before discussing the literature, however, we first present an example to further motivate our approach.

Fundraisers at the university in our study relied, as is traditionally done in practice, on recency, frequency and monetary value (RFM) models and on demographic/trait information to segment the alumni population. Appeals, for example, may be sent to “alumni who graduated with a degree in architecture in the last five years, and donated at least $100 in the last fiscal year.” This approach is grounded on the argument that the best predictor of future donor behavior is past behavior (Hughes 2000) and so donors with similar RFM characteristics and observed traits are grouped together.

To contrast this approach to the proposed framework, consider the four hypothetical alumni and their contribution sequences shown in Table 1. We note that each donor contributed in the last period (R = 1); each contributed in each of the last five years (F = 5); and each has the same average contribution amount (M = $120). Based on these data, the traditional segmentation approach, based on RFM statistics, suggests that all four donors/alumni should be assigned to the same segment and, as a result, should receive similar appeals.

While the summary statistics derived from the contribution sequences are identical, it is not clear that, for direct marketing purposes, assigning these individuals to the same segment would be desirable. Specifically, the variability in donor D's contribution sequence may indicate that she can be persuaded to donate in response to solicitations. Moreover, the variability, i.e., trends, in the contribution sequences of donors B and C is ignored. That is to say, aggregation bias is introduced when RFM statistics are constructed: although RFM statistics can be derived from contribution sequences, the contribution sequences cannot be reconstructed from RFM data. As a result, we propose to implement latent-class segmentation models (see footnote 1) where donors are classified based on how an individual's contribution sequence compares to those of other individuals. Our motivation is to model the dynamics that drive contribution behavior, and to address the aggregation bias introduced when using summary (RFM) statistics to segment the population.
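The aggregation-bias argument can be checked directly on the Table 1 data. The short Python sketch below is only an illustration: the function names (`rfm`, `yearly_changes`) are hypothetical, and the RFM conventions are assumptions chosen to match the table caption (recency counts periods back to the most recent gift, frequency counts years with a positive gift, monetary value is the mean gift). All four donors collapse to the same RFM triple, while their year-over-year changes differ.

```python
from statistics import mean

# Contribution sequences from Table 1 (2008-2012).
sequences = {
    "A": [120, 120, 120, 120, 120],
    "B": [200, 160, 140, 80, 20],
    "C": [50, 75, 130, 160, 185],
    "D": [80, 140, 60, 180, 140],
}

def rfm(seq):
    """Return (recency, frequency, monetary) for one contribution sequence.

    Assumed conventions (not taken from the paper): recency is the number of
    periods back to the most recent gift (1 = gave in the last period),
    frequency is the number of years with a positive gift, and monetary is
    the mean of the positive gifts. Requires at least one positive gift.
    """
    gifts = [x for x in seq if x > 0]
    recency = next(i for i, x in enumerate(reversed(seq), start=1) if x > 0)
    return recency, len(gifts), mean(gifts)

def yearly_changes(seq):
    """Year-over-year differences: the information that RFM discards."""
    return [later - earlier for earlier, later in zip(seq, seq[1:])]

rfm_by_donor = {donor: rfm(seq) for donor, seq in sequences.items()}
changes_by_donor = {donor: yearly_changes(seq) for donor, seq in sequences.items()}

for donor in sequences:
    print(donor, rfm_by_donor[donor], changes_by_donor[donor])
```

Donor B's changes are all negative, donor C's are all positive, and donor D's alternate in sign, which is the distinction the summary statistics cannot express.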

University (Alumni-giving) Fundraising Models

In reviewing the literature, two important dimensions must be considered: (1) whether models are static or dynamic, and (2) the underlying segmentation method. Most models in the alumni giving/university fundraising literature are static. A notable exception, discussed later, is Netzer, Lattin, and Srinivasan (2008), who investigate the dynamics of alumni-giving behavior and the factors that can alter this dynamic behavior. With respect to the segmentation method, we characterize models by whether they utilize à priori or post-hoc segmentation procedures; in à priori segmentation, the number and types of segments are determined in advance by the researcher (e.g., donors versus non-donors), whereas in post-hoc segmentation the type and number of segments are determined as a result of the data analysis (e.g., K-means clustering).

Most papers in this literature rely on à priori segmentation techniques (e.g., cross-tabulation, regression, logit and discriminant analysis), and rely on observable bases comprised of variables related to demographic characteristics (e.g., donors versus nondonors, Sun, Hoffman, and Grady 2007), donor patronage (e.g., major contributors vs. regular donors, Lindahl and Winship 1992), and/or contribution frequency (e.g., consistent donors vs. occasional donors, Wunnava and Lauze 2001). In these studies, a statistical model is estimated for each segment to explain its underlying behavior. A common specification is often used for all segments to make qualitative judgments about the significance of different variables in explaining the behavior of the different segments. In addition, some papers employ econometric models to study a particular subset (one segment) of the alumni population, such as alumni who graduated between 1988 and 1990 (Marr, Mullin, and Siegfried 2005), or graduate degree alumni (Okunade 1996).

Papers generally assume that differences in donation behavior within each segment can be explained by personal characteristics (marital status, gender, age, race), socio-economic variables (income and education, past giving, sector of employment, type of financial aid received), behavioral factors (membership in Greek fraternities, volunteering for the college, membership in alumni chapters), and institutional characteristics (size, endowment value). Lindahl and Winship's (1992) study is a seminal example in the context of university fundraising. They present logit models for both major gifts and annual fund prospects, using both trait/socio-economic and response/contribution data. In their analysis, past giving turns out to be the strongest single factor in predicting future giving in both segments, but other factors are also statistically significant. Other papers in this literature aim to learn about what motivates alumni in different segments (also defined à priori) to contribute.

Few models employ post-hoc segmentation procedures in the alumni-giving literature. Weerts and Ronca (2009), for example, employ the classification and regression tree (CART) methodology to distinguish between donors and non-donors, and then to predict characteristics tied to alumni giving. As described in Wedel and Kamakura (2000), CART and other tree search procedures are used to set thresholds that are used to assign individuals to segments (when there are multiple dependent variables of interest). Le Blanc and Rucks (2009), on the other hand, utilize cluster analysis to segment the alumni population. An algorithm based on the nearest centroid sorting method, where an alumnus is assigned to the cluster for which the distance between the alumnus' average gift and the center of the cluster is smallest, is used.
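To make the nearest-centroid idea concrete, the following Python sketch assigns each alumnus to the cluster whose center is closest to that alumnus' average gift and then recomputes the centers. It is a generic illustration rather than the cited authors' implementation; the function name, the starting centroids, and the gift amounts are all hypothetical.

```python
def nearest_centroid_sort(avg_gifts, centroids, max_iter=100):
    """Cluster one-dimensional average gifts by nearest-centroid sorting.

    Each pass assigns every value to the closest centroid (absolute distance)
    and then moves each centroid to the mean of its members. Iteration stops
    when the centroids no longer change.
    """
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        labels = [
            min(range(len(centroids)), key=lambda k: abs(gift - centroids[k]))
            for gift in avg_gifts
        ]
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [g for g, lab in zip(avg_gifts, labels) if lab == k]
            # Keep the old center if a cluster ends up empty.
            new_centroids.append(sum(members) / len(members) if members else old)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical average gifts (in dollars) for six alumni.
avg_gifts = [20.0, 25.0, 30.0, 250.0, 300.0, 275.0]
labels, centroids = nearest_centroid_sort(avg_gifts, centroids=[0.0, 100.0])
print(labels, centroids)
```

Because the input is a single summary number per alumnus, two donors with the same average gift always land in the same cluster, regardless of how their gifts evolved over time.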

A salient feature of many models in this literature is the assumption that donors who share common traits, e.g., demographic or socio-economic characteristics, behave similarly; they assume that sources of heterogeneity in the population are observable. As a consequence, these studies define segments based on observed traits, and do not account for unobserved heterogeneity; by unobserved heterogeneity we mean systematic, but unobserved, differences between individuals. These differences are unobserved either because they are not measurable/observable, e.g., personality traits that might explain an individual's propensity to donate in response to solicitations, or because it is too costly to collect the data, or because data are (inadvertently) omitted. To adjust for unobserved individual heterogeneity that might be correlated with giving behavior, some models include a random effects specification (for example, Holmes 2009). One of the consequences of unobserved heterogeneity is that individuals sharing similar traits do not respond in a homogeneous fashion as expected and, in turn, this can lead to the implementation of inappropriate or suboptimal marketing strategies (Wedel et al. 1999).

We add to this literature by formulating segmentation models under the assumption that the traits that cause heterogeneity in the data are not known in advance. Thus, segments and individual alumni membership must be inferred post-hoc from the data, namely the contribution sequences associated with each individual. The proposed approach, consequently, eliminates the aggregation bias introduced with RFM statistics, and leads to segments that differ (e.g., loyal versus transient donor segments) based on the (evolution of) contribution sequences.

Customer Valuation and Direct Marketing Optimization Models

As early as the 1960s, it was recognized that mailings and other interactions could have an enduring impact on customer behavior, and that organizations might be able to profit by focusing on the long-term when designing their mailing policies (Howard 2002). This motivated the development of dynamic segmentation models, which are based on statistical models that describe how behavior evolves over time. We classify dynamic models in the literature as either predictive or prescriptive. The studies of Van den Poel (2003), Fader, Hardie, and Shang (2010) and Van Diepen, Donkers, and Frances (2009) are representative examples of predictive models. Van den Poel (2003) analyzes the effect of trait and response variables on repeated purchases. Van Diepen, Donkers, and Frances's (2009) study is worthy of note among studies that combine response and trait data (geographic and demographic). They estimate a Tobit model that predicts dynamic responses in the presence of competing mailings. They consider a random effects specification of the error terms to capture unobserved heterogeneity, i.e., systematic differences not explained by the exogenous variables, across the population. They also provide a recent and detailed overview of other similar predictive models. Fader, Hardie, and Shang (2010) present a discrete-time, beta-geometric/beta-Bernoulli model to predict the likelihood/frequency of donations. With the exception of Fader, Hardie, and Shang (2010), who analyze the variability in binary data sequences (longitudinal data), these studies summarize response data via RFM.

In contrast, prescriptive models rely on Markov chains to represent the evolution of behavior. In these models, a discrete set of segments/states is used to group individuals sharing similar RFM statistics. The transition probabilities describe how individuals transition between states (in response to a firm's decisions, e.g., mail catalog or not). These and other dynamic segmentation models are usually formulated as latent change models that describe how segment size and membership evolve. Representative examples include Pfeifer and Carraway (2000), Bitran and Mondschein (1996), Gonul and Shi (1998), Jonker, Piersma, and Van den Poel (2004), and Simester, Sun, and Tsitsiklis (2006). The models differ in the structure of the optimization problem used to obtain the catalog mailing strategy. For example, Bitran and Mondschein (1996) present a model where a firm decides to mail catalogs to individuals in either a house list (existing customers), or a rented list (prospective customers). The model includes cash-flow and inventory management considerations. Pfeifer and Carraway (2000) use latent-change Markov models to calculate customer lifetime value (LTV) under different firm-customer relationship settings. Other models contribute approaches to establish the states and to estimate the transition probabilities and period rewards. Building on the work of Rust (1994), Gonul and Shi (1998) present a closely-related model where the transition probability matrices are estimated using dynamic programming. Extensions are discussed in Gonul, Kim, and Shi (2000) and Gonul and ter Hofstede (2006). Jonker, Piersma, and Van den Poel (2004) propose bootstrapping methods to estimate the transition probabilities, and a genetic algorithm to perturb the state definitions.

While not specifically estimated as part of a framework to optimize direct marketing, it is appropriate to include Poulsen (1990), Netzer, Lattin, and Srinivasan (2008), and especially Cadez, Gaffney, and Smyth (2000) among the aforementioned studies. Poulsen (1990) analyzes the heterogeneity in the data set of Aaker (1970) on brand choice/switching. Netzer, Lattin, and Srinivasan (2008) estimate a latent-change Markov model to shed light on the effect of reunions and volunteer opportunities on the gift-giving behavior of alumni who may transition between “dormant,” “occasional,” and “active” states, as a result of reunion attendance and volunteering. To account for unobserved heterogeneity, they generate individual-level transition probability matrices, whose entries are a function of parameters estimated with a random-effects logit model. Cadez, Gaffney, and Smyth (2000) segment individuals based on web-browsing patterns, where the data attached to individuals consist of multiple sequences of URLs, one sequence per session.

Our approach builds on Cadez, Gaffney, and Smyth (2000), and differs from other studies (see Jonker, Piersma, and Van den Poel 2004 and Bitran and Mondschein 1996) in that rather than relying on a set of descriptive statistics, e.g., the average monetary value of annual contributions, we make use of the original contribution sequences, i.e., longitudinal data, associated with each individual. In addition to presenting a rigorous framework to exploit longitudinal data, our work contributes to the literature on dynamic segmentation models developed as part of integrated frameworks to support optimization of direct marketing activities, e.g., catalog mailing decisions (Bitran and Mondschein 1996; Netzer, Lattin, and Srinivasan 2008). The work proposed herein provides an approach to estimate transition probabilities, one set for each segment, which can subsequently be used to compute optimal direct marketing policies. With the exception of Poulsen (1990), dynamic segmentation models typically consist of a single set of transition probabilities, a Markov chain, which describes how segment size and membership evolve over time. In contrast, we formulate manifest change models, following Calantone and Sawyer (1978) and Bockenholt and Langeheine (1996), and posit the existence of (static and stable) segments that exhibit structurally-different dynamic contribution patterns, with each pattern represented by a Markov chain. The specification of multiple patterns improves fit-to-data, and importantly, provides an approach to capture and explain unobserved heterogeneity. While there is ample empirical evidence that such differences can be (statistically) significant, dynamic segmentation models to support direct marketing are only capable of explaining these differences as random variations within the population. In contrast, the proposed framework yields segmentations based on these differences, as well as based on other observable characteristics, and subsequently, can be used to tailor direct marketing policies for each of the ensuing segments.

General Methodology: Segmenting Sequences

In this section, we present a general methodology to segment individuals based on the monetary value of their annual contributions, and how these evolve over time. We next introduce the notation, state the assumptions used to formulate the finite mixture models, and explain how these support segmentation. We conclude the section by introducing the problem of estimating the parameters of finite mixture models, as well as the implementation of the EM algorithm that we used to solve such problems.

To capture differences in donation behavior across alumni, we employ latent class modeling. In latent class models, individuals are divided into different segments or classes. These latent classes are à priori unobservable. The problem of classifying members/alumni is akin to that of classifying contribution sequences, i.e., we assume all relevant information is contained in alumni contribution sequences. We use \(y\) to represent the set of all alumni contribution sequences, \(y \equiv \{y_m\}_{m=1}^{M}\), where \(y_m\) represents the contribution sequence for member/alumnus \(m\), with \(y_m = \{y_m^1, y_m^2, \ldots, y_m^T\} = \{y_m^t\}_{t=1}^{T}\), for \(1 \le T\), where \(y_m^t\) is the contribution of alumnus \(m\) in year \(t\), for \(m = 1, 2, \ldots, M\), and \(t = 1, 2, \ldots, T\). \(M\) and \(T\) correspond to the total number of individuals in the population, and the total number of periods (years) in the sample, respectively. We note that sequences may be of different lengths (unbalanced), as the first contribution for individual alumni may occur in different periods (see footnote 2).

In latent class modeling, the contribution sequences \(y\) arise from a population that is a mixture of \(S\) segments, in proportions \(\lambda_1, \lambda_2, \ldots, \lambda_S\), with \(\lambda = \{\lambda_s\}_{s=1}^{S}\). The mixture proportions correspond to the probability mass function describing an individual's segment membership. That is, \(\lambda_s\) corresponds to the à priori probability that a randomly selected individual from the population belongs to segment \(s \in \{1, \ldots, S\}\). The probabilities are subject to the constraints \(\sum_{s=1}^{S} \lambda_s = 1\) and \(\lambda_s \ge 0, \ \forall s\).

Further, each segment is specified with a stochastic model, \(f_s(y_m \mid \theta_s)\), i.e., a probability mass/density function, representing the probability of observing contribution sequence \(y_m\), where \(\theta_s\) denotes a set of parameters that define the function \(f_s(\cdot)\), e.g., \(\theta_s = (\mu_s, \sigma_s)\) for a Normal distribution. The total probability of observing sequence \(y_m\) is thus:

\[
f(y_m \mid \theta) = \sum_{s=1}^{S} \lambda_s f_s(y_m \mid \theta_s) \tag{1}
\]

where \(\theta \equiv \{\theta_s\}_{s=1}^{S}\). The total probability is a weighted sum, i.e., a mixture, of the probabilities associated with each segment. Eq. (1) is referred to as a finite mixture model. In addition to describing the distribution of sequences across the population, the specification of a finite mixture model provides a segmentation framework based on updating each individual's membership probabilities in response to their contribution sequence. Specifically, given \(y_m\), as shown in Durango-Cohen (2013), Bayes' law can be applied to update the probability that individual \(m\) belongs to segment \(s\), \(p_{ms}\), as follows:

\[
p_{ms} \equiv P(z_{ms} = 1 \mid y_m) = \frac{\lambda_s f_s(y_m \mid \theta_s)}{\sum_{r=1}^{S} \lambda_r f_r(y_m \mid \theta_r)} \tag{2}
\]

where \(z_{ms} = 1\) if individual \(m\) belongs to segment \(s\), and \(z_{ms} = 0\) otherwise.

The resulting segmentation is stochastic in that Eq. (2) yields conditional membership probabilities, as opposed to deterministic assignments of individuals to segments. In the subsequent analysis, we present segmentation models and implementations of the EM algorithm for different specifications of the fs(⋅) functions. These specifications correspond to different assumptions about the variability in the contribution sequences (i.e., whether contribution sequences are serially independent or state-dependent).
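As a minimal sketch of how Eq. (2) produces stochastic memberships, the Python code below evaluates the posterior segment probabilities for a single sequence when each segment is a first-order Markov chain. Everything concrete here is an assumption made for illustration: contributions are coded into two states (0 = no gift, 1 = gift), there are two segments with made-up parameters, and the function names are hypothetical. All probabilities are assumed strictly positive so that logarithms are defined.

```python
import math

def sequence_log_prob(seq, init, trans):
    """log f_s(y_m | theta_s) for a first-order Markov chain over coded states."""
    logp = math.log(init[seq[0]])
    for prev, cur in zip(seq, seq[1:]):
        logp += math.log(trans[prev][cur])
    return logp

def membership_probs(seq, lam, params):
    """Eq. (2): posterior probability of each segment given one sequence."""
    log_terms = [
        math.log(l) + sequence_log_prob(seq, init, trans)
        for l, (init, trans) in zip(lam, params)
    ]
    top = max(log_terms)  # subtract the max before exponentiating for stability
    weights = [math.exp(t - top) for t in log_terms]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical mixture: prior proportions and (initial, transition) parameters.
lam = [0.5, 0.5]
params = [
    ([0.2, 0.8], [[0.5, 0.5], [0.1, 0.9]]),  # segment 0: tends to keep giving
    ([0.7, 0.3], [[0.8, 0.2], [0.6, 0.4]]),  # segment 1: tends to lapse
]

p_steady = membership_probs([1, 1, 1, 1, 1], lam, params)
p_lapsed = membership_probs([1, 0, 0, 0, 0], lam, params)
print(p_steady, p_lapsed)
```

Both calls return a probability vector over segments rather than a hard label: the steady giver is placed mostly in segment 0 and the lapsed giver mostly in segment 1, but neither assignment is deterministic.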

Maximum (Log-)likelihood Estimation Problem

In the previous section, we formulated a finite mixture model, Eq. (1), describing the distribution of contribution sequences. Here, we discuss the approach to estimate the associated parameters: the mixture proportions, \(\lambda\), and the segment-specific parameters, \(\theta\). As explained by McLachlan and Peel (1997), if individual membership were known (i.e., \(z_m, \forall m\)), the parameter estimation would be straightforward; the fact that these data are “missing” or “unobserved” complicates the problem due to the lack of closed-form solutions to the ensuing MLE problem. As is commonly done in the estimation of finite mixture models, we present an implementation of the EM algorithm.³

² To simplify the exposition, we ignore the fact that each contribution sequence is specified as \(y_m = \{y_m^t\}_{t=\tau_m}^{T}\), where \(\tau_m\) corresponds to the period of individual \(m\)'s first contribution.

The data for the estimation problem consist of the set of contribution sequences for the population, \(y\). Having specified Eq. (1), the data likelihood for \(\lambda\) and \(\theta\) is given by

\[ L(y; \lambda, \theta) = \prod_{m=1}^{M} P(y_m \mid \lambda, \theta) \tag{3} \]

where Eq. (3) represents the joint density function for all (donor) contribution sequences, given parameter estimate \(\theta\) and mixture proportion \(\lambda\). The objective is to find parameters, \(\lambda\) and \(\theta\), that maximize Eq. (3) subject to the constraints \(\sum_{s=1}^{S} \lambda_s = 1\) and \(\lambda_s \ge 0, \forall s\). As applied in the estimation of finite mixture models, the EM algorithm relies on the fact that if individual memberships, \(z_m, \forall m\), were known, the ensuing estimation problem would be simplified, leading to the complete data likelihood for \(\lambda\) and \(\theta\):

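\[ L_c(y, z; \lambda, \theta) = \prod_{m=1}^{M} P(y_m, z_m \mid \lambda, \theta) = \prod_{m=1}^{M} \left[ P(y_m \mid \lambda, \theta)\, P(z_m = 1 \mid \lambda, \theta) \right] \]

where \(z \equiv \{z_m\}_{m=1}^{M}\) are assumed independent of \(y_m\), and \(z_m \equiv \{z_{ms}\}_{s=1}^{S}\). We can rewrite the above, recalling that \(z_{ms}\) is an indicator variable such that \(z_{ms} = 1\) if \(y_m\) is generated from \(f_s(\cdot)\) and 0 otherwise, as follows:

\[ L_c(y, z; \lambda, \theta) = \prod_{m=1}^{M} \prod_{s=1}^{S} \left[ f_s(y_m \mid z_{ms}, \theta)\, P(z_{ms} = 1 \mid \lambda, \theta) \right]^{z_{ms}}. \]

Thus, we can express the associated complete data log-likelihood as:

\[ \ln L_c(y, z; \lambda, \theta) = \sum_{m=1}^{M} \sum_{s=1}^{S} \left[ z_{ms} \ln\left( f_s(y_m \mid z_{ms}, \theta_s) \right) + z_{ms} \ln(\lambda_s) \right]. \tag{4} \]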

³ The EM algorithm, formalized by Dempster, Laird and Rubin (1977), is a numerical method to solve MLE problems in cases where data are missing, e.g., unobserved, censored, etc.

Table 2. Contribution year, alumni counts, average gift amounts by year, standard deviation of yearly gift amounts, and yearly donation percentiles.

| Year | No. active alumni | Mean annual contribution per alumnus | Std. deviation of annual contribution | 25th percentile | 50th percentile | 75th percentile |
|---|---|---|---|---|---|---|
| 2000 | 4857 | $172 | $625 | $25 | $50 | $100 |
| 2001 | 4239 | $166 | $581 | $25 | $50 | $100 |
| 2002 | 3290 | $180 | $645 | $25 | $50 | $100 |
| 2003 | 3595 | $165 | $542 | $25 | $50 | $100 |
| 2004 | 3473 | $193 | $590 | $25 | $60 | $100 |
| 2005 | 3503 | $185 | $475 | $30 | $75 | $135 |
| 2006 | 3743 | $206 | $711 | $25 | $75 | $103 |
| 2007 | 4086 | $225 | $715 | $30 | $80 | $150 |
| 2008 | 3771 | $204 | $598 | $35 | $100 | $150 |
| 2009 | 3468 | $207 | $541 | $35 | $100 | $150 |

The EM algorithm is a numerical method to maximize the complete data log-likelihood function, shown in Eq. (4). The algorithm consists of two steps: the expectation step, E-step, where we evaluate the expectation of Eq. (4) over the possible realizations of \(z\), given the observed data, \(y\), and estimates of \(\lambda\) and \(\theta\), denoted \(\hat{\lambda}\) and \(\hat{\theta}\); and the maximization step, M-step, where we update \(\hat{\lambda}\) and \(\hat{\theta}\) with arguments that maximize the expectation of \(\ln L_c(\cdot)\). The EM algorithm alternates between the E and M steps until a convergence criterion is met. Mathematically, the EM algorithm can be written as follows:

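E-step:

\[ Q(\hat{\lambda}, \hat{\theta}) = E_z\left[ \ln L_c(y, z; \hat{\lambda}, \hat{\theta}) \right] = E_{z_1, \ldots, z_M}\left[ \sum_{m=1}^{M} \sum_{s=1}^{S} \left[ z_{ms} \ln\left( f_s(y_m \mid z_{ms}, \theta_s) \right) + z_{ms} \ln(\lambda_s) \right] \right]. \tag{5} \]

M-step:

\[ \hat{\lambda}, \hat{\theta} = \arg\max_{\lambda, \theta} \left\{ Q(\lambda, \theta) : \textstyle\sum_{s=1}^{S} \lambda_s = 1,\ \lambda_s \ge 0\ \forall s;\ \theta_s \text{ is in the set of valid parameters of } f_s(\cdot) \right\}. \tag{6} \]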

More specifically, in the E-step, based on the current parameter estimates, \(\hat{\lambda}, \hat{\theta}\), the expectation is obtained by replacing the \(z_{ms}\) in the complete log-likelihood by their expected values, as given by Eq. (2), as estimates of the membership probabilities, \(\hat{p}_{ms}, \forall m, s\). In the M-step, this expectation is maximized with respect to \(\lambda\) and \(\theta\) to obtain updated parameter estimates.

In the next section, we discuss the data used in our empirical study. This is followed by the presentation of the EM algorithm for a specification of Eq. (1) aimed at discerning segments with distinct contribution patterns.

Data

The data used in the analysis were provided by a major university in the midwestern United States. The original data set contains 10,660 active donors⁴ with 44,726 transactions. Our analysis is based on annual contribution sequences, and so for each alumnus in the data set, we summed the monetary value of all contributions in a given year, resulting in a total of 38,025 annual receipts. Table 2 summarizes the donor population contribution data for the ten-year period.

The mean annual contribution per alumnus shows a generally increasing trend. The values for the 75th percentile of contributions across the several years are significantly lower than the mean annual contribution, indicating that contributions are skewed toward smaller values. The abundance of zeros stemming from intermittent contribution patterns is an important feature of the data, one that plays an important role both in the segmentation model based on frequency and monetary value and in the segmentation model based on contribution patterns. To describe this further, we note that of the 10,660 individuals in the data set, 33.8% contribute only once, 16.7% twice, and 11.14%, 7.4%, 6.2%, 5.0%, 5.4%, 4.8%, 4.9%, and 4.5% contribute 3, 4, 5, 6, 7, 8, 9, and 10 times, respectively.

⁴ Active donors are those individuals that provided at least one monetary contribution to the university, and contributed more than $10, but less than $25,000 over any consecutive three years during fiscal years 2000 through 2009. The data set contained 85 alumni (with 549 annual contributions) who were considered major donors, and not targeted through direct mail. As a result, they were excluded from the analysis. Fiscal years are defined to begin June 1 and end May 31, e.g., fiscal year 2001 (“FY 2001” or “FY01”) runs from June 1, 2000 through May 31, 2001.

In addition to the response data described above, the university also provided major, degree type, graduation year, college (e.g., architecture, law, business), and alumni mailing zip code. These data are currently used to segment the alumni population, under the assumption that alumni with shared traits respond in a homogeneous fashion to contribution appeals.

Segmentation Based on Contribution Patterns

In this section, we present a Markov chain mixture model to describe the contribution behavior of alumni. The model relies on the assumption that contribution sequences are serially dependent. We model the elements in a contribution sequence as manifestations of a finite set of states. The segments, in turn, are characterized by Markov chains (one for each segment) that describe transitions between states, and that represent distinct contribution patterns. More specifically, the probability mass functions characterizing the segments, \(f_s(\cdot)\), correspond to finite-state Markov chains. Each Markov chain describes segment-specific, year-to-year transitions in the contribution sequences of alumni.

In terms of notation and assumptions:

• We assume that the value of the individual contributions is a manifestation of a set of \(L\) discrete states. We let the random variable \(x_m^t\) represent the state/level of alumnus \(m\)'s contribution in period \(t\), and, fundamentally, we assume that the relationship between \(y_m^t\) and \(x_m^t\) is observable. Specifically, we assume that \(x_m^t = l \Leftrightarrow b_{l-1} < y_m^t \le b_l,\ l = 1, \ldots, L\), where the points \(b_0, b_1, \ldots, b_L\) define a partition of the state-space of \(y_m^t\). As before, \(x_m \equiv \{x_m^t\}_{t=1}^{T}\), and \(x \equiv \{x_m\}_{m=1}^{M}\).

• In terms of specifying the probability functions, \(f_s(\cdot)\), we have

\[ f_s(y_m \mid \theta_s) = P(x_m \mid \theta_s) = \prod_{t=1}^{T} \pi^{s}_{x_m^{t-1},\, x_m^{t}} = \prod_{i=1}^{L} \prod_{j=1}^{L} \left[ \pi_{ij}^{s} \right]^{\nu_{ij}^{m}}. \tag{7} \]

For an individual assigned to segment \(s\), \(\pi_{ij}^{s}\) is the probability that her annual contribution transitions between levels \(i\) and \(j\) in consecutive periods. The parameters that characterize each segment, \(\theta_s \equiv \pi_{ij}^{s},\ i, j = 1, \ldots, L\), correspond to the set of probabilities that define a time-homogeneous Markov chain. \(\nu_{ij}^{m}\) corresponds to the total number of \(i\)-to-\(j\) transitions in the sequence \(x_m\).

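• For \(\lambda\) and \(\theta\), the data likelihood function is obtained by substituting Eq. (7) into Eq. (1) as follows:

\[ L(y; \lambda, \theta) = \prod_{m=1}^{M} P(x_m \mid \lambda, \theta) = \prod_{m=1}^{M} \sum_{s=1}^{S} \lambda_s \prod_{i=1}^{L} \prod_{j=1}^{L} \left[ \pi_{ij}^{s} \right]^{\nu_{ij}^{m}}. \tag{8} \]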

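From Eqs. (4) and (8), the associated complete data log-likelihood for \(\lambda\) and \(\theta\) is given by

\[ \ln L_c(y, z; \lambda, \theta) = \sum_{m=1}^{M} \sum_{s=1}^{S} \left\{ z_{ms} \left[ \sum_{i=1}^{L} \sum_{j=1}^{L} \nu_{ij}^{m} \ln\left( \pi_{ij}^{s} \right) \right] + z_{ms} \ln(\lambda_s) \right\}. \tag{9} \]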
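To make the role of the transition counts concrete, the sketch below computes \(\nu_{ij}^{m}\) from a state sequence and evaluates the observed-data log-likelihood of Eq. (8) for a small two-state, two-segment example. All function names and the toy parameter values are hypothetical; the log-sum-exp step is an implementation choice for numerical stability.

```python
import math

def transition_counts(states, n_states):
    """nu[i][j] = number of i -> j transitions in one state sequence x_m."""
    nu = [[0] * n_states for _ in range(n_states)]
    for i, j in zip(states, states[1:]):
        nu[i][j] += 1
    return nu

def segment_log_lik(nu, pi):
    """ln f_s(y_m | theta_s) = sum_ij nu_ij * ln(pi_ij), as in Eq. (7)."""
    total = 0.0
    for i, row in enumerate(nu):
        for j, count in enumerate(row):
            if count:
                if pi[i][j] == 0.0:
                    return float("-inf")  # an observed transition is impossible
                total += count * math.log(pi[i][j])
    return total

def mixture_log_lik(all_nu, lam, pis):
    """Logarithm of Eq. (8): sum over individuals of ln sum_s lam_s f_s."""
    total = 0.0
    for nu in all_nu:
        terms = [math.log(l) + segment_log_lik(nu, pi) for l, pi in zip(lam, pis)]
        top = max(terms)
        if top == float("-inf"):
            return top
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total

# Toy example (hypothetical values): a "sticky" chain and a "coin-flip" chain.
pis = [[[0.9, 0.1], [0.1, 0.9]], [[0.5, 0.5], [0.5, 0.5]]]
lam = [0.5, 0.5]
sequences = [[0, 0, 0, 0], [0, 1, 0, 1]]
all_nu = [transition_counts(x, 2) for x in sequences]
ll = mixture_log_lik(all_nu, lam, pis)
```

Because Eq. (7) depends on a sequence only through its count matrix, the counts can be computed once and reused in every iteration of the estimation.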

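The derivation of the EM algorithm to solve the problem of maximizing Eq. (9) is presented in Durango-Cohen (2013). The implementation is summarized below:

EM Algorithm: Segmentation Based on Contribution Patterns

Initialize \(k \leftarrow 0\); \(\hat{\lambda}^{k}, \hat{\theta}^{k}\)

While \(k < K\) and \(Q(\hat{\lambda}^{k+1}, \hat{\theta}^{k+1}) - Q(\hat{\lambda}^{k}, \hat{\theta}^{k}) > \varepsilon\):

Given \(y\)

For \(s = 1, \ldots, S\):

\[ \hat{p}_{ms} = \frac{\hat{\lambda}_s \prod_{i=1}^{L} \prod_{j=1}^{L} \left[ \hat{\pi}_{ij}^{s} \right]^{\nu_{ij}^{m}}}{\sum_{r=1}^{S} \hat{\lambda}_r \prod_{i=1}^{L} \prod_{j=1}^{L} \left[ \hat{\pi}_{ij}^{r} \right]^{\nu_{ij}^{m}}}, \quad \text{for } m = 1, \ldots, M \]

\[ \hat{\lambda}_s^{k+1} = \frac{1}{M} \sum_{m=1}^{M} \hat{p}_{ms} \]

Updating Transition Probabilities: \(\hat{\theta}_s^{k+1} \equiv \left\{ \langle \hat{\pi}_{ij}^{s} \rangle^{k+1} \right\}_{i,j=1}^{L}\)

For \(s = 1, \ldots, S\):

\[ \langle \hat{\pi}_{ij}^{s} \rangle^{k+1} = \frac{\sum_{m=1}^{M} \nu_{ij}^{m}\, \hat{p}_{ms}}{\sum_{q=1}^{L} \sum_{m=1}^{M} \nu_{iq}^{m}\, \hat{p}_{ms}}, \quad i, j = 1, \ldots, L \]

\(k \leftarrow k + 1\).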
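The updates summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and not the implementation used in the study: the function and variable names are hypothetical, the input is a list of precomputed transition-count matrices \(\nu^{m}\), the stopping rule monitors the change in the observed-data log-likelihood (a substitution for the change in \(Q\)), and a row of a transition matrix is left unchanged when its expected counts are all zero.

```python
import math
import random

def _safe_log(p):
    return math.log(p) if p > 0.0 else float("-inf")

def _normalize(values):
    total = sum(values)
    return [v / total for v in values]

def _log_f(nu, pi):
    """ln f_s(y_m | theta_s) = sum_ij nu_ij * ln(pi_ij), as in Eq. (7)."""
    return sum(c * _safe_log(pi[i][j])
               for i, row in enumerate(nu) for j, c in enumerate(row) if c)

def em_markov_mixture(all_nu, S, L, max_iter=200, tol=5e-5, seed=0):
    rng = random.Random(seed)
    # Random, strictly positive starting values for proportions and rows.
    lam = _normalize([rng.random() + 0.1 for _ in range(S)])
    pis = [[_normalize([rng.random() + 0.1 for _ in range(L)]) for _ in range(L)]
           for _ in range(S)]
    prev_ll = float("-inf")
    post = []
    for _ in range(max_iter):
        # E-step: posterior membership probabilities p_ms.
        post, ll = [], 0.0
        for nu in all_nu:
            scores = [_safe_log(lam[s]) + _log_f(nu, pis[s]) for s in range(S)]
            top = max(scores)
            w = [math.exp(sc - top) for sc in scores]
            ll += top + math.log(sum(w))
            post.append(_normalize(w))
        # M-step: closed-form updates of the proportions and transition rows.
        lam = [sum(p[s] for p in post) / len(all_nu) for s in range(S)]
        for s in range(S):
            for i in range(L):
                row = [sum(p[s] * nu[i][j] for p, nu in zip(post, all_nu))
                       for j in range(L)]
                if sum(row) > 0.0:  # otherwise keep the previous row
                    pis[s][i] = _normalize(row)
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return lam, pis, post  # post holds the memberships from the last E-step

# Toy data (hypothetical): count matrices for a two-state chain.
sticky = [[9, 0], [0, 0]]        # remains in state 0 throughout
alternating = [[0, 5], [4, 0]]   # changes state every period
all_nu = [sticky] * 10 + [alternating] * 10
lam, pis, post = em_markov_mixture(all_nu, S=2, L=2)
```

As with any EM implementation, the result depends on the starting values, which is why several randomly initialized runs are typically compared on their final log-likelihood.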

The estimation of a Markov chain mixture model requires specification of partitions defining the states for each segment's Markov chain. While the breakpoints that define these partitions can be set exogenously, we make use of the segments identified in our analysis of annual contributions (presented in Appendix A). From the analysis, six distinct segments are identified, with each segment characterized by a Normal distribution describing the annual monetary value of contributions. Thereupon, we specify Markov chains with 7 states: an “inactive” state to capture zeros in the contribution sequences, and 6 states aligned with distinct annual contribution ranges. The aim is to find/create a correspondence between the Markov chain states and the monetary value segments. To this end, we set each breakpoint, \(b_l\), such that if \(y \le b_l\) then the member's contribution state is \(l - 1\), and if \(y > b_l\), then the contribution is in state \(l\). A breakpoint that is too high results in a state \(l\) that is very infrequent/unlikely relative to state \(l - 1\). Similarly, a breakpoint that is too low results in a state \(l - 1\) that is infrequent/unlikely relative to state \(l\). Mathematically, the breakpoint, \(b_l,\ l = 2, \ldots, 7\), between any pair of adjacent states corresponds to a contribution amount (in $) satisfying

\[ \int_{y=-\infty}^{y=b_l} g_{l-1}(y \mid \hat{\mu}_{l-1}, \hat{\sigma}_{l-1})\, dy = \int_{y=b_l}^{y=\infty} g_l(y \mid \hat{\mu}_l, \hat{\sigma}_l)\, dy \tag{13} \]

where \(g_l(\cdot)\) corresponds to Normal densities describing the annual monetary value of the contributions from individuals belonging to segment \(l\), as identified in Appendix A. For example, consider the distributions that describe annual contributions in segments 3 and 4, with \((\hat{\mu}_3 = 89.71, \hat{\sigma}_3 = 39.29)\) and \((\hat{\mu}_4 = 190.57, \hat{\sigma}_4 = 107.18)\), respectively. In this instance, the resulting breakpoint, \(b_4 = 125\), is such that \(\int_{y=-\infty}^{y=125} g_3(y \mid \hat{\mu}_3, \hat{\sigma}_3)\, dy \approx \int_{y=125}^{y=\infty} g_4(y \mid \hat{\mu}_4, \hat{\sigma}_4)\, dy\); that is, we are equally confident that donations below \(b_4\) = $125 are from state/level 3, and above \(b_4\) = $125 are from state/level 4.
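The condition in Eq. (13) can be checked numerically. The sketch below evaluates both sides of Eq. (13) at a candidate breakpoint and also solves for the point at which the two areas are exactly equal, using the fact that for Normal densities the equality reduces to \((b - \mu_{l-1})/\sigma_{l-1} = (\mu_l - b)/\sigma_l\). The function names are hypothetical. Because the breakpoints reported in the paper are rounded dollar amounts that satisfy Eq. (13) only approximately, the exact solution returned here need not coincide with them.

```python
import math

def normal_cdf(y, mu, sigma):
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))

def tail_areas(b, lower, upper):
    """Left- and right-hand sides of Eq. (13) at a candidate breakpoint b."""
    left = normal_cdf(b, *lower)          # mass of g_{l-1} below b
    right = 1.0 - normal_cdf(b, *upper)   # mass of g_l above b
    return left, right

def equal_tail_breakpoint(lower, upper):
    """Solve (b - mu1) / s1 = (mu2 - b) / s2 for b."""
    (mu1, s1), (mu2, s2) = lower, upper
    return (mu1 * s2 + mu2 * s1) / (s1 + s2)

seg3 = (89.71, 39.29)     # (mu, sigma) for segment 3, six-segment model
seg4 = (190.57, 107.18)   # (mu, sigma) for segment 4, six-segment model

b_exact = equal_tail_breakpoint(seg3, seg4)
left, right = tail_areas(b_exact, seg3, seg4)
left_125, right_125 = tail_areas(125.0, seg3, seg4)  # areas at the reported b_4
```

Comparing `left_125` with `right_125` shows how closely a rounded breakpoint satisfies the equal-area condition.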

The breakpoints and resulting states, satisfying Eq. (13), are given in Table 4.

Given the states defined above, the problem we consider is to classify alumni based on sequences of transitions. To illustrate this, Table 5 shows the yearly contribution amounts made by a sample of alumni, and the resulting set of observed contribution states. As is discussed in the Results section, this sample was selected to highlight the managerial implications of capturing variability in state transitions. The proposed model classifies individuals based on the state transitions in consecutive periods. For example, individual 1 displays the following 9 transitions: 3 to 2, 2 to 2, 2 to 0, 0 to 3, 3 to 4, 4 to 3, 3 to 4, 4 to 0, and 0 to 4.
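The mapping from contribution amounts to states, and from states to transitions, can be sketched as follows. The breakpoints are those of Table 4 and the amounts are those of individual 1 in Table 5; the names `BREAKPOINTS`, `to_state`, and `transitions` are hypothetical.

```python
from bisect import bisect_left

# Upper edges of states 0..5 (Table 4); amounts above $890 fall in state 6.
BREAKPOINTS = [0, 30, 80, 125, 270, 890]

def to_state(amount):
    """State l such that b_{l-1} < amount <= b_l; $0 maps to state 0 (inactive)."""
    return bisect_left(BREAKPOINTS, amount)

def transitions(amounts):
    """Return the state sequence and its consecutive (from, to) pairs."""
    states = [to_state(a) for a in amounts]
    return states, list(zip(states, states[1:]))

alumnus_1 = [100, 50, 50, 0, 100, 175, 125, 200, 0, 200]
states, moves = transitions(alumnus_1)
# states -> [3, 2, 2, 0, 3, 4, 3, 4, 0, 4]
# moves  -> [(3, 2), (2, 2), (2, 0), (0, 3), (3, 4), (4, 3), (3, 4), (4, 0), (0, 4)]
```

A ten-year sequence yields nine transitions, which are the only inputs the Markov mixture model uses for classification.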

Finally, in terms of implementation, we considered models with one to four segments/classes. For a given number of segments, we ran 100 instances with randomly generated initial parameters: proportions and transition probabilities. The convergence criteria for each instance were \(K = 200\) iterations or a tolerance \(\varepsilon = 5 \times 10^{-5}\). The results reported are for instances with the maximum (scaled) log-likelihood.


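Table 5. Contribution states for a sample of alumni, where \(x_m^t\) represents alumnus \(m\)'s contribution state in period \(t\), given a contribution amount of \(y_m^t\) and the state definitions of Table 4. Blank cells are periods before an alumnus's first contribution.

Annual contribution in each year, \(y_m^t\)

| m | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | $100 | $50 | $50 | $0 | $100 | $175 | $125 | $200 | $0 | $200 |
| 2 | $100 | $200 | $100 | $100 | $100 | $200 | $0 | $100 | $0 | $100 |
| 3 | $100 | $100 | $0 | $600 | $2500 | $600 | $0 | $25 | $0 | $0 |
| 4 | | $250 | $0 | $100 | $250 | $0 | $0 | $2000 | $0 | $0 |
| 5 | | | | | | | | | $100 | $0 |
| 6 | $500 | $100 | $100 | $100 | $100 | $500 | $500 | $500 | $250 | $250 |
| 7 | $10 | $25 | $30 | $40 | $20 | $20 | $20 | $10 | $45 | $45 |
| 8 | | | | $50 | $0 | $50 | $0 | $50 | $0 | $0 |

Contribution state in each year, \(x_m^t\)

| m | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 2 | 2 | 0 | 3 | 4 | 3 | 4 | 0 | 4 |
| 2 | 3 | 4 | 3 | 3 | 3 | 4 | 0 | 3 | 0 | 3 |
| 3 | 3 | 3 | 0 | 5 | 6 | 5 | 0 | 1 | 0 | 0 |
| 4 | | 4 | 0 | 3 | 4 | 0 | 0 | 6 | 0 | 0 |
| 5 | | | | | | | | | 3 | 0 |
| 6 | 5 | 3 | 3 | 3 | 3 | 5 | 5 | 5 | 4 | 4 |
| 7 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 2 | 2 |
| 8 | | | | 2 | 0 | 2 | 0 | 2 | 0 | 0 |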

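Table 3. Mean (\(\hat{\mu}\)), std. deviation (\(\hat{\sigma}\)), and mixture proportion (\(\hat{\lambda}\)) estimates, as well as scaled log-likelihood (LL) values for normal mixture models with different numbers of segments.

| Model | Estimate | Segment 1 | Segment 2 | Segment 3 | Segment 4 | Segment 5 | Segment 6 | Segment 7 | Scaled LL |
|---|---|---|---|---|---|---|---|---|---|
| 4-Segments | \(\hat{\mu}\) | $29.79 | $84.89 | $229.07 | $1318.92 | | | | −5.554 |
| | \(\hat{\sigma}\) | $14.29 | $41.45 | $149.30 | $1749.06 | | | | |
| | \(\hat{\lambda}\) | 41.3% | 37.8% | 14.2% | 6.6% | | | | |
| 5-Segments | \(\hat{\mu}\) | $29.53 | $81.34 | $184.41 | $601.21 | $2545.11 | | | −5.486 |
| | \(\hat{\sigma}\) | $14.11 | $38.47 | $103.43 | $433.20 | $2610.69 | | | |
| | \(\hat{\lambda}\) | 40.9% | 36.0% | 13.6% | 7.3% | 2.3% | | | |
| 6-Segments | \(\hat{\mu}\) | $21.65 | $42.03 | $89.71 | $190.57 | $609.18 | $2551.52 | | −5.423 |
| | \(\hat{\sigma}\) | $8.01 | $17.97 | $39.29 | $107.18 | $436.10 | $2615.58 | | |
| | \(\hat{\lambda}\) | 22.8% | 24.1% | 31.0% | 12.8% | 7.1% | 2.3% | | |
| 7-Segments | \(\hat{\mu}\) | $21.40 | $40.70 | $84.47 | $157.53 | $300.99 | $782.20 | $2829.63 | −5.405 |
| | \(\hat{\sigma}\) | $7.84 | $16.80 | $36.41 | $79.56 | $186.28 | $526.66 | $2815.21 | |
| | \(\hat{\lambda}\) | 22.2% | 23.1% | 29.8% | 12.3% | 5.8% | 5.1% | 1.8% | |

Note: For explanation of Segmentation Based on Monetary Value, please see Appendix A.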

Table 4. State definitions for Markov mixture model.

| State | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| Definition | Inactive | ($0, $30] | ($30, $80] | ($80, $125] | ($125, $270] | ($270, $890] | ($890, ∞) |


Results: Segment Transition Probability Estimates

We now analyze the estimation results that arise from segmenting alumni based on the evolution of their contributions, as defined by Markov mixture models. With respect to the fit of the models, the scaled log-likelihoods are −0.8761, −0.8675, −0.8655, −0.8561, respectively, for the mixture models with 1, 2, 3, and 4 segments/classes.⁵ Based on goodness-of-fit, and because the consequent behaviors implied by the Markov chains are managerially appealing, we first present in detail the results for the three-segment model, followed by a discussion on how the explicit representation of the state-dependence in the contribution sequences supports the inference of distinct contribution patterns, not distinguishable when the population is viewed as a whole (one-segment). We then illustrate the segmentation process to provide insight about the underlying behavior, for a sample of alumni. The estimation results suggest the presence of heterogeneity not explained by the available trait data. To illustrate, we conclude the section by examining the contribution patterns for different groups of alumni targeted by the university.

⁵ The likelihoods are not comparable to those presented earlier because the data correspond to the discrete levels, as opposed to the contribution amounts.

Estimation Results for Three-Segment Markov Mixture Model

The transition probability estimates for the three-segment model are presented in Table 6. In terms of inferring contribution patterns from the three-segment model, we note:

Table 6. Transition and steady-state probabilities for 3-segment mixture model, where \(\hat{\pi}_{ij}^{\text{Seg. }k}\) represents the one-year probability of transitioning from state \(i\) to state \(j\), \(i, j \in \{0, \ldots, 6\}\), for an individual assigned to segment \(k\), \(k \in \{1, 2, 3\}\).

| Segment | From state | 0 | 1 | 2 | 3 | 4 | 5 | 6 | Steady-state |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 54.3% | 32.9% | 12.6% | 0.2% | 0.1% | 0.0% | 0.0% | 37.4% |
| 1 | 1 | 39.5% | 50.5% | 9.5% | 0.3% | 0.2% | 0.0% | 0.0% | 33.1% |
| 1 | 2 | 36.8% | 24.3% | 33.3% | 5.5% | 0.0% | 0.0% | 0.0% | 13.5% |
| 1 | 3 | 4.9% | 0.7% | 2.9% | 76.1% | 13.0% | 2.2% | 0.3% | 6.1% |
| 1 | 4 | 4.0% | 1.0% | 0.4% | 14.4% | 70.4% | 9.2% | 0.6% | 4.0% |
| 1 | 5 | 1.0% | 0.0% | 0.0% | 6.0% | 16.5% | 72.0% | 4.5% | 2.2% |
| 1 | 6 | 0.9% | 0.0% | 0.0% | 0.0% | 1.0% | 2.2% | 96.0% | 3.7% |
| 2 | 0 | 92.0% | 2.2% | 2.5% | 2.1% | 0.7% | 0.3% | 0.3% | 90.5% |
| 2 | 1 | 73.6% | 19.6% | 4.4% | 1.8% | 0.5% | 0.1% | 0.0% | 3.0% |
| 2 | 2 | 74.5% | 4.8% | 13.8% | 5.4% | 1.0% | 0.3% | 0.3% | 3.0% |
| 2 | 3 | 73.7% | 2.1% | 5.8% | 14.6% | 2.9% | 0.6% | 0.3% | 2.3% |
| 2 | 4 | 70.0% | 1.7% | 2.9% | 9.0% | 11.8% | 4.0% | 0.7% | 0.7% |
| 2 | 5 | 81.1% | 0.0% | 0.8% | 3.3% | 2.8% | 8.2% | 3.8% | 0.3% |
| 2 | 6 | 70.8% | 0.7% | 1.7% | 1.0% | 0.9% | 1.5% | 23.4% | 0.4% |
| 3 | 0 | 42.2% | 2.0% | 10.5% | 22.5% | 12.7% | 5.5% | 4.6% | 31.2% |
| 3 | 1 | 9.7% | 60.2% | 19.6% | 8.2% | 1.7% | 0.3% | 0.2% | 5.7% |
| 3 | 2 | 19.6% | 8.3% | 51.3% | 16.3% | 4.0% | 0.5% | 0.1% | 16.3% |
| 3 | 3 | 29.5% | 2.6% | 10.3% | 42.1% | 13.2% | 1.7% | 0.7% | 21.6% |
| 3 | 4 | 31.3% | 0.7% | 4.2% | 19.3% | 34.7% | 8.6% | 1.2% | 12.9% |
| 3 | 5 | 31.4% | 0.3% | 1.3% | 5.2% | 14.6% | 38.0% | 9.2% | 6.6% |
| 3 | 6 | 30.3% | 0.1% | 0.1% | 2.2% | 1.9% | 11.0% | 54.4% | 5.4% |

• The diagonal entries in the matrix for segment 1 (except for state 2) indicate a high probability (exceeding 50%) that alumni assigned to this segment contribute at the same level in consecutive years. In addition, this probability appears to increase in the states (or contribution amounts), meaning that alumni who contribute at higher levels are more likely to contribute at that same level from year-to-year (\(\hat{\pi}_{4,4}^{1} = 70.4\%\), \(\hat{\pi}_{5,5}^{1} = 72.0\%\), \(\hat{\pi}_{6,6}^{1} = 96.0\%\)), relative to alumni contributing at lower levels. Interestingly, inactive alumni transition into active states over 45% of the time, while the probability that active alumni become inactive drops precipitously in the states. The transition probabilities observed in segment 1 are consistent with the observed behavior from “loyal alumni,” who tend to contribute at the same level from year-to-year. Thus, we label this segment “Low Variance” (LV).

• The transition probabilities of segment 2 are qualitatively different. The probability that an individual assigned to this class becomes inactive is over 70% for all states, with the probability of becoming inactive being higher for alumni in lower-contribution states (state 5 being the exception). Inactive alumni in a given year have less than a 10% chance of becoming active. Consequently, we refer to individuals in this segment as “Transient” (TS).

• Alumni in segment 3 are much more likely to transition to any state, as compared to the transitions in those assigned to LV or TS. We label segment 3 as “High Variance” (HV), and hypothesize that one factor that contributes to the variance in contributions over time is solicitation changes, both in terms of messaging and frequency. Specifically, the university sends distinct solicitations to test the efficacy of different messages in generating contributions, and sends additional solicitations, beyond annual gift requests, to active alumni in order to generate more revenues. Thus, while not the only factor leading to variability in contributions, we believe appeals may have a significant effect on the contribution sequences of alumni in this segment. Over 50% of the time, inactive alumni in this class transition to active states; alumni in states 3, 4 and 5 transition to other states over 60% of the time, and members in states 1, 2 and 6 transition to other states close to 50% of the time.

Table 7. Mixture proportions for three-segment Markov mixture model.

| | Segment 1 (LV) | Segment 2 (TS) | Segment 3 (HV) |
|---|---|---|---|
| Segment proportions | 15.6% | 57.1% | 27.3% |

Table 8. Transition and steady-state probabilities for the alumni population, where \(\hat{\pi}_{ij}^{\text{Pop.}}\) represents the probability that an individual (from the population) transitions from state \(i\) to state \(j\), \(i, j \in \{0, \ldots, 6\}\), in consecutive years.

| From state | 0 | 1 | 2 | 3 | 4 | 5 | 6 | Steady-state |
|---|---|---|---|---|---|---|---|---|
| 0 | 81.9% | 4.9% | 4.5% | 4.7% | 2.2% | 1.0% | 0.9% | 68.2% |
| 1 | 45.1% | 41.5% | 10.0% | 2.7% | 0.7% | 0.1% | 0.1% | 7.4% |
| 2 | 41.1% | 10.1% | 35.5% | 10.6% | 2.2% | 0.3% | 0.2% | 7.1% |
| 3 | 38.3% | 2.1% | 7.8% | 39.7% | 10.2% | 1.4% | 0.5% | 8.3% |
| 4 | 33.4% | 1.0% | 3.1% | 16.2% | 37.6% | 7.8% | 1.0% | 4.6% |
| 5 | 34.1% | 0.2% | 0.9% | 5.0% | 12.8% | 39.8% | 7.1% | 2.1% |
| 6 | 33.8% | 0.2% | 0.5% | 1.4% | 1.5% | 6.7% | 56.0% | 1.9% |

These results indicate the presence of three distinct classes of alumni in the population (each with a discernible contribution pattern). The segment mixture proportions are given in Table 7. The results are consistent with our findings that over 50% of alumni in the data set contribute no more than twice. Identifying HV alumni (27% of the population) is important because, arguably, a subset of this class is more likely to be persuaded by solicitations.

For comparison, we next consider the one-step transition probability matrix for the alumni population as a whole, i.e., assuming the population is comprised of one (homogeneous) segment. Each entry in Table 8 represents the probability that an individual's contributions transition between states/levels, in consecutive years. For example, \(\hat{\pi}_{44}^{\text{Pop.}}\) means that there is a 37.6% probability that an individual (from the population) whose contribution falls in state 4 (between $125 and $270) in a given year, will contribute at the same level in the following year; while \(\hat{\pi}_{40}^{\text{Pop.}}\) means that there is a 33.4% probability that the same individual transitions to the “Inactive” state in the following year. Similarly, if an alumnus does not contribute in a given year, there is an 81.9% probability that he will remain inactive the following year.

There are significant differences between the Markov chain for the population as a whole (Table 8) and the ones estimated for the three-segment mixture model (Table 6). As a measure of these differences, we consider the steady-state probabilities (the long-run expected proportion of annual contributions/alumni across all states/levels). For the population as a whole, for example, the expectation is that 68.25% of alumni do not contribute in any given year. In contrast, for the three-segment model, we expect 37.4% of alumni in segment 1, 90.5% of alumni in segment 2, and 31.2% of segment 3 to be inactive. This result, coupled with the mixture proportions, indicates significant heterogeneity in the population.

Table 10. Mixture proportions for LYBUNTs (alumni who donated “Last Year BUt Not This”) and alumni who are 2-years lapsed in making donations.

| Group | In group | LV | TS | HV |
|---|---|---|---|---|
| LYBUNTs | Yes | 21% | 43% | 36% |
| LYBUNTs | No | 15% | 60% | 25% |
| 2-Yr lapsed | Yes | 7% | 85% | 8% |
| 2-Yr lapsed | No | 26% | 26% | 48% |
| Population | | 16% | 57% | 27% |

We also note that the behavior (transition probabilities) for the population as a whole does not capture the behavior of any particular segment. This failure to capture heterogeneity in the population can lead to inefficient marketing policies. In particular, marketing policies aimed at the (average behavior in the) population may not be appropriate for any segment of individuals that comprise it.
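The steady-state probabilities used above to compare the chains can be approximated from a transition matrix by repeatedly applying it to an initial distribution. The sketch below does so for the population-level matrix of Table 8. The function name `steady_state` is hypothetical, power iteration is one of several ways to obtain a stationary distribution (the paper does not state which method was used), and the rows are renormalized because the published percentages are rounded.

```python
def steady_state(P, iters=2000):
    """Approximate the stationary distribution of a row-stochastic matrix P."""
    n = len(P)
    rows = [[v / sum(row) for v in row] for row in P]  # absorb rounding error
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * rows[i][j] for i in range(n)) for j in range(n)]
    return pi

# One-step transition percentages for the population as a whole (Table 8).
TABLE_8 = [
    [81.9, 4.9, 4.5, 4.7, 2.2, 1.0, 0.9],
    [45.1, 41.5, 10.0, 2.7, 0.7, 0.1, 0.1],
    [41.1, 10.1, 35.5, 10.6, 2.2, 0.3, 0.2],
    [38.3, 2.1, 7.8, 39.7, 10.2, 1.4, 0.5],
    [33.4, 1.0, 3.1, 16.2, 37.6, 7.8, 1.0],
    [34.1, 0.2, 0.9, 5.0, 12.8, 39.8, 7.1],
    [33.8, 0.2, 0.5, 1.4, 1.5, 6.7, 56.0],
]

pi = steady_state(TABLE_8)  # pi[0] is the long-run share of inactive alumni
```

Applying the same function to each of the three segment matrices gives the segment-level steady states, which is the comparison the text uses to illustrate heterogeneity.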

Results: Segmentation Based on Contribution Patterns

We next discuss the assignment of alumni to the low-variance, transient and high-variance classes. We highlight the managerial implications using a sample of the alumni population.

In Table 9, we observe how the algorithm assigns each alumnus a probabilistic measure of class membership, based on the observed giving behavior, and make the following observations:

• Comparing the outcomes for alumni 3 and 4 provides an indication of the role of state-dependence. The algorithm assigns a lower probability that alumnus 4 belongs to the TS group (11%) than it does for alumnus 3 (27%) even though the former has proportionally more transitions into the inactive state. This may be explained in part by alumnus 4's transition from $0 to $2000 in year 8; such a transition is unlikely to occur for alumni in the transient class (\(\pi_{06}^{TS} = 0.3\%\)) relative to those in the high-variance class (\(\pi_{06}^{HV} = 4.6\%\)).

• Similarly, although alumni 6 and 7 both have three out-of-state transitions (5 → 3 − 3 − 3 − 3 → 5 − 5 − 5 → 4 − 4 and 1 − 1 → 2 − 2 → 1 − 1 − 1 − 1 → 2 − 2, respectively), the algorithm, given the estimated transition probabilities (see Table 6), assigns alumnus 6 to the low-variance class (97% probability), and classifies alumnus 7 with 88% (12%) probability into the high-variance (low-variance) class.

• The membership probabilities for alumni 5 and 8 reflect much uncertainty about their class membership. For alumnus 5, this is most likely due to the fact that alumnus 5 has only been an active alumnus for two years, and failed to contribute in the last year. A different mechanism is at play with alumnus 8. His many periods with $0 contributions make it more likely that he belongs to the transient class (57%). However, the likelihood of transitions to/from the inactive state to/from state 2 ($50) leads to a 39% probability that alumnus 8 belongs to the low-variance class and a 4% chance that he belongs to the high-variance class.

Table 9. Assigned posterior probability, \(\hat{p}_{mr}\), that member \(m\), \(m \in \{1, \ldots, 8\}\), belongs to segment \(r\), \(r \in\) {Low Variance (LV), Transient (TS), High Variance (HV)}, given annual contribution amounts, \(y_m^t\), for member \(m\) in year \(t\), \(t \in \{1, \ldots, 10\}\), for an alumni sample. Blank cells are periods before an alumnus's first contribution.

| m | \(\hat{p}_{m,LV}\) | \(\hat{p}_{m,TS}\) | \(\hat{p}_{m,HV}\) | \(y_m^1\) | \(y_m^2\) | \(y_m^3\) | \(y_m^4\) | \(y_m^5\) | \(y_m^6\) | \(y_m^7\) | \(y_m^8\) | \(y_m^9\) | \(y_m^{10}\) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0% | 0% | 100% | $100 | $50 | $50 | $0 | $100 | $175 | $125 | $200 | $0 | $200 |
| 2 | 0% | 0% | 100% | $100 | $200 | $100 | $100 | $100 | $200 | $0 | $100 | $0 | $100 |
| 3 | 0% | 27% | 73% | $100 | $100 | $0 | $600 | $2500 | $600 | $0 | $25 | $0 | $0 |
| 4 | 0% | 11% | 89% | | $250 | $0 | $100 | $250 | $0 | $0 | $2000 | $0 | $0 |
| 5 | 2% | 83% | 15% | | | | | | | | | $100 | $0 |
| 6 | 97% | 0% | 3% | $500 | $100 | $100 | $100 | $100 | $500 | $500 | $500 | $250 | $250 |
| 7 | 12% | 0% | 88% | $10 | $25 | $30 | $40 | $20 | $20 | $20 | $10 | $45 | $45 |
| 8 | 39% | 57% | 4% | | | | $50 | $0 | $50 | $0 | $50 | $0 | $0 |

Results: Capturing Unobserved Heterogeneity

In addition to targeting alumni based on trait characteristics (i.e., major, number of degrees, graduation year), the university also utilizes recency and frequency measures as criteria for mailings. Examples include sending appeals to alumni who contributed last year but not this (LYBUNTs), or to alumni who are 2 years lapsed in making donations. We next illustrate the benefits of using contribution data to capture unobserved heterogeneity vis-à-vis contribution patterns. Table 10 shows the mixture proportions for LYBUNTs and alumni who are 2-year lapsed for FY 2009. We observe heterogeneity within both groups of alumni.

These results indicate that targeting LYBUNTs is a sound policy, as they are more likely to belong to the low-variance and high-variance segments. The opposite is true for 2-year lapsed alumni. The classification of LYBUNT alumni into the different segments allows the university to craft different messaging that may be more effective in soliciting contributions.

We also consider the distribution of all alumni across the different behavior classes, based on the decade of graduation (Table 11). Earlier graduation classes (1960–1980) have the largest proportions of HV and LV alumni, while more recent classes have the highest proportion of TS alumni. As discussed earlier, identifying HV alumni is important because a subset of this class may be responsive to solicitations, which can lead to increased revenues. By the same token, identifying LV alumni may free fundraisers' time so that they can focus on other activities (e.g., engaging non-donor alumni).


Table 11. Mixture proportions by class decade.

| Class decade | LV | TS | HV |
|---|---|---|---|
| 1960 | 20.9% | 46.5% | 32.7% |
| 1970 | 17.7% | 53.6% | 28.7% |
| 1980 | 13.8% | 61.7% | 24.5% |
| 1990 | 10.5% | 73.3% | 16.2% |
| 2000 | 11.4% | 69.1% | 19.5% |
| Overall | 14.9% | 60.6% | 24.5% |


Conclusions

In this paper we use finite mixture models to study the contribution behavior of alumni at a university in the midwestern United States. The proposed framework serves as a tool to segment donor populations into homogeneous groups, based on their contribution sequences, which can be targeted with specific marketing actions. The framework's capability to process contribution sequences, i.e., longitudinal data, provides fundamental new insights into donor contribution behavior, not discernible with summary (RFM) statistics. It also provides a rigorous mechanism to infer and segment the population based on unobserved heterogeneities (as well as based on other observable characteristics). Further, the proposed methodology provides a basis to tailor direct marketing policies to optimize specific performance criteria (e.g., revenues raised, alumni participation).

To validate the proposed approach, we analyze alumni contributions to the university from 2000 to 2009. Specifically, we formulate a segmentation model under the assumption that individual sequences are serially-dependent, i.e., variability is systematic. This allows for alumni to be clustered based on how contributions evolve over time. We assume contributions can be categorized by a discrete set of states, and represent the associated sequences as the outcomes of finite-state Markov chains, one for each segment. The empirical results indicate the presence of three distinct classes of alumni in the population: a “Low Variance” segment associated with alumni who consistently contribute at the same level, a “Transient” segment whose members become inactive after one or a few contributions, and a “High Variance” segment for alumni whose contribution frequencies and levels exhibit significant variation. The latter segment is of particular interest because variability could be (partially) explained by the propensity to donate in response to the university's appeals. In all, the estimated transition probabilities not only provide an intuitive means to capture state-dependence, but are also consistent with those needed to support optimization of direct marketing activities.

Appendix A. Segmentation Based on Monetary Value

In this section, we formulate a finite mixture model to segment the alumni population based on the monetary value of their annual contributions, under the assumption that each alumnus' contribution sequence is comprised of elements that are independent and identically distributed (i.i.d.). As described in the General Methodology section, we require the specification of a probability function, \(f_s(\cdot)\), for each of the segments. Under the assumption that the elements of the \(y_m\) sequences are i.i.d. continuous random variables, the formulation of the \(f_s(\cdot)\) functions is given as follows:

\[ f_s(y_m \mid \theta_s) = \prod_{t=1}^{T} g_s(y_m^t \mid \theta_s) \tag{14} \]

where \(g_s(\cdot)\) correspond to Normal densities, describing the annual monetary value of the contributions from individuals belonging to segment \(s\). Thus, in this case, the estimation of the finite mixture model consists of finding the first two moments of the segment density functions, \(\theta_s = \{\mu_s, \sigma_s\}\), and finding the mixture proportions, \(\lambda\).

A detailed derivation of the EM algorithm to address this segmentation problem is presented in McLachlan and Peel (1997). In the E-step of the algorithm, the expectation of the log-likelihood is calculated with respect to the conditional distribution of the (non-observed) data on segment membership, $z$, given the observed contribution sequences and the current estimates of $\theta$. In calculating the complete log-likelihood, as given by Eq. (4) in the General Methodology section, the unobserved data $z_{ms}$ can be replaced by their expected values, $E[z_{ms} \mid y_m]$ (derivation details appear in Durango-Cohen (2013)). This expectation is equal to the posterior probability that alumnus $m$ belongs to segment $s$, $\hat{p}_{ms}$, as given by Eq. (15), where $\hat{\theta}_s^{k}$ are the estimates of $\theta_s$ at iteration $k$ of the algorithm.

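$$\hat{p}_{ms} = \frac{\left[\prod_{t=1}^{T} g_s\left(y_{tm} \mid \hat{\theta}_s^{k}\right)\right] \hat{\lambda}_s^{k}}{\sum_{r=1}^{S} \left[\prod_{t=1}^{T} g_r\left(y_{tm} \mid \hat{\theta}_r^{k}\right)\right] \hat{\lambda}_r^{k}}, \quad \text{for } m = 1, \ldots, M. \tag{15}$$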
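The E-step in Eq. (15) can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the function names and the two example sequences are hypothetical, and the ratio is computed in log space (subtracting the largest term before exponentiating) purely as a numerical safeguard.

```python
import math

def normal_logpdf(y, mu, sigma):
    """Log of the Normal density g_s(y | mu, sigma)."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2.0 * sigma ** 2)

def e_step(sequences, lambdas, mus, sigmas):
    """Return p[m][s], the posterior probability in Eq. (15)."""
    posteriors = []
    for y_m in sequences:
        # log( lambda_s * prod_t g_s(y_tm | theta_s) ) for each segment s
        log_w = [
            math.log(lam) + sum(normal_logpdf(y, mu, sd) for y in y_m)
            for lam, mu, sd in zip(lambdas, mus, sigmas)
        ]
        # Subtract the maximum before exponentiating to avoid underflow.
        top = max(log_w)
        w = [math.exp(v - top) for v in log_w]
        total = sum(w)
        posteriors.append([v / total for v in w])
    return posteriors

# Hypothetical data: one small-gift sequence and one large-gift sequence.
sequences = [[20.0, 25.0, 20.0], [500.0, 650.0, 700.0]]
p = e_step(sequences, lambdas=[0.5, 0.5], mus=[25.0, 600.0], sigmas=[10.0, 150.0])
print(p)
```

Each row of `p` sums to one, and with these illustrative parameters each alumnus is assigned almost entirely to the segment whose mean is closest to his or her contributions.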

In the M-step, the log-likelihood is maximized with respect to $\lambda$ and $\theta$ under the assumption that the missing data are known (i.e., $z_{ms}$ are replaced by their current expectations, $\hat{p}_{ms}$). This leads to new estimates of the segment proportions and generative distribution parameters, as given by Eqs. (16) and (17), respectively.

$$\hat{\lambda}_s^{k+1} = \frac{1}{M} \sum_{m=1}^{M} \hat{p}_{ms} \tag{16}$$

$$\hat{\mu}_s^{k+1} = \frac{1}{M \hat{\lambda}_s^{k+1}} \sum_{m=1}^{M} \hat{p}_{ms} \frac{1}{T} \sum_{t=1}^{T} y_{tm} \quad \text{and} \quad \hat{\sigma}_s^{k+1} = \sqrt{\frac{1}{M \hat{\lambda}_s^{k+1}} \sum_{m=1}^{M} \hat{p}_{ms} \frac{1}{T} \sum_{t=1}^{T} \left(y_{tm} - \hat{\mu}_s^{k+1}\right)^2}. \tag{17}$$
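The updates in Eqs. (16) and (17) can be illustrated with a short sketch. The function name `m_step`, the three example sequences, and the hard (0/1) posteriors are hypothetical and chosen so the result can be checked by hand. The sketch assumes every sequence has the same length $T$ and that no segment has zero total posterior weight.

```python
import math

def m_step(sequences, posteriors):
    """Update (lambda, mu, sigma) for every segment from posteriors p[m][s]."""
    M = len(sequences)
    T = len(sequences[0])          # assumes all sequences share the same length T
    S = len(posteriors[0])
    lambdas, mus, sigmas = [], [], []
    for s in range(S):
        # Total posterior weight of segment s; equals M * lambda_s^{k+1}.
        # Assumes weight > 0; an empty segment would need special handling.
        weight = sum(p[s] for p in posteriors)
        # Eq. (17), mean: posterior-weighted average of per-alumnus mean gifts.
        mu = sum(p[s] * sum(y_m) / T
                 for p, y_m in zip(posteriors, sequences)) / weight
        # Eq. (17), variance: posterior-weighted average squared deviation.
        var = sum(p[s] * sum((y - mu) ** 2 for y in y_m) / T
                  for p, y_m in zip(posteriors, sequences)) / weight
        lambdas.append(weight / M)  # Eq. (16)
        mus.append(mu)
        sigmas.append(math.sqrt(var))
    return lambdas, mus, sigmas

# Hypothetical data: two alumni fully assigned to segment 0, one to segment 1.
sequences = [[10.0, 20.0], [30.0, 40.0], [100.0, 200.0]]
posteriors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
lambdas, mus, sigmas = m_step(sequences, posteriors)
print(lambdas, mus, sigmas)
```

In a full EM run, this step would alternate with the E-step until the parameter estimates (or the log-likelihood) stop changing appreciably.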

Results: Segment Parameter Estimates

In this section, we report the parameter estimates for mixture models with four to seven segments, and describe the process used to determine the appropriate number of segments. Table 3 contains the segmentation results for the alumni population, for the different multivariate normal (Gaussian) mixture models,⁶ and the associated scaled log-likelihood (LL) values.⁷

In latent class models, the number of segments is not known a priori, and must be selected/inferred. To do so, we note that for each of the estimated models, the scaled log-likelihood increases (at a decreasing rate) with the number of segments. Analysis of the models' segments suggests that much of the improvement stems from fitting a small percentage of alumni at the higher contribution levels. As a result, we selected the six-segment multivariate normal mixture model for additional analysis, as there appears to be little improvement in the fit-to-data from including additional segments. In the six-segment model, approximately 23% of alumni are assigned to a segment with a mean annual contribution level of $21.65. Slightly under a quarter (24.1%) of alumni are assigned to a segment with a mean contribution level of $42. Close to a third (31%) of alumni belong to a segment with a mean of approximately $90. In addition, 12.8% and 7.1% of alumni are assigned to segments with means of $190.57 and $609.18, respectively. The remaining alumni (2.3%) are assigned to a segment with a mean contribution level of $2551.
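The comparison across models relies on the scaled log-likelihood of each fitted mixture. The sketch below shows one way such a quantity could be computed for a Gaussian mixture with the i.i.d. sequence structure of Eq. (14). It is illustrative only: the function names and toy inputs are hypothetical, and dividing by the number of alumni, $M$, is an assumption, since footnote 7 states only that the log-likelihood is divided by the number of observations.

```python
import math

def normal_logpdf(y, mu, sigma):
    """Log of the Normal density g_s(y | mu, sigma)."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2.0 * sigma ** 2)

def mixture_loglik(sequences, lambdas, mus, sigmas):
    """Sum over alumni of log( sum_s lambda_s * f_s(y_m | theta_s) )."""
    total = 0.0
    for y_m in sequences:
        log_w = [
            math.log(lam) + sum(normal_logpdf(y, mu, sd) for y in y_m)
            for lam, mu, sd in zip(lambdas, mus, sigmas)
        ]
        # Log-sum-exp: factor out the largest term for numerical stability.
        top = max(log_w)
        total += top + math.log(sum(math.exp(v - top) for v in log_w))
    return total

def scaled_loglik(sequences, lambdas, mus, sigmas):
    """Log-likelihood divided by the number of alumni M (an assumption)."""
    return mixture_loglik(sequences, lambdas, mus, sigmas) / len(sequences)

# Toy check: a one-segment model and a two-segment model with identical
# components describe the same density, so their scaled values coincide.
toy = [[0.0], [0.0]]
one = scaled_loglik(toy, [1.0], [0.0], [1.0])
two = scaled_loglik(toy, [0.5, 0.5], [0.0, 0.0], [1.0, 1.0])
print(one, two)
```

Evaluating `scaled_loglik` for fitted models with increasing numbers of segments, and inspecting the successive differences, mirrors the diminishing-improvement argument used above to settle on six segments.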

References

Aaker, David (1970), “A New Method for Evaluating Stochastic Models of Brand Choice,” Journal of Marketing Research, 8, 300–6.

Bitran, Gabriel R. and Susana V. Mondschein (1996), “Mailing Decisions in the Catalog Sales Industry,” Management Science, 42, 9, 1364–81.

Bockenholt, Ulf and Rolf Langeheine (1996), “Latent Change in Recurrent Choice Data,” Psychometrika, 61, 285–302.

Cadez, Igor, Scott Gaffney, and Padhraic Smyth (2000), “A General Probabilistic Framework for Clustering Individuals,” Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 140–9.

Calantone, Roger J. and Alan G. Sawyer (1978), “The Stability of Benefit Segments,” Journal of Marketing Research, 15, 395–404.

Council for Aid to Education (2011), Voluntary Support of Education Report.

Dempster, A., N. Laird, and D. Rubin (1977), “Maximum Likelihood from Incomplete Data via the EM Algorithm (with Discussion),” Journal of the Royal Statistical Society B, 39, 1, 1–38.

Drineas, P., A. Frieze, R. Kannan, S. Vempala, and V. Vinay (2004), “Clustering Large Graphs via the Singular Value Decomposition,” Machine Learning, 56, 9–33.

Durango-Cohen, Elizabeth J. (2013), “Modeling Contribution Behavior in Fundraising: Segmentation Analysis for a Public Broadcasting Station,” European Journal of Operational Research, 227, 3, 538–51.

Fader, Peter S., Bruce G.S. Hardie, and Jen Shang (2010), “Customer-Base Analysis in a Discrete-Time Noncontractual Setting,” Marketing Science, 29, 6, 1086–108.

Gaier, Scott E. (2005), “Alumni Satisfaction with their Undergraduate Academic Experience and the Impact of Alumni Giving and Participation,” International Journal of Educational Advancement, 5, 4, 279–88.

⁶ Following Drineas et al. (2004), among others, we note that principal component analysis (PCA) provides an alternative to Gaussian mixture models.

⁷ The scaled log-likelihood is obtained by dividing the log-likelihood by the number of observations.

Gonul, Fusun F. and Mengze Shi (1998), “Optimal Mailing of Catalogs: Methodology Using Estimable Structural Dynamic Programming Models,” Management Science, 44, 9, 1249–61.

Gonul, Fusun F., Byung-Do Kim, and Mengze Shi (2000), “Mailing Smarter to Catalog Customers,” Journal of Interactive Marketing, 14, 2, 2–16.

Gonul, Fusun F. and Frenkel ter Hofstede (2006), “How to Compute Optimal Catalog Mailing Decisions,” Marketing Science, 25, 1, 65–74.

Hall, Holly and Heather Joslyn (2011), “Giving Rose by 2.1% Last Year, New Estimate Shows,” The Chronicle of Higher Education (http://chronicle.com/article/Giving-Rose-by-21-Last-Year/127954/).

Holmes, Jessica (2009), “Prestige, Charitable Deductions and Other Determinants of Alumni Giving: Evidence from a Highly Selective Liberal Arts College,” Economics of Education Review, 28, 18–28.

Howard, Ronald A. (2002), “Comments on the Origin and Application of Markov Decision Processes,” Operations Research, 50, 1, 100–2.

Hughes, Arthur M. (2000), Strategic Database Marketing. 2nd edition. McGraw-Hill.

Jonker, J., N. Piersma, and Dirk Van den Poel (2004), “Joint Optimization of Customer Segments and Marketing Policy to Maximize Long-term Profitability,” Expert Systems with Applications, 27, 159–68.

Le Blanc, Louis A. and Conway T. Rucks (2009), “Data Mining of University Philanthropic Giving: Cluster-discriminant Analysis and Pareto Effects,” International Journal of Educational Advancement, 9, 2, 64–82.

Leslie, Larry L. and Garey Ramey (1988), “Donor Behavior and Voluntary Support for Higher Education Institutions,” Journal of Higher Education, 59, 2, 115–32.

Lindahl, Wesley E. and Christopher Winship (1992), “Predictive Models for Annual Fundraising and Major Gift Fundraising,” Nonprofit Management & Leadership, 3, 1, 43–63.

Marr, Kelly A., Charles H. Mullin, and John J. Siegfried (2005), “Undergraduate Financial Aid and Subsequent Alumni Giving Behavior,” Quarterly Review of Economics, 45, 123–43.

McLachlan, Geoffrey and David Peel (1997), Finite Mixture Models. New York, NY: John Wiley & Sons, 81–116.

Netzer, Oded, James M. Lattin, and V. Srinivasan (2008), “A Hidden Markov Model of Customer Relationship Dynamics,” Marketing Science, 27, 2, 185–204.

Okunade, Albert Ade (1996), “Graduate School Alumni Donations to Academic Funds: Micro-data Evidence,” American Journal of Economics and Sociology, 55, 2, 213–29.

Pfeifer, Phillip E. and Robert L. Carraway (2000), “Modeling Customer Relationships as Markov Chains,” Journal of Interactive Marketing, 14, 2, 43–55.

Poulsen, Carsten S. (1990), “Mixed Markov and Latent Markov Modeling Applied to Brand Choice Behavior,” International Journal of Research in Marketing, 7, 5–19.

Rust, John (1994), “Structural Estimation of Markov Decision Processes,” Handbook of Econometrics, IV, 3082–143.

Simester, Duncan I., Peng Sun, and John N. Tsitsiklis (2006), “Dynamic Catalog Mailing Policies,” Management Science, 52, 5, 683–96.

Sun, Xiaogeng, Sharon C. Hoffman, and Marilyn L. Grady (2007), “A Multivariate Causal Model of Alumni Giving: Implications for Alumni Fundraisers,” International Journal of Educational Advancement, 7, 4, 307–32.

Van den Poel, Dirk (2003), “Predicting Mail-order Repeat Buying: Which Variables Matter?,” Tijdschrift voor Economie en Management, 48, 3, 371–403.

Van Diepen, Merel, Bas Donkers, and Phillip H. Frances (2009), “Dynamic and Competitive Effects of Direct Mailings: A Charitable Giving Application,” Journal of Marketing Research, 46, 120–33.

Wedel, Michel and Wagner Kamakura (2000), Market Segmentation: Conceptual and Methodological Foundations. Norwell, MA: Kluwer Academic Publishers.

Wedel, Michel, Wagner Kamakura, Neeraj Arora, Albert Bemmaor, Jeongwen Chiang, Terry Elrod, Rich Johnson, Peter Lenk, Scott Neslin, and Carsten S. Poulsen (1999), “Discrete and Continuous Representations of Unobserved Heterogeneity in Choice Modeling,” Marketing Letters, 10, 3, 219–32.

Weerts, David and Justin Ronca (2009), “Using Classification Trees to Predict Alumni Giving for Higher Education,” Education Economics, 17, 1, 95–122.

Wunnava, Phanindra V. and Michael A. Lauze (2001), “Alumni Giving at a Small Liberal Arts College: Evidence from Consistent and Occasional Donors,” Economics of Education Review, 20, 533–43.

Elizabeth J. Durango-Cohen is an Associate Professor of Operations Management in the Stuart School of Business at the Illinois Institute of Technology. Dr. Durango-Cohen's current research efforts focus on the interface of Marketing and Operations. In addition to her work on direct marketing optimization for nonprofit institutions, Professor Durango-Cohen is also interested in modeling the effect of capacity on pricing decisions for supply chains with competing national and store-brand products.

Ramón Torres is a Ph.D. candidate in the Transportation Systems Analysis and Planning Program at Northwestern University. He holds a bachelor's degree in Aerospace Engineering from the Massachusetts Institute of Technology.

Pablo L. Durango-Cohen is an Associate Professor in the Transportation Systems Analysis and Planning Program at Northwestern University. Prof. Durango-Cohen's research interests include problems of segmentation, estimation and control with applications in transportation and marketing. His work has been recognized with multiple awards, including an NSF Early CAREER Development Award. He serves on editorial boards in a number of leading transportation journals.