Cognitive Simplicity and Consideration Sets by
John R. Hauser Olivier Toubia
Theodoros Evgeniou
Rene Befurt
Daria Silinskaia
John R. Hauser is the Kirin Professor of Marketing, MIT Sloan School of Management, Massachusetts Institute of Technology, E40-179, One Amherst Street, Cambridge, MA 02142, (617) 253-2929, fax (617) 253-7597, jhauser@mit.edu.

Olivier Toubia is the David W. Zalaznick Associate Professor of Business, Columbia Business School, Columbia University, 522 Uris Hall, 3022 Broadway, New York, NY, 10027, (212) 854-8243, ot2107@columbia.edu.

Theodoros Evgeniou is an Associate Professor of Decision Sciences and Technology Management, INSEAD, Boulevard de Constance 77300, Fontainebleau, FR, (33) 1 60 72 45 46, theodoros.evgeniou@insead.edu.

Rene Befurt is a Visiting Scholar at the MIT Sloan School of Management, Massachusetts Institute of Technology, E40-157, One Amherst Street, Cambridge, MA 02142, (857) 753-7531, befurtr@mit.edu.

Daria Silinskaia is a doctoral student at the MIT Sloan School of Management, Massachusetts Institute of Technology, E40-170, One Amherst Street, Cambridge, MA 02142, (617) 253-2268, dariasil@mit.edu.

We would like to thank Daniel Bailiff (AMS), Simon Blanchard (PSU), Robert Bordley (GM), Anja Dieckmann (GfK), Holger Dietrich (GfK), Min Ding (PSU), Steven Gaskin (AMS), Patricia Hawkins (GM), Phillip Keenan (GM), Clarence Lee (MIT), Carl Mela (Duke), Andy Norton (GM), Daniel Roesch (GM), Matt Seleve (MIT), Glen Urban (MIT), Limor Weisberg and Kaifu Zhang (INSEAD) for their insights, inspiration, and help on this project. This paper has benefited from presentations at the Analysis Group Boston, the Columbia Business School, Digital Business Conference at MIT, General Motors, the London Business School, Northeastern University, the Marketing Science Conference in Vancouver, B.C., and the Seventh Triennial Choice Symposium at the University of Pennsylvania. Upon publication Matlab software and the US data are available from the authors.
Cognitive Simplicity and Consideration Sets

Abstract
We develop and test methods to identify cognitively-simple decision rules that explain
which products consumers select for their consideration sets. Drawing on qualitative research we
propose disjunctions-of-conjunctions (DOC) decision rules that generalize well-studied decision
models such as disjunctive, conjunctive, lexicographic, and subset conjunctive rules. We draw
on behavioral insights about cognitive simplicity and illustrate how these insights enhance
DOC rules. Using synthetic and empirical data we compare cognitively-simple DOC-based rules
to extant compensatory and non-compensatory rules. Synthetic data suggest that estimation
methods matched to data-generating rules predict validation data best. Empirically we observe
consumers’ consideration sets for global positioning systems for both estimation and validation
data. On validation data DOC-based rules, which account for cognitive simplicity (and market
commonalities), fit significantly better than other rules. This result is robust with respect to sam-
ple (German representative vs. US student), format by which consideration is measured (four
formats tested), and presentation of profiles (pictures vs. text). We illustrate that gains due to
cognitive simplicity apply to alternative estimation methods and are robust with respect to alter-
native means to estimate the benchmark rules. Empirically, our analyses suggest that cogni-
tively-simple DOC rules predict validation data well and imply different managerial insights.
Keywords: Consideration sets, non-compensatory decisions, consumer heuristics, statistical
learning, machine learning, revealed preference, conjoint analysis, cognitive
complexity, cognitive simplicity, environmental regularity, lexicography, logical
analysis of data, decision trees, combinatorial optimization.
1. Introduction and Focus

We focus on decision rules that consumers use to form consideration sets. We explore
decision rules (disjunctions of conjunctions) which generalize disjunctive, conjunctive, lexico-
graphic, and subset conjunctive rules that have been shown to explain and predict consideration
decisions. In theory, disjunctions of conjunctions (DOC) can involve complex logical patterns,
but prior research suggests that consumers use relatively simple rules when deciding on their
consideration sets. We therefore enforce cognitive simplicity.
Although the concept of DOC rules comes from in-depth qualitative interviews, our focus
in this paper is to develop and test methods that incorporate cognitive simplicity while inferring
decision rules directly from data in which consumers indicate which products (or product pro-
files) they would or would not consider. Such “revealed” inferences complement data-
augmentation methods in which only choices are observed, as well as more-intensive and potentially
intrusive measurements of decision rules such as process tracing, information display
boards, eye-tracking, and in-depth qualitative research. For example, to the extent that valida-
tion-data predictions are best if the estimation approach matches the “true” decision rules, re-
vealed-inference methods can be used to test hypotheses developed from more-intensive meas-
urement. Behavioral experiments can vary context, observe consideration-sets, and infer the
likely context-dependent decision rules. Moreover, to the extent that non-compensatory decision
rules predict better, conjoint-like simulators based on such rules might be used to evaluate mana-
gerial actions designed to affect consumers’ consideration decisions.
Building on evidence from a variety of fields that simple rules, if they are used, balance
consumer benefits with cognitive processing costs, we modify estimation methods to incorporate
the cognitive simplicity of the decision rules. Synthetic data experiments suggest that modifica-
tions which better match the data-generating decision rules predict better. Empirical data, in
which German consumers evaluate Global Positioning Systems (GPSs), suggest that the pro-
posed cognitively-simple DOC decision rules predict consideration (from a new set of stimuli)
better than either compensatory rules or existing non-compensatory rules. The results are robust
with respect to country (Germany vs. the US), sample (random vs. student), stimuli (text vs. pic-
tures), and data collection format (four different versions). We examine managerial implications
and close by demonstrating that the performance of cognitively-simple DOC rules is not unique
to a single estimation method for DOC rules or the benchmarks.
To provide context, we begin with a short discussion of the managerial and scientific im-
portance of consideration sets and review evidence that consumers use cognitively-simple deci-
sion rules when evaluating many products as is common in consideration decisions. Subsequent
sections review existing methods, introduce disjunctions-of-conjunctions decision rules, present
four formal results which motivate cognitive simplicity, and develop one example estimation
method for cognitively-simple DOC rules. We then describe the simulation experiments, em-
pirical tests, robustness checks, managerial implications, and alternative estimation methods.
2. Evidence for Consideration Sets and for Cognitive Simplicity

When consumers are faced with a large number of alternative products, as is increasingly
common in today’s retail and web-based shopping environments, they typically screen the full
set of products down to a smaller, more-manageable consideration set which they evaluate fur-
ther (e.g., Bronnenberg and Vanhonacker 1996; DeSarbo et al., 1996; Hauser and Wernerfelt
1990; Jedidi, Kohli and DeSarbo, 1996; Mehta, Rajiv, and Srinivasan, 2003; Montgomery and
Svenson 1976; Payne 1976; Roberts and Lattin, 1991; Shocker et al., 1991; Wu and Rangas-
wamy 2003). Understanding the formation of consideration sets can be managerially important.
For example, consideration sets for packaged goods are typically 3-4 products rather than the 30-
40 products that are available while in automobiles the typical consumer focuses on 5-6 options
out of the 350+ make-model combinations on the market (Hauser and Wernerfelt 1990, Urban
and Hauser 2004). Marketing strategies that encourage consumers to add a product to the con-
sideration set increase the odds of a purchase dramatically. For example, top management at
General Motors (GM) is investing heavily to increase consideration of GM automobiles because
GM believes that its vehicles are much better than consumers perceive them to be. GM believes
that the key barrier to sales is that consumers reject GM before seriously evaluating GM products
(e.g., only 36% of California consumers will even consider a GM vehicle). This strategic initia-
tive has led to tactics such as bringing test drives to consumers, directed customer relationship
management, moderated community groups, web-based showrooms, and web-based auto-choice
advisors (Rhoads, Urban, and Sultan 2004). Indeed, GM has begun to apply the measurement
protocols and models explored in this paper.
Most experimental evidence suggests that consumers make consideration decisions with
relatively simple rules that enable them to make good decisions while avoiding excess cognitive
effort (e.g., Bettman, Luce and Payne 1998; Gigerenzer and Todd 1999; Payne, Johnson and
Bettman 1988, 1993; Simon 1955; Shugan 1980). Some researchers suggest that such simple
rules lead to better decision outcomes than more-complex compensatory rules in the decision en-
vironments that consumers normally face (e.g., Bröder 2000; Gigerenzer and Goldstein 1996;
Hogarth and Karelaia 2005; Martignon and Hoffrage 2002). This perspective is consistent with
economic theories of consideration-set formation which posit that consumers balance search
costs and the option value of utility maximization (Hauser and Wernerfelt 1990; Roberts and
Lattin 1991). Low search-and-evaluation-cost rules might be the most efficient search or evalua-
tion methods.
From our perspective we do not require that every consumer use a simple decision rule for
consideration decisions, nor that consumers use simple rules in all contexts, only that it is scientifically
interesting and managerially relevant to study cognitively simple decision rules. In this paper we
focus on the consideration decision. Our analyses are consistent with existing, well-studied mod-
els of choice from among considered products.
3. Established Models of Decision Rules for Consideration Decisions

Figure 1 illustrates sixteen features that consumers use to evaluate handheld GPSs. These
features were chosen as the most important based on two pretests of 58 and 56 consumers, re-
spectively. Ten of the features are represented by text and icons while the remaining six features
are represented by text and visual cues. (We review the detailed measurement formats in Section
9 and test alternative presentation formats, including text-only, in Section 10.) We focus on data
in which respondents are asked to indicate which of several profiles (32 in our experiments) they
would consider. Respondents are free to select any size consideration set. In some formats they
must classify each profile as considered or not considered; in other formats they do not need to
evaluate every profile. In this paper we explore situations in which features are described by
finitely many levels as is common in most conjoint-analysis applications. The concept of cogni-
tive simplicity also applies to continuous features, an issue we address at the end of this paper.
Figure 1 Features of Handheld GPSs
Let $j$ index the profiles, $\ell$ index the levels, $f$ index the features (sometimes called "attributes" in the literature), and $h$ index the respondents. Let $J$, $L$, $F$, and $H$ be the corresponding numbers of profiles, levels, features, and respondents. For ease of exposition only, we do not write $J$, $L$, and $F$ as dependent (e.g., $L_f$). Our models and estimation can (and do) handle such dependency, but the notation is cumbersome. Let $x_{\ell jf} = 1$ if profile $j$ has feature $f$ at level $\ell$; otherwise $x_{\ell jf} = 0$. Let $\vec{x}_j$ be the binary vector (of length $LF$) describing profile $j$. Let $y_{hj} = 1$ if we observe that respondent $h$ considers profile $j$; otherwise $y_{hj} = 0$. Let $\vec{y}_h$ be the binary vector describing respondent $h$'s consideration decisions. All notations are summarized in Appendix 1.
Compensatory Decision Rules

If consumers are utility-maximizing, then they will consider a profile if its utility is above some threshold, $T_h$, which accounts for search and processing costs. If $\vec{\beta}_h$ is the vector of partworths for respondent $h$, then $h$'s evaluation of the utility of profile $j$ is $\vec{x}_j'\vec{\beta}_h + \varepsilon_{hj}$, where $\varepsilon_{hj}$ is an error term drawn from an extreme-value distribution. Subsuming the threshold in the scaling of the partworths yields the standard logit model (e.g., Swait and Erdem 2007, p. 691). We defer estimation of this and other benchmark models to Section 7.
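The resulting logit consideration probability can be sketched in Python (a minimal illustration, not the authors' estimation code; the partworth values are hypothetical and the threshold is subsumed in an intercept):

```python
import math

def consider_prob(x_j, beta_h):
    """Logit probability that respondent h considers profile j.
    x_j: binary profile vector; beta_h: partworths (threshold subsumed
    in the scaling / an intercept term)."""
    u = sum(x * b for x, b in zip(x_j, beta_h))  # x_j' beta_h
    return 1.0 / (1.0 + math.exp(-u))

# Hypothetical partworths: net utility 0 gives consideration probability 0.5.
p = consider_prob([1, 0, 1], [0.0, 2.0, 0.0])
```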
Non-compensatory Decision Rules

Commonly-studied non-compensatory rules are disjunctive, conjunctive, lexicographic,
elimination-by-aspects, and subset conjunctive rules (e.g., Gilbride and Allenby 2004, 2006;
Jedidi and Kohli 2005; Montgomery and Svenson 1976; Ordóñez, Benson and Beach 1999;
Payne, Bettman, and Johnson 1988; Yee, et al. 2007). Subset conjunctive rules generalize dis-
junctive and conjunctive rules (Jedidi and Kohli 2005). For consideration decisions, they also
generalize lexicographic rules and deterministic elimination-by-aspects.¹
Disjunctive Rules
In a disjunctive rule, a profile is considered if at least one of the features is at an "acceptable" (or satisfactory) level. Let $a_{h\ell f} = 1$ if level $\ell$ of feature $f$ is acceptable to respondent $h$; otherwise $a_{h\ell f} = 0$. Let $\vec{a}_h$ be the binary vector of acceptabilities for respondent $h$. A disjunctive rule states that respondent $h$ considers profile $j$ if $\vec{x}_j'\vec{a}_h \geq 1$.
Conjunctive Rules
In a conjunctive rule, a profile is considered if all of the features are at an acceptable
level. With this definition, a feature may have no effect on consideration if all levels of that fea-
ture are acceptable. (Conjunctive rules usually assume a larger set of acceptable levels than dis-
junctive rules, but this is not required.) Because the use in each rule is clear in context, we use
the same notation: in a conjunctive rule, respondent $h$ considers profile $j$ if $\vec{x}_j'\vec{a}_h = F$.
Subset Conjunctive Rules
In a subset conjunctive rule, a profile is considered if at least $S$ features are at an acceptable level.² Using the same notation, respondent $h$ considers profile $j$ if $\vec{x}_j'\vec{a}_h \geq S$. Clearly, a disjunctive rule is a special case where $S = 1$ and, because $\vec{x}_j'\vec{a}_h$ can never exceed $F$, a conjunctive rule is a special case where $S = F$. We denote subset conjunctive rules by Subset($S$).
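All three rules reduce to one dot product compared against a threshold, which makes the nesting easy to see in code. A minimal Python sketch (the vectors are hypothetical stacked feature-level dummies, as in the notation above):

```python
def count_acceptable(x_j, a_h):
    """x_j' a_h: the number of features of profile j at a level acceptable to h.
    Both vectors stack one binary dummy per (feature, level) pair."""
    return sum(x * a for x, a in zip(x_j, a_h))

def considers(x_j, a_h, S):
    """Subset(S) rule: consider profile j iff at least S features are acceptable.
    S = 1 gives the disjunctive rule; S = F (number of features) the conjunctive rule."""
    return count_acceptable(x_j, a_h) >= S

# Two binary features, two levels each; h accepts level 1 of each feature.
a_h = [1, 0, 1, 0]
x_j = [1, 0, 0, 1]   # feature 1 at level 1, feature 2 at level 2
disjunctive_ok = considers(x_j, a_h, S=1)   # one acceptable feature suffices
conjunctive_ok = considers(x_j, a_h, S=2)   # needs both features acceptable
```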
Lexicographic Rules
In a lexicographic rule, respondents decide on an order of features and an order of levels
within features. Respondents first rank profiles on levels within the first feature. They use the
¹ Tversky's (1972) elimination-by-aspects (EBA) rule has been used in its defined-aspect-order form by researchers such as Hogarth and Karelaia (2005), Johnson, Meyer, and Ghose (1989), Montgomery and Svenson (1976), and Payne, Bettman, and Johnson (1988). If the aspect order is defined, then the EBA rule for consideration is the same as a lexicographic-by-aspects rule. Additionally, if aspect measures are non-zero for a limited set of acceptable levels, then probabilistic EBA is the same as a conjunctive rule.

² Subset(S) rules are equivalent to "image-theory" rules in organizational behavior (Ordóñez, Benson and Beach 1999). Image-theory rules are defined as rejecting options if F − S features are below thresholds. Such rules are mathematically equivalent to accepting S features.
next feature in the lexicographic order only when profiles are tied on all higher-ranked features.
Lexicographic rules have been applied to consideration decisions by allowing ties and assuming
that only the first S features affect consideration (e.g., Yee, et al. 2007).³ However, if we do not
distinguish ranks within consideration sets, then the only feature ordering that matters is that the
first S features have acceptable levels and the exact feature ordering is not unique. For example,
if a respondent considers any GPS with an extra bright, high resolution display, then we get the
same consideration set whether brightness is ranked before resolution or vice versa. In consid-
eration decisions, a lexicographic rule will be indistinguishable from a conjunctive rule if $\vec{a}_h$ is
coded such that the appropriate levels of the first S features are acceptable (and the remaining
features coded as not affecting the decision). With this coding, the lexicographic rules are
equivalent to a conjunctive rule, which, in turn, can be written as a subset conjunctive rule.
Because the disjunctive, conjunctive, and lexicographic rules (and most common forms
of EBA) can be written as subset conjunctive rules, we adopt subset conjunctive rules as our
non-compensatory benchmark. We now discuss empirically-reasonable non-compensatory
screening rules that cannot be written as subset conjunctive rules.
4. Disjunctions of Conjunctions (DOC)

Our initial motivation for disjunctions of conjunctions came from qualitative discussions
with consumers. For example, when we first began interviewing respondents about GPSs, we
heard respondents express rules for handheld GPSs that were based on one or more conjunctive-
like criteria. A respondent might be willing to consider a GPS with a B&W screen if the GPS is
small and the screen is high resolution, but would require a color screen on a large GPS. Such
rules can be written as logical patterns: (B&W screen ∧ small size ∧ high resolution) ∨ (color
screen ∧ large size), where ∧ is the logical “and” and ∨ is the logical “or.” Patterns might also
include negations (¬), for example, a consumer might accept a B&W screen as long as the GPS
is less than the highest price of $399: (B&W screen ∧ ¬ $399).
In a qualitative study sponsored by General Motors some respondents considered auto-
mobiles based on two or more conjunctive-like criteria (Anonymous 2008). That study used in-
depth interviewing for 38 automobile consumers who were asked to describe their consideration
decisions for 100 real automobiles that were balanced to market data. For example, the following respondent considers automobiles that satisfy either of two criteria. The first criterion is clearly conjunctive (good styling, good interior room, excellent mileage). The second criterion allows cars that are "hotrods." "Hotrods" usually have poor interior room and poor mileage.

³ Yee, et al. (2007) also consider lexicographic rules in which the ranking of profiles is inferred from feature-level combinations (called "aspects"). The conceptual arguments in this paragraph also apply to such rules.
[I would consider the Toyota Yaris because] the styling is pretty good, lot of interior room, mileage is supposed to be out of this world. I definitely [would] consider [the Infinity M-Sedan], though I would probably consider the G35 before the "M". I like the idea of a kind of a hotrod.
All interviews were video-recorded and the videos were evaluated by independent judges
who were blind to any hypotheses about consumers’ decision rules (Hughes and Garrett 1990;
Perreault and Leigh 1989). Most respondents made consideration decisions rapidly (89% aver-
aged less than 5 seconds per profile) and most used non-compensatory decision rules (76%).
Typically, consumers used conjunctive-like criteria defined on specific levels of features. Some
consumers would consider an automobile if it satisfied at least one of multiple criteria (a disjunc-
tion of two or more conjunctions).
We seek to formalize these qualitative insights with a class of decision rules that general-
izes previously-proposed rules. First, following Tversky (1972) we define an aspect as a binary
descriptor such as “B&W screen.” A profile either has or does not have an aspect. A pattern is a
conjunction of aspects or their negations such as (B&W screen ∧ ¬ $399). We define the size, s,
of a pattern as the number of aspects in the pattern. For example, (B&W screen ∧ ¬ $399) has
size s = 2. If p indexes patterns, then we say that a profile j matches pattern p if profile j contains
all aspects (or negations) in pattern p.
We study rules where the respondents consider a profile if the profile matches one or
more target patterns. Because each pattern is a conjunction, these logical rules are disjunctions of
conjunctions (DOC). DOC rules generalize both disjunctive and conjunctive rules and are con-
sistent with the qualitative interviews. While other logical patterns are possible, we show that
DOC rules are sufficiently general to fit any observed consideration decisions.
Formal Definition of DOC Rules

For any set of features and levels there are many potential conjunctions of aspects. If all $F$ features were binary, there would be $3^F - 1$ possible patterns. The factor of 3 comes from the fact that every conjunctive pattern could either ignore a feature, contain a feature, or contain its negation. There are many more patterns if the features have many levels. Based on the literature
cited earlier, we expect that decision rules are cognitively simple. Thus, we expect the number
of aspects in a pattern to be small. To capture this concept, we define DOC(S) as the set of DOC
rules in which the maximum size of the patterns is S. (It will be clear in context whether S refers
to the maximal pattern size in DOC(S) or the subset sizes in Subset(S).)
For a set of allowable patterns, let $w_{hp} = 1$ if pattern $p$ is one of the patterns describing respondent $h$'s decision rule and let $m_{jp} = 1$ if profile $j$ matches pattern $p$. Otherwise, $w_{hp}$ and $m_{jp}$ are zero. Let $\vec{w}_h$ and $\vec{m}_j$ be the corresponding binary vectors with length equal to the number of allowable patterns in a DOC rule. Then a DOC rule implies that respondent $h$ considers profile $j$ if $\vec{m}_j'\vec{w}_h \geq 1$.
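A DOC rule and its matching test — consider a profile iff it matches at least one pattern — can be sketched in Python. This is an illustration only; the GPS aspect names are hypothetical examples in the spirit of the qualitative interviews above:

```python
# A pattern is a conjunction of aspects or their negations; a DOC rule is a
# set of patterns; profile j is considered iff it matches at least one pattern.

def matches(profile_aspects, pattern):
    """pattern: iterable of (aspect, wanted) pairs; wanted=False is a negation."""
    return all((aspect in profile_aspects) == wanted for aspect, wanted in pattern)

def doc_considers(profile_aspects, patterns):
    """Disjunction of conjunctions: any one matching pattern suffices."""
    return any(matches(profile_aspects, p) for p in patterns)

# (B&W screen AND small size AND high resolution) OR (color screen AND large size)
rule = [
    [("B&W screen", True), ("small size", True), ("high resolution", True)],
    [("color screen", True), ("large size", True)],
]
gps = {"B&W screen", "small size", "high resolution", "$299"}
considered = doc_considers(gps, rule)
```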
While the binary vectors $\vec{w}_h$ and $\vec{m}_j$ play a role analogous to partworth and feature vectors in conjoint analysis, their length can be quite large because the number of allowable patterns grows rapidly with $S$. For example, if all 16 features in Figure 1 were binary, then there would be 32 patterns for $S = 1$, 512 for $S = 2$, 4,992 for $S = 3$, and 34,112 for $S = 4$, growing to almost 20 million for $S = 10$. Fortunately, behavioral theory suggests we can limit our search to cognitively-simple DOC rules.
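The growth rate follows from a simple count: a pattern of size $s$ picks $s$ of the $F$ binary features and then includes each chosen aspect or its negation. A short Python check of the counts quoted above:

```python
from math import comb

def n_patterns(F, S):
    """Number of patterns of size at most S over F binary features:
    choose s features, then include each chosen aspect or its negation."""
    return sum(comb(F, s) * 2**s for s in range(1, S + 1))

counts = [n_patterns(16, S) for S in (1, 2, 3, 4)]
# Reproduces the counts in the text: 32; 512; 4,992; 34,112.
big = n_patterns(16, 10)  # "almost 20 million"
```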
Relationship of DOC Rules to other Non-compensatory Rules

DOC rules generalize Subset(S) rules and are more flexible. For example, (B&W screen
∧ small size ∧ high resolution) ∨ (color screen ∧ large size) cannot be written as a subset con-
junctive rule. DOC rules nest other logical rules and match specific rules for some values of S.
In particular, we have the following results.
Result 1. The following sets of rules are equivalent: (a) disjunctive rules, (b) Subset(1) rules, and (c) DOC(1) rules.
Result 2. Conjunctive rules are equivalent to Subset(F) rules which, in turn, are a subset of the DOC(F) rules, where F is the number of features.
Result 3. All Subset(S) rules can be written as a DOC(S) rule, but not all DOC(S) rules can be written as a Subset(S) rule.
The formal proofs to Results 1, 2, and 3 are contained in Appendix 2. We have already
argued that disjunctive and conjunctive rules are equivalent to Subset(1) and Subset(F), respec-
tively. Part (c) of Result 1 follows because a conjunction of size S = 1 is just a single aspect and,
if we include all relevant aspects in a DOC(1) rule, it is a disjunction of the aspects. The second
half of Result 2 follows because DOC(F) rules allow conjunctions up to the number of features.
Result 3 follows similar logic. A Subset(S) rule implies that a profile is considered if any
subset of S features is at an acceptable level. Each subset of S features corresponds to one con-
junction. Because at least one of the conjunctions needs to be satisfied, this is just a disjunction
of conjunctions. DOC(S) allows disjunctions of conjunctions that are not allowed with a Sub-
set(S) rule and allows conjunctions with fewer than S features. For example, suppose there were
three features: battery life, track log, and price, and consider a Subset(2) rule with the following
acceptable features: 30-hour battery life, track log (yes), $249, and $299. This rule can be
written as a DOC(2) rule: (30 hours ∧ yes) ∨ (30 hours ∧ $299) ∨ (30 hours ∧ $249) ∨ (yes ∧
$299) ∨ (yes ∧ $249). However, the rule (30 hours ∧ yes) ∨ ($249) can be written as a DOC(2)
rule, but not a Subset(2) rule.
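The Subset(S)-to-DOC(S) rewriting in Result 3 is mechanical: one conjunction per way of choosing $S$ features and one acceptable level of each. A Python sketch reproducing the five-conjunction expansion in the text (feature names as in the example; the function name is ours):

```python
from itertools import combinations, product

def subset_to_doc(acceptable_by_feature, S):
    """Rewrite a Subset(S) rule as a DOC(S) rule (Result 3): one conjunction
    per choice of S features and one acceptable level of each chosen feature."""
    patterns = set()
    for feats in combinations(sorted(acceptable_by_feature), S):
        for levels in product(*(acceptable_by_feature[f] for f in feats)):
            patterns.add(frozenset(levels))
    return patterns

# The text's example: acceptable levels are a 30-hour battery, track log = yes,
# and either $249 or $299, under a Subset(2) rule.
acceptable = {"battery": ["30 hours"], "track log": ["yes"],
              "price": ["$249", "$299"]}
doc = subset_to_doc(acceptable, S=2)   # five conjunctions, as in the text
```

The converse direction fails: a DOC(2) rule such as (30 hours ∧ yes) ∨ ($249) mixes pattern sizes and cannot come out of this construction, which is the asymmetry Result 3 states.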
Results 1 and 2 are important because they make predictions that we test with synthetic
data. For example, we should not be surprised if either a DOC-based or a Subset(1)-based esti-
mation method does well on data generated with disjunctive rules (Result 1) or if either a DOC-
based or a Subset(F)-based estimation method does well on data generated with conjunctive
rules (Result 2). Result 3 implies that the comparison of Subset(S) rules and DOC rules is inter-
esting. A DOC-based estimation may not do as well as a Subset(S)-based estimation on data
generated with Subset(S) rules or vice versa. Of course, the ability to fit a rule to data depends
upon our ability to estimate the parameters of a rule. We turn now to estimation.
5. Estimation of DOC Rules: Issues of Complexity

One strength of DOC rules is their generality, but this also presents a challenge to estima-
tion that infers DOC rules from observed consideration decisions. The following result illus-
trates the generality of DOC rules:
Result 4. Any set of considered profiles can be fit perfectly with at least one DOC rule. Moreover, the DOC rule need not be unique.
To establish Result 4 we recognize that every considered profile, $j$, is a set of aspects. Let $p_j$ be a pattern of length $F$ that contains all aspects in $j$ and only those aspects. Clearly, pattern $p_j$ matches profile $j$. This pattern will match no other profile that is not identical to $j$. A disjunction of the $p_j$ patterns (for all considered $j$) will match all considered profiles but no other profiles.
For example, fix all features in Figure 1 except battery life, track log (yes or no), and price and
suppose that respondent h considers only two profiles (out of J): {“30-hour battery,” “track log,”
$299} and {“15-hour battery,” “track log,” $249}. These data could be fit perfectly with (“30-
hour battery” ∧ “track log” ∧ “$299”) ∨ (“15-hour battery” ∧ “track log” ∧ “$249”).
The second half of Result 4 is clear by counterexample. Suppose that respondent h con-
siders only GPSs that have “30-hour battery” and a “track log.” The pattern (“30 hour battery” ∧
“track log”) will fit the data. However, this DOC rule with one size-2 pattern is equivalent to
another DOC rule in which we have a disjunction of two size-3 patterns each of which includes
the simple rule combined with any aspect or its negation: (“30 hour battery” ∧ “track log” ∧
$249) ∨ (“30 hour battery” ∧ “track log” ∧ ¬ $249). Expanding to size-F patterns we find a very
large number of rules consistent with the observed data.
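The perfect-fit construction behind Result 4 is easy to make concrete. A Python sketch of the argument, using the hypothetical battery/track-log/price aspects from the text:

```python
def perfect_fit_rule(considered_profiles, all_aspects):
    """Result 4 construction: one size-F pattern per considered profile,
    requiring every aspect the profile has and negating every aspect it lacks."""
    rule = []
    for profile in considered_profiles:
        rule.append([(a, a in profile) for a in sorted(all_aspects)])
    return rule

def matches(profile, pattern):
    return all((a in profile) == wanted for a, wanted in pattern)

aspects = {"30-hour battery", "15-hour battery", "track log", "$249", "$299"}
considered = [{"30-hour battery", "track log", "$299"},
              {"15-hour battery", "track log", "$249"}]
rule = perfect_fit_rule(considered, aspects)
# Each considered profile matches exactly its own pattern; no other profile
# matches any pattern, so the rule fits the observed data perfectly.
```

Because any matching pattern can also be padded with an irrelevant aspect or its negation, many distinct DOC rules fit the same data, which is exactly the overfitting risk discussed next.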
Result 4 implies a very important challenge: if for any observed consideration set we can
find at least one, and possibly many, DOC rules that fit the data perfectly, then, without further
constraints, we are likely to over-fit the observed data. Fortunately, the behavioral literature
(cited earlier) suggests a solution to the dilemma of Result 4. Experimental evidence suggests
that consumers use simple rules (e.g., Gigerenzer and Goldstein 1996; Payne, Bettman, and
Johnson 1993). This experimental evidence suggests further that simple rules do well for deci-
sions that consumers face on a day-to-day basis. These behavioral hypotheses are consistent
with the statistical learning literature which recommends avoiding complexity when estimating
the parameters of models (e.g., Cucker and Smale 2002; Evgeniou, Boussios and Zacharia 2005;
Hastie, Tibshirani and Friedman 2001; Langley 1996; Vapnik 1998).
Based on both literatures we propose to focus on simplicity by placing limits on cognitive
complexity. A simpler decision rule that is consistent with experimental evidence may not fit es-
timation data perfectly, but may predict validation data better – because it is more likely to repre-
sent the consumer’s true decision rule or because complexity control mitigates Result 4 and
avoids overfitting. (Simulations in Section 8 hint that the former is a sufficient explanation; ei-
ther or both explanations are consistent with our empirical tests.)
However, even if we reward cognitive simplicity, we might not identify unique patterns
that have empirical validity. To help identify DOC rules for individual respondents we use
“market” information. There are at least two motivations for using “market” information. The
first motivation is an analogy to population shrinkage which enhances accuracy in hierarchical
Bayesian models (e.g., Rossi and Allenby 2003). The second motivation is drawn from the be-
havioral literature which hypothesizes that consumers use simple rules because they “capitalize
on environmental regularities to make smart inferences” (Chase, Hertwig and Gigerenzer 1998, p.
209). Gigerenzer and Selten (2001) argue further that simple rules are “ecologically rational.”
Similarly, Payne, Johnson and Bettman (1993, pp. 97-99) demonstrate that the performance of
simple decision rules varies with the decision environment. By inference, commonalities among
respondents facing similar decision environments provide valuable information on which rules
are more likely for a respondent. Either motivation is sufficient to suggest that “market” behavior
provides valuable information for identifying DOC rules for each respondent.
6. Identifying Decision Rules by Accounting for Cognitive Simplicity

In this section we illustrate an estimation method that identifies DOC rules while ac-
counting for cognitive simplicity (and market commonalities). We choose a statistical-learning
algorithm because the modifications for cognitive simplicity and market information are trans-
parent. We believe that the predictive performance is due to the cognitively-simple DOC rules
rather than the specific estimation method. For example, in Section 12 we illustrate how cogni-
tively-simple DOC rules can be identified with a different statistical-learning algorithm (logical
analysis of data). We also demonstrate that (1) algorithms (such as decision trees) without cog-
nitive complexity control and market information do less well and (2) statistical-learning bench-
marks do less well than statistical-learning methods for cognitively-simple DOC rules. To the
extent that another approach to estimating cognitively-simple DOC rules improves predictions
relative to the methods we test, our results are conservative.⁴
The basic data we observe, for a set of respondents and profiles, is whether or not a re-
spondent considers a profile ($y_{hj}$). We seek to identify the patterns respondent $h$ uses to evaluate
profiles. Fit on a calibration sample is maximized when we select patterns such that profile j is
considered if $\vec{m}_j'\vec{w}_h \geq 1$ and not considered if $\vec{m}_j'\vec{w}_h = 0$. (Recall that $\vec{m}_j$ identifies the patterns which match profile $j$ and $\vec{w}_h$ is a binary vector that identifies patterns.)
To measure errors, we define penalty variables. Let be non-negative integers such
that
+hjξ
+≤′ hjhj wm ξrr . Based on this constraint, will equal 1 (or greater) whenever DOC rules pre-
dict that profile j is considered. Similarly, let be non-negative integers such that
+hjξ
−hjξ
4 Bayesian methods for cognitively-simple DOC rules may face practical challenges due to the length of the hwr vec-tor. Such formulations require further research. See Section 12.
11
Cognitive Simplicity and Consideration Sets
−−≥′ hjhj wm ξ1rr . will equal 1 (or greater) whenever DOC rules predict that profile j is not
considered.
−hjξ
We can now define prediction errors. Because $y_{hj} = 1$ when we observe that profile j is considered in the estimation data, $y_{hj}\xi^-_{hj}$ will equal 1 (or greater) when we observe profile j is considered, but we predict it is not considered (false negative predictions). Similarly, $(1 - y_{hj})\xi^+_{hj}$ will equal 1 (or greater) for false positive predictions. Part of our objective in estimating DOC rules is to minimize the sum of these false prediction errors in the estimation data. (We might also weigh false positives more or less than false negatives. We address this issue in Section 9.)
We can define cognitive simplicity in many ways. In this section we penalize the number of patterns and favor simple rules by allowing only patterns that have a maximum length of S. (Other penalties, such as the length of the patterns, are also possible. Section 12 provides an illustration.) If $\vec{e}$ is a vector of 1's, of length equal to the number of potential patterns, then we measure complexity by $\vec{e}'\vec{w}_h$, which counts the number of patterns in $\vec{w}_h$.
We balance fit and cognitive simplicity with the following loss function. ($\gamma_c$ is a parameter that tells us how much to penalize the lack of cognitive simplicity.)

(1) $\sum_{j=1}^{J}\left[y_{hj}\xi^-_{hj} + (1-y_{hj})\xi^+_{hj}\right] + \gamma_c\,\vec{e}'\vec{w}_h$
Result 4 cautions us about non-uniqueness even if we favor cognitive simplicity, hence we incorporate information from the "market." In particular, we choose decision rules that are more likely to match profiles that are considered by other respondents in the market. If $M_j$ is the (market) percent of respondents who consider profile j, then we "shrink" to the market with an additional criterion in the loss function ($\gamma_M$ is a parameter that tells us how much to weigh market considerations).

(2) $\gamma_M \sum_{j=1}^{J}\left[M_j\xi^-_{hj} + (1-M_j)\xi^+_{hj}\right]$
If we select γM to be small, Equation 2 will break ties among those patterns that minimize Equa-
tion 1. Such market-based constraints have proven valuable in other marketing applications
(e.g., Evgeniou, Pontil and Toubia 2007).
There are many estimation methods that we might consider. Statistical learning provides one transparent estimation method to illustrate the concepts of estimating DOC patterns while accounting for cognitive simplicity and market commonalities. In particular, we use the integer program in Equation 3, which, for simplicity, we call DOCMP. (The cognitive simplicity constraint on S is implicit in the definition of $\vec{w}_h$.)

(3) $\min_{\{\vec{w}_h,\,\vec{\xi}_h\}}\; \sum_{j=1}^{J}\left[y_{hj}\xi^-_{hj} + (1-y_{hj})\xi^+_{hj}\right] + \gamma_M \sum_{j=1}^{J}\left[M_j\xi^-_{hj} + (1-M_j)\xi^+_{hj}\right] + \gamma_c\,\vec{e}'\vec{w}_h$

Subject to: $\vec{m}_j'\vec{w}_h \le \xi^+_{hj}$ for all j = 1 to J

$\vec{m}_j'\vec{w}_h \ge 1 - \xi^-_{hj}$ for all j = 1 to J

$\xi^+_{hj} \ge 0$, $\xi^-_{hj} \ge 0$, $\vec{w}_h$ a binary vector
Solving the Mathematical Program (DOCMP)

DOCMP is equivalent to the set-covering problem and, hence, is an NP-hard integer program (Cormen et al. 2001). Fortunately, efficient approximation algorithms have been developed and tested for this class of problems. For example, a greedy heuristic runs in polynomial time (Feige 1998; Lund and Yannakakis 1994). The greedy approximation adds non-empty patterns sequentially by choosing the patterns based on the greatest reduction in the objective function and stopping when no further reduction is feasible. Alternatively, DOCMP can be solved approximately with a linear-programming relaxation in which we first allow the $\vec{w}_h$ to be continuous on [0, 1], then round up to 1.0 any positive $w_{hj}$ that is above a threshold. Formulated thus, the linear-programming relaxation is similar to the "LASSO" method in statistical learning. The "LASSO" method usually provides sparse solutions in which relatively few patterns are chosen (Hastie, Tibshirani, and Friedman 2003, and references therein). In our estimations, we use both the greedy and the relaxation methods, choosing the solution that provides the best value of the objective function. These solution methods scale sufficiently well for cognitively-simple values of S and can easily handle the 16-feature empirical application in Section 9.
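For concreteness, the greedy approximation can be sketched as follows. This is our own minimal illustration, not the authors' Matlab implementation; the names (`doc_loss`, `greedy_docmp`, `match`) are ours, and the minimal values of the penalty variables are substituted directly (ξ⁺ equals the number of matching patterns; ξ⁻ equals max(0, 1 minus that count)).

```python
import numpy as np

def doc_loss(w, match, y, mkt, gamma_c, gamma_m):
    """Loss of Equation 3 for a given binary pattern vector w.
    match: (J, P) binary matrix, match[j, p] = 1 if pattern p matches profile j.
    y: (J,) observed consideration; mkt: (J,) market consideration shares."""
    pred = match @ w                       # m_j' w_h for each profile j
    xi_pos = pred                          # >= 1 when DOC predicts "consider"
    xi_neg = np.maximum(0, 1 - pred)       # >= 1 when DOC predicts "not consider"
    fit = np.sum(y * xi_neg + (1 - y) * xi_pos)
    market = gamma_m * np.sum(mkt * xi_neg + (1 - mkt) * xi_pos)
    return fit + market + gamma_c * w.sum()

def greedy_docmp(match, y, mkt, gamma_c=1.0, gamma_m=1e-4):
    """Greedy approximation: repeatedly add the pattern with the largest
    reduction in the objective; stop when no addition reduces it."""
    J, P = match.shape
    w = np.zeros(P)
    best = doc_loss(w, match, y, mkt, gamma_c, gamma_m)
    while True:
        gains = []
        for p in np.flatnonzero(w == 0):
            w[p] = 1
            gains.append((best - doc_loss(w, match, y, mkt, gamma_c, gamma_m), p))
            w[p] = 0
        if not gains:
            break
        gain, p = max(gains)
        if gain <= 0:
            break
        w[p] = 1
        best -= gain
    return w
```

On a toy example with one pattern matching exactly the considered profiles, the heuristic selects that single pattern and stops, consistent with the sparsity that the γc penalty encourages.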
Choosing DOCMP's Tuning Parameters with Leave-one-out Cross Validation

DOCMP, when solved, chooses the number of patterns automatically. However, there are two explicit tuning parameters in Equation 3, γc and γM. These tuning parameters tell us how much to penalize the number of patterns (one measure of cognitive simplicity) and how much to shrink h's patterns toward the market.
We set γM to an arbitrary small number so that market information is used only to break
ties among patterns. We select γc with a method called leave-one-out cross validation. Leave-
one-out cross validation has been used successfully in both the statistical learning and marketing
literatures (e.g., Cooil, Winer and Rados 1987; Efron and Tibshirani 1997; Evgeniou, Pontil and
Toubia 2007; Hastie, Tibshirani, and Friedman 2003; Kearns and Ron 1999; Kohavi 1995; Shao
1993; Toubia, Evgeniou and Hauser 2007; Zhang 2003). Specifically, for each potential value of
γc we leave out one profile from the estimation data and use Equation 3 to identify patterns with
data on the remaining J–1 profiles. We predict consideration for the left-out profile, repeat for
each profile, and sum errors over respondents, choosing γc to minimize leave-one-out cross-
validation errors on the estimation data. No data from any holdout or validation observations are
used in leave-one-out cross validation. We test sensitivity to the choice of γc for both the
calibration and validation samples and, in Section 12, we examine an algorithm that does not use
leave-one-out cross validation.
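The selection loop itself is generic and can be sketched as below. Here `fit_fn` and `predict_fn` are hypothetical placeholders standing in for DOCMP estimation and prediction, and the grid of candidate γc values is illustrative.

```python
import numpy as np

def loocv_error(fit_fn, predict_fn, X, y, gamma_c):
    """Leave-one-out cross-validation error for one candidate gamma_c:
    drop each profile in turn, refit on the remaining J-1 profiles, and
    score the prediction for the held-out profile."""
    J = len(y)
    errors = 0
    for j in range(J):
        keep = np.arange(J) != j
        model = fit_fn(X[keep], y[keep], gamma_c)
        errors += int(predict_fn(model, X[j]) != y[j])
    return errors / J

def select_gamma_c(fit_fn, predict_fn, X, y, grid):
    """Choose the gamma_c on the grid that minimizes LOOCV error,
    using the estimation data alone (no holdout data is touched)."""
    scores = [loocv_error(fit_fn, predict_fn, X, y, g) for g in grid]
    return grid[int(np.argmin(scores))]
```

Because only the estimation sample is resampled, the validation sample remains untouched, which matches the paper's statement that no holdout observations enter the cross-validation.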
We enforce an upper bound on pattern length by fixing S. We choose S = 4 as appropriate to the goals of this paper. In the simulation experiments F = 4, hence S = 4 provides a reasonable test of DOCMP. In the empirical application S = 4 provides a rich set of possible DOC rules (30,000-plus potential patterns) and is consistent with the DOC rules articulated in qualitative pretests.5 More importantly, a fixed S provides a conservative perspective on whether accounting for cognitive complexity improves predictions. Similarly, the performance of DOCMP with S = 4 is a conservative indicator of what is possible when other estimation methods are modified to identify cognitively-simple DOC decision rules.

5 Large S is neither consistent with behavioral theory nor parsimonious. For example, when S = 7 any proposed estimation method must deal with almost 2 million patterns. We choose to be conservative in our use of empirical validation data and, hence, fix S to an empirically-reasonable value.

7. Benchmark Decision Rules

We compare cognitively-simple DOC rules to compensatory, conjunctive, disjunctive, subset conjunctive, and lexicographic rules. Results 1-3, and the degeneracy of lexicographic rules for consideration decisions, imply that conjunctive, disjunctive, and lexicographic rules are special cases of Subset(S). These benchmarks provide a broad sampling from previous proposals and enable us to test DOC rules vs. other rules for consideration decisions. In Section 12 we address other estimation methods for cognitively-simple DOC rules.
To estimate our benchmarks, we use published hierarchical Bayes (HB) methods. We
retain the basic Bayesian formulations cited in the references, modified slightly for consideration
decisions. These formulations have been applied widely and have been shown to predict well
(e.g., Arora and Huber 2001; Gilbride and Allenby 2004, 2006; Rossi and Allenby 2003).
The benchmark rules can also be estimated with statistical-learning methods (less
common in the marketing literature). As a check on the robustness of our empirical
comparisons, we formulate statistical-learning methods analogous to DOCMP for both
compensatory and Subset(S) rules (see Appendix 6). The performance is comparable to the
Bayesian methods and does not change the basic interpretations of the empirical comparisons.
HB Compensatory Estimation
Respondent h considers profile j if $\vec{x}_j'\vec{\beta}_h + \varepsilon_{hj}$ is above a threshold.6 Subsuming the threshold in the partworths, we get a standard logit likelihood function:

(4) $\Pr(y_{hj} = 1 \mid \vec{x}_j, \vec{\beta}_h) = \dfrac{e^{\vec{x}_j'\vec{\beta}_h}}{1 + e^{\vec{x}_j'\vec{\beta}_h}}$

where $\Pr(y_{hj} = 0 \mid \vec{x}_j, \vec{\beta}_h) = 1 - \Pr(y_{hj} = 1 \mid \vec{x}_j, \vec{\beta}_h)$. We impose a first-stage prior on $\vec{\beta}_h$ that is normally distributed with mean $\vec{\beta}_0$ and covariance D. The second-stage prior on D is inverse-Wishart with parameters equal to I/(Q+3) and Q+3, where Q is the number of parameters to be estimated and I is an identity matrix. We use diffuse priors on $\vec{\beta}_0$. Inference is based on a Monte Carlo Markov chain with 20,000 iterations, the first 10,000 of which are used for burn-in.
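The likelihood in Equation 4 reduces to a standard logistic transformation of the profile's utility; a minimal sketch (function name is ours):

```python
import numpy as np

def consider_prob(x, beta):
    """Logit probability of consideration (Equation 4). The consideration
    threshold is subsumed in the partworths as an intercept, so a profile
    with zero net utility is considered with probability 0.5."""
    u = float(np.dot(x, beta))
    return 1.0 / (1.0 + np.exp(-u))
```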
HB Subset(S) Estimation (includes Disjunctive and Conjunctive)

We use the hierarchical Bayes model of Gilbride and Allenby (2004) modified to estimate Jedidi and Kohli's (2005) subset conjunctive rules. The modifications reflect differences in data and the generalization in models. In particular, we observe consideration directly while it is a latent construct in the Gilbride-Allenby formulation. We also do not impose constraints that levels within a feature are ordered. This allows us to address multi-level features in which there is no defined ordering. The Subset(S)-based likelihood function is:

(5) $\Pr(y_{hj} = 1 \mid \vec{x}_j, \vec{a}_h) = \begin{cases} b_1 & \text{if } \vec{x}_j'\vec{a}_h \ge S \\ b_2 & \text{if } \vec{x}_j'\vec{a}_h < S \end{cases}$

where again $\Pr(y_{hj} = 0 \mid \vec{x}_j, \vec{a}_h) = 1 - \Pr(y_{hj} = 1 \mid \vec{x}_j, \vec{a}_h)$. The parameters $b_1$ and $b_2$ model response errors. Specifically, a profile is considered with probability $b_1$ if it satisfies a Subset(S) rule; a profile is considered with probability $b_2$ if it does not.

The first-stage prior on each $a_{hfl}$ is a binomial distribution with parameter $\theta_{fl}$. The second-stage priors are beta for $b_1$ and $b_2$ and Dirichlet for the $\theta_{fl}$'s. (We use the same distributions and parameterization used by Gilbride and Allenby 2004.) We impose the constraint $b_1 > b_2$ with rejection sampling (e.g., Allenby, Arora and Ginter 1995). Inference is based on 20,000 iterations of the Monte Carlo Markov chain, the first 10,000 of which are used for burn-in. Because the set of possible acceptabilities, $\vec{a}_h$, is large, we follow Gilbride and Allenby (2004, p. 404) and use a "Griddy Gibbs" algorithm. Details are available in Gilbride and Allenby and are summarized in Appendix 3.

6 An additive model can represent a lexicographic or conjunctive model, for example, if one partworth is greater than the sum of the other partworths. To account for this phenomenon Hogarth and Karelaia (2005), Martignon and Hoffrage (2002), and Yee et al. (2007) constrain an additive model such that the partworths are compensatory. Their data suggest that an additive model outperforms a constrained compensatory model. Hence, HB Compensatory is a conservative benchmark for DOC-based estimation.
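Equation 5 is easy to state in code; the sketch below (function name ours) evaluates the consideration probability for one profile given a respondent's binary acceptability vector:

```python
import numpy as np

def subset_prob(x, a, S, b1, b2):
    """Subset(S) consideration probability (Equation 5): probability b1 if
    at least S of the profile's aspects are acceptable, b2 otherwise.
    x: binary aspect vector of the profile; a: binary acceptability vector."""
    return b1 if int(np.dot(x, a)) >= S else b2
```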
8. Simulation Experiments

We seek to test whether matched decision rules predict better than decision rules that are mismatched, where we say a decision rule is matched if both the estimation and the data generation are based on that decision rule. For example, we might expect compensatory-based estimation to predict better than non-compensatory-based estimation when the data are generated with a compensatory rule. We might also expect Subset(S)-based estimation to predict better when the data are generated with Subset(S) and DOC-based estimation to predict better when the data are generated with DOC rules, unless the generating rules are equivalent as per Results 1-3.

We consider products with 4 features, each with 4 levels. We generate two orthogonal designs of 32 profiles each.7 The first orthogonal design is used for estimation (including leave-one-out cross validation). The second orthogonal design is used purely for validation. We generate data independently for each of eight "true" decision rules: compensatory, disjunctive [same as Subset(1) and DOC(1)], Subset(2), Subset(3), conjunctive [same as Subset(4) and lexicographic for consideration sets], DOC(2), DOC(3), and DOC(4). For each decision rule we generate estimation and validation data for four independent sets of 100 respondents.

7 Orthogonal designs might pose problems for leave-one-out cross validation (Evgeniou, Pontil and Toubia 2007). Thus, the choice of orthogonal designs favors the HB benchmark estimation methods and provides a conservative test of DOCMP. We return to this issue at the end of Section 9.
We generated the data to be consistent with the HB formulations: normally-distributed
partworths and binomial sampling from logit probabilities for the compensatory rules; Dirichlet-
distributed acceptability parameters and binomial sampling for choices for Subset(S) rules. The
DOC-based data were based on Dirichlet-distributed pattern weights and a binomial distribution
for choices with the same b1,b2 probabilities as for Subset(S). To maintain consistency among
alternative data-generation rules, we calibrated the decision rules to be as parsimonious as feasi-
ble and to hold the average number of considered profiles constant across decision rules (8 pro-
files; a number consistent with our empirical data). Details are provided in Appendix 4.
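As an illustration of this data-generation scheme for Subset(S) rules, the sketch below draws one synthetic respondent. The acceptability probabilities `theta`, the b1/b2 values, and the function name are ours for illustration; the paper's exact calibration (including the Dirichlet draws for theta) is in its Appendix 4.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_subset_respondent(profiles, S, theta, b1=0.9, b2=0.1):
    """Generate one synthetic respondent's consideration decisions under a
    Subset(S) rule: draw a binary acceptability vector from Bernoulli(theta)
    (theta itself is Dirichlet-distributed across levels in the paper's
    setup), then consider each profile with probability b1 if at least S
    aspects are acceptable and b2 otherwise."""
    a = rng.binomial(1, theta)          # acceptability of each aspect
    counts = profiles @ a               # acceptable aspects per profile
    p = np.where(counts >= S, b1, b2)   # Equation 5 response probabilities
    return rng.binomial(1, p), a
```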
In making comparisons, we take Results 1-3 into account. For example, both DOCMP
and HB Subset(1) are matched to a disjunctive rule. The results are given in Table 1. The per-
centages in bold are the best predictions on validation data (or not significantly different from the
best) for the indicated data-generation decision rule.
Table 1
Out-of-Sample Hit Rate (Each Estimation Method and Each Data-Generation Decision Rule)

Data-Generation Decision Rule     HB Compensatory  HB Subset(1)  HB Subset(2)  HB Subset(3)  HB Subset(4)  DOCMP
Compensatory                      74.6%*           45.2%         59.3%         66.7%         72.4%         72.8%
Subset(2)                         78.5%            71.1%         88.0%*        85.4%         80.3%         84.5%
Subset(3)                         78.6%            61.3%         81.9%         87.2%*        80.9%         83.8%
Conjunctive [Subset(4)]           78.7%            60.3%         80.7%         87.1%         89.0%*        89.2%*
Disjunctive [DOC(1), Subset(1)]   84.4%            85.6%         86.4%         86.1%         83.7%         90.8%*
DOC(2)                            77.6%            70.6%         76.1%         78.6%         78.8%         87.0%*
DOC(3)                            76.3%            51.0%         65.4%         76.4%         77.8%         83.3%*
DOC(4)                            74.8%            53.7%         65.8%         75.0%         76.9%         82.9%*

*Best predictive hit rate, or not significantly different than the best at the 0.05 level, for that decision rule (row).
Table 1 has a distinctly diagonal flavor. Predictions are usually best whenever estimation
is matched to the decision rule by which synthetic respondents make consideration decisions.
There is redundancy for conjunctive and disjunctive rules. For conjunctive rules, the predictive
abilities of DOCMP and HB Subset(4) are not statistically different. However, for disjunctive
rules DOCMP does better than HB Subset(1). Table 1 suggests that an estimation method pre-
dicts well on average when it matches the true decision rule. This is a necessary, but not suffi-
cient, condition for using this set of estimation methods to attempt to infer whether DOC rules
are a reasonable description of empirical consideration decisions.
9. Empirical Application – Global Positioning Systems (GPSs)

Using the sixteen features in Figure 1 we generated an orthogonal design of 32 GPS pro-
files.8 We then developed four alternative formats by which to measure consideration. These
respondent task formats were developed based on qualitative pretests to approximate the shop-
ping environment for GPSs. Each respondent task format was implemented in a web-based sur-
vey and pretested extensively with over 55 potential respondents from the target market. At the
end of the pretests respondents found the tasks easy to understand and felt that the task formats
were reasonable representations of the GPS market.
We invited two sets of respondents to complete the web-based tasks: a representative
sample of German consumers who were familiar with GPSs and a US-based student sample. In
this section we describe results from our primary format using the German sample of representa-
tive consumers. We defer to Section 10 discussion of the student sample, the other formats, and
a text-only version.
Figure 2 provides screen-shots in English and German for the basic format. A “bullpen”
is on the far left. As respondents move their cursor over a generic image in the bullpen, a GPS
appears in the middle panel. If respondents click on the generic image, they can evaluate the
GPS in the middle panel deciding whether or not to consider it. If they decide to consider the
GPS, its image appears in the right panel. Respondents can toggle between current consideration
sets and their current not-consider sets. There are many ways in which they can change their
mind, for example, putting a GPS back or moving it from the consideration set to the not-consider set, or vice versa. In this format respondents continue until all GPSs are evaluated.

8 To make the task realistic and to avoid dominated profiles (Johnson, Meyer and Ghose 1989), price was manipulated as a two-level price increment. Profile prices were based on this increment plus additive feature-based costs. We return to the issue of orthogonal designs at the end of this section.
Figure 2
Consideration Task in One of the Formats (English and German)
Because decision rules are often context dependent (Payne, Bettman and Johnson 1988,
1993), it is possible that forcing respondents to evaluate every profile would influence decision
rules. Thus, we tested two formats that do not require respondents to evaluate all GPSs. We also
tested a format in which respondents saw profiles randomly. See Section 10.
Before respondents made consideration decisions, they reviewed screens that described
GPSs in general and each of the GPS features. They also viewed instruction screens for the consideration task and instructions that encouraged incentive compatibility. Following the consideration task respondents ranked profiles within the consideration set (data not used in this paper)
and then completed tasks designed to cleanse memory. These tasks included short brain-teaser
questions that direct respondents’ attention away from GPSs. Following the memory-cleansing
tasks, respondents completed the consideration task a second time, but for a different orthogonal
set of GPSs. These second consideration decisions are validation data and are not used in the es-
timation of any rules.
Respondents were drawn from a web-based panel of consumers maintained by the GfK
Group. Initial screening eliminated respondents who had no interest in buying a GPS and no ex-
perience using a GPS. Those respondents who completed the questionnaire received an incen-
tive of 200 points toward general prizes (Punkte) and were entered in a lottery in which they
could win one of the GPSs (plus cash) that they considered. This lottery was designed to be in-
centive compatible as in Ding (2007) and Ding, Grewal, and Liechty (2005). (Respondents who
completed only the screening questionnaire received 15 Punkte.)
In total 2,320 panelists were invited to answer the screening questions. The incidence
rate (percent eligible) was 64%, the response rate was 47%, and the completion rate was 93%.
Respondents were assigned randomly to one of the five task formats (the basic format in Figure
2, three alternative formats, and a text-only format). After eliminating respondents who had null
consideration sets or null not-consider sets in the estimation task, we retained 580 respondents.
The average size of the consideration set (estimation data) for the task format in Figure 2 was 7.8
profiles. There was considerable variation among respondents (standard deviation was 4.8 pro-
files). The average size of the consideration set in the validation task was smaller, 7.2 profiles,
but not significantly different. Validation consideration set sizes had an equally large standard
deviation (4.8 profiles).
Predictive Tests

Initially, we estimate HB Compensatory, HB Subset(S) models for S = 1 to 4, and
DOCMP. Table 2 summarizes the ability of each estimation method (calibrated on the estima-
tion task) to predict consideration for the validation task. DOC-based estimation is significantly
better than all benchmark estimation methods on the ability to predict consideration (hit rate).
The next best method is HB Compensatory estimation.
Interestingly, if we were to examine hit rate alone on this data set, and limit ourselves to
estimation methods other than DOCMP, we might conclude erroneously that a compensatory
rule has the best hit rate. Given the robustness of the linear model for empirical data (e.g.,
Dawes 1979; Dawes and Corrigan 1974), this is not surprising. Including DOC-based estimation
gives, potentially, a different interpretation: cognitively-simple non-compensatory rules, DOC,
have the best hit rate.9
Table 2
Empirical Comparison of Estimation Methods (Representative German Sample, Task Format in Figure 2)

Estimation method                      Overall hit rate†  Relative hit-rate improvement  K-L divergence percentage
HB Compensatory                        78.5%              34.4%                          15.0%
HB Subset(1) [Disjunctive]             66.7%              -1.7%                          17.8%
HB Subset(2)                           69.1%              5.8%                           21.6%
HB Subset(3)                           74.8%              23.0%                          24.9%
HB Subset(4)                           75.4%              24.0%                          24.7%
DOCMP [Disjunctions of conjunctions]   81.9%*             44.8%*                         32.0%*

† Number of profiles predicted correctly, divided by 32. * Best at the 0.05 level.
Hit rate was sufficient for the synthetic-data experiments because we sought relative pre-
dictive ability. However, hit rates alone are difficult to interpret for empirical consideration de-
cisions. For example, if a respondent considers 8 profiles (of 32) in the validation sample, then
naively predicting the respondent considers nothing will predict all not-considered profiles cor-
rectly and give a hit rate of 75%. If 8 profiles were considered in the validation sample, a ran-
dom prediction that 25% of the profiles are considered gives an average hit rate of 62.5%
((0.25)² + (0.75)² = 0.625). To account for this phenomenon, Srinivasan (1988), Srinivasan and Park
(1997) and, in a related situation, Payne, Bettman and Johnson (1993, p. 128) use a relative
measure: (observed hit rate – random hit rate)/(100% - random hit rate). This relative measure is
given in Table 2 where, in our case, “random” is the expected hit rate obtained on the validation
sample if we randomly predict consideration in proportion to that observed on the estimation
sample.
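The two quantities in this computation can be sketched directly (function names are ours):

```python
def random_hit_rate(share_considered):
    """Expected hit rate when predicting 'consider' at random in the same
    proportion as observed on the estimation sample. For example, 8 of 32
    profiles considered gives 0.25**2 + 0.75**2 = 0.625."""
    return share_considered**2 + (1.0 - share_considered)**2

def relative_hit_rate(observed, random_rate):
    """Srinivasan-style relative measure:
    (observed hit rate - random hit rate) / (100% - random hit rate)."""
    return (observed - random_rate) / (1.0 - random_rate)
```

A model that merely matches the random benchmark scores 0 on the relative measure, and perfect prediction scores 1, which is why the relative measure is easier to interpret than the raw hit rate.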
9 This is a paramorphic statement. We are hesitant to conclude that the model with the best hit rate is the best isomorphic description of respondents' decision rules. The simulation results are necessary, but not sufficient, to conclude that DOC is the true decision model.

Another issue with hit rate is that it places equal weight on false positives and false negatives. If we knew the managerial situation we could weigh these two types of error differently in
Equation 1. Absent managerial weights we turn to information theory for a natural way to com-
bine false positives and false negatives. This measure is the Kullback-Leibler (K-L) divergence
(Chaloner and Verdinelli 1995; Kullback and Leibler 1951; Lindley 1956). It measures the ex-
pected gain in Shannon’s information relative to a random model. Because the null model de-
pends upon the number of profiles considered, we normalize by comparing the K-L divergence
for a model to the K-L divergence for perfect prediction (e.g., as in Hauser 1978). Appendix 5
provides formulae for the K-L divergence percentage. This measure complements relative hit
rate because none of the estimation methods is designed to maximize K-L divergence. The K-L
divergences, reported in Table 2, are consistent with those from overall and relative hit rates;
cognitively-simple DOC-based estimation is significantly better than the other rule/estimation
methods.
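The paper's exact K-L divergence percentage formulae are in its Appendix 5 and are not reproduced here; the sketch below shows only the generic Bernoulli Kullback-Leibler divergence on which such a measure builds (the normalization by perfect prediction is not implemented).

```python
import math

def binary_kl(p, q, eps=1e-12):
    """Generic Kullback-Leibler divergence D(p || q) between two Bernoulli
    distributions with parameters p and q; clamping avoids log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
```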
Empirical Evidence Implies Relatively Simple DOC Rules

Although DOCMP encourages cognitive simplicity, the estimation could choose cognitively complex rules if they were the best rules on the calibration data. Despite the large number
of potential patterns, DOCMP chose relatively simple rules for our data. Only 7.1% of the re-
spondents used more than one pattern and no one used more than two patterns. For most respon-
dents DOCMP appears to predict well because it focuses on a relatively few specific patterns and
is flexible about pattern length (subject to complexity). Subset(S) rules are less specific and less
flexible; they require a disjunction of conjunctions of size S. The pure disjunctive rule might be
too simple; it does not appear to predict well.
To further examine the advantage of simple rules, we re-estimated DOCMP without accounting for cognitive simplicity and market commonalities. The relative hit rate for this model (25.9%) was comparable to the non-DOC benchmarks and significantly worse than the full DOCMP model (p < 0.001).10

10 The hit rate for the reduced model was 75.7% and the K-L divergence was 29.6%. Eliminating either complexity or market commonality, but not both, gives intermediate results.

Statistical-Learning Benchmarks and LOOCV Sensitivity

Bayesian estimation for the compensatory and Subset(S) benchmarks is common in marketing. However, it is possible that the results in Table 2 are due to the use of statistical-learning and/or leave-one-out cross validation (LOOCV) rather than DOC rules, cognitive simplicity, and market commonality. To test this hypothesis, we re-estimated the compensatory and Subset(S)
benchmarks using integer programs formulated to be as similar to DOCMP as feasible
(CompMP and SubsetMP – see Appendix 6). The statistical-learning methods predicted better
than the Bayesian methods for compensatory and less well for Subset(S); DOCMP was better
than both CompMP and SubsetMP on both K-L percentage and hit rate with most comparisons
significant.11 This suggests that the higher performance of DOCMP is due at least partly to the
use of cognitively-simple DOC rules.
On our data DOCMP’s performance is relatively insensitive to γc. For γc =1 to 4.5 the
CV hit rate (used to select γc) varies from 80.8% to 81.6% and validation hit rate varies from
81.9% to 82.4%. This robustness is consistent with Evgeniou, Pontil and Toubia (2007).
In summary, the performance of cognitively-simple DOC rules does not appear to be due
solely to either statistical learning methods or to LOOCV.
Sensitivity to Orthogonal Designs

There has been significant research in marketing on efficient experimental designs for
choice-based conjoint experiments (Arora and Huber 2001; Huber and Zwerina 1996; Kanninen
2002; Toubia and Hauser 2007), but we are unaware of any research on efficient experimental
designs for consideration decisions or for the estimation of cognitively-simple DOC rules. When
decisions are made with respect to the full set of 32 profiles, aspects are uncorrelated up to the
resolution of the design and, if there were no errors, we should be able to identify DOC patterns.
However, when one profile is removed for LOOCV, aspects are no longer uncorrelated and pat-
terns may not be defined uniquely – especially if one of the considered profiles is left out. As a
mild test, we re-estimated the two best-performing rules, DOCMP and HB Comp, with only the
17 of 32 most-popular profiles (#’s 16-17 were tied). DOCMP remained significantly better on
both comparison measures: DOCMP achieved a K-L percentage of 29.4% and a hit rate of 79.3%; HB
Comp achieved a K-L percentage of 15.8% and a hit rate of 76.3%.
Until the issue of optimal DOC-consideration experimental designs is resolved, the per-
formance of DOCMP remains a conservative test of cognitively-simple DOC rules. Improved or
adaptive experimental designs might improve performance.
11 K-L percentage: CompMP 23.0%, p < 0.001; SubsetMP 11.8%, p < 0.001. Hit rate: CompMP 80.6%, p = 0.08; SubsetMP, p < 0.001. Statistical tests compare to DOCMP's K-L percentage of 32.0% and hit rate of 81.9%.
Summary of Empirical Results (Initial Tests)

DOC-based estimation appears to yield simple rules, predict hit rates well, and provide
information about future consideration decisions. Some of this improvement is due to a focus on
DOC rules and some due to accounting for cognitive simplicity and market commonality. These
results appear to be robust with respect to the method by which the benchmarks are estimated.
10. Robustness: Target Population, Task Format, and Profile Representation

Table 2 is promising, but we would like to know whether the predictive ability in Table 2
is an anomaly or whether it is a more robust finding. For example, we would like to examine
hypotheses that the predictive ability is unique to the GfK respondents, to the task format in Fig-
ure 2, or to the way we present profiles.
US Student Sample vs. Representative German Sample

We replicated the GPS measurement with a sample of MBA students at a US university.
Students were invited to an English-language website (e.g., first panel of Figure 2). As incen-
tives, and to maintain incentive-compatibility, they were entered in a lottery with a 1-in-25
chance of winning a laptop bag worth $100 and a 1-in-100 chance of winning a combination of
cash and one of the GPSs that they considered. The response rate for US students was lower,
26%, and consideration-set sizes were, on average, larger. Despite the differences in sample, re-
sponse rate, incentives, and consideration-set size, DOCMP was still the best estimation method
on both the hit-rate and K-L divergence metrics. See Table 3.
Table 3
Replication with a US Student Sample

Estimation method                      Overall hit rate†  Relative hit-rate improvement  K-L divergence percentage
HB Compensatory                        78.9%              39.2%                          19.4%
HB Subset(1) [Disjunctive]             61.2%              -11.6%                         20.6%
HB Subset(2)                           72.9%              22.0%                          27.9%
HB Subset(3)                           72.0%              19.6%                          26.3%
HB Subset(4)                           72.7%              21.5%                          26.6%
DOCMP [Disjunctions of conjunctions]   82.3%*             49.2%*                         36.5%*

† Number of profiles predicted correctly, divided by 32. * Best at the 0.05 level.
Variations in Task Formats

In Figure 2 respondents must evaluate every profile ("evaluate all profiles"). However,
such a restriction may be neither necessary nor descriptive. For example, Ordóñez, Benson and
Beach (1999) argue that consumers screen products by rejecting products that they would not
consider further. Because choice rules are context dependent (e.g., Payne, Bettman and Johnson
1993), the task format could influence the propensity to use a DOC rule.
To examine context sensitivity, we tested alternative task formats. One format asked re-
spondents to indicate only the profiles they would consider (“consider only”); another asked re-
spondents to indicate only the profiles they would reject (“reject only”). The tasks were other-
wise identical to “evaluate all profiles.” We also tested a “no browsing” format in which re-
spondents evaluated profiles sequentially (in a randomized order). Representative screen shots
for these formats are shown in Appendix 7.
Table 4: Comparison of Predictive Ability for Different Task Formats
K-L divergence percentage for each respondent task format (respondents were assigned randomly to formats)

Estimation method                       Evaluate all profiles   Consider only   Reject only   No browsing   Text only
HB Compensatory                                17.8%                 5.7%          14.6%         17.6%        13.9%
HB Subset(1) [Disjunctive]                     15.0%                 9.3%          20.5%         17.8%        11.2%
HB Subset(2)                                   21.6%                13.1%          27.9%         26.0%        18.5%
HB Subset(3)                                   24.9%                15.5%          28.2%         25.9%        20.7%
HB Subset(4)                                   24.7%                15.5%          27.9%         25.7%        21.3%
DOCMP [Disjunctions of conjunctions]           32.0%*               29.4%*         42.1%*        34.1%*       30.5%*

* Best at the 0.05 level.
We first examine predictive ability where, for simplicity, we show only the K-L divergence percentage for the German respondents. See Table 4. DOC-based estimation was significantly better than all benchmarks. It was also significantly better on German-respondent hit rates and on the US student respondents for both hit rates and K-L divergence.12 For ease of comparison, we repeat the K-L divergence percentages for "evaluate all profiles" (Figure 2).

12 Tables are available from the authors. The German sample sizes were 93, 135, 94, 123, and 135, respectively, for the formats in Table 4.
As predicted by the evaluation-cost theory of consideration-set formation, respondents considered fewer profiles when the relative evaluation cost (for consideration) was higher: 4.3 profiles in "consider only," 7.8 in "evaluate all," and 10.6 in "reject only." As predicted by the theory of context dependence, the propensity to use a second DOC pattern varied as well. Second disjunctions were more common when consideration sets were larger: 0% for "consider only," 7.1% for "evaluate all," and 9.8% for "reject only." While our data cannot distinguish whether the differences are due to the size of the consideration set or to differential evaluation effort induced by task variation, these data illustrate how revealed-preference non-compensatory estimation provides a non-intrusive indicator that complements more direct (but intrusive) measures.
Text-Only vs. Visual Representation of the GPS Profiles

The profile representations in Figure 1 were designed by a professional graphic artist and were pretested extensively. Pretests suggested which features should be included in the "JPEGs" and which features should be included as satellite icons. Nonetheless, it is possible that the relative predictive ability of the estimation methods might depend upon the specific visual representations of the profiles. To examine this hypothesis we included a task format that was identical to the task in Figure 2 except that all features were described by text rather than by pictures, icons, and text (see Appendix 7). The results are given in the last column of Table 4. DOC-based estimation is again the best predictive method. Interestingly, there is no significant difference between picture representations and text representations for DOCMP predictions (t = 0.40).
Summary of Robustness Tests

The relative predictive ability of the tested methods appears to be robust with respect to:
• respondent sample (representative German vs. US student),
• format of the respondent task (evaluate all profiles, consider only, reject only, or no browsing),
• presentation of the stimuli (pictures vs. text).
11. Managerial Implications and Category Context

To investigate whether the empirical data for GPSs lead to differential insights, we compare the estimated cognitively-simple DOC rules with the estimated compensatory rules. We compare measures of relative influence in the consumer's decision, the implied value to the firm of feature improvements, and two indicators of co-occurrence.
Comparing Conjunctive Features to Compensatory Partworths

In our data, compensatory rules suggest that price is the most important feature, representing, on average, 14% of the relative importance. DOC rules suggest that price is even more influential in screening: 70% of the respondents include one or more price levels in a conjunction, and 92% of those use price as a rejection mechanism. This result is face valid for a relatively new and unfamiliar category such as GPSs. The next highest screening features are the mini-USB port (36%), an extra-bright display (25%), and a color display (21%), with relative compensatory importances of 10%, 8%, and 11%, respectively.
Value of Feature Improvements

Ofek and Srinivasan (2002, p. 401) propose that the value of a feature be defined as "the incremental price the firm would charge per unit improvement in the product attribute (assumed to be infinitesimal) if it were to hold market share (or sales) constant." In DOC rules, features and price levels are discrete, hence we modify their definition slightly. We compute the incremental improvement in market share if a feature is added for an additional $50 in price. Because this calculation is sensitive to the base product, we select the features of the base product randomly.
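To make the modified calculation concrete, here is a minimal sketch under assumed, hypothetical DOC rules and feature levels. The brands, levels, and patterns below are for illustration only; they are not the estimated rules.

```python
def considers(profile, patterns):
    """A DOC rule: the respondent considers a profile iff it matches at
    least one conjunctive pattern (all of the pattern's aspects hold)."""
    return any(all(profile.get(f) == lvl for f, lvl in pat.items())
               for pat in patterns)

def consideration_share(profile, population):
    """Fraction of respondents whose DOC rule admits the profile."""
    return sum(considers(profile, rules) for rules in population) / len(population)

# Hypothetical population of DOC rules (each respondent: a list of patterns).
population = [
    [{"price": "low"}],                           # screens out anything not low-priced
    [{"brand": "Garmin", "price": "low"}],
    [{"display": "color"}, {"price": "low"}],     # a two-pattern disjunction
]
base     = {"brand": "Garmin", "price": "low",  "display": "b&w"}
improved = {"brand": "Garmin", "price": "high", "display": "color"}  # +feature, +$50
print(consideration_share(base, population))      # 1.0
print(consideration_share(improved, population))  # 1/3: price screeners drop out
```

The second call illustrates the interaction discussed below: respondents who screen on a feature may nonetheless eliminate the improved product on its higher price.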
We illustrate two of the many differences between DOC rules and compensatory rules. (1) According to HB Compensatory, the Magellan brand has higher average partworths. Further, 54% of the respondents have higher (consideration-set) partworths for Magellan compared to Garmin. However, according to DOCMP about 12% of the respondents screen on brand, and 82% of those prefer Garmin. As a result, DOC rules predict that consideration share would increase if we switch to Garmin and raise the price by $50, but compensatory rules predict that consideration share would decrease. (2) The HB compensatory model predicts that "extra bright" is the highest-valued feature improvement, yielding an 11% increase for the $50 price. However, DOC rules predict a much smaller improvement (2%) because many of the respondents who screen on "extra bright" also eliminate on the higher price.
Comparing Correlation with Co-occurrence

Correlation and co-occurrence matrices are somewhat different data summaries.13 To fully appreciate their implications we would need to embed DOC rules or compensatory rules within a product-line optimization. While this is beyond the scope of this paper, we obtain initial insight by comparing these two summaries of co-variation.

13 Correlation among partworths for compensatory rules; relative co-occurrence of features within a pattern for DOC rules. The cutoff of 20.5% is defined by correlation significance at the 0.05 level.
Estimated compensatory rules imply complex covariation with over 77% of the entries
significant. Cognitively-simple DOC rules imply simpler covariation with 21% of the entries
above the comparable cutoff. If a full product-line optimization model were developed, DOC-
based rules might imply that products need fewer features in order to be considered. However,
because DOC rules also tend to be heterogeneous, the co-occurrence matrix might also imply a
broader product line. We leave full investigation to future research.
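The co-occurrence side of this comparison can be sketched directly from estimated DOC rules. The rules below are hypothetical and the normalization is our own reading (the paper's exact cutoff and normalization may differ):

```python
from collections import Counter
from itertools import combinations

def cooccurrence(population, aspects):
    """Share of respondents for whom two aspects appear together inside
    the same conjunctive pattern. Our reading of the co-occurrence
    summary; the paper's exact normalization may differ."""
    counts = Counter()
    for patterns in population:
        pairs = set()
        for pat in patterns:                      # pat: set of aspect labels
            pairs.update(combinations(sorted(pat), 2))
        counts.update(pairs)
    return {pair: counts[pair] / len(population)
            for pair in combinations(sorted(aspects), 2)}

# Hypothetical DOC rules: each respondent is a list of conjunctive patterns.
population = [
    [{"price:low", "brand:Garmin"}],
    [{"price:low"}, {"display:color"}],           # two singleton patterns: no pair
    [{"price:low", "display:color"}],
]
aspects = ["brand:Garmin", "display:color", "price:low"]
print(cooccurrence(population, aspects)[("display:color", "price:low")])  # 1/3
```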
Summary of Managerial Comparisons

We have illustrated a few of the many managerial differences for the GPS market. These differences have face validity, but we are cautious about generalizations. The GPS category was relatively new and unfamiliar to many respondents at the time of our study. We expect that this domain favors cognitively simple rules. For example, Yee et al. (2007) found more lexicographic rules in smart phones, which were new to the market at the time of their study, than they found in personal computers. It is possible that relatively more DOC rules are used to screen GPSs than would be used in a familiar category such as standard cell phones.
12. Alternative Estimation Methods

DOCMP predicts well and is robust, but it is not the only estimation method that can be used to estimate DOC rules while favoring cognitive simplicity and market commonalities. In this section we (1) illustrate a non-LOOCV DOC-based estimation method and (2) suggest how other popular methods can be modified to identify cognitively-simple DOC rules.
Logical Analysis of Data

Logical analysis of data (LAD) attempts to identify minimal sets of features that distinguish "positive" events from "negative" events (Boros et al. 1997; 2000). In its basic form, LAD uses a greedy algorithm to find the fewest patterns necessary to match the set of considered profiles. The union of these patterns is a DOC rule. When we applied LAD to our data we achieved a relative hit rate of 40.3% and a K-L divergence percentage of 32.5%, both of which are significantly better than the non-DOC models in Table 2. Basic LAD is significantly worse than DOCMP on hit rate (p = 0.03) but not on K-L divergence (p = 0.79).
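The greedy covering step can be sketched as follows. This is our heavily simplified illustration of the idea, not the Boros et al. implementation; the profiles and aspects are hypothetical.

```python
from itertools import combinations

def matches(profile, pattern):
    """A conjunctive pattern is a collection of (feature, level) aspects,
    all of which must hold for the profile to match."""
    return all(profile.get(f) == lvl for f, lvl in pattern)

def greedy_lad(considered, rejected, max_len=2):
    """Greedy sketch in the spirit of basic LAD: repeatedly add the
    conjunction covering the most still-uncovered considered profiles
    while matching no rejected profile."""
    aspects = sorted({(f, l) for p in considered for f, l in p.items()})
    candidates = [frozenset(c)
                  for s in range(1, max_len + 1)
                  for c in combinations(aspects, s)
                  if not any(matches(r, c) for r in rejected)]
    uncovered, rule = list(considered), []
    while uncovered and candidates:
        best = max(candidates, key=lambda c: sum(matches(p, c) for p in uncovered))
        if not any(matches(p, best) for p in uncovered):
            break                                  # nothing separates the rest
        rule.append(best)
        uncovered = [p for p in uncovered if not matches(p, best)]
    return rule  # the disjunction of these conjunctions is a DOC rule

considered = [{"brand": "Garmin", "price": "low"},
              {"brand": "Magellan", "price": "low"}]
rejected   = [{"brand": "Garmin", "price": "high"}]
print(greedy_lad(considered, rejected))  # one pattern: price must be "low"
```

Capping the number of patterns and their length, as the LAD-DOC(P, S) modification below does, corresponds to bounding the loop iterations at P and `max_len` at S.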
We next modified LAD to favor cognitive simplicity and to account for market commonalities. To favor cognitive simplicity we limit the number of patterns, P, and the length of the patterns, S. We break ties based on market commonalities. We call this method LAD-DOC(P, S).

We begin with P = 2 and S = 4. For the German data in Table 2, the relative hit rate improves to 45.6%, which is significantly better than basic LAD (p = 0.002). The K-L divergence improves to 34.6%, but is only marginally significantly better (p = 0.07). LAD-DOC(2, 4) was slightly better than DOCMP, but not significantly so (p = 0.74 and 0.12, respectively). We gain similar insight from all task formats and both samples.14 When we expand the comparisons to include LAD-DOC(3, 4) and LAD-DOC(4, 4), we get similar results.

The LAD-DOC results suggest that the performance of cognitively-simple DOC rules is not unique to DOCMP estimation. LAD can be used to identify DOC rules and can be modified to account for cognitive simplicity and market commonalities.
Decision Trees

Decision trees, as proposed by Currim, Meyer and Le (1988) for modeling consumer choice, are compatible with DOC rules for classification data (consider vs. not consider). In the growth phase, decision trees select the aspect that best splits profiles into considered vs. not considered. Subsequent splits are conditioned on prior splits. For example, we might split first on "B&W" vs. "color," then split "B&W" based on screen size and split "color" based on resolution. With enough levels, decision trees fit estimation data perfectly (similar to Result 4), hence researchers either prune the tree with a defined criterion (usually a minimum threshold on increased fit) or grow the tree subject to a stopping criterion on the tree's growth (e.g., Breiman et al. 1984).
Each node in a decision tree is a conjunction, hence the set of all "positive" nodes is a DOC rule. However, because the logical structure is limited to a tree, a decision tree often takes more than S levels to represent a DOC(S) model. For example, suppose we generate errorless data with the DOC(2) rule (a ∧ b) ∨ (c ∧ d). To represent these data, a decision tree would require up to 4 levels and produce either (a ∧ b) ∨ (a ∧ ¬b ∧ c ∧ d) ∨ (¬a ∧ c ∧ d) or equivalent reflections.15 The resulting rule is logically equivalent to (a ∧ b) ∨ (c ∧ d), but more complex in both the number of patterns and the pattern lengths. To impose cognitive simplicity we would have to address these representation and equivalence issues.

14 When we compare all formats and both data sets on both hit rates and K-L divergence, two comparisons were marginally significant (one favoring DOCMP and one favoring LAD-DOC(2, 4)). These differences were not significant when we corrected for the fact that we ran 18 simultaneous t-tests (p > 0.10).
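The equivalence is easy to verify by enumeration. This small check (ours, for illustration) confirms that the tree-induced rule is logically identical to (a ∧ b) ∨ (c ∧ d) while using more and longer patterns:

```python
from itertools import product

# the target DOC(2) rule and the rule a tree-structured split would produce
doc2 = lambda a, b, c, d: (a and b) or (c and d)
tree = lambda a, b, c, d: ((a and b)
                           or (a and not b and c and d)
                           or (not a and c and d))

# exhaustive check over all 16 truth assignments confirms logical equivalence
assert all(doc2(*v) == tree(*v) for v in product([False, True], repeat=4))
print("equivalent, but the tree form has 3 patterns of length up to 4")
```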
As a test, we applied the Currim, Meyer and Le (1988) decision tree to the data in Table 2. We achieved a relative hit rate of 38.5% and a K-L divergence of 28.4%, both excellent, but not as good as those obtained with DOCMP and LAD-DOC estimation.16 While many unresolved theoretical and practical issues remain in order to best incorporate cognitive simplicity and market commonalities into decision trees, we have no reason to doubt that, once these issues are resolved, decision trees can be developed to estimate cognitively-simple DOC rules.
Continuous Models

Conjunctions are analogous to interactions in a multilinear model; DOC decision rules are analogous to a limited set of interactions (Bordley and Kirkwood 2004; Mela and Lehmann 1995). Thus, in principle, we might use continuous estimation to identify DOC decision rules. For example, Mela and Lehmann (1995) use finite-mixture methods to estimate interactions in a two-feature model. In addition, continuous models can be extended to estimate "weight" parameters for the interactions and thresholds on continuous features.
We do not wish to minimize either the practical or theoretical challenges of scaling
continuous models from a few features to many features. For example, without enforcing
cognitive simplicity there are over 130,000 interactions to be estimated for our GPS application.
Cognitive simplicity constrains the number of parameters and, potentially, improves predictive
ability, but would still require over 30,000 interactions to be estimated. Nonetheless, with
sufficient creativity and experimentation researchers might extend either finite-mixture,
Bayesian, simulated-maximum-likelihood, or kernel estimators to find feasible and practical
methods to estimate continuously-specified DOC rules (Evgeniou, Boussios, and Zacharia 2005;
Mela and Lehmann 1995; Rossi and Allenby 2003; Swait and Erdem 2007).
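The interaction analogy can be made concrete: a conjunction corresponds to a product of binary aspect indicators, and a DOC rule to a thresholded sum of such interaction terms. A toy sketch, with illustrative (not estimated) weights and threshold:

```python
def multilinear_consider(x, weights, threshold=0.5):
    """Continuous analogue of a DOC rule: each conjunction becomes a
    product of binary aspect indicators (an interaction term), and the
    profile is 'considered' if the weighted sum clears a threshold.
    Weights and threshold here are illustrative, not estimated."""
    score = sum(w for pattern, w in weights.items() if all(x[i] for i in pattern))
    return score >= threshold

# interaction terms x0*x1 and x2*x3, i.e., the DOC rule (a ∧ b) ∨ (c ∧ d)
weights = {(0, 1): 1.0, (2, 3): 1.0}
print(multilinear_consider([1, 0, 1, 1], weights))   # True: the x2*x3 term fires
```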
In summary, we posit that the predictive ability in Tables 2, 3, and 4 is due to cognitively-simple DOC rules combined with market commonalities, rather than to the particular estimation method used to identify such rules. At least one other estimation method modified to account for cognitive simplicity and market commonalities, LAD-DOC, does well on our data. Furthermore, statistical-learning algorithms that estimate compensatory and Subset(S) rules do not predict as well as comparable algorithms that estimate DOC rules. We are optimistic that the phenomena we explore in this paper are relevant to many estimation methods.

15 Depending on the incidence of profiles, the decision tree might also produce (c ∧ d) ∨ (c ∧ ¬d ∧ a ∧ b) ∨ (¬c ∧ a ∧ b), which is also logically equivalent to (a ∧ b) ∨ (c ∧ d). Other logically equivalent patterns are also feasible.

16 LAD-DOC (p = 0.002) and DOCMP (p = 0.01) are significantly better on relative hit rate. LAD-DOC (p = 0.002) is significantly better and DOCMP is better (p = 0.06) on information percentage.
13. Summary and Future Directions

Consumers often make decisions with a two-stage process in which they first form a consideration set and then choose a product from that set. Two-stage processes are managerially relevant: a product cannot be purchased if it is not considered, and a product that is often considered has a better chance of being purchased. Evidence from a variety of perspectives suggests that consideration decisions are based on cognitively-simple decision rules, especially when there are many features or many product alternatives to be evaluated.

In this paper we explore a generalization of existing non-compensatory decision rules, disjunctions of conjunctions (DOC), and their relationship to cognitive simplicity and market commonalities. While we illustrate estimation with DOCMP and LAD-DOC, we posit that the basic concepts can be implemented with many extant estimation methods.
We test cognitively-simple DOC models with synthetic and empirical data. The simulation experiments suggest that predictive ability is maximized when the estimation method matches the decision rule used to generate the data. The empirical data suggest that cognitively-simple DOC-based rules have better predictive ability than the benchmark rules. This result is robust across sample, respondent task format, profile presentation, and estimation method. While good predictive ability does not guarantee that consumers actually use a DOC decision rule, the predictive ability is encouraging and suggests future research with different product categories, different samples, different task formats, and, perhaps, other forms of cognitive simplicity.
The identified DOC rules are simple. Our field experiments suggest that one or two patterns per consumer are sufficient. Further, DOCMP and LAD-DOC perform better when we enforce cognitive simplicity and market commonalities, as is consistent with the existing experimental literature (e.g., Gigerenzer and Goldstein 1996; Payne, Bettman and Johnson 1988, 1993).
We did not address explicitly the choice stage of a consider-then-choose process. Such research is promising and complementary. For example, Gaskin et al. (2007) combine greedoid methods for consideration with adaptive polyhedral conjoint methods to estimate two-stage choice models. The two-stage models outperform one-stage compensatory methods. There is also a rich history in marketing of two-stage models in which consideration is a latent, unobserved construct (e.g., Andrews and Srinivasan 1995; Gensch 1987; Gilbride and Allenby 2004; Siddarth, Bucklin, and Morrison 1995; Swait and Erdem 2007). We believe that DOC rules combined with cognitive simplicity could complement these lines of research.
Finally, our empirical test represents a single category, GPSs, and decisions among product profiles described by features with finitely many levels. There were a large number of features, and products in this category were relatively new to our respondents. Both characteristics are likely to favor cognitively-simple decision rules. We posit that cognitively-simple DOC rules are relevant in many, but not all, categories, retail environments, and contexts.
References
Allenby, Greg M., Neeraj Arora, and James L. Ginter (1995), "Incorporating Prior Knowledge into the Analysis of Conjoint Studies," Journal of Marketing Research, 32, (May), 152-162.

Andrews, Rick L. and T. C. Srinivasan (1995), "Studying Consideration Effects in Empirical Choice Models Using Scanner Panel Data," Journal of Marketing Research, 32, (February), 30-41.

Arora, Neeraj and Joel Huber (2001), "Improving Parameter Estimates and Model Prediction by Aggregate Customization in Choice Experiments," Journal of Consumer Research, 28, (September), 273-283.

Anonymous (2008), "Qualitative Evidence of Non-compensatory Processes in the Consideration of New Automobile Models."

Bettman, James R., Mary Frances Luce, and John W. Payne (1998), "Constructive Consumer Choice Processes," Journal of Consumer Research, 25(3), 187-217.

Bordley, Robert F. and Craig W. Kirkwood (2004), "Multiattribute Preference Analysis with Performance Targets," Operations Research, 52, 6, (November-December), 823-835.

Boros, Endre, Peter L. Hammer, Toshihide Ibaraki, and Alexander Kogan (1997), "Logical Analysis of Numerical Data," Mathematical Programming, 79, (August), 163-190.

------, ------, ------, ------, Eddy Mayoraz, and Ilya Muchnik (2000), "An Implementation of Logical Analysis of Data," IEEE Transactions on Knowledge and Data Engineering, 12(2), 292-306.

Breiman, Leo, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone (1984), Classification and Regression Trees, (Belmont, CA: Wadsworth).

Bröder, Arndt (2000), "Assessing the Empirical Validity of the 'Take the Best' Heuristic as a Model of Human Probabilistic Inference," Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 5, 1332-1346.

Bronnenberg, Bart J. and Wilfried R. Vanhonacker (1996), "Limited Choice Sets, Local Price Response, and Implied Measures of Price Competition," Journal of Marketing Research, 33, (May), 163-173.

Chaloner, Kathryn and Isabella Verdinelli (1995), "Bayesian Experimental Design: A Review," Statistical Science, 10, 3, 273-304.

Chase, Valerie M., Ralph Hertwig, and Gerd Gigerenzer (1998), "Visions of Rationality," Trends in Cognitive Sciences, 2, 6, (June), 206-214.

Cooil, Bruce, Russell S. Winer and David L. Rados (1987), "Cross-Validation for Prediction," Journal of Marketing Research, 24, (August), 271-279.

Cormen, Thomas H., Charles E. Leiserson, Ronald L. Rivest and Clifford Stein (2001), Introduction to Algorithms, 2E, (Cambridge, MA: MIT Press).

Cucker, Felipe and Steve Smale (2002), "On the Mathematical Foundations of Learning," Bulletin of the American Mathematical Society, 39(1), 1-49.
Currim, Imran S., Robert J. Meyer, and Nhan T. Le (1988), "Disaggregate Tree-Structured Modeling of Consumer Choice Data," Journal of Marketing Research, 25, (August), 253-265.

Dawes, Robyn M. (1979), "The Robust Beauty of Improper Linear Models in Decision Making," American Psychologist, 34, 571-582.

------ and Bernard Corrigan (1974), "Linear Models in Decision Making," Psychological Bulletin, 81, 95-106.

DeSarbo, Wayne S., Donald R. Lehmann, Gregory Carpenter, and Indrajit Sinha (1996), "A Stochastic Multidimensional Unfolding Approach for Representing Phased Decision Outcomes," Psychometrika, 61(3), 485-508.

Ding, Min (2007), "An Incentive-Aligned Mechanism for Conjoint Analysis," Journal of Marketing Research, 44, (May), 214-223.

------, Rajdeep Grewal, and John Liechty (2005), "Incentive-Aligned Conjoint Analysis," Journal of Marketing Research, 42, (February), 67-82.

Efron, Bradley and Robert Tibshirani (1997), "Improvements on Cross-Validation: The .632+ Bootstrap Method," Journal of the American Statistical Association, 92, 438, (June), 548-560.

Evgeniou, Theodoros, Constantinos Boussios, and Giorgos Zacharia (2005), "Generalized Robust Conjoint Estimation," Marketing Science, 24(3), 415-429.

------, Massimiliano Pontil, and Olivier Toubia (2007), "A Convex Optimization Approach to Modeling Heterogeneity in Conjoint Estimation," Marketing Science, 26, 6, (November-December), 805-818.

Feige, Uriel (1998), "A Threshold of ln n for Approximating Set Cover," Journal of the Association for Computing Machinery, 45(4), 634-652.

Gaskin, Steven, Theodoros Evgeniou, Daniel Bailiff, and John Hauser (2007), "Two-Stage Models: Identifying Non-Compensatory Heuristics for the Consideration Set then Adaptive Polyhedral Methods Within the Consideration Set," Proceedings of the Sawtooth Software Conference, Santa Rosa, CA, October 17-19.

Gensch, Dennis H. (1987), "A Two-stage Disaggregate Attribute Choice Model," Marketing Science, 6, (Summer), 223-231.

Gigerenzer, Gerd and Daniel G. Goldstein (1996), "Reasoning the Fast and Frugal Way: Models of Bounded Rationality," Psychological Review, 103(4), 650-669.

------ and Reinhard Selten (2001), "Rethinking Rationality," in Gerd Gigerenzer and Reinhard Selten, eds., Bounded Rationality: The Adaptive Toolbox, (Cambridge, MA: MIT Press).

------, Peter M. Todd, and the ABC Research Group (1999), Simple Heuristics That Make Us Smart, (Oxford, UK: Oxford University Press).

Gilbride, Timothy J. and Greg M. Allenby (2004), "A Choice Model with Conjunctive, Disjunctive, and Compensatory Screening Rules," Marketing Science, 23(3), 391-406.

------ and ------ (2006), "Estimating Heterogeneous EBA and Economic Screening Rule Choice Models," Marketing Science, 25, 5, (September-October), 494-509.
Hastie, Trevor, Robert Tibshirani, and Jerome H. Friedman (2003), The Elements of Statistical Learning, (New York, NY: Springer Series in Statistics).

Hauser, John R. (1978), "Testing the Accuracy, Usefulness and Significance of Probabilistic Models: An Information Theoretic Approach," Operations Research, 26, 3, (May-June), 406-421.

------ and Birger Wernerfelt (1990), "An Evaluation Cost Model of Consideration Sets," Journal of Consumer Research, 16, (March), 393-408.

Hogarth, Robin M. and Natalia Karelaia (2005), "Simple Models for Multiattribute Choice with Many Alternatives: When It Does and Does Not Pay to Face Trade-offs with Binary Attributes," Management Science, 51, 12, (December), 1860-1872.

Huber, Joel and Klaus Zwerina (1996), "The Importance of Utility Balance in Efficient Choice Designs," Journal of Marketing Research, 33, (August), 307-317.

Hughes, Marie Adele and Dennis E. Garrett (1990), "Intercoder Reliability Estimation Approaches in Marketing: A Generalizability Theory Framework for Quantitative Data," Journal of Marketing Research, 27, (May), 185-195.

Jedidi, Kamel and Rajeev Kohli (2005), "Probabilistic Subset-Conjunctive Models for Heterogeneous Consumers," Journal of Marketing Research, 42(4), 483-494.

------, ------ and Wayne S. DeSarbo (1996), "Consideration Sets in Conjoint Analysis," Journal of Marketing Research, 33, (August), 364-372.

Johnson, Eric J., Robert J. Meyer, and Sanjoy Ghose (1989), "When Choice Models Fail: Compensatory Models in Negatively Correlated Environments," Journal of Marketing Research, 26, (August), 255-290.

Kanninen, Barbara J. (2002), "Optimal Design for Multinomial Choice Experiments," Journal of Marketing Research, 39, (May), 214-227.

Kearns, Michael and Dana Ron (1999), "Algorithmic Stability and Sanity-Check Bounds for Leave-One-Out Cross-Validation," Neural Computation, 11, 1427-1453.

Kohavi, Ron (1995), "A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection," Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, 2, 1137-1143.

Kohli, Rajeev and Kamel Jedidi (2007), "Representation and Inference of Lexicographic Preference Models and Their Variants," Marketing Science, 26(3), 380-399.

Kullback, Solomon and Richard A. Leibler (1951), "On Information and Sufficiency," Annals of Mathematical Statistics, 22, 79-86.

Langley, Pat (1996), Elements of Machine Learning, (San Francisco, CA: Morgan Kaufmann).

Lindley, Dennis V. (1956), "On a Measure of the Information Provided by an Experiment," The Annals of Mathematical Statistics, 27, 4, (December), 986-1005.

Lund, Carsten and Mihalis Yannakakis (1994), "On the Hardness of Approximating Minimization Problems," Journal of the Association for Computing Machinery, 41(5), 960-981.

Martignon, Laura and Ulrich Hoffrage (2002), "Fast, Frugal, and Fit: Simple Heuristics for Paired Comparisons," Theory and Decision, 52, 29-71.
Mehta, Nitin, Surendra Rajiv, and Kannan Srinivasan (2003), "Price Uncertainty and Consumer Search: A Structural Model of Consideration Set Formation," Marketing Science, 22(1), 58-84.

Mela, Carl F. and Donald R. Lehmann (1995), "Using Fuzzy Set Theoretic Techniques to Identify Preference Rules From Interactions in the Linear Model: An Empirical Study," Fuzzy Sets and Systems, 71, 165-181.

Montgomery, H. and O. Svenson (1976), "On Decision Rules and Information Processing Strategies for Choices among Multiattribute Alternatives," Scandinavian Journal of Psychology, 17, 283-291.

Ofek, Elie and V. Srinivasan (2002), "How Much Does the Market Value an Improvement in a Product Attribute?," Marketing Science, 21, 4, (Fall), 398-411.

Ordóñez, Lisa D., Lehmann Benson III, and Lee Roy Beach (1999), "Testing the Compatibility Test: How Instructions, Accountability, and Anticipated Regret Affect Prechoice Screening of Options," Organizational Behavior and Human Decision Processes, 78, 1, (April), 63-80.

Payne, John W. (1976), "Task Complexity and Contingent Processing in Decision Making: An Information Search," Organizational Behavior and Human Performance, 16, 366-387.

------, James R. Bettman and Eric J. Johnson (1988), "Adaptive Strategy Selection in Decision Making," Journal of Experimental Psychology: Learning, Memory, and Cognition, 14(3), 534-552.

------, ------ and ------ (1993), The Adaptive Decision Maker, (Cambridge, UK: Cambridge University Press).

Perreault, William D., Jr. and Laurence E. Leigh (1989), "Reliability of Nominal Data Based on Qualitative Judgments," Journal of Marketing Research, 26, (May), 135-148.

Rhoads, Bryan, Glen L. Urban, and Fareena Sultan (2004), "Building Customer Trust Through Adaptive Site Design," MSI Conference, Yale University, New Haven, CT, December 11.

Roberts, John H. and James M. Lattin (1991), "Development and Testing of a Model of Consideration Set Composition," Journal of Marketing Research, 28, (November), 429-440.

Rossi, Peter E. and Greg M. Allenby (2003), "Bayesian Statistics and Marketing," Marketing Science, 22(3), 304-328.

Shao, Jun (1993), "Linear Model Selection by Cross-Validation," Journal of the American Statistical Association, 88, 422, (June), 486-494.

Shocker, Allan D., Moshe Ben-Akiva, Bruno Boccara, and Prakash Nedungadi (1991), "Consideration Set Influences on Consumer Decision-Making and Choice: Issues, Models, and Suggestions," Marketing Letters, 2(3), 181-197.

Shugan, Steven (1980), "The Cost of Thinking," Journal of Consumer Research, 7(2), 99-111.

Siddarth, S., Randolph E. Bucklin, and Donald G. Morrison (1995), "Making the Cut: Modeling and Analyzing Choice Set Restriction in Scanner Panel Data," Journal of Marketing Research, 32, (August), 255-266.

Simon, Herbert A. (1955), "A Behavioral Model of Rational Choice," The Quarterly Journal of Economics, 69(1), 99-118.
Srinivasan, V. (1988), "A Conjunctive-Compensatory Approach to the Self-Explication of Multiattributed Preferences," Decision Sciences, 295-305.

------ and Chan Su Park (1997), "Surprising Robustness of the Self-Explicated Approach to Customer Preference Structure Measurement," Journal of Marketing Research, 34, (May), 286-291.

Swait, Joffre and Tülin Erdem (2007), "Brand Effects on Choice and Choice Set Formation Under Uncertainty," Marketing Science, 26, 5, (September-October), 679-697.

Toubia, Olivier, Theodoros Evgeniou, and John Hauser (2007), "Optimization-Based and Machine-Learning Methods for Conjoint Analysis: Estimation and Question Design," in Anders Gustafsson, Andreas Herrmann and Frank Huber, eds., Conjoint Measurement: Methods and Applications, 4E, (New York, NY: Springer).

------ and John R. Hauser (2007), "On Managerially Efficient Experimental Designs," Marketing Science, 26, 6, (November-December), 851-858.

Tversky, Amos (1972), "Elimination by Aspects: A Theory of Choice," Psychological Review, 79(4), 281-299.

Urban, Glen L. and John R. Hauser (2004), "'Listening In' to Find and Explore New Combinations of Customer Needs," Journal of Marketing, 68, (April), 72-87.

Vapnik, Vladimir (1998), Statistical Learning Theory, (New York, NY: John Wiley and Sons).

Wu, Jianan and Arvind Rangaswamy (2003), "A Fuzzy Set Model of Search and Consideration with an Application to an Online Market," Marketing Science, 22(3), 411-434.

Yee, Michael, Ely Dahan, John R. Hauser and James Orlin (2007), "Greedoid-Based Noncompensatory Inference," Marketing Science, 26, 4, (July-August), 532-549.

Zhang, Tong (2003), "Leave One Out Bounds for Kernel Methods," Neural Computation, 15, 1397-1437.
Cognitive Simplicity and Consideration Sets, Appendices
Appendix 1: Summary of Notation and Acronyms

a_lhf    binary indicator of whether level l of feature f is acceptable to respondent h (disjunctive, conjunctive, or subset conjunctive models; use varies by model)
a_h      binary vector of acceptabilities for respondent h
b1, b2   parameters of the HB subset conjunctive model: respectively, the probability that a profile is considered if x_j'a_h ≥ S and the probability that it is not considered if x_j'a_h < S
e        a vector of 1's of length equal to the number of potential patterns
D        covariance matrix used in HB compensatory estimation
f        indexes features; F is the total number of features
h        indexes respondents (mnemonic: households); H is the total number of respondents
I        the identity matrix of size equal to the total number of aspects
j        indexes profiles; J is the total number of profiles
l        indexes levels within features; L is the total number of levels
m_jp     binary indicator of whether profile j matches pattern p
m_j      binary vector describing profile j by the patterns it matches
M_j      percent of respondents in the sample ("market") that consider profile j
p        indexes patterns; also used for significance level in t-tests when clear in context
P        maximum number of patterns [LAD-DOC(P, S) estimation]
Q        number of partworths (compensatory model)
s        size of a pattern (number of aspects in a conjunction)
S        maximum subset size [Subset(S) model] or maximum number of aspects in a conjunctive pattern [DOC(S) model, LAD-DOC(P, S) estimation]
T_h      threshold for respondent h in the compensatory model
w_hp     binary indicator of whether respondent h considers profiles with pattern p
w_h      binary vector indicating the patterns used by respondent h
x_ljf    binary indicator of whether profile j has feature f at level l
x_j      binary vector describing profile j
y_hj     binary indicator of whether respondent h considers profile j
y_h      binary vector describing respondent h's consideration decisions
β_h      vector of partworths (compensatory model) for respondent h
ε_hj     extreme-value error in the compensatory model
γ_c, γ_M parameters penalizing, respectively, complexity and deviation from the "market"
ξ⁺_hj    non-negative integer; a model predicts consideration if ξ⁺_hj ≥ 1
ξ⁻_hj    non-negative integer; a model predicts non-consideration if ξ⁻_hj ≥ 1

DOC(S)         set of disjunctions-of-conjunctions models; S, when indicated, is the maximum size of the patterns
DOCMP          combinatorial-optimization estimation for DOC models (see Equation 3)
LAD-DOC(P, S)  alternative estimation method for DOC models in which we limit both the number of patterns, P, and the size of the patterns, S
Subset(S)      set of subset conjunctive models with maximum subset size of S
Appendix 2: Proofs of the Results in the Text

Result 1. The following sets of rules are equivalent: (a) disjunctive rules, (b) Subset(1) rules, and (c) DOC(1) rules.

Proof. A disjunctive rule requires $\vec{x}_j'\vec{a}_h \ge 1$; a Subset($S$) rule requires $\vec{x}_j'\vec{a}_h \ge S$; a DOC($S$) rule requires $\vec{m}_j'\vec{w}_h \ge 1$. Clearly the first two rules are equivalent with $S = 1$. For DOC(1), recognize that all patterns are single aspects, hence $\vec{m}_j$ and $\vec{w}_h$ correspond one-to-one with aspects: $\vec{m}_j$ can be recoded to match $\vec{x}_j$ and $\vec{w}_h$ can be recoded to match $\vec{a}_h$.
Result 2. Conjunctive rules are equivalent to Subset($F$) rules which, in turn, are a subset of the DOC($F$) rules, where $F$ is the number of features.

Proof. A conjunctive rule requires $\vec{x}_j'\vec{a}_h = F$. Because $\vec{x}_j$ contains exactly $F$ non-zero aspects, $\vec{x}_j'\vec{a}_h \ge F$ if and only if $\vec{x}_j'\vec{a}_h = F$; setting $S = F$ therefore establishes the first statement. The second statement follows directly from Result 3 with $S = F$.
Result 3. A Subset($S$) rule can be written as a DOC($S$) rule, but not all DOC($S$) rules can be written as a Subset($S$) rule.

Proof. $\vec{x}_j'\vec{a}_h \ge S$ holds if any $S$ aspects of profile $j$ are acceptable. Therefore $\vec{x}_j$ must match at least one pattern of length $S$ formed from acceptable aspects. Let $\Sigma_S$ be the set of such patterns; then $\vec{x}_j$ matches at least one element of $\Sigma_S$. Consider the DOC($S$) rule defined by $w_{hp} = 1$ for every pattern $p$ in $\Sigma_S$. The inequality $\vec{x}_j'\vec{a}_h \ge S$ holds if and only if $\vec{m}_j'\vec{w}_h \ge 1$, establishing that a Subset($S$) rule can be written as a DOC($S$) rule. By definition, a DOC($S$) rule also includes patterns of size less than $S$; hence $\vec{x}_j'\vec{a}_h < S$ for some profiles accepted by some DOC($S$) rules. This establishes the second statement.
Result 4. Any set of considered profiles can be fit perfectly with at least one DOC rule. Moreover, the DOC rule need not be unique.

Proof. For each considered profile, create a pattern of size $F$ that matches that profile. This pattern will not match any other profile because $F$ aspects establish a profile uniquely. Create $\vec{w}_h$ such that $w_{hp} = 1$ for all such patterns and $w_{hp} = 0$ otherwise. Then $\vec{m}_j'\vec{w}_h = 1$ if profile $j$ is considered and $\vec{m}_j'\vec{w}_h = 0$ otherwise. The second half of the proof is established by the examples in the text, which establish the existence of non-unique DOC rules.
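The construction in the proof of Result 4 is mechanical enough to sketch in code. The following is our own illustration (not the authors' Matlab implementation; the toy universe, profile counts, and function names are ours): each considered profile contributes one pattern containing all $F$ of its aspects, and the resulting disjunction of conjunctions reproduces the consideration set exactly.

```python
# Sketch of the Result 4 construction: any consideration set can be fit
# perfectly by a DOC rule whose patterns are the full aspect descriptions
# of the considered profiles. Illustrative only; all names are ours.
import itertools

def fit_doc_rule(considered_profiles):
    """Each pattern is the complete tuple of feature levels of one
    considered profile, i.e., a conjunction of all F aspects."""
    return set(considered_profiles)

def doc_considers(patterns, profile):
    """A profile is considered if it matches at least one pattern
    (a disjunction of conjunctions)."""
    return profile in patterns  # a full-size pattern matches one profile

# A toy universe: 3 features with 2 levels each -> 8 profiles.
universe = list(itertools.product([0, 1], repeat=3))
considered = [(0, 0, 1), (1, 1, 0), (1, 1, 1)]

patterns = fit_doc_rule(considered)
fit = [doc_considers(patterns, p) for p in universe]
truth = [p in considered for p in universe]
assert fit == truth  # perfect fit, as Result 4 guarantees
```

Because each size-$F$ pattern matches exactly one profile, the rule is trivially consistent with the data; the examples in the text show that smaller, cognitively simpler DOC rules can fit the same data, which is why the rule need not be unique.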
Appendix 3: HB Estimation of the Subset Conjunctive Model

All posterior distributions are known, hence we use Markov chain Monte Carlo (MCMC) with Gibbs sampling. Recall that $S$ is fixed.

$\Pr(a_{hfl} \mid \vec{y}_h, \text{other } a\text{'s}, \theta_{fl}\text{'s}, S, b_1, b_2)$. We follow Gilbride and Allenby (2004, p. 404) and use a “Griddy Gibbs” algorithm. For each $h$ we update the acceptabilities, $a_{hfl}$, aspect by aspect. For each candidate set of acceptabilities we compute the likelihood as if we kept all other acceptabilities constant, replacing only the candidate $a^c_{hfl}$. The likelihood is based on Equation 5 and the prior on the $\theta_{fl}$’s. The probability of drawing $a^c_{hfl}$ is then proportional to the likelihood times the prior, normalized over the set of possible candidates.

$\Pr(\theta_{fl} \mid \vec{y}_h, a_{hfl}\text{'s}, S, b_1, b_2)$. The $\theta_{fl}$’s are drawn successively, hence we require the marginal of the Dirichlet distribution, which is the beta distribution. Because the beta distribution is conjugate to the binomial likelihood, we draw $\theta_{fl}$ from $\text{Beta}[6 + \sum_h a_{hfl},\ 6 + \sum_h (1 - a_{hfl})]$.

$\Pr(b_1, b_2 \mid \vec{y}_h, a_{hfl}\text{'s}, S)$. Because the beta distribution is conjugate to the binomial likelihood, we draw $b_1$ from $\text{Beta}[1 + \sum_{h,j} y_{hj}\,\delta(\vec{x}_j'\vec{a}_h \ge S),\ 1 + \sum_{h,j} (1 - y_{hj})\,\delta(\vec{x}_j'\vec{a}_h \ge S)]$ and we draw $b_2$ from $\text{Beta}[1 + \sum_{h,j} y_{hj}\,\delta(\vec{x}_j'\vec{a}_h < S),\ 1 + \sum_{h,j} (1 - y_{hj})\,\delta(\vec{x}_j'\vec{a}_h < S)]$, where $\delta(\cdot)$ is the indicator function.
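Under the conjugacy noted above, the $b_1$ and $b_2$ draws reduce to counting considerations on either side of the screening threshold. The sketch below is our own illustration with simulated inputs; the dimensions and variable names are our assumptions, not the paper's code.

```python
# Sketch of the conjugate Beta draws for b1 and b2. We count, across
# respondents h and profiles j, how often profiles that pass the screen
# (x_j' a_h >= S) are considered (y = 1), and likewise for profiles that
# fail the screen. Illustrative only; inputs are simulated.
import numpy as np

rng = np.random.default_rng(0)
H, J, A, S = 50, 32, 10, 2                  # assumed problem dimensions
x = rng.integers(0, 2, size=(J, A))         # profile aspect indicators
a = rng.integers(0, 2, size=(H, A))         # acceptability indicators
passes = (x @ a.T >= S).T                   # H x J screen indicator
y = rng.integers(0, 2, size=(H, J))         # consideration indicators

# Beta[1 + successes, 1 + failures] on each side of the screen:
b1 = rng.beta(1 + (y & passes).sum(), 1 + ((1 - y) & passes).sum())
b2 = rng.beta(1 + (y & ~passes).sum(), 1 + ((1 - y) & ~passes).sum())
assert 0.0 < b1 < 1.0 and 0.0 < b2 < 1.0
```

With real data, `y` would be the observed consideration matrix and `a` the current draws of the acceptabilities; the two `rng.beta` calls are one Gibbs update of $(b_1, b_2)$.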
Appendix 4: Generation of Synthetic Data for Simulation Experiments

Compensatory model: We drew partworths from a normal distribution that was zero-mean except for the intercept. The covariance matrix was $I/2$. We adjusted the value of the intercept (to 1.5) such that respondents considered, on average, approximately 8 profiles. Profiles were identified as considered with Bernoulli sampling from logit probabilities.
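The compensatory generator just described can be sketched as follows. This is our own illustration: the intercept of 1.5 and the $I/2$ covariance follow the description above, but the numbers of respondents, profiles, and partworths are arbitrary choices, not the paper's settings.

```python
# Sketch of the compensatory synthetic-respondent generator: partworths
# are multivariate normal with mean zero except an intercept of 1.5 and
# covariance I/2; consideration is a Bernoulli draw from a logit
# probability. Illustrative only; dimensions are assumed.
import numpy as np

rng = np.random.default_rng(1)
H, J, Q = 100, 32, 9                    # respondents, profiles, partworths

mean = np.zeros(Q)
mean[0] = 1.5                           # intercept, as in the text
beta = rng.multivariate_normal(mean, 0.5 * np.eye(Q), size=H)  # H x Q

# Profiles: an intercept column plus binary feature indicators.
x = np.hstack([np.ones((J, 1)), rng.integers(0, 2, size=(J, Q - 1))])
utility = beta @ x.T                    # H x J deterministic utilities
prob = 1.0 / (1.0 + np.exp(-utility))   # logit probabilities
y = rng.binomial(1, prob)               # Bernoulli consideration draws
assert y.shape == (H, J)
```

The subset conjunctive and DOC generators described next differ only in how the screening rule, rather than a logit probability, maps profiles to consideration.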
Subset conjunctive model: We drew each acceptability parameter from a binomial distribution with the same parameters for all features and levels. We adjusted the binomial probabilities such that respondents considered, on average, approximately 8 profiles. This gave us 0.06, 0.23, 0.43, and 0.69 for $S$ = 1 to 4. We set $b_1$ = 0.95 and $b_2$ = 0.05.
Disjunctions of conjunctions model: We drew binary pattern weights from a Dirichlet distribution, adjusting the marginal binomial probabilities such that respondents considered, on average, approximately 8 profiles. This gave us 0.025, 0.018, and 0.017 for $S$ = 2 to 4. We simulated consideration decisions such that the probability of considering a profile with a matching pattern is 0.95 and the probability of considering a profile without a matching pattern is 0.05.
Appendix 5: Kullback-Leibler Divergence for Consideration Data

To describe this statistic, we introduce additional notation. Let $q_j$ be the null probability that profile $j$ is considered and let $r_j$ be the probability that profile $j$ is considered based on the model and the observations. The K-L divergence for respondent $h$ is $\sum_j \{\, r_j \ln[r_j/q_j] + (1 - r_j)\ln[(1 - r_j)/(1 - q_j)] \,\}$.

To use the K-L divergence for discrete predictions we let $z_{hj}$ and $\hat{z}_{hj}$ be the indicator variables for validation consideration; that is, $z_{hj} = 1$ if respondent $h$ considers profile $j$ and $\hat{z}_{hj} = 1$ if respondent $h$ is predicted to consider profile $j$. They are zero otherwise. Let $C_e = \sum_j y_{hj}$ be the number of profiles considered in the estimation task. Let $C_v = \sum_j z_{hj}$ and $\hat{C}_v = \sum_j \hat{z}_{hj}$ be the corresponding observed and predicted numbers for the validation task. Let $F_n = \sum_j z_{hj}(1 - \hat{z}_{hj})$ be the number of false negatives (observed as considered but predicted as not considered) and let $F_p = \sum_j (1 - z_{hj})\hat{z}_{hj}$ be the number of false positives (observed as not considered but predicted as considered). ($F_n$ and $F_p$ are not to be confused with $F$, the number of features as used in the text.)

Substituting the null probability $q_j = C_e/J$ and the observed frequencies, $r_j = (\hat{C}_v - F_p)/\hat{C}_v$ for profiles predicted to be considered and $r_j = F_n/(J - \hat{C}_v)$ for profiles predicted not to be considered, we obtain the K-L divergence for a model being evaluated. Expanding the summations over $\{j: \hat{z}_{hj} = 1\}$ and $\{j: \hat{z}_{hj} = 0\}$ and simplifying the fractions:

K-L divergence $= (\hat{C}_v - F_p)\ln\!\left[\frac{J(\hat{C}_v - F_p)}{C_e\,\hat{C}_v}\right] + F_p\ln\!\left[\frac{J\,F_p}{(J - C_e)\,\hat{C}_v}\right] + F_n\ln\!\left[\frac{J\,F_n}{C_e\,(J - \hat{C}_v)}\right] + (J - \hat{C}_v - F_n)\ln\!\left[\frac{J(J - \hat{C}_v - F_n)}{(J - C_e)(J - \hat{C}_v)}\right]$

The perfect-prediction benchmark sets $\hat{z}_{hj} = z_{hj}$, hence $F_n = F_p = 0$ and $\hat{C}_v = C_v$. The relative K-L divergence is the K-L divergence for the model versus the null model, divided by the K-L divergence for perfect prediction versus the null model.
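In code, the relative K-L divergence reduces to a handful of counts. The sketch below is our own illustration (with the usual convention $0 \ln 0 = 0$); the example data are invented, and the function names are ours.

```python
# Sketch of the relative K-L divergence for one respondent's discrete
# consideration predictions. The null probability is q = C_e / J; the
# model's probabilities are the observed hit rates among predicted-
# considered and predicted-not-considered profiles. Illustrative only.
import math

def _xlogx_ratio(k, n, q):
    # contributes k * ln((k/n) / q), with the convention 0 * ln(0) = 0
    return 0.0 if k == 0 else k * math.log((k / n) / q)

def kl_vs_null(J, C_e, z, z_hat):
    """K-L divergence of discrete predictions z_hat against the null
    model q = C_e / J, grouping profiles by their prediction."""
    q = C_e / J
    Cv_hat = sum(z_hat)
    F_p = sum((1 - zi) * zh for zi, zh in zip(z, z_hat))  # false positives
    F_n = sum(zi * (1 - zh) for zi, zh in zip(z, z_hat))  # false negatives
    kl = 0.0
    if Cv_hat > 0:                       # profiles predicted considered
        kl += _xlogx_ratio(Cv_hat - F_p, Cv_hat, q)
        kl += _xlogx_ratio(F_p, Cv_hat, 1 - q)
    if J - Cv_hat > 0:                   # profiles predicted not considered
        kl += _xlogx_ratio(F_n, J - Cv_hat, q)
        kl += _xlogx_ratio(J - Cv_hat - F_n, J - Cv_hat, 1 - q)
    return kl

# Invented example: 8 validation profiles, 2 considered in estimation.
z     = [1, 1, 0, 0, 0, 0, 0, 0]         # observed validation considerations
z_hat = [1, 0, 1, 0, 0, 0, 0, 0]         # a model's predictions
C_e, J = 2, len(z)
relative_kl = kl_vs_null(J, C_e, z, z_hat) / kl_vs_null(J, C_e, z, z)
assert 0.0 <= relative_kl <= 1.0
```

The denominator is the perfect-prediction benchmark ($\hat{z}_{hj} = z_{hj}$), so the relative K-L divergence lies between 0 (no improvement over the null model) and 1 (perfect prediction).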
Appendix 6: Statistical Learning for Compensatory and Subset Rules

The following mathematical programs were formulated to be as similar as feasible to DOCMP. Both can be simplified with algebraic substitutions. As in the HB compensatory model, we subsume the threshold in the partworths estimated by CompMP. We set $K$ to a number that is large relative to $T_h$. For comparability, and to be conservative, we set $S = 4$ in SubsetMP.
CompMP: $\min_{\{\vec{\beta}_h,\,\vec{\xi}_h\}}\ \sum_{j=1}^{J}\left[y_{hj}\,\xi^-_{hj} + (1 - y_{hj})\,\xi^+_{hj}\right] + \gamma_M \sum_{j=1}^{J}\left[M_j\,\xi^-_{hj} + (1 - M_j)\,\xi^+_{hj}\right] + \gamma_c\,\vec{e}\,'\vec{\beta}_h$

Subject to: $\vec{x}_j'\vec{\beta}_h \le T_h + K\,\xi^+_{hj}$ for all $j$ = 1 to $J$
$\vec{x}_j'\vec{\beta}_h \ge T_h(1 - \xi^-_{hj})$ for all $j$ = 1 to $J$
$\xi^+_{hj} \ge 0$, $\xi^-_{hj} \ge 0$, $\vec{\beta}_h \ge 0$
SubsetMP: $\min_{\{\vec{a}_h,\,S,\,\vec{\xi}_h\}}\ \sum_{j=1}^{J}\left[y_{hj}\,\xi^-_{hj} + (1 - y_{hj})\,\xi^+_{hj}\right] + \gamma_M \sum_{j=1}^{J}\left[M_j\,\xi^-_{hj} + (1 - M_j)\,\xi^+_{hj}\right] + \gamma_c\,S$

Subject to: $\vec{x}_j'\vec{a}_h \le S + \xi^+_{hj}$ for all $j$ = 1 to $J$
$\vec{x}_j'\vec{a}_h \ge S(1 - \xi^-_{hj})$ for all $j$ = 1 to $J$
$\xi^+_{hj} \ge 0$, $\xi^-_{hj} \ge 0$, $\vec{a}_h$ a binary vector, $S > 0$, integer
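For intuition about what SubsetMP optimizes, a brute-force counterpart on a toy problem enumerates binary acceptability vectors and thresholds and scores each rule by its misclassification count (the error terms of the objective with $\gamma_M = \gamma_c = 0$). This is our own illustrative search, exponential in the number of aspects, not the paper's mathematical-programming implementation.

```python
# Brute-force counterpart of SubsetMP on a toy problem: search over all
# binary acceptability vectors a_h and thresholds S, scoring each rule by
# its misclassification count on (x_j, y_hj). Illustration only; the
# paper solves this as a mathematical program, not by enumeration.
import itertools

def subset_fit(x, y, max_S):
    """Return (best_a, best_S, errors) minimizing false negatives plus
    false positives for the rule 'consider j iff x_j . a >= S'."""
    A = len(x[0])
    best = None
    for a in itertools.product([0, 1], repeat=A):
        for S in range(1, max_S + 1):
            pred = [int(sum(xi * ai for xi, ai in zip(xj, a)) >= S)
                    for xj in x]
            errors = sum(p != yj for p, yj in zip(pred, y))
            if best is None or errors < best[2]:
                best = (a, S, errors)
    return best

# Toy data: 4 aspects, 6 profiles; y generated by 'aspects 0 and 1
# matter, threshold S = 2', so a perfect fit exists.
x = [(1, 1, 0, 0), (1, 1, 1, 0), (1, 0, 0, 1),
     (0, 1, 1, 0), (0, 0, 1, 1), (1, 1, 0, 1)]
y = [1, 1, 0, 0, 0, 1]
a, S, errors = subset_fit(x, y, max_S=4)
assert errors == 0  # the generating rule is recoverable here
```

The enumeration makes plain why the mathematical-programming formulation matters in practice: with realistic numbers of aspects, the $2^A$ search is infeasible, while the program above scales.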
Appendix 7: Consider-only, Reject-only, No-browsing, Text-only, and Example Feature-Introduction Screenshots
Screenshots are shown in English, except for the text-only format. German versions, and
other screenshots from the surveys, are available from the authors.