JOURNAL OF MATHEMATICAL PSYCHOLOGY 34, 393-418 (1990)

Relations between Exemplar-Similarity and Likelihood Models of Classification

ROBERT M. NOSOFSKY

Indiana University

The similarity choice model of identification (Luce, 1963; Shepard, 1957) and the context model of categorization (Medin & Schaffer, 1978; Nosofsky, 1986) are shown to be closely related to a variety of likelihood-based models. In particular, it is shown that: (1) for category distributions defined over independent dimensions, general versions of the context model and Estes’ (1986) similarity-likelihood model are formally identical; (2) the context model and similarity choice model can be given interpretations as exemplar-based likelihood models; (3) an independent feature addition-deletion model is a special case of the similarity choice model; and (4) a perception/likelihood-based decision model of identification generates predictions that are characterizable by the similarity choice model. © 1990 Academic Press, Inc.

Two main classes of models that have been formulated to provide quantitative predictions of categorization performance are “exemplar-similarity” models and “feature-probability” or “likelihood” models (e.g., Ashby & Perrin, 1988; Estes, 1986; Fried & Holyoak, 1984; Medin & Schaffer, 1978; Nosofsky, 1986; Reed, 1972). According to exemplar-similarity models, the observer stores individual traces of category exemplars in memory, and makes classification decisions on the basis of similarity comparisons with the stored exemplars. According to likelihood models, the observer makes likelihood estimates that a particular probe was generated from alternative category distributions, and makes classification decisions on the basis of these likelihood estimates.

The purpose of the present article is to demonstrate formal relations between a particular exemplar-similarity model and a variety of likelihood-based models. The exemplar-similarity model that is considered is the context model proposed by Medin and Schaffer (1978) and generalized by Nosofsky (1986). In the special case in which each stimulus defines its own category, the decision rule in the context model reduces to that of the classic similarity choice model for predicting identification confusion data (e.g., Luce, 1963; Shepard, 1957; Smith, 1980; Townsend & Landon, 1982). In addition to considering categorization, the present article will address relations between the similarity choice model and likelihood-based models of identification.

Reprint requests should be sent to Dr. Robert M. Nosofsky, Department of Psychology, Indiana University, Bloomington, IN 47405.

0022-2496/90 $3.00
Copyright © 1990 by Academic Press, Inc. All rights of reproduction in any form reserved.


394 ROBERT M. NOSOFSKY

Organization of the Article

In recent theoretical work, Estes (1986) showed that a baseline version of the context model and an independent-feature “similarity-likelihood” model are formally identical under certain conditions. In Part 1 of this article I extend Estes’ (1986) work by showing that the formal equivalence holds under more general conditions. In Part 2, I show that the context model and similarity choice model can be given interpretations as exemplar-based likelihood models. Part 3 shows that an independent feature addition-deletion model of identification performance is a special case of the similarity choice model; while Part 4 shows that a fairly general perceptual identification model involving likelihood-based decision processes generates predictions that are characterizable in terms of the similarity choice model. A central theme running throughout the article concerns the close correspondence between classification models based on “similarity” and those based on “likelihood” or “probability.”

Because a variety of closely related models are considered, tables and a figure are provided in the appendix that summarize the definitions of the models, the terms that appear in them, and the relations among the models.

Exemplar-Similarity Model

According to the context model (Estes, 1986; Medin & Schaffer, 1978; Nosofsky, 1984, 1986), the strength of making a Category J response (R_J) given presentation of Stimulus i (S_i) is found by summing the (weighted) similarity of Stimulus i to all presented exemplars of Category J (C_J), and then multiplying by the response bias for Category J. This strength is then divided by the sum of strengths for all categories to determine the conditional probability with which Stimulus i is classified in Category J,

\[
P(R_J \mid S_i) = \frac{b_J \sum_{j \in C_J} L(j, J)\, \eta_{ij}}{\sum_K b_K \sum_{k \in C_K} L(k, K)\, \eta_{ik}}, \qquad (1)
\]

where η_ij (η_ij = η_ji, η_ii = 1) gives the similarity between exemplars i and j, b_J (0 < b_J < 1, Σ_K b_K = 1) is the bias associated with Category J, and L(j, J) is the relative frequency (likelihood) with which Exemplar j is presented during training in conjunction with Category J. (Here and throughout the remainder of this article I use lowercase letters as subscripts to index individual stimuli, identification responses, and stimulus-level parameters; whereas uppercase letters are used as subscripts to index categories, categorization responses, and category-level parameters.) Note that L(j, J) in Eq. (1) is given by L(j, J) = L(S_j | C_J) L(C_J), where L(S_j | C_J) is the likelihood that Stimulus j is presented given a Category J trial, and L(C_J) is the Category J base rate.
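The decision rule in Eq. (1) is straightforward to compute. The sketch below is an illustrative implementation only (the function name and dictionary layout are my own, not from the paper): similarities η_ij, relative frequencies L(j, J), and biases b_J are supplied as plain Python dictionaries.

```python
def context_model_probs(eta, freq, bias, i):
    """Context model decision rule (Eq. (1)).

    eta[i][j]  -- similarity between exemplars i and j (eta[i][i] == 1)
    freq[J][j] -- relative frequency L(j, J) of exemplar j with category J
    bias[J]    -- response bias b_J for category J
    Returns a dict mapping each category J to P(R_J | S_i).
    """
    strength = {J: bias[J] * sum(L * eta[i][j] for j, L in freq[J].items())
                for J in freq}
    total = sum(strength.values())
    return {J: s / total for J, s in strength.items()}

# Two categories, one training exemplar each, unbiased responding:
eta = {0: {0: 1.0, 1: 0.5}, 1: {0: 0.5, 1: 1.0}}
freq = {"A": {0: 1.0}, "B": {1: 1.0}}
bias = {"A": 0.5, "B": 0.5}
probs = context_model_probs(eta, freq, bias, 0)
# Stimulus 0 matches Category A's own exemplar, so
# P(R_A | S_0) = 1.0 / (1.0 + 0.5) = 2/3
```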

In the special case in which each stimulus defines its own category (i.e., an identification experiment), Eq. (1) reduces to

\[
P(R_j \mid S_i) = \frac{b_j\, \eta_{ij}}{\sum_k b_k\, \eta_{ik}}, \qquad (2)
\]

which is the similarity choice model. I have not included the L(j, j) relative frequency terms in Eq. (2) because, with respect to predicting identification confusion data, the frequency-sensitive and frequency-insensitive versions of the model cannot be distinguished. The reason is that any influence of the relative frequency terms can be absorbed by the identification response bias parameters (b_j). In applications to categorization confusion experiments (Eq. (1)), however, the relative frequency terms are critical. Manipulation of presentation frequencies of individual exemplars has been shown to exert dramatic influence on patterns of categorization confusions in a manner that is interpretable in terms of the relative frequency terms but not in terms of global changes in overall category response bias (Nosofsky, 1988).

In the original experiments that tested applications of the context model, Medin and Schaffer (1978) used stimuli varying along binary-valued dimensions, and assumed that the similarity between exemplars i and j was given by

\[
\eta_{ij} = \prod_{m=1}^{M} s_m^{\delta_m(i, j)}, \qquad (3)
\]

where s_m (0 < s_m < 1) is a (similarity) parameter reflecting the salience of a given dimension, M is the number of dimensions composing the stimuli, and δ_m(i, j) is an indicator variable equal to one if exemplars i and j mismatch on dimension m, and equal to zero if exemplars i and j match on dimension m. I will refer to Eqs. (1) and (3) taken together as the binary-valued version of the context model.

Nosofsky (1984, 1986) noted that the multiplicative similarity rule (Eq. (3)) can be viewed as a special case of a multidimensional scaling (MDS) approach to modeling similarity. Representing each exemplar as a point in a multidimensional psychological space, the distance between exemplars i and j is given by the weighted Minkowski power model formula (e.g., Carroll & Wish, 1974)

\[
d_{ij} = \left[ \sum_{m=1}^{M} w_m \lvert i_m - j_m \rvert^r \right]^{1/r}, \qquad (4)
\]

where i_m is the psychological value of exemplar i on dimension m, and w_m ≥ 0 is the weight given to dimension m in computing distance. (The weights are often interpreted as representing selective attention processes; see Nosofsky, 1984, 1986.) Of course, when r = 1 in Eq. (4) we have the city-block metric, and when r = 2 we have the Euclidean metric. The distance d_ij is converted to a similarity measure using the transformation

\[
\eta_{ij} = \exp(-d_{ij}^{\,p}). \qquad (5)
\]

Two special cases of Eq. (5) have received support in previous work: p = 1, which yields an exponential decay function, and p = 2, which yields a Gaussian function (Nosofsky, 1985; Shepard, 1958, 1987). It is straightforward to verify that when p = r in Eqs. (4) and (5), an interdimensional multiplicative similarity rule is yielded which generalizes Medin and Schaffer’s rule (Eq. (3)) (Nosofsky, 1986). For the case of binary-valued psychological dimensions, Eq. (3) is yielded exactly. Throughout this article, the similarity rule is assumed to be interdimensional multiplicative in form.

Because the MDS approach to modeling similarity (Eqs. (4) and (5)) generalizes Medin and Schaffer’s (1978) multiplicative rule (Eq. (3)), Nosofsky (1986) referred to the system of Eqs. (1), (4), and (5) as the generalized context model (GCM). For purposes of discussing and relating the various models in this article, however, it is more convenient to refer to this system of equations as the MDS-context model. Specialized versions of the MDS-context model arise depending on the choice of similarity function and distance metric. For example, when r = 2 in Eq. (4) and p = 2 in Eq. (5), we have the Gaussian/Euclidean MDS-context model, and so forth.
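The MDS machinery of Eqs. (4) and (5) is easy to express in code. The sketch below (function names are mine) computes the weighted Minkowski distance and the similarity transform, and checks the property just stated: when p = r, the similarity factors into a product of one-dimensional components.

```python
import math

def minkowski_distance(x, y, w, r):
    """Weighted Minkowski power model distance, Eq. (4)."""
    return sum(wm * abs(a - b) ** r for wm, a, b in zip(w, x, y)) ** (1 / r)

def similarity(x, y, w, r, p):
    """MDS similarity transform, Eq. (5): eta = exp(-d**p)."""
    return math.exp(-minkowski_distance(x, y, w, r) ** p)

x, y, w = (0.0, 1.0), (2.0, 3.0), (0.5, 0.25)

# Gaussian/Euclidean case with p = r = 2: the similarity is interdimensional
# multiplicative -- a product of per-dimension terms exp(-w_m * |x_m - y_m|**2).
eta = similarity(x, y, w, r=2, p=2)
eta_product = math.prod(math.exp(-wm * abs(a - b) ** 2)
                        for wm, a, b in zip(w, x, y))
assert abs(eta - eta_product) < 1e-12
```

Here d² = 0.5·4 + 0.25·4 = 3, so both expressions equal exp(−3).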

FEATURE-LIKELIHOOD MODELS

According to independent feature-likelihood models, subjects record in memory information about the relative frequency with which individual features were generated by alternative categories, and also store information about category base rates. For a given feature set composing Stimulus i, subjects would then compute the likelihood (using Bayes’ theorem) that it was generated by one of the alternative categories (on the assumption that individual features are generated independently). Categorization responses are made by probability matching to these likelihood estimates, with modulation by category response biases (e.g., Fried & Holyoak, 1984),

\[
P(R_J \mid S_i) = \frac{b_J L_E(S_i \mid C_J)\, L(C_J)}{\sum_K b_K L_E(S_i \mid C_K)\, L(C_K)}, \qquad (6)
\]

where b_J is the Category J bias, L_E(S_i | C_J) gives the estimated likelihood that the feature set composing S_i is generated on a Category J trial, and L(C_J) is the Category J base rate. Let (i_1, i_2, ..., i_M) denote the set of features composing S_i, and let θ(i_m | C_J) denote the probability that feature i_m is generated on a Category J trial. In the independent-feature likelihood model, the estimate L_E(S_i | C_J) would be given by

\[
L_E(S_i \mid C_J) = \prod_{m=1}^{M} \theta(i_m \mid C_J). \qquad (7)
\]

When categories are defined over independent probability distributions of features, the estimated likelihood of S_i given C_J is equal to the actual likelihood of S_i given C_J, L_E(S_i | C_J) = L(S_i | C_J). In this case, L_E(S_i | C_J) L(C_J) in Eq. (6) gives the likelihood with which Exemplar i is presented in conjunction with Category J, L(i, J), and Eq. (6) can be written as

\[
P(R_J \mid S_i) = \frac{b_J L(i, J)}{\sum_K b_K L(i, K)}. \qquad (8)
\]

Clearly, the context model (Eq. (1)) yields Eq. (8) as a special case when η_ij = 0 for all i ≠ j (Estes, 1986). Thus, for categories defined over independent distributions of features, the context model generalizes the independent feature-likelihood model.
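A minimal sketch of the independent feature-likelihood model (Eqs. (6) and (7)); the data layout and function name are assumptions for illustration. Feature probabilities θ(i_m | C_J) are stored per category and per dimension, and the estimated category likelihood is their product.

```python
import math

def feature_likelihood_probs(features, theta, base_rate, bias):
    """Independent feature-likelihood model, Eqs. (6) and (7).

    features    -- tuple (i_1, ..., i_M) of the probe's feature values
    theta[J][m] -- dict mapping each value on dimension m to theta(value | C_J)
    base_rate[J], bias[J] -- category base rates L(C_J) and biases b_J
    """
    strength = {}
    for J in theta:
        # Eq. (7): L_E(S_i | C_J) = product over dimensions of theta(i_m | C_J)
        L_E = math.prod(theta[J][m][v] for m, v in enumerate(features))
        strength[J] = bias[J] * L_E * base_rate[J]
    total = sum(strength.values())
    return {J: s / total for J, s in strength.items()}

# Two binary dimensions, equal biases and base rates (hypothetical numbers):
theta = {"A": [{0: 0.8, 1: 0.2}, {0: 0.7, 1: 0.3}],
         "B": [{0: 0.2, 1: 0.8}, {0: 0.3, 1: 0.7}]}
probs = feature_likelihood_probs((0, 0), theta,
                                 base_rate={"A": 0.5, "B": 0.5},
                                 bias={"A": 0.5, "B": 0.5})
# L_E(S | A) = .8 * .7 = .56 and L_E(S | B) = .2 * .3 = .06, so P(A) = .56/.62
```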

1. RELATION BETWEEN THE CONTEXT MODEL AND AN INDEPENDENT-FEATURE SIMILARITY-LIKELIHOOD MODEL

Further comparability between the context model and the independent-feature likelihood model is limited, however, because the feature model does not include a similarity component. To achieve additional comparability, Estes (1986) proposed that psychological records of the relative frequencies of features may be influenced by similarities between features. To illustrate, suppose that Dimension 1 corresponds to color, with stimuli being either red or blue. Each time a red exemplar from Category 1 was presented, the appropriate feature counter would be incremented by one. But when a blue stimulus from Category 1 was presented, this feature counter would be incremented by s (0 ≤ s < 1), where s reflects the similarity between red and blue. By this process, the asymptotic “similarity-likelihood” of the feature red given Category 1 would be given by

\[
\theta^*(\mathrm{red} \mid C_1) = \theta(\mathrm{red} \mid C_1) + s\,[\theta(\mathrm{blue} \mid C_1)] = \theta(\mathrm{red} \mid C_1) + s\,[1 - \theta(\mathrm{red} \mid C_1)]. \qquad (9)
\]

The similarity-likelihood for each of the individual features composing an exemplar would then be combined as in Eq. (7) to compute the overall similarity-likelihood of the exemplar given the category, L*(S_i | C_J).

Estes (1986) restricted attention to stimuli varying along binary-valued dimensions. In the following, we generalize Estes’ (1986) presentation and provide a formal definition of the independent-feature similarity-likelihood model.

It is assumed that the stimuli vary along M multivalued dimensions. We denote by n_m the number of values on Dimension m. Because categories are defined over independent probability distributions of the values on the dimensions, there is a total of Π_{m=1}^{M} n_m stimuli. The similarity between values i_m and j_m on Dimension m is denoted by s_{i_m j_m}, with s_{i_m j_m} = s_{j_m i_m}, and s_{i_m i_m} = 1. (If the dimensions are binary-valued, and i_m ≠ j_m, then s_{i_m j_m} = s_m, the Dimension m similarity parameter in the binary-valued context model.)


DEFINITION 1.1. The similarity-likelihood of value i_m, given Category J, is given by

\[
\theta^*(i_m \mid C_J) = \sum_{j_m = 1}^{n_m} \theta(j_m \mid C_J)\, s_{i_m j_m}. \qquad (10)
\]

Note that if s_{i_m j_m} = 0 for all i_m ≠ j_m, then θ*(i_m | C_J) = θ(i_m | C_J). Also, if the dimensions are binary-valued, then θ*(i_m | C_J) = θ(i_m | C_J) + s_m[1 − θ(i_m | C_J)], as in the previous example (Eq. (9)).

DEFINITION 1.2. The overall similarity-likelihood of the feature set composing Stimulus i given Category J, L*(S_i | C_J), is given by

\[
L^*(S_i \mid C_J) = L^*(i_1, i_2, \ldots, i_M \mid C_J) = \prod_{m=1}^{M} \theta^*(i_m \mid C_J). \qquad (11)
\]

DEFINITION 1.3. According to the independent-feature similarity-likelihood model, the probability that S_i is classified in C_J is given by

\[
P(R_J \mid S_i) = \frac{b_J L^*(S_i \mid C_J)\, L(C_J)}{\sum_K b_K L^*(S_i \mid C_K)\, L(C_K)}. \qquad (12)
\]

Estes (1986) showed that for category distributions defined over independent dimensions, the multiplicative-similarity context model and independent-feature similarity-likelihood model are formally identical. However, his proof was limited to the case of stimuli varying along two binary-valued dimensions and for which similarity (s) was constant across dimensions. In this section I show that the formal identity holds for stimuli varying along M multivalued dimensions and for which separate similarity parameters (s_{i_m j_m}) are allowed for each dimension.

THEOREM 1. For stimuli varying along M multivalued dimensions, the context model and independent-feature similarity-likelihood model are formally identical if

(i) category distributions are defined over independent dimensions, such that

\[
L(S_i \mid C_J) = L(i_1, i_2, \ldots, i_M \mid C_J) = \prod_{m=1}^{M} \theta(i_m \mid C_J);
\]

(ii) the similarity between items i and j, η_ij, is given by the interdimensional multiplicative rule

\[
\eta_{ij} = \prod_{m=1}^{M} s_{i_m j_m}.
\]


Proof. We start by noting that the context model decision rule (Eq. (1)) can be written as

\[
P(R_J \mid S_i) = \frac{b_J \sum_{j \in C_J} L(S_j \mid C_J)\, L(C_J)\, \eta_{ij}}{\sum_K b_K \sum_{k \in C_K} L(S_k \mid C_K)\, L(C_K)\, \eta_{ik}}. \qquad (13)
\]

Comparing Eqs. (12) and (13), we see that the formal identity between the models is established by showing that

\[
L^*(S_i \mid C_J) = \sum_{j \in C_J} L(S_j \mid C_J)\, \eta_{ij}. \qquad (14)
\]

For the similarity-likelihood model, by Definitions 1.1 and 1.2,

\[
L^*(S_i \mid C_J) = \prod_{m=1}^{M} \theta^*(i_m \mid C_J) = \prod_{m=1}^{M} \left[ \sum_{j_m = 1}^{n_m} \theta(j_m \mid C_J)\, s_{i_m j_m} \right]. \qquad (15)
\]

For the context model, we sum the similarity of Stimulus i to all Stimuli j weighted by their conditional likelihoods,

\[
\begin{aligned}
\sum_{j \in C_J} L(S_j \mid C_J)\, \eta_{ij}
&= \sum_{j_1 = 1}^{n_1} \sum_{j_2 = 1}^{n_2} \cdots \sum_{j_M = 1}^{n_M} L(j_1, j_2, \ldots, j_M \mid C_J)\, \eta_{ij} \\
&= \sum_{j_1 = 1}^{n_1} \sum_{j_2 = 1}^{n_2} \cdots \sum_{j_M = 1}^{n_M} \left[ \theta(j_1 \mid C_J)\, \theta(j_2 \mid C_J) \cdots \theta(j_M \mid C_J) \right] \left[ s_{i_1 j_1}\, s_{i_2 j_2} \cdots s_{i_M j_M} \right] \\
&\qquad \text{(by parts i and ii in Theorem 1)} \\
&= \sum_{j_1 = 1}^{n_1} \sum_{j_2 = 1}^{n_2} \cdots \sum_{j_M = 1}^{n_M} \left[ \theta(j_1 \mid C_J)\, s_{i_1 j_1} \right] \left[ \theta(j_2 \mid C_J)\, s_{i_2 j_2} \right] \cdots \left[ \theta(j_M \mid C_J)\, s_{i_M j_M} \right] \\
&= \left[ \sum_{j_1 = 1}^{n_1} \theta(j_1 \mid C_J)\, s_{i_1 j_1} \right] \left[ \sum_{j_2 = 1}^{n_2} \theta(j_2 \mid C_J)\, s_{i_2 j_2} \right] \cdots \left[ \sum_{j_M = 1}^{n_M} \theta(j_M \mid C_J)\, s_{i_M j_M} \right] \\
&= \prod_{m=1}^{M} \left[ \sum_{j_m = 1}^{n_m} \theta(j_m \mid C_J)\, s_{i_m j_m} \right], \qquad (16)
\end{aligned}
\]

which is the same as Eq. (15). ∎
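Theorem 1 can also be checked numerically. The sketch below (hypothetical parameter values) builds a two-dimensional binary special case with independent category distributions and a multiplicative similarity rule, and confirms that the product form of Eq. (15) equals the exemplar sum of Eq. (16).

```python
from itertools import product

# theta[m] = distribution over the values of dimension m given the category;
# s[m] = similarity between the two values of (binary) dimension m.
theta = [{0: 0.7, 1: 0.3}, {0: 0.4, 1: 0.6}]
s = [0.2, 0.5]
probe = (0, 0)

def pair_sim(m, a, b):
    # s_{i_m j_m}: 1 when the values match, s[m] when they mismatch
    return 1.0 if a == b else s[m]

# Eq. (15): product over dimensions of the per-dimension similarity-likelihood.
lhs = 1.0
for m in range(2):
    lhs *= sum(theta[m][jm] * pair_sim(m, probe[m], jm) for jm in (0, 1))

# Eq. (16): sum over all exemplars of L(S_j | C) * eta_ij, with L built from the
# independent distributions and eta from the multiplicative rule.
rhs = 0.0
for j in product((0, 1), repeat=2):
    L_j = theta[0][j[0]] * theta[1][j[1]]
    eta = pair_sim(0, probe[0], j[0]) * pair_sim(1, probe[1], j[1])
    rhs += L_j * eta

assert abs(lhs - rhs) < 1e-12   # the two sides agree, as Theorem 1 asserts
```

With these numbers both sides equal (.7 + .2·.3)(.4 + .5·.6) = .532.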

A special case of the independent-feature similarity-likelihood model that is of interest arises when the s_{i_m j_m} values are computed using an MDS approach, namely

\[
s_{i_m j_m} = \exp(-w_m \lvert i_m - j_m \rvert^r). \qquad (17a)
\]

I will refer to this case as the MDS-based independent-feature similarity-likelihood model. It is straightforward to see that for category distributions defined over independent dimensions, this model is formally identical to the MDS-context model when p = r in Eqs. (4) and (5), because

\[
\eta_{ij} = \exp(-d_{ij}^{\,r}) = \exp\!\left( -\sum_{m=1}^{M} w_m \lvert i_m - j_m \rvert^r \right) = \prod_{m=1}^{M} \exp(-w_m \lvert i_m - j_m \rvert^r) = \prod_{m=1}^{M} s_{i_m j_m}, \qquad (17b)
\]

where the s_{i_m j_m} are given as in Eq. (17a).

Application to an Example

In this section we illustrate an application of the MDS-based independent-feature similarity-likelihood model and verify that for categories defined over independent dimensions, it yields classification predictions that are identical to the MDS-context model (when p = r). We also illustrate how the formal identity between the models breaks down in the nonindependent case. Estes (1986) provided such illustrations for situations in which stimuli varied along two binary-valued dimensions with similarity parameters constant across dimensions (i.e., M = 2, n_1 = n_2 = 2, and s_{i_1 j_1} = s_{i_2 j_2} = s in Eqs. (15) and (16)). Here, we provide examples for stimuli varying along two dimensions with three values per dimension and similarity parameters nonconstant.

[FIG. 1. Hypothetical stimulus configuration for illustrating applications of the MDS-based independent-feature similarity-likelihood model and the MDS-context model. Values in parentheses are the conditional probabilities of dimension values given Category 1, and values in brackets are the conditional probabilities of dimension values given Category 2.]

The stimulus set is illustrated in Fig. 1. The nine stimuli are generated by combining orthogonally three values across two dimensions. The values along Dimension 1 (0, 1, and 4) will be denoted x_0, x_1, and x_4; and the values along Dimension 2 (0, .5, and 2) will be denoted y_0, y_.5, and y_2. The stimuli are assigned probabilistically to two categories. The values in parentheses along each dimension indicate the likelihood of each of the individual dimension values given Category 1, while the values in brackets indicate the Category 2 conditional likelihoods. For example, θ(x_0 | C_1) = .5, θ(x_4 | C_2) = .8, θ(y_.5 | C_1) = .2, and so forth. Similarities between exemplars will be computed using a city-block metric (r = 1 in Eq. (4)) and exponential decay function (p = 1 in Eq. (5)). For simplicity, the weight parameters in the distance function (Eq. (4)) will not be used. So, for example, the city-block distance between S_3 and S_5, d_35, equals 3.5; and so η_35 = exp(−3.5) = .030.

We illustrate an application of the MDS-based independent-feature similarity-likelihood model to prediction of the probability that S_3 is classified in C_1, P(R_1 | S_3). For simplicity, we assume a bias-free experiment (b_1 = b_2) and equal category base rates [L(C_1) = L(C_2)]. According to the similarity-likelihood model,

\[
P(R_1 \mid S_3) = \frac{L^*(S_3 \mid C_1)}{L^*(S_3 \mid C_1) + L^*(S_3 \mid C_2)} = \frac{\theta^*(x_4 \mid C_1)\, \theta^*(y_0 \mid C_1)}{\theta^*(x_4 \mid C_1)\, \theta^*(y_0 \mid C_1) + \theta^*(x_4 \mid C_2)\, \theta^*(y_0 \mid C_2)}. \qquad (18)
\]

Using Definition 1.1 and Eq. (17a), the similarity-likelihood θ*(x_4 | C_1) is given by

\[
\theta^*(x_4 \mid C_1) = \theta(x_0 \mid C_1)\exp(-4) + \theta(x_1 \mid C_1)\exp(-3) + \theta(x_4 \mid C_1) = .22409. \qquad (19)
\]

Likewise, θ*(y_0 | C_1) = .74837, θ*(x_4 | C_2) = .80681, and θ*(y_0 | C_2) = .26892. Substituting into Eq. (18) gives L*(S_3 | C_1) = .16770 and L*(S_3 | C_2) = .21697, and so P(R_1 | S_3) = .436. Note that if dimension values combine independently, then L(S_3 | C_1) = θ(x_4 | C_1) θ(y_0 | C_1) = .2(.6) = .12, whereas L(S_3 | C_2) = θ(x_4 | C_2) θ(y_0 | C_2) = .8(.1) = .08. Thus, we have L(S_3 | C_1) > L(S_3 | C_2), but L*(S_3 | C_1) < L*(S_3 | C_2).
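These computations can be reproduced in a few lines. The sketch below (variable names are mine; the marginals are those of Fig. 1) recomputes the similarity-likelihoods and the choice probability for S_3 = (x_4, y_0).

```python
import math

x_vals, y_vals = [0.0, 1.0, 4.0], [0.0, 0.5, 2.0]
theta_x = {1: [0.5, 0.3, 0.2], 2: [0.1, 0.1, 0.8]}   # theta(x_v | C_J)
theta_y = {1: [0.6, 0.2, 0.2], 2: [0.1, 0.1, 0.8]}   # theta(y_v | C_J)

def theta_star(probe, vals, theta):
    """Eq. (10) with MDS similarities s = exp(-|difference|) (Eq. (17a),
    city-block metric, unit weights)."""
    return sum(t * math.exp(-abs(probe - v)) for v, t in zip(vals, theta))

# Overall similarity-likelihoods for S_3 = (x_4, y_0) by Eq. (11):
L_star = {J: theta_star(4.0, x_vals, theta_x[J]) *
             theta_star(0.0, y_vals, theta_y[J]) for J in (1, 2)}

# Eq. (18), bias-free with equal base rates:
p = L_star[1] / (L_star[1] + L_star[2])
# L_star[1] = .16770 and L_star[2] = .21697, so p = .436 (to three decimals)
```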


TABLE I

Context Model Application: Conditional Likelihoods and Exemplar 3 Similarities for Illustration in Fig. 1

A. Independent Case

S_j   L(S_j | C_1)   L(S_j | C_2)   η_3j
1     .30            .01            .018
2     .18            .01            .050
3     .12            .08            1.000
4     .10            .01            .011
5     .06            .01            .030
6     .04            .08            .607
7     .10            .08            .002
8     .06            .08            .007
9     .04            .64            .135

B. Nonindependent Case

S_j   L(S_j | C_1)   L(S_j | C_2)   η_3j
1     .20            .00            .018
2     .20            .10            .050
3     .20            .00            1.000
4     .10            .10            .011
5     .10            .00            .030
6     .00            .00            .607
7     .20            .00            .002
8     .00            .00            .007
9     .00            .80            .135

The application of the context model is illustrated in Tables IA and IB. In Table IA, we list the likelihood of each of the individual stimuli given Category 1 and Category 2, respectively, assuming that the categories are defined over independent probability distributions of the dimension values. We also list the values of η_3j computed from the exponential/city-block model. The bias-free context model with equal category base rates predicts

\[
P(R_1 \mid S_3) = \frac{\sum_{j \in C_1} L(S_j \mid C_1)\, \eta_{3j}}{\sum_{j \in C_1} L(S_j \mid C_1)\, \eta_{3j} + \sum_{j \in C_2} L(S_j \mid C_2)\, \eta_{3j}}. \qquad (20)
\]

Substituting the values listed in Table IA gives Σ_{j∈C_1} L(S_j | C_1) η_3j = .16770 and Σ_{j∈C_2} L(S_j | C_2) η_3j = .21697, the same values having been computed previously for L*(S_3 | C_1) and L*(S_3 | C_2), as should be the case. Thus, P(R_1 | S_3) = .436.

Table IB lists alternative values of L(S_j | C_1) and L(S_j | C_2). These values satisfy the marginal probability constraints shown in Fig. 1, but now the individual dimension values do not combine independently. The similarity-likelihood model makes the same prediction for P(R_1 | S_3) as before (.436), because it makes use of only the marginal probabilities of the dimension values. Applying the context model, however, yields Σ_{j∈C_1} L(S_j | C_1) η_3j = .218, Σ_{j∈C_2} L(S_j | C_2) η_3j = .114, and P(R_1 | S_3) = .657. Thus, the similarity-likelihood model and the context model make dramatically different predictions for the nonindependent case illustrated in Table IB. Indeed, whereas the similarity-likelihood model’s predictions remain constant across Tables IA and IB, the context model predicts a qualitative shift, namely, a tendency to classify S_3 in Category 2 in the former case, and to classify S_3 in Category 1 in the latter. The context model is sensitive to joint probabilities of features. In the example in Table IB, feature combination x_4–y_0 (i.e., S_3) has high likelihood given Category 1, but zero likelihood given Category 2.
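The table calculations can be checked directly. A sketch (values transcribed from Table I; function name mine) that evaluates Eq. (20) for both cases:

```python
# eta_3j for the nine stimuli, and L(S_j | C_J) for both cases (Table I).
eta = [.018, .050, 1.000, .011, .030, .607, .002, .007, .135]
L_A = {1: [.30, .18, .12, .10, .06, .04, .10, .06, .04],   # Table IA
       2: [.01, .01, .08, .01, .01, .08, .08, .08, .64]}
L_B = {1: [.20, .20, .20, .10, .10, .00, .20, .00, .00],   # Table IB
       2: [.00, .10, .00, .10, .00, .00, .00, .00, .80]}

def p_r1_s3(L):
    """Eq. (20): bias-free context model with equal base rates."""
    strength = {J: sum(l * e for l, e in zip(L[J], eta)) for J in (1, 2)}
    return strength[1] / (strength[1] + strength[2])

p_indep, p_nonindep = p_r1_s3(L_A), p_r1_s3(L_B)
# p_indep = .436 and p_nonindep = .657 (to three decimals), matching the text
```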


2. RELATIONS BETWEEN THE EXEMPLAR-SIMILARITY MODEL AND EXEMPLAR-BASED LIKELIHOOD MODELS

In this section I identify formal relations between the multiplicative-similarity context model and what will be termed an exemplar-based likelihood model. Imagine that the learner’s category representation consists of collections of stored exemplars, but that the representation of each exemplar is itself a set of parameters that defines a probability distribution (cf. Fried & Holyoak, 1984). The learner is assumed to store information in memory that will allow him or her to make likelihood estimates that a particular probe was generated from the alternative exemplar distributions. It is then straightforward to compute the overall likelihoods that the probe was generated from the alternative categories of individual exemplar distributions. We assume that categorization decisions are made by probability matching to these overall category likelihoods, modulated by potential category response bias. Formally, the probability that Stimulus i is classified in Category J is given as in Bayes’ theorem:

\[
P(R_J \mid S_i) = \frac{b_J L(S_i \mid C_J)\, L(C_J)}{\sum_K b_K L(S_i \mid C_K)\, L(C_K)}. \qquad (21)
\]

According to the exemplar-based likelihood model, the likelihood that S_i is generated on a Category J trial is found by summing over the likelihoods that S_i was generated by each of the individual exemplar distributions composing Category J weighted by their respective base rates, i.e.,

\[
L(S_i \mid C_J) = \sum_{j \in C_J} L(S_i \mid S_j)\, L(S_j \mid C_J), \qquad (22)
\]

where L(S_i | S_j) gives the likelihood that S_i is generated from the probability distribution corresponding to S_j, and L(S_j | C_J) gives the likelihood that S_j is the actual generating distribution on a Category J trial. Substituting into Eq. (21) gives

\[
P(R_J \mid S_i) = \frac{b_J \sum_{j \in C_J} L(S_i \mid S_j)\, L(S_j \mid C_J)\, L(C_J)}{\sum_K b_K \sum_{k \in C_K} L(S_i \mid S_k)\, L(S_k \mid C_K)\, L(C_K)}. \qquad (23)
\]

The relation between the context model (Eq. (1)) and this exemplar-based likelihood model is readily apparent. Indeed, if the L(S_i | S_j) terms are treated as free parameters corresponding to subjective likelihood estimates, with L(S_i | S_j) = L(S_j | S_i), then the two models are formally identical; we simply set η_ij = L(S_i | S_j). (Of course, we could also remove the restrictions η_ij = η_ji and L(S_i | S_j) = L(S_j | S_i) and the models would maintain their formal identity.) This same likelihood-based interpretation is also available for the similarity choice model.

In the case of stimuli varying along independent, binary-valued dimensions, one might imagine that the subject estimates a fixed subjective probability p_m of each individual dimension value transforming into its alternative value. Then

\[
L(S_i \mid S_j) = \prod_{m=1}^{M} p_m^{\delta_m(i, j)} (1 - p_m)^{[1 - \delta_m(i, j)]}, \qquad (24)
\]

where δ_m(i, j) is the indicator variable defined earlier (equal to one if exemplars i and j mismatch on dimension m, and zero otherwise). Substituting into Eq. (23) and dividing all resulting terms by Π_{m=1}^{M} (1 − p_m) yields

\[
P(R_J \mid S_i) = \frac{b_J \sum_{j \in C_J} \left[ \prod_{m=1}^{M} \left( \frac{p_m}{1 - p_m} \right)^{\delta_m(i, j)} \right] L(S_j \mid C_J)\, L(C_J)}{\sum_K b_K \sum_{k \in C_K} \left[ \prod_{m=1}^{M} \left( \frac{p_m}{1 - p_m} \right)^{\delta_m(i, k)} \right] L(S_k \mid C_K)\, L(C_K)}, \qquad (25)
\]

which is formally identical to the binary-valued version of the context model (Eqs. (1) and (3)) with s_m = p_m/(1 − p_m). This formal correspondence between the binary-valued versions of the context model and exemplar-based likelihood model was noted previously by Fried and Holyoak (1984, p. 256, Note 7).
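The equivalence can be verified numerically. The sketch below uses hypothetical exemplars and p_m values: it classifies a probe once with the addition-deletion likelihoods (Eq. (24)) and once with the binary-valued context model using s_m = p_m/(1 − p_m), assuming uniform biases and base rates, and checks that the two probability distributions agree.

```python
import math

p = [0.2, 0.3]                        # per-dimension transformation probabilities
s = [pm / (1 - pm) for pm in p]       # matched context-model similarity parameters
cats = {"A": [(0, 0)], "B": [(1, 1)]}   # one training exemplar per category
probe = (0, 1)

def mismatches(x, y):
    # delta_m(i, j): 1 where the exemplars mismatch on dimension m
    return [int(a != b) for a, b in zip(x, y)]

def normalize(strength):
    total = sum(strength.values())
    return {J: v / total for J, v in strength.items()}

# Exemplar-based likelihood model, Eqs. (23) and (24):
lik = normalize({J: sum(math.prod(pm**d * (1 - pm)**(1 - d)
                                  for pm, d in zip(p, mismatches(probe, ex)))
                        for ex in exs)
                 for J, exs in cats.items()})

# Binary-valued context model, Eqs. (1) and (3), with s_m = p_m / (1 - p_m):
ctx = normalize({J: sum(math.prod(sm**d for sm, d in zip(s, mismatches(probe, ex)))
                        for ex in exs)
                 for J, exs in cats.items()})

assert all(abs(lik[J] - ctx[J]) < 1e-12 for J in cats)
```

Here the likelihood route gives P(A) = .24/(.24 + .14), and the context-model route reproduces it after the common factor Π(1 − p_m) cancels.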

Assume instead that we have stimuli varying along independent, multivalued continuous dimensions. The probability density associated with exemplar S_j on dimension m is assumed to be Gaussian with mean μ_jm and variance σ_m². (Note that variability along a dimension is presumed to be constant across stimuli.) The likelihood of some probe x = (x_1, x_2, ..., x_M) given S_j is

\[
L(\mathbf{x} \mid S_j) = \prod_{m=1}^{M} \frac{1}{\sqrt{2\pi}\,\sigma_m} \exp\!\left[ -\frac{(x_m - \mu_{jm})^2}{2\sigma_m^2} \right]. \qquad (26)
\]

Substituting into Eq. (23) and dividing all terms by the factor Π_{m=1}^{M} (1/(√(2π) σ_m)) yields

\[
P(R_J \mid \mathbf{x}) = \frac{b_J \sum_{j \in C_J} \exp\!\left[ -\sum_{m=1}^{M} \frac{(x_m - \mu_{jm})^2}{2\sigma_m^2} \right] L(S_j \mid C_J)\, L(C_J)}{\sum_K b_K \sum_{k \in C_K} \exp\!\left[ -\sum_{m=1}^{M} \frac{(x_m - \mu_{km})^2}{2\sigma_m^2} \right] L(S_k \mid C_K)\, L(C_K)}. \qquad (27)
\]

Equation (27) is formally identical to the Gaussian/Euclidean MDS-context model (i.e., Eqs. (1), (4), and (5), with p = r = 2 in Eqs. (4) and (5)), with j_m in Eq. (4) equal to μ_jm in Eq. (27) and w_m in Eq. (4) equal to 1/(2σ_m²) in Eq. (27). To see this, note that in the Gaussian/Euclidean MDS-context model, the distance between probe x and exemplar j is given by

\[
d_{xj} = \left[ \sum_{m=1}^{M} w_m (x_m - j_m)^2 \right]^{1/2}, \qquad (28)
\]

Page 13: Relations between Exemplar-Similarity and Likelihood ... · Exemplar-Similarity Model According to the context model (Estes, 1986; Medin & Schaffer, 1978; Nosofsky, 1984, 1986), the

MODELS OF CLASSIFICATION 405

and the similarity between x and j is given by

\[
\eta_{xj} = \exp(-d_{xj}^2) = \exp\!\left[ -\sum_{m=1}^{M} w_m (x_m - j_m)^2 \right] = \prod_{m=1}^{M} \exp\!\left[ -w_m (x_m - j_m)^2 \right].
\]

Substituting the expression for η_xj into the context model equation for P(R_J | S_i) (Eq. (1)) yields Eq. (27). Thus, the Gaussian/Euclidean MDS-context model is formally identical to the exemplar-based likelihood model whenever the exemplar distributions are independent, multivariate Gaussian, with variance on each individual dimension constant across the different exemplar distributions. Note that the relation w_m = 1/(2σ_m²) implies that reduced variability along a dimension in the likelihood model corresponds to greater “weight” given to that dimension in the similarity model.
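A numeric check of this identity, with hypothetical means, variances, and probe: classifying once with per-exemplar independent Gaussian densities and once with Gaussian/Euclidean similarities under w_m = 1/(2σ_m²) yields the same probabilities, because the constant factor Π_m 1/(√(2π)σ_m) cancels in the choice ratio. (Uniform biases, base rates, and within-category exemplar likelihoods are assumed.)

```python
import math

sigma = [1.0, 0.5]                   # per-dimension standard deviations
w = [1 / (2 * s**2) for s in sigma]  # matched attention weights, w_m = 1/(2*sigma_m**2)
cats = {"A": [(0.0, 0.0), (1.0, 0.0)], "B": [(3.0, 1.0)]}   # exemplar means
x = (1.5, 0.4)                       # probe

def normalize(strength):
    total = sum(strength.values())
    return {J: v / total for J, v in strength.items()}

def gauss_lik(x, mu):
    """Independent Gaussian exemplar density (constant variance per dimension)."""
    return math.prod(math.exp(-(xm - mm)**2 / (2 * sm**2)) / (math.sqrt(2 * math.pi) * sm)
                     for xm, mm, sm in zip(x, mu, sigma))

def mds_sim(x, mu):
    """Gaussian/Euclidean MDS similarity: exp(-sum of w_m * (x_m - mu_m)**2)."""
    return math.exp(-sum(wm * (xm - mm)**2 for wm, xm, mm in zip(w, x, mu)))

lik = normalize({J: sum(gauss_lik(x, mu) for mu in exs) for J, exs in cats.items()})
ctx = normalize({J: sum(mds_sim(x, mu) for mu in exs) for J, exs in cats.items()})
assert all(abs(lik[J] - ctx[J]) < 1e-12 for J in cats)
```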

In related work, Ashby and Perrin (1988) noted a formal correspondence between their recently developed "general Gaussian recognition model" and the traditional Euclidean multidimensional scaling model of similarity judgments. In the general Gaussian recognition model, it is assumed that similarity judgments are related to proportion of overlap in distributions of "perceptual effects" to which stimuli give rise. Ashby and Perrin showed in particular that, with appropriate transformations of the underlying variables, the Euclidean multidimensional scaling model arises as a special case of the general Gaussian recognition model whenever the distributions of perceptual effects are defined over independent dimensions, with variance along each dimension constant across stimuli. Each individual dimension, however, can have a different variance associated with it. In this case, the weights in the distance model (Eq. (4)) are inversely related to the variance terms in the general recognition model.

The development in the present article linking the Gaussian/Euclidean MDS-context model to the Gaussian exemplar-based likelihood model (Eq. (27)) parallels Ashby and Perrin's work, although different transformations of the underlying variables are involved in predicting categorization probabilities as opposed to similarity judgments.

Ashby and Perrin’s (1988) general recognition model can also be applied to categorization. The subject is assumed to establish decision boundaries that partition the multidimensional space into response regions. Any perceptual effect falling into Region J would result in a Category J response. To predict the probability with which Stimulus i is classified in Category J, one would integrate over the portion of the Stimulus i distribution that falls into Region J. In fitting the model, one needs to specify the kinds of decision boundaries that the subject adopts (Ashby & Gott, 1988; Ashby & Townsend, 1986). Here, it is of interest to note that decision boundaries defined by summing similarities to individual exemplars (in the


406 ROBERT M. NOSOFSKY

Gaussian/Euclidean model) are the same as those defined by computing exemplar-based likelihoods (assuming independent-dimension Gaussian distributions, with variability along each dimension constant across stimuli). This assertion follows directly from the previous demonstration that the Gaussian/Euclidean MDS-context model is formally identical to the Gaussian exemplar-based likelihood model (Eq. (27)). In the general recognition model, however, rather than probability matching (as in Eq. (27)), subjects would respond with the category that maximizes likelihood (or summed similarity).

3. RELATION BETWEEN THE SIMILARITY CHOICE MODEL AND AN INDEPENDENT FEATURE ADDITION-DELETION MODEL

Sections 3 and 4 in this article are concerned with identification, in which stimuli are assigned unique responses rather than being classified in groups. As noted earlier, in application to identification, the decision rule in the context model (Eq. (1)) reduces to the similarity choice model (Eq. (2)). The point of Section 3 is to show that a particular process model of identification, in which individual features are probabilistically added to or deleted from an observer's percept, is a special case of the similarity choice model. Thus, once again, a model conceptualized in terms of "probability" is closely related to a "similarity"-based model.

An important type of identification paradigm involves the use of stimuli composed of discrete features, with each feature being either present or absent (Garner, 1978). Because of either external noise or internal noise in the information-

TABLE II

Independent Feature Addition-Deletion Model

                                      Response
Stimulus   ∅                  x                  y                  xy
∅          (1−a_x)(1−a_y)     a_x(1−a_y)         (1−a_x)a_y         a_x a_y
x          d_x(1−a_y)         (1−d_x)(1−a_y)     d_x a_y            (1−d_x)a_y
y          (1−a_x)d_y         a_x d_y            (1−a_x)(1−d_y)     a_x(1−d_y)
xy         d_x d_y            (1−d_x)d_y         d_x(1−d_y)         (1−d_x)(1−d_y)

Note. Entry in cell (i, j) gives the probability that Stimulus i is perceived as Stimulus j. a_m = probability that feature m is added; d_m = probability that feature m is deleted.



processing system, a given stimulus may not be perceived veridically. Assume, in particular, that there is some probability a_m that feature m is added to the percept (assuming that it is not in the original stimulus). These added features would correspond to "ghost" features, such as those studied by Townsend and Ashby (1982). Likewise, assume that there is some probability d_m that feature m is deleted from the percept (assuming that it is in the original stimulus). Such deletions might be the result of "state limitations," such as those studied by Garner (1974). Furthermore, we assume that these additions and deletions take place independently (although empirical research suggests that this assumption may be wrong, e.g., Townsend, Hu, & Evans, 1984). So, for example, for a set consisting of the null stimulus (∅), the stimulus composed of only feature x, the stimulus composed of only feature y, and the stimulus composed of both x and y, the identification confusion matrix would be as illustrated in Table II.

More generally, for any powerset constructed from M features (e.g., M = 2 in Table II), the independent feature addition-deletion model is defined as follows:

DEFINITION 3.1. Let i_m = 1 if feature m is present in Stimulus i, and let i_m = 0 if feature m is absent (and likewise for j_m). According to the independent feature addition-deletion model, the probability that Stimulus i is identified as Stimulus j, P(R_j | S_i), is given by

P(R_j | S_i) = ∏_{m=1}^{M} f(i_m, j_m),    (29)

where

f(i_m, j_m) = a_m        if i_m = 0 and j_m = 1
            = d_m        if i_m = 1 and j_m = 0
            = (1 − a_m)  if i_m = 0 and j_m = 0
            = (1 − d_m)  if i_m = 1 and j_m = 1.    (30)

An interesting aspect of this independent feature addition-deletion model is that it can produce patterns of confusions that are highly asymmetric. For example, suppose that in Table II, a_m = .1 and d_m = .4 for both features. Then the probability with which ∅ is identified as xy would be only .01, whereas the probability with which xy is identified as ∅ would be .16. Indeed, Garner and Haun (1978) used a set of discrete-feature stimuli with a logical structure corresponding to the illustration in Table II, and observed highly asymmetric confusions. In a condition in which external noise was used to induce confusions, the direction of errors was from stimuli with fewer features to stimuli with more features. The opposite direction of errors was observed in a condition in which short exposure durations were used to induce confusions.
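The Table II probabilities and the asymmetry example above can be reproduced with a short Python sketch (the function name is ours, introduced only for illustration):

```python
from itertools import product

def addition_deletion_matrix(a, d):
    """Confusion probabilities P(percept j | stimulus i) over the powerset
    of M features, with independent additions a[m] and deletions d[m]."""
    M = len(a)
    stimuli = list(product([0, 1], repeat=M))   # all 2^M feature patterns

    def f(im, jm, m):
        if im == 0 and jm == 1:
            return a[m]          # feature m added to the percept
        if im == 1 and jm == 0:
            return d[m]          # feature m deleted from the percept
        if im == 0 and jm == 0:
            return 1 - a[m]      # correctly absent
        return 1 - d[m]          # correctly present

    P = {}
    for i in stimuli:
        for j in stimuli:
            prob = 1.0
            for m in range(M):
                prob *= f(i[m], j[m], m)
            P[(i, j)] = prob
    return P

# Example from the text: a_m = .1 and d_m = .4 for both features.
P = addition_deletion_matrix([0.1, 0.1], [0.4, 0.4])
print(P[((0, 0), (1, 1))])   # null stimulus identified as xy: a_x * a_y, about .01
print(P[((1, 1), (0, 0))])   # xy identified as null: d_x * d_y, about .16
```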

I next show that for any powerset constructed from M features, the independent



feature addition-deletion model is a special case of the similarity choice model.¹ In addition to linking a "probability" model and a "similarity" model, this result is interesting in view of the symmetric-similarity assumption of the choice model (i.e., η_ij = η_ji). As will be seen, the asymmetric patterns of confusions that can be produced by the addition-deletion model are all characterizable in terms of differential bias in the choice model. Smith (1980) analyzed Garner and Haun's (1978) data in terms of the choice model, and found that the observed asymmetries were indeed characterizable in terms of differential bias. The relation between the choice model and the present addition-deletion model, however, was not part of his analysis.

THEOREM 3.1. The independent-feature addition-deletion model is a special case of the similarity choice model.

Proof. To prove the formal correspondence between the models, we make use of the well-known result that the similarity choice model holds if and only if the following cycle condition is satisfied for all triples i, j, and k:

P(R_j | S_i) P(R_k | S_j) P(R_i | S_k) = P(R_k | S_i) P(R_j | S_k) P(R_i | S_j)    (31)

(e.g., Bishop, Fienberg, & Holland, 1975, p. 287; Smith, 1982). Thus, to prove that the addition-deletion model is a special case of the choice model, it suffices to prove that the addition-deletion model implies the cycle condition.

Substituting the addition-deletion model expression for P(R_j | S_i) (Eq. (30)) into Eq. (31), we need to prove

∏_{m=1}^{M} f(i_m, j_m) ∏_{m=1}^{M} f(j_m, k_m) ∏_{m=1}^{M} f(k_m, i_m) = ∏_{m=1}^{M} f(i_m, k_m) ∏_{m=1}^{M} f(k_m, j_m) ∏_{m=1}^{M} f(j_m, i_m).    (32)

Equation (32) can be rewritten as

∏_{m=1}^{M} f(i_m, j_m) f(j_m, k_m) f(k_m, i_m) = ∏_{m=1}^{M} f(i_m, k_m) f(k_m, j_m) f(j_m, i_m).    (33)

Equation (33) holds because, as is shown next, for all m,

f(i_m, j_m) f(j_m, k_m) f(k_m, i_m) = f(i_m, k_m) f(k_m, j_m) f(j_m, i_m).    (34)

There are two main cases. Either i_m = j_m = k_m, in which case Eq. (34) holds vacuously, or else exactly two of i_m, j_m, and k_m are equal (because each of i_m, j_m,

¹ F. G. Ashby (personal communication, December 1988) points out that for powersets constructed from M features, the independent-feature addition-deletion model is also a special case of the general recognition theory (Ashby & Townsend, 1986).



and k_m is equal to either 0 or 1). Without loss of generality, assume i_m = j_m ≠ k_m. Substituting into Eq. (34) gives

f(i_m, i_m) f(i_m, k_m) f(k_m, i_m) = f(i_m, k_m) f(k_m, i_m) f(i_m, i_m),

and the proof is complete. ∎
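The cycle-condition argument can also be verified numerically. The following sketch (the parameter values are arbitrary) checks Eq. (31) for every triple of stimuli in a three-feature powerset:

```python
from itertools import product

def confusion_matrix(a, d):
    # P(R_j | S_i) for the full powerset, per Definition 3.1.
    M = len(a)
    stim = list(product([0, 1], repeat=M))

    def f(im, jm, m):
        if (im, jm) == (0, 1):
            return a[m]          # addition
        if (im, jm) == (1, 0):
            return d[m]          # deletion
        return (1 - a[m]) if im == 0 else (1 - d[m])

    P = {}
    for i in stim:
        for j in stim:
            p = 1.0
            for m in range(M):
                p *= f(i[m], j[m], m)
            P[(i, j)] = p
    return P, stim

# Arbitrary addition/deletion probabilities for three features.
P, stim = confusion_matrix([0.12, 0.31, 0.05], [0.44, 0.09, 0.27])

# Eq. (31): the cycle condition holds for every triple (i, j, k).
for i, j, k in product(stim, repeat=3):
    lhs = P[(i, j)] * P[(j, k)] * P[(k, i)]
    rhs = P[(i, k)] * P[(k, j)] * P[(j, i)]
    assert abs(lhs - rhs) < 1e-12
print("cycle condition verified for all", len(stim) ** 3, "triples")
```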

The formal correspondence between the choice model similarity and bias parameters and the addition-deletion parameters is as follows. Let I denote the set of features composing stimulus i, and let J denote the set of features composing stimulus j. Then we have

THEOREM 3.2. The choice model similarity parameters are expressed in terms of the addition-deletion parameters as

η_ij = ∏_{m ∈ [I∪J] − [I∩J]} [ a_m d_m / ((1 − a_m)(1 − d_m)) ]^{1/2}.    (35)

(Note that [I∪J] − [I∩J] corresponds to the distinctive features of i and j.)

Thus, assuming a_m < .5 and d_m < .5 for all m, the (symmetric) similarity between stimuli i and j decreases as the number of distinctive features increases.

THEOREM 3.3. The choice model bias parameters are expressed in terms of the addition-deletion parameters as

b_j / b_∅ = ∏_{m ∈ J} [ a_m (1 − d_m) / (d_m (1 − a_m)) ]^{1/2},    (36)

where b_∅ denotes the bias for the null stimulus.

Thus, if additions tend to be more probable than deletions, bias is larger for stimuli with more features, and vice versa if deletions tend to be more probable than additions. Note that in the present case the bias parameters reflect perceptual processes as opposed to purely response-related processes.
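Theorems 3.2 and 3.3 can be checked together in a brief sketch (parameter values arbitrary): computing η_ij from Eq. (35) and b_j/b_∅ from Eq. (36) and plugging them into the choice-model ratio rule reproduces the addition-deletion confusion probabilities exactly.

```python
from itertools import product
from math import sqrt, isclose

# Arbitrary (illustrative) addition and deletion probabilities.
a, d = [0.1, 0.3], [0.4, 0.2]
M = len(a)
stim = list(product([0, 1], repeat=M))

def f(im, jm, m):
    if (im, jm) == (0, 1):
        return a[m]
    if (im, jm) == (1, 0):
        return d[m]
    return (1 - a[m]) if im == 0 else (1 - d[m])

def p_adddel(i, j):
    # Addition-deletion probability, Definition 3.1.
    p = 1.0
    for m in range(M):
        p *= f(i[m], j[m], m)
    return p

def eta(i, j):
    # Theorem 3.2: product over the distinctive features of i and j.
    s = 1.0
    for m in range(M):
        if i[m] != j[m]:
            s *= sqrt(a[m] * d[m] / ((1 - a[m]) * (1 - d[m])))
    return s

def bias(j):
    # Theorem 3.3: b_j / b_null, product over the features of j.
    b = 1.0
    for m in range(M):
        if j[m] == 1:
            b *= sqrt(a[m] * (1 - d[m]) / (d[m] * (1 - a[m])))
    return b

# The choice model with these eta's and b's reproduces the
# addition-deletion confusion probabilities exactly.
for i in stim:
    denom = sum(eta(i, k) * bias(k) for k in stim)
    for j in stim:
        assert isclose(eta(i, j) * bias(j) / denom, p_adddel(i, j))
print("choice-model parameterization reproduces the addition-deletion model")
```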

Proof of Theorem 3.2. The similarity parameters in the choice model are expressed in terms of predicted cell probabilities as

η_ij = [ P(R_j | S_i) P(R_i | S_j) / (P(R_i | S_i) P(R_j | S_j)) ]^{1/2}.    (37)

Substituting in terms of the addition-deletion parameters gives

η_ij = [ ∏_{m=1}^{M} f(i_m, j_m) ∏_{m=1}^{M} f(j_m, i_m) / (∏_{m=1}^{M} f(i_m, i_m) ∏_{m=1}^{M} f(j_m, j_m)) ]^{1/2}
     = [ ∏_{m=1}^{M} f(i_m, j_m) f(j_m, i_m) / (f(i_m, i_m) f(j_m, j_m)) ]^{1/2}.    (38)



The product in Eq. (38) can be factored into two subproducts, one corresponding to values of m for which i_m ≠ j_m (i.e., the distinctive feature values for Stimuli i and j), and the other to values of m for which i_m = j_m (i.e., the common feature values for Stimuli i and j):

η_ij = [ ∏_{i_m ≠ j_m} f(i_m, j_m) f(j_m, i_m) / (f(i_m, i_m) f(j_m, j_m)) · ∏_{i_m = j_m} f(i_m, j_m) f(j_m, i_m) / (f(i_m, i_m) f(j_m, j_m)) ]^{1/2}.    (39)

All factors in the second subproduct are equal to one, and so we have

η_ij = [ ∏_{i_m ≠ j_m} f(i_m, j_m) f(j_m, i_m) / (f(i_m, i_m) f(j_m, j_m)) ]^{1/2}.    (40)

But if i_m ≠ j_m, then exactly one of f(i_m, j_m) and f(j_m, i_m) is equal to a_m and the other equal to d_m, and exactly one of f(i_m, i_m) and f(j_m, j_m) is equal to (1 − a_m) and the other equal to (1 − d_m) (see Eq. (30) in Definition 3.1). ∎

Proof of Theorem 3.3. The ratio b_j/b_i in the choice model is expressed in terms of predicted cell probabilities as

b_j / b_i = [ P(R_j | S_i) P(R_j | S_j) / (P(R_i | S_j) P(R_i | S_i)) ]^{1/2}.    (41)

Substituting in terms of the addition-deletion parameters yields

b_j / b_i = [ ∏_{m=1}^{M} f(i_m, j_m) ∏_{m=1}^{M} f(j_m, j_m) / (∏_{m=1}^{M} f(j_m, i_m) ∏_{m=1}^{M} f(i_m, i_m)) ]^{1/2}
          = [ ∏_{m=1}^{M} f(i_m, j_m) f(j_m, j_m) / (f(j_m, i_m) f(i_m, i_m)) ]^{1/2}.    (42)

Equation (42) can be factored into two subproducts, one corresponding to values of m for which j_m = 1 (i.e., the set of features J that compose Stimulus j), and the other corresponding to values of m for which j_m = 0 (i.e., m ∉ J). Furthermore, for the null stimulus ∅, i_m = 0 for all m. This yields

b_j / b_∅ = [ ∏_{m ∈ J} f(0, 1) f(1, 1) / (f(1, 0) f(0, 0)) · ∏_{m ∉ J} f(0, 0) f(0, 0) / (f(0, 0) f(0, 0)) ]^{1/2}.    (43)

All factors in the second subproduct are equal to one, and we are left with

b_j / b_∅ = ∏_{m ∈ J} [ a_m (1 − d_m) / (d_m (1 − a_m)) ]^{1/2}.    (44)  ∎

4. RELATION BETWEEN THE SIMILARITY CHOICE MODEL AND A PERCEPTION/LIKELIHOOD-BASED DECISION MODEL

The relation between the choice model and the addition-deletion model that was considered in Section 3 was limited to situations involving complete powersets. If an



incomplete powerset were used, there would be trials in which the addition-deletion process would yield a percept for which there was no corresponding response. Clearly, some decision process would need to be appended to the perceptual model to allow it to make predictions in such more general situations.

Section 4 considers a perception/decision model in which decisions are made on the basis of likelihood. Let PS denote the complete powerset that would be generated from the base set of features that is used in the experiment, and let H ⊆ PS denote the subset of actual stimuli that is used. Let h_ix denote the probability that Stimulus i is perceived to be stimulus x ∈ PS, and let D_xj denote the probability with which a decision is made that percept x was generated from Stimulus j ∈ H. Thus, the probability that Stimulus i is identified as Stimulus j, P(R_j | S_i), would be given by

P(R_j | S_i) = Σ_{x ∈ PS} h_ix D_xj.    (45)

In the present model, the subject is assumed to make decisions by estimating the likelihood that x was generated from the alternative members of the set H, and to choose responses by matching to these likelihoods. Assuming that the decision system computes likelihood estimates without bias, and assuming that stimulus base rates are equal, then

D_xj = h_jx / Σ_{k ∈ H} h_kx.    (46)

Substituting into Eq. (45) gives

P(R_j | S_i) = Σ_{x ∈ PS} h_ix h_jx / ( Σ_{k ∈ H} h_kx ).    (47)

THEOREM 4.1. The perception/likelihood-based decision model (Eq. (47)) generates predictions of identification confusions that are characterizable in terms of the similarity choice model.

Proof. The cycle condition is obviously satisfied because, for all i and j, P(R_j | S_i) = P(R_i | S_j). ∎

More generally, assume that decisions are made using biased likelihood estimates, and also that stimulus base rates differ. Then

P(R_j | S_i) = b_j* p_ij*,    (48)


where

p_ij* = Σ_{x ∈ PS} h_ix h_jx / ( Σ_{k ∈ H} b_k* h_kx ).    (49)

THEOREM 4.2. The biased perception/likelihood-based decision model with unequal stimulus base rates (Eq. (48)) generates predictions of identification confusions that are characterizable in terms of the similarity choice model.

Proof. First, note that p_ij* = p_ji*. Again, the cycle condition is satisfied, because

P(R_j | S_i) P(R_k | S_j) P(R_i | S_k) = b_j* p_ij* b_k* p_jk* b_i* p_ki*
    = b_k* p_ik* b_j* p_kj* b_i* p_ji*    (50)
    = P(R_k | S_i) P(R_j | S_k) P(R_i | S_j). ∎

Thus, for any system in which identification responses are made by combining a biased likelihood-based decision rule with an underlying perceptual process, the confusion data are characterizable in terms of the choice model. Note that the addition-deletion model provides but one example of an underlying perceptual process: the formal identity noted above holds regardless of the perceptual process that generates the h_ix's.
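A numerical sketch makes the point concrete. Here the perceptual front end happens to be the independent addition-deletion process, the stimulus set H is an incomplete subset of the percept powerset PS, and the decision biases are arbitrary made-up values; the resulting confusion matrix nevertheless satisfies the cycle condition, as Theorem 4.2 requires.

```python
from itertools import product

M = 2
PS = list(product([0, 1], repeat=M))     # complete powerset of percepts
H = [(0, 0), (1, 0), (1, 1)]             # incomplete stimulus set (a proper subset of PS)
a, d = [0.2, 0.1], [0.3, 0.25]           # one choice of perceptual process:
                                         # independent addition-deletion

def h(i, x):
    # h_ix: probability that stimulus i gives rise to percept x.
    p = 1.0
    for m in range(M):
        if i[m] == 0:
            p *= a[m] if x[m] == 1 else 1 - a[m]
        else:
            p *= d[m] if x[m] == 0 else 1 - d[m]
    return p

b = {(0, 0): 1.0, (1, 0): 2.0, (1, 1): 0.5}   # arbitrary decision biases b*_j

def P(i, j):
    # Eq. (48): biased likelihood-based decision combined with perception.
    return sum(h(i, x) * b[j] * h(j, x) / sum(b[k] * h(k, x) for k in H)
               for x in PS)

# Responses exhaust H, so each row of the confusion matrix sums to one,
# and the cycle condition (Theorem 4.2) holds for every triple.
for i in H:
    assert abs(sum(P(i, j) for j in H) - 1.0) < 1e-12
for i, j, k in product(H, repeat=3):
    assert abs(P(i, j) * P(j, k) * P(k, i) - P(i, k) * P(k, j) * P(j, i)) < 1e-12
print("choice-model representability confirmed")
```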

SUMMARY

The purpose of this article was to demonstrate relations between the similarity choice model, the context model, and likelihood-based models of classification. The exemplar-similarity choice model approach has a long-standing history of success in making predictions of classification performance. The similarity choice model for predicting identification continues to serve as a standard against which alternative models of identification are compared (e.g., Ashby & Perrin, 1988; Smith, 1980; Townsend & Landon, 1983), and its recent extensions for predicting categorization also appear quite promising (e.g., Medin & Schaffer, 1978; Nosofsky, 1984, 1986). A number of researchers have demonstrated formal relations between choice-model formulations and other classes of confusion models, including sophisticated guessing models and Thurstonian models using a maximal-match response rule (e.g., Smith, 1980; Townsend & Landon, 1982, 1983; van Santen & Bamber, 1981). The present research follows in the spirit of this earlier work by demonstrating relations between choice-model formulations and likelihood-based models of classification. At least in part, the apparent ubiquity of the exemplar-similarity choice-model approach may derive from its close formal relations with these alternative classes of models.


[Figure A1: box diagram relating the models. Boxes include the GAUSSIAN/EUCLIDEAN MDS-CONTEXT MODEL, the SIMILARITY-LIKELIHOOD MODEL, the BINARY-VALUED EXEMPLAR-BASED LIKELIHOOD MODEL, and the GAUSSIAN EXEMPLAR-BASED LIKELIHOOD MODEL.]

FIG. A1. Summary of relations among models. Horizontal dashed lines indicate models that are formally identical to one another. Horizontal dashed lines labeled with an "i" indicate models that are formally identical when category distributions are defined over independent dimensions. Solid lines with arrows indicate nesting relations among models (the model that is the source of the arrow is the more general model, and the model that receives the arrow is the special case).



TABLE AI

Glossary of Terms

a_m: probability with which feature m is added to an observer's percept
b_j: bias parameter associated with identification-response j
b_J: bias parameter associated with categorization-response J
C_J: category J
d_ij: distance between stimuli i and j
d_m: probability with which feature m is deleted from an observer's percept
D_xj: probability with which a decision is made that percept x was generated by stimulus j
f(i_m, j_m): factor in the independent-feature addition-deletion model that denotes the probability with which additions and deletions take place (see Eq. (30))
h_ix: probability that stimulus i is perceived as percept x
H: subset of stimuli used in an identification experiment
i_m: value of stimulus i on dimension m
i = (i_1, i_2, ..., i_M): set of dimension values that composes stimulus i
J: set of features composing stimulus j
L(C_J): likelihood of category J
L(S_j | C_J): likelihood of stimulus j given a category J trial
L(j, J): likelihood that stimulus j is presented in conjunction with category J; L(j, J) = L(S_j | C_J) L(C_J)
L_e(S_i | C_J): estimated likelihood that stimulus i is presented on a category J trial for the independent-feature likelihood model
L*(S_i | C_J): similarity-likelihood of stimulus i given category J
M: number of dimensions composing the stimuli
V_m: number of values on dimension m
p_m: probability that the value on dimension m is transformed to its alternative value in the binary-valued exemplar-based likelihood model
P(R_j | S_i): probability that stimulus i is identified as stimulus j
P(R_J | S_i): probability that stimulus i is classified in category J
PS: powerset of stimuli constructed from M features
R_j: identification response j
R_J: categorization response J
s_m: similarity parameter on dimension m for the binary-valued version of the context model
s(i_m, j_m): similarity between values i_m and j_m on dimension m for the multivalued version of the context model
S_i: Stimulus i
w_m: weight given to dimension m in computing distance
δ_m(i, j): indicator variable equal to one if stimuli i and j mismatch on dimension m, and equal to zero otherwise
η_ij: similarity between stimuli i and j
ℓ(i_m | C_J): likelihood of dimension-value i_m given category J
ℓ*(i_m | C_J): similarity-likelihood of dimension-value i_m given category J
μ_jm: mean of stimulus-distribution j on dimension m
σ_m²: variance on dimension m
∅: the null stimulus



TABLE AII

Summary of Definitions of Models

Categorization-Strength Terms for P(R_J | S_i)

Context model: b_J Σ_{j ∈ C_J} η_ij

Binary-valued context model: Context model with η_ij = ∏_{m=1}^{M} s_m^{δ_m(i,j)}

Multiplicative-similarity context model: Context model with η_ij = ∏_{m=1}^{M} s(i_m, j_m)

MDS-context model: Context model with η_ij = exp(−d_ij^p), d_ij = [ Σ_{m=1}^{M} w_m |i_m − j_m|^r ]^{1/r}

Gaussian/Euclidean MDS-context model: MDS-context model with p = r = 2

Independent-feature likelihood model: b_J L_e(S_i | C_J) L(C_J), with L_e(S_i | C_J) = ∏_{m=1}^{M} ℓ(i_m | C_J)

Independent-feature similarity-likelihood model: b_J L*(S_i | C_J) L(C_J), with L*(S_i | C_J) = L*(i_1, i_2, ..., i_M | C_J) = ∏_{m=1}^{M} ℓ*(i_m | C_J), ℓ*(i_m | C_J) = Σ_{v=1}^{V_m} ℓ(v | C_J) s(i_m, v)

MDS-based independent-feature similarity-likelihood model: Independent-feature similarity-likelihood model with ...

Exemplar-based likelihood model: b_J Σ_{j ∈ C_J} L(S_i | S_j)

Binary-valued exemplar-based likelihood model: Exemplar-based likelihood model with L(S_i | S_j) = ∏_{m=1}^{M} p_m^{δ_m(i,j)} (1 − p_m)^{1 − δ_m(i,j)}

Gaussian exemplar-based likelihood model: Exemplar-based likelihood model with L(S_i | S_j) = ∏_{m=1}^{M} [1/(√(2π) σ_m)] exp[−(i_m − μ_jm)²/(2σ_m²)]

Identification-Strength Terms for P(R_j | S_i)

Similarity choice model: b_j η_ij

Independent-feature addition-deletion model: ∏_{m=1}^{M} f(i_m, j_m), with f(i_m, j_m) = a_m if i_m = 0 and j_m = 1; d_m if i_m = 1 and j_m = 0; (1 − a_m) if i_m = 0 and j_m = 0; (1 − d_m) if i_m = 1 and j_m = 1

Perception/likelihood-based decision model: Σ_{x ∈ PS} h_ix b_j* h_jx / ( Σ_{k ∈ H} b_k* h_kx )

APPENDIX

This appendix provides two tables and a figure to summarize the definitions of and relations among the various models. Table AI is a glossary of terms. Table AII summarizes the definitions of the models. Note that all models predict probabilities in n × m confusion matrices, where n is the number of stimuli and m the number of responses. For the categorization models, m < n; and for the identification models, m = n. The definitions of the models are summarized by listing, for each model, the "strength" with which Stimulus i is classified in Category J (or, for the identification models, the strength with which Stimulus i is identified as Stimulus j). To predict the probability with which Stimulus i is classified in Category J, one divides this strength by the sum of strengths with which Stimulus i is classified in all of the categories. The strengths listed for the independent-feature addition-deletion model and the perception/likelihood-based decision model are actual probabilities, so there is no need to normalize them. Figure A1 summarizes the relations among the various models.

ACKNOWLEDGMENTS

This work was supported by NSF Grant BNS 87-19938 to Indiana University. The work was stimulated by discussions that I had with Lisbeth Fried and Keith Smith while I was a Visiting Assistant Professor at the University of Michigan. I thank the members of the Human Performance Center for the opportunity that the semester provided. I also thank Greg Ashby, Danny Ennis, Tony Marley, and an anonymous reviewer for their criticisms of an earlier version of this article.



REFERENCES

ASHBY, F. G., & GOTT, R. E. (1988). Decision rules in the perception and classification of multidimensional stimuli. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 33-53.

ASHBY, F. G., & PERRIN, N. A. (1988). Toward a unified theory of similarity and recognition. Psychological Review, 95, 124-150.

ASHBY, F. G., & TOWNSEND, J. T. (1986). Varieties of perceptual independence. Psychological Review, 93, 154-179.

BISHOP, Y. M., FIENBERG, S. E., & HOLLAND, P. W. (1975). Discrete multivariate analysis: Theory and practice. Cambridge, MA: MIT Press.

CARROLL, J. D., & WISH, M. (1974). Models and methods for three-way multidimensional scaling. In D. H. Krantz, R. C. Atkinson, R. D. Luce, & P. Suppes (Eds.), Contemporary developments in mathematical psychology, Vol. 2. San Francisco: Freeman.

ESTES, W. K. (1986). Array models for category learning. Cognitive Psychology, 18, 500-549.

FRIED, L. S., & HOLYOAK, K. J. (1984). Induction of category distributions: A framework for classification learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 234-257.

GARNER, W. R. (1974). The processing of information and structure. New York: Wiley.

GARNER, W. R. (1978). Aspects of a stimulus: Features, dimensions, and configurations. In E. Rosch & B. B. Lloyd (Eds.), Cognition and categorization. Hillsdale, NJ: Erlbaum.

GARNER, W. R., & HAUN, F. (1978). Letter identification as a function of type of perceptual limitation and type of attribute. Journal of Experimental Psychology: Human Perception and Performance, 4, 199-209.

LUCE, R. D. (1963). Detection and recognition. In R. D. Luce, R. R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology (pp. 103-189). New York: Wiley.

MEDIN, D. L., & SCHAFFER, M. M. (1978). Context theory of classification learning. Psychological Review, 85, 207-238.

NOSOFSKY, R. M. (1984). Choice, similarity, and the context theory of classification. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 104-114.

NOSOFSKY, R. M. (1985). Overall similarity and the identification of separable-dimension stimuli: A choice model analysis. Perception & Psychophysics, 38, 415-432.

NOSOFSKY, R. M. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General, 115, 39-57.

NOSOFSKY, R. M. (1988). Similarity, frequency, and category representations. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 54-65.

REED, S. K. (1972). Pattern recognition and categorization. Cognitive Psychology, 3, 382-407.

SHEPARD, R. N. (1957). Stimulus and response generalization: A stochastic model relating generalization to distance in psychological space. Psychometrika, 22, 325-345.

SHEPARD, R. N. (1958). Stimulus and response generalization: Deduction of the generalization gradient from a trace model. Psychological Review, 65, 242-256.

SHEPARD, R. N. (1987). Toward a universal law of generalization for psychological science. Science, 237, 1317-1323.

SMITH, J. E. K. (1980). Models of identification. In R. Nickerson (Ed.), Attention and performance VIII. Hillsdale, NJ: Erlbaum.

SMITH, J. E. K. (1982). Recognition models evaluated: A commentary on Keren and Baggen. Perception & Psychophysics, 31, 183-189.

TOWNSEND, J. T., & ASHBY, F. G. (1982). Experimental tests of contemporary mathematical models of visual letter recognition. Journal of Experimental Psychology: Human Perception and Performance, 8, 834-864.

TOWNSEND, J. T., HU, G. G., & EVANS, R. J. (1984). Modeling feature perception in brief displays with evidence for positive interdependencies. Perception & Psychophysics, 36, 35-49.

TOWNSEND, J. T., & LANDON, D. E. (1982). An experimental and theoretical investigation of the constant-ratio rule and other models of visual letter confusion. Journal of Mathematical Psychology, 25, 119-162.

TOWNSEND, J. T., & LANDON, D. E. (1983). Mathematical models of recognition and confusion in psychology. Mathematical Social Sciences, 4, 25-71.

VAN SANTEN, J. P. H., & BAMBER, D. (1981). Finite and infinite state confusion models. Journal of Mathematical Psychology, 24, 101-111.

RECEIVED: July 11, 1988