JOURNAL OF MATHEMATICAL PSYCHOLOGY 34, 393-418 (1990)
Relations between Exemplar-Similarity and Likelihood Models of Classification
ROBERT M. NOSOFSKY
Indiana University
The similarity choice model of identification (Luce, 1963; Shepard, 1957) and the context model of categorization (Medin & Schaffer, 1978; Nosofsky, 1986) are shown to be closely related to a variety of likelihood-based models. In particular, it is shown that: (1) for category distributions defined over independent dimensions, general versions of the context model and Estes' (1986) similarity-likelihood model are formally identical; (2) the context model and similarity choice model can be given interpretations as exemplar-based likelihood models; (3) an independent feature addition-deletion model is a special case of the similarity choice model; and (4) a perception/likelihood-based decision model of identification generates predictions that are characterizable by the similarity choice model. © 1990 Academic Press, Inc.
Two main classes of models that have been formulated to provide quantitative predictions of categorization performance are "exemplar-similarity" models and "feature-probability" or "likelihood" models (e.g., Ashby & Perrin, 1988; Estes, 1986; Fried & Holyoak, 1984; Medin & Schaffer, 1978; Nosofsky, 1986; Reed, 1972). According to exemplar-similarity models, the observer stores individual traces of category exemplars in memory, and makes classification decisions on the basis of similarity comparisons with the stored exemplars. According to likelihood models, the observer makes likelihood estimates that a particular probe was generated from alternative category distributions, and makes classification decisions on the basis of these likelihood estimates.
The purpose of the present article is to demonstrate formal relations between a particular exemplar-similarity model and a variety of likelihood-based models. The exemplar-similarity model that is considered is the context model proposed by Medin and Schaffer (1978) and generalized by Nosofsky (1986). In the special case in which each stimulus defines its own category, the decision rule in the context model reduces to that of the classic similarity choice model for predicting identification confusion data (e.g., Luce, 1963; Shepard, 1957; Smith, 1980; Townsend & Landon, 1982). In addition to considering categorization, the present article will address relations between the similarity choice model and likelihood-based models of identification.
Reprint requests should be sent to Dr. Robert M. Nosofsky, Department of Psychology, Indiana University, Bloomington, IN 47405.
393 0022-2496/90 $3.00
Copyright © 1990 by Academic Press, Inc. All rights of reproduction in any form reserved.
394 ROBERT M. NOSOFSKY
Organization of the Article
In recent theoretical work, Estes (1986) showed that a baseline version of the context model and an independent-feature "similarity-likelihood" model are formally identical under certain conditions. In Part 1 of this article I extend Estes' (1986) work by showing that the formal equivalence holds under more general conditions. In Part 2, I show that the context model and similarity choice model can be given interpretations as exemplar-based likelihood models. Part 3 shows that an independent feature addition-deletion model of identification performance is a special case of the similarity choice model; while Part 4 shows that a fairly general perceptual identification model involving likelihood-based decision processes generates predictions that are characterizable in terms of the similarity choice model. A central theme running throughout the article concerns the close correspondence between classification models based on "similarity" and those based on "likelihood" or "probability."
Because a variety of closely related models are considered, tables and a figure are provided in the appendix that summarize the definitions of the models, the terms that appear in them, and the relations among the models.
Exemplar-Similarity Model
According to the context model (Estes, 1986; Medin & Schaffer, 1978; Nosofsky, 1984, 1986), the strength of making a Category J response (R_J) given presentation of Stimulus i (S_i) is found by summing the (weighted) similarity of Stimulus i to all presented exemplars of Category J (C_J), and then multiplying by the response bias for Category J. This strength is then divided by the sum of strengths for all categories to determine the conditional probability with which Stimulus i is classified in Category J,
$$P(R_J \mid S_i) = \frac{b_J \sum_{j \in C_J} L(j, J)\,\eta_{ij}}{\sum_K b_K \sum_{k \in C_K} L(k, K)\,\eta_{ik}}, \tag{1}$$
where η_ij (η_ij = η_ji, η_ii = 1) gives the similarity between exemplars i and j, b_J (0 ≤ b_J ≤ 1, Σ b_K = 1) is the bias associated with Category J, and L(j, J) is the relative frequency (likelihood) with which Exemplar j is presented during training in conjunction with Category J. (Here and throughout the remainder of this article I use lowercase letters as subscripts to index individual stimuli, identification responses, and stimulus-level parameters; whereas uppercase letters are used as subscripts to index categories, categorization responses, and category-level parameters.) Note that L(j, J) in Eq. (1) is given by L(j, J) = L(S_j | C_J) L(C_J), where L(S_j | C_J) is the likelihood that Stimulus j is presented given a Category J trial, and L(C_J) is the Category J base rate.
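As an illustrative sketch (not part of the original article), Eq. (1) can be computed with a few lines of matrix arithmetic; the function name, array layout, and toy example below are my own assumptions:

```python
import numpy as np

def context_model(eta, L, b):
    """Context model choice probabilities (Eq. (1)).

    eta[i, j] : similarity between exemplars i and j (eta_ij)
    L[j, J]   : relative frequency L(j, J) of exemplar j with Category J
                (zero when j is not a presented exemplar of Category J)
    b[J]      : response bias b_J for Category J
    Returns P[i, J], the probability that Stimulus i is classified in Category J.
    """
    strength = (eta @ L) * b  # strength[i, J] = b_J * sum_j L(j, J) * eta_ij
    return strength / strength.sum(axis=1, keepdims=True)
```

For instance, with two stimuli that each define their own category (L the identity matrix) and equal biases, the rule reduces to the similarity choice model discussed below.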
In the special case in which each stimulus defines its own category (i.e., an identification experiment), Eq. (1) reduces to

$$P(R_j \mid S_i) = \frac{b_j\, \eta_{ij}}{\sum_k b_k\, \eta_{ik}}, \tag{2}$$
which is the similarity choice model. I have not included the L(j, j) relative frequency terms in Eq. (2) because, with respect to predicting identification confusion data, the frequency-sensitive and frequency-insensitive versions of the model cannot be distinguished. The reason is that any influence of the relative frequency terms can be absorbed by the identification response bias parameters (b_j). In applications to categorization confusion experiments (Eq. (1)), however, the relative frequency terms are critical. Manipulation of presentation frequencies of individual exemplars has been shown to exert dramatic influence on patterns of categorization confusions in a manner that is interpretable in terms of the relative frequency terms but not in terms of global changes in overall category response bias (Nosofsky, 1988).
In the original experiments that tested applications of the context model, Medin and Schaffer (1978) used stimuli varying along binary-valued dimensions, and assumed that the similarity between exemplars i and j was given by

$$\eta_{ij} = \prod_{m=1}^{M} s_m^{\delta_m(i,j)}, \tag{3}$$
where s_m (0 < s_m < 1) is a (similarity) parameter reflecting the salience of a given dimension, M is the number of dimensions composing the stimuli, and δ_m(i, j) is an indicator variable equal to one if exemplars i and j mismatch on dimension m, and equal to zero if they match on dimension m. I will refer to Eqs. (1) and (3) taken together as the binary-valued version of the context model.
Nosofsky (1984, 1986) noted that the multiplicative similarity rule (Eq. (3)) can be viewed as a special case of a multidimensional scaling (MDS) approach to modeling similarity. Representing each exemplar as a point in a multidimensional psychological space, the distance between exemplars i and j is given by the weighted Minkowski power model formula (e.g., Carroll & Wish, 1974)

$$d_{ij} = \left[ \sum_{m=1}^{M} w_m \left| i_m - j_m \right|^r \right]^{1/r}, \tag{4}$$
where i_m is the psychological value of exemplar i on dimension m, and w_m ≥ 0 is the weight given to dimension m in computing distance. (The weights are often interpreted as representing selective attention processes; see Nosofsky, 1984, 1986.) Of course, when r = 1 in Eq. (4) we have the city-block metric, and when r = 2 we have the Euclidean metric. The distance d_ij is converted to a similarity measure using the transformation
$$\eta_{ij} = \exp(-d_{ij}^{\,p}). \tag{5}$$
Two special cases of Eq. (5) have received support in previous work: p = 1, which yields an exponential decay function, and p = 2, which yields a Gaussian function (Nosofsky, 1985; Shepard, 1958, 1987). It is straightforward to verify that when p = r in Eqs. (4) and (5), the result is an interdimensional multiplicative similarity rule that generalizes Medin and Schaffer's rule (Eq. (3)) (Nosofsky, 1986). For the case of binary-valued psychological dimensions, Eq. (3) is recovered exactly. Throughout this article, the similarity rule is assumed to be interdimensional multiplicative in form.
Because the MDS approach to modeling similarity (Eqs. (4) and (5)) generalizes Medin and Schaffer's (1978) multiplicative rule (Eq. (3)), Nosofsky (1986) referred to the system of Eqs. (1), (4), and (5) as the generalized context model (GCM). For purposes of discussing and relating the various models in this article, however, it is more convenient to refer to this system of equations as the MDS-context model. Specialized versions of the MDS-context model arise depending on the choice of similarity function and distance metric. For example, when r = 2 in Eq. (4) and p = 2 in Eq. (5), we have the Gaussian/Euclidean MDS-context model, and so forth.
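A minimal sketch of Eqs. (4) and (5) (the function and parameter names are mine, not the article's):

```python
import numpy as np

def mds_similarity(xi, xj, w, r=1.0, p=1.0):
    """Similarity eta_ij: weighted Minkowski distance (Eq. (4)) passed
    through the transform eta = exp(-d**p) (Eq. (5)).
    r = p = 1 gives the exponential/city-block model;
    r = p = 2 gives the Gaussian/Euclidean model."""
    xi, xj = np.asarray(xi, float), np.asarray(xj, float)
    d = np.sum(w * np.abs(xi - xj) ** r) ** (1.0 / r)
    return np.exp(-d ** p)
```

When p = r, the exponential of the (un-rooted) sum factors into a product of per-dimension similarities, which is the interdimensional multiplicative rule noted above.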
FEATURE-LIKELIHOOD MODELS
According to independent feature-likelihood models, subjects record in memory information about the relative frequency with which individual features were generated by alternative categories, and also store information about category base rates. For a given feature set composing Stimulus i, subjects would then compute the likelihood (using Bayes’ theorem) that it was generated by one of the alternative categories (on the assumption that individual features are generated independently). Categorization responses are made by probability matching to these likelihood estimates, with modulation by category response biases (e.g., Fried & Holyoak, 1984),
$$P(R_J \mid S_i) = \frac{b_J L_E(S_i \mid C_J)\, L(C_J)}{\sum_K b_K L_E(S_i \mid C_K)\, L(C_K)}, \tag{6}$$
where b_J is the Category J bias, L_E(S_i | C_J) gives the estimated likelihood that the feature set composing S_i is generated on a Category J trial, and L(C_J) is the Category J base rate. Let (i_1, i_2, ..., i_M) denote the set of features composing S_i, and let θ(i_m | C_J) denote the probability that feature i_m is generated on a Category J trial. In the independent-feature likelihood model, the estimate L_E(S_i | C_J) would be given by
$$L_E(S_i \mid C_J) = \prod_{m=1}^{M} \theta(i_m \mid C_J). \tag{7}$$
When categories are defined over independent probability distributions of features, the estimated likelihood of S_i given C_J is equal to the actual likelihood, L_E(S_i | C_J) = L(S_i | C_J). In this case, L_E(S_i | C_J) L(C_J) in Eq. (6) gives the likelihood with which Exemplar i is presented in conjunction with Category J, L(i, J), and Eq. (6) can be written as
$$P(R_J \mid S_i) = \frac{b_J L(i, J)}{\sum_K b_K L(i, K)}. \tag{8}$$
Clearly, the context model (Eq. (1)) yields Eq. (8) as a special case when η_ij = 0 for all i ≠ j (Estes, 1986). Thus, for categories defined over independent distributions of features, the context model generalizes the independent feature likelihood model.
1. RELATION BETWEEN THE CONTEXT MODEL AND AN INDEPENDENT-FEATURE SIMILARITY-LIKELIHOOD MODEL
Further comparability between the context model and the independent-feature likelihood model is limited, however, because the feature model does not include a similarity component. To achieve additional comparability, Estes (1986) proposed that psychological records of the relative frequencies of features may be influenced by similarities between features. To illustrate, suppose that Dimension 1 corresponds to color, with stimuli being either red or blue. Each time a red exemplar from Category 1 was presented, the appropriate feature counter would be incremented by one. But when a blue stimulus from Category 1 was presented, this feature counter would be incremented by s (0 ≤ s < 1), where s reflects the similarity between red and blue. By this process, the asymptotic "similarity-likelihood" of the feature red given Category 1 would be given by
$$\theta^*(\text{red} \mid C_1) = \theta(\text{red} \mid C_1) + s\,[\theta(\text{blue} \mid C_1)] = \theta(\text{red} \mid C_1) + s\,[1 - \theta(\text{red} \mid C_1)]. \tag{9}$$
The similarity-likelihood for each of the individual features composing an exemplar would then be combined as in Eq. (7) to compute the overall similarity-likelihood of the exemplar given the category, L*(S_i | C_J).
Estes (1986) restricted attention to stimuli varying along binary-valued dimensions. In the following, we generalize Estes’ (1986) presentation and provide a formal definition of the independent-feature similarity-likelihood model.
It is assumed that the stimuli vary along M multivalued dimensions. We denote by n_m the number of values on Dimension m. Because categories are defined over independent probability distributions of the values on the dimensions, there is a total of Π_{m=1}^{M} n_m stimuli. The similarity between values i_m and j_m on Dimension m is denoted by s_{i_m j_m}, with s_{i_m j_m} = s_{j_m i_m}, and s_{i_m i_m} = 1. (If the dimensions are binary-valued, and i_m ≠ j_m, then s_{i_m j_m} = s_m, the Dimension m similarity parameter in the binary-valued context model.)
DEFINITION 1.1. The similarity-likelihood of value i_m, given Category J, is given by

$$\theta^*(i_m \mid C_J) = \sum_{j_m=1}^{n_m} \theta(j_m \mid C_J)\, s_{i_m j_m}. \tag{10}$$
Note that if s_{i_m j_m} = 0 for all i_m ≠ j_m, then θ*(i_m | C_J) = θ(i_m | C_J). Also, if the dimensions are binary-valued, then θ*(i_m | C_J) = θ(i_m | C_J) + s_m[1 − θ(i_m | C_J)], as in the previous example (Eq. (9)).
DEFINITION 1.2. The overall similarity-likelihood of the feature set composing Stimulus i given Category J, L*(S_i | C_J), is given by

$$L^*(S_i \mid C_J) = L^*(i_1, i_2, \ldots, i_M \mid C_J) = \prod_{m=1}^{M} \theta^*(i_m \mid C_J). \tag{11}$$
DEFINITION 1.3. According to the independent-feature similarity-likelihood model, the probability that S_i is classified in C_J is given by

$$P(R_J \mid S_i) = \frac{b_J L^*(S_i \mid C_J)\, L(C_J)}{\sum_K b_K L^*(S_i \mid C_K)\, L(C_K)}. \tag{12}$$
Estes (1986) showed that for category distributions defined over independent dimensions, the multiplicative-similarity context model and independent-feature similarity-likelihood model are formally identical. However, his proof was limited to the case of stimuli varying along two binary-valued dimensions and for which similarity (s) was constant across dimensions. In this section I show that the formal identity holds for stimuli varying along M multivalued dimensions and for which separate similarity parameters (s_{i_m j_m}) are allowed for each dimension.
THEOREM 1. For stimuli varying along M multivalued dimensions, the context model and independent-feature similarity-likelihood model are formally identical if
(i) category distributions are defined over independent dimensions, such that

$$L(S_i \mid C_J) = L(i_1, i_2, \ldots, i_M \mid C_J) = \prod_{m=1}^{M} \theta(i_m \mid C_J);$$

(ii) the similarity between items i and j, η_ij, is given by the interdimensional multiplicative rule

$$\eta_{ij} = \prod_{m=1}^{M} s_{i_m j_m}.$$
Proof. We start by noting that the context model decision rule (Eq. (1)) can be written as

$$P(R_J \mid S_i) = \frac{b_J \sum_{j \in C_J} L(S_j \mid C_J)\, L(C_J)\, \eta_{ij}}{\sum_K b_K \sum_{k \in C_K} L(S_k \mid C_K)\, L(C_K)\, \eta_{ik}}. \tag{13}$$
Comparing Eqs. (12) and (13), we see that the formal identity between the models is established by showing that
$$L^*(S_i \mid C_J) = \sum_{j \in C_J} L(S_j \mid C_J)\, \eta_{ij}. \tag{14}$$
For the similarity-likelihood model, by Definitions 1.1 and 1.2,
$$L^*(S_i \mid C_J) = \prod_{m=1}^{M} \theta^*(i_m \mid C_J) = \prod_{m=1}^{M} \left[ \sum_{j_m=1}^{n_m} \theta(j_m \mid C_J)\, s_{i_m j_m} \right]. \tag{15}$$
For the context model, we sum the similarity of Stimulus i to all Stimuli j, weighted by their conditional likelihoods,

$$\sum_{j \in C_J} L(S_j \mid C_J)\, \eta_{ij} = \sum_{j_1=1}^{n_1} \sum_{j_2=1}^{n_2} \cdots \sum_{j_M=1}^{n_M} L(j_1, j_2, \ldots, j_M \mid C_J)\, \eta_{ij}$$

$$= \sum_{j_1=1}^{n_1} \sum_{j_2=1}^{n_2} \cdots \sum_{j_M=1}^{n_M} \left[ \theta(j_1 \mid C_J)\, \theta(j_2 \mid C_J) \cdots \theta(j_M \mid C_J) \right] \left[ s_{i_1 j_1}\, s_{i_2 j_2} \cdots s_{i_M j_M} \right]$$

(by parts (i) and (ii) in Theorem 1)

$$= \sum_{j_1=1}^{n_1} \sum_{j_2=1}^{n_2} \cdots \sum_{j_M=1}^{n_M} \left[ \theta(j_1 \mid C_J)\, s_{i_1 j_1} \right] \left[ \theta(j_2 \mid C_J)\, s_{i_2 j_2} \right] \cdots \left[ \theta(j_M \mid C_J)\, s_{i_M j_M} \right]$$

$$= \left[ \sum_{j_1=1}^{n_1} \theta(j_1 \mid C_J)\, s_{i_1 j_1} \right] \left[ \sum_{j_2=1}^{n_2} \theta(j_2 \mid C_J)\, s_{i_2 j_2} \right] \cdots \left[ \sum_{j_M=1}^{n_M} \theta(j_M \mid C_J)\, s_{i_M j_M} \right]$$

$$= \prod_{m=1}^{M} \left[ \sum_{j_m=1}^{n_m} \theta(j_m \mid C_J)\, s_{i_m j_m} \right], \tag{16}$$

which is the same as Eq. (15). ∎
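The identity in Eq. (14) can also be checked numerically. The following sketch (the dimension sizes, random seed, and probe are arbitrary choices of mine) compares the two sides under conditions (i) and (ii):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = [3, 2, 4]                                      # values per dimension (arbitrary)
theta = [rng.dirichlet(np.ones(nm)) for nm in n]   # theta(j_m | C_J), sums to 1 per dim
s = [rng.uniform(0.0, 1.0, (nm, nm)) for nm in n]  # s[m][i_m, j_m]
i = (1, 0, 2)                                      # an arbitrary probe stimulus

# Context-model side: sum over all stimuli j of L(S_j | C_J) * eta_ij,
# with L and eta both products over dimensions (conditions (i) and (ii))
lhs = sum(
    np.prod([theta[m][jm] * s[m][i[m], jm] for m, jm in enumerate(j)])
    for j in itertools.product(*(range(nm) for nm in n))
)
# Similarity-likelihood side: product of per-dimension sums (Eq. (15))
rhs = np.prod([theta[m] @ s[m][i[m]] for m in range(len(n))])
assert abs(lhs - rhs) < 1e-12
```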
A special case of the independent-feature similarity-likelihood model that is of interest arises when the s_{i_m j_m} values are computed using an MDS approach, namely

$$s_{i_m j_m} = \exp\!\left( -w_m \left| i_m - j_m \right|^r \right). \tag{17a}$$
I will refer to this case as the MDS-based independent-feature similarity-likelihood model. It is straightforward to see that for category distributions defined over independent dimensions, this model is formally identical to the MDS-context model when p = r in Eqs. (4) and (5), because

$$\eta_{ij} = \exp(-d_{ij}^{\,r}) = \prod_{m=1}^{M} \exp\!\left( -w_m \left| i_m - j_m \right|^r \right) = \prod_{m=1}^{M} s_{i_m j_m}, \tag{17b}$$

where the s_{i_m j_m} are given as in Eq. (17a).
Application to an Example
In this section we illustrate an application of the MDS-based independent-feature similarity-likelihood model and verify that for categories defined over independent dimensions, it yields classification predictions that are identical to the MDS-context model (when p = r). We also illustrate how the formal identity between the models breaks down in the nonindependent case. Estes (1986) provided such illustrations for situations in which stimuli varied along two binary-valued dimensions with similarity parameters constant across dimensions (i.e., M = 2, n_1 = n_2 = 2, and s_{i_1 j_1} = s_{i_2 j_2} = s in Eqs. (15) and (16)). Here, we provide examples for stimuli varying along two dimensions with three values per dimension and nonconstant similarity parameters.

FIG. 1. Hypothetical stimulus configuration for illustrating applications of the MDS-based independent-feature similarity-likelihood model and the MDS-context model. Values in parentheses are the conditional probabilities of dimension values given Category 1, and values in brackets are the conditional probabilities of dimension values given Category 2. (Figure not reproduced.)
The stimulus set is illustrated in Fig. 1. The nine stimuli are generated by combining orthogonally three values across two dimensions. The values along Dimension 1 (0, 1, and 4) will be denoted x_0, x_1, and x_4; and the values along Dimension 2 (0, .5, and 2) will be denoted y_0, y_.5, and y_2. The stimuli are assigned probabilistically to two categories. The values in parentheses along each dimension indicate the likelihood of each of the individual dimension values given Category 1, while the values in brackets indicate the Category 2 conditional likelihoods. For example, θ(x_0 | C_1) = .5, θ(x_4 | C_2) = .8, θ(y_.5 | C_1) = .2, and so forth. Similarities between exemplars will be computed using a city-block metric (r = 1 in Eq. (4)) and exponential decay function (p = 1 in Eq. (5)). For simplicity, the weight parameters in the distance function (Eq. (4)) will not be used. So, for example, the city-block distance between S_3 and S_5, d_35, equals 3.5; and so η_35 = exp(−3.5) = .030.
We illustrate an application of the MDS-based independent-feature similarity-likelihood model to prediction of the probability that S_3 is classified in C_1, P(R_1 | S_3). For simplicity, we assume a bias-free experiment (b_1 = b_2) and equal category base rates [L(C_1) = L(C_2)]. According to the similarity-likelihood model,
$$P(R_1 \mid S_3) = \frac{L^*(S_3 \mid C_1)}{L^*(S_3 \mid C_1) + L^*(S_3 \mid C_2)} = \frac{\theta^*(x_4 \mid C_1)\, \theta^*(y_0 \mid C_1)}{\theta^*(x_4 \mid C_1)\, \theta^*(y_0 \mid C_1) + \theta^*(x_4 \mid C_2)\, \theta^*(y_0 \mid C_2)}. \tag{18}$$
Using Definition 1.1 and Eq. (17a), the similarity-likelihood θ*(x_4 | C_1) is given by

$$\theta^*(x_4 \mid C_1) = \theta(x_0 \mid C_1)\exp(-4) + \theta(x_1 \mid C_1)\exp(-3) + \theta(x_4 \mid C_1) = .22409. \tag{19}$$
Likewise, θ*(y_0 | C_1) = .74837, θ*(x_4 | C_2) = .80681, and θ*(y_0 | C_2) = .26892. Substituting into Eq. (18) gives L*(S_3 | C_1) = .16770 and L*(S_3 | C_2) = .21697, and so P(R_1 | S_3) = .436. Note that if dimension values combine independently, then L(S_3 | C_1) = θ(x_4 | C_1) θ(y_0 | C_1) = .2(.6) = .12, whereas L(S_3 | C_2) = θ(x_4 | C_2) θ(y_0 | C_2) = .8(.1) = .08. Thus, we have L(S_3 | C_1) > L(S_3 | C_2), but L*(S_3 | C_1) < L*(S_3 | C_2).
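These quantities can be reproduced with a short script (a sketch; the variable names are mine, and the marginal probabilities are those of Fig. 1):

```python
import numpy as np

vals = {"x": np.array([0.0, 1.0, 4.0]), "y": np.array([0.0, 0.5, 2.0])}
theta = {  # marginal feature probabilities from Fig. 1
    ("x", 1): np.array([0.5, 0.3, 0.2]), ("x", 2): np.array([0.1, 0.1, 0.8]),
    ("y", 1): np.array([0.6, 0.2, 0.2]), ("y", 2): np.array([0.1, 0.1, 0.8]),
}

def theta_star(dim, cat, i):
    """Similarity-likelihood theta*(i_m | C_J) of Eq. (10), with
    s = exp(-|i_m - j_m|) from the exponential/city-block model."""
    s = np.exp(-np.abs(vals[dim] - vals[dim][i]))
    return theta[(dim, cat)] @ s

# S3 has x = 4 (index 2) and y = 0 (index 0)
L1 = theta_star("x", 1, 2) * theta_star("y", 1, 0)  # L*(S3 | C1), approx .16770
L2 = theta_star("x", 2, 2) * theta_star("y", 2, 0)  # L*(S3 | C2), approx .21697
p = L1 / (L1 + L2)                                  # P(R1 | S3), approx .436
```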
TABLE I

Context Model Application: Conditional Likelihoods and Exemplar 3 Similarities for Illustration in Fig. 1

A. Independent Case

S_j   L(S_j | C_1)   L(S_j | C_2)   η_3j
1     .30            .01            .018
2     .18            .01            .050
3     .12            .08            1.000
4     .10            .01            .011
5     .06            .01            .030
6     .04            .08            .607
7     .10            .08            .002
8     .06            .08            .007
9     .04            .64            .135

B. Nonindependent Case

S_j   L(S_j | C_1)   L(S_j | C_2)   η_3j
1     .20            .00            .018
2     .20            .10            .050
3     .20            .00            1.000
4     .10            .10            .011
5     .10            .00            .030
6     .00            .00            .607
7     .20            .00            .002
8     .00            .00            .007
9     .00            .80            .135
The application of the context model is illustrated in Tables IA and IB. In Table IA, we list the likelihood of each of the individual stimuli given Category 1 and Category 2, respectively, assuming that the categories are defined over independent probability distributions of the dimension values. We also list the values of η_3j computed from the exponential/city-block model. The bias-free context model with equal category base rates predicts

$$P(R_1 \mid S_3) = \frac{\sum_{j \in C_1} L(S_j \mid C_1)\, \eta_{3j}}{\sum_{j \in C_1} L(S_j \mid C_1)\, \eta_{3j} + \sum_{j \in C_2} L(S_j \mid C_2)\, \eta_{3j}}. \tag{20}$$

Substituting the values listed in Table IA gives Σ_{j∈C_1} L(S_j | C_1) η_3j = .16770 and Σ_{j∈C_2} L(S_j | C_2) η_3j = .21697, the same values having been computed previously for L*(S_3 | C_1) and L*(S_3 | C_2), as should be the case. Thus, P(R_1 | S_3) = .436.
Table IB lists alternative values of L(S_j | C_1) and L(S_j | C_2). These values satisfy the marginal probability constraints shown in Fig. 1, but now the individual dimension values do not combine independently. The similarity-likelihood model makes the same prediction for P(R_1 | S_3) as before (.436), because it makes use of only the marginal probabilities of the dimension values. Applying the context model, however, yields Σ_{j∈C_1} L(S_j | C_1) η_3j = .218, Σ_{j∈C_2} L(S_j | C_2) η_3j = .114, and P(R_1 | S_3) = .657. Thus, the similarity-likelihood model and the context model make dramatically different predictions for the nonindependent case illustrated in Table IB. Indeed, whereas the similarity-likelihood model's predictions remain constant across Tables IA and IB, the context model predicts a qualitative shift, namely, a tendency to classify S_3 in Category 2 in the former case, and to classify S_3 in Category 1 in the latter. The context model is sensitive to joint probabilities of features. In the example in Table IB, feature combination x_4, y_0 (i.e., S_3) has high likelihood given Category 1, but zero likelihood given Category 2.
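The Table I computations can likewise be verified in a few lines (a sketch under the same exponential/city-block assumptions; the S1..S9 numbering follows Fig. 1 row by row):

```python
import numpy as np

# Exemplar coordinates, S1..S9, ordered row by row from Fig. 1
coords = np.array([[x, y] for y in (0.0, 0.5, 2.0) for x in (0.0, 1.0, 4.0)])
eta3 = np.exp(-np.abs(coords - coords[2]).sum(axis=1))  # eta_3j, city-block, S3 = (4, 0)

# Table IA (independent case)
L1a = np.array([.30, .18, .12, .10, .06, .04, .10, .06, .04])
L2a = np.array([.01, .01, .08, .01, .01, .08, .08, .08, .64])
p_ind = (L1a @ eta3) / (L1a @ eta3 + L2a @ eta3)

# Table IB (nonindependent case, same marginals)
L1b = np.array([.20, .20, .20, .10, .10, .00, .20, .00, .00])
L2b = np.array([.00, .10, .00, .10, .00, .00, .00, .00, .80])
p_dep = (L1b @ eta3) / (L1b @ eta3 + L2b @ eta3)
```

Using exact (unrounded) η_3j values, p_ind comes out near .436 and p_dep near .656, agreeing with the tabled .436 and .657 to within the rounding of η_3j in Table I.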
2. RELATIONS BETWEEN THE EXEMPLAR-SIMILARITY MODEL AND EXEMPLAR-BASED LIKELIHOOD MODELS
In this section I identify formal relations between the multiplicative-similarity context model and what will be termed an exemplar-based likelihood model. Imagine that the learner's category representation consists of collections of stored exemplars, but that the representation of each exemplar is itself a set of parameters that defines a probability distribution (cf. Fried & Holyoak, 1984). The learner is assumed to store information in memory that will allow him or her to make likelihood estimates that a particular probe was generated from the alternative exemplar distributions. It is then straightforward to compute the overall likelihoods that the probe was generated from the alternative categories, each a collection of individual exemplar distributions. We assume that categorization decisions are made by probability matching to these overall category likelihoods, modulated by potential category response bias. Formally, the probability that Stimulus i is classified in Category J is given as in Bayes' theorem:
$$P(R_J \mid S_i) = \frac{b_J L(S_i \mid C_J)\, L(C_J)}{\sum_K b_K L(S_i \mid C_K)\, L(C_K)}. \tag{21}$$
According to the exemplar-based likelihood model, the likelihood that S_i is generated on a Category J trial is found by summing over the likelihoods that S_i was generated by each of the individual exemplar distributions composing Category J, weighted by their respective base rates, i.e.,
$$L(S_i \mid C_J) = \sum_{j \in C_J} L(S_i \mid S_j)\, L(S_j \mid C_J), \tag{22}$$
where L(S_i | S_j) gives the likelihood that S_i is generated from the probability distribution corresponding to S_j, and L(S_j | C_J) gives the likelihood that S_j is the actual generating distribution on a Category J trial. Substituting into Eq. (21) gives
$$P(R_J \mid S_i) = \frac{b_J \sum_{j \in C_J} L(S_i \mid S_j)\, L(S_j \mid C_J)\, L(C_J)}{\sum_K b_K \sum_{k \in C_K} L(S_i \mid S_k)\, L(S_k \mid C_K)\, L(C_K)}. \tag{23}$$
The relation between the context model (Eq. (1)) and this exemplar-based likelihood model is readily apparent. Indeed, if the L(S_i | S_j) terms are treated as free parameters corresponding to subjective likelihood estimates, with L(S_i | S_j) = L(S_j | S_i), then the two models are formally identical; we simply set η_ij = L(S_i | S_j). (Of course, we could also remove the restrictions η_ij = η_ji and L(S_i | S_j) = L(S_j | S_i) and the models would maintain their formal identity.) This
same likelihood-based interpretation is also available for the similarity choice model.
In the case of stimuli varying along independent, binary-valued dimensions, one might imagine that the subject estimates a fixed subjective probability p_m of each individual dimension value transforming into its alternative value. Then
$$L(S_i \mid S_j) = \prod_{m=1}^{M} p_m^{\delta_m(i,j)} (1 - p_m)^{[1 - \delta_m(i,j)]}, \tag{24}$$
where δ_m(i, j) is the indicator variable defined earlier. Substituting into Eq. (23) and dividing all resulting terms by Π_{m=1}^{M} (1 − p_m) yields

$$P(R_J \mid S_i) = \frac{b_J \sum_{j \in C_J} L(S_j \mid C_J)\, L(C_J) \prod_{m=1}^{M} \left[ p_m/(1 - p_m) \right]^{\delta_m(i,j)}}{\sum_K b_K \sum_{k \in C_K} L(S_k \mid C_K)\, L(C_K) \prod_{m=1}^{M} \left[ p_m/(1 - p_m) \right]^{\delta_m(i,k)}}, \tag{25}$$
which is formally identical to the binary-valued version of the context model (Eqs. (1) and (3)) with s_m = p_m/(1 − p_m). This formal correspondence between the binary-valued versions of the context model and exemplar-based likelihood model was noted previously by Fried and Holyoak (1984, p. 256, Note 7).
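The correspondence can be checked directly: the likelihoods of Eq. (24) differ from the multiplicative similarities with s_m = p_m/(1 − p_m) only by the constant factor Π(1 − p_m), which cancels in the choice rule. A sketch (the p values are arbitrary):

```python
from itertools import product
import numpy as np

p = np.array([0.2, 0.35, 0.1])  # hypothetical feature-flip probabilities
s = p / (1 - p)                 # implied context-model similarities

def likelihood(i, j):
    """L(S_i | S_j) for binary feature vectors (Eq. (24))."""
    mismatch = np.array(i) != np.array(j)
    return float(np.prod(np.where(mismatch, p, 1 - p)))

def similarity(i, j):
    """eta_ij under the multiplicative rule with s_m = p_m / (1 - p_m)."""
    mismatch = np.array(i) != np.array(j)
    return float(np.prod(np.where(mismatch, s, 1.0)))

i = (1, 0, 1)
ratios = [likelihood(i, j) / similarity(i, j) for j in product((0, 1), repeat=3)]
assert np.allclose(ratios, ratios[0])  # constant ratio: identical choice probabilities
```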
Assume instead that we have stimuli varying along independent, multivalued continuous dimensions. The probability density associated with exemplar S_j on dimension m is assumed to be Gaussian with mean μ_jm and variance σ_m². (Note that variability along a dimension is presumed to be constant across stimuli.) The likelihood of some probe x = (x_1, x_2, ..., x_M) given S_j is

$$L(\mathbf{x} \mid S_j) = \prod_{m=1}^{M} \frac{1}{\sqrt{2\pi}\,\sigma_m} \exp\!\left[ -\frac{(x_m - \mu_{jm})^2}{2\sigma_m^2} \right]. \tag{26}$$
Substituting into Eq. (23) and dividing all terms by the factor Π_{m=1}^{M} (1/(√(2π) σ_m)) yields

$$P(R_J \mid \mathbf{x}) = \frac{b_J \sum_{j \in C_J} L(S_j \mid C_J)\, L(C_J) \prod_{m=1}^{M} \exp\!\left[ -(x_m - \mu_{jm})^2 / 2\sigma_m^2 \right]}{\sum_K b_K \sum_{k \in C_K} L(S_k \mid C_K)\, L(C_K) \prod_{m=1}^{M} \exp\!\left[ -(x_m - \mu_{km})^2 / 2\sigma_m^2 \right]}. \tag{27}$$
Equation (27) is formally identical to the Gaussian/Euclidean MDS-context model (i.e., Eqs. (1), (4), and (5), with p = r = 2 in Eqs. (4) and (5)), with j_m in Eq. (4) equal to μ_jm in Eq. (27) and w_m in Eq. (4) equal to 1/(2σ_m²) in Eq. (27). To see this, note that in the Gaussian/Euclidean MDS-context model, the distance between probe x and exemplar j is given by
‘44
1 112
dxj= 1 W,(X, -j,)’ m=l
(28)
and the similarity between x and j is given by

$$\eta_{xj} = \exp(-d_{xj}^2) = \exp\!\left[ -\sum_{m=1}^{M} w_m (x_m - j_m)^2 \right] = \prod_{m=1}^{M} \exp\!\left[ -w_m (x_m - j_m)^2 \right].$$
Substituting the expression for η_xj into the context model equation for P(R_J | S_i) (Eq. (1)) yields Eq. (27). Thus, the Gaussian/Euclidean MDS-context model is formally identical to the exemplar-based likelihood model whenever the exemplar distributions are independent multivariate Gaussian, with variance on each individual dimension constant across the different exemplar distributions. Note that the relation w_m = 1/(2σ_m²) implies that reduced variability along a dimension in the likelihood model corresponds to greater "weight" given to that dimension in the similarity model.
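A quick numerical sketch of this equivalence (the means, SDs, and probe are arbitrary choices of mine): Gaussian exemplar densities with constant per-dimension variance are proportional, across exemplars, to Gaussian similarities with w_m = 1/(2σ_m²), so the two models induce the same choice probabilities.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = rng.normal(size=(4, 3))        # four exemplar means on three dimensions
sigma = np.array([0.5, 1.0, 2.0])   # per-dimension SDs, constant across exemplars
w = 1.0 / (2.0 * sigma**2)          # attention weights implied by Eq. (27)
x = rng.normal(size=3)              # a probe

# Independent-Gaussian exemplar likelihoods (Eq. (26))
dens = np.prod(np.exp(-(x - mu) ** 2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma),
               axis=1)
# Gaussian similarities (Eqs. (4)-(5) with p = r = 2)
sim = np.exp(-np.sum(w * (x - mu) ** 2, axis=1))
ratios = dens / sim
assert np.allclose(ratios, ratios[0])  # proportional across exemplars
```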
In related work, Ashby and Perrin (1988) noted a formal correspondence between their recently developed "general Gaussian recognition model" and the traditional Euclidean multidimensional scaling model of similarity judgments. In the general Gaussian recognition model, it is assumed that similarity judgments are related to proportion of overlap in distributions of "perceptual effects" to which stimuli give rise. Ashby and Perrin showed in particular that, with appropriate transformations of the underlying variables, the Euclidean multidimensional scaling model arises as a special case of the general Gaussian recognition model whenever the distributions of perceptual effects are defined over independent dimensions, with variance along each dimension constant across stimuli. Each individual dimension, however, can have a different variance associated with it. In this case, the weights in the distance model (Eq. (4)) are inversely related to the variance terms in the general recognition model.
The development in the present article linking the Gaussian/Euclidean MDS-context model to the Gaussian exemplar-based likelihood model (Eq. (27)) parallels Ashby and Perrin's work, although different transformations of the underlying variables are involved in predicting categorization probabilities as opposed to similarity judgments.
Ashby and Perrin’s (1988) general recognition model can also be applied to categorization. The subject is assumed to establish decision boundaries that partition the multidimensional space into response regions. Any perceptual effect falling into Region J would result in a Category J response. To predict the probability with which Stimulus i is classified in Category J, one would integrate over the portion of the Stimulus i distribution that falls into Region J. In fitting the model, one needs to specify the kinds of decision boundaries that the subject adopts (Ashby & Gott, 1988; Ashby & Townsend, 1986). Here, it is of interest to note that decision boundaries defined by summing similarities to individual exemplars (in the
Gaussian/Euclidean model) are the same as those defined by computing exemplar-based likelihoods (assuming independent-dimension Gaussian distributions, with variability along each dimension constant across stimuli). This assertion follows directly from the previous demonstration that the Gaussian/Euclidean MDS-context model is formally identical to the Gaussian exemplar-based likelihood model (Eq. (27)). In the general recognition model, however, rather than probability matching (as in Eq. (27)), subjects would respond with the category that maximizes likelihood (or summed similarity).
3. RELATION BETWEEN THE SIMILARITY CHOICE MODEL AND AN INDEPENDENT FEATURE ADDITION-DELETION MODEL
Sections 3 and 4 in this article are concerned with identification, in which stimuli are assigned unique responses rather than being classified in groups. As noted earlier, in application to identification, the decision rule in the context model (Eq. (1)) reduces to the similarity choice model (Eq. (2)). The point of Section 3 is to show that a particular process model of identification, in which individual features are probabilistically added to or deleted from an observer's percept, is a special case of the similarity choice model. Thus, once again, a model conceptualized in terms of "probability" is closely related to a "similarity"-based model.
An important type of identification paradigm involves the use of stimuli composed of discrete features, with each feature being either present or absent (Garner, 1978). Because of either external noise or internal noise in the information-processing system, a given stimulus may not be perceived veridically. Assume, in particular, that there is some probability a_m that feature m is added to the percept (assuming that it is not in the original stimulus). These added features would correspond to "ghost" features, such as studied by Townsend and Ashby (1982). Likewise, assume that there is some probability d_m that feature m is deleted from the percept (assuming that it is in the original stimulus). Such deletions might be the result of "state limitations," such as studied by Garner (1974). Furthermore, we assume that these additions and deletions take place independently (although empirical research suggests that this assumption may be wrong, e.g., Townsend, Hu, & Evans, 1984). So, for example, for a set consisting of the null stimulus (∅), the stimulus composed of only feature x, the stimulus composed of only feature y, and the stimulus composed of both x and y, the identification confusion matrix would be as illustrated in Table II.

TABLE II

Independent Feature Addition-Deletion Model

                                       Response
Stimulus   ∅                 x                 y                 xy
∅          (1−a_x)(1−a_y)    a_x(1−a_y)        (1−a_x)a_y        a_x a_y
x          d_x(1−a_y)        (1−d_x)(1−a_y)    d_x a_y           (1−d_x)a_y
y          (1−a_x)d_y        a_x d_y           (1−a_x)(1−d_y)    a_x(1−d_y)
xy         d_x d_y           (1−d_x)d_y        d_x(1−d_y)        (1−d_x)(1−d_y)

Note. Entry in cell (i, j) gives the probability that Stimulus i is perceived as Stimulus j. a_m = probability that feature m is added; d_m = probability that feature m is deleted.
More generally, for any powerset constructed from M features (e.g., M= 2 in Table II), the independent feature addition-deletion model is defined as follows:
DEFINITION 3.1. Let i_m = 1 if feature m is present in Stimulus i, and let i_m = 0 if feature m is absent (and likewise for j_m). According to the independent feature addition-deletion model, the probability that Stimulus i is identified as Stimulus j, P(R_j | S_i), is given by

P(R_j | S_i) = ∏_{m=1}^{M} f(i_m, j_m),   (30)

where

f(i_m, j_m) = a_m         if i_m = 0 and j_m = 1
            = d_m         if i_m = 1 and j_m = 0
            = (1 - a_m)   if i_m = 0 and j_m = 0
            = (1 - d_m)   if i_m = 1 and j_m = 1.
An interesting aspect of this independent feature addition-deletion model is that it can produce patterns of confusions that are highly asymmetric. For example, suppose that in Table II, a_m = .1 and d_m = .4 for both features. Then the probability with which Ø is identified as xy would be only .01, whereas the probability with which xy is identified as Ø would be .16. Indeed, Garner and Haun (1978) used a set of discrete-feature stimuli with a logical structure corresponding to the illustration in Table II, and observed highly asymmetric confusions. In a condition in which external noise was used to induce confusions, the direction of errors was from stimuli with fewer features to stimuli with more features. The opposite direction of errors was observed in a condition in which short exposure durations were used to induce confusions.
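To make the computation concrete, the entries of Table II can be generated directly from the per-feature factors of Eq. (30). A minimal sketch in Python (the feature set and parameter values are illustrative, matching the a_m = .1, d_m = .4 example just given):

```python
from itertools import product

def f(i_m, j_m, a_m, d_m):
    """Per-feature factor of the addition-deletion model (Eq. (30))."""
    if i_m == 0 and j_m == 1:
        return a_m          # feature added to the percept
    if i_m == 1 and j_m == 0:
        return d_m          # feature deleted from the percept
    return (1 - a_m) if i_m == 0 else (1 - d_m)   # feature perceived veridically

def p_confuse(i, j, a, d):
    """P(R_j | S_i): product of independent per-feature factors."""
    prob = 1.0
    for i_m, j_m, a_m, d_m in zip(i, j, a, d):
        prob *= f(i_m, j_m, a_m, d_m)
    return prob

# Two features (x, y) as in Table II, with a_m = .1 and d_m = .4 for both.
a, d = [0.1, 0.1], [0.4, 0.4]
stimuli = list(product([0, 1], repeat=2))  # (0,0)=null, (1,0)=x, (0,1)=y, (1,1)=xy

print(round(p_confuse((0, 0), (1, 1), a, d), 4))  # null identified as xy: 0.01
print(round(p_confuse((1, 1), (0, 0), a, d), 4))  # xy identified as null: 0.16

# Over the full powerset, each stimulus's confusion probabilities sum to one.
for i in stimuli:
    assert abs(sum(p_confuse(i, j, a, d) for j in stimuli) - 1.0) < 1e-12
```

The asymmetry arises because additions and deletions need not be equally probable, even though, as shown below, the model's similarities are symmetric.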
I next show that for any powerset constructed from M features, the independent feature addition-deletion model is a special case of the similarity choice model.¹ In addition to linking a "probability" model and a "similarity" model, this result is interesting in view of the symmetric-similarity assumption of the choice model (i.e., η_ij = η_ji). As will be seen, the asymmetric patterns of confusions that can be produced by the addition-deletion model are all characterizable in terms of differential bias in the choice model. Smith (1980) analyzed Garner and Haun's (1978) data in terms of the choice model, and found that the observed asymmetries were indeed characterizable in terms of differential bias. The relation between the choice model and the present addition-deletion model, however, was not part of his analysis.
THEOREM 3.1. The independent-feature addition-deletion model is a special case of the similarity choice model.
Proof. To prove the formal correspondence between the models, we make use of the well-known result that the similarity choice model holds if and only if the following cycle condition is satisfied for all triples i, j, and k:

P(R_j | S_i) P(R_k | S_j) P(R_i | S_k) = P(R_k | S_i) P(R_j | S_k) P(R_i | S_j)   (31)

(e.g., Bishop, Fienberg, & Holland, 1975, p. 287; Smith, 1982). Thus, to prove that the addition-deletion model is a special case of the choice model, it suffices to prove that the addition-deletion model implies the cycle condition.

Substituting the addition-deletion model expression for P(R_j | S_i) (Eq. (30)) into Eq. (31), we need to prove

∏_{m=1}^{M} f(i_m, j_m) ∏_{m=1}^{M} f(j_m, k_m) ∏_{m=1}^{M} f(k_m, i_m) = ∏_{m=1}^{M} f(i_m, k_m) ∏_{m=1}^{M} f(k_m, j_m) ∏_{m=1}^{M} f(j_m, i_m).   (32)

Equation (32) can be rewritten as

∏_{m=1}^{M} f(i_m, j_m) f(j_m, k_m) f(k_m, i_m) = ∏_{m=1}^{M} f(i_m, k_m) f(k_m, j_m) f(j_m, i_m).   (33)
Equation (33) holds because, as is shown next, for all m,

f(i_m, j_m) f(j_m, k_m) f(k_m, i_m) = f(i_m, k_m) f(k_m, j_m) f(j_m, i_m).   (34)

There are two main cases. Either i_m = j_m = k_m, in which case Eq. (34) holds vacuously, or else exactly two of i_m, j_m, and k_m are equal (because each of i_m, j_m, and k_m is equal to either 0 or 1). Without loss of generality, assume i_m = j_m ≠ k_m. Substituting into Eq. (34) gives

f(i_m, i_m) f(i_m, k_m) f(k_m, i_m) = f(i_m, k_m) f(k_m, i_m) f(i_m, i_m),

and the proof is complete. ∎

¹ F. G. Ashby (personal communication, December 1988) points out that for powersets constructed from M features, the independent-feature addition-deletion model is also a special case of the general recognition theory (Ashby & Townsend, 1986).
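The result can also be checked numerically: for arbitrary addition and deletion probabilities, every triple of stimuli in the powerset satisfies the cycle condition of Eq. (31). A small sketch (the M = 3 parameter values are arbitrary illustrations):

```python
from itertools import product

# Arbitrary addition/deletion probabilities for M = 3 features (illustrative).
a = [0.10, 0.25, 0.05]
d = [0.40, 0.15, 0.30]

def f(i_m, j_m, m):
    """Per-feature factor of Eq. (30)."""
    if i_m == 0 and j_m == 1:
        return a[m]
    if i_m == 1 and j_m == 0:
        return d[m]
    return (1 - a[m]) if i_m == 0 else (1 - d[m])

def P(i, j):
    """P(R_j | S_i) under the independent addition-deletion model."""
    out = 1.0
    for m, (i_m, j_m) in enumerate(zip(i, j)):
        out *= f(i_m, j_m, m)
    return out

# Verify the cycle condition (Eq. (31)) for every triple in the powerset.
powerset = list(product([0, 1], repeat=3))
for i, j, k in product(powerset, repeat=3):
    lhs = P(i, j) * P(j, k) * P(k, i)
    rhs = P(i, k) * P(k, j) * P(j, i)
    assert abs(lhs - rhs) < 1e-12
```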
The formal correspondence between the choice model similarity and bias parameters and the addition-deletion parameters is as follows. Let I denote the set of features composing stimulus i, and let J denote the set of features composing stimulus j. Then we have
THEOREM 3.2. The choice model similarity parameters are expressed in terms of addition-deletion parameters as

η_ij = ∏_{m ∈ [I ∪ J] - [I ∩ J]} [a_m d_m / ((1 - a_m)(1 - d_m))]^{1/2}.   (35)

(Note that [I ∪ J] - [I ∩ J] corresponds to the distinctive features of i and j.)

Thus, assuming a_m < .5 and d_m < .5 for all m, the (symmetric) similarity between stimuli i and j decreases as the number of distinctive features increases.
THEOREM 3.3. The choice model bias parameters are expressed in terms of addition-deletion parameters as

b_j / b_Ø = ∏_{m ∈ J} [a_m (1 - d_m) / (d_m (1 - a_m))]^{1/2},   (36)

where b_Ø denotes the bias for the null stimulus.
Thus, if additions tend to be more probable than deletions, bias is larger for stimuli with more features, and vice versa if deletions tend to be more probable than additions. Note that in the present case the bias parameters reflect perceptual processes as opposed to purely response-related processes.
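Theorems 3.2 and 3.3 can likewise be verified numerically: the similarity and bias parameters recovered from the model's predicted cell probabilities (via Eqs. (37) and (41) below) agree with the closed forms of Eqs. (35) and (36). A sketch with illustrative parameter values:

```python
from itertools import product

a = [0.10, 0.25]   # addition probabilities (illustrative)
d = [0.40, 0.15]   # deletion probabilities (illustrative)

def f(i_m, j_m, m):
    """Per-feature factor of Eq. (30)."""
    if i_m == 0 and j_m == 1:
        return a[m]
    if i_m == 1 and j_m == 0:
        return d[m]
    return (1 - a[m]) if i_m == 0 else (1 - d[m])

def P(i, j):
    out = 1.0
    for m, (i_m, j_m) in enumerate(zip(i, j)):
        out *= f(i_m, j_m, m)
    return out

def eta_from_cells(i, j):
    """Eq. (37): similarity recovered from predicted cell probabilities."""
    return (P(i, j) * P(j, i) / (P(i, i) * P(j, j))) ** 0.5

def eta_closed_form(i, j):
    """Eq. (35): product over the distinctive features of i and j."""
    out = 1.0
    for m, (i_m, j_m) in enumerate(zip(i, j)):
        if i_m != j_m:
            out *= (a[m] * d[m] / ((1 - a[m]) * (1 - d[m]))) ** 0.5
    return out

def bias_ratio_from_cells(j, null=(0, 0)):
    """Eq. (41) with i = the null stimulus: b_j / b_null."""
    return (P(null, j) * P(j, j) / (P(j, null) * P(null, null))) ** 0.5

def bias_ratio_closed_form(j):
    """Eq. (36): product over the features composing stimulus j."""
    out = 1.0
    for m, j_m in enumerate(j):
        if j_m == 1:
            out *= (a[m] * (1 - d[m]) / (d[m] * (1 - a[m]))) ** 0.5
    return out

for i, j in product(product([0, 1], repeat=2), repeat=2):
    assert abs(eta_from_cells(i, j) - eta_closed_form(i, j)) < 1e-12
    assert abs(bias_ratio_from_cells(j) - bias_ratio_closed_form(j)) < 1e-12
```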
Proof of Theorem 3.2. The similarity parameters in the choice model are expressed in terms of predicted cell probabilities as

η_ij = [P(R_j | S_i) P(R_i | S_j) / (P(R_i | S_i) P(R_j | S_j))]^{1/2}.   (37)

Substituting in terms of the addition-deletion parameters gives

η_ij = [∏_{m=1}^{M} f(i_m, j_m) ∏_{m=1}^{M} f(j_m, i_m) / (∏_{m=1}^{M} f(i_m, i_m) ∏_{m=1}^{M} f(j_m, j_m))]^{1/2} = [∏_{m=1}^{M} f(i_m, j_m) f(j_m, i_m) / (f(i_m, i_m) f(j_m, j_m))]^{1/2}.   (38)
The product in Eq. (38) can be factored into two subproducts, one corresponding to values of m for which i_m ≠ j_m (i.e., the distinctive feature values for Stimuli i and j), and the other to values of m for which i_m = j_m (i.e., the common feature values for Stimuli i and j):

η_ij = [∏_{m: i_m ≠ j_m} f(i_m, j_m) f(j_m, i_m) / (f(i_m, i_m) f(j_m, j_m)) × ∏_{m: i_m = j_m} f(i_m, j_m) f(j_m, i_m) / (f(i_m, i_m) f(j_m, j_m))]^{1/2}.   (39)

All factors in the second subproduct are equal to one, and so we have

η_ij = [∏_{m: i_m ≠ j_m} f(i_m, j_m) f(j_m, i_m) / (f(i_m, i_m) f(j_m, j_m))]^{1/2}.   (40)

But if i_m ≠ j_m, then exactly one of f(i_m, j_m) and f(j_m, i_m) is equal to a_m and the other equal to d_m, and exactly one of f(i_m, i_m) and f(j_m, j_m) is equal to (1 - a_m) and the other equal to (1 - d_m); see Eq. (30) in Definition 3.1. Equation (35) follows. ∎
Proof of Theorem 3.3. The ratio b_j / b_i in the choice model is expressed in terms of predicted cell probabilities as

b_j / b_i = [P(R_j | S_i) P(R_j | S_j) / (P(R_i | S_j) P(R_i | S_i))]^{1/2}.   (41)

Substituting in terms of the addition-deletion parameters yields

b_j / b_i = [∏_{m=1}^{M} f(i_m, j_m) ∏_{m=1}^{M} f(j_m, j_m) / (∏_{m=1}^{M} f(j_m, i_m) ∏_{m=1}^{M} f(i_m, i_m))]^{1/2} = [∏_{m=1}^{M} f(i_m, j_m) f(j_m, j_m) / (f(j_m, i_m) f(i_m, i_m))]^{1/2}.   (42)

Equation (42) can be factored into two subproducts, one corresponding to values of m for which j_m = 1 (i.e., the set of features J that compose Stimulus j), and the other corresponding to values of m for which j_m = 0 (i.e., m ∉ J). Furthermore, for the null stimulus Ø, i_m = 0 for all m. This yields

b_j / b_Ø = [∏_{m ∈ J} a_m (1 - d_m) / (d_m (1 - a_m)) × ∏_{m ∉ J} (1 - a_m)(1 - a_m) / ((1 - a_m)(1 - a_m))]^{1/2}.   (43)

All factors in the second subproduct are equal to one, and we are left with

b_j / b_Ø = ∏_{m ∈ J} [a_m (1 - d_m) / (d_m (1 - a_m))]^{1/2}.   (44) ∎
4. RELATION BETWEEN THE SIMILARITY CHOICE MODEL AND A PERCEPTION/LIKELIHOOD-BASED DECISION MODEL
The relation between the choice model and the addition-deletion model that was considered in Section 3 was limited to situations involving complete powersets. If an incomplete powerset were used, there would be trials on which the addition-deletion process would yield a percept for which there was no corresponding response. Clearly, some decision process would need to be appended to the perceptual model to allow it to make predictions across such more general situations.
Section 4 considers a perception/decision model in which decisions are made on the basis of likelihood. Let PS denote the complete powerset that would be generated from the base set of features that is used in the experiment, and let H ⊆ PS denote the subset of actual stimuli that is used. Let h_ix denote the probability that stimulus i is perceived to be stimulus x ∈ PS, and let D_xj denote the probability with which a decision is made that percept x was generated from stimulus j ∈ H. Thus, the probability that stimulus i is identified as stimulus j, P(R_j | S_i), would be given by
P(R_j | S_i) = Σ_{x ∈ PS} h_ix D_xj.   (45)
In the present model, the subject is assumed to make decisions by estimating the likelihood that x was generated from the alternative members of the set H, and to choose responses by matching to these likelihoods. Assuming that the decision system computes likelihood estimates without bias, and assuming that stimulus base rates are equal, then
D_xj = h_jx / Σ_{k ∈ H} h_kx.   (46)
Substituting into Eq. (45) gives

P(R_j | S_i) = Σ_{x ∈ PS} h_ix h_jx / (Σ_{k ∈ H} h_kx).   (47)
THEOREM 4.1. The perception/likelihood-based decision model (Eq. (47)) generates predictions of identification confusions that are characterizable in terms of the similarity choice model.
Proof. The cycle condition is obviously satisfied because for all i and j, P(R_j | S_i) = P(R_i | S_j). ∎
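The symmetry used in this proof is easy to confirm numerically for an incomplete powerset. The sketch below reuses the addition-deletion process of Section 3 as the perceptual front end purely for illustration; any process generating the h_ix's would serve, and all parameter values are arbitrary:

```python
from itertools import product

M = 3
a = [0.10, 0.25, 0.05]   # addition probabilities (illustrative)
d = [0.40, 0.15, 0.30]   # deletion probabilities (illustrative)
PS = list(product([0, 1], repeat=M))                # complete powerset of percepts
H = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]   # incomplete set of actual stimuli

def h(i, x):
    """h_ix: probability that stimulus i is perceived as percept x
    (generated here by the addition-deletion process)."""
    out = 1.0
    for m, (i_m, x_m) in enumerate(zip(i, x)):
        if i_m == 0 and x_m == 1:
            out *= a[m]
        elif i_m == 1 and x_m == 0:
            out *= d[m]
        else:
            out *= (1 - a[m]) if i_m == 0 else (1 - d[m])
    return out

def P(i, j):
    """Eq. (47): P(R_j | S_i) = sum_x h_ix h_jx / sum_{k in H} h_kx."""
    return sum(h(i, x) * h(j, x) / sum(h(k, x) for k in H) for x in PS)

for i in H:
    # Responses are exhausted by H, so each row normalizes to one.
    assert abs(sum(P(i, j) for j in H) - 1.0) < 1e-12
    for j in H:
        # The symmetry invoked in the proof of Theorem 4.1.
        assert abs(P(i, j) - P(j, i)) < 1e-12
```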
More generally, assume that decisions are made using biased likelihood estimates, and also that stimulus base rates differ. Then

P(R_j | S_i) = b*_j p*_ij,   (48)

where

p*_ij = Σ_{x ∈ PS} h_ix h_jx / (Σ_{k ∈ H} b*_k h_kx),   (49)

with b*_j reflecting the combined decision bias and base rate associated with stimulus j.
THEOREM 4.2. The biased perception/likelihood-based decision model with unequal stimulus base rates (Eq. (48)) generates predictions of identification confusions that are characterizable in terms of the similarity choice model.
Proof. First, note that p*_ij = p*_ji. Again, the cycle condition is satisfied, because

P(R_j | S_i) P(R_k | S_j) P(R_i | S_k) = b*_j p*_ij b*_k p*_jk b*_i p*_ki
                                       = b*_j p*_ji b*_k p*_kj b*_i p*_ik   (50)
                                       = b*_k p*_ik b*_j p*_kj b*_i p*_ji
                                       = P(R_k | S_i) P(R_j | S_k) P(R_i | S_j). ∎
Thus, for any system in which identification responses are made by combining a biased likelihood-based decision rule with an underlying perceptual process, the confusion data are characterizable in terms of the choice model. Note that the addition-deletion model provides but one example of an underlying perceptual process; the formal identity noted above holds regardless of the perceptual process that generates the h_ix's.
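That generality can be illustrated by pairing the biased decision rule (Eqs. (48) and (49)) with a completely arbitrary perceptual process. In the sketch below, the h_ix's are random numbers (all values illustrative), yet the predicted confusions still satisfy the cycle condition:

```python
import random
from itertools import permutations

random.seed(1)
n_stimuli, n_percepts = 4, 7

# Arbitrary perceptual process: each row of h is a random distribution over percepts.
h = []
for _ in range(n_stimuli):
    row = [random.random() for _ in range(n_percepts)]
    total = sum(row)
    h.append([v / total for v in row])

b = [0.9, 1.4, 0.6, 1.1]   # biases b*_j (illustrative; absorb unequal base rates)

def P(i, j):
    """Eq. (48): P(R_j | S_i) = b*_j p*_ij, with p*_ij from Eq. (49)."""
    return b[j] * sum(h[i][x] * h[j][x] /
                      sum(b[k] * h[k][x] for k in range(n_stimuli))
                      for x in range(n_percepts))

# Cycle condition (Eq. (31)) holds for every triple, so these confusions are
# characterizable by the similarity choice model (Theorem 4.2).
for i, j, k in permutations(range(n_stimuli), 3):
    lhs = P(i, j) * P(j, k) * P(k, i)
    rhs = P(i, k) * P(k, j) * P(j, i)
    assert abs(lhs - rhs) < 1e-12
```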
SUMMARY
The purpose of this article was to demonstrate relations between the similarity choice model, the context model, and likelihood-based models of classification. The exemplar-similarity choice model approach has a long-standing history of success in making predictions of classification performance. The similarity choice model for predicting identification continues to serve as a standard against which alternative models of identification are compared (e.g., Ashby & Perrin, 1988; Smith, 1980; Townsend & Landon, 1983), and its recent extensions for predicting categorization also appear quite promising (e.g., Medin & Schaffer, 1978; Nosofsky, 1984, 1986). A number of researchers have demonstrated formal relations between choice-model formulations and other classes of confusion models, including sophisticated guessing models and Thurstonian models using a maximal-match response rule (e.g., Smith, 1980; Townsend & Landon, 1982, 1983; van Santen & Bamber, 1981). The present research follows in the spirit of this earlier work by demonstrating relations between choice-model formulations and likelihood-based models of classification. At least in part, the apparent ubiquity of the exemplar-similarity choice-model approach may derive from its close formal relations with these alternative classes of models.
[Figure A1 (a box-and-arrow diagram relating the models, including the Gaussian-Euclidean MDS-context model, the MDS-context model, the similarity-likelihood model, the binary-valued exemplar-based likelihood model, and the Gaussian exemplar-based likelihood model) is not reproduced here; its caption follows.]

FIG. A1. Summary of relations among models. Horizontal dashed lines indicate models that are formally identical to one another. Horizontal dashed lines labeled with an "i" indicate models that are formally identical when category distributions are defined over independent dimensions. Solid lines with arrows indicate nesting relations among models (the model that is the source of the arrow is the more general model, and the model that receives the arrow is the special case).
TABLE AI
Glossary of Terms
a_m : probability with which feature m is added to an observer's percept
b_j : bias parameter associated with identification-response j
b_J : bias parameter associated with categorization-response J
C_J : category J
d_ij : distance between stimuli i and j
d_m : probability with which feature m is deleted from an observer's percept
D_xj : probability with which a decision is made that percept x was generated by stimulus j
f(i_m, j_m) : factor that appears in the independent-feature addition-deletion model that denotes the probability with which additions and deletions take place (see Eq. (30))
h_ix : probability that stimulus i is perceived as percept x
H : subset of stimuli used in an identification experiment
i_m : value of stimulus i on dimension m
I = (i_1, i_2, ..., i_M) : set of dimension values that composes stimulus i
J : set of features composing stimulus j
L(C_J) : likelihood of category J
L(S_j | C_J) : likelihood of stimulus j given a category J trial
L(j, J) : likelihood that stimulus j is presented in conjunction with category J; L(j, J) = L(S_j | C_J) L(C_J)
L̂(S_i | C_J) : estimated likelihood that stimulus i is presented on a category J trial for the independent-feature likelihood model
L*(S_i | C_J) : similarity-likelihood of stimulus i given category J
M : number of dimensions composing the stimuli
v_m : number of values on dimension m
p_m : probability that the value on dimension m is transformed to its alternative value in the binary-valued exemplar-based likelihood model
P(R_j | S_i) : probability that stimulus i is identified as stimulus j
P(R_J | S_i) : probability that stimulus i is classified in category J
PS : powerset of stimuli constructed from M features
R_j : identification response j
R_J : categorization response J
s_m : similarity parameter on dimension m for the binary-valued version of the context model
s_m(i_m, j_m) : similarity between values i_m and j_m on dimension m for the multivalued version of the context model
S_i : Stimulus i
w_m : weight given to dimension m in computing distance
δ_m(i, j) : indicator variable equal to one if stimuli i and j mismatch on dimension m, and equal to zero otherwise
η_ij : similarity between stimuli i and j
θ(i_m | C_J) : likelihood of dimension-value i_m given category J
θ*(i_m | C_J) : similarity-likelihood of dimension-value i_m given category J
μ_jm : mean of stimulus-distribution j on dimension m
σ_m² : variance on dimension m
Ø : the null stimulus
TABLE AII
Summary of Definitions of Models

Categorization-Strength Terms for P(R_J | S_i)

Context model : b_J Σ_{j ∈ C_J} η_ij
Binary-valued context model : context model with η_ij = ∏_{m=1}^{M} s_m^{δ_m(i,j)}
Multiplicative-similarity context model : context model with η_ij = ∏_{m=1}^{M} s_m(i_m, j_m)
MDS-context model : context model with η_ij = exp(-d_ij^p), d_ij = [Σ_{m=1}^{M} w_m |i_m - j_m|^r]^{1/r}
Gaussian/Euclidean MDS-context model : MDS-context model with p = r = 2
Independent-feature likelihood model : b_J L̂(S_i | C_J) L(C_J), with L̂(S_i | C_J) = ∏_{m=1}^{M} θ(i_m | C_J)
Independent-feature similarity-likelihood model : b_J L*(S_i | C_J) L(C_J), with L*(S_i | C_J) = L*(i_1, i_2, ..., i_M | C_J) = ∏_{m=1}^{M} θ*(i_m | C_J), θ*(i_m | C_J) = Σ_{k_m=1}^{v_m} θ(k_m | C_J) s_m(i_m, k_m)
MDS-based independent-feature similarity-likelihood model : independent-feature similarity-likelihood model with the similarities s_m(i_m, k_m) derived from MDS distances
Exemplar-based likelihood model : b_J Σ_{j ∈ C_J} L(S_i | S_j) L(S_j)
Binary-valued exemplar-based likelihood model : exemplar-based likelihood model with L(S_i | S_j) = ∏_{m=1}^{M} p_m^{δ_m(i,j)} (1 - p_m)^{1 - δ_m(i,j)}
Gaussian exemplar-based likelihood model : exemplar-based likelihood model with L(S_i | S_j) = ∏_{m=1}^{M} (1 / (√(2π) σ_m)) exp(-(i_m - μ_jm)² / (2σ_m²))

Identification-Strength Terms for P(R_j | S_i)

Similarity-choice model : b_j η_ij
Independent-feature addition-deletion model : ∏_{m=1}^{M} f(i_m, j_m), with f(i_m, j_m) as defined in Eq. (30)
Perception/likelihood-based decision model : Σ_{x ∈ PS} h_ix D_xj (Eq. (45))
APPENDIX
This appendix provides two tables and a figure to summarize the definitions and relations among the various models. Table AI is a glossary of terms. Table AII summarizes the definitions of the models. Note that all models predict probabilities in n x m confusion matrices, where n is the number of stimuli and m the number of responses. For the categorization models, m < n; and for the identification models, m = n. The definitions of the models are summarized by listing, for each model, the "strength" with which Stimulus i is classified in Category J (or, for the identification models, the strength with which Stimulus i is identified as Stimulus j). To predict the probability with which Stimulus i is classified in Category J, one divides this strength by the sum of strengths with which Stimulus i is classified in all of the categories. The strengths listed for the independent-feature addition-deletion model and the perception/likelihood-based decision model are actual probabilities, so there is no need to normalize them. Figure A1 summarizes the relations among the various models.
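The normalization step described above can be sketched for the similarity choice model; the η and b values here are invented for illustration:

```python
# Illustrative choice-model parameters for three stimuli.
eta = [[1.0, 0.3, 0.1],
       [0.3, 1.0, 0.4],
       [0.1, 0.4, 1.0]]   # symmetric similarities, eta_ii = 1
b = [0.5, 0.3, 0.2]       # response biases

def p_identify(i, j):
    """P(R_j | S_i): strength b_j * eta_ij divided by the summed strengths."""
    strengths = [b[k] * eta[i][k] for k in range(len(b))]
    return strengths[j] / sum(strengths)

# Each row of the predicted confusion matrix sums to one after normalization.
for i in range(3):
    assert abs(sum(p_identify(i, j) for j in range(3)) - 1.0) < 1e-12
```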
ACKNOWLEDGMENTS
This work was supported by NSF Grant BNS 87-19938 to Indiana University. The work was stimulated by discussions that I had with Lisbeth Fried and Keith Smith while I was a Visiting Assistant Professor at the University of Michigan. I thank the members of the Human Performance Center for the opportunity that the semester provided. I also thank Greg Ashby, Danny Ennis, Tony Marley, and an anonymous reviewer for their criticisms of an earlier version of this article.
REFERENCES
ASHBY, F. G., & GOTT, R. E. (1988). Decision rules in the perception and categorization of multidimensional stimuli. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 33-53.
ASHBY, F. G., & PERRIN, N. A. (1988). Toward a unified theory of similarity and recognition. Psychological Review, 95, 124-150.
ASHBY, F. G., & TOWNSEND, J. T. (1986). Varieties of perceptual independence. Psychological Review, 93, 154-179.
BISHOP, Y. M., FIENBERG, S. E., & HOLLAND, P. W. (1975). Discrete multivariate analysis: Theory and practice. Cambridge, MA: MIT Press.
CARROLL, J. D., & WISH, M. (1974). Models and methods for three-way multidimensional scaling. In D. H. Krantz, R. C. Atkinson, R. D. Luce, & P. Suppes (Eds.), Contemporary developments in mathematical psychology (Vol. 2). San Francisco: Freeman.
ESTES, W. K. (1986). Array models for category learning. Cognitive Psychology, 18, 500-549.
FRIED, L. S., & HOLYOAK, K. J. (1984). Induction of category distributions: A framework for classification learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 234-257.
GARNER, W. R. (1974). The processing of information and structure. New York: Wiley.
GARNER, W. R. (1978). Aspects of a stimulus: Features, dimensions, and configurations. In E. Rosch & B. B. Lloyd (Eds.), Cognition and categorization. Hillsdale, NJ: Erlbaum.
GARNER, W. R., & HAUN, F. (1978). Letter identification as a function of type of perceptual limitation and type of attribute. Journal of Experimental Psychology: Human Perception and Performance, 4, 199-209.
LUCE, R. D. (1963). Detection and recognition. In R. D. Luce, R. R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology (pp. 103-189). New York: Wiley.
MEDIN, D. L., & SCHAFFER, M. M. (1978). Context theory of classification learning. Psychological Review, 85, 207-238.
NOSOFSKY, R. M. (1984). Choice, similarity, and the context theory of classification. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 104-114.
NOSOFSKY, R. M. (1985). Overall similarity and the identification of separable-dimension stimuli: A choice model analysis. Perception & Psychophysics, 38, 415-432.
NOSOFSKY, R. M. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General, 115, 39-57.
NOSOFSKY, R. M. (1988). Similarity, frequency, and category representations. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 54-65.
REED, S. K. (1972). Pattern recognition and categorization. Cognitive Psychology, 3, 382-407.
SHEPARD, R. N. (1957). Stimulus and response generalization: A stochastic model relating generalization to distance in psychological space. Psychometrika, 22, 325-345.
SHEPARD, R. N. (1958). Stimulus and response generalization: Deduction of the generalization gradient from a trace model. Psychological Review, 65, 242-256.
SHEPARD, R. N. (1987). Toward a universal law of generalization for psychological science. Science, 237, 1317-1323.
SMITH, J. E. K. (1980). Models of identification. In R. Nickerson (Ed.), Attention and performance VIII. Hillsdale, NJ: Erlbaum.
SMITH, J. E. K. (1982). Recognition models evaluated: A commentary on Keren and Baggen. Perception & Psychophysics, 31, 183-189.
TOWNSEND, J. T., & ASHBY, F. G. (1982). Experimental tests of contemporary mathematical models of visual letter recognition. Journal of Experimental Psychology: Human Perception and Performance, 8, 834-864.
TOWNSEND, J. T., HU, G. G., & EVANS, R. J. (1984). Modeling feature perception in brief displays with evidence for positive interdependencies. Perception & Psychophysics, 36, 35-49.
TOWNSEND, J. T., & LANDON, D. E. (1982). An experimental and theoretical investigation of the constant-ratio rule and other models of visual letter confusion. Journal of Mathematical Psychology, 25, 119-162.
TOWNSEND, J. T., & LANDON, D. E. (1983). Mathematical models of recognition and confusion in psychology. Mathematical Social Sciences, 4, 25-71.
VAN SANTEN, J. P. H., & BAMBER, D. (1981). Finite and infinite state confusion models. Journal of Mathematical Psychology, 24, 101-111.
RECEIVED: July 11, 1988