Journal of Experimental Psychology: General
1976, Vol. 105, No. 3, 254-276

A Fuzzy Set Approach to Modifiers and Vagueness in Natural Language

Harry M. Hersh and Alfonso Caramazza
Johns Hopkins University

SUMMARY

Recent developments in semantic theory, such as the work of Labov (1973) and Lakoff (1973), have brought into question the assumption that meanings are precise. It has been proposed that the meanings of all terms are to a lesser or greater degree vague, such that the boundary of the application of a term is never a point but a region where the term gradually moves from being applicable to nonapplicable.

Developments in fuzzy set theory have made it possible to offer a formal treatment of vagueness of natural language concepts. In this article, the proposition that natural language concepts are represented as fuzzy sets of meaning components and that language operators (adverbs, negative markers, and adjectives) can be considered as operators on fuzzy sets was assessed empirically. In a series of experiments, we explored the application of fuzzy set theory to the meaning of phrases such as very small, sort of large, and so on.

In Experiment 1, subjects judged the applicability of the set of phrases to a set of squares of varying size. The results indicated that the group interpretation of the phrases can be characterized within the framework of fuzzy set theory. Similar results were obtained in Experiment 2, where each subject's responses were analyzed individually. Although the responses of the subjects, in general, could be interpreted in terms of fuzzy logical operations, one subject responded in a more idiomatic style.

Experiments 3 and 4 were attempts to influence the logical-idiomatic distinction in interpretation by (a) varying the presentation mode of the phrases and by (b) giving subjects only a single phrase to judge.

Overall, the results were consistent with the hypothesis that natural language concepts and operators can be described more completely and more precisely using the framework of fuzzy set theory.

Picture a conversation between a mother and her 5-year-old son. The son asks the mother if he might have some jelly beans from the bowl on the table. The mother replies that he may take a few, and he does just that. It is a perfectly normal conversation until one stops and considers how many a few are. The mother obviously knows what is meant by the term a few. The information was understood by her son because he responded correctly to the mother's statement. If the son had been asked directly how many constitute a few, he probably would have hesitated for a moment and then responded "three" (or "four," or "five," and so on). Pose the same question to the mother, and the odds are that she would have responded with not too different a number. The number of beans taken might have differed from what either person might have replied, but the actual number was probably an acceptable definition from the mother's point of view. If it were not, the son certainly would have heard about it!

This scene demonstrates the transmission of some vague, quantitative information between two people. But actually, when one stops and considers, most of the quantitative information that one receives during the course of a day is of just this nature. A house may be quite large; a girl may be sort of short; a man may be not very old. Not only do people understand such statements, they also have the ability to operate upon and manipulate these vague concepts.

Recently, there has been considerable interest on the part of linguists and computer scientists in just such problems as the role of vagueness in language and the quantification of meaning. Much of this interest has been the result of the development of fuzzy set theory, a generalization of the traditional theory of sets. A major feature of fuzzy set theory is that a quantitatively specifiable system can contain linguistic variables in addition to numeric variables. These linguistic variables can be manipulated and operated upon in much the same way as numeric variables in nonfuzzy systems.

This new way of dealing with complex systems appears quite promising in terms of the specification of complex behavioral processes, such as the measurement of word meaning or the description of reasoning processes in everyday situations. In this article, we explore the possibilities of developing a treatment of (a) natural language concepts as fuzzy sets and (b) modifiers as operators on fuzzy sets.

This research was supported in part by Public Health Service Grant 5-T01-GM02148 to The Johns Hopkins University.

We would like to express our appreciation to William Labov for clarifying several linguistic issues, and to Bert Green and Howard Egeth for the advice they have given us in many stages of this work.

Requests for reprints should be sent to Harry M. Hersh, Pattern Analysis and Recognition Corporation, Rome, New York 13440.

Quantification of Meaning

One of the first attempts at the quantification of meaning was made by Mosier (1941). Mosier hypothesized that the meaning of a word may be considered as containing two components: (a) a constant component reflecting the overall meaning value along a continuum and (b) a variable component representing the variation in the meaning of the word due to context, speaker, and the like. He defined the meaning (M) of a word as:

M = x + i + c,

where x equals the constant component over people and context, i equals the variation in meaning due to the individual, and c equals the variation in meaning due to the context.

According to this theory of word meaning, any one of the components could be zero for certain words, and for ambiguous words, x could have multiple values. In addition, there is an assumption of a unidimensional continuum along which every word must fall. The model also predicts, for example, that context effects will be independent of the word used and the individual involved. Although these and other simplifications reduce the model's explanatory and predictive power considerably, it was probably the first significant attempt made to quantify meaning.

More interesting than Mosier's (1941) theoretical formulations are his empirical investigations in the same article. He had subjects rate a list of evaluative adjectives (e.g., unsatisfactory, excellent) along a favorable-neutral-unfavorable continuum. He then scaled the responses by the method of successive intervals. The scale values for each word were interpreted as the constant component of the meaning of the word, while the spread of the distribution (i.e., the vagueness) was interpreted as ambiguity, or the variable component. The data tended to support his model in that it was possible to assign each word to a scale value along a unidimensional continuum, and the variation in meaning about this scale value was normally distributed. (These results were later supported by Jones and Thurstone, 1955, for a set of preference words and phrases.)

Mosier's subjects also rated a subset of the adjectives paired with adverbial modifiers, or intensifiers (e.g., very, extremely). An analysis of these data showed that the addition of an intensifier caused a shift in the meaning of the base word away from the neutral point toward the extreme. This finding that adverbs tended to cause a shift in the scale values served as the basis for a later study on the influence of adverbs by Cliff (1959).

Mosier had shown that the meaning of an evaluative adjective could be represented as a point along a unidimensional continuum reflecting favorableness. Osgood and his colleagues (Osgood, 1952; Osgood, Suci, & Tannenbaum, 1957) felt that it was possible to specify the meaning of a wider range of words by rating the words on a judiciously chosen set of scales. These scales were combined to form the semantic differential. Using this procedure, a subject is presented with a word (e.g., quicksand), which he characterizes by rating it along a number of antonymous scales, such as clean-dirty, fast-slow, and serious-humorous. The resulting profile is considered to be a multidimensional representation of the meaning of the word. Osgood's (1957) factor analysis of the various scales of the semantic differential yielded three main factors, which could be called potency, activity, and evaluation.

According to Osgood (1957), the similarity between two words (or between the same word for different populations) is interpreted as the Euclidean distance between corresponding profiles. Using this definition of the distance between concepts, Rowan (Note 1) had subjects rate a list of words (e.g., sleep, hero, gentleness) using the semantic differential and the method of triads. The distances between word pairs obtained from a multidimensional scaling of the triad data correlated highly with the distances from the semantic differential. In addition, the first two dimensions from the multidimensional scaling could be interpreted as evaluation and potency/activity, the major factors derived from the semantic differential.

In the Mosier (1941) and the Jones and Thurstone (1955) studies, the rating scale (unfavorable-neutral-favorable) could be considered to capture a dimension of the meaning of the terms used. Thus, the words were rated according to the relative value of the corresponding denotative, or extensive, component. These ratings can be interpreted as the meaning of the words to the extent that both share a similar quantitative interpretation in the same context. Similarly, the semantic differential seems to demonstrate that subjects have the ability to rate the words in such a manner that overall differences between words are reflected in differential scale values. The semantic differential, however, does not demonstrate that the scale values reflect in any way the overall meaning of a word. Subjects may have the ability to rate a word such as moon as having a value on a kind-cruel scale. Even if this rating reflects a connotative property of the word, connotation is but one aspect of the meaning of the word. However, most scales on the semantic differential appear to reflect such connotative components. To suggest that word meaning can be uniquely specified by any number of such (nonorthogonal) scales implies that these scales represent all possible qualitative components in meaning, connotative and denotative. It has not been demonstrated that the semantic differential represents the full meaning of words in any comprehensive manner. Whether such a theory could ever adequately represent meaning is unclear.

The above studies have all represented meaning as a point along one or more rating scales. Mosier (1941) showed that the meaning of evaluative adjectives could be represented by scale values. How then might adverbs be characterized? Cliff (1959), extending Mosier's findings, proposed a formal model where adverbs functioned as multiplicative constants. If the scale value of the jth adjective is $s_j$, then the value of an adverb-adjective combination is represented as:

$$x_{ij} = c_i s_j + K,$$

where $x_{ij}$ equals the obtained scale value of the ith adverb in combination with the jth adjective, $c_i$ equals the multiplying value of the ith adverb, $s_j$ equals the psychological scale position of the jth adjective, and K equals the difference between the arbitrary zero point of the obtained scale values and the psychological zero point of the scale.

To test the model, Cliff had subjects rate a set of adjective-adverb combinations (e.g., extremely lovable) along a favorable-unfavorable continuum, as in the Mosier study. Over all combinations of 10 adverbs and 15 adjectives, the predictions of the model were extremely accurate. Values of $c_i$ and $s_j$ were completely independent. The only deviation from the model was that the value of K appeared to vary slightly as a function of the adjective employed. Overall, the hypothesis that adverbs operate as multipliers was clearly confirmed. The results explain why combinations such as unusually average appear so awkward: The scale value of average is approximately zero, so multiplying zero by any finite value will leave a product of zero; the adverb is superfluous.
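The multiplicative model can be sketched in a few lines of Python. This is only an illustration: the function name and every numeric value below are hypothetical and are not taken from Cliff's data. The sketch shows why an adjective with a scale position near zero is left unchanged by any adverb.

```python
# Sketch of a multiplicative adverb model: x_ij = c_i * s_j + K.
# All multipliers and scale positions are hypothetical illustration values.

def predicted_scale_value(c_i: float, s_j: float, K: float = 0.0) -> float:
    """Predicted scale value of adverb i combined with adjective j."""
    return c_i * s_j + K

adverb_multipliers = {"slightly": 0.5, "very": 1.3, "extremely": 1.5}  # c_i
adjective_positions = {"lovable": 2.0, "average": 0.0, "bad": -1.0}    # s_j

for adverb, c in adverb_multipliers.items():
    for adjective, s in adjective_positions.items():
        value = predicted_scale_value(c, s)
        print(f"{adverb} {adjective}: {value:+.2f}")
```

With these assumed values, every combination involving average stays at the zero point regardless of the adverb, while the other adjectives move away from it in proportion to the multiplier.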

Quantification of Vagueness

An underlying assumption in the work on meaning quantification has been that the meanings of words can be specified as points along a scale. The variability about the scale value is attributable to the statistical nature of the system. Recent developments in semantic theory, such as the work of Labov (1973) and Lakoff (1973), have brought into question the assumption that meanings are precise. Instead, it has been proposed that natural language terms are to a lesser or greater degree inherently vague, such that the boundary of a term is never a point but a region where the term gradually moves from being applicable to nonapplicable. Though this question of the inherent vagueness of language has only recently become of concern to psychologists, it has occupied the attention of philosophers since the time of the Greeks, who posed the question of vagueness in the form of the paradoxes of sorites (the heap) and falakros (the bald man). The latter paradox might take the form: How many hairs must be plucked from a man's head before he is considered to be bald? And given that there occurs a transition from being not bald to being bald, where is the one strand of hair that determines the transition, that is, where does one draw the line?

This paradox raises obvious difficulties for the law of the excluded middle: It appears unreasonable to assume that for a concept such as baldness every element of the universe is either a clear member of the set or a clear member of the complement of the set. The problem does not exist for artificial concepts with sharply defined boundaries, but appears unsolvable for vague, natural language concepts.

But what is a vague concept, and can it be specified? This was a rather productive area of research in philosophy during the early part of the twentieth century (e.g., Black, 1937; Copilowish, 1939; Hempel, 1939; Peirce, 1902; Russell, 1923; see also Black, 1963; Korner, 1957; Labov, 1973; Schmidt, 1974). Russell (1923) argued that the world is neither vague nor precise: It is what it is. Vagueness (and its complement, precision) are characteristics that can only belong to a symbolic representation, such as language. He further argued that a representation is vague when there exists a one-to-many relationship between the representing system and the system being represented.

Black (1937) agreed that the problem of vagueness is a property of natural language, but he faulted Russell for confounding vagueness and generality (p. 432, Note 12). Black differentiated these two terms from a third, ambiguity. Generality is what Russell described as vagueness, that is, a one-to-many relationship between a symbol and the items that the symbol represents. Ambiguity is the state of affairs where the same phonetic form has a finite number of alternative meanings. The vagueness of a symbol, however, will not be found as the result of a one-to-many relation nor a number of alternative meanings. Black reasoned that vagueness is a feature of the boundary of a symbol's extension, not of the symbol itself. No matter how closely one looks or accurately one measures, the vagueness remains.

FIGURE 1. Hypothetical consistency profile. (Abscissa: S, the domain of applicability; the two labeled regions are regions of certainty.)

Peirce (1902) defined vagueness in a similar manner: "A proposition is vague when there are possible states of things concerning which it is intrinsically uncertain whether, had they been contemplated by the speaker, he would have regarded them as excluded or allowed by the proposition" (p. 748). There are some objects that a group of speakers of a language would definitely consider to be chairs; others that would never be called chairs. However, there will always be some objects that tend to straddle the boundary; no matter how closely one examines them, these objects cannot be classified as clearly belonging or not belonging to the category in question. It appears then that vagueness enters in the process of mapping a linguistic term onto a universe. That is, what is vague is the use of the linguistic term. (The term natural language concept thus refers to the result of this mapping operation, while linguistic term refers to the verbal label applied to the particular mapping. However, throughout the present article these two labels are used interchangeably to refer to both a term and its application.)

Black (1937), using what he termed a "consistency profile," first attempted a quantitative description of vagueness as defined by Peirce (1902). He hypothesized that while the vagueness of a word implies variability in the application of a term by a group of language users, the variations should be specifiable and systematic. If they are not, it would be impossible to distinguish between terms. He defined the consistency of application of a term, T, to an element, s, of a set, S (i.e., S = {s}), as in:

$$C(T, s) = \lim_{M \to \infty,\, N \to \infty} \frac{M}{N},$$

where M equals the number of judgments that T applies to s, and N equals the number of judgments that not T applies to s. The consistency profile was then defined as the function C(T, s) over the domain of applicability, S. Figure 1 depicts a typical example, where it can be seen that the most doubtful cases correspond to C(T, s) = 1.0. This profile, then, was used to define the vagueness of a term by taking the slope of the curve from Point b to Point c (see Figure 1) as an index of vagueness.

In the same article, Black put forth alternative formulations for describing the vagueness of a term. He redefined the consistency of application as T(s, C). Using this latter notation, a term, T, will be said to apply to an item, s, of a series, S, with a consistency, C. The term not T will then apply with a consistency, 1/C. Thus, the law of the excluded middle can be replaced by a reciprocal relation where the product of the applicability of a term and its complement is always unity.

Attacking the same problem of vagueness in language, Hempel (1939) redefined the consistency of application of a term, T, to an object, s, as:

$$C(T, s) = \lim_{M \to \infty,\, N \to \infty} \frac{M}{M + N},$$

where M and N are as defined above. As a consequence, the range of C is restricted to the closed interval 0 to 1, and the "doubtful" cases take on values of about 1/2. (In a recent study of word boundaries, Labov, 1973, implicitly used just this type of formulation to describe the manner in which the form of objects and context interact to influence the extension of word boundaries.)
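The two definitions of consistency can be compared on finite judgment counts. The following is a minimal sketch: the function names and the counts are hypothetical, and a finite sample only approximates the limiting ratios in the definitions.

```python
# Finite-sample sketch of the two consistency measures.
# m = number of judgments that T applies to s
# n = number of judgments that not T applies to s
# The counts used below are hypothetical.

def black_consistency(m: int, n: int) -> float:
    """Ratio form M / N: unbounded above, doubtful cases near 1.0."""
    if n == 0:
        return float("inf")  # no 'not T' judgments at all
    return m / n

def hempel_consistency(m: int, n: int) -> float:
    """Proportion form M / (M + N): bounded in [0, 1], doubtful cases near 0.5."""
    if m + n == 0:
        raise ValueError("at least one judgment is required")
    return m / (m + n)

judgments = [(98, 2), (50, 50), (3, 97)]  # (m, n) for three hypothetical objects
for m, n in judgments:
    print(f"m={m:3d} n={n:3d}  ratio={black_consistency(m, n):7.3f}  "
          f"proportion={hempel_consistency(m, n):.2f}")
```

The proportion form keeps every object on the same bounded scale, which is what allows the doubtful cases to be identified as those with values near 1/2.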

Hempel argued that Black's technique of specifying the vagueness of a term as the slope of the borderline region was inadequate, since the scale on the abscissa was arbitrary. Hempel reasoned, using his formulation, that the greater the vagueness in a term, the more objects there would be for which C(T, s) takes on values around 1/2. The precision (pr) of a term then could be formalized as:

_ £ ̂N *-i

and the vagueness (vg) as:

vg = 1 - pr.

Specifying the vagueness of a term in this manner, independent of the slope of the consistency profile, avoids another problem encountered by Black (1937). As Figure 1 shows, there are two regions of certainty (a-b and c-d) and a region connecting these, which has been called the borderline region, or fringe. Black specified the vagueness of a term as the slope of the fringe. But what delimits the fringe area? Are there precise points at which the uncertainty begins or ends (as depicted in Figure 1)? One seems drawn to the conclusion that the extent of the boundaries of a term's application seems to fade in and out in an almost imperceptible manner. There is a smooth, continuous transition from applicability to uncertainty to nonapplicability. The entire extension of the term must then be considered to be vague, for the boundary extends (asymptotically) along the entire continuum.

The above discussion of the presence of vagueness in natural language argues against the Thurstonian model of meaning as put forward by Mosier (1941) and Cliff (1959). Such a theory implies that natural language concepts are precise and thus can be represented as points along a continuum. The observation that concepts form distributions is explained in terms of the statistical nature of the system. But if one accepts the theoretical position that natural language concepts are inherently vague, then the point concept becomes an inappropriate theoretical construct. Not only can a vague concept refer to a range, but also the variability is an integral part of the meaning of the concept.

Theory of Fuzzy Sets

Computer scientists and systems engineers have long recognized that people can understand and operate upon vague, natural language concepts. Computers, however, are extremely rigid and precise information-processing systems. This inherent rigidity severely limits a computer's ability to abstract and generalize fundamental conceptual functions. Recently, Zadeh (1965, 1973) and others (e.g., Goguen, 1967, 1969; Santos, 1970; LeFaivre, Note 2) have developed quantitative techniques for dealing with vagueness in complex systems. The techniques are based on fuzzy set theory, a generalization of the traditional theory of sets.

The unique feature of fuzzy logic is that it allows complex systems to contain both numeric and linguistic variables, where a linguistic variable is defined as a label of a fuzzy set. For example, the linguistic variable age may take on values of very young, rather young, young, middle age, not very old, and so on, in addition to a range of numeric values. The assumption underlying these fuzzy sets is that the transition from membership to nonmembership is seldom a step function; rather, there is a gradual but specifiable change from membership to nonmembership. In nonfuzzy set theory a membership (characteristic) function specifies which elements are members of the set (i.e., for which elements $x \in X$ has a truth value of 1). In fuzzy systems, the grade of membership and the corresponding truth value of the proposition $x \in X$ may take on any value in the closed real interval from .0 to 1.0.

Fuzziness is distinctly different from uncertainty as measured by the probability of an event. The uncertainty of a coin toss resulting in a head has a certain probability associated with it. No vagueness is involved, only lack of knowledge concerning an event occurring in the future. Once this knowledge becomes available, the state of affairs is completely determined. Referring back to the problem of falakros (the bald man), the concept baldness can be considered to be fuzzy. Unlike a coin toss, no matter how closely one measures or examines, the concept will apply more to some elements of the universe (men) than to others. No amount of information can make the boundary between bald and not bald free of imprecision.

To aid in discussions later in this article, it might be helpful at this point to summarize some of the important properties of fuzzy sets and show how traditional set theory is a special case of fuzzy set theory. (Much of this discussion is taken from Zadeh, 1965 and 1973.) Let X be a universe of points (or elements, or objects, or . . .) with a generic element of X denoted by x. A fuzzy subset of X, labeled A, is characterized by a membership function, $f_A$, that associates with each element x in X a real number, $f_A(x)$, in the closed interval from .0 to 1.0, which represents the grade of membership of x in A. An element of the fuzzy set A thus can be designated by the ordered pair:

$$[f_A(x), x], \tag{1}$$

where $f_A(x)$ is the grade of membership of x in A. Note that the nearer the value of $f_A(x)$ to 1.0, the higher the grade of membership of x in A. When A is a nonfuzzy set, its membership function can take on only values of 1 and 0 according to whether x does or does not belong to A, respectively.

As an example, consider the concept tall, where the membership function specifies the grade of membership of heights (in inches) in the set labeled tall.¹ Representative values might be: f(60) = .0, f(66) = .2, f(68) = .7, and f(74) = 1.0. Thus someone whose height is 5 ft. (152 cm) clearly is not tall, someone 5 ft. 8 in. (173 cm) is more tall than not tall, and someone 6 ft. 2 in. (188 cm) is clearly tall. It is important to realize that although the membership function might be defined precisely, it does not follow that the concept itself is precise over the domain.
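A membership function of this kind can be written down directly. The sketch below uses the four representative values for tall given above; the linear interpolation between them, and the clamping outside the listed range, are assumptions made only for illustration, since the text specifies the function at those four heights alone.

```python
from bisect import bisect_right

# Representative (height in inches, grade of membership) pairs for "tall".
TALL_POINTS = [(60, 0.0), (66, 0.2), (68, 0.7), (74, 1.0)]

def tall(height_in: float) -> float:
    """Grade of membership of a height in the fuzzy set 'tall'.

    Assumption: grades between the listed heights are linearly
    interpolated, and heights outside the range are clamped.
    """
    heights = [h for h, _ in TALL_POINTS]
    if height_in <= heights[0]:
        return TALL_POINTS[0][1]
    if height_in >= heights[-1]:
        return TALL_POINTS[-1][1]
    i = bisect_right(heights, height_in)       # index of the next listed height
    (h0, g0), (h1, g1) = TALL_POINTS[i - 1], TALL_POINTS[i]
    return g0 + (g1 - g0) * (height_in - h0) / (h1 - h0)

for h in (60, 66, 67, 68, 74):
    print(h, round(tall(h), 2))
```

The function is fully precise, yet it assigns intermediate grades to most heights, which is the sense in which a precisely defined membership function can still describe a vague concept.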

Membership in a fuzzy set (as in a nonfuzzy set) is specified by a mapping from the universe to the set in question. This mapping may be performed by enumeration, by a function, by an algorithm, and the like. Whatever the method, the result will be that every element in X will have associated with it a number corresponding to its grade of membership in that fuzzy set. Note that the relations among the elements (or among relevant dimensions of the elements) need not specify a continuum. Where the relevant relations among the elements cannot be ordered, the mapping onto the fuzzy subsets would be by enumeration or an algorithm. However, where at least an ordinal relationship of elements (along one or more relevant dimensions) is implied, the grade of membership may be determined by a function, as well as by enumeration or an algorithm. Once this mapping is specified, the set can be used as a linguistic variable in fuzzy inferences and algorithms and can be modified by operations such as negation and union.

In both fuzzy and nonfuzzy set theory the support of the set A is the set of elements in X for which $f_A(x) > 0$. A (fuzzy or nonfuzzy) singleton is a set whose support is a single element in X. Therefore, a fuzzy set A may be specified as the union (see Equation 10) of its constituent singletons. Thus A may be represented by:

$$A = \int_X [f_A(x), x], \tag{2}$$

¹ For directness of exposition, concepts are assumed to be context constant (but not context free) here and throughout the study. In terms of membership functions, this implies that although the functions most likely contain more than one independent variable, the context-relevant variables have been held constant. For example, the membership function for tall is obviously influenced by the perspective (i.e., height) of the respondent. Assuming a constant perspective here in no way limits the applicability of the fuzzy set approach. (But see Labov, 1973, for a preliminary investigation of the interaction of form and context in referential meaning.)

where the integral sign represents the union of the fuzzy singletons $[f_A(x), x]$. If the fuzzy set A has finite support (i.e., if there are a finite number of elements whose grades of membership are greater than 0), then Equation 2 may be replaced by the summation:

$$A = [f_A(x_1), x_1] + [f_A(x_2), x_2] + \dots + [f_A(x_n), x_n], \tag{3}$$

or

$$A = \sum_{i=1}^{n} [f_A(x_i), x_i], \tag{4}$$

in which $f_A(x_i)$ is the grade of membership of $x_i$ in A. Note that the + sign in Equation 3 specifies the union of elements (see Equation 10) and not the algebraic sum. For example, given that the domain is the natural numbers, a fuzzy set labeled few might be defined as:

few = (.2, 1) + (.2, 2) + (.8, 3) + (1.0, 4) + (1.0, 5) + (.8, 6) + (.5, 7), (5)

where the support of few is the set of numbers 1, 2, . . . , 7; that is, according to the definition, numbers greater than 7 have a grade of membership of .0 in the set labeled few. Notice that each ordered pair relates a natural number to the grade of membership of that number in the concept. This relationship may be extended by having the grade of membership itself be a fuzzy set. For example (from Zadeh, 1973), if the universe, X, is:

X = Tom + Jim + Dick + Bob, (6)

and A is the fuzzy set agile, then:

agile = (medium, Tom) + (low, Jim) + (low, Dick) + (high, Bob). (7)

Low, in turn, is a fuzzy subset of the universe of possible grade of membership values (referring mainly to values near .0), or:

low = (.5, .2) + (.7, .3) + (1.0, .4) + (.7, .5) + (.5, .6). (8)

A crossover point in A is defined as an element that possesses a grade of membership in A of .5. Thus, penguin might be a crossover point in the set of birds, meaning that a penguin is as much a bird as it is not a bird.

Two fuzzy sets, A and B, are equal (A = B), if and only if $f_A(x) = f_B(x)$ for all x in X. Where the grades of membership take on values of either 0 or 1, this relation reduces to the traditional set theory definition of the equality of two sets. A is contained in B (or B entails A, or A is a fuzzy subset of B, i.e., $A \subset B$) if and only if $f_A(x) \le f_B(x)$ for all x in X.
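For a fuzzy set with finite support, these definitions translate into a few lines of code. The sketch below stores the set few from Equation 5 as a mapping from element to grade of membership. The helper names, the choice of a dictionary, the ten-element universe, and the second set (built by squaring each grade) are all hypothetical choices made for illustration.

```python
# The fuzzy set "few" from Equation 5, stored as {element: grade}.
FEW = {1: 0.2, 2: 0.2, 3: 0.8, 4: 1.0, 5: 1.0, 6: 0.8, 7: 0.5}
UNIVERSE = range(1, 11)  # hypothetical finite slice of the natural numbers

def grade(fuzzy_set: dict, x) -> float:
    """Grade of membership; elements not listed have grade 0."""
    return fuzzy_set.get(x, 0.0)

def support(fuzzy_set: dict) -> set:
    """Elements whose grade of membership is greater than 0."""
    return {x for x, g in fuzzy_set.items() if g > 0}

def crossover_points(fuzzy_set: dict) -> set:
    """Elements whose grade of membership is exactly .5."""
    return {x for x, g in fuzzy_set.items() if g == 0.5}

def is_equal(a: dict, b: dict, universe) -> bool:
    return all(grade(a, x) == grade(b, x) for x in universe)

def is_contained(a: dict, b: dict, universe) -> bool:
    """A is contained in B iff f_A(x) <= f_B(x) for every x."""
    return all(grade(a, x) <= grade(b, x) for x in universe)

# Hypothetical second set: each grade of FEW squared.
SQUARED_FEW = {x: g ** 2 for x, g in FEW.items()}

print(sorted(support(FEW)))
print(crossover_points(FEW))
print(is_contained(SQUARED_FEW, FEW, UNIVERSE))
print(is_contained(FEW, SQUARED_FEW, UNIVERSE))
```

Because a grade between 0 and 1 never increases when squared, the squared set is contained in few, but not the other way around.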

Operations on Fuzzy Sets

Negation, logical and algebraic opera-tions, hedges, and other terms that in-fluence the representation of linguisticvariables can be considered as labels ofvarious operations defined on fuzzy sub-sets of X. The more basic operations willbe reviewed here. For a more completeoverview, see Zadeh (1965, 1973) andGoguen (1967, 1969).

The complement of a set A is denoted not A, or $\bar{A}$, and is defined as:

$$\text{not } A = \int_X [1 - f_A(x), x], \tag{9}$$

that is, as in nonfuzzy set theory, the operation of complementation corresponds to negation. It is interesting to note that Black's (1937) reciprocal model of negation can be shown to be isomorphic to this complement operation.

The union of two fuzzy sets, A and B, is denoted A + B, and is defined as:

$$A + B = \int_X [f_A(x) \vee f_B(x), x], \tag{10}$$

where

$$f_A(x) \vee f_B(x) = \max[f_A(x), f_B(x)]. \tag{11}$$

The union of two sets corresponds to the logical or operation. In fact, in the situation where the grades of membership take on values of only 0 and 1, Equation 10 reduces to the Boolean or operation of set theory.


TABLE 1
LABELS OF FUZZY SETS AND THEIR PREDICTED OPERATIONS

Label                     Operation
SMALL                     $f_{small}$
LARGE                     $f_{large}$
NOT SMALL                 $1 - f_{small}$
NOT LARGE                 $1 - f_{large}$
VERY SMALL                $f_{small}^2$
VERY LARGE                $f_{large}^2$
NOT VERY SMALL            $1 - f_{small}^2$
NOT VERY LARGE            $1 - f_{large}^2$
VERY VERY SMALL           $f_{small}^4 = f_{very\ small}^2$
VERY VERY LARGE           $f_{large}^4 = f_{very\ large}^2$
NOT VERY VERY SMALL       $1 - f_{small}^4$
NOT VERY VERY LARGE       $1 - f_{large}^4$
EITHER LARGE OR SMALL     $\max[f_{large}, f_{small}]$

The intersection of two fuzzy sets is denoted $A \cap B$ and is defined as:

$$A \cap B = \int_X [f_A(x) \wedge f_B(x), x], \tag{12}$$

where

$$f_A(x) \wedge f_B(x) = \min[f_A(x), f_B(x)]. \tag{13}$$

The intersection corresponds to the logical connective and, which also reduces to the Boolean operator for grades of membership of 0 and 1.
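As a hedged illustration of Equations 9 through 13, the three basic operations can be written as pointwise functions over a finite universe. The function names, the five-point universe, and the grades assigned to SMALL and LARGE are hypothetical and are not data from the article.

```python
def complement(f, universe):
    """not A: replace each grade f_A(x) by 1 - f_A(x) (Equation 9)."""
    return {x: 1.0 - f.get(x, 0.0) for x in universe}


def union(fa, fb, universe):
    """A + B: pointwise maximum of the two grades (Equations 10 and 11)."""
    return {x: max(fa.get(x, 0.0), fb.get(x, 0.0)) for x in universe}


def intersection(fa, fb, universe):
    """Intersection: pointwise minimum of the two grades (Equations 12 and 13)."""
    return {x: min(fa.get(x, 0.0), fb.get(x, 0.0)) for x in universe}


# Hypothetical grades on a five-point ordinal scale (not data from the article).
UNIVERSE = range(1, 6)
SMALL = {1: 1.0, 2: 0.75, 3: 0.25}
LARGE = {3: 0.25, 4: 0.75, 5: 1.0}

NOT_SMALL = complement(SMALL, UNIVERSE)
EITHER_LARGE_OR_SMALL = union(LARGE, SMALL, UNIVERSE)
LARGE_AND_SMALL = intersection(LARGE, SMALL, UNIVERSE)
```

When every grade is 0 or 1, the same functions behave like the ordinary set complement, union, and intersection, which is the reduction to the Boolean case noted in the text.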

The above descriptions serve to demonstrate how various linguistic operators can be defined in terms of fuzzy sets. Zadeh (1972) and Lakoff (1973) have attempted to demonstrate that various linguistic operators (e.g., very, rather) can likewise be incorporated into the system of fuzzy logic by being considered as additional operators upon linguistic variables. For example, the adverb very appears to act as an intensifier. Zadeh reasoned that given a fuzzy set labeled A, very A should be of the form:

$$\text{very } A = \int_X [f_A^2(x), x]. \tag{14}$$

This formulation was later generalized to:

$$\text{very } A = \int_X [f_A^a(x), x], \quad \text{where } a > 1.0. \tag{15}$$
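A small sketch may help show what the power formulation of very does to individual grades, and how it composes with the complement of Equation 9. The function names and the three-element set A are hypothetical; the exponent 2 is the value in Equation 14.

```python
def very(f, a=2.0):
    """Raise every grade to the power a (Equation 15); a = 2 gives Equation 14."""
    if a <= 1.0:
        raise ValueError("the exponent a must be greater than 1.0")
    return {x: g ** a for x, g in f.items()}


def complement(f):
    """Replace every grade g by 1 - g (Equation 9)."""
    return {x: 1.0 - g for x, g in f.items()}


# Hypothetical fuzzy set with a full member, a crossover point, and a nonmember.
A = {1: 1.0, 2: 0.5, 3: 0.0}

VERY_A = very(A)                 # grades 1.0, .25, .0
NOT_VERY_A = complement(very(A)) # grades .0, .75, 1.0
VERY_VERY_A = very(very(A))      # same as raising each grade to the power 4
```

Grades of 0 and 1 are left unchanged while intermediate grades shrink, so under this formulation very A is always a fuzzy subset of A.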

Both Lakoff and Zadeh further analyzed operations that were concatenations of the primitive operations. For example, given the definitions of not (Equation 9) and very (Equation 14), the operation of not very A implies:

$$\text{not very } A = \int_X [1 - f_A^2(x), x]. \tag{16}$$

As part of the axiomatic system of fuzzy set theory, Zadeh (1968) generalized to fuzzy set theory the relation between the probability (P) of an event A, and the expected value of its membership function, that is:

$$P(A) = E(f_A). \tag{17}$$

Equation 17 has certain direct psychological implications. For example, the application of a term to an element might imply a grade of membership between .0 and 1.0. However, the overt judgment of the relation of the term and element might require a binary (yes-no) reply. Using Equation 17, the distribution of binary responses can be related to the grade of membership of the element in the fuzzy set labeled by the term. A technique is thus available for empirically verifying the predictions of fuzzy set theory in terms of the implications for human information processing.

Psychological Implications

Zadeh (1973) proposed that in dealing with humanistic systems we apply the principle of incompatibility: "As the complexity of a system increases, our ability to make precise and yet significant statements about its behavior diminishes until a threshold is reached beyond which precision and significance (or relevance) become almost mutually exclusive characteristics" (p. 28). Given the complexity of the system that psychologists study, one need not search far in order to appreciate the applicability of the above principle to the study of behavior.

[FIGURE 2. Membership function for not small. Abscissa: square size.]

This principle can actually be applied from two separate viewpoints. The first is in the description and explanation of behavior and in data analytic techniques. Some work has already been started in such areas as role theory (Thomason & Marines, 1972), pattern classification (Bellman, Kalaba, & Zadeh, 1966), and clustering techniques (Ruspini, 1970; Bezdek, Note 3). The second view is in describing and explaining how people interact with the world around them. While both lines of research could profit greatly from utilizing these new quantitative techniques provided by fuzzy set theory, we have chosen the latter line of inquiry.

It was shown in the section on vagueness that people have the ability to comprehend and manipulate vague concepts. If this behavior could be described in terms of fuzzy logic, this new quantitative technique might motivate the development of new methods for the modeling of language comprehension, reasoning, natural problem solving, and other complex behavioral processes. To be more specific, the theoretical position that runs through this line of research is that people comprehend vague concepts (i.e., all natural language concepts) as if the concepts are represented as fuzzy sets. Moreover, people manipulate vague concepts as if they are processing according to the rules of fuzzy logic. Several preliminary studies appear to support this theoretical position.

[FIGURE 3. Membership function for not large. Abscissa: square size.]

[FIGURE 4. Membership function for not very small. Abscissa: square size; curves: 1 − f(very small) and f(not very small).]

EXPERIMENT 1

Since Zadeh (1972) and Lakoff (1973) both make explicit claims as to the manner in which fuzzy sets should be transformed, it was felt that a logical first step would be to obtain some baseline fuzzy sets and empirically evaluate the transformations that occur when these sets are operated upon by negation and the addition of hedges.

Method

Subjects. Nineteen undergraduates at The Johns Hopkins University served as paid subjects.

Stimuli. Twelve slides, each containing a black square on a white background, were used as the physical stimuli. When projected on the screen, the squares measured 4, 6, 8, 10, 12, 16, 20, 24, 28, 32, 40, and 48 in. (10.2 cm to 121.9 cm) on one side.

Labels of the fuzzy sets consisted of the adjectives large and small paired with various combinations of not and the intensifier very. In addition, in order to test the concept of fuzzy union, the phrase either large or small was also used. The 13 labels are shown in Table 1, along with the predicted transformations from Zadeh (1973).

[FIGURE 5. Membership function for not very large. Abscissa: square size.]

Procedure. Subjects were run in one group. Each received an answer sheet containing the 13 phrases in one of 10 random orders. Below each phrase were 12 spaces. Subjects were instructed to look at the first phrase. They were told that they would be shown 12 squares in a random order. They were to simply look at each square and decide whether the phrase applied to it. If it was appropriate, they were to enter yes in the appropriate space; if it was not appropriate, they were to enter no. This procedure was repeated for each phrase, with a different random order of squares in each block. Slides were exposed for 1 sec. Subjects were given an additional 5 sec in which to respond. Before the start of the experiment the 12 squares were shown in ascending order to ensure that everyone was operating within approximately the same context.

After every block of 12 slides, subjects were given a 30-sec break. After all 13 blocks, they were given an additional 5-min. rest, and the process then was repeated with different randomizations. Thus every subject responded twice to every combination of square and phrase. The entire session lasted 50 min.

Results

Since Zadeh (1968) had equated the probability of a fuzzy event with the expected grade of membership of the event, the proportion of yes responses for a particular square and phrase was interpreted as the grade of membership for that square in the fuzzy set labeled by the phrase. (The data were scaled by the method of successive intervals [Diederich, Messick, & Tucker, 1957] to obtain a psychological scale for the square size. Since the use of this scale in no way changed any of the results, the data are reported simply in terms of the ordinal square size.)

Figures 2 and 3 show the membership functions for not small and not large, respectively. In each graph the negative phrase is plotted with the complement of the corresponding affirmative phrase. In these figures, as in all remaining figures, the ordinate corresponds to the grade of membership (truth value), while the abscissa corresponds to the ordinal square size. Figures 4 and 5 relate not very small to the complement of very small and not very large to the complement of very large, respectively. Figures 6 and 7 show the corresponding relations for very very small and very very large, respectively. The fact that the graphs indicate a reasonably good fit, and the fact that the average root mean square error over all positive-negative pairs was less than .07, support the fuzzy set notion of negation as being the complement of the positive set.²
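The analysis described above can be sketched in a few lines: grades of membership are estimated as proportions of yes responses, and the negative phrase is compared with the complement of the affirmative phrase by a root mean square error. The function names and the response data below are hypothetical and only illustrate the computation; they are not the article's data.

```python
from math import sqrt


def membership_from_responses(responses):
    """Grade for each square = proportion of yes (1) among its binary responses."""
    return {square: sum(r) / len(r) for square, r in responses.items()}


def complement(f):
    """Predicted negation: replace every grade g by 1 - g."""
    return {x: 1.0 - g for x, g in f.items()}


def rms_error(f, g):
    """Root mean square difference between two membership functions."""
    squares = sorted(f)
    return sqrt(sum((f[s] - g[s]) ** 2 for s in squares) / len(squares))


# Hypothetical yes (1) / no (0) responses for four squares, four judgments each.
SMALL_RESPONSES = {1: [1, 1, 1, 1], 2: [1, 1, 0, 1], 3: [0, 0, 1, 0], 4: [0, 0, 0, 0]}
NOT_SMALL_RESPONSES = {1: [0, 0, 0, 0], 2: [0, 1, 0, 0], 3: [1, 1, 1, 1], 4: [1, 1, 1, 1]}

f_small = membership_from_responses(SMALL_RESPONSES)
f_not_small = membership_from_responses(NOT_SMALL_RESPONSES)
fit = rms_error(f_not_small, complement(f_small))
```

A small value of `fit` means the observed function for the negative phrase lies close to the complement predicted from the affirmative phrase.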

2 Note that these results can likewise be predicted by a Thurstonian model where the concept is assumed to be constant and the evaluation of the square size is the variable component. Fuzzy set theory assumes that the response variability is mainly due to the vagueness of the verbal concept: The perception of square size is relatively constant. Differences at this point are more a matter of theoretical perspective than of quantitative prediction.

[FIGURE 6. Membership function for not very very small. Abscissa: square size.]


Figure 8 is a graph of not large and small. Although the two functions are monotonically decreasing, they do not represent the same concept. Not large appears to extend the concept small to include the midrange of the continuum.

The effect of the intensifier very on the concepts small and large is plotted in Figures 9 and 10, respectively. Note that the relation between large and very large, very large and very very large, and so on corresponds to the definition of entailment for fuzzy sets. Within each triplet (e.g., small, very small, very very small) the slopes of the functions appear approximately equal. (This was confirmed by the successive intervals analysis.) The equality of the slopes implies that the influence of very did not appear to reduce the vagueness of the concept. Now if Zadeh's (1972, 1973) hypothesis concerning the functioning of very as a power function is accurate, the slope of the function should increase as the intensifiers are concatenated. This does not seem to be the case.

[FIGURE 7. Membership function for not very very large. Abscissa: square size; curves: 1 − f(very very large) and f(not very very large).]

[FIGURE 8. Membership functions for not large and small. Abscissa: square size.]

Various classes of operators were tried in order to find a reasonable mapping function from a concept to the concept modified by very. Although a reasonable fit was obtained with a power function, it was felt that a more straightforward explanation was appropriate. In examining the plots of the membership functions it appeared that the addition of the intensifier very served to simply translate the function along the abscissa.³ Support for this finding comes from the work of Cliff (1959), who found that adverbs tended to function as multiplicative constants. (Recall that Cliff was working under the assumption that the meaning of an adjective could be characterized as a point on a scale: The adverb functioned to shift the point along the scale.) A least squares method was used to obtain the best shift for very and very very. However, for ease of exposition, it is sufficient to describe the translation for very as two ordinal scale units and very very as an additional ordinal scale unit toward the extreme of the scale.⁴ The results of this shifting operation are shown in Figures 11 and 12.
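The contrast between the two candidate forms of very can be sketched as follows. The membership function for small, the clamping at the ends of the scale, and the function names are assumptions made for illustration; the two-unit shift and the squaring exponent are the values mentioned in the text.

```python
def translate(f, shift):
    """Return g with g(x) = f(x + shift); x + shift is clamped to the scale ends."""
    lo, hi = min(f), max(f)
    return {x: f[min(max(x + shift, lo), hi)] for x in f}


def power(f, a=2.0):
    """Return g with g(x) = f(x) ** a, the power-function form of very."""
    return {x: g ** a for x, g in f.items()}


def steepest_drop(f):
    """Largest decrease in grade between adjacent ordinal sizes (a slope proxy)."""
    xs = sorted(f)
    return max(f[a] - f[b] for a, b in zip(xs, xs[1:]))


# Hypothetical membership function for small over the 12 ordinal square sizes.
SMALL = {1: 1.0, 2: 1.0, 3: 1.0, 4: 0.75, 5: 0.5, 6: 0.25,
         7: 0.0, 8: 0.0, 9: 0.0, 10: 0.0, 11: 0.0, 12: 0.0}

VERY_SMALL_SHIFT = translate(SMALL, 2)   # shift two units toward the small end
VERY_SMALL_POWER = power(SMALL, 2.0)     # square every grade
```

For these illustrative values the translated function keeps the same steepest drop as small (.25), whereas the squared function is steeper (.4375), which is the difference in slope that separates the two hypotheses.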

Figure 13 is a plot of the phrase either large or small and the operation of fuzzy union on the concepts large and small. While there is a definite similarity in the shape of the two functions, the discrepancies are more than minimal. The plot of the phrase is depressed in relation to the fuzzy set theoretic prediction. Some of the discrepancy is probably due to the difficulty that some subjects reported in evaluating this concept: the result being that the data contain possible inverted responses.

3 Root mean square errors overall were .074, .214, and .061 for the best-fitting power, exponential, and translation functions, respectively.

4 From the successive intervals scaling of the psychological square size, the ratio of the best translation to the mean intersquare distance was actually 1.81 and 1.15 for very and very very, respectively.

[FIGURE 9. The effect of very on the concept small. Abscissa: square size; curves: small, very small, very very small.]

[FIGURE 11. Very as a translation operation. Abscissa: square size.]

Overall, several conclusions are clear. Concepts defined by extension can be considered as fuzzy sets, and people tend to interpret these concepts and their variations in terms of operations on fuzzy sets. Negation can be characterized by the complement of the positive set. The effect of the intensifier very can also be interpreted as an operation on a fuzzy set, although its empirical form deviates from the predicted function. Operators like not and very can be concatenated with themselves and each other, with predictable results.

It would have been nicer if Zadeh's (1972, 1973) hypothesis of very operating as a power function of the grade of membership were applicable. Although such a finding would have supported the generality of the operation of very, the form of the operation is an empirical question. Very defined as a translation operation is obviously class dependent: It can certainly be generalized to other relative adjectives (e.g., good, short, hot). How such an intensifier would operate on the class of absolute adjectives (e.g., he is very British, this is very red) remains to be determined. Intuitively, it appears that the meaning of very in very large is qualitatively different from very in very British. The former implies an extreme of a continuum; the latter implies a greater emphasis on characteristic features (Lakoff, 1973).

[FIGURE 10. The effect of very on the concept large. Abscissa: square size.]

The most important point of Experiment 1 is that no longer is the variability about the mean of a distribution considered to be strictly the result of noise in the system or the uncontrollable influence of extraneous variables. The variability has a specific interpretation in terms of the grade of membership of the element in the fuzzy set denoted by the concept label.

EXPERIMENT 2

Experiment 1 demonstrated that the composite responses of a group of subjects could be considered as reflecting operations on fuzzy sets. The argument would be more convincing if it could be shown that individual subjects display similar behavior. Experiment 2 therefore can be considered a replication of Experiment 1 with repetitions across days for a single subject rather than across subjects for a single presentation.

[FIGURE 12. Very very as a translation operation.]

Method

Subjects. Four students at The Johns Hopkins University, who had not been in Experiment 1, served as paid subjects.

Procedure. The experimental procedure was similar to Experiment 1. Exposure time for the slides remained at 1 sec; however, the intertrial intervals were under subject control. In order to obtain more information from each trial, confidence ratings were also used. Subjects were instructed to respond yes or no according to whether the square viewed was appropriate for the phrase. After this response was recorded, subjects rated how confident they were in their decision on a 5-point scale, where 1 was interpreted as purely guessing and 5 as absolutely certain.

In a typical session, which lasted less than 1 hr., a subject judged all combinations of the 12 squares and 13 phrases twice. Each subject participated in five such sessions over 5 days.

Results

The binary decisions and the confidence ratings were integrated into a single scale over the closed interval .0 to 1.0. For example, a no judgment with a confidence rating of five was assigned .0, while a yes response with a similar confidence rating was assigned 1.0.⁵ Although there is no clear theoretical justification for this mapping, the resulting scale seemed to reflect the important conceptual ideas implicit in fuzzy set theory. The extremes of definite membership and nonmembership correspond to judgments that were rated as highly confident. Likewise, judgments rated as purely guessing corresponded to the crossover points in a fuzzy set.⁶

[FIGURE 13. Membership function for either large or small. Abscissa: square size; curves: max[f(large), f(small)] and f(either large or small).]

[FIGURE 14. Membership functions from Subject 1 and the group data. Abscissa: square size; curves: small and large for the group and for Subject 1.]

[FIGURE 15. Membership functions from Subject 1 for not large and not small. Abscissa: square size.]

[FIGURE 17. Membership functions from Subject 1 for not very very large and not very very small. Abscissa: square size.]

5 More formally, if d is the binary applicability decision (1 = yes, −1 = no) and r is the value on the confidence scale (1 ≤ r ≤ 5), then the grade of membership is defined as:

$$\text{Grade} = .5 + d\left(\frac{r - 1}{8}\right).$$
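The mapping from a yes-no decision and a confidence rating to a grade of membership can be sketched as a single function. The linear form used here is an assumption: it is the simplest mapping that sends a fully confident no to .0, a fully confident yes to 1.0, and a pure guess to the crossover value .5, with equal steps between adjacent ratings. The function name and the `scale_max` parameter are hypothetical.

```python
def grade_of_membership(decision, confidence, scale_max=5):
    """Map a yes/no decision and a confidence rating to a grade in [0, 1].

    A rating of 1 (purely guessing) gives .5; a rating of scale_max
    (absolutely certain) gives 1.0 for yes and .0 for no.
    """
    if decision not in ("yes", "no"):
        raise ValueError("decision must be 'yes' or 'no'")
    if not 1 <= confidence <= scale_max:
        raise ValueError("confidence must lie between 1 and scale_max")
    d = 1 if decision == "yes" else -1
    return 0.5 + d * (confidence - 1) / (2 * (scale_max - 1))
```

For example, `grade_of_membership("yes", 3)` gives .75 and `grade_of_membership("no", 3)` gives .25 on the 5-point scale.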

The results of three of the four subjects resembled, with minor differences, the results of the group experiment (Experiment 1). Moreover, since there were no important differences among Subjects 1-3, only Subject 1 will be discussed. Subject 4 was quite different and will be discussed separately.

The results of Subject 1 are shown in Figures 14-20. Figure 14 compares the membership functions for large and small with similar functions obtained from the group experiment (Experiment 1). Although the functions for large appear very similar, the concept small is more extensive for Subject 1 than for the group. Information of this type might be useful in studies on individual differences in comprehension, but is irrelevant for the theory being considered. The hypothesis being tested is that the operations on fuzzy sets should remain constant, independent of the basic membership functions.

Figures 15, 16, and 17 compare the negative concept to the complement of the corresponding positive concept. As with Experiment 1, the functions are quite similar, implying that the effect of negating a concept is to replace the grades of membership with their complements.

Plots of very as a shifting operation are shown in Figures 18 and 19. As with the group data, the goodness of fit indicates that the translation operation is a tenable description of the effect of modifying a concept by very. Finally, Figure 20 shows that the membership function for either large or small shows a slight depression in relation to the fuzzy union of large and small, as was found in Experiment 1.

[FIGURE 16. Membership functions from Subject 1 for not very large and not very small. Abscissa: square size.]

6 Inherent in this transformation is the assumption that equal confidence intervals reflect equal differential grades of membership. The comparison of the concept large for the group and for the individual subject, here Subject 1 (see Figure 14), tends to support this assumption.


[FIGURE 18. Very as a translation operation. (Data are from Subject 1.) Abscissa: square size; curves: very small, very large, and the translated functions for small and large.]

Of the 3 individual subjects and the 19 group subjects tested, all responded in a similar manner. Their judgments all reflected fuzzy logical operations where the concept appeared to dichotomize the continuum (in a fuzzy manner) into a region of inclusion and a region of exclusion. Although these findings do reflect the assumptions of fuzzy set theory, some of the results are rather anomalous from the perspective of linguistic processing. According to Bolinger (1972), the use of litotes such as not very does not imply the complement of the continuum as Figures 4 and 5 show, but only some range near the middle of the continuum: For example, to say that someone is not very tall is typically interpreted as meaning that he is rather short or sort of short. Subject 4 made judgments drastically different from the other subjects and seemed to be responding according to this more "idiomatic" interpretation of the phrases.

Figures 21 and 22 show the various positive membership functions as perceived by Subject 4. Whereas the other subjects perceived the operation of very on the concept small as defining successively smaller fuzzy subsets, Subject 4 interpreted the phrases as (fuzzy) overlapping categories. A square that is judged to have a maximum grade of membership in the set very very small is considered to be a marginal (f = .56) member of the set small. From the shape of the membership functions in Figures 21 and 22 it can be argued that the concept of strict entailment was not functioning for this subject. The fact that a square is judged to be very very large does not entail that it is also large.

[FIGURE 19. Very very as a translation operation. (Data are from Subject 1.)]

[FIGURE 20. Membership function from Subject 1 for either large or small. Abscissa: square size; curves: max[f(large), f(small)] and f(either large or small).]

[FIGURE 21. Membership functions containing small for Subject 4. Abscissa: square size; curves: small, very small, very very small.]

[FIGURE 22. Membership functions containing large for Subject 4. Abscissa: square size; curves: large, very large, very very large.]

[FIGURE 23. Membership function from Subject 4 for not small. Curves: small, not small, large.]

The membership functions for not small and not large are presented in Figures 23 and 24, respectively, along with large and small. It appears that the two negative terms are rather general concepts representing the middle region of the contextual range. Functions for not very small and not very large are plotted in Figures 25 and 26, respectively. The functions seem to indicate that for this one subject the term not operates upon the intensifier very rather than on the entire concept. More explicitly, the original empirical formulation for not very small is:

$$f_{\text{not very small}}(x) = 1 - f_{\text{very small}}(x) \tag{18}$$

$$= 1 - f_{\text{small}}(x + d), \tag{19}$$

where d is the amount of translation on the abscissa. Over at least a portion of the domain, a tentative hypothesis for the behavior of Subject 4 might be:

$$f_{\text{not very small}}(x) = f_{\text{very small}}(x - 2d) = f_{\text{small}}(x - d), \tag{20}$$

that is, the function of not operates on the function very by translating the function about the base membership function. This formulation is extremely speculative, and the data to support the operation come from only one subject out of 23. Nevertheless, it must be recognized that the empirical functions produced by this one subject, far from being random, appear to obey lawful relationships.
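The two readings of not very small can be contrasted in a short sketch. It assumes that the logical reading is the complement of the translated function, 1 − f_small(x + d), and that the tentative reading for Subject 4 is the base function translated in the opposite direction, f_small(x − d). The membership function for small, the shift d = 2, the clamping at the ends of the scale, and all names are hypothetical.

```python
def shifted(f, shift):
    """Return g with g(x) = f(x + shift); x + shift is clamped to the scale ends."""
    lo, hi = min(f), max(f)
    return {x: f[min(max(x + shift, lo), hi)] for x in f}


# Hypothetical membership function for small over the 12 ordinal square sizes.
SMALL = {1: 1.0, 2: 1.0, 3: 1.0, 4: 0.75, 5: 0.5, 6: 0.25,
         7: 0.0, 8: 0.0, 9: 0.0, 10: 0.0, 11: 0.0, 12: 0.0}
D = 2  # assumed amount of translation, in ordinal scale units

# very small as a translation of small toward the small end of the scale.
VERY_SMALL = shifted(SMALL, D)

# Logical reading: complement of very small, 1 - f_small(x + d).
LOGICAL_NOT_VERY_SMALL = {x: 1.0 - g for x, g in VERY_SMALL.items()}

# Tentative idiomatic reading: small translated the other way, f_small(x - d).
IDIOMATIC_NOT_VERY_SMALL = shifted(SMALL, -D)
```

Under the logical reading the largest square is a full member of not very small, whereas under the idiomatic reading it is a nonmember. The sketch applies the translation over the whole scale, although the text proposes it only over a portion of the domain.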

EXPERIMENT 3

The question remained as to why 22 subjects would respond in a (fuzzy) logically predictable manner, and only one subject respond in a manner that somewhat reflected a linguistic interpretation. Perhaps the demands of Experiment 2 were such as to force Subject 4 into responding in the more logical manner. To test this hypothesis Experiment 3 was performed; the context was varied in the hope that a more linguistic interpretation would result.

Method

Subjects. Fourteen undergraduates at The Johns Hopkins University, not involved in the previous experiments, participated as paid subjects.

Stimuli. The 12 squares used in Experiments 1 and 2 served as the physical stimuli. The terms were composed of the adjectives large and small, which appeared unmodified and in combination with not, very, and not very. In addition, the hedge sort of was included in order to evaluate the relation between not very and this relaxive adverb.


[FIGURE 24. Membership function from Subject 4 for not large. Abscissa: square size.]

[FIGURE 25. Membership function from Subject 4 for not very small. Abscissa: square size; curves: small, very small, not very small.]

The terms were integrated into sentences of the form, "This is sort of small," which were then recorded in a random order onto one track of a stereo magnetic tape. The hope was that the auditory presentation of the sentences might be reflected in a more natural language interpretation. Two identical tapes were made. On Tape 1 the phrases not very large and not very small received strong emphasis on the word not. On Tape 2 the word very was emphasized. The remaining sentences were dubbed from the same master tape and thus were identical. On the second track of each stereotape synchronization information to automatically advance the slide projector was recorded.

Procedure. Subjects were randomly assigned to one of two treatment groups corresponding to which audiotape they would hear. The procedure for each group was identical. On each trial a slide would be projected. At the same instant that the slide was exposed, a sentence would be heard. The subject was to simply look at the slide and decide whether the sentence he was hearing was appropriate with reference to the square on the slide. Subjects circled the appropriate response (yes or no) on an answer sheet and indicated on a rating scale from 1 to 3 their degree of confidence in their decision. After 10 stimulus pairs there was a 15-sec pause, and after every 60 trials a 2-min. rest period. Each subject saw all combinations of the 12 squares and 10 sentences two times in a completely randomized order. Before they began, subjects were shown the entire range of slides in ascending order and the range of sentences, so they would all be operating in approximately the same context. The entire session lasted less than 45 min.

Results

The findings of Experiment 3 demonstrate how robust the various concepts are under differing experimental conditions. Figures 27 and 28 show, respectively, the correspondence between small and large from Experiments 1 and 3. It is apparent that neither the change in modality (visual/auditory), emphasis, context (phrases/sentences), nor design (blocked/completely randomized) had any influence on the results. The functions for the other concepts correspond in a similar manner. The only other result of interest is the membership functions for sort of small and sort of large. These functions are plotted in Figures 29 and 30, respectively, and compare favorably with the plots for not very small and not very large for Subject 4.

[FIGURE 26. Membership function from Subject 4 for not very large. Curves: large, very large, not very large.]


[FIGURE 27. Membership functions for small from Experiment 1 (visual stimuli) and from Experiment 3 (auditory stimuli). Abscissa: square size.]

EXPERIMENT 4

In Experiments 1-3, having to make judgments on more than one phrase may have led subjects to treat concepts and modifiers in the observed "logically" predictable manner. This may result from subjects redefining the phrases in relation to each other. That is, the subjects may adopt a response strategy for specific phrases that is influenced by the other phrases used in the experiment. So, for example, the type of response to not very small may be a function of how the subject responded to very small. To overcome this possible contextual effect, an experiment was performed where each subject had to judge only a single phrase. It was felt that in this case, subjects might give a linguistic as opposed to logical interpretation to the phrases.

Method

Subjects. Forty-five undergraduates at The Johns Hopkins University, who had not participated in the earlier experiments, served as unpaid subjects. There were 10, 11, 12 and 13 subjects in the large, very large, not very large, and sort of large conditions, respectively.

Stimuli. Twelve black squares, proportional in size to the squares used in Experiments 1-3, were mounted on a white background. The mounted squares were fastened in ascending order (left to right) to a blackboard in the front of a classroom. Below each square was a number corresponding to the ordinal ranking. Unlike the preceding experiments, only four phrases were used: large, very large, not very large, and sort of large.

Procedure. Subjects were given a sheet of paper containing a single phrase and were instructed to look at the 12 squares and write down the identifying number or numbers of squares that might correctly be characterized by the phrase written on their sheet. Experiment 4 lasted approximately 2 min.

Results and Discussion

Figure 31 shows the membership functions for the four phrases. Two results are important. First, there is a small but predictable decrease in the grade of membership of the largest square for the concept large. This result is consistent with a linguistic interpretation of large, where large is not strictly entailed by very large. Second, the phrase not very large is clearly not the complement of very large. Rather, as predicted by the linguistic interpretation, it seems to be functioning as sort of small (compare Figures 30 and 31). What is important here is simply that not very large is not the complement of very large. These two results together suggest that subjects will tend to give linguistic interpretations to the phrases if experimentally induced context effects are eliminated. The question arises whether these results are inconsistent with the formalism of fuzzy set theory. We think not. At worst they will complicate the particular characterization of various operators but not invalidate the principal claim that intensifiers and hedges can be described as operators on fuzzy sets. In fact, recall that we did give a formal, though tentative, formulation of the operations underlying the use of adverbs for the subject (Subject 4) who used a linguistic interpretation in judging the application of phrases to square size. The actual form that the operators and concepts take is an empirical issue to be determined by experimentation. Once we have determined the forms they take we can then proceed to see whether we can give them an interpretation within the framework of fuzzy set theory. This is essentially the procedure followed in the "odd" case of Subject 4 in Experiment 2.

[FIGURE 28. Membership functions for large from Experiment 1 (visual stimuli) and from Experiment 3 (auditory stimuli).]

FIGURE 29. Membership function for sort of small.

FIGURE 30. Membership function for sort of large.

GENERAL DISCUSSION

The most important conclusion to be drawn from our results on the effects of the operators not and very on the concepts small and large is that natural language concepts can be described more completely and manipulated more precisely using the framework of fuzzy set theory. A further, not uninteresting, finding was that concepts and operators can be interpreted in two different ways: what we have called the linguistic and logical interpretations. This latter finding is independent of any considerations relating to the treatment of concepts as fuzzy sets. It does, however, have important consequences for general psycholinguistic theory. Thus, it raises a cautionary note to be heeded by those who would treat certain lexical items as if they were strict logical operators. The results we have reported show that in certain natural language settings entailment does not strictly hold (e.g., large is not strictly entailed by very large) and that certain combinations of operators will assume an idiomatic sense very different from that predicted by a simple linear combination of the lexical items constituting the phrase (e.g., not very large).
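In fuzzy set terms, strict entailment of large by very large corresponds to containment: the grade of membership in very large should never exceed the grade in large. A minimal sketch of that check follows. The function name `entails` and all grades are hypothetical, chosen only to mimic the qualitative pattern described above (a dip for large at the largest square); they are not data from the experiments.

```python
def entails(mu_specific, mu_general):
    """Return True if mu_specific[x] <= mu_general[x] for every x (fuzzy containment)."""
    return all(mu_specific[x] <= mu_general[x] for x in mu_specific)


# Hypothetical grades of membership indexed by square size (illustration only).
large_logical = {8: 0.70, 9: 0.85, 10: 0.95, 11: 1.00, 12: 1.00}
large_linguistic = {8: 0.70, 9: 0.85, 10: 0.95, 11: 0.95, 12: 0.85}
very_large = {8: 0.20, 9: 0.45, 10: 0.75, 11: 0.95, 12: 1.00}

# Logical reading: 'very large' is contained in 'large', so entailment holds.
print(entails(very_large, large_logical))

# Linguistic reading: 'large' dips at the largest square, so containment fails there.
print(entails(very_large, large_linguistic))
```

The second check fails only because of the largest square, which is the sense in which entailment "does not strictly hold" under a linguistic interpretation.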

FIGURE 31. Membership functions resulting from the presentation of a single phrase to each subject (Experiment 4).

As pointed out in the introduction, the implicit claim underlying our work is that natural language concepts are intrinsically vague, a view shared by an increasingly large number of psychologists, linguists, and philosophers. The treatment of natural language concepts as vague has important implications for semantic theory. The major implication is that the meaning of a term could be specified as a fuzzy set of meaning components. That is, a specific meaning component need not necessarily be either a member or a nonmember of the set of features that define the meaning of a term. All that is required in this view is that a meaning component have a nonzero degree of membership in the set. This approach, then, avoids a problem that has been the focus of a number of recently published reports, where increasing emphasis has been placed on the problem of how to exactly determine what aspects of the meaning of a term are necessary and sufficient to characterize that term (Lehrer, 1970; Slote, 1966). The thrust of a number of empirical reports is that the traditional component approach to meaning (e.g., Katz & Fodor, 1963) incorrectly assumes that a specific component can be said, without any uncertainty, to be necessary for the definition of a term (Caramazza, Grober, & Zurif, in press; Garvey, Caramazza, & Yates, 1975; Lehrer, 1970; Rosch, 1973; Smith, Shoben, & Rips, 1974). More generally, the traditional componential approach is criticized for treating all components as contributing equally to the definition of a term. The alternative proposal emerging from these critical reports is that various components of meaning are differentially important to the definition of a term and, in addition, that no subset of these components can conclusively be said to be necessary and sufficient to define a term: That is, the semantic structure of a term itself is ill-defined, or vague. Though this latter approach may seem to be unnecessarily complicated, it allows for more natural explanations of important classes of semantic phenomena. Thus, for example, Labov (1973) has shown that attempts to give well-defined characterizations in terms of traditional componential analysis of the semantic structure of a common concept such as cup are inadequate. Labov obtained data on the consistency of naming various cuplike objects that support a theory in which various features are differentially weighted as necessary to define the term. The results Labov reported have been replicated in two independent studies with children as subjects (Anderson, 1975; DeVos & Caramazza, Note 4). In the study by DeVos and Caramazza (Note 4), it was shown that perceptual features of objects and functionally determined contexts interact in fairly specific ways to determine the label a child will assign to an object. The implication of this latter study is that from very early in life a child forms concepts that are vague (fuzzy), and contextual cues operate on perceptual features to determine the relative weighting the features are assigned in choosing a label (name) for the object. It is unclear how a traditional feature theory of meaning would handle such findings without recourse to complex ad hoc principles.
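The proposal that a term's meaning is a fuzzy set of meaning components, with no crisp necessary-and-sufficient subset, can be sketched as a mapping from components to grades of membership. The components and grades listed for cup are invented for illustration (they are not Labov's data), and `support` and `alpha_cut` are hypothetical helper names.

```python
# Hypothetical fuzzy set of meaning components for 'cup' (grades are illustrative only).
cup = {
    "concave container": 1.0,
    "used for drinking": 0.9,
    "has a handle": 0.6,
    "width about equal to depth": 0.5,
    "made of ceramic": 0.3,
}


def support(fuzzy_set):
    """Components with a nonzero grade of membership."""
    return {c for c, grade in fuzzy_set.items() if grade > 0.0}


def alpha_cut(fuzzy_set, alpha):
    """Crisp set of components whose grade of membership is at least alpha."""
    return {c for c, grade in fuzzy_set.items() if grade >= alpha}


# Every listed component belongs to the meaning to some degree.
print(sorted(support(cup)))

# Which components count as "defining" depends on an arbitrary threshold.
for alpha in (0.3, 0.6, 0.9):
    print(alpha, sorted(alpha_cut(cup, alpha)))
```

A traditional componential analysis amounts to committing to one such cut, whereas the fuzzy representation keeps the grades themselves, so components can be differentially important without any one cut being privileged.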

The importance of vagueness (fuzziness) as a basic concept in semantic theory can be seen by considering yet another semantic phenomenon: polysemy. Lexicographers have long recognized the fluidity of polysemy: No clear boundary distinguishes polysemy from homonymy. However, traditional feature theory requires that a definite demarcation hold between polysemous and homonymous senses, essentially marked by the presence of a specific feature. In fact, the defining property of polysemous words is that the senses that comprise a word share a common meaning core while at the same time differing sufficiently to be recognized as distinct from each other. In a recent investigation of polysemous words, it was shown, however, that the grades of membership of various senses in a general word concept vary considerably (Caramazza et al., in press). This finding is difficult to interpret within traditional feature theories, since they do not normally allow for features to be only partially applicable. This same finding can be easily interpreted within a theory that allows "vague" core meanings, where specific senses differ in degrees of membership in that core. Incidentally, it should be pointed out that Katz and Bever (Note 5) have criticized theories that postulate graded membership of instances in categories for confusing performance variability with inherent variability in concepts. They also make the blanket assertion that theories of this sort are based on empiricist principles. While not necessarily disagreeing with their characterization of Rosch's (1973) work, in particular, as having an empiricist flair, we find their wholesale labeling of theories that propose vague concepts as "empiricist" to be rather naive. That is, there is no a priori reason to suggest that a rationalist theory necessarily excludes vague concepts.

To conclude, we are proposing that natural language concepts be considered as inherently vague and, specifically, as fuzzy sets of meaning components. We are also proposing that language operators (negative markers, adverbs, and adjectives) be considered as operators on fuzzy sets. Though the research we have reported in this article deals with a very narrow issue, modifying concepts, it nonetheless relates in a direct way to semantic theory. Specifically, we have tried to give theoretical and empirical justification for the use of a formal system that can handle vagueness. The formal treatment of vagueness is, in our view, an important and necessary step toward a more comprehensive handling of natural language phenomena and the communication of vague information.

REFERENCE NOTES

1. Rowan, T. C. Some developments in multidimensional scaling applied to semantic relationships. Unpublished doctoral dissertation, University of Illinois, 1954.

2. LeFaivre, R. A. Fuzzy problem-solving (Tech. Rep. 37). Madison: University of Wisconsin, Academic Computing Center, 1974.

3. Bezdek, J. C. Mathematical models for systematics and taxonomy. Unpublished manuscript, State University of New York College at Oneonta, 1974.

4. DeVos, L., & Caramazza, A. The development of natural language concepts: The interaction of form and function of objects. Unpublished manuscript, The Johns Hopkins University, 1975.

5. Katz, J. J., & Bever, T. The fall and rise of empiricism, or the truth about generative grammars. Bloomington: Indiana University Linguistics Club Paper, 1974.

REFERENCES

Anderson, E. S. Cups and glasses: Learning that boundaries are vague. Journal of Child Language, 1975, 2, 79-103.

Bellman, R. E., Kalaba, R., & Zadeh, L. A. Abstraction and pattern classification. Journal of Mathematical Analysis and Applications, 1966, 13, 1-7.

Black, M. Vagueness. Philosophy of Science, 1937, 4, 427-455.

Black, M. Reasoning with loose concepts. Dialogue, 1963, 2, 1-12.

Bolinger, D. Degree words. The Hague, Netherlands: Mouton, 1972.

Caramazza, A., Grober, E. H., & Zurif, E. B. A psycholinguistic investigation of polysemy: The meanings of LINE. In W. Dingwall & H. Whitaker (Eds.), Research on speech and language. New York: Plenum, in press.

Cliff, N. Adverbs as multipliers. Psychological Review, 1959, 66, 27-44.

Copilowish, I. M. Border-line cases, vagueness, and ambiguity. Philosophy of Science, 1939, 6, 181-195.

Diederich, G. W., Messick, S. J., & Tucker, L. R. A general least squares solution for successive intervals. Psychometrika, 1957, 22, 159-173.

Garvey, C., Caramazza, A., & Yates, J. Factors affecting the assignment of pronoun antecedents. Cognition, 1975, 3, 227-243.

Goguen, J. A. L-Fuzzy sets. Journal of Mathematical Analysis and Applications, 1967, 18, 145-174.

Goguen, J. A. The logic of inexact concepts. Synthese, 1969, 19, 325-373.

Hempel, C. G. Vagueness and logic. Philosophy of Science, 1939, 6, 163-180.

Jones, L. V., & Thurstone, L. L. The psychophysics of semantics: An experimental investigation. Journal of Applied Psychology, 1955, 39, 31-36.

Katz, J. J., & Fodor, J. A. The structure of semantic theory. Language, 1963, 39, 170-210.

Korner, S. Reference, vagueness, and necessity. Philosophical Review, 1957, 66, 363-376.

Labov, W. The boundaries of words and their meanings. In C.-J. N. Bailey & R. W. Shuy (Eds.), New ways of analyzing variation in English. Washington, D.C.: Georgetown University Press, 1973.

Lakoff, G. Hedges: A study in meaning criteria and the logic of fuzzy concepts. Journal of Philosophical Logic, 1973, 2, 458-508.

Lehrer, A. Indeterminacy in semantic description. Glossa, 1970, 4, 87-109.

Mosier, C. I. A psychometric study of meaning. Journal of Social Psychology, 1941, 13, 123-140.

Osgood, C. E. The nature and measurement of meaning. Psychological Bulletin, 1952, 49, 197-237.

Osgood, C. E., Suci, G. J., & Tannenbaum, P. H. The measurement of meaning. Urbana: University of Illinois Press, 1957.

Peirce, C. S. Vagueness. In J. M. Baldwin (Ed.), Dictionary of philosophy and psychology (Vol. 2). New York: Macmillan, 1902.

Rosch, E. R. On the internal structure of perceptual and semantic categories. In T. M. Moore (Ed.), Cognitive development and acquisition of language. New York: Academic Press, 1973.

Ruspini, E. H. Numerical methods for fuzzy clustering. Information Sciences, 1970, 2, 319-350.

Russell, B. Vagueness. Australasian Journal of Psychology and Philosophy, 1923, 1, 84-92.

Santos, E. Fuzzy algorithms. Information and Control, 1970, 17, 326-339.

Schmidt, C. The relevance to semantic theory of a study of vagueness. In Papers from the Tenth Regional Meeting. Chicago, Ill.: Chicago Linguistic Society, 1974.

Slote, M. A. The theory of important criteria. The Journal of Philosophy, 1966, 63, 211-224.

Smith, E. E., Shoben, E. J., & Rips, L. J. Structure and process in semantic memory: A feature model for semantic decisions. Psychological Review, 1974, 81, 214-241.

Thomason, M. G., & Marinos, P. N. Fuzzy logic relations and their utility in role theory. In Proceedings from the 1972 IEEE International Conference on Cybernetics and Society. Washington, D.C.: Institute of Electrical and Electronics Engineers, 1972.

Zadeh, L. A. Fuzzy sets. Information and Control, 1965, 8, 338-353.

Zadeh, L. A. Probability measures of fuzzy events. Journal of Mathematical Analysis and Applications, 1968, 23, 421-427.

Zadeh, L. A. A fuzzy-set-theoretic interpretation of hedges. Journal of Cybernetics, 1972, 2, 4-34.

Zadeh, L. A. Outline of a new approach to the analysis of complex systems and decision processes. IEEE Transactions on Systems, Man, and Cybernetics, 1973, SMC-3, 28-44.

(Received December 8, 1975)