
Knowledge, concepts, and categories


Knowledge, Concepts, and Categories. Koen Lamberts and David Shanks, eds. Cambridge, MA: The MIT Press; 1997: 464 pp. Price: $30.00. (ISBN 0-262-62118-5.)

Information science is shifting from a systems focus to a user focus. There are calls in the research literature for a new dynamic, user-centered approach to relevance, and new conferences on the contexts of information seeking. Qualitative methods suddenly have gained prominence. In this era of shifting paradigms and methodological uncertainty, it is delightful to find a book that summarizes the latest empirical research on mental representation in contemporary cognitive psychology. I found myself a student again with ideas for research projects popping off each page. This collection contains state-of-the-art reviews by leading cognitive researchers of mental representation, concepts, and categories. Translated into the coinage of information science, these are the fundamental concerns of indexing, online searching, search strategies, relevance research, etc., summarized and presented with the supporting empirical research. In effect, this is the hard science of a user-centered information science.

The volume falls into three parts, the first of which has the most immediate rapport with information science. Here one finds Evan Heit’s “Knowledge and concept learning,” Ulrike Hahn and Nick Chater’s “Concepts and similarity,” Gregory L. Murphy and Mary E. Lassaline’s “Hierarchical structure in concepts and the basic level of categorization,” James Hampton’s “Conceptual combination,” and Linda B. Smith and Larissa K. Samuelson’s “Perceiving and remembering: Category stability, variability and development.” The second third of the book covers more technical aspects of cognitive neuroscience, brain structures, and process models of categorization, while the last third of the book presents some specific models of categorization and function learning.

The articles in the first third of the book make it essential reading for information scientists who want an overview of the fundamental psychological research on the human activities of naming categories, placing objects in categories, creating hierarchies, creating multiword descriptors, and so on. These are all fundamental information science activities. For example, Heit’s contribution points out that what a person learns about a new category is greatly influenced by what a person knows about other, related categories. This resonates with the theorizing of William Cooper (1971), largely forgotten today, that to search online is to deduce. Online searching is a process that combines the searcher’s background and deductive mechanisms. Heit comments: “Perhaps the most fundamental generalization is that in learning about new categories, people act as if these categories will be consistent with previous knowledge. People seem to act with economy, so that previous knowledge structures are reused when possible” (p. 10). From this insight flows the integration of prior knowledge with new observations. This is manifested in information science research in both understanding the moves of online searching, and the priming effects of reading through a set of retrieved records.

Heit considers many issues including the exemplar model of categorization, which suggests that categorization is based on similarity to category exemplars. Are ERIC major descriptors exemplars of a category? Are the records retrieved by ERIC major descriptors exemplars of their category? These are uninvestigated research questions. Heit also considers categorization based on feature interpretation effects. An information science analog would be the work of Park (1994), who interviewed searchers about the most compelling part of bibliographic records. This book acts as a research source book for these basic features of information science.

Hahn and Chater tackle the subject of concepts and similarity. Their contribution is worth studying carefully. The central paradox motivating their analysis is that similarity is used to explain concepts, and concepts are used to explain similarity (p. 44). Into this vicious circle falls information science because “similarity” is probably the ideal operationalization for the fuzzy concept of relevance.

What does it mean that two bibliographic documents are similar? Perhaps the answer is that they have many properties in common. But this does not get us too far since any two bibliographic documents have an infinitude of common attributes (e.g.: Neither was published in Bulgaria in 1938, neither was found in the Australian Outback, neither has inlaid gold leaf, neither is on fire at this moment, etc.). We conclude that any two bibliographic documents are similar. But by the same token, any two documents have an infinitude of attributes that are different (e.g.: The third word in this one is “dog,” the third word in the other one is “cat,” this one has 67 sentences, that one has 68 sentences, this one is online and that one is paper, etc.). We conclude that any two bibliographic documents are dissimilar. Obviously, there is some work to be done here discovering how and why two pieces of text are perceived to be similar.

Hahn and Chater present avenues out of this circle by reviewing the strategies of “respects” of similarity, surface and deep similarity. No doubt, we information scientists will trace these same peregrinations when we finally come to grips with relevance. Among their comments are methods of modeling similarity including the spatial model, where an n-dimensional space is constructed out of the similarity relations among objects. It is akin to building a city map from the distances among buildings. Building such a document universe, in which one could navigate among bibliographic records, has been the goal of information scientists. Hahn and Chater present devastating arguments for the impossibility of using similarity as the basis for this n-dimensional domain. Say goodbye to the “docuverse” that would permit searchers to navigate in a three-dimensional world selecting relevant articles.

Murphy and Lassaline focus on hierarchical structures. Information science is replete with generic trees of descriptors and subject classifications. An anchoring idea is the basic level of categorization. The information science analog would be topical relevance, i.e., the term that the indexer chose. They review the qualities of the basic level of categorization, illustrating that it is more than just cue validity. They discuss a number of problems of hierarchical structures, among them the problem of transitivity. For example, “car seat” is widely viewed as a chair, and “chair” is widely viewed as a type of furniture, but few regard a car seat as a piece of furniture. How prevalent is this problem in the generic trees and subject classifications that we use in information science?

The theme of Smith and Samuelson’s contribution is that no two ideas are ever exactly the same, that what is known in any real moment depends on the context. Here is the empirical foundation for the analysis of online searching moves. They examine category variability: concepts like “water” and “mother” that resist a single cohesive and coherent concept. This is the empirical foundation for the analysis of the act of indexing, a practical art that has lacked a theoretical development (Farrow, 1991).

There are many more connections to be made. This book would make an excellent textbook in a course on the cognitive aspects of information use. I have added it to my personal collection.

Terrence A. Brooks
Graduate School of Library and Information Science
University of Washington
Seattle, WA 98195
E-mail: [email protected]
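The review's image of the spatial model, building a city map from the distances among buildings, can be made concrete with a very small sketch. The Python below is an illustration only and is not taken from the book: the function name `map_from_distances` and the restriction to three objects in a plane are assumptions chosen for brevity. It places three objects on a 2-D map using nothing but their pairwise distances, and it reports when no such map exists because the supplied distances violate the triangle inequality.

```python
import math


def map_from_distances(d_ab, d_ac, d_bc):
    """Place three objects A, B, C on a 2-D map from pairwise distances alone.

    Hypothetical helper for illustration; a full spatial model would handle
    many objects and more dimensions.
    """
    d_ab, d_ac, d_bc = float(d_ab), float(d_ac), float(d_bc)
    if d_ab <= 0:
        raise ValueError("A and B must be distinct objects")
    a = (0.0, 0.0)      # anchor A at the origin
    b = (d_ab, 0.0)     # put B on the x-axis at distance d_ab from A
    # Law of cosines gives C's x coordinate; Pythagoras gives its y coordinate.
    x = (d_ab ** 2 + d_ac ** 2 - d_bc ** 2) / (2 * d_ab)
    y_squared = d_ac ** 2 - x ** 2
    if y_squared < -1e-9:
        # The three numbers cannot be the sides of any triangle.
        raise ValueError("distances violate the triangle inequality")
    c = (x, math.sqrt(max(y_squared, 0.0)))
    return a, b, c


if __name__ == "__main__":
    print(map_from_distances(3, 4, 5))
```

A map exists only when the dissimilarity judgments behave like distances. Whether human similarity judgments do behave that way is the kind of question the spatial model has to answer; the sketch makes no claim about which specific objections Hahn and Chater raise.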

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE—May 15, 1998 671


References

Cooper, W. S. (1971). A definition of relevance for information retrieval. Information Storage & Retrieval, 7, 19–37.
Farrow, J. (1991). A cognitive process model of document indexing. Journal of Documentation, 47, 149–166.
Park, T. K. (1994). Toward a theory of user-based relevance: A call for a new paradigm of inquiry. Journal of the American Society for Information Science, 45, 135–141.

Authoritative Guide to Web Search Engines. Susan Maze, David Moxley, and Donna J. Smith. New York, NY: Neal-Schuman Publishers; 1997: 178 pp. Price: $49.95. (ISBN 1-5570-305-4.)

The Authoritative Guide is indeed just that, and, as far as I know, is the first of its kind to appear in print. Maze, Moxley, and Smith have done an excellent job of explaining how search engines work, and how and why they do not. This book is clearly intended for the practicing information professional, or, as the authors put it, “is primary written as a working searcher’s guide on how best to evaluate and exploit any search engine” (pp. x–xi). This is a refreshing change from the myriad of Web sites which purport to do the same, but which often merely compare a number of features across a number of search engines, with no systematic attempt to explain what this means for the serious user.

The Guide starts (Section I) with a little bit of obligatory background on the development of the Web and the alarming proportions of its content, but even this is properly directed to an audience which is familiar with traditional indexing and online searching tools. The next section begins with two chapters devoted to explaining the inner workings of resource discovery agents and search engine database construction. This is introduced with a brief history of robot development, and the reader is then taken through the hypothetical creation of a discovery agent (called “Steve”) and an indexing process. By seeing how Steve and his database function, we are led to explore discovery depth and breadth, the effect of Web site directory structure on agents, robot netiquette, term extraction decisions, stopwords, query processing, and ranking algorithms. Section II closes with a chapter on the search engine interface, query construction, results display, and help text.

Section III, which occupies the bulk of the Guide, takes the concepts examined in Section II and applies them to consideration of seven popular search engines (WebCrawler, Lycos, Infoseek, Open Text, AltaVista, Excite, and HotBot). These in-depth studies are prefaced by a list of nine basic and very general searching tips (for example, “have a good vocabulary,” and “use Boolean searches”), each one discussed in some detail. The chapters on specific engines follow a standard pattern: A history of the engine, how documents are added to the database, the interface, searching options, results, ranking, and a summary which highlights important points. Maze, Moxley, and Smith, who come from different Web-searching backgrounds, each tested every engine extensively, with a variety of query types and hypothetical user perspectives. Clearly a great deal of effort and experimentation went into discovering what lies underneath the surface. Each case includes details as to document representation, default query processing, stopwords, stemming, query operators, case sensitivity, the effect (or lack of it) of term position in the document and the query, and other factors which affect retrieval. Revelations about unexpected, inconsistent, or illogical results are particularly fascinating.

The fourth and final section contains a chapter on the “economics” of search tools, exploring issues to do with how search engines generate revenue now, and how this might change in the future. There is a very interesting, almost philosophical, discussion of how a search engine maintains its “identity,” even though its outward appearance might change radically from one day to the next. The penultimate chapter reminds us of the usefulness of Web discovery tools which were not the subject of the Guide, specifically Yahoo (in some depth), Magellan, the Argus Clearinghouse, and the WWW Virtual Library. The last chapter is one simple centered paragraph exhorting the reader to be a critical consumer, and to “realize that there is not, nor ever will be, a perfect search-and-retrieval tool – on the Web or in any other environment” (p. 149). Three appendices follow: A table of search engine features for use in comparison and evaluation, a brief reiteration of the best and worst characteristics of the seven explored in Section II, and a 7-page glossary. The index is reasonable in size and depth.

The Authoritative Guide is not a scholarly work, nor does it present search engine research (of which there is some). In fact, it is largely bereft of any citations other than URLs for the various engines, and a few references to support the introductory material. In the context of its intention of being a practical guide, this is fine, although the reader seeking to learn more would have been well served by a supplementary list of resources, including such admirable (and persistent) sites as Search Engine Watch (http://searchenginewatch.com/), or Traugott Koch’s Literature About Search Services (http://www.ub2.lu.se//desire/radar/lit-about-search-services.html). This aside, the Guide is an admirable work, well written (and printed in very large print, perhaps in recognition of the eyestrain induced by Web surfing), and amply illustrated. In writing about search engines, the phrase “moving target” comes so easily to mind. Maze, Moxley, and Smith are to be given credit for producing a work which will remain timely and useful, despite changes in individuals of the species which they have scrutinized. This book is recommended for any information professional who uses search engines.

Candy Schwartz
Graduate School of Library & Information Science
Simmons College
300 The Fenway
Boston, MA 02115-5898
E-mail: [email protected]
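For readers unfamiliar with the steps the review lists for Section II (term extraction, stopwords, query processing), a minimal Python sketch follows. It is not the Guide's “Steve” agent and does not describe any of the seven engines. The stopword list, the function names, and the implicit-AND query behavior are all assumptions made for illustration, and the sketch does no crawling, stemming, or ranking.

```python
import re
from collections import defaultdict

# Hypothetical stopword list; real engines choose their own.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in"}


def extract_terms(text):
    """Lowercase the text, split on non-alphanumerics, and drop stopwords."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [word for word in words if word not in STOPWORDS]


def build_index(pages):
    """Map each extracted term to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in extract_terms(text):
            index[term].add(url)
    return index


def search(index, query):
    """Return the pages containing every non-stopword query term (implicit AND)."""
    terms = extract_terms(query)
    if not terms:
        return set()  # the query consisted only of stopwords
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


if __name__ == "__main__":
    pages = {
        "http://example.org/a": "The history of robots",
        "http://example.org/b": "Robots and the Web",
    }
    index = build_index(pages)
    print(search(index, "web robots"))
```

Even this toy version shows why the details the reviewer mentions matter to a searcher: a query made up entirely of stopwords retrieves nothing, and because no stemming is applied, “robot” would not match “robots.”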
