of 25 /25
Knowl. Org. 30(2003)No.2 B. Hjørland: Fundamentals of Knowledge Organization 87 Fundamentals of Knowledge Organization 1 Birger Hjørland Royal School of Library and Information Science, Copenhagen Email: [email protected] Birger Hjørland is Professor in information science (IS) at the Royal School of Library and Informa- tion Science in Copenhagen, Denmark. He was formerly professor in IS at the College University of Borås, Sweden. He has a Ph.D. in information science from the University of Goethenburg, Sweden and a Master of Arts in psychology from the University of Copenhagen, Denmark. For twelve years, Professor Hjørland worked as a research librarian at the Royal Library in Copenhagen, where he was a subject specialist in psychology and the coordinator of the library's computer based reference serv- ices. He has taught the history and philosophy of psychology at the University of Copenhagen and also taught information retrieval in the Institute of Applied and Mathematical Linguistics in Copen- hagen. In spite of Professor Hjørland's experience in the field of psychology, his work is much influ- enced and oriented by philosophical and sociological perspectives. He is a member of ASIS&T and ISKO and is the author of Information Seeking and Subject Representation: An Activity-Theoretical Approach to Information Science. Hjørland, Birger. (2003). Fundamentals of Knowledge Organization. Knowledge Organization, 30(2). 87-111. 74 refs. ABSTRACT: This article is organized in 10 sections: (1) Knowledge Organization (KO) is a wide interdisciplinary field, much broader than Library and Information Science (LIS). (2) Inside LIS there have been many different approaches and traditions of KO with little mutual influence. These traditions have to a large extent been defined by new technology, for which reason the theoretical integration and underpinning has not been well considered. The most important technology-driven traditions are: a) Manual indexing and classification in libraries and reference works, b) Documentation and scientific communication, c) Informa- tion storage and retrieval by computers, d) Citation based KO and e) Full text, hypertext and Internet based approaches. These traditions taken together define very much the special LIS focus on KO. For KO as a field of research it is important to establish a fruitful theoretical frame of reference for this overall field. This paper provides some suggestions. (3) One important theoretical distinction to consider is the one between social and intellectual forms of KO. Social forms of KO are related to professional training, disciplines and social groups while intellectual organization is related to concepts and theories in the fields to be organ- ized. (4) The social perspective includes in addition the systems of genres and documents as well as the social system of knowl- edge producers, knowledge intermediaries and knowledge users. (5) This social system of documents, genres and agents makes available a very complicated structure of potential subject access points (SAPs), which may be used in information retrieval (IR). The basic aim of research in KO is to develop knowledge on how to optimise this system of SAPs and its utilization in IR. (6) SAPs may be seen as signs, and their production and use may be understood from a social semiotic point of view. (7) The con- cept of paradigms is also helpful because different groups and interests tend to be organized according to a paradigm and to de- velop different criteria of relevance, and thus different criteria of likeliness in KO. (8) The basic unit in KO is the semantic rela- tion between two concepts, and such relations are embedded in theories. (9) In classification like things are grouped together, but what is considered similar is not a trivial question. (10) The paper concludes with the considering of methods for KO. Basically the methods of any field are connected with epistemological theories. This is also the case with KO. The existing methods as de- scribed in the literature of KO fit into a classification of basic epistemological views. The debate about the methods of KO at the deepest level therefore implies an epistemological discussion. 1. What is Knowledge Organization (KO)? In the Library and Information Science community (LIS) Knowledge Organization (KO) means especially the organization of information in bibliographical rec- ords, including citation indexes, full text records and the Internet. Information Science (IS) is basically about the best way to construct such bibliographical

Fundamentals of Knowledge Organizationppggoc.eci.ufmg.br/downloads/bibliografia/Hjorland2003.pdf · Fundamentals of Knowledge Organization. Knowledge Organization, 30(2). 87-111

  • Author

  • View

  • Download

Embed Size (px)

Text of Fundamentals of Knowledge Organizationppggoc.eci.ufmg.br/downloads/bibliografia/Hjorland2003.pdf ·...

  • Knowl. Org. 30(2003)No.2B. Hjørland: Fundamentals of Knowledge Organization


    Fundamentals of Knowledge Organization1

    Birger Hjørland

    Royal School of Library and Information Science, Copenhagen Email: [email protected]

    Birger Hjørland is Professor in information science (IS) at the Royal School of Library and Informa-tion Science in Copenhagen, Denmark. He was formerly professor in IS at the College University ofBorås, Sweden. He has a Ph.D. in information science from the University of Goethenburg, Swedenand a Master of Arts in psychology from the University of Copenhagen, Denmark. For twelve years,Professor Hjørland worked as a research librarian at the Royal Library in Copenhagen, where he wasa subject specialist in psychology and the coordinator of the library's computer based reference serv-ices. He has taught the history and philosophy of psychology at the University of Copenhagen andalso taught information retrieval in the Institute of Applied and Mathematical Linguistics in Copen-hagen. In spite of Professor Hjørland's experience in the field of psychology, his work is much influ-enced and oriented by philosophical and sociological perspectives. He is a member of ASIS&T andISKO and is the author of Information Seeking and Subject Representation: An Activity-TheoreticalApproach to Information Science.

    Hjørland, Birger. (2003). Fundamentals of Knowledge Organization. Knowledge Organization, 30(2). 87-111. 74 refs.

    ABSTRACT: This article is organized in 10 sections: (1) Knowledge Organization (KO) is a wide interdisciplinary field, muchbroader than Library and Information Science (LIS). (2) Inside LIS there have been many different approaches and traditions ofKO with little mutual influence. These traditions have to a large extent been defined by new technology, for which reason thetheoretical integration and underpinning has not been well considered. The most important technology-driven traditions are: a)Manual indexing and classification in libraries and reference works, b) Documentation and scientific communication, c) Informa-tion storage and retrieval by computers, d) Citation based KO and e) Full text, hypertext and Internet based approaches. Thesetraditions taken together define very much the special LIS focus on KO. For KO as a field of research it is important to establisha fruitful theoretical frame of reference for this overall field. This paper provides some suggestions. (3) One important theoreticaldistinction to consider is the one between social and intellectual forms of KO. Social forms of KO are related to professionaltraining, disciplines and social groups while intellectual organization is related to concepts and theories in the fields to be organ-ized. (4) The social perspective includes in addition the systems of genres and documents as well as the social system of knowl-edge producers, knowledge intermediaries and knowledge users. (5) This social system of documents, genres and agents makesavailable a very complicated structure of potential subject access points (SAPs), which may be used in information retrieval (IR).The basic aim of research in KO is to develop knowledge on how to optimise this system of SAPs and its utilization in IR. (6)SAPs may be seen as signs, and their production and use may be understood from a social semiotic point of view. (7) The con-cept of paradigms is also helpful because different groups and interests tend to be organized according to a paradigm and to de-velop different criteria of relevance, and thus different criteria of likeliness in KO. (8) The basic unit in KO is the semantic rela-tion between two concepts, and such relations are embedded in theories. (9) In classification like things are grouped together, butwhat is considered similar is not a trivial question. (10) The paper concludes with the considering of methods for KO. Basicallythe methods of any field are connected with epistemological theories. This is also the case with KO. The existing methods as de-scribed in the literature of KO fit into a classification of basic epistemological views. The debate about the methods of KO at thedeepest level therefore implies an epistemological discussion.

    1. What is Knowledge Organization (KO)?

    In the Library and Information Science community (LIS)Knowledge Organization (KO) means especially the

    organization of information in bibliographical rec-ords, including citation indexes, full text records andthe Internet. Information Science (IS) is basicallyabout the best way to construct such bibliographical

  • Knowl. Org. 30(2003)No.2B. Hjørland: Fundamentals of Knowledge Organization


    records (which is done in KO) as well as the optimalway to utilize given records (which is done in infor-mation retrieval IR).

    KO is, however, a much broader concept. Knowl-edge is organized in, among other things:

    – The social division of labour (e.g., in disciplines)– Social institutions (e.g., in universities)– Languages and symbolic systems– Conceptual systems and theories– Literatures and genres

    Library and Information Science or just InformationScience, (L)IS, has often ignored this broader meaningof KO and has thus failed to be based on suchbroader theories; or any theories at all. My centralview is that such broader views cannot be ignored.Any attempt to develop fruitful principles for KO inLIS must be based on broader theories of KO.

    The principal actors in IS are the knowledge pro-ducers (e.g., authors), the users and the intermediar-ies. It is their interaction with bibliographical recordsthat is the focus of (L)IS. Each of those actors bringswith him certain pre-understandings, views, conceptsand languages mainly acquired during socialization insociety. The success of the interaction depends onthese pre-understandings, concepts and languages.This applies not only to the “match” of concepts butespecially to their ability to support users’ tasks. Atheory of such broader forms of KO is therefore es-sential in order to construct efficient systems for KOin LIS.

    LIS has claimed status as a field of study fromabout 1876 when Melvin Dewey first published hisclassification system and when schools of librarian-ship began to emerge. What are the accumulated re-sults of more than one century of research in KO?Among the results are a number of standards andguidelines, some theoretical developments, for exam-ple, Cutter’s rules (1904), advances in facet analysis andof course important changes in information technol-ogy (IT). It is, however, difficult to sketch the moretheoretical and scientific progress in this field becauseit seems largely atheoretical and fragmented and be-cause many different lines of thought seem to co-exist. In my opinion KO lacks theories about its mostfundamental concepts such as:

    – Concepts– Criteria for class inclusion– Meaning– Indexing

    – Semantic relations– Subjects– Subject access points (“SAPs”), and so forth.

    I also find that KO lacks serious explorations into themethods of KO and the methodological basis of KO.Developments in practice seem much more influ-enced by progress in IT than by progress in KOproper. Sometimes people seem even to expect thatIT in itself can replace conceptual and theoreticalconcerns. In my opinion we may have the best over-view of progress in KO if we consider five technol-ogy-driven stages in the development of KO. If this istrue, it is an expression of a crisis in the field: that thefield is not driven by developments in its own re-search. It is lacking accumulated knowledge, whichcan be transformed from one IT-platform to another.

    2. Technology Driven Phases in the Developmentof Knowledge Organization (KO)

    In library and information science (LIS) the conceptof KO is connected to the development of classifica-tion and indexing systems in libraries, bibliographies,and electronic databases. Hjørland, (2000; Hjørland &Kyllesbech Nielsen, 2001) described five technologydriven stages in the development of KO. Each stage isnot replaced by subsequent stages, but continues toinfluence the field. Because of this fact the technologybeyond those stages may also be seen as one amongother causes for the development of more or less sup-plementary or competing approaches in KO.

    The stages are:

    a. Manual indexing and classification in libraries and referenceworks

    Cataloguing in libraries and organizational principlesin reference works go far back in time. One can tracethe beginnings of the use of alphabetical order to an-tiquity, although it was not generally adopted untilwell into the Middle Ages, and it was not the onlymethod of classification possible (Daly, 1967). How-ever, in the Middle Ages and at the beginning ofmodern time catalogues were probably seen as inven-tories designed for controlling the collection and forinformation needed for new acquisitions, rather thanas tools for subject searches.

    A professionalisation of the classification and in-dexing of books in libraries appeared around 1876with the publication of the Dewey Decimal Classification,

  • Knowl. Org. 30(2003)No.2B. Hjørland: Fundamentals of Knowledge Organization


    and with the foundations of library schools (and thus“library science”), and so forth. Charles A. Cutter(1837-1903), Melvin Dewey (1851-1931), Henry E.Bliss (1870-1955) and S. R. Ranganathan (1892-1972),among others, have been recognized as founding fa-thers.

    It is generally recognized that this stage producedprinciples for KO that are still valid and important.Among them are “Cutter’s rules” and Ranganathan’sapproach to the classification of subject fields. Otherprinciples have of course become obsolete, and someare controversial. Frohmann (1994), for example,considered Dewey’s approach to be very harmful be-cause it ignored the semiotic nature of classificationand just proposed an empty formalism: “Dewey'ssubjects were elements of a semiological system ofstandardized, technobureaucratic administrativesoftware for the library in its corporate, rather thanhigh culture, incarnation” (pp. 112-113). It must beadded, however, that there is very little overall dis-cussion of the importance of these (and other) princi-ples. They have been considered standards or canonsrather than research-based principles subject to inve-stigation, modification and revision.

    b. Documentation and Scientific Communication

    The documentation movement developed from about 1895.The founders were Paul Otlet (1868-1944) and HenriLafontaine (1854-1943), who established The Interna-tional Institute of Bibliography2 and in 1905-1907 publishedthe first edition of the Universal Decimal Classification,UDC, which was an extension of the Dewey DecimalClassification designed for deeper subject analysis.The American Documentation Institute was founded in 1937and by 1968, had changed its name to the American So-ciety for Information Science (ASIS). It is difficult to de-scribe the exact differences between librarianship anddocumentation. Documentalists were less interestedin libraries and their collections; they were more in-terested in bibliographical control, in scientific com-munication and scientific documentation, in informa-tion services to industry and in the utilization of theknowledge in the documents. Implicitly they shared aview of KO as based on subject knowledge and sub-ject literature. Documentalists often regarded them-selves as more service-minded, more technology-oriented and more advanced than librarians. Wheretraditional librarians often had an orientation to-wards the humanities, documentalists were mostly af-filiated with science, technology and business. Theyindexed single articles in journals and books and

    played a central role in the establishing of interna-tional abstracting journals3. They were less interestedin keeping books for their own sake or for broad cul-tural purposes, and were highly interested in estab-lishing services which could stimulate the applicationof knowledge to specific purposes. The foundation ofuser studies (Bernal, 1948) and bibliometrics (e.g.,Bradford, 1948) is also part of this stage/tradition,which may first and foremost be characterized by amore specific subject approach, a deeper level of in-dexing, more emphasis on modern technology and amore scientific attitude towards goals and problems.This stage laid the foundation for the later develop-ment of online bibliographic databases.

    c. Information Storage and Retrieval by Computers (mainly 1950-)

    Computer science4 has its own development, whichshould not be confused with the development of li-brary, documentation and information science. As atheoretical discipline computer science was foundedin the 1930s, well before the appearance of the mod-ern computer. The founding fathers were the logi-cians Alonso Church (1903-1995), Kurt Gödel (1906-1978), Stephen C. Kleene (1909-1994), Emil Post(1897-1954), and Alan Turing (1912-1954). This earlywork has had a profound influence on both the theo-retical and the applied development of computer sci-ence, but these persons are not directly connected tothe history of LIS or KO, in my view. Computer sci-ence has different educational programs, scientificjournals and conferences compared to LIS and it alsohas somewhat different goals compared to LIS. Arbib,Kfoury, and Moll (1981) write that “Computer science seeksto provide a scientific basis for the study of information processing, thesolution of problems by algorithms, and the design and programmingof computers.” In my opinion the goal of LIS is to opti-mize the utilization of documented knowledge. Theprimary purpose of libraries is to provide physical andintellectual access to information. The intellectual ac-cess is provided by the organization of either thephysical documents themselves or by organizingdocument representations in catalogues, bibliographiesand databases. LIS is not primarily focused on con-structing algorithms, but on informing people aboutdocuments. LIS is supposed to be more open to dif-ferent views, more reflective and meta-oriented anddemonstrate gaps and uncertainties in knowledge tousers. This is very different from making, for exam-ple, an expert system that performs optimally by re-flecting some generalized cognitive models.

  • Knowl. Org. 30(2003)No.2B. Hjørland: Fundamentals of Knowledge Organization


    With the advent of computers in the 1950s, LIS andKO became influenced by this new technology, andmany people felt that the future of LIS must be as apart of computer science. There is an indication that“information scientists” did not regard themselves aspart of library science or as dealing with documentsand their representations (cf., Hjørland, 2000a). Theyfelt much more closely related to computer science.This may have confused the theoretical developmentof KO, because theories related to “information” ver-sus theories related to “documents” are related to quitedifferent kinds of outlooks. As a theoretical concept,“information” tends to move LIS and KO towardstheories about control, feedback, coding and noise intransmitting messages, while “document” tends tomove LIS towards theories about meaning, language,knowledge, epistemology and sociology. Therefore,in LIS there may be a whole paradigmatic conflicthidden in those words.

    This third stage in the development of KO as afield has been influenced by experimental approachesin which recall and precision are well-known meas-ures, by extensive use of statistical models of theproperties of the document representations, by ap-proaches that try to automate KO using Artificial In-telligence (AI) and expert systems, by applying natu-ral language processing (NLP) techniques and thelike. An offshoot of this stage is also the cognitiveview that tries to model the users’ cognitive functionsin the computers.

    The establishing of computer based abstract serv-ices such as Chemical Abstracts and MEDLINE inthe 1960s is an important achievement at this stage.The development of descriptor based and free text re-trieval (mainly based on titles and abstracts), Booleanlogic, field specific subject access, measuring recalland precision and other innovations were extremelyimportant developments in document retrieval. In-formation retrieval as a research tradition startedwith the ASTIA and Cranfield experiments in the1950s, and today's TREC full text experiments maybe seen as a continuation of this tradition. This thirdstage improved information services and research ef-forts in IS in a most important way. In my opinionthese technological innovations and the goals they tryto solve define the core area of what is termed infor-mation science.

    Computer technology made it possible to usemany kinds of Subject Access Points (SAPs), both ofthe traditional kinds produced by information spe-cialists and of words from the documents themselves(e.g., titles and abstracts). This removed the librari-

    ans’/information specialists’ monopoly on subject ac-cess, and established a direct competition betweenSAPs produced by different agencies.

    An underlying philosophy at this stage has oftenbeen that the length of the searchable record in itselfwas the most important parameter in retrieval (Lan-caster, 1998, pp.6-8). SAPs were often seen as merely“semantic condensations” of the texts represented(implying that the ultimate goal was full text repre-sentation and nothing more). Research was domi-nated by quantitative methodologies while little re-search concerning qualitative differences (semanticsor meanings) between different kinds of SAPs was es-tablished at this stage. The underlying philosophy hasbeen empiricist in more than one sense of this word.It was first and foremost empiricist in its attempt tomeasure the efficiency of subject retrieval points em-pirically, for example, by measuring “recall” and“precision.” It was also empiricist in its avoidance of“metaphysically” based classifications, and in its fa-vour of “atomist” SAPs such as the Uniterm systemdevised by Mortimer Taube in 1951 and similar sys-tems depending on specific words from the documentthemselves.

    One associated tendency at this stage or approachwas the attempt to formalize and automate retrievaland to eliminate human interpretation and subjectanalysis. We must distinguish between the financialpressure to automate practical systems on the oneside and the scientific evaluation of the performanceof various aspects of human based and mechanized re-trieval systems on the other side. It is a legitimate andhighly desirable goal to reduce costs and improve ef-ficiency in information systems. Basic researchshould, however, illuminate basic strengths and draw-backs in different approaches, and not be blinded bythe pressure to use automated or cheap solutions. Be-cause of such tendencies important approaches relatedto interpretation and qualitative aspects have beenmuch neglected to this stage, and the research has notaccumulated as satisfactory a body of knowledge as isneeded.

    d. Citation based retrieval and KO (1963-)

    Eugene Garfield's introduction of the Science CitationIndex in 1963 marks the fourth important stage in thedevelopment of SAPs. The possibility of retrievingdocuments according to the citations they receiverepresents a real innovation in IR, and this techniqueis able to supplement all forms of term-based retrievalin very important and qualitative new ways. This in-

  • Knowl. Org. 30(2003)No.2B. Hjørland: Fundamentals of Knowledge Organization


    novation has also contributed to research in motivesfor citing other documents, in sociological patterns inciting, in the relative role of terms and references asSAPs and in the semantic relations between citingand cited papers. Overall, this stage has brought theconcept of networks of documents into the focus of KO.

    In this way, citation based retrieval has changedour understanding not only of subject relatedness, butalso of the concept of subject matter and of the fun-damental aim of IR itself. Since it may be relevant tocite papers, which have no words in common withthe citing papers (or no simple semantic relation suchas narrower terms, broader terms, synonyms, etc.),naïve conceptions of subject relatedness or subjectmatter can no longer persist. Semantic relations maybe implicit or latent. Semantic relations in science aredetermined by theoretical advances, which maychange the verbal description of the research phe-nomena completely. This is the reason why statisticalpatterns in vocabulary may sometimes turn out to bea less efficient measure of subject relatedness than pat-terns in citations.

    Citation behaviour is extremely important as atheoretical object because the goal of IR is to providethe references which are useful in solving a specificproblem. A scientific article is a documentation of thesolving of a specific research problem. The problem isformulated in the article, and this problem has de-termined what kind of information was needed bythe author in order to contribute to this problem.Based on the information need, information has beensought and selected, and the documents actually usedare finally cited in the article. Each of the thousandsof articles produced weekly is in a way a case study inIR. Not only does every article pose a definite IRproblem, but the list of references provided by theauthor is the key to how that particular person hassolved the problem. Thus it is possible to check theo-ries of IR against how they match the actual docu-ments cited! In the traditional (positivist) philosophyof science, studies should be able to “predict” futurebehaviour or events. In our case this principle wouldimply that theories and models of IR should be ableto predict what citations will turn up in the singlepapers. Most research on relevance and on IR seemsto have overlooked this fact. From what we doknow, it seems extremely unlikely that an algorithmshould be able to select references from electronic da-tabases and end up with just the set of references rep-resented in a given article. From this point of view,theories of IR seem naïve and unrealistic and the goalof prediction seems to be wrong. A more detailed

    study of citation behaviour can illuminate the realproblems of IR: That cited documents are not simplya set of documents sharing a fixed set of attributeswhich are not represented in the non-selected items.Documents which are similar from the point of viewof retrieval algorithms do not need to be co-cited,whereas documents which are not similar are oftenco-cited. Ordinary retrieval algorithms and citationpractices seem simply to reflect different theoriesabout subject-relatedness.

    Since authors may cite other papers in order toflatter or to impress, the prediction of which refer-ences a given author will select in the end for a givenpaper cannot be used as a valid criterion in IR. Crite-ria for IR should not be based on social-psychologicalmotives, but on epistemological principles for the ad-vancement of public knowledge! In this way, our in-sight from citation indexes has, in a very profoundway, changed not only the methods of IR but also theconcept of subject relatedness itself and the basic aimof retrieving information. We can no longer regardthe prediction of individual use as the ideal criterionfor IR, nor can we regard IR as a value free technique.Instead, we have to face the insight that the goals ofIR are deeply rooted in epistemological norms forwhat should be regarded as good science and good ci-tation behaviour.

    Pioneers in the integration of bibliometrical meth-ods with the more traditional methods of KO areMike M. Kessler (1965), Miranda Lee Pao and DennisB.Worthen (1989), Pao (1993) and Lorna K. Rees-Potter (1987, 1989,1991). I also see my own researchas pioneering in this respect both theoretically andempirically (e.g., Hjørland, 1992, 1993, 1998a+b,2002b). It has subsequently also been used by, amongothers, research students of Peter Ingwersen (e.g.,Schneider & Borlund, 2002).

    e. Full text, Hypertext and Internet (mainly 1990-)

    Full text retrieval marks the fifth and final step in thedevelopment of SAPs. Until this point space limitswere a major constraint in the development of subjectaccess systems because length of record in itself is animportant parameter in retrieval. At this stage, everysingle word or every possible combination of wordsin full text documents is a potential SAP, as is everythinkable kind of value-added information providedby authors, readers, or intermediaries. Given full-textrepresentations the first important theoretical prob-lem to arise is whether any kind of value-added infor-mation is necessary? Can such extra information as

  • Knowl. Org. 30(2003)No.2B. Hjørland: Fundamentals of Knowledge Organization


    provided by abstracting and indexing – at least inprinciple – increase recall and/or precision? If this isnot the case, then it seems as if we have reached theend of the line, that no further contributions from re-search or practice in IS are needed any longer.

    The answer to this question is closely linked totheoretical views on the concept of subject. In theview of Claus Poulsen (1994) a subject is somethingthat is expressed in the literature in a transparent andself-evident way. By defining subjects this way it isimpossible to even ask whether a given text alwaysrepresents the optimal representation of itself. By de-fining subjects as informative or epistemological po-tentialities Hjørland (1992, 1997) establishes the pos-sibility that documents may be implicit or evenwrong about their own subject matter and hence thatthere still exists a need for information professionals.To take an extreme example: A document about Jewswritten by a Nazi author should not only be indexedas being about Jews, as it claims to be. It is importantto make the Nazi view visible in the subject analysis(and, for example, index it as Nazi propaganda aboutJews). Subjects are not objectively “given” but are in-fluenced by broader views which are important forthe information seeker to know and should thereforeideally be part of the subject analysis. Whether or notthis is also a practical, economical and realistic solu-tion is another question that must be illuminated byevaluating specific subject access systems.

    The introduction of full text databases has empha-sized the theoretical problems concerning subjectmatter and meaning and has also made compositionstudies highly relevant to IS. In this phase, qualitativeaspects become more important than ever before. It isnot just the problem of making efficient algorithmsfor IR, but also the problem of identifying the under-lying values and goals that such algorithms are goingto serve (see Hjørland, 2003b).


    The above description of the five technology-drivendevelopmental stages in KO provides, in my opinion,a clear picture of what KO means with LIS. In view-ing those five stages together, we have a rather well-defined picture of the overall topic with which we aredealing. Behind those different traditions is a certaingeneral goal that can be expressed as the optimisationof KO in libraries, databases, reference works and onthe Internet. This goal is important to keep in mindbecause it is this goal that gives KO in LIS its impor-tance for practice.

    It is important to realize, however, that the devel-opment has been very much technology-driven. Thisis not a satisfactory circumstance for a discipline withthe ambition of being a science and producing generaltheories and principles. Although those five stages de-fine a topic and a common goal, they do not, how-ever, define a cumulated fund of findings, theoriesand principles. On the contrary, there are often latentconflicting views between those stages or traditions.Those traditions have only to a very limited extentbeen in the form of a dialogue. This is why overallconcepts for thinking about KO have been missing orat least strongly underdeveloped. In the next sectionswe shall examine some of those concepts, theoriesand principles that seem most important for this task.

    The goal of LIS is to write a history of its theoreti-cal development abstracted from the concrete technolo-gies in which its principles have been studied. This isdifficult because the general knowledge base is notwell established. Also, the tendency has been that li-brary and information science has passively used thetechnology without contributing much to its devel-opment. If LIS is to be able to contribute valuableknowledge, its focus must be abstracted from con-crete technologies.

    3. Information Processing in the Social Division ofLabour: The Intellectual and the Social Organi-zation of Knowledge.

    Hjørland (2002d) analysed the official definition ofinformation science given by the American Society for In-formation Science and Technology (ASIST), which states:

    Information science is concerned with the gen-eration, collection, organization, interpretation,storage, retrieval, dissemination, transformationand use of information, with particular empha-sis on the applications of modern technologiesin these areas (Borko, 1968; Griffith, 1980).

    This definition does not discuss how professionsother than information scientists are concerned with“the generation, collection, organization, interpreta-tion, storage, retrieval, dissemination, transformationand use of information”. In fact, we know that manyprofessions could claim to be equally concerned. Byneglecting this issue, the definition fails to define in-formation science since a definition of IS should spec-ify the special role of information professionals instudying or handling information.

  • Knowl. Org. 30(2003)No.2B. Hjørland: Fundamentals of Knowledge Organization


    Astronomers, for example, can be seen as expertswho identify, process and interpret information fromthe universe. Astronomers “read” both nature andbooks, but nature is mostly considered the key in-formation in the sciences. As products of their activi-ties they may publish their empirical and theoreticalfindings. The library, documentation and informa-tion profession, although interested in all kinds ofdocuments, has a core interest and expertise con-nected to communication of published documents.Information scientists are not experts in interpretinginformation from the stars, but are, for the most part,experts in information documented by, for example,astronomers (e.g., indexing and retrieving astronomi-cal documents). In this example information has beendefined in a broader sense than usually implied in in-formation science.

    Just as astronomers can be said to handle informa-tion professionally, all other professions can be saidto do so as well. Publishers, researchers, historians,lawyers, teachers and anybody else can be said to beprofessionals in handling information in some way oranother. The role of information specialists may berelatively clear when the target group is, for example,astronomers: Information specialists are experts onforms of publications, databases, reference tools, andso forth. In the case of, for example, historians andlawyers, borderlines are much less clear because thekey information that these professions seek, interpretand use is itself contained in publications and otherdocuments. The historian, not the librarian or infor-mation specialist, is the expert in seeking, organizing,interpreting and utilizing the documents (mostly un-published) needed in his or her professional work,but an information professional is more professionalregarding some specific problems such as databases,cataloguing, and so forth. A basic difference inknowledge about information sources is typicallythat the subject specialist starts from a narrow pointand works bottom-up to more general informationsources, while the information specialist starts from abroad overview and works top-down to more specificinformation sources. In this way their competenciesare supplementary.

    The special focus of LIS is on documented knowl-edge produced by human beings in some kind ofdocuments of potential use to other human beings.Light from the stars is not information for the LIS-community, but astronomical information as pro-duced and used by astronomers is. This distinctionmay seem subtle, but is important in order to con-struct a firm theoretical basis for KO.

    KO involves two kinds of organization: The intel-lectual organization of knowledge may also betermed the cognitive organization of knowledge. Thisis basically the organization of knowledge in con-cepts, conceptual systems and theories. If we use theperiodical system of chemistry or the zoological tax-onomy of biology as the basis of indexing systems,we are using intellectual or “cognitive” systems ofKO. The social organization of knowledge, on theother hand, is basically the organizations in profes-sions, trades and disciplines. If we refer to disciplinesin our knowledge representations, we are using “so-cial” systems of KO. It should be mentioned that thissocial organization principle has been the basic one intraditional library classification such as the DeweyDecimal Classification (DDC):

    A work on water may be classed with many disciplines,such as metaphysics, religion, economics, commerce, physics,chemistry, geology, oceanography, meteorology, and history.

    No other feature of the DDC is more basic than this:that it scatters subjects by discipline (Dewey, 1979, p.xxxi).

    There is very little serious debate about this principlein the literature, supporting the view that many li-brarians and information specialists seem to be un-comfortable with it. Whether or not this principle isapproved it should be clear, however, that the socialorganization of knowledge in disciplines is very im-portant for KO and for optimising information seek-ing. When it has been established which discipline agiven query belongs to, the hardest part of the re-trieval task may well have been completed. The socialorganization of knowledge is indeed important tostudy, and apart from some research in bibliometrics(see Hargens, 2000), this study has hardly begun inLIS in spite of the significance expressed in the quota-tion given above. Today, the “empirical” basis forclassifications such as DDC is mainly the literatureclassified by the system. One way to expand the so-cial dimension could be to study the development ofoccupations in society, for example, as displayed inthe Standard Occupational Classification System (U.S. De-partment of Labor, Bureau of Labor Statistics, 2002).

    There are of course different theories or concep-tual frameworks on both the intellectual and the so-cial organization of knowledge (as well as about theirinterrelatedness). One view sees scientific concepts,theories and fields as reflecting a neutral and objectivereality. This might be termed science as a mirror meta-phor, and it is related to rationalism. Scientists may

  • Knowl. Org. 30(2003)No.2B. Hjørland: Fundamentals of Knowledge Organization


    organize themselves according to such pre-establishedfields. Basically, however, the intellectual organiza-tion is not connected to – let alone formed by – thesocial division of labour. Another view might see sci-entific concepts, theories and fields as useful toolsconstructed in order for human beings to accommo-date the demands of life. This is the science as a mapmetaphor, which is related to pragmatism.5 Even if amap is a reflection or representation of a reality, amap is still not a mirror. A map is first and foremosta tool for certain human activities. The kind of activi-ties and interests the map is going to serve has a majorinfluence on how the map is made. From a pragmaticperspective the intellectual organisation of knowledgeis deeply rooted in and connected to the social or-ganization of knowledge.

    One influential view today is social constructivism.This view is related to the pragmatic view presentedabove. Often, however, social constructivism andpragmatism are opposed to kinds of realism such asscientific realism. One of the modern pragmatic phi-losophers is Richard M. Rorty. According to Rorty,scientific realism and pragmatism are two views thatcannot be combined; pragmatism is seen as an anti-realist position. According to other philosophers(e.g., Dewey, 1929; Ellis, 1990), realism must indeed bebased on pragmatism. This corresponds to my ownview, which may thus be termed pragmatic realism. Scien-tific concepts and fields tend to represent parts of real-ity in a way which is functional for human activity. Tothe degree that social constructivism is opposed to real-ism, I disagree with that view. One should avoid thepitfall of sociological reductionism. However, I havefound much research done under the banner of socialconstructivism deeply relevant for the understanding ofthe structure of many knowledge fields. For example,in order to illuminate the classification of mental ill-ness it is relevant to do both experimental researchand to do “discourse analysis.” Experimental researchmay show that schizophrenia is not a fruitful aeti-ological concept because all major psychoses mayhave a common genetic basis (see Kringlen, 1994).“Discourse analysis” on the other hand may uncoverthe fact that most of our theories and concepts in thisfield are “social constructions” influenced by specificinterests. In particular it may draw our attention todifferent cultural experiences and avoid tendencies touniversalistic explanations. In this way it may openup for classifications that are better suited for a dia-logue between different perspectives. In the end, ourscientific concepts and classifications (e.g., “schizo-phrenia” versus “Einheitspsychose”) are tools that

    should be evaluated according to the job they aresupposed to do for us. As such they are parts of theo-ries that have been developed in order to make us actin relation to specific problems.

    The organization of knowledge in traditions, ide-ologies and paradigms may be seen as the combiningconcepts between the intellectual and the social or-ganization. They are cognitive organizations based onsocial influences.

    4. Documents and the Sociological Perspective ofKnowledge Production, Knowledge Interme-diating and Knowledge Consumption

    It is important for information science to providemodels of actors, institutions and information serv-ices in different discourse communities. The actorsare knowledge producers, intermediaries, and users ofknowledge. The institutions are research institutions,publishers, libraries, and so forth. Information serv-ices may be classified in primary services (e.g., pub-lishing houses and journals), secondary services (e.g.,bibliographical databases) and tertiary services (e.g.,professional encyclopaedias and literature reviews). Ihave found the UNISIST (1971) useful. Figure 1 be-low is a modified version of the UNISIST-modelfrom Trine Fjordback Søndergaard, Jack Andersen,& Birger Hjørland (2003, p. 303), in which the stip-pled ellipse symbolises a knowledge domain. Figure 2shows the same model (not previously published) inwhich the Internet resources are highlighted as inte-grated with the printed sources.

    At this point, I want to emphasize that such mod-els provide one specific perspective on knowledge or-ganization. By introducing the actors and the divi-sions of labour between different organizations andservices, such models are sociological by nature. Theyhelp us to understand the different functions of kindsof literature, and they enable us to establish a mean-ingful typology of scholarly literature, including thewell known differentiating differentiation betweenprimary, secondary and tertiary literature.

  • Knowl. Org. 30(2003)No.2B. Hjørland: Fundamentals of Knowledge Organization


    Figure 1: The revised UNISIST-model modified for the domain analytic approach

  • Knowl. Org. 30(2003)No.2B. Hjørland: Fundamentals of Knowledge Organization


    Figure 2: The revised UNISIST-model integrating printed and Internet resources and modified according to the domain analytical approach

  • Knowl. Org. 30(2003)No.2B. Hjørland: Fundamentals of Knowledge Organization


    5. Subject Access Points in Electronic Retrieval.

    In electronic retrieval, words, phrases and texts aretaken out of contexts and merged. Words and sym-bols from many different journals are merged; wordsfrom different sections and parts of texts are merged,and so forth. Words are taken out of their context, acontext that contributes to their meaning. By usingadvanced technology it is possible to re-establish partsof this lost context (e.g., by means of proximity ope-rators). Any string of characters and any symbolfrom the texts themselves or from so-called value ad-ded information become a possible subject accesspoint. The art and science of information retrievaland KO is to utilize all of these possibilities in an op-timal way and to add further value to the representa-tion and retrieval of documents.

    What we must emphasize in IS, is to study theways in which words and symbols are given meaningby their specific contexts. Of importance are the wayin which different disciplines construct their mea-nings, the way in which document composition pro-vides meaning to words and symbols, and the way inwhich different controlled vocabularies constructmeanings. If a word is employed, for example, in thetitle, in the abstracts, in the methodology section orin the conclusion, we might (sometimes) be able toattribute different priorities and meanings to theword, and this may help us to give different probabi-litistic evaluations of whether documents should beretrieved or not. People, who know certain databasesvery well, including the subject literature and subjectlanguage they cover, probably employ this kind ofknowledge in tacit ways. The job for informationstudies is to help to explicate the underlying princi-ples. Just as ordinary people can speak a language anduse grammar, linguistic experts have to explicate whatcompetent users do.

    In Birger Hjørland and Lykke Kyllesbech Nielsen(2001) we summarized research on the relative infor-mational value of different kinds of subject accesspoints (SAPs). It is important to realize that the rela-tive value of, for example, terms from a text versus ci-tation searching (based on the bibliographical refe-rences) is not constant. It varies according to discipli-nary norms among other things. This implies that itis important to uncover such different disciplinarynorms: In our above mentioned review we tried tointegrate such knowledge from many different sour-ces. Our approach implies that the rich flora of do-cuments and domains should be investigated by in-formation science. This is in contrast to the prevai-

    ling “universalistic,” “reductionist” or “individualist”approaches. The purpose of studies is to explicate andrepresent the implicit meanings that are lost in mer-ging.

    Three major approaches to IR and KO are:

    a) Traditional IR based on term frequencies in thetexts themselves and in the total database.

    b) Citation bases’ organization and retrieval based onnetworks of citations between documents.

    c) Traditional library classifications based on disci-plinary divisions and disciplinary criteria. Thosedisciplinary criteria are by principle external to thedocuments themselves and are NOT derivablefrom either full text databases or hyper linkednetworks. You cannot, for example, construct amap of Spain from a collection of documentsabout Spain, whether you use co-citation analysisor any other techniques. Maps are produced bygeographers, and their conceptual structures aresubsequently applied in library classification.

    All these approaches are often used concurrently andbibliographical records often contain a mixture of allof them. We should develop more (explicit) know-ledge about their relative strengths and weaknessesfor different kinds of queries.

    KO involves two kinds of decisions: To make aclassification system (or a thesaurus, etc.) and to deci-de where to classify a given document in that system.Both decisions are related and involve epistemologicalimplications. Both decisions are made at the end, ul-timately based on what is considered relevant: Whatclasses are relevant? What kinds of connections bet-ween classes are relevant? What kinds of documentsare relevant for given users or purposes to put to-gether in a given class?

    Different kinds of SAPs have different informatio-nal values in information retrieval. SAPs are more orless functional in revealing “the subject” of the docu-ments that they represent. It is important to realize,however, that which documents are relevant to a gi-ven query depends, among other things, on the theo-retical perspective in which the query is embedded. Adescription of “the subject” is not a neutral, objectiveactivity, but is influenced by different theoreticalviews and interests. The consequence of this view isthat “the subject itself” (the something that the SAPsare supposed to identify) cannot be regarded as beingobjective in the positivist sense. Based on such consi-derations, Hjørland (1992) defined the subject of a

  • Knowl. Org. 30(2003)No.2B. Hjørland: Fundamentals of Knowledge Organization


    document as the document’s epistemological (or in-formative) potentialities. A document has given poten-tialities, and the job for indexers (and other SAPs) isto identify those potentialities. This is easily under-stood if we compare this with citation indexes. Theremay be many reasons to cite a given document, butsome of these reasons may be rather widely varied.For the group of authors citing a given document forthe same reason this document has the same functionand the same “subject.” Indexing is a process in theservice of helping future authors to identify docu-ments to be cited. This is done by identifying exactlythose attributes that make it citable. The connectionsbetween cited and citing papers are relevance connec-tions, and so also are (or should be) the connectionsbetween indexing terms and the documents indexed.

    6. Social Semiotics and Activity Theory: TheTeleological Nature of KO

    The structuralist view of how our concepts areformed by our languages is shown in Figure 3 below.It shows that there is no one-to-one relation betweenmeanings in different languages. When individualslearn a language, they learn the concepts of that lan-guage, and consequently they classify the world in theway that is given by that particular language. For ex-ample, Germans and Danes classify “tree” in differentways. By implication languages may affect the waywe conceptualise the world. According to LouisHjelmslev (1943) each language puts arbitrary borderson reality, while activity theory (AT) finds that oursymbolic systems tend to capture functional aspectsor affordances in the things we perceive (see Al-brechtsen, Andersen, Bødker, & Pejtersen, 2001).

    English *German *Danish *French Italian Spanish

    Tree Baum Arbre albero ÁrbolTræ

    LeñaWood Holz legno


    Boisbosco Bosque


    SkovForêt foresta Selva

    Figure 3: Cultural relativity in word meanings

    * Originally presented by the Danish structural linguist LouisHjelmslev (1943).Extendet by information from Buckley (2001)

    Hjelmslev’s figure may be criticized; for example, ithas been mentioned that he omits the Danish term“lund” (small forest) and that this omission may wea-ken this particular example. In this place we shall,

    however, take this basic structuralist claim for gran-ted: that given languages affect our semantic structu-res and thus our classification of the world.

    For human informative activities, the proper per-spective of the meaning of “meaning” is very impor-tant. This is a difficult concept for which activitytheory provides a fruitful understanding. The produc-tion of books, texts and other documents is a specialdevelopment in literate cultures. Documents are toolshaving specific kinds of functional values in those cul-tures. The view of AT and social semiotic6 theories isthat meanings, signs and documents are developed tofunction in relation to standardized practices incommunities. We use, for example, the Bible and theHymn Book in our standardized religious practices,textbooks in our standardized teaching practices, lawbooks in standardized legal practices, and so forth.

    When we name something, we facilitate the use ofthat object. By naming churches (or books onchurches) “tourist attractions,” we facilitate a certainuse of churches (or documents). We perform an actby means of language. Given names (or SAPs) willalways facilitate some uses of a document and makeother uses relatively more difficult. No terminologyor technology or kind of SAP can ever be neutral.

    Concepts and documents have more or less stablefunctional values in relation to such standardizedpractices. There are of course always different viewsof whether such standardized practices should bechanged or remain unchanged, and there are alwaysdifferent kinds of possible changes of those practices.There are always dynamic developments in meaning.The most important cause of this is the purpose forwhich we use the concepts, which is connected to thepractices we want to change.

    Often scientific and technological developmentchanges standardized practices in a rather uncontro-versial way. In other cases, however, changes in prac-tices are related to different political interests, to dif-ferent theories or “paradigms.” Different paradigmstend to influence given practices in different ways andby doing so they also tend to change our symbolic sy-stems as well as our production of documents and theform, content, meaning and use of those documents.The proper study of symbols and documents is thusbased on the study of the functions and interests tho-se documents are serving.

    Any given tool (including a sign, concept, theory,finding or document) has given potentialities in relati-on to new social practices and paradigms. Users of li-braries may discover that previously neglected docu-ments may be very useful to develop a new social

  • Knowl. Org. 30(2003)No.2B. Hjørland: Fundamentals of Knowledge Organization


    practice or to defend an existing practice. Or theymay find that well-known documents may be re-interpreted and used in new ways. Especially the kindsof documents known as “classics” have such abilitiesto be reused and re-interpreted by each generation.But even the most humble documents may turn outto be of value. This does not, of course, imply relati-vism regarding the relevance of documents: tools alsoquickly become obsolete and “only of historical inter-est.” There may of course also be a discrepancy bet-ween the potential value of some documents and thecost of storing, organizing and using them.

    The meaning of any sign is its potential quality offor referring to some objects or states of affairs (Kar-patschof, 2000, p.197). A tool is something that has afunctional value for some human (sub)culture. Lan-guages (and sublanguages) are also tools with functio-nal values. In languages, there are terms for tools. Themeaning of a word for a certain kind of tool is thefunctional value of referring to a certain functionalvalue, defining the quality of the tool (Karpatschof,2000, p.197). A hammer may, for example, be termed“hammer” (denotation) or “murder-weapon” (conno-tation). The word we use about a tool facilitates oneor another use of it. Words may be more or less ap-propriately used for a specific object in relation to agiven task, activity or discourse. In other words, touse words is a kind of act (verbal acts) often used toaccomplish something extra-verbal. The meanings ofthe words we use may be more or less suitable for ourpurposes, and in those cases we try to develop newwords or change the meaning of some old words.Such changes in meanings are visible in the study ofdifferent fields, traditions and paradigms.

    It is very naive and reductionistic to disregardthese kinds of cultural inter-mediating factors in peo-ple's relationship with information. The dominanttraditions in both information science and in behav-ioural and cognitive sciences have, however, ne-glected these cultural aspects and just tried to studygeneralized human relations to something termed “in-formation.” This dominant approach may be broadlytermed “behavioural” in spite of different attitudes toversions of behaviourism. In this tradition people areexpected to react to something in a specific, mechani-cal way without considering the culturally-determined meanings and without considering thedifferent goals and values in the meanings and in thedocuments. This has, in my opinion, brought about asituation in which we have inherited very little usefulknowledge from these areas on which to advance ourfield and practice. That view is also expressed by

    Robert de Beaugrande (1997) in linguistics. “I wouldcertainly be happier if my findings had turned out tobe less sobering or disturbing; but I can only reportwhat I have in fact found” (§89).

    In the view of pragmatism (and Activity Theory)languages are tools that are adapted culturally to suitthe needs of their users. General languages may beseen as adaptations that suit the needs of major partsof the populations, while languages for special purpo-ses are adaptations that suit the needs of specificgroups such as chemists, lawyers, musicians, and soforth. Both languages, (or more broadly symbolic sy-stems) as also media, documents and information sy-stems, are teleological or goal-directed. They are op-timised to do certain functions, while relatively igno-ring other purposes and goals. They should be seen asspecializations in the social division of labour. Thisimplies that no system of KO can be optimised to doall kind of tasks equally well. Although moderncomputer based retrieval systems are very flexible andseem efficient for almost all tasks, it is important toconsider the limitations of each kind of system fordifferent kinds of tasks and different domains. Theway knowledge is organized in information systemsmust be relevant to the specific purpose of that parti-cular system. Relevance must, in my opinion, alwaysbe regarded in relation to a goal. Birger Hjørland andFrank Sejer Christensen (2002) defined relevance thisway: “Something (A) is relevant to a task (T) if it increases thelikelihood of accomplishing the goal (G), which is implied by T” [p.964]. Our conception of relevance is thus connectedto meaning and to different views and traditions insociety. It is a basic view in the pragmatic traditionthat the nature of knowledge is to fulfil goals for theorganism or system that possess the knowledge. Allkinds of knowledge or information processing pro-cesses or institutions are basically telelogical by na-ture. They have to fulfil certain goals. Their relevancecriteria are constructed backwards from those goals,and people learn those relevance criteria by being so-cialized and educated within a particular context andtradition.

    7. Paradigms, Epistemologies and KO

    The concept of “paradigm” became influentialthrough Thomas Kuhn’s (1962, 1970, 1996) book.Kuhn did not, however, recognize the social sciencesas sciences or as paradigmatic. He considered the so-cial sciences to be preparadigmatic. In his view a pa-radigm should be a set of common assumptions. As sta-ted by Dogan (2001):

  • Knowl. Org. 30(2003)No.2B. Hjørland: Fundamentals of Knowledge Organization


    Within a formal discipline, several major theo-ries may cohabit, but there is a paradigm onlywhen one testable theory alone dominates allother theories and is accepted by the entire sci-entific community. When Pasteur discoveredthe microbe, the theory of spontaneous genera-tion collapsed: contagion became the new para-digm. In the social sciences, however, we see atbest a confrontation between several non- test-able theories. Most of the time there is not evena confrontation but careful mutual avoidance,superb disregard on all sides. This is relativelyeasy, owing to the size of scientific disciplines,and their division into schools; it is true for allcountries, big or small. (p. 11024)

    Nonetheless, his theory has been very influential inthe social sciences, and the concept of paradigm is of-ten used in regard to the different schools, approa-ches, systems and so on. Thus, the concept of para-digm is usually applied in a broader way. “Paradigm”may, in the philosophy of science, mean more or lessimplicit background assumptions concerning the ob-ject of research, concerning research methods, con-cerning the usefulness of research, and so forth. In aneven broader meaning the concept may be used withmore or less coherent views concerning a collectivepractice, such as teaching, divine service, administra-tion of justice, politics, and so forth. Kuhn’s 1970postscript in the second edition defines among otherthings a paradigm as a constellation of beliefs, values,techniques and so on shared by members of a givencommunity.

    Håkan Tjörnebohm, in1974, (as cited in Andersen,1999, p. 89) defined paradigms as systems of (explicitor implicit) basic assumptions and epistemic ideals inscientific disciplines. A paradigm is a super individualstructure of meaning, which is formed and reproducedin disciplinary socialization, teaching and scientificcommunication. Components of paradigms include:

    1) Ideals and beliefs about science (epistemicgoals, methods and criteria in the productionand evaluation of scientific results inside the dis-cipline);2) World-view hypotheses, basic ontological as-sumptions; and3) Ideals concerning significance for society andculture, for practical use and for enlightenment.

    “Paradigm” in the broadest meaning of the word is avery central concept for KO. Not only do scientificparadigms determine how we study KO in informa-

    tion science (see section 9 and 10 below); it is also thecase that the knowledge that we are going to organizeis already conceptualised and organized according tomore or less invisible structures, determined by diffe-rent paradigmatic influences. This is done in many“layers” in ordinary languages, in subject languages,in documentary forms, in networks of citing papers,in informal networks and so on. Different paradigmsusually imply different values and goals. KO in in-formation science must know those different valuesand be able to fulfil the different goals.

    8. Concepts and Conceptual Relations as Units of KO

    Concepts are normally seen as the units of thoughtand knowledge. This implies that the fundamentalunits of KO are the (semantic) relations between con-cepts. If knowledge is defined as justified true belief(as in the Platonic tradition) then real knowledge ishard or impossible to identify and to classify. It ismore fruitful to speak of knowledge claims, ratherthan knowledge itself. To speak of knowledge claims asthe things represented in the literature and the thingto be classified is a more careful way of speaking, andthere is no real loss by this way of speaking.7

    The basic function of concepts is to provide a basisfor dealing with the world. Concepts provide bordersand classes in a continuous world. “Blue” delimitssome wavelengths, “castle” delimits some kind ofbuildings, and “music” delimits some sounds. Ourtheories, conceptualisations and paradigms tend toclassify things and knowledge about things accordingto the same basic principles. Anders Ørom (2003, inpress) has demonstrated how, for example, pictures inmuseums, the literature about art and the classificati-ons used in libraries and bibliographies all tend to beinfluenced by the same paradigm (e.g. the stylistic pa-radigm, the cultural historical paradigm, the semioticparadigm, the social historical paradigm, the feministparadigm etc.).

    A descriptor or a class represents a concept8, andinformation retrieval is essentially concerned withsemantic relations between queries, document repre-sentations and texts. The smallest unit is the relationsbetween two concepts. The theory of concepts (mea-ning, semantics) is, however, probably one of themost difficult and muddled research fields today. It isimportant to understand the different views betweentraditional theories based on logical positivism and al-ternative views based on pragmatic theories. Tem-merman (1997) describes some of these differences.

  • Knowl. Org. 30(2003)No.2B. Hjørland: Fundamentals of Knowledge Organization


    Note the distinction between on the one handlogical and ontological classification, which in tra-ditional [theory of] Terminology is supposed tobe possible in the mind without considering orusing language and before the naming of theconcepts takes place, and on the other hand cate-gorisation which is a result of the interaction bet-ween language and the mind (p. 55). 9

    The importance of concepts has been recognized inthe KO community. Dahlberg (1991) is one promi-nent example. The journal Knowledge Organization has as

    subtitle “International Journal. Devoted to ConceptTheory, Classification, Indexing, and Knowledge Re-presentation”. In spite of this, the contributionsabout concepts have hitherto been very limited innumber, and the concept of concepts is much betterexplored in other fields like philosophy, cognitiveand developmental psychology, artificial intelligenceand linguistics.

    Different theories of concepts are deeply linkedwith epistemological theories. Empiricism, rationa-lism, historicism and pragmatism have their differentconceptions of concepts.

    “Paradigms” In philosophy In psychology

    Empiricism Simple concepts correspond to simple sensations.

    There are no necessary relations between simpleconcepts. Simple concepts may be combined tocomplex concepts. Nominalism: General conceptsare names, which we put on classes of things(empirical generalizations).

    Classical associationism. Behaviourism. Connec-tionism (neo-associationism or parallel distributedprocessing, which works with inductive, self-organizing, “neural” networks for the processingof sensory input)

    Rationalism Simple are concepts, which cannot in fruitfulways be defined by other concepts. They are notexperienced, but inborn or maturated. Complexconcepts are defined from simple concepts. Fun-damental concepts are concepts, which are neces-sary in order to describe a field. Simple, complexand fundamental concepts enter into certain ne-cessary mutual relations. The differentiation bet-ween simple and complex concepts is absolute(independent of domain, interests, points of viewetc).

    Classical Artificial Intelligence. Cognitivism.Works with deductive, rule-governed algorithmsfor the processing of sensory input. “Classical” or“Aristotelian” concepts exhaustively defined bysets of necessary and sufficient attributes.

    Modern theory (Rosch, Lakoff) Prototypetheory.Concepts are more or less prototypical instancesof things.

    Criticism (Kant)(Empirico-rationalism)

    Concepts represent knowledge of the world me-diated by our forms of reason or categories likespace, time, thing and cause.

    Jean Piaget’s “genetic epistemology”.

    “Concepts mature in the individual. They growout like the teeth”

    Historicism andhermeneutics

    Concepts are formed in a historical process onthe basis of pre-understanding and holistic per-ception. There is circularity between the formingof simple and complex concepts. The relationsbetween simple and complex concepts are relativein relation to interests. Traditions and socialcommunities play important roles for the for-ming of concepts.

    Psychoanalytical interpretation. A satisfactoryanalysis of the functions of cognition cannot justexplore sensation, memory and thinking in isola-tion, but must involve the whole person, and hisor her developmental history both individuallyand collectively. Concepts are thus formed by in-fluence of personal characteristics such as sex andsocial class.

    Pragmaticism andcritical realism

    Knowledge and concepts are formed by people’spractical activity in relation to the objects of theactivity.

    Activity theory: Our concepts are not just de-termined by attributes in physical objects. Theyhave also “historical depth”. They are formed inthe historical contexts of the objects, which havemeaning.

    Fig. 4: Basic Conceptions of “Concept”

  • Knowl. Org. 30(2003)No.2B. Hjørland: Fundamentals of Knowledge Organization


    Frank C. Keil outlines some very important devel-opments in theories about concepts:

    ...one thing has increasingly emerged. Coher-ent belief systems, or “theories,” are critical tounderstanding the nature of concepts and howthey develop. Although philosophers and somepsychologists of a more classical bent have oftenargued about the interdependence of theoriesand concepts, for the most part cognitive psy-chologists in the last 20 years have not. As aconsequence of the work done in the 1960s onconcept learning and the later “anticlassical” re-sponse to that paradigm, it became the norm totalk about concepts as consisting of combina-tions of features or as being the result of sum-marizing operations on exemplars and/or di-mensions. Although the classical and newerprobabilistic views clashed on the issue ofwhether features for natural concepts were nec-essary and sufficient, their debate masked ashared set of assumptions that the meanings ofconcepts could be fully described by lists of fea-tures and simple probabilistic or correlationaloperations on those features.

    As researchers began to look at the pheno-mena associated with concepts more closely,however, it became increasingly clear from se-veral different lines of evidence that somethingwas missing... . . Part of the meaning of a con-cept may be the apprehension of the theoreticalrelations that explain its internal structure... .such relations have always been fundamentallyimportant in the philosophy of science. (Keil,1989, p. 267)

    The history of all natural sciences documentsthe discovery that certain entities that shareimmediate properties nonetheless belong to dif-ferent kinds. Biology offers a great many exam-ples, such as the discoveries that dolphins andwhales are not fish but mammals, that the bat isnot a kind of bird, that the glass “snake” is infact a kind of lizard with only vestigial limbsbeneath its skin. In the plant kingdom it hasbeen found, for example, that some “vegetables”are really fruits and that some “leaves” are notreally leaves. From the realm of minerals andelements have come the discoveries, among oth-ers, that mercury is a metal and that water is acompound ...

    In almost all these cases the discoveries fol-low a similar course. Certain entities are ini-tially classified as members of a kind becausethey share many salient properties with otherbona fida members of that kind and becausetheir membership is in accordance with currenttheories. This classification may be accepted forcenturies until some new insight leads to a reali-zation that the entities share other, more fun-damentally important properties with a differ-ent kind not with their apparent kind.

    Sometimes it is discovered that although thefundamental properties of the entities are notthose of their apparent kind, they do not seemto be those of any other familiar kind either. Insuch cases a new theoretical structure must de-velop that provides a meaningful system of clas-sification.

    There are many profound questions aboutwhen a discovery will have a major impact on ascheme of classification, but certainly a majorfactor is whether that discovery is made in thecontext of a coherent causal theory in which thediscovered properties are not only meaningfulbut central (Keil, 1989, p. 159).

    This long citation is important, I believe, because itdemonstrates the deeper reality of kinds and conceptsthat science discovers. It has important implicationsfor the methodology of KO. First and foremost itchallenges many user-oriented and empiristic ap-proaches.

    To the degree that this view is correct, the relati-ons between two concepts are thus relative to thetheoretical systems (or paradigms) in which they areembedded. We cannot say objectively and once andfor all, for example, that “schizophrenia” and “manio-depressive disorders” are narrower terms (NT) in re-lation to “psychosis”. It has been suggested by Krin-glen (1994) that aetiologically schizophrenia mightnot be a useful concept, while one should probablyalternatively operate with the old German concept“Einheitspsychose”. It follows that schizophrenia isnot NT in relation to psychosis. To the degree thatthis is correct the semantic relations between “schi-zophrenia,” “manio-depressive disorders” and “psy-chosis” are synonym relations, not generic relations.

    We may conclude that the basic units in KO, thesemantic relations between two concepts, may be re-lative to the perspective and the theory from whichthey are considered. Because of this fact, KO cannotbe done just from successive combinations of ele-

  • Knowl. Org. 30(2003)No.2B. Hjørland: Fundamentals of Knowledge Organization


    ments, but must reflect broader perspectives andtheories.

    9. Criteria for Likeliness in KO.

    A fundamental principle in KO is that like thingsshould be brought together, while different thingsshould be separated. “Likeliness” is a concept thatmay also be expressed by other terms such as “simi-larity”, “sameness” (used by James, 1890), “resem-blance” or “equivalence”. Many writers in LIS havedefined classification and KO by using the conceptsof likeliness. For example:

    – Ernest Cushing Richardson (1964, p. 1) definedclassification as the “putting together of likethings, or more fully described, it is the arrangingof things according to likeness and unlikeness. Itmay also be described as the sorting and groupingof things”.

    – Henry E. Bliss (1935, p. 3) wrote: “In dealing withthe multiplicity of particular things, actualities,and specific kinds, we find that some are alike, ingeneral characters and in specific characteristics;and we may consequently relate them in a class, orclasses, that is classify them”.

    – Frederick Wilfrid Lancaster (born 1933) definedclassification as “sorting items into ‘conceptualclasses’” and “forming classes of objects on the ba-sis of their subject matter”. (Lancaster, 1998, p. 17).

    – Arlene G. Taylor: “The placing of subjects into ca-tegories; in organizing of information, classificati-on is the process of determining where an informa-tion package fits into a given hierarchy and thenassigning the notation associated with the appro-priate level of the hierarchy to the informationpackage and its surrogate.” (Taylor, 1999, p. 237).

    How do we decide what things are alike? How do wedevelop our criteria for likeness? The literature onKO in LIS seems to ignore this core problem. Is thisbecause the problem is seen as obvious? Is this basedon a kind of naïve realism: that things are what theylook like, and that people’s immediate sense of like-ness is adequate as the basis for KO?

    But when are two things alike? Naïve realism con-fuses seeming similarity with similarity in an objec-tive sense. Some metals might look like gold, butmight not be precious metal. Chemical analysis (notcommon sense) has to make a distinction between“what looks like gold” and “what is gold”. Classifica-tion made for children may be based on more super-

    ficial attributes – e.g., “the big book of trains” whichconsiders most aspects of the “railway system”. Scien-tific classifications, on the other hand, reflect somedeeper properties, such as the classification of chemi-cal substances in organic and inorganic compounds,precious metals, etc., based on atomic theory. Whenscientific principles of classification are applied, see-mingly related objects may be separated and seemin-gly different objects may be grouped together. Forexample, whales were once classified as fish, but aretoday – influenced by evolutionary theory - classifiedas mammals.10 These examples demonstrate that na-ïve realism is not adequate as a method to classify do-cuments.

    If we consider different examples, we will see thatconceptual relations have different kinds of motiva-tions.

    – “Institutions for Information Science” are generi-cally subordinate to “institutions”. This relation-ship seems to be motivated by purely logical rela-tions (or by relations inherent in a given language).

    – “Copenhagen” is part of “Denmark” (WhereasMalmö in Sweden is not. This last example is mo-tivated by the fact that Denmark lost this part ofterritory to Sweden in 1658). The parts of a coun-try are thus defined by social arrangements. Thistype of semantic relations is in other words basedon human conventions.

    – Whales are today classified as mammals. The ex-planation of this semantic (generic) relation is dueto evolutionary theory.11

    – Psychology (and also psychopathology) may beclassified as part of neuroscience (natural science),as part of the social sciences or as part of the hu-manities (or otherwise). Such differences are, forexample, visible in how such fields are placed inthe organizational structures in universities. Suchclassifications (and semantic relations) often in-volve professional interests. It is, for example,partly a question of professional power whether afield (e.g. psychopharmacology) is monopolized bya profession. At the deepest level, the question ofwhether psychology is a human science or a natu-ral science is a scientific question related to theo-retical questions within psychology. It is wellknown that psychology is divided over this ques-tion. Different paradigms in psychology have dif-ferent answers. For behaviorism psychology isclearly a part of the natural sciences (cf. Watson,1913). For humanistic psychology it is clearly apart of the humanities. In classifying psychology as

  • Knowl. Org. 30(2003)No.2B. Hjørland: Fundamentals of Knowledge Organization


    belonging to science, or social science or humani-ties, one is actually involved in a theoretical battlebetween paradigms. (Which most people findrather uncomfortable).

    – Classifications and semantic relations may be es-tablished by empirical generalizations. One exam-ple is Berlin & Kay (1969), who established theempirical generalization that human languages re-flect the classification of colour sensations in essen-tially the same ways universally, regardless of his-torical and cultural differences. Such empiricalgeneralizations avoid the uncomfortable theoreti-cal issues addressed above. There are, however,other kinds of problems with this approach, whichare closely connected with the basic assumptionsin empiricism.

    – Classifications and semantic relations may also bepurely accidental or ad hoc. Such classificationsmay serve some purposes very well. In general,however, classifications that reflect essential char-acteristics in the objects are the most valuable.(This “essentialism” should not be confused withan objectivism that ignores human activity).

    We may conclude that there exist many differentkinds of criteria for likeliness. They may be conven-tional, logical, psychological and so on. Regardingnatural kinds, however, they should especially beseen as domain-specific criteria which are discoveredby science. They are not just something that can beextracted from users or from statistical investigations.

    10. Methods of Knowledge Organization

    By methods of KO we mean the methods of con-structing systems of KO such as classifications andthesauri. Also the The processes of indexing and clas-sification are also considered methods of KO.

    What are the methods for classification and KO?We may first make a distinction between classifica-tion in the sciences (such as biology or archaeology)and in LIS. This paper is about KO in LIS. However,we shall argue that basically the methods of KO inLIS are connected to the same fundamental paradigmsin epistemology as the methods of classification insciences and other fields. First, however, let usshortly consider some methods:

    – Standardization– Computer based KO– “Manual” or “intellectual” methods– Quantitative methods

    – Qualitative methods– Text based methods– People based methods– Institution based methods (e.g. studies of univer-

    sity organisations)– Bibliometrical methods– Word frequency based methods– Sociological methods– Historical methods– Pragmatic, epistemological and critical methods

    It is important to realize that an important quality ofa system of KO may be its function as a standard. Of-ten classifications and other systems are made ascompromises between experts in a field and declareda standard. Scientific arguments between researchersin the field are not published and discussed in the lite-rature as a part of this process. This method is epi-stemologically a kind of “authoritanism”: the qualityof the system is claimed by reference to its authorityas a standard. However, as researchers in KO we haveto consider other qualities of KO too and should notaccept “authoritanism” as a criterion of scientificknowledge.

    One distinction, one that may be encountered inthe literature, is between computer based methodsand “manual” methods of KO. The term “manual” isa little peculiar, because what is meant is rather intel-lectual methods. This distinction is not in itself a fun-damental distinction. If it is possible to develop someexplicit rules for how to classify documents, thensuch rules can be used by humans or by computers.In other words, sometimes humans are not perform-ing better than computers, which follow rather sim-ple instructions. At other times, however, humansapply deep knowledge, which is not available tocomputers. And sometimes computers process hugeamounts of data that are not possible for humans toprocess. We should thus not consider computer basedversus manual methods as a basic methodological dis-tinction. We should rather classify the methods ofKO in other ways asking what kind of rules, whatkind of knowledge and what kind of cognitive proc-esses are involved in the process? In general ourknowledge of how humans classify is limited.12

    Another distinction is between quantitative versusqualitative methods. Quantitative methods may, forexample, be based on word frequencies. Qualitativemethods may, on the other hand, be based on inter-pretations of meaning. Again, I do not consider this afundamental distinction. It is well known that anyquantitative investigation presupposes an adequate

  • Knowl. Org. 30(2003)No.2B. Hjørland: Fundamentals of Knowledge Organization


    qualitative analysis. Qualitative methods, on theother hand, may result in a number of categories andthus approximate the quantitative approach. The so-ciologists Pierre Bourdieu uses many statistics (quan-titative data) in his research; however, he explicitlydenies doing this in a positivist way. There are thusimportant ways in which to use data that go deeperthan the distinction between quantitative and qualita-tive approaches.

    Systems of KO may use data from the literature,from people, from organizations, etc., in many differ-ent ways and in many combinations. Bibliometricalapproaches are mainly based on literatures, and so aremany conventional and “manual” methods (includingthe well known principle of identifying literary warrantfor a given class or descriptor). The use of historicalknowledge about the development of knowledgefields (as exemplified in Wallerstein, 1996 and Hjør-land, 2000c) is a different way to use literature as datafor KO. This last approach also uses the developmentof the organization of universities as a basis for con-structing KO.

    People may also be used as sources on which tobase the development of KO. In cognitive science andartificial intelligence the methods of knowledge elicitationare well known (e.g. Cooke, 1994). This is, in myopinion, an indirect method because we have to know themethods by which the experts, from whom we ob-tain the knowledge, got this in the first place. Inother words we must be able to argue which knowl-edge claims are best substantiated. If we do not ad-dress this issue, we are only indirectly qualified toconstruct and evaluate systems of KO. People may,however, also be used as sources in other ways. Theymay, for example, be seen as members of discourse com-munities and be studied as such. We may study the so-cial division of labour and the dependency betweendifferent people and groups of people (compareWhitley, 1984).

    When we are going to decide what kind of sourcesto build on (literature, people, organizations), weneed to know something about how the relevantknowledge is distributed. Methods of knowledgeelicitation seem to be built on the assumption thatthe needed knowledge is ready at hand within agroup of experts or other people. This may be moreor less the case. However, more scholarly approachesbuild on other assumptions. The meaning of words istraced in great detail and registered in historical dic-tionaries with quotations and citations from authori-tative authors. In this case, the relevant information isnot supposed to be ready at hand in the mind of ex-

    perts, but something that might be constructed byworking with the literature (not just as a kind ofopinion poll). The question of cognitive authority isof course extremely important and difficult. Scholarlywork is not just “subjective”, choosing authorities ac-cording to personal preferences. The more qualified,the less “subjective” in this sense of the word. The“objectivity” of the decision of scholarly work ispartly based on the knowledge of different traditionsand metaperspectives and on the cumulated historicalevidence. On could perhaps say that a fundamentaldifference between empiricism/rationalism and his-toricism/pragmatism is the emphasis on reading inthe research process.

    Basically, the methods of KO are related to fun-damental theories of epistemology. All researchers inany field are always more or less influenced by cer-tain ideals on how to obtain knowledge. This is alsothe case with regard to constructing systems of KO. Ifstudents of LIS have taken a course in the philosophyof science, they should be able to identify the domi-nant epistemic approach in any paper on KO (andalso in any system of KO). Some papers and systemsare mostly based on empirical generalizations. Sys-tems based on word-frequency measures may be thebest example. Other papers and systems are mostlybased on rational rules and deductions (while oftenignoring empirical issues). Facet-analytic systems inthe tradition of Ranganathan may provide the bestexamples. A third kind of system is based on thestudy of the evolution of knowledge fields. To a cer-tain degree the disciplinary based systems such asDewey (DDC) can be said to conform to the ideal.On the other hand the updating of this system ismore is and more influenced by the principles of thefacet-analytic tradition (as discussed by Miksa, 1998).And in some sense the DDC is also empirically basedon the basis of the literature, which is classified by thesystem. DDC is thus not as clear in its methodologicalprinciples as the former examples. The fourth basicideal is the pragmatic epistemology. In this case the sys-tems are developed on the basis of the analysis of goals,values and consequences. For example, female research-ers may regard the goal of women’s emancipation asthe main goal of feminist scholarship. They may thusregard any relations that support this goal as important.“Similarity” is thus identified by classes of documentssharing similar functions in relation to this goal. Prag-matic classification may thus also be termed critical orpolitical classification. It may at first seem strange andopposed to common scientific norms.

  • Knowl. Org. 30(2003)No.2B. Hjørland: Fundamentals of Knowledge Organization


    Pragmatic epistemology and pragmatic KO do notmean that a person (or a whole field) can just do thingsthe way that suits his personal interests (or the interestsof the researchers in the field). If this is done, if researchjust produces “social constructions” then reality willmake those constructions incoherent. They will be op-posed by empirical and theoretical arguments. Theproduction of incoherent “knowledge” is not valuableand cannot be a serious goal. Therefore pragmatic phi-losophy is bound up with a form of realism. The prag-matic method is not opposed to aspects of empiricism,rationalism and historicism. It claims, however, that noisolated evidence is enough. The final criteria of truthare connected to human goals and activities. You can-not avoid considering such issues although they mayseem uncomfortable.

    The founder of pragmatism, Charles Sanders Peirce(1839-1914), wrote:

    The rational meaning of every proposition liesin the future. How so? The meaning of a propo-sition [its logical interpretant] is itself a proposi-tion. Indeed, it is no other than the very propo-sition of which it is the meaning: It is a transla-tion of it. But of the myriads of forms intowhich a proposition may be translated, what isthat one which is to be called its very meaning?It is, according to the pragmaticist, that form inwhich the proposition becomes applicable tohuman conduct, ... that form which is most di-rectly applicable to self-control under everysituation and to every purpose. This is why helocates the meaning in future time; for futureconduct is the only conduct that is subject toself-control. (Peirce, 1905)

    In this way the pragmatic epistemology sees themeaning of words as connected to speech acts and tothe goals that humans try to satisfy through theiracts. In the same way it regards semantic relations andclassifications as determined by their functions astools for human goals. As the pragmatic philosopherJohn Dewey stated:

    So far nominalism and conceptualism - the the-ory that kinds exist only in words or in ideas -was on the right track. It emphasized the teleo-logical character of systems and classifications,that they exist for the sake of economy and effi-ciency in reaching ends. But this truth was per-verted into a false notion, because the active anddoing side of experience was denied or ignored.

    Concrete things have ways [emphasis in original]of acting, as many ways of acting as they havepoints of interaction with other things. Onething is callous, unresponsive, inert in the pres-ence of some other things; it is alert, eager, andon the aggressive with respect to other things; ina third case, it is receptive, docile. Now differentways of behaving, in spite of their endless diver-sity, may be classed together in view of commonrelationship to an end. No sensible person triesto do everything. He has certain main interestsand leading aims by which he makes his behaviorcoherent and effective. To have an aim is to limit,select, concentrate, group. Thus a basis is fur-nished for selecting and organizing things accord-ing as their ways of acting are related to carryingforward pursuit. Cherry trees will be differentlygrouped by woodworkers, orchardists, artists,scientists and merry-makers. To the execution ofdifferent purposes different ways of acting and re-acting on the part of trees are important. Eachclassification may be equally sound when the dif-ference of ends is borne in mind.

    Nevertheless there is a genuine objective stan-dard for the goodness of special classifications.One will further the cabinetmaker in reachinghis end while another will hamper him. Oneclassification will assist the botanist in carryingon fruitfully his work of inquiry, and anotherwill retard and confuse him. The teleologicaltheory of classification does not therefore com-mit us to the notion that classes are purely verbalor purely mental. Organization is no moremerely nominal or mental in any art, includingthe art of inquiry, than it is in a department storeor railway system. The necessity of executionsupplies objective criteria. Things have to besorted out and arranged so that their groupingwill promote successful action for ends. Conven-ience, economy and efficiency are the bases ofclassification, but these things are not restrictedto verbal communication with others