Knowl. Org. 42(2015)No.3 KO KNOWLEDGE ORGANIZATION Official Journal of the International Society for Knowledge Organization ISSN 0943 – 7444 International Journal devoted to Concept Theory, Classification, Indexing and Knowledge Representation Contents Articles Hui Fang. Classifying Research Articles in Multidisciplinary Sciences Journals into Subject Categories ................................ 139 Philip Hider. A Survey of the Coverage and Methodologies of Schemas and Vocabularies Used to Describe Information Resources ............................................................... 154 Gilberto Anguiano Peña and Catalina Naumis Peña. Method for Selecting Specialized Terms from a General Language Corpus .......................................................... 164 Kyong Eun Oh, Soohyung Joo and Eun-Ja Jeong. Online Consumer Health Information Organization: Users’ Perspectives on Faceted Navigation ............................. 176 Brief Communication Satija, M.P. The 21 st Edition (2014) of the Sears List of Subject Headings: A Brief Introduction. ..................................... 187 Book Reviews Wissensorganisation: Entwicklung, Aufgabe, Anwendung, Zukunft by Ingetraut Dahlberg. Würzburg: Ergon Verlag, 2014, 175p. ISBN 978-3-95650-065-7, €28 ................. 190 The Elements of Knowledge Organization by Richard Smiraglia. Cham: Springer, 2014, 101p. ISBN 978-3-319-09356-7 US$109 .................................. 190 Intner, Sheila S. and Weihs, Jean. Standard Cataloging for School and Public Libraries, 5 th ed. Santa Barbara, CA: ABC-CLIO/Libraries Unlimited, 2015. ix, 239p. ISBN: 978-1-61069-114-7(pbk). US$ 55 ...................... 195 Books Recently Published ...................................................... 197




KNOWLEDGE ORGANIZATION

This journal is the organ of the INTERNATIONAL SOCIETY FOR KNOWLEDGE ORGANIZATION (General Secretariat: Amos DAVID, Université de Lorraine, 3 place Godefroy de Bouillon, BP 3397, 54015 Nancy Cedex, France. E-mail: [email protected]).

Editors

Richard P. SMIRAGLIA (Editor-in-Chief), School of Information Studies, University of Wisconsin, Milwaukee, Northwest Quad Building B, 2025 E Newport St., Milwaukee, WI 53211 USA. E-mail: [email protected]

Nancy WILLIAMSON (Classification Research News Editor), Faculty of Information Studies, University of Toronto, 140 St. George Street, Toronto, Ontario M5S 3G6 Canada. E-mail: [email protected]

Melodie J. FOX (Review Editor), School of Information Studies, University of Wisconsin, Milwaukee, Northwest Quad Building B, 2025 E Newport St., Milwaukee, WI 53211 USA.

Jihee BEAK (Assistant to the Editor-in-Chief), School of Information Studies, University of Wisconsin, Milwaukee, Northwest Quad Building B, 2025 E Newport St., Milwaukee, WI 53211 USA.

Hailey E. STRICKON (Editorial Assistant), School of Information Studies, University of Wisconsin, Milwaukee, Northwest Quad Building B, 2025 E Newport St., Milwaukee, WI 53211 USA.

Editors Emerita

Hope A. OLSON, School of Information Studies, University of Wisconsin-Milwaukee, Milwaukee, Northwest Quad Building B, 2025 E Newport St., Milwaukee, WI 53211 USA. E-mail: [email protected]

Clare BEGHTOL, Faculty of Information Studies, University of Toronto, 140 St. George Street, Toronto, Ontario M5S 3G6, Canada. E-mail: [email protected]

Ingetraut DAHLBERG, Am Hirtenberg 13, 64732 Bad König, Germany. E-mail: [email protected]

Editorial Board

Jonathan FURNER, Graduate School of Education & Information Studies, University of California, Los Angeles, 300 Young Dr. N, Mailbox 951520, Los Angeles, CA 90095-1520, USA. E-mail: [email protected]

Jesús GASCÓN GARCÍA, Facultat de Biblioteconomia i Documentació, Universitat de Barcelona, C. Melcior de Palau, 140, 08014 Barcelona, Spain. E-mail: [email protected]

Claudio GNOLI, University of Pavia, Mathematics Department Library, via Ferrata 1, I-27100 Pavia, Italy. E-mail: [email protected]

Rebecca GREEN, Assistant Editor, Dewey Decimal Classification, Dewey Editorial Office, Library of Congress, Decimal Classification Division, 101 Independence Ave., S.E., Washington, DC 20540-4330, USA. E-mail: [email protected]

José Augusto Chaves GUIMARÃES, Departamento de Ciência da Informação, Universidade Estadual Paulista–UNESP, Av. Hygino Muzzi Filho 737, 17525-900 Marília SP Brazil. E-mail: [email protected]

Birger HJØRLAND, Royal School of Library and Information Science, Copenhagen Denmark. E-mail: [email protected]

Barbara H. KWASNIK, School of Information Studies, Syracuse University, Syracuse, NY 13244 USA. E-mail: [email protected]

María J. LÓPEZ-HUERTAS, Universidad de Granada, Facultad de Biblioteconomía y Documentación, Campus Universitario de Cartuja, Biblioteca del Colegio Máximo de Cartuja, 18071 Granada, Spain. E-mail: [email protected]

Kathryn LA BARRE, The Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, 501 E. Daniel Street, MC-493, Champaign, IL 61820-6211 USA. E-mail: [email protected]

Marianne LYKKE, e-Learning Lab, Center for User-driven Innovation, Learning and Design, Department of Communication, Aalborg University, Kroghstraede 1, room 2.023 Denmark 9220 Aalborg OE. E-mail: [email protected]

Ia MCILWAINE (Literature Editor), Research Fellow, School of Library, Archive & Information Studies, University College London, Gower Street, London WC1E 6BT U.K. E-mail: [email protected]

Jens-Erik MAI, Royal School of Library and Information Science, Copenhagen Denmark. E-mail: [email protected]

Widad MUSTAFA el HADI, Université Charles de Gaulle Lille 3, URF IDIST, Domaine du Pont de Bois, Villeneuve d’Ascq 59653, France. E-mail: [email protected]

H. Peter OHLY, Prinzenstr. 179, D-53175 Bonn, Germany. E-mail: [email protected]

K. S. RAGHAVAN, KAnOE (Centre for Knowledge Analytics & Ontological Engineering), PES Institute of Technology, 100 Feet Ring Road, BSK 3rd Stage, Bangalore 560085, India. E-mail: [email protected].

M. P. SATIJA, Guru Nanak Dev University, School of Library and Information Science, Amritsar-143 005, India. E-mail: [email protected]

Aida SLAVIC, UDC Consortium, PO Box 90407, 2509 LK The Hague, The Netherlands. E-mail: [email protected]

Dagobert SOERGEL, Department of Library and Information Studies, Graduate School of Education, University at Buffalo, 534 Baldy Hall, Buffalo, NY 14260-1020. E-mail: [email protected]

Renato R. SOUZA, Applied Mathematics School, Getulio Vargas Foundation, Praia de Botafogo, 190, 3o andar, Rio de Janeiro, RJ, 22250-900, Brazil. E-mail: [email protected]

Joseph T. TENNIS, The Information School of the University of Washington, Box 352840, Mary Gates Hall Ste 370, Seattle WA 98195-2840 USA. E-mail: [email protected]

Maja ŽUMER, Faculty of Arts, University of Ljubljana, Askerceva 2, Ljubljana 1000 Slovenia. E-mail: [email protected]


Classifying Research Articles in Multidisciplinary Sciences Journals into Subject Categories

Hui Fang

State Key Laboratory of Analytical Chemistry for Life Science, School of Electronic Science and Engineering, Nanjing University, Nanjing 210023, China, <[email protected]>

Hui Fang received a BSc in radio engineering (1990) and an MSc in signal processing (1993) from Southeast University in Nanjing, China, and a PhD in electroanalytical chemistry from Nanjing University in Nanjing, China, in 1998. He is now an associate professor at the School of Electronic Science and Engineering, Nanjing University. His research interests include information processing, data mining, artificial intelligence, instruments and instrumentation, and bibliometrics.

Fang, Hui. Classifying Research Articles in Multidisciplinary Sciences Journals into Subject Categories. Knowledge Organization. 42(3), 139-153. 35 references.

Abstract: In the Thomson Reuters Web of Science database, the subject categories of a journal are applied to all articles in the journal. However, many articles in multidisciplinary sciences journals may only be represented by a small number of subject categories. To provide more accurate information on the research areas of articles in such journals, we can classify articles in these journals into subject categories as defined by Web of Science based on their references. For an article in a multidisciplinary sciences journal, the method counts the subject categories in all of the article’s references indexed by Web of Science, and uses the most numerous subject categories of the references to determine the most appropriate classification of the article. We used articles in an issue of Proceedings of the National Academy of Sciences (PNAS) to validate the correctness of the method by comparing the obtained results with the categories of the articles as defined by PNAS and their content. This study shows that the method provides more precise search results for the subject category of interest in bibliometric investigations through recognition of articles in multidisciplinary sciences journals whose work relates to a particular subject category.

Received: 24 February 2015; Revised 23 April 2015; Accepted 1 May 2015

Keywords: subject categories, multidisciplinary sciences journals, classifying, PNAS, WoS

1.0 Introduction

Web of Science (WoS) from Thomson Reuters is a convenient platform for researchers to search for and retrieve articles related to their fields of interest. In addition to the article title, abstract and keywords, it provides other information about the articles it has indexed, such as the authors and their affiliations, publication name, DOI number, year published, financial support, research area, and WoS category. Based on these parameters, bibliometric researchers have investigated research activities (for example, Ardanuy et al. 2009; de la Moneda Corrochano et al. 2013; Diem and Wolter 2013; Moppett and Hardman 2011; Pinto et al. 2013; Wang et al. 2013), analyzed the research trends in a specific field (for example, Chiu and Ho 2007; Grossi et al. 2003; Krampen et al. 2011), examined knowledge diffusion (Chen et al. 2009), and studied paradigm shifts (Marx and Bornmann 2010).

In WoS, a research article is represented by the subject categories of the journal in which it is published. An article in a peer-reviewed journal is classified into the subject categories of the journal by several experts, including its author(s), who select the journal for submission, and reviewers and the editorial board of the journal, who accept the article for publication. WoS labels each journal with a maximum of six subject categories. Those journals that publish articles in more than six subject categories are labeled and categorized as multidisciplinary sciences journals; these include Science, Nature, and Proceedings of the National Academy of Sciences (PNAS), for example. The work behind each article in these journals usually relates to a limited number of subject categories, and does not encompass many disciplines. Thus, labelling articles in these journals as multidisciplinary sciences is not accurate.

Bibliometric studies often investigate the distribution of a certain topic in each related subject category to determine that category’s contribution to the topic (for example, Aleixandre et al. 2013; Chen et al. 2014; Chiu and Ho 2007; Liu et al. 2012; Naqvi 2014; Porter and Youtie 2009; Yang et al. 2013; Wang et al. 2013; Zhuang et al. 2013). Those subject categories which contain many articles published in multidisciplinary sciences journals will be underrepresented in such searches. For research areas that attract considerable attention among scientists all over the world, such as “Immunology,” many important articles are published in multidisciplinary sciences journals because of the high influence of such journals.

This problem related to multidisciplinary sciences journals can be overcome by classifying individual articles published in such journals into subject categories. Glänzel et al. (1999) used reference analysis to classify the subjects of papers published in multidisciplinary and general journals, and this method was also utilized to improve SCImago Journal & Country Rank subject classification (Gómez-Núñez et al. 2011). These previous studies have demonstrated the feasibility of the method, but its correctness remains to be tested. That was the aim of the present investigation, using a case study of articles in an issue of PNAS, which provides categories of the articles on its website.

The remainder of this article is organized as follows. Section 2 provides the background of this study. Section 3 presents the data used in this study and a detailed description of the methodology for classifying articles in multidisciplinary sciences journals into subject categories according to their references. Section 4 presents and discusses the results of this case study of the method under investigation. Conclusions and limitations are presented in the final sections.

2.0 Background

Classifying articles in multidisciplinary sciences journals can also be achieved by clustering articles (this procedure is not limited to articles in such journals) based on citation relations (Griffith et al. 1974; Small and Griffith 1974; Small and Sweeney 1985; Small et al. 1985; Small 1998; Lewison 1999; Gouvea Meireles et al. 2014). Using this method, Klavans and Boyack (2010) assigned more than 5.5 million publications to over 84,000 research areas. Waltman and Van Eck (2012) recently clustered about 10 million articles from the period 2001–2010. The clustering method can create a classification system other than the subject categories of WoS. Clustering articles into subject categories may also adopt combinations of co-citation and co-word analysis (Braam et al. 1991). Incorporating natural language processing techniques allows co-word analysis to utilize further linguistic relations as the basis for clustering (Ibekwe-SanJuan et al. 2002).

To apply the clustering method for classifying articles into research areas, it is necessary to download information related to a large number of articles from WoS to establish their co-occurrence relations. However, most WoS users can download only up to 500 records at a time, which makes it very difficult to download information on a large number of articles. Until the results of clustering a large number of articles are readily available to ordinary users, who cannot conveniently download such information, a more practical method for classifying articles in multidisciplinary sciences journals into subject categories is needed.

Ordinary WoS users can categorize individual articles in multidisciplinary sciences journals indexed by WoS according to the articles’ references (Glänzel et al. 1999). This approach is effective in three main ways. First, the content of an article is related to the content of its references; the references of the article introduce the background or area of applicability of the article or the tool or principle adopted by the article. Thus, an article has the same, similar, or related subject categories as its references (Glänzel et al. 1999; Gabel 2006). Second, the intersection of the subject categories of the references can reflect the subject categories of the article in which they are cited. One reference may cover several aspects, which correspond to different subject categories. However, the article in which that reference is cited relates to a subset of those aspects, namely those belonging to the subject categories to which the research behind the citing article corresponds. For example, the journal Neurobiology of Aging, which WoS labels with the subject categories “Geriatrics & Gerontology” and “Neurosciences,” publishes the results of studies in which the primary emphasis involves the mechanisms of changes in the nervous system associated with age or age-related diseases. An article in the category “neurosciences” may cite articles (as its references) in Neurobiology of Aging because the former refers to the latter’s research on the mechanisms of the nervous system; an article in the category “geriatrics & gerontology” may cite articles (as its references) in Neurobiology of Aging because the former refers to the latter’s research into diseases associated with age. Suppose an article belonging to only one subject category has two references: one reference is labelled with subject categories A, B, and C by the database; the other is labelled with A and D. It can then be inferred that the article belongs to subject category A. The subject categories B, C, and D of the two references may not be related to the subject matter of the citing article. Third, this method utilizes reliable classification of the references into the subject categories of publishing journals by experts (authors of the references, reviewers, and journal editorial boards), which is mentioned in the Introduction.

3.0 Data and methods

3.1 Data

Data were obtained from WoS. Articles in the first issue of PNAS in 2014 were used as examples for classification into the subject categories defined by WoS. PNAS classifies its articles into three research fields: “physical sciences,” “social sciences,” and “biological sciences.” Each field has several subsidiary groups. For example, the subjects in “physical sciences” include: “applied mathematics,” “applied physical sciences,” “chemistry,” “earth, atmospheric, and planetary sciences,” “engineering,” “environmental sciences,” and “physics.” Articles are listed below each subject heading on the PNAS website. The described method could be validated by comparing the obtained results with the categories of the articles as defined by PNAS (PNAS subject categories). As with other multidisciplinary sciences journals that provide subject information about articles, such as Science and PLoS One, some PNAS subject categories have no counterparts in WoS. This means that the subject categories declared by authors may not be in accordance with those of WoS. For an article in such PNAS subject categories, we judged the WoS subject categories to which its content relates by reading the article. Except for “PNAS subject category,” each “subject category” in the sections below represents that defined by WoS.
For convenience, we defined the article subject category as the WoS subject category to which the article should be assigned according to its content, and the recognized subject category as that to which the article was assigned by the method.

3.2 Classifying articles in multidisciplinary sciences journals into subject categories

WoS provides information on the references of every article it indexes. The references are also classified into subject categories by WoS. If one reference is labelled by $n$ subject categories ($n$ from 1 to 6), then the reference is equally assigned to the $n$ subject categories, with a weight of $1/n$ each (Waltman 2012), because it is unclear which subject category it belongs to without further information (Bornmann 2014). Suppose an article has $L$ references which are represented by $N$ subject categories by WoS ($L$ and $N$ are positive integers). Here we use a matrix $\mathbf{S}_{L \times N}$ to represent the assignment of references to each subject category:

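$$
\mathbf{S}_{L \times N} =
\begin{pmatrix}
s_{11} & s_{12} & \cdots & s_{1N} \\
s_{21} & s_{22} & \cdots & s_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
s_{L1} & s_{L2} & \cdots & s_{LN}
\end{pmatrix},
\tag{1}
$$

where

$$
s_{ij} =
\begin{cases}
0, & \text{the $i$-th reference is not labelled with the $j$-th subject category} \\
1/n_i, & \text{the $i$-th reference is labelled with the $j$-th subject category,}
\end{cases}
$$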

and $n_i$ is the number of subject categories with which the $i$-th reference is labelled. Each $s_{ij}$ can be regarded as the score of the $j$-th subject category given to the article from its $i$-th reference. The subject categories of each reference are taken as the subject categories of the journal in which it was published and can be obtained from Journal Citation Reports provided by WoS. The information on the references of the inspected article, such as the journals publishing the references, can be extracted from the full record (including cited references) of the WoS file that users can download.
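The construction of the score matrix can be illustrated with a minimal Python sketch. The function name `score_matrix` and the input layout (one list of category labels per reference) are assumptions made for illustration and are not part of the original study; the data reuse the two-reference example from the Background section (one reference labelled A, B, and C, the other A and D). Exact fractions are used so that the weights $1/n_i$ are not rounded.

```python
from fractions import Fraction


def score_matrix(reference_categories, categories):
    """Return S (L x N) with s_ij = 1/n_i if reference i carries category j, else 0.

    reference_categories: one list of category labels per reference (hypothetical layout).
    categories: the ordered list of the N categories that define the columns.
    """
    matrix = []
    for labels in reference_categories:
        unique = set(labels)          # ignore accidental duplicate labels
        n_i = len(unique)             # number of categories of this reference
        row = [Fraction(1, n_i) if c in unique else Fraction(0) for c in categories]
        matrix.append(row)
    return matrix


# Two references: one labelled A, B, C; the other labelled A, D.
refs = [["A", "B", "C"], ["A", "D"]]
cats = ["A", "B", "C", "D"]
S = score_matrix(refs, cats)
for row in S:
    print([str(x) for x in row])
```

Each row of the resulting matrix sums to 1, which reflects the equal assignment of a reference across its categories. A reference with no indexed categories yields a row of zeros in this sketch; how such references were handled in the study is not stated.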

Vector $\mathbf{M}$ was defined as representing the scores of each subject category obtained from all the references of an article:

$$
\mathbf{M} = (m_1, m_2, \ldots, m_N)^{\mathrm{T}},
\tag{2}
$$

where

$$
m_j = \sum_{i=1}^{L} s_{ij}, \quad (j = 1, 2, \ldots, N).
$$

The order of $m_j$ is that of the subject categories in Equation 1. The larger the $m_j$, the higher is the possibility of the article belonging to the $j$-th subject category. Suppose

$$
j_{\max} = \arg\max_{j} m_j \quad (j = 1, 2, \ldots, N)
\tag{3}
$$

and

$$
m_{\max} = \max_{j} m_j \quad (j = 1, 2, \ldots, N).
\tag{3´}
$$

Then, the article can be classified into the $j_{\max}$-th subject category.
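As a worked illustration of the category scores and the choice of the highest-scoring category, the following Python sketch sums the fractional weights per category and picks the maximum. The function names `category_scores` and `top_category` and the input layout are hypothetical; the data are the two-reference example from the Background section. With one reference labelled A, B, and C and another labelled A and D, category A receives 1/3 + 1/2 = 5/6, D receives 1/2, and B and C receive 1/3 each, so A is selected.

```python
from collections import defaultdict
from fractions import Fraction


def category_scores(reference_categories):
    """Sum the fractional weights 1/n_i per category over all references."""
    scores = defaultdict(Fraction)    # Fraction() is 0
    for labels in reference_categories:
        unique = set(labels)
        for category in unique:
            scores[category] += Fraction(1, len(unique))
    return dict(scores)


def top_category(scores):
    """Return the category with the largest score."""
    return max(scores, key=scores.get)


refs = [["A", "B", "C"], ["A", "D"]]
m = category_scores(refs)
best = top_category(m)
print(best, m[best])
```

This sketch does not resolve ties between equally scored categories in any principled way; the source does not say how ties were treated.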


Some articles may be related to more than one subject category. If $m_{j_1}$ of any $j_1$ reaches or exceeds a preset threshold ($m_{\mathrm{th}}$), then the article can also be classified to the $j_1$-th subject category. The threshold is defined as:

$$
m_{\mathrm{th}} = m_{\max} / d,
\tag{4}
$$

where $d$ is a parameter to adjust the threshold, and thus the number of recognized subject categories, defined as the threshold factor.
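The effect of the threshold factor can be sketched in Python as follows. The function name `recognized_categories` and the scores in the example are invented for illustration and do not come from the study's data; the sketch only shows that a larger threshold factor lowers the threshold and can admit additional categories.

```python
def recognized_categories(scores, d):
    """Return the categories whose score reaches m_th = m_max / d, highest score first."""
    m_max = max(scores.values())
    m_th = m_max / d
    selected = [c for c, m in scores.items() if m >= m_th]
    return sorted(selected, key=lambda c: scores[c], reverse=True)


# Hypothetical scores for one article (not taken from the paper).
scores = {
    "Plant Sciences": 12.5,
    "Zoology": 5.0,
    "Biochemistry & Molecular Biology": 4.5,
    "Ecology": 3.5,
}

print(recognized_categories(scores, 3))  # threshold 12.5 / 3
print(recognized_categories(scores, 4))  # threshold 12.5 / 4
```

In this invented example, "Ecology" falls below the threshold when the factor is 3 and is admitted when the factor is 4.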

An alternative method to determine the threshold is to set the threshold at some proportion of the total number of references (López-Illescas et al. 2009). Both methods assign an article to the subject categories in which it has more references than any other subject categories. Evaluating the relative merits of these two methods is beyond the scope of this case study: it demands further investigation.

4.0 Results and discussion

There were 114 articles in the first issue of PNAS in 2014. Seven were categorized with two subject headings in the journal. Those articles involved the subjects of physics, chemistry, life sciences and social sciences. Table 1 lists the titles of the 114 articles and the numbers used to identify them in Table 2. Table 2 presents the PNAS subject categories (SubjectPNAS1 and SubjectPNAS2) and the recognized subject categories (SCrecog1 to SCrecog6, ranked in descending order of score in Equation 2) for the 114 articles analysed. In the tabulated results, non-italic text represents the recognized subject categories when the threshold factor was 3. When the threshold factor was 4, additional results were obtained and are represented by italics. The maximal score $m_{\max}$ in Equation 3´, which is used to calculate the threshold $m_{\mathrm{th}}$ in Equation 4, is selected from $\mathbf{M}$ (Equation 2), except for the category of “Multidisciplinary Sciences,” which does not provide useful information about the subject of the references. If the highest score of an article indicated “Multidisciplinary Sciences,” then $m_{\max}$ was taken as the score of the second-highest category.
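The exclusion of “Multidisciplinary Sciences” when choosing the maximal score can be sketched in Python as below. The function name, the constant `MULTI`, and the scores are hypothetical. The sketch also assumes that “Multidisciplinary Sciences” remains eligible as a recognized category once the threshold is fixed, which is a reading of the text rather than something it states explicitly.

```python
MULTI = "Multidisciplinary Sciences"


def recognized_excluding_multi_from_max(scores, d):
    """Pick m_max among informative categories only, then apply m_th = m_max / d.

    Returns (m_max, m_th, recognized categories sorted by descending score).
    """
    informative = [m for c, m in scores.items() if c != MULTI]
    if not informative:
        raise ValueError("no category other than Multidisciplinary Sciences")
    m_max = max(informative)
    m_th = m_max / d
    # Assumption: MULTI may still be listed if its score reaches the threshold.
    selected = [c for c, m in scores.items() if m >= m_th]
    selected.sort(key=lambda c: scores[c], reverse=True)
    return m_max, m_th, selected


# Hypothetical scores in which the multidisciplinary category ranks first.
scores = {MULTI: 9.0, "Plant Sciences": 6.0, "Zoology": 2.5, "Ecology": 1.0}
m_max, m_th, selected = recognized_excluding_multi_from_max(scores, 3)
print(m_max, m_th, selected)
```

Taking the maximum over the informative categories keeps a large multidisciplinary score from raising the threshold and hiding the categories that actually describe the article.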

The PNAS subject categories and the WoS subject categories were not exactly the same. In Table 2, the PNAS subject categories “applied biological sciences,” “biochemistry,” “cell biology,” “ecology,” “immunology,” “microbiology,” “neuroscience,” “physiology,” “plant biology,” and “psychological and cognitive sciences” had corresponding or similar WoS subject categories, and the articles related to these subjects were assigned to the article subject categories based on the subject categories of the references. The PNAS subject categories “applied physical sciences,” “chemistry,” “physics,” “earth, atmospheric, and planetary sciences,” “medical sciences,” and “social sciences” each encompass several WoS subject categories. The articles of these PNAS subject categories were classified appropriately to the article subject categories by the method.

The recognized subject categories of some articles were not the same as the PNAS subject categories. The 15th article analyzed the interaction between plants and parasites. The recognized subject categories were “plant sciences,” “zoology,” “biochemistry & molecular biology” and “multidisciplinary sciences.” The object of the study was bananas, so it was appropriately labelled as “agricultural sciences” by PNAS, which is its area of applicability. Most PNAS “anthropology” articles were correctly categorized by our method. The exception was the 18th article, which discussed the evolution of the human hand. It was reasonable that it was labelled as “anthropology” by PNAS or classified as “evolutionary biology” by our method. The PNAS subject category “genetics” was similar to the WoS subject category “genetics & heredity.” Three of five PNAS “genetics” articles were correctly assigned to the WoS counterpart. The other two were the 57th and 110th articles. The former discussed DNA molecules, so it is not surprising that it was assigned to the WoS subject category “biochemistry & molecular biology.” The 110th article investigated neuronal translational control and microRNA function. Our method classified it as “neurosciences,” which was consistent with the object of the study.

No. Title of Article 1 High-resolution photoacoustic tomography of rest-

ing-state functional connectivity in the mouse brain 2 Theory of epithelial sheet morphology in three di-

mensions 3 Rationale and mechanism for the low photoinactiva-

tion rate of bacteria in plasma 4 Initial stages of calcium uptake and mineral deposi-

tion in sea urchin embryos 5 Platinum supported on titanium –ruthenium oxide is

a remarkably stable electrocatayst for hydrogen fuel cell vehicles

6 Re-Os geochronology and coupled Os-Sr isotope constraints on the Sturtian snowball Earth

7 Amphitheater-headed canyons formed by megaflooding at Malad Gorge, Idaho

8 Fe-vacancy order and superconductivity in tetragonal β-Fe1-xSe

9 Kondo conductance across the smallest spin 1/2 radical molecule

10 Avalanches mediate crystallization in a hard-sphere glass

11 Evidence supporting an intentional Neandertal burial at La Chapelle-aux-Saints

12 Critical slowing down as early warning for the onset and termination of depression

13 Media's role in broadcasting acute stress following the Boston Marathon bombings

Page 7: KO KNOWLEDGE ORGANIZATION · Knowl. Org. 42(2015)No.3 KO KNOWLEDGE ORGANIZATION Official Journal of the International Society for Knowledge Organization ISSN 0943 – 7444

Knowl. Org. 42(2015)No.3

H. Fang. Classifying Research Articles in Multidisciplinary Sciences Journals into Subject Categories

143

14 Dynamic pricing of network goods with boundedly rational consumers

15 Phenalenone-type phytoalexins mediate resistance of banana plants (Musa spp.) to the burrowing nematode Radopholus similis

16 Cultural assemblages show nested structure in humans and chimpanzees but not orangutans

17 Earliest evidence for commensal processes of cat domestication

18 Early Pleistocene third metacarpal from Kenya and the evolution of modern human-like hand morphology

19 Production and stabilization of the trimeric influenza hemagglutinin stem domain for potentially broadly protective influenza vaccines

20 Rewiring yeast sugar transporter preference through modifying a conserved protein motif

21 Structure of a eukaryotic thiaminase I

22 Reaction-based fluorescent sensor for investigating mobile Zn2+ in mitochondria of healthy versus cancerous prostate cells

23 Quantum mechanical calculations suggest that lytic polysaccharide monooxygenases use a copper-oxyl, oxygen-rebound mechanism

24 Regulation of PTEN inhibition by the pleckstrin homology domain of P-REX2 during insulin signaling and glucose homeostasis

25 Biological role of prolyl 3-hydroxylation in type IV collagen

26 Circadian clock-dependent and -independent rhythmic proteomes implement distinct diurnal functions in mouse liver

27 Covalent EGFR inhibitor analysis reveals importance of reversible interactions to potency and mechanisms of drug resistance

28 Large effect of membrane tension on the fluid–solid phase transitions of two-component phosphatidylcholine vesicles

29 Transmembrane allosteric coupling of the gates in a potassium channel

30 Designed amyloid fibers as materials for selective carbon dioxide capture

31 Aggregation-triggering segments of SOD1 fibril formation support a common pathway for familial and sporadic ALS

32 Automatic Classification of Cellular Expression by Nonlinear Stochastic Embedding (ACCENSE)

33 Morphological optimization for access to dual oxidants in biofilms

34 Structure of Est3 reveals a bimodal surface with differential roles in telomere replication

35 Measuring membrane protein stability under native conditions

36 Direct observation of a transient ternary complex during IκBα-mediated dissociation of NF-κB from DNA

37 Heteromerization of PIP aquaporins affects their intrinsic permeability

38 Protein structural ensembles are revealed by redefining X-ray electron density noise

39 Nucleolin is important for Epstein-Barr virus nuclear antigen 1-mediated episome binding, maintenance, and transcription

40 Celastrol increases glucocerebrosidase activity in Gaucher disease by modulating molecular chaperones

41 Identification of cancer initiating cells in K-Ras driven lung adenocarcinoma

42 Enhanced stability of Mcl1, a prosurvival Bcl2 relative, blunts stress-induced apoptosis, causes male sterility, and promotes tumorigenesis

43 A mechanism for retromer endosomal coat complex assembly with cargo

44 Evaluation of intramitochondrial ATP levels identifies G0/G1 switch gene 2 as a positive regulator of oxidative phosphorylation

45 JMJD5 regulates PKM2 nuclear translocation and reprograms HIF-1α-mediated glucose metabolism

46 Tumor suppressor and deubiquitinase BAP1 promotes DNA double-strand break repair

47 miR-218 opposes a critical RTK-HIF pathway in mesenchymal glioblastoma

48 Emerging predictable features of replicated biological invasion fronts

49 Effects of genotypic and phenotypic variation on establishment are important for conservation, invasion, and infection biology

50 Interannual variation in land-use intensity enhances grassland multidiversity

51 Drastic neofunctionalization associated with evolution of the timezyme AANAT 500 Mya

52 Aphid amino acid transporter regulates glutamine supply to intracellular bacterial symbionts

53 Policing of reproduction by hidden threats in a cooperative mammal

54 A unique covalent bond in basement membrane is a primordial innovation for tissue evolution

55 PIWI proteins and PIWI-interacting RNAs function in Hydra somatic stem cells

56 Scan statistic-based analysis of exome sequencing data identifies FAN1 at 15q13.3 as a susceptibility gene for schizophrenia and autism

57 Quantitation of the DNA tethering effect in long-range DNA looping in vivo and in vitro using the Lac and λ repressors

58 Contribution of phenotypic heterogeneity to adaptive antibiotic resistance

59 Ohnologs are overrepresented in pathogenic copy number mutations

60 IL-25 and type 2 innate lymphoid cells induce pulmonary fibrosis

61 Altered inactivation of commensal LPS due to acyloxyacyl hydrolase deficiency in colonic dendritic cells impairs mucosal Th17 immunity

62 Dynamic control of β1 integrin adhesion by the plexinD1-sema3E axis

63 Salmonella exploits NLRP12-dependent innate immune signaling to suppress host defenses during infection

64 β-Catenin induces T-cell transformation by promoting genomic instability

65 An amphioxus RAG1-like DNA fragment encodes a functional central domain of vertebrate core RAG1

66 Alloreactive cytotoxic T cells provide means to decipher the immunopeptidome and reveal a plethora of tumor-associated self-epitopes

67 mTOR target NDRG1 confers MGMT-dependent resistance to alkylating chemotherapy


68 Dual-modality gene reporter for in vivo imaging

69 Epstein-Barr Virus Nuclear Antigen 3C binds to BATF/IRF4 or SPI1/IRF4 composite sites and recruits Sin3A to repress CDKN2A

70 Neisseria meningitidis NalP cleaves human complement C3, facilitating degradation of C3b and survival in human serum

71 Identification of secreted bacterial proteins by non-canonical amino acid tagging

72 Mathematical modeling of primary succession of murine intestinal microbiota

73 A common solution to group 2 influenza virus neutralization

74 Human herpesvirus 6 (HHV-6) alters E2F1/Rb pathways and utilizes the E2F1 transcription factor to express viral genes

75 The antigen 43 structure reveals a molecular Velcro-like mechanism of autotransporter-mediated bacterial clumping

76 Local domains of motor cortical activity revealed by fiber-optic calcium recordings in behaving nonhuman primates

77 Protein kinase LKB1 regulates polarized dendrite formation of adult hippocampal newborn neurons

78 Comparison of explicit and incidental learning strategies in memory-impaired patients

79 Representation of interval timing by temporally scalable firing patterns in rat prefrontal cortex

80 Presynaptic mitochondrial morphology in monkey prefrontal cortex correlates with working memory and is improved with estrogen treatment

81 Bidirectional homeostatic plasticity induced by interneuron cell death and transplantation in vivo

82 Mechanisms underlying subunit independence in pyramidal neuron dendrites

83 Tonic GABAA conductance bidirectionally controls interneuron firing pattern and synchronization in the CA3 hippocampal network

84 Neurofibrillary tangle-bearing neurons are functionally integrated in cortical circuits in vivo

85 Cumulative latency advance underlies fast visual processing in desynchronized brain state

86 Optical control of trimeric P2X receptors and acid-sensing ion channels

87 Arabidopsis EDM2 promotes IBM1 distal polyadenylation and regulates genome DNA methylation patterns

88 Overexpression of plasma membrane H+-ATPase in guard cells promotes light-induced stomatal opening and enhances plant growth

89 Data-poor management of African lion hunting using a relative index of abundance

90 Growth feedback as a basis for persister bistability

91 Identification of key regulators for the migration and invasion of rheumatoid synoviocytes through a systems approach

92 Direct observation of single stationary-phase bacteria reveals a surprisingly long period of constant protein production activity

93 Distinct kinetics of synaptic structural plasticity, memory formation, and memory decay in massed and spaced learning

94 Effective functional maturation of invariant natural killer T cells is constrained by negative selection and T-cell antigen receptor affinity

95 Microbial biogeography of wine grapes is conditioned by cultivar, vintage, and climate

96 SuperBiHelix method for predicting the pleiotropic ensemble of G-protein-coupled receptor conformations

97 De novo selection of oncogenes

98 Pch2 is a hexameric ring ATPase that remodels the chromosome axis protein Hop1

99 BK channel opening involves side-chain reorientation of multiple deep-pore residues

100 Cortical neural populations can guide behavior by integrating inputs linearly, independent of synchrony

101 Translational dynamics revealed by genome-wide profiling of ribosome footprints in Arabidopsis

102 Structural and biochemical basis for the inhibition of cell death by APIP, a methionine salvage enzyme

103 AFF1 is a ubiquitous P-TEFb partner to enable Tat extraction of P-TEFb from 7SK snRNP and formation of SECs for HIV transactivation

104 Dual role for Islet-1 in promoting striatonigral and repressing striatopallidal genetic programs to specify striatonigral cell identity

105 Cooperative assembly of IFI16 filaments on dsDNA provides insights into host defense strategy

106 Parkinson-related LRRK2 mutation R1441C/G/H impairs PKA phosphorylation of LRRK2 and disrupts its interaction with 14-3-3

107 Ghrelin triggers the synaptic incorporation of AMPA receptors in the hippocampus

108 Origins of R2* orientation dependence in gray and white matter

109 Cytoglobin modulates myogenic progenitor cell viability and muscle regeneration

110 FMRP and Ataxin-2 function together in long-term olfactory habituation and neuronal translational control

111 A Cdc42- and Rac-interactive binding (CRIB) domain mediates functions of coronin

112 Distinct cerebellar engrams in short-term and long-term motor learning

113 Interplay of mevalonate and Hippo pathways regulates RHAMM transcription via YAP to modulate breast cancer cell motility

114 Trapping of naive lymphocytes triggers rapid growth and remodeling of the fibroblast network in reactive murine lymph nodes

Table 1. Articles in the first issue of PNAS in 2014.

There was no WoS subject category corresponding to the PNAS subject category “sustainability science.” The 89th article was labelled as such by PNAS; it discussed the sustainable management of terrestrial lion hunting. Our method assigned it to the WoS subject category “ecology,” which is also suitable. There was also no WoS counterpart of the PNAS subject category “systems biology,” to which the 91st and 92nd articles belonged. “Systems biology” adopts a holistic approach to the study of complex biological systems. The 91st article studied fibroblast-like synoviocytes of rheumatoid arthritis, and the 92nd article studied the growth of bacteria. Our method categorized these as “rheumatology” and “microbiology,” respectively, which accurately reflected their content.

In Table 2, five articles (51st to 55th) are assigned to the PNAS subject category “evolution.” Only the 53rd article was recognized as belonging to the WoS subject category “evolutionary biology.” The other four articles introduced their research background as “evolution,” and used methods of “biochemistry & molecular biology,” “cell biology,” “developmental biology,” “genetics & heredity,” and “zoology” to investigate specific aspects of evolution. Our method correctly identified the related subject categories of these four articles from their references.

Twelve articles were labelled solely as “biophysics and computational biology” by PNAS (30th to 38th, 96th, 99th, and 105th). Nine were identified as the WoS subject category “biochemistry & molecular biology,” which does not exactly match the PNAS subject category. The reason may be that PNAS and WoS define their subject categories differently: some aspects of physical chemistry or chemical physics related to biology are assigned to biophysics in PNAS, but to biochemistry in WoS. The 30th article was recognized as “chemistry, multidisciplinary” and “multidisciplinary sciences” because its references were published in journals covering many disciplines. The research area of the 32nd article was “immunology,” as reflected by its references, while it used the technology of “biophysics and computational biology.” The 33rd article was a similar case: according to its references, its research area was “microbiology.”

For many articles, the method identified more subject categories than PNAS had labelled, and the additional recognized subject categories were relevant to the articles. For example, the PNAS subject category of article 66 was “immunology,” which the method recognized. This article studied the immunotherapy of cancer and involved the detection of HLA-A2-bound peptides from two leukaemia-associated differentiation antigens. Thus, it was also appropriately recognized as belonging to the subject categories “hematology” and “oncology.” Another recognized subject category, “biochemical research methods,” correlated with the subject categories above. The remaining recognized subject category, “multidisciplinary sciences,” indicated that the article cited some references from multidisciplinary sciences journals.

Among the 114 articles examined in this case study, we regarded 78 as almost perfectly classified: excluding “multidisciplinary sciences,” their recognized subject category with the highest score was the same as, belonged to, or covered their PNAS subject category (or its counterpart). If one of those articles had two PNAS subject categories, both categories were identified. For d = 3, we identified the PNAS subject categories of 22 articles, but not as the first recognized subject category; we regard those articles as acceptably classified. For d = 4, 25 articles were acceptably classified. The acceptably classified articles included three articles (89th, 91st, and 92nd) whose PNAS subject category had no counterpart in WoS. Eleven articles had identified subject categories other than their PNAS subject categories. Although those articles were related to the identified subject categories, the method failed to identify all aspects of the articles. All 11 problematically classified articles were interdisciplinary. However, other interdisciplinary articles, in the almost perfectly and acceptably classified groups, were successfully classified.
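As a quick consistency check on the counts above, the three groups reported for d = 4 can be tallied against the 114 articles. The snippet below is only an illustrative sketch: the dictionary keys and the helper name `shares` are our own labels, and it assumes that the three groups are disjoint and that the d = 4 figure of 25 acceptably classified articles is the one to combine with the other two.

```python
TOTAL_ARTICLES = 114

# Counts as reported in the text for d = 4 (group labels are ours).
groups = {
    "almost perfectly classified": 78,
    "acceptably classified": 25,
    "problematically classified": 11,
}


def shares(counts, total):
    """Return each group's share of the total, checking that the groups add up."""
    if sum(counts.values()) != total:
        raise ValueError("group counts do not add up to the total")
    return {name: n / total for name, n in counts.items()}


for name, share in shares(groups, TOTAL_ARTICLES).items():
    print(f"{name}: {groups[name]}/{TOTAL_ARTICLES} = {share:.1%}")
```

Under these assumptions the three groups account for all 114 articles, with roughly two thirds in the first group.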

4.1 Results with the recognized subject category of references in multidisciplinary sciences journals

As shown in Table 2, multidisciplinary sciences was a recognized subject category of 86 articles because they cited many references in journals of this type. Some even had multidisciplinary sciences among their first two recognized subject categories in terms of score. However, in most cases the subject category multidisciplinary sciences provided no useful information on article content. To make further use of the information in the references, we can modify the classification method above using the following steps:

1. Recognize the subject categories of those references of an article that are published in multidisciplinary sciences journals, using the method described in Equations 1 to 4 (this is feasible because references in multidisciplinary sciences journals are also articles);

2. Use the recognized subject categories of each reference in multidisciplinary sciences journals obtained in Step 1 to replace the subject category multidisciplinary sciences of the reference;

3. Classify the article inspected using the method described in Equations 1 to 4 with the subject categories of references modified in Step 2.
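The three steps above can be sketched in code. This is a minimal illustration under stated assumptions, not the method of Equations 1 to 4: `score_categories` is a hypothetical stand-in in which every reference spreads one unit of weight evenly over its subject categories and categories scoring below the top score divided by the threshold factor d are dropped. The data layout (`article_refs`, `refs_of_refs`) and all function names are likewise invented for the example.

```python
from collections import defaultdict
from typing import Dict, List

MULTI = "multidisciplinary sciences"


def score_categories(ref_categories: List[List[str]], d: float = 3.0) -> Dict[str, float]:
    """Hypothetical stand-in for Equations 1 to 4 (NOT the paper's formulas).

    Each reference contributes 1/k to each of its k categories; categories
    whose total is below (highest total / d) are discarded.
    """
    totals: Dict[str, float] = defaultdict(float)
    for cats in ref_categories:
        for cat in cats:
            totals[cat] += 1.0 / len(cats)
    if not totals:
        return {}
    cutoff = max(totals.values()) / d
    return {cat: score for cat, score in totals.items() if score >= cutoff}


def classify_refined(article_refs, refs_of_refs, d: float = 3.0) -> Dict[str, float]:
    """Apply Steps 1 to 3 to one article.

    article_refs: list of {"id": str, "categories": [str, ...]} for the article.
    refs_of_refs: maps a reference id to the category lists of that
                  reference's own references (needed for Step 1).
    """
    replaced = []
    for ref in article_refs:
        cats = list(ref["categories"])
        nested = refs_of_refs.get(ref["id"])
        if MULTI in cats and nested:
            # Step 1: recognize the categories of the multidisciplinary reference.
            recognized = score_categories(nested, d)
            if recognized:
                # Step 2: substitute them for "multidisciplinary sciences".
                kept = [c for c in cats if c != MULTI]
                cats = kept + [c for c in recognized if c not in kept]
        replaced.append(cats)
    # Step 3: classify the article with the modified reference categories.
    return score_categories(replaced, d)


# Toy data, invented for illustration only.
article_refs = [
    {"id": "r1", "categories": [MULTI]},
    {"id": "r2", "categories": ["cell biology"]},
    {"id": "r3", "categories": ["cell biology", "developmental biology"]},
]
refs_of_refs = {"r1": [["cell biology"], ["cell biology"], ["biophysics"]]}

print(classify_refined(article_refs, refs_of_refs, d=3.0))
```

In this toy case the reference `r1` no longer adds weight to multidisciplinary sciences; its weight is redistributed to the categories recognized from its own references, which is the effect the modified method aims for. A reference whose own references cannot be resolved keeps its original category in this sketch.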

For example, we selected 10 articles with multidisciplinary sciences in the top two recognized subject categories, and further refined the results as shown in Table 3 (threshold factor of 3) and Table 4 (threshold factor of 4).

Comparing Tables 3 and 4 with Table 2, the recognized subject categories of more than half the articles by the modified method in this section were more focused than those determined by the method without modifica-


| No. | Subject PNAS1 (a, b) | Subject PNAS2 (a, b) | SCrecog1 (a, c) | SCrecog2 (a, c) | SCrecog3 (a, c) | SCrecog4 (a, c) | SCrecog5 (a, c) | SCrecog6 (a, c) |
|---|---|---|---|---|---|---|---|---|
| 1 | AppPhySci | neuroscience | neurosciences (18.7) | rnmmi (8.87) | multisci (8) | optics (5) | | |
| 2 | AppPhySci | | multisci (10) | cell biology (8.5) | biophysics (3.5) | biomolbio (3) | develbio (2.5) | phymul (2.25) |
| 3 | Chemistry | biochemistry | hematology (7.5) | biomolbio (4.17) | biophysics (4.2) | chemmul (3) | microbiology (3) | chemphy (2.67) |
| 4 | Chemistry | biocombio | multisci (7) | develbio (4.5) | zoology (3) | chemmul (2.5) | cell biology (1.83) | matscimul (1.5) |
| 5 | Chemistry | | electrochemistry (11.8) | chemphy (5.5) | multisci (3.5) | matscicf (3) | | |
| 6 | EAPSci | | geomul (27.3) | geogeo (15.5) | geology (9.5) | multisci (9) | | |
| 7 | EAPSci | | geomul (14.8) | geology (7) | geogeo (4) | | | |
| 8 | Physics | | phymul (14) | phyconmat (9.12) | multisci (9) | matscimul (4.3) | | |
| 9 | Physics | | phymul (12) | phyconmat (11.1) | multisci (10) | chemphy (4.5) | | |
| 10 | Physics | | phymul (19.3) | phyconmat (7.17) | pamc (6) | | | |
| 11 | Anthropology | | anthropology (7) | geomul (5.5) | multisci (4) | evobio (2.3) | | |
| 12 | PsyCogSci | | psychiatry (11.5) | multisci (8) | psyclin (5.5) | psychology (4.5) | psyexp (3) | ecology (3) |
| 13 | PsyCogSci | | psychiatry (7.5) | medgenint (5) | psymul (3) | psysoc (2.5) | psychology (2.33) | peoh (2) |
| 14 | Social Sciences | | economics (9.5) | | | | | |
| 15 | AgriSci | | plant sciences (9.17) | zoology (5) | biomolbio (4) | multisci (3) | | |
| 16 | Anthropology | | evobio (6.67) | ecology (6) | multisci (5) | anthropology (5) | biology (4.33) | cell biology (3) |
| 17 | Anthropology | | multisci (11) | geomul (5) | geogeo (2) | anthropology (2) | biology (1.33) | |
| 18 | Anthropology | | evobio (19) | multisci (10) | | | | |
| 19 | AppBioSci | | multisci (7) | bioappmic (6.83) | biomolbio (3.3) | medresexp (1.8) | | |
| 20 | AppBioSci | | bioappmic (19.3) | microbiology (8.42) | biomolbio (8.1) | | | |
| 21 | Biochemistry | | biomolbio (22.8) | | | | | |
| 22 | Biochemistry | | chemmul (22) | biomolbio (8) | oncology (5.5) | | | |
| 23 | Biochemistry | | chemmul (16) | biomolbio (9.67) | multisci (5) | | | |
| 24 | Biochemistry | | biomolbio (8.5) | cell biology (7.5) | multisci (4) | | | |
| 25 | Biochemistry | | biomolbio (7) | hematology (4.83) | pervasdis (3.3) | oncology (3) | develbio (2) | genher (2) |
| 26 | Biochemistry | | biomolbio (12) | cell biology (10.8) | genher (6.8) | multisci (5) | biology (3.5) | |
| 27 | Biochemistry | | biomolbio (10.4) | oncology (9.25) | phampham (3.2) | chemmed (3.2) | | |
| 28 | AppPhySci | biocombio | biophysics (12.8) | biomolbio (6.33) | chemphy (6.1) | | | |
| 29 | Chemistry | biocombio | multisci (13) | biomolbio (11.8) | biophysics (4) | spectroscopy (3.7) | chemmul (3) | cell biology (3) |
| 30 | BioComBio | | chemmul (12.8) | multisci (8) | | | | |
| 31 | BioComBio | | biomolbio (8) | multisci (7) | neurosciences (4) | genher (3) | | |
| 32 | BioComBio | | immunology (8.5) | | | | | |
| 33 | BioComBio | | multisci (13) | microbiology (12) | bioappmic (4.5) | | | |
| 34 | BioComBio | | biomolbio (14) | multisci (6) | genher (3.5) | | | |
| 35 | BioComBio | | biomolbio (9.33) | multisci (3) | | | | |
| 36 | BioComBio | | biomolbio (14.5) | multisci (10) | cell biology (8.8) | | | |
| 37 | BioComBio | | plant sciences (20.3) | biomolbio (11.3) | multisci (8) | cell biology (7.8) | | |
| 38 | BioComBio | | biomolbio (9.67) | multisci (5) | crystallography (4.5) | bioresmeth (4.5) | biophysics (3.67) | cell biology (2.67) |
| 39 | Cell Biology | | virology (11.7) | biomolbio (11) | cell biology (7.5) | multisci (3) | | |
| 40 | Cell Biology | | biomolbio (9.75) | multisci (9) | oncology (7.8) | cell biology (6.3) | genher (4) | hematology (3.5) |
| 41 | Cell Biology | | oncology (5.2) | cell biology (4.03) | multisci (3) | biomolbio (1.5) | genher (1.33) | |
| 42 | Cell Biology | | multisci (11) | cell biology (9.5) | biomolbio (5.3) | genher (2.7) | | |
| 43 | Cell Biology | | cell biology (22.2) | | | | | |
| 44 | Cell Biology | | biomolbio (9) | cell biology (7.5) | multisci (6) | | | |
| 45 | Cell Biology | | multisci (6) | biomolbio (6) | cell biology (5) | oncology (1.5) | | |
| 46 | Cell Biology | | multisci (9) | cell biology (8.75) | biomolbio (7.8) | oncology (4.3) | | |
| 47 | Cell Biology | | oncology (15.8) | cell biology (8.92) | biomolbio (4.6) | | | |
| 48 | Physics | ecology | multisci (13) | ecology (8.67) | phymul (3) | genher (2.5) | evobio (2.5) | biology (2.33) |
| 49 | Ecology | | ecology (23.4) | multisci (11) | evobio (8.6) | | | |
| 50 | Ecology | | ecology (12.8) | multisci (10) | biology (3.3) | | | |
| 51 | Evolution | | biomolbio (14.5) | genher (5.83) | biology (4) | cell biology (3.7) | | |
| 52 | Evolution | | multisci (10) | microbiology (5) | biomolbio (3.8) | entomology (2) | bioappmic (2) | biology (1.83) |
| 53 | Evolution | | multisci (9) | ecology (9) | zoology (8.7) | behsci (7.2) | evobio (5.33) | biology (3.67) |
| 54 | Evolution | | multisci (8) | biomolbio (7.83) | cell biology (7.2) | develbio (3.3) | medgenint (3) | bioresmeth (3) |
| 55 | Evolution | | develbio (14.9) | cell biology (8.25) | biomolbio (7.4) | multisci (7) | | |
| 56 | Genetics | | genher (12.3) | neurosciences (6) | multisci (5) | biomolbio (4.8) | | |
| 57 | Genetics | | biomolbio (18) | multisci (7) | | | | |
| 58 | Genetics | | microbiology (26.2) | genher (9.83) | | | | |
| 59 | Genetics | | genher (33.8) | multisci (11) | | | | |
| 60 | Immunology | | immunology (21.5) | | | | | |
| 61 | Immunology | | immunology (18.2) | multisci (9) | | | | |
| 62 | Immunology | | immunology (21.5) | multisci (11) | cell biology (7.5) | | | |
| 63 | Immunology | | immunology (14.5) | multisci (8) | biomolbio (4) | | | |
| 64 | Immunology | | multisci (11) | immunology (10) | biomolbio (8.5) | cell biology (6.7) | hematology (5.5) | oncology (4.5) |
| 65 | Immunology | | biomolbio (9) | immunology (6.5) | multisci (5) | cell biology (3.3) | | |
| 66 | Immunology | | immunology (12) | hematology (9) | oncology (6) | multisci (3) | bioresmeth (3) | |
| 67 | MedSci | | oncology (11.8) | biomolbio (4.08) | | | | |
| 68 | MedSci | | rnmmi (7) | bioresmeth (3) | bioappmic (2.3) | biomolbio (2.3) | | |
| 69 | Microbiology | | virology (17.3) | multisci (10) | | | | |
| 70 | Microbiology | | microbiology (11) | immunology (9.83) | biomolbio (5.8) | infecdis (3.8) | multisci (3) | |
| 71 | Microbiology | | microbiology (15.8) | biomolbio (8.75) | bioresmeth (6) | multisci (4) | | |
| 72 | Microbiology | | multisci (8) | microbiology (4) | biomolbio (4) | microbiology (3.7) | bioappmic (3.67) | ecology (3) |
| 73 | Microbiology | | multisci (15) | virology (5.67) | immunology (4.8) | biomolbio (3.3) | medresexp (2.83) | microbiology (1.5) |
| 74 | Microbiology | | virology (9.83) | multisci (4) | cell biology (3.8) | biomolbio (3.8) | | |
| 75 | Microbiology | | microbiology (15) | biomolbio (13.6) | multisci (6) | | | |
| 76 | Neuroscience | | neurosciences (15) | multisci (8) | physiology (4) | | | |
| 77 | Neuroscience | | cell biology (13) | multisci (13) | biomolbio (9) | neurosciences (8) | | |
| 78 | Neuroscience | | neurosciences (4.83) | multisci (4) | psychology (3) | behsci (1.5) | | |
| 79 | Neuroscience | | neurosciences (28.3) | | | | | |
| 80 | Neuroscience | | neurosciences (25.2) | multisci (7) | | | | |
| 81 | Neuroscience | | neurosciences (35.5) | multisci (15) | | | | |
| 82 | Neuroscience | | neurosciences (42.5) | multisci (17) | | | | |
| 83 | Neuroscience | | neurosciences (17.5) | multisci (5) | | | | |
| 84 | Neuroscience | | neurosciences (16.3) | multisci (6) | | | | |
| 85 | Neuroscience | | neurosciences (30) | multisci (10) | | | | |
| 86 | Physiology | | multisci (14) | neurosciences (10.5) | biomolbio (7.5) | chemmul (5) | physiology (4.5) | |
| 87 | Plant Biology | | multisci (11) | cell biology (8.33) | genher (6.5) | biomolbio (6.3) | plant sciences (6) | |
| 88 | Plant Biology | | plant sciences (16.3) | multisci (13) | cell biology (4.3) | | | |
| 89 | SusSci | | ecology (4.67) | multisci (4) | fisheries (3.7) | biocon (2.2) | envsci (1.67) | evobio (1.5) |
| 90 | Physics | sysbio | microbiology (10) | multisci (5) | | | | |
| 91 | SysBio | | rheumatology (12) | genher (3) | | | | |
| 92 | SysBio | | microbiology (12.7) | multisci (7) | | | | |
| 93 | Neuroscience | | neurosciences (33) | | | | | |
| 94 | Immunology | | immunology (32) | | | | | |
| 95 | Microbiology | | microbiology (11.3) | multisci (11) | bioappmic (9.3) | fscitec (8.2) | plant sciences (4) | bioresmeth (3.67) |
| 96 | BioComBio | | biomolbio (8.67) | multisci (6) | biophysics (2.7) | | | |
| 97 | Biochemistry | | biomolbio (12.5) | virology (9) | multisci (9) | | | |
| 98 | Biochemistry | | biomolbio (21.8) | cell biology (15.3) | genher (14) | multisci (7) | | |
| 99 | BioComBio | | physiology (22) | multisci (15) | neurosciences (13) | biomolbio (6) | | |
| 100 | Neuroscience | | neurosciences (27.5) | multisci (18) | | | | |
| 101 | Plant Biology | | plant sciences (18.8) | biomolbio (16.3) | multisci (13) | | | |
| 102 | Biochemistry | | biomolbio (10.5) | oncology (3.75) | bioresmeth (3.7) | multisci (3) | | |
| 103 | Biochemistry | | biomolbio (13.5) | cell biology (6.83) | multisci (6) | | | |
| 104 | Neuroscience | | neurosciences (21.5) | | | | | |
| 105 | BioComBio | | immunology (12) | multisci (11) | biomolbio (7.7) | virology (5.3) | cell biology (3.67) | |
| 106 | Biochemistry | | biomolbio (21.7) | multisci (9) | cell biology (6.2) | neurosciences (5.5) | | |
| 107 | Neuroscience | | neurosciences (31.3) | multisci (11) | | | | |
| 108 | Neuroscience | | rnmmi (14.1) | neurosciences (4.17) | | | | |
| 109 | MedSci | | biomolbio (19.8) | cell biology (15.3) | multisci (9) | physiology (6.2) | | |
| 110 | Genetics | | neurosciences (33.1) | multisci (17) | cell biology (16) | biomolbio (12) | | |
| 111 | Biochemistry | | cell biology (20.5) | biomolbio (9.5) | | | | |
| 112 | Neuroscience | | neurosciences (24.8) | | | | | |
| 113 | Cell Biology | | oncology (13.5) | cell biology (8.17) | biomolbio (7.5) | multisci (6) | | |
| 114 | Immunology | | immunology (38.5) | | | | | |

Table 2. Comparison of the PNAS subject categories and the recognized subject categories of articles in Table 1. In the recognized results, non-italics represent the results when d = 3 in Equation 4. When d = 4, additional results were obtained and are represented by italics.

Note:

a. Subject PNAS1 and Subject PNAS2 are the subjects of the article labelled by PNAS. SCrecog1, SCrecog2, SCrecog3, SCrecog4, SCrecog5, and SCrecog6 are the recognized subject categories of the article.

b. Abbreviations of some PNAS subject categories: AgriSci: agricultural sciences; AppBioSci: applied biological sciences; AppPhySci: applied physical sciences; BioComBio: biophysics and computational biology; EAPSci: earth, atmospheric, and planetary sciences; MedSci: medical sciences; PsyCogSci: psychological and cognitive sciences; SysBio: systems biology; SusSci: sustainability science.

c. Abbreviations of some subject categories labelled by WoS: BehSci: behavioral sciences; BioAppMic: biotechnology & applied microbiology; BioCon: biodiversity conservation; BioMolBio: biochemistry & molecular biology; BioResMeth: biochemical research methods; ChemMed: chemistry, medicinal; ChemMul: chemistry, multidisciplinary; ChemPhy: chemistry, physical; DevelBio: developmental biology; EnvSci: environmental sciences; EvoBio: evolutionary biology; FSciTec: food science & technology; GenHer: genetics & heredity; GeoGeo: geochemistry & geophysics; GeoMul: geosciences, multidisciplinary; InfecDis: infectious diseases; MatSciMul: materials science, multidisciplinary; MedGenInt: medicine, general & internal; MedResExp: medicine, research & experimental; MatSciCF: materials science, coatings & films; MultiSci: multidisciplinary sciences; PAMC: physics, atomic, molecular & chemical; PhyConMat: physics, condensed matter; PsyClin: psychology, clinical; PEOH: public, environmental & occupational health; PhyMul: physics, multidisciplinary; PhamPham: pharmacology & pharmacy; PsyExp: psychology, experimental; PsyMul: psychology, multidisciplinary; PsySoc: psychology, social; PerVasDis: peripheral vascular disease; RNMMI: radiology, nuclear medicine & medical imaging.


| No. | Subject PNAS1 | Subject PNAS2 | SCrecog1 | SCrecog2 | SCrecog3 | SCrecog4 | SCrecog5 | SCrecog6 |
|---|---|---|---|---|---|---|---|---|
| 2 | appphysci | | cell biology (12.32) | develbio (4.44) | | | | |
| 4 | chemistry | biocombio | develbio (4.57) | zoology (3.30) | chemmul (3.27) | multisci (2.23) | cell biology (2.20) | matscimul (1.81) |
| 17 | anthropology | | geomul (7.51) | multisci (4.66) | anthropology (2.85) | | | |
| 32 | biocombio | | immunology (9.22) | | | | | |
| 52 | evolution | | biomolbio (7.01) | microbiology (6.98) | entomology (2.86) | | | |
| 53 | evolution | | ecology (10.57) | zoology (9.58) | behsci (7.88) | evobio (6.20) | biology (4.25) | |
| 54 | evolution | | biomolbio (10.76) | cell biology (9.19) | develbio (4.19) | | | |
| 77 | neuroscience | | neurosciences (14.24) | cell biology (14.24) | biomolbio (9.74) | | | |
| 87 | plant biology | | cell biology (10.60) | biomolbio (9.25) | genher (8.12) | plant sciences (7.01) | | |
| 105 | biocombio | | immunology (15.17) | biomolbio (10.44) | virology (6.00) | | | |

Table 3. Comparison of the PNAS subject categories and the recognized subject categories of 10 articles in Table 1. The subject categories of references in multidisciplinary sciences journals were replaced by their recognized subject categories (d = 3). Others are the same as in Table 2.

No. | Subject PNAS1 | Subject PNAS2 | SCrecog1 | SCrecog2 | SCrecog3 | SCrecog4 | SCrecog5 | SCrecog6
2 | appphysci | | cell biology (11.98) | develbio (4.40) | biophysics (3.93) | biomolbio (3.65) | |
4 | chemistry | biocombio | develbio (4.56) | zoology (3.33) | chemmul (3.27) | cell biology (2.20) | multisci (2.08) | matscimul (1.75)
17 | anthropology | | geomul (7.48) | multisci (4.63) | anthropology (2.85) | geogeo (2.40) | |
32 | biocombio | | immunology (9.16) | | | | |
52 | evolution | | microbiology (7.04) | biomolbio (6.86) | entomology (2.85) | bioappmic (2.15) | biology (1.94) | multisci (1.93)
53 | evolution | | ecology (10.67) | zoology (9.52) | behsci (7.96) | evobio (6.21) | biology (4.34) |
54 | evolution | | biomolbio (10.24) | cell biology (9.18) | develbio (4.24) | medgenint (3.4) | bioresmeth (3) |
77 | neuroscience | | cell biology (14.21) | neurosciences (13.68) | biomolbio (9.86) | | |
87 | plant biology | | cell biology (10.57) | biomolbio (9.21) | genher (8.18) | plant sciences (7.08) | multisci (3.02) |
105 | biocombio | | immunology (15.11) | biomolbio (10.44) | virology (5.88) | cell biology (4.86) | |

Table 4. Comparison of the PNAS subject categories and the recognized subject categories of 10 articles in Table 1. The subject categories of references in multidisciplinary sciences journals were replaced by their recognized subject categories (d = 4). Others are the same as in Table 2.


tion. The number of articles in Tables 3 and 4 with recognized subject categories of multidisciplinary sciences was also reduced, especially with a threshold factor of 3. In addition, for articles recognized as multidisciplinary sciences in Tables 3 and 4, the ranking of multidisciplinary sciences in the recognized subject categories was lowered. For article 2, the ranking of “developmental biology,” which is its area of application, climbed to 2 in Tables 3 and 4 from 5 in Table 2. The authors of the article used a mechanical model to study shapes of epithelial cells and the bending and buckling of epithelial sheets during embryo development, thus the recognized subject category was appropriate. Table 4 included “biophysics” for article 2 which reflected the methodology used in the study and showed that lowering the threshold in Equation 4 can reduce loss of an article subject category in the determination of the classification. With the recognized subject category of the references in multidisciplinary sciences journals of article 17, the rank of the subject category “anthropology,” which is also the label used by PNAS, rose to be behind only “geosciences, multidisciplinary” and multidisciplinary sciences which cover many subject categories. Article 32 only had one subject category recognized in Table 2. The subject category with the second highest score was multidisciplinary sciences which is not listed in Table 2 for article 32 because its score was less than a quarter of the highest one. After classifying the references in the multidisciplinary sciences journals, the score of multidisciplinary sciences decreased to one third of its previous value in Table 2 (=2); about one third of the previous score of multidisciplinary sciences was distributed to the subject category with the highest score (Immunology), and the remaining third was distributed to the other 10 subject categories.

PNAS assigned article 77 to “neurosciences,” and in Table 2, the score of “neurosciences” was distinctly lower than the highest scoring category. However, in Tables 3 and 4, the score of “neurosciences” was highest or near the highest. The scores of the recognized subject categories of article 4 were distributed relatively evenly, even after its multidisciplinary sciences references were classified. The recognized subject categories of some articles, such as article 4 when the threshold factor was 4, were more than 6. These articles can be assigned to multidisciplinary sciences if the number of subject categories is limited to 6.

5.0 Conclusions

This study shows the possibility of more precise determination of subject categories of articles in multidisciplinary sciences journals indexed by a documentation database (such as WoS) according to information from the references.

The subject category of articles in multidisciplinary sciences journals, which often publish high-quality articles, is classified by simply counting the subject categories of the journals in which the references are published. For articles in the PNAS subject categories “applied biological sciences,” “biochemistry,” “cell biology,” “ecology,” “immunology,” “microbiology,” “neuroscience,” “physiology,” “plant biology,” “psychological and cognitive sciences,” “applied physical sciences,” “chemistry,” “physics,” “earth, atmospheric, and planetary sciences,” “medical sciences,” and “social sciences” in the analyzed issue of PNAS, the method correctly recognized their subject categories. The recognized subject categories of some articles in PNAS differed from the PNAS subject categories because of recognition of areas of application and research methods used. The PNAS subject categories and the recognized subject categories for these articles reflected different aspects of the articles as a consequence of knowledge diffusion (Chen et al. 2009). The recognized subject categories determined by our method differed from the subjects of the articles labelled by PNAS as they recognized other aspects of the articles.
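The counting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the journal-to-category lookup and the journal names are hypothetical, each category assignment simply adds one to the tally, and the weighting that produces the fractional scores reported in the tables is not reproduced here.

```python
from collections import Counter


def count_reference_categories(reference_journals, journal_categories):
    """Tally subject categories over the journals of an article's references."""
    counts = Counter()
    for journal in reference_journals:
        # Journals missing from the lookup (not indexed) contribute nothing.
        for category in journal_categories.get(journal, ()):
            counts[category] += 1
    return counts


# Hypothetical journal-to-category lookup, for illustration only.
journal_categories = {
    "Journal A": ["neurosciences"],
    "Journal B": ["neurosciences", "cell biology"],
    "Journal C": ["cell biology"],
}

# One entry per reference; "Journal X" stands for a non-indexed source.
references = ["Journal A", "Journal A", "Journal B", "Journal C", "Journal X"]

counts = count_reference_categories(references, journal_categories)
print(counts.most_common())
```

A journal assigned to several categories contributes to each of them in this sketch; whether the original scoring splits that contribution is not stated in this section.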

In this study, we adopted six subject categories to classify articles, as is done by WoS for journals. Among the 25 acceptably classified articles, 4 had PNAS subject categories that were identified as SCrecog4, 8 as SCrecog3, and 13 as SCrecog2. It would appear that using more than four subject categories to classify an article is meaningless. In future studies, we will apply this method to articles published in general journals (such as “chemistry, multidisciplinary” and “physics, multidisciplinary”) and in journals of multiple subject categories.

The threshold factor adjusted the number of recognized subject categories. If it was small, the condition for recognizing a subject category other than the one with the highest score was strict. This meant that the possibility of the article being assigned to recognized subject categories other than the article subject categories was small. However, some of the article subject categories may be lost in the result. If the threshold factor is large, that is, if the threshold is low, the possibility of the article subject categories being lost will be reduced, but some subject categories less relevant to the article through correlation with the article subject categories will be recognized.
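The effect of the threshold factor can be illustrated with a short sketch. It assumes that a category is recognized when its score is at least the highest score divided by the threshold factor d, which is consistent with the remark on article 32 (a score below a quarter of the highest one was dropped) but is not a reproduction of Equation 4. The function name, the inclusive comparison, and the scores are illustrative assumptions.

```python
def recognize_categories(scores, d):
    """Return categories scoring at least (highest score / d), best first.

    Assumed rule: threshold = max(scores) / d. The inclusive comparison
    is a choice made for this sketch.
    """
    if not scores:
        return []
    threshold = max(scores.values()) / d
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, score in ranked if score >= threshold]


# Illustrative scores, loosely modelled on one row of the tables.
scores = {"geomul": 7.5, "multisci": 4.6, "anthropology": 2.8, "geogeo": 2.4}

print(recognize_categories(scores, d=3))  # threshold 2.5
print(recognize_categories(scores, d=4))  # threshold 1.875
```

With d = 3 the lowest-scoring category falls below the threshold and is dropped; with d = 4 it is kept, which mirrors the trade-off described above between losing article subject categories and admitting less relevant ones.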

The method can exclude irrelevant subject categories for most articles. For example, if one is interested in “neurosciences” articles in multidisciplinary sciences journals, he/she can ignore articles whose recognized subject categories do not include “neurosciences,” such as articles on “physics, condensed matter.” This case study also showed that if subject categories of the references in the multidisciplinary sciences journals are identified and used to replace the subject category multidisciplinary sciences of such references, the recognized subject categories of the articles citing them would be more focused.

6.0 Limitation

One key step of the method is to count the number of references indexed by the database for the inspected article. If the number of references indexed by the database is small, the outcome of the method may be unreliable because of a small sample effect. In view of this, the method can be used only when the number of references indexed by WoS is large enough, such as more than 20.

References

Aleixandre, Jose L., José L. Aleixandre-Tudó, Máxima Bolaños-Pizzaro and Rafael Aleixandre-Benavent. 2013. “Mapping the Scientific Research on Wine and Health (2001-2011).” Journal of Agricultural and Food Chemistry 61: 11871-80.

Ardanuy, Jordi, Cristóbal Urbano and Lluís Quintana. 2009. “A Citation Analysis of Catalan Literary Studies (1974-2003): Towards a Bibliometrics of Humanities Studies in Minority Languages.” Scientometrics 81: 347-66.

Bornmann, Lutz. 2014. “Assigning Publications to Multiple Subject Categories for Bibliometric Analysis, an Empirical Case Study Based on Percentiles.” Journal of Documentation 70: 52-61.

Braam, Robert R., Henk F. Moed and Anthony F. J. Van Raan. 1991. “Mapping of Science by Combined Co-Citation and Co-Word Analysis I. Structural Aspects.” Journal of the American Society for Information Science 42: 233-51.

Chen, Chaomei, Yue Chen, Mark Horowitz, Haiyan Hou, Zeyuan Liu and Donald Pellegrino. 2009. “Towards an Explanatory and Computational Theory of Scientific Discovery.” Journal of Informetrics 3: 191-209.

Chen, Huaqi, Yuehua Wan, Shuian Jiang and Yanxia Cheng. 2014. “Alzheimer’s Disease Research in the Future: Bibliometric Analysis of Cholinesterase Inhibitors from 1993 to 2012.” Scientometrics 98: 1865-77.

Chiu, Wen-Ta and Yuh-Shan Ho. 2007. “Bibliometric Analysis of Tsunami Research.” Scientometrics 73: 3-17.

De La Moneda Corrochano, Mercedes, Maria J. Lopez-Huertas and Evaristo Jimenez-Contreras. 2013. “Spanish Research in Knowledge Organization (2002-2010).” Knowledge Organization 40: 28-41.

Diem, Andrea and Stefan C. Wolter. 2013. “The Use of Bibliometrics to Measure Research Performance in Education Sciences.” Research in Higher Education 54: 86-114.

Gabel, Jeff. 2006. “Improving Information Retrieval of Subjects Through Citation-Analysis.” Knowledge Organization 33: 86-95.

Glänzel, W., A. Schubert and H. J. Czerwon. 1999. “An Item-By-Item Subject Classification of Papers Published in Multidisciplinary and General Journals Using Reference Analysis.” Scientometrics 44: 427-39.

Gómez-Núñez, Antonio J., Benjamín Vargas-Quesada, Félix De Moya-Anegón and Wolfgang Glänzel. 2011. “Improving Scimago Journal & Country Rank (SJR) Subject Classification through Reference Analysis.” Scientometrics 89: 741-58.

Gouvea Meireles, Magali Rezende, Valadares Cendón Cendon and Paulo Eduardo Maciel De Almeida. 2014. “Bibliometric Knowledge Organization: A Domain Analytic Method Using Artificial Neural Networks.” Knowledge Organization 41: 145-59.

Griffith, Belver C., Henry G. Small, Judith A. Stonehill and Sandra Dey. 1974. “The Structure of Scientific Literatures. II: Toward a Macro- and Microstructure for Science.” Science Studies 4: 339-65.

Grossi, F., O. Belvedere and R. Rosso. 2003. “Geography of Clinical Cancer Research Publications from 1995 to 1999.” European Journal of Cancer 39: 106-11.

Ibekwe-Sanjuan, Fidelia and Eric Sanjuan. 2002. “From Term Variants to Research Topics.” Knowledge Organization 29: 181-97.

Klavans, Richard and Kevin Boyack. 2010. “Toward an Objective, Reliable and Accurate Method for Measuring Research Leadership.” Scientometrics 82, no. 3: 539-53.

Krampen, Günter, Alexander Von Eye and Gabriel Schui. 2011. “Forecasting Trends of Development of Psychology from a Bibliometric Perspective.” Scientometrics 87: 687-94.

Lewison, G. 1999. “The Definition and Calibration of Biomedical Subfields.” Scientometrics 46: 529-37.

Liu, Xingjian, F. Benjamin Zhan, Song Hong, Beibei Niu and Yaolin Liu. 2012. “A Bibliometric Study of Earthquake Research: 1900–2010.” Scientometrics 92: 747-65.

López-Illescas, Carmen, Ed C. M. Noyons, Martijn S. Visser, Félix De Moya-Anegón and Henk F. Moed. 2009. “Expansion of Scientific Journal Categories Using Reference Analysis: How Can It Be Done and Does It Make a Difference?” Scientometrics 79: 473-90.

Marx, Werner and Lutz Bornmann. 2010. “How Accurately Does Thomas Kuhn’s Model of Paradigm Change Describe the Transition from the Static View of the Universe to the Big Bang Theory in Cosmology? A Historical Reconstruction and Citation Analysis.” Scientometrics 84: 441-64.

Moppett, I. K. and J. G. Hardman. 2011. “Bibliometrics of Anaesthesia Researchers in the UK.” British Journal of Anaesthesia 107: 351-56.


Naqvi, Shehbaz Husain. 2014. “Polymer Science Research in India During 1999–2012: A Scientometric Study Based on Science Citation Index-Expanded.” Science, Technology and Society 19: 95-108.

Pinto, María, María Isabel Escalona-Fernández and Antonio Pulgarín. 2013. “Information Literacy in Social Sciences and Health Sciences: A Bibliometric Study (1974–2011).” Scientometrics 95: 1071-94.

Porter, Alan L. and Jan Youtie. 2009. “How Interdisciplinary Is Nanotechnology?” Journal of Nanoparticle Research 11: 1023-41.

Small, Henry and Belver C. Griffith. 1974. “The Structure of Scientific Literatures. I: Identifying and Graphing Specialties.” Science Studies 4: 17-40.

Small, Henry and E. Sweeney. 1985. “Clustering the Science Citation Index Using Co-Citations. I. A Comparison of Methods.” Scientometrics 7: 391-409.

Small, Henry, E. Sweeney and E. Greenlee. 1985. “Clustering the Science Citation Index Using Co-Citations. II. Mapping Science.” Scientometrics 8: 321-40.

Small, Henry. 1998. “A General Framework for Creating Large-Scale Maps of Science in Two Or Three Dimensions: The Sciviz System.” Scientometrics 41: 125-33.

Waltman, Ludo and Nees Jan Van Eck. 2012. “A New Methodology for Constructing a Publication-Level Classification System of Science.” Journal of the American Society for Information Science and Technology 63: 2378-92.

Waltman, Ludo. 2012. “An Empirical Analysis of the Use of Alphabetical Authorship in Scientific Publishing.” Journal of Informetrics 6: 700-11.

Wang, Haijun, Minyan Liu, Song Hong and Yanhua Zhuang. 2013. “A Historical Review and Bibliometric Analysis of GPS Research from 1991–2010.” Scientometrics 95: 35-44.

Yang, Lie, Zhulie Chen, Ting Liu, Zhe Gong, Yingjian Yu and Jia Wang. 2013. “Global Trends of Solid Waste Research from 1997 to 2011 by Using Bibliometric Analysis.” Scientometrics 96: 133-46.

Zhuang, Yanhua, Xingjia Liu, Thuminh Nguyen, Qingqing He and Song Hong. 2013. “Global Remote Sensing Research Trends During 1991–2010: A Bibliometric Analysis.” Scientometrics 96: 203-19.


A Survey of the Coverage and Methodologies of Schemas and Vocabularies Used to Describe Information Resources

Philip Hider

School of Information Studies, Charles Sturt University, Boorooma Street, Wagga Wagga, NSW 2678, Australia

Philip Hider has been Head of the School of Information Studies at Charles Sturt University in Australia since 2008. He holds a Master of Librarianship degree from the University of Wales, Aberystwyth, and a PhD from City University, London. He worked at the British Library from 1995-1997 and in Singapore from 1997-2003. He specialises in the area of knowledge organization, and served on the Australian Committee on Cataloguing from 2004-2009. He is a fellow of the Chartered Institute of Library and Information Professionals and an associate member of the Australian Library and Information Association.

Hider, Philip. A Survey of the Coverage and Methodologies of Schemas and Vocabularies Used to Describe Information Resources. Knowledge Organization. 42(3), 154-163. 24 references.

Abstract: Riley’s survey (2010) of metadata standards for cultural heritage collections represents a rare attempt to classify such standards, in this case according to their domain, community, function and purpose. This paper reports on a survey of metadata standards with particular functions, i.e. those of schemas and vocabularies, but that have been published online for any domain or community (and not just those of the cultural heritage sector). In total, 53 schemas and 328 vocabularies were identified as within scope, and were classified according to their subject coverage and the type of warrant used in their reported development, i.e. resource, expert or user warrant, or a combination of these types. There was found to be a general correlation between the coverage of the schemas and vocabularies. Areas of underrepresentation would appear to be the humanities and the fine arts, and, in the case of schemas, also law, engineering, manufacturing and sport. Schemas would appear to be constructed more by consulting experts and considering end-users’ search behaviour; vocabularies, on the other hand, are developed more by considering the information resources themselves, or by combining a range of methods.

Received: 22 February 2015; Revised: 4 June 2015; Accepted: 29 June 2015

Keywords: warrant, vocabularies, schemas, standards, metadata, information resources

1.0 Introduction

This paper was inspired by Riley’s colourful chart, shown in Figure 1, of the “metadata universe” as it existed in 2010. The chart provides a visualisation of relationships between the many metadata standards applied in the cultural heritage sector (so perhaps more of a metadata galaxy than the entire universe). As the metadata universe continues to evolve, it is worth taking additional snapshots. This paper reports on another snapshot, taken through a slightly different lens. Whereas Riley’s survey (2010) focused on standards relating to cultural heritage resources, this snapshot covers information resources more generally. Conversely, while Riley covered standards for a wide range of purposes (preservation metadata, rights metadata, structural metadata, etc.), we are primarily interested here in those that pertain to what Riley calls “descriptive metadata,” or what sometimes is called “discovery metadata.” That is, the survey reported in this paper focuses on metadata standards intended to support information access, or, to be more precise, information resource access, as, like Riley, we are not concerned here with metadata in the more general, structured data sense.

Riley’s snapshot (2010) provides an overview of the domains, communities and functions, as well as purposes, which the various cultural heritage standards support. Again, we focus here specifically on two of the functions identified by Riley, i.e. those of structure standards (used to describe information resources as a whole) and of controlled vocabularies (used to describe particular aspects of information resources). An example of the former would be the cataloguing code, Resource Description and Access, and of the latter, Library of Congress Subject Headings. In this paper, these standards shall be referred to as schemas and vocabularies respectively.

Although Riley’s chart offers a useful classification, it is perhaps aimed more at the practitioner, who needs to choose a suitable standard or set of standards (for use by a particular community, to serve a particular function, etc.). The classification is relative, grouping standards into particular categories, but not showing gaps, i.e. areas that lack standards. To obtain a picture of how metadata standards are distributed across all information domains and communities, they need to be viewed against the backdrop of the information universe, or at least against a proxy for it; the Dewey Decimal Classification has been used for this study. The respective coverage of the schemas and vocabularies identified for this paper can then be compared. Given the articulated nature of the two functions, with vocabularies feeding into schemas, one might expect a certain degree of correlation, although other factors may work against this.

As well as looking at the coverage of schemas and vocabularies, across domains and communities, this paper also considers their reported development methodology. Although the principles of the construction of controlled vocabularies have been expounded, and debated, for a very long time, details of how these principles are to be operationalised are rarely provided in standard guides (at least not in the LIS field), and it is unclear which principles, if any, are emphasised in the development of actual vocabularies. In the case of schemas, even the principles are unclear: there are some accounts of how particular schemas have been developed, but little discussion about how they should be developed. Again, the development methodology of schemas and vocabularies will be compared using a broad taxonomy based on the concept of warrant. Whether the methodologies used were the best ones, however, is a question beyond the scope of this paper.

2.0 Literature Review

This section reports on the reviews of two related parts of the literature, namely that which focuses on metadata standards in general and that which focuses on their construction and development. An examination of the latter is necessary in order to construct a framework for the second element of the analysis being reported, for which no established framework presently exists.

2.1 Metadata standard surveys and aggregations

Riley’s map of metadata standards appears to be, rather surprisingly, unique. Indeed, few compilations, or bibliographies, of metadata standards have attempted any form of analysis of them; instead, professionals are supported by databases and lists, typically operating as registries. Examples of registries, or registers, include: the Basel Register of Thesauri, Ontologies & Classifications (BARTOC; bartoc.org); the Dublin Core Metadata Registry (http://dcmi.kc.tsukuba.ac.jp/dcregistry); and the Open Metadata Registry (http://metadataregistry.org). Established lists and databases of vocabularies and related standards include: the American Society for Indexing List of Online Thesauri and Authority Files (www.asindexing.org/about-indexing/thesauri/online-thesauri-and-authority-files); the Finnish Ontology Library Service (ONKI; https://onki.fi); Schema.org (https://schema.org); the Society of American Archivists Metadata Directory (http://www2.archivists.org/groups/metadata-and-digital-object-roundtable/metadata-directory); Taxonomy Warehouse (taxonomywarehouse.com); and Vocabulary Bank (http://culturegrid.lexaurus.net). These compilations serve diverse communities and include a wide range of standards, not all of which are knowledge organisation standards in the narrower sense of information resource description.

Figure 1. A visualization of the metadata universe (Riley 2010; http://www.dlib.indiana.edu/~jenlrile/metadatamap)

The literature about these compilations tends to focus on their operation (e.g. Baker et al., 2002; Johnston 2004), although this does include some discussion around their intellectual organisation: for instance, Souza, Tudhope and Almeida (2012) propose a taxonomy of knowledge organisation systems (KOS). Perhaps the most extensive survey of metadata standards, which includes discussion of their respective merits, is to be found in Zeng and Qin (2008). Although such texts break down standards in various ways, typically by function and field, they do not constitute what might be termed an environmental scan. That is, their aim is to describe and explain, rather than to analyse how the universe of metadata standards fits, or does not fit, together. There is no survey specifically concerned with the way standards have been developed.

2.2 Development methodology

The concept of warrant was chosen as the lens through which to compare how the standards collected for this study had been developed. Kwaśnik (2010, 106) explains how a typology of warrant “provides us with a set of conceptual tools that can be used to understand, analyse, evaluate and design any knowledge-representation system.” Comparing the design of different systems can thus be carried out with reference to this core concept. Although other lenses could have been used for framing the comparison, warrant afforded a relatively simple and straightforward typology that could be readily and consistently applied, without being associated with one particular kind of knowledge-representation system.

Beghtol (1986) identifies four types of semantic warrant for bibliographic classification systems, that is, bases on which library classification schemes could be constructed: literary, philosophical/scientific, educational and cultural. Although Howarth and Jansen (2014) have suggested that other types of warrant may also be applied, the four approaches identified by Beghtol appear prominently in guides to the development of controlled vocabularies, including, but not limited to, classification schemes. For example, Aitchison, Gilchrist and Bawden (2000) recommend collecting concepts, as well as terms, from: reference works and experts’ experience and knowledge (i.e. educational and scientific warrant); the literature (i.e. literary warrant); and, search logs and users’ experience and knowledge (i.e. a form of cultural warrant). Theoretically, one can see how the typology supports basic functions of controlled vocabularies: they need to a) represent users’ queries (cultural or user warrant); b) represent the resources available (literary or resource warrant); and, c) help users, particularly inexpert ones, identify and articulate their information needs (educational or scientific warrant).

The term warrant was first used by Hulme a century ago, when librarians debated whether bibliographic classification schemes should be developed on an a priori, philosophical basis, or on an a posteriori basis, that is, according to what Hulme called “literary warrant” (Hulme 1911; 1912; Rodriguez 1984). Advocates of schemes based on a predetermined ontology assumed that libraries reflected, at least collectively, an approximation of knowledge at large. Hulme argued that the knowledge to be found in books would be more accurately represented by examining the books themselves, and constructing classes when they were warranted by the literature. The practical nature of this approach became increasingly apparent as collections grew and the knowledge they conveyed both expanded and changed, and as subject description became somewhat more exhaustive and specific with the introduction of subject headings. Concepts and terms were derived through a systematic examination of the item in hand. Although focused on textual matter originally, literary warrant could be, and eventually was, applied to a wide range of information resources; I use the term resource warrant to represent the broader application.

The a priori approach was not altogether abandoned, however. Indeed, it enjoyed something of a revival through the work of Bliss, Ranganathan, the Classification Research Group, and others, in the middle decades of the twentieth century. Sayers (1967) characterises their approach as “neo-classical;” it was underpinned by a view of knowledge that was less monolithic than that of the nineteenth-century classificationists, but that was still essentially objective. Knowledge was a cumulative, and mostly unilinear, product of science, or, in the case of the arts and humanities, an educational consensus (Bliss 1939). Thus it could be classified by consulting scientific and academic authorities. This appeal to authority is characteristic of the a priori approach in general; the first principles must come from somewhere. Various authoritative sources, such as reference works and domain experts, are identified and utilised. The domain-specific classification schemes and thesauri compiled from the mid-twentieth century onwards emphasised this form of warrant (many of the early ones were compiled by those affiliated with the Classification Research Group). For the purposes of this survey, the concepts of philosophical, scientific and educational warrant are collapsed under the superordinate “expert warrant.”

Beghtol’s fourth type of warrant, i.e. cultural, is harder to attribute to particular theorists; however, part of this warrant would appear to pertain to the end-users of vocabularies, that is, to their own ways of thinking about, and describing, knowledge (Howarth and Jansen, 2014). The importance of the user perspective was increasingly recognised by vocabulary builders through the second half of the twentieth century, due in part to the increasing accommodation users were given to drive automated retrieval systems. It also became easier to capture this perspective, e.g. through transaction logs. Other methods based on this approach include surveying end-users directly and analysing reference queries. Even the Classification Research Group recognised the value of what they termed enquiry warrant (Howarth and Jansen, 2014). Lancaster (1986) contrasts user warrant and literary warrant, and regards the two as complementary. Similarly, the ISO standard, 25964-1, Thesauri for Information Retrieval (2011) advises that consideration be given both to the materials to be indexed and to what “the users want to search for.” Previously, Kim and Kim (1977) had questioned whether the two approaches do, in fact, yield particularly different results, and did not find significant differences in their study, but theirs was a rare dissenting voice.
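The collapsing of Beghtol's warrant types into the three categories used for this survey can be written down as a small lookup. The sketch below is illustrative only: the mapping of cultural warrant to user warrant is a simplification (only part of cultural warrant is said to pertain to end-users), and the function name and the "unreported" label are hypothetical additions rather than part of the survey instrument.

```python
# Beghtol's warrant types mapped to the survey's broader categories.
# "cultural" -> "user" is a simplification made for this sketch.
SURVEY_WARRANT = {
    "literary": "resource",
    "philosophical": "expert",
    "scientific": "expert",
    "educational": "expert",
    "cultural": "user",
}


def code_warrant(reported_types):
    """Return the survey category for the warrant types a standard reports."""
    categories = {SURVEY_WARRANT[t] for t in reported_types}
    if not categories:
        return "unreported"  # hypothetical label for missing documentation
    if len(categories) > 1:
        return "combination"
    return categories.pop()


print(code_warrant(["scientific", "educational"]))
print(code_warrant(["literary", "cultural"]))
```

A standard developed from reference works and expert panels alone would be coded as expert warrant, while one that also drew on search logs would be coded as a combination.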

Employing a combination of resource, expert and user warrant is typically advised in the guides to the construction of thesauri and other controlled vocabularies, even if the way in which they should be combined is rarely clarified (Mai 2008). Often a list of sources is simply suggested. For instance, Broughton (2006) advises that key terms and concepts in a field be identified by consulting reference sources and expert opinion, and by examining samples of published literature and relevant collections. As noted above, Aitchison, Gilchrist and Bawden (2000) recommend a similar list. For the slightly different purpose of web information architecture, Morville and Rosenfeld (2007) advise that “labels” (for website menus, etc.) should be based on content analysis, and on consultation with authors, users (e.g. through card-sorting and free-listing exercises, search log and/or tag analysis) and subject experts.

The general implication of these recommendations is that concepts established by multiple approaches should be favoured, but in cases of conflict, little advice is forthcoming. Lancaster (1986, #) suggests that having users select terms from the literature “is probably the best approach of all ... In this way, the requirements for literary and user warrant are satisfied in a single step.” This proposition remains untested, however; it is by no means clear that terms and concepts derived from a single source are not worthy of inclusion, nor that expert warrant can be ignored.

Advice is even less forthcoming when it comes to the development of schemas. Warrant does not appear to have been discussed at all in this context. Typically, case studies report the use of a combination of methods, without explicit justification. For example, Riley and Dalmau (2007) describe expert input and user studies in the construction of a schema for a digital sheet music collection. The methods reported are often similar to those employed by vocabulary builders, although comparisons are not made.

3.0 Survey Design

Those schemas and vocabularies that are freely accessible online were targeted for the survey. Following Hider (2012) and Miller (2011), a schema is defined here as a data structure used to describe an information resource, whereas a vocabulary is defined as a set of controlled values used to describe a particular element of a schema, e.g. subject. Vocabularies may well indicate semantic relationships between values, but do not have to; examples of vocabularies include thesauri, subject headings, taxonomies, classification schemes, and some ontologies. Standards about or for schemas and vocabularies (e.g. concerning their construction or display) were not included in the sample.

The schemas and vocabularies were searched for using relevant registries and directories, including those cited above, as well as standard search engines (e.g. Google). The search ceased when the two research associates, working semi-independently, had used up all their available hours for the project, which totalled 346. At this point, new finds were occurring quite infrequently; the sample is thus considered to be representative (at the time of the search), though not exhaustive.

Those schemas and vocabularies that were deemed to meet the following criteria constituted the sample for analysis:

a) Freely accessible on the Web;
b) Developed to describe information resources;
c) Published for use by external institutions; and,
d) Published in English (may also be published in other languages).

Different editions of a standard that were presented as such were regarded as a single standard, with the latest edition being used in the analysis. Integrated sets of individual standards that were presented as such were treated as a single standard.

Assessing candidate standards against the above criteria, particularly (b) and (c), involved a fair amount of judgement. It should be noted that a large number of vocabularies were rejected because although they could be used to describe a particular aspect of information resources, this was not considered their primary purpose. For example, there are many classifications (e.g. of occupations, educational fields, regions, etc.) developed by various statistical bureaux, and these can be, and sometimes are, used in bibliographic contexts, but they have been developed, and are maintained, primarily to describe entities other than information resources, such as people. (Information resources were nevertheless defined broadly, to include datasets as well as resources containing “information” or “knowledge.”) Thus the compilation differs from those such as BARTOC and ONKI, which do not make this distinction, as well as from those, such as Schema.org and Taxonomy Warehouse, that focus either on schemas or vocabularies. Ultimately, the compilation consisted of 53 schemas and 328 vocabularies; their details are published at http://www.csu.edu.au/faculty/educat/sis/student-resources/lists/information-organisation-element-sets and http://www.csu.edu.au/faculty/educat/sis/student-resources/lists/information-organisation-vocabularies. The size of the compilation is mid-range, in comparison with the metadata registries and directories found elsewhere on the Web; Riley’s map featured 105 standards in total.

Each schema and vocabulary, and any accompanying documentation, was examined carefully to ascertain its “coverage” and the (reported) methodology used for its development (if any). The coverage of schemas is defined here with reference to three common (perhaps fundamental) ways of viewing information resources, i.e. their content, carrier and application aspects. Concepts associated with these three aspects were, respectively, subject, form and audience. The last of these may be considered to loosely equate to Riley’s “community” variable (2010); reference to “domain” was avoided, due to its ambiguity (Mai 2005). The following preference order was used to classify each schema, in relation to the information resources it is intended to describe (from the evidence examined): their content, then their carriers, then their application. For example, a schema that is for sound recordings in music libraries could be classed in a) music (subject); b) sound recordings (form); or, c) libraries (or more specifically music libraries, i.e. where the resources are used, or at least obtained); according to the order of preference, they would be classed in music. A schema for all kinds and content of sound recordings, on the other hand, would be classed in sound recordings. In the case of the vocabularies, the classifier was left to identify what each of them was intended to describe, which could, potentially, be anything, though typically it could be summarised as a particular “field.” General subject vocabularies, such as the Library of Congress Subject Headings, were classed according to the ultimate target of their application, i.e. the information resource type (e.g. library materials).
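The preference order described above (content, then carrier, then application) can be sketched in a few lines of Python. This is only an illustration: the function name `classify_schema`, the dictionary keys and the example records are assumptions introduced here, not part of the survey's documented procedure, which relied on the classifier's judgement.

```python
from typing import Optional

# Preference order stated in the text: content (subject), then carrier (form),
# then application (audience). The key names are illustrative assumptions.
PREFERENCE_ORDER = ("subject", "form", "audience")


def classify_schema(aspects: dict) -> Optional[str]:
    """Return the first specified aspect in preference order, or None."""
    for aspect in PREFERENCE_ORDER:
        value = aspects.get(aspect)
        if value:  # skip aspects that are missing or left unspecified
            return value
    return None


# Hypothetical record for a schema for sound recordings in music libraries.
music_library_schema = {
    "subject": "music",
    "form": "sound recordings",
    "audience": "music libraries",
}
# Hypothetical record for a schema covering all kinds and content of sound recordings.
general_recordings_schema = {
    "subject": None,
    "form": "sound recordings",
    "audience": None,
}

print(classify_schema(music_library_schema))       # music
print(classify_schema(general_recordings_schema))  # sound recordings
```

The two example records mirror the sound-recording cases given in the paragraph; the resulting label would then be mapped to a DDC class by hand.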

With the emphasis on subject and discipline, the Dewey Decimal Classification (DDC) was considered a suitable scheme to use to classify the schemas and vocabularies. The scheme is, of course, by no means a perfect or completely universal knowledge organisation system, but it is one of the more general and most widely used in the LIS field, and would be familiar to many of this journal’s readers. The classification rules mentioned above took precedence, but otherwise the standard DDC rules were applied to number choices, with each standard assigned to only one class. However, no explicit number building was undertaken: the aim was to paint no more than a broad picture of the standards’ coverage.

To investigate the (reported) methodologies used to develop both the schemas and the vocabularies, the resource/expert/user warrant typology was used, covering three basic approaches often represented in the LIS literature. The published documentation for each standard was examined to ascertain which of the three warrants, if any, were emphasised in the historical and ongoing development of its semantics. One of the following codes was recorded for each of the standards:

R = resource warrant, i.e. concepts are based on the resources being described
E = expert warrant, i.e. concepts are based on expert guidance
U = user warrant, i.e. concepts are based on users’ search needs
C = combined warrant, i.e. methods representing two or more of the above warrants are combined, with no single warrant emphasised
X = methodology is not reported, or unclear, or none of the above applies.
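As a rough illustration of how the five codes partition the possibilities, the rule can be written as a small Python function. The function name `code_warrant` and the idea of representing a standard's documentation as a set of emphasised warrants are assumptions made for this sketch; in the survey itself the code was assigned by a human coder reading the documentation.

```python
# Codes taken from the list above; the lowercase keys are illustrative labels.
WARRANT_CODES = {"resource": "R", "expert": "E", "user": "U"}


def code_warrant(emphasised: set) -> str:
    """Map the warrants given comparable emphasis in the documentation to one code."""
    known = emphasised & set(WARRANT_CODES)  # ignore anything outside the typology
    if len(known) == 1:
        return WARRANT_CODES[next(iter(known))]  # a single warrant is emphasised
    if len(known) >= 2:
        return "C"  # two or more warrants combined, none singled out
    return "X"  # not reported, unclear, or none of the above


print(code_warrant({"expert"}))            # E
print(code_warrant({"resource", "user"}))  # C
print(code_warrant(set()))                 # X
```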

4.0 Results

The latest versions of the 53 schemas were published between 1994 and 2014, with a median publication year of 2008. Their nomenclature varied widely; only one was called a “schema,” others were “element sets,” “specifications,” “ontologies,” “data dictionaries” and so on (the most common designation was simply “metadata standard”). Well-known examples include: Categories for the Description of Works of Art, Dublin Core Metadata Element Set, IEEE Standard for Learning Object Metadata and SPECTRUM. The latest versions of the 328 vocabularies were published between 1976 and 2014, with a median publication year of 2010. Several of the standards were called “vocabularies,” but many more were called “thesauri” (97 of them, in fact) or “classifications” (27). Well-known examples include: Art & Architecture Thesaurus, ERIC Thesaurus, Iconclass, Library of Congress Classification and NASA Thesaurus.

4.1 Coverage

A high-level overview of the coverage of the schemas is provided by means of the histogram in Figure 2, showing their distribution across the ten main classes of DDC (see Appendix 1 for a list of the ten classes). It should be noted that a lack of coverage cannot be measured quantitatively, as it cannot be assumed that the population of information resources is distributed evenly across DDC (indeed, a criticism of the scheme has been that even library resources are distributed unevenly). However, the histogram suggests that more schemas are needed in the areas of philosophy, religion, language and literature, and that technology is also, perhaps more surprisingly, underrepresented. A more detailed account of the schemas’ coverage, at the next level down in DDC, is shown in Figure 3 (see Appendix 2 for a list of the hundreds divisions). Additional gaps noticeable here are those in law, engineering, manufacturing, fine arts, sport and (non-European) history. Relatively well covered fields include education and publishing.

A high-level overview of the coverage of the vocabularies is shown in Figure 4. The distribution is broadly similar to that of the schemas, with a similar dearth of standards in the areas of philosophy, religion, language and literature. At the next level down, as shown in Figure 5, most subfields in other disciplines are covered to some extent; notable exceptions are construction and some of the fine arts. Relatively well covered fields include many of the social sciences (including law), some of the natural sciences, certain applied sciences, including medicine, engineering and agriculture, and LIS. As one might expect, given their far greater number, the vocabularies cover a larger number of areas than the schemas do, but the areas in which the vocabularies and schemas are most concentrated are broadly similar.

4.2 Development methodology

The extent to which certain types of warrant are emphasised in the development of the schemas and vocabularies in the sample is indicated in Table 1. Documentation covering development methodology was lacking for about half of the standards; documentation was provided for proportionately more of the schemas than the vocabularies. Although a lack of reported methodology does not necessarily mean, of course, that no particular methodology was applied, nor that no internal guidelines were used, an account of how a given standard had been developed would be of use to those considering applying it, as well as for the purposes of this study. The notably high proportion of standards lacking such documentation is not encouraging at a number of levels.

Most of the methodologies reported emphasise a particular warrant. As Table 1 shows, in the case of the schemas, user and expert warrant were reported to be applied much more than was resource warrant; conversely, in the case of the vocabularies, resource warrant was reported to be applied much more than expert or user warrant, with a combination of warrants also applied more than either expert or user warrant alone.

Figure 2. Distribution of schemas (main classes)


It was speculated that there might also be a relationship between warrant and coverage, so the warrant distributions for the schemas and vocabularies classed in the 300s (social science) and 5-600s (natural and applied science) of DDC were compared; they are shown in Tables 2 and 3. For the vocabularies, there are very similar distributions; for the schemas, there is a little more variation, but this could well be due to the small sample size. Thus no relationship between warrant and coverage was discerned.

Warrant         Schemas          Vocabularies
                n      %         n      %
Resource        1      1.9       61     18.6
Expert          12     22.6      18     5.5
User            13     24.5      15     4.6
Combined        5      9.4       42     12.8
Unidentified    22     41.5      192    58.5
Total           53     100.0     328    100.0

Table 1. Warrant of schemas and vocabularies
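The percentages in Table 1 follow directly from the counts, and readers who wish to check them can do so with a short Python sketch. The counts are transcribed from the table; the variable and function names are illustrative choices made here.

```python
# Counts transcribed from Table 1 as (schemas, vocabularies).
TABLE_1 = {
    "Resource": (1, 61),
    "Expert": (12, 18),
    "User": (13, 15),
    "Combined": (5, 42),
    "Unidentified": (22, 192),
}


def percentages(counts):
    """Express each count as a percentage of the column total, to one decimal place."""
    total = sum(counts)
    return [round(100 * c / total, 1) for c in counts]


schema_counts = [s for s, _ in TABLE_1.values()]
vocab_counts = [v for _, v in TABLE_1.values()]
schema_pct = percentages(schema_counts)
vocab_pct = percentages(vocab_counts)

# Print one row per warrant: label, schema %, vocabulary %.
for label, s, v in zip(TABLE_1, schema_pct, vocab_pct):
    print(f"{label:<13}{s:>7.1f}{v:>7.1f}")
```

The same function can be applied to the columns of Tables 2 and 3.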

Figure 3. Distribution of schemas (hundreds divisions)

Figure 4. Distribution of vocabularies (main classes)


Warrant         300s             5-600s
                n      %         n      %
Resource        0      0.0       1      7.7
Expert          5      25.0      2      15.4
User            3      15.0      3      23.1
Combined        1      5.0       2      15.4
Unidentified    11     55.0      5      38.5
Total           20     100.0     13     100.0

Table 2. Warrant of schemas in DDC 300s and 5-600s

Warrant         300s             5-600s
                n      %         n      %
Resource        25     18.1      16     17.8
Expert          6      4.3       6      6.7
User            8      5.8       4      4.4
Combined        18     13.0      10     11.1
Unidentified    81     58.7      54     60.0
Total           138    100.0     90     100.0

Table 3. Warrant of vocabularies in DDC 300s and 5-600s

5.0 Conclusions

The survey results demonstrate that a wide range of schemas and vocabularies are freely accessible on the Web for the description of information resources. No doubt other standards, published but not freely accessible online, extend this range even further, but probably not to an extent that satisfies the needs of all areas. Areas such as the humanities and the fine arts may not need so many schemas and vocabularies, but they probably need more than they currently have; at any rate, they should be targeted for deeper audits. Other fields worth investigating, at least with respect to schemas, include law, engineering, manufacturing and sport. Given that each schema may require the application of multiple vocabularies, more vocabularies than schemas should be needed overall, but how many more, and thus of which type of standard there is a greater shortage, is difficult to judge. The general correlation between the areas covered by the schemas and the areas covered by the vocabularies in the sample suggests that the development of schemas and vocabularies often coincides in practice, as well as in theory.

Assuming the reported methodologies used to develop schemas and vocabularies are broadly representative of those actually applied, there is a notable difference in emphasis: schemas are typically constructed by consulting experts or considering end-users’ search behaviour; vocabularies, on the other hand, typically focus on the information resources, or combine a range of methods. Why this is the case, and whether it should be the case, remains unclear. It does not appear to be due to differences in coverage, given that similar patterns were reported in both the “hard” and “soft” sciences. It is likely, of course, that the advice offered in vocabulary construction guides such as those cited in section 2.2 above would have had an influence, although they do not fully explain the emphasis on resource warrant; similarly, the case studies that have reported the development of schemas may have influenced other constructions, with an emphasis on user and expert warrant, but they do not properly explain a relative lack of mixed method approaches to schema development identified in this study.

Figure 5. Distribution of vocabularies (hundreds divisions)

Also worthy of investigation is the way in which methodologies are combined, when they are. A limitation of this study was that the types of warrant involved in combined approaches were not coded, because relatively low frequencies of such approaches were anticipated and because such granularity would have been harder for the coder to determine. Optimal development methodology for both schemas and vocabularies is a topic on which much more research is needed. More detailed reporting on the construction of these standards is also to be encouraged.

References

Aitchison, Jean, Alan Gilchrist and David Bawden. 2000. Thesaurus Construction and Use: A Practical Manual, 4th ed. London: Aslib IMI.

Baker, Thomas, Christophe Blanchi, Dan Brickley, Erik Duval, Rachel Heery, Pete Johnston, Leonid Kalinichenko, Heike Neuroth and Shigeo Sugimoto. 2002. Principles of Metadata Registries: A White Paper of the DELOS Working Group on Registries. http://web.archive.org/web/20110818100958/http://delosnoe.iei.pi.cnr.it/activities/standardizationforum/Registries.pdf.

Beghtol, Clare. 1986. “Semantic Validity: Concepts of Warrant in Bibliographic Classification Systems.” Library Resources & Technical Services 30: 109-25.

Bliss, Henry Evelyn. 1939. The Organization of Knowledge in Libraries, and the Subject-Approach to Books, 2nd ed. New York: Wilson.

Broughton, Vanda. 2006. Essential Thesaurus Construction. London: Facet Publishing.

Hider, Philip. 2012. Information Resource Description: Creating and Managing Metadata. London: Facet Publishing.

Howarth, Lynne, and Eva Jansen. 2014. “Towards a Typology of Warrant for 21st Century Knowledge Organization Systems.” In Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014 Kraków, Poland, edited by Wieslaw Babik. Advances in Knowledge Organization 12. Würzburg: Ergon Verlag, 216-21.

Hulme, Edward W. 1911. “Principles of Book Classifica-tion.” Library Association Record 13: 354-8, 389-94, 444-9.

Hulme, Edward W. 1912. “Principles of Book Classifica-tion.” Library Association Record 14: 39-46, 174-81, 216-22.

ISO 25964-1. 2011. Thesauri for Information Retrieval. Geneva: ISO.

Johnston, Pete. 2004. JISC IE Metadata Schema Registry: Functions of the IE Metadata Schema Registry. http://www.ukoln.ac.uk/projects/iemsr/wp2/function.

Kim, Chai and Soon D. Kim. 1977. “Consensus vs Frequency: An Empirical Investigation of the Theories for Identifying Descriptors in Designing Retrieval Thesauri.” Information Processing and Management 13: 253-8.

Kwaśnik, Barbara H. 2010. “Semantic Warrant: A Pivotal Concept for Our Field.” Knowledge Organization 37: 106-10.

Lancaster, F. 1986. Vocabulary Control for Information Retrieval, 2nd ed. Arlington, VA: Information Resources Press.

Mai, Jens-Erik. 2005. “Analysis in Indexing: Document and Domain Centered Approaches.” Information Processing and Management 41: 599-611.

Mai, Jens-Erik. 2008. “Actors, Domain, and Constraints in the Design and Construction of Controlled Vocabularies.” Knowledge Organization 35: 16-29.

Miller, Steven J. 2011. Metadata for Digital Collections: A How-To-Do-It Manual. London: Facet Publishing.

Morville, Peter and Louis Rosenfeld. 2007. Information Architecture for the World Wide Web, 3rd ed. Sebastopol, CA: O’Reilly Media.

Riley, Jenn. 2010. Seeing Standards: A Visualization of the Metadata Universe. http://www.dlib.indiana.edu/~jenlrile/metadatamap.

Riley, Jenn and Michelle Dalmau. 2007. “The IN Harmony Project: Developing a Flexible Metadata Model for the Description and Discovery of Sheet Music.” The Electronic Library 25: 132-47.

Rodriguez, Robert. 1984. “Hulme’s Concept of Literary Warrant.” Cataloging & Classification Quarterly 5: 17–26.

Sayers, W.C. Berwick. 1967. A Manual of Classification for Librarians, 4th ed. London: Deutsch.

Souza, Renato R., Douglas Tudhope and Mauricio B. Almeida. 2012. “Towards a Taxonomy of KOS: Dimensions for Classifying Knowledge Organization Systems.” Knowledge Organization 39: 179-92.

Zeng, Marcia L. and Jian Qin. 2008. Metadata. New York: Neal-Schuman Publishers.

Appendix 1

000 Computer science, information & general works
100 Philosophy & psychology
200 Religion
300 Social sciences
400 Language
500 Science
600 Technology
700 Arts & recreation
800 Literature
900 History & geography


Appendix 2

000 Computer science, knowledge & systems
010 Bibliographies
020 Library & information sciences
030 Encyclopedias & books of facts
040 [Unassigned]
050 Magazines, journals & serials
060 Associations, organizations & museums
070 News media, journalism & publishing
080 Quotations
090 Manuscripts & rare books
100 Philosophy
110 Metaphysics
120 Epistemology
130 Parapsychology & occultism
140 Philosophical schools of thought
150 Psychology
160 Logic
170 Ethics
180 Ancient, medieval & eastern philosophy
190 Modern western philosophy
200 Religion
210 Philosophy & theory of religion
220 The Bible
230 Christianity & Christian theology
240 Christian practice & observance
250 Christian pastoral practice & religious orders
260 Christian organization, social work & worship
270 History of Christianity
280 Christian denominations
290 Other religions
300 Social sciences, sociology & anthropology
310 Statistics
320 Political science
330 Economics
340 Law
350 Public administration & military science
360 Social problems & social services
370 Education
380 Commerce, communications & transportation
390 Customs, etiquette & folklore
400 Language
410 Linguistics
420 English & Old English languages
430 German & related languages
440 French & related languages
450 Italian, Romanian & related languages
460 Spanish & Portuguese languages
470 Latin & Italic languages
480 Classical & modern Greek languages
490 Other languages
500 Science
510 Mathematics
520 Astronomy
530 Physics
540 Chemistry
550 Earth sciences & geology
560 Fossils & prehistoric life
570 Life sciences; biology
580 Plants (Botany)
590 Animals (Zoology)
600 Technology
610 Medicine & health
620 Engineering
630 Agriculture
640 Home & family management
650 Management & public relations
660 Chemical engineering
670 Manufacturing
680 Manufacture for specific uses
690 Building & construction
700 Arts
710 Landscaping & area planning
720 Architecture
730 Sculpture, ceramics & metalwork
740 Drawing & decorative arts
750 Painting
760 Graphic arts
770 Photography & computer art
780 Music
790 Sports, games & entertainment
800 Literature, rhetoric & criticism
810 American literature in English
820 English & Old English literatures
830 German & related literatures
840 French & related literatures
850 Italian, Romanian & related literatures
860 Spanish & Portuguese literatures
870 Latin & Italic literatures
880 Classical & modern Greek literatures
890 Other literatures
900 History
910 Geography & travel
920 Biography & genealogy
930 History of ancient world (to ca. 499)
940 History of Europe
950 History of Asia
960 History of Africa
970 History of North America
980 History of South America
990 History of other areas


G. Anguiano Peña and C. Naumis Peña. Method for Selecting Specialized Terms from a General Language Corpus


Method for Selecting Specialized Terms from a General Language Corpus

Gilberto Anguiano Peña* and Catalina Naumis Peña**

*Diccionario del Español de México, El Colegio de México, Camino al Ajusco #20 Pedregal de Santa Teresa, Tlalpan, C. P. 10740, México D. F., <[email protected]>

**Instituto de Investigaciones Bibliotecológicas y de la Información, Universidad Nacional Autónoma de México, Piso 12, Torre II de Humanidades, Avda Universidad 3000, Ciudad Universitaria, Delegación Coyoacán, C. P. 04510, México D. F., [email protected]

Gilberto Anguiano Peña is a doctoral candidate at the National Autonomous University of Mexico (UNAM). His undergraduate thesis (UNAM 1991) was La relevancia de la información bibliográfica en la documentación de un diccionario (The Relevance of Bibliographic Information in Dictionary Documentation), and his master’s thesis (UNAM 2007) was Indización semiautomática para almacenar y recuperar información del léxico del español usado en México (Semi-automatic Indexing for Storing and Retrieving Information from the Spanish Language Used in Mexico). Since 1992, he has been project researcher for the Diccionario del Español de México (Dictionary of Spanish in Mexico) at El Colegio de México.

Catalina Naumis Peña has undergraduate and master’s degrees from the National Autonomous University of Mexico (UNAM), and a PhD in Information Sciences (Documentation) from the Complutense University of Madrid. She specializes in analysis and content representation within the area of information and knowledge organization, at the UNAM Library and Information Science Research Institute. She also teaches in the UNAM Faculty of Philosophy and Literature. Her most extensive projects have been the Tesauro Latinoamericano en Ciencia Bibliotecológica (Latin American Thesaurus in Library Science and Information) and the Macrotesauro mexicano para contenidos educativos (Mexican Macrothesaurus for educational content).

Anguiano Peña, Gilberto and Naumis Peña, Catalina. Method for Selecting Specialized Terms from a General Language Corpus. Knowledge Organization. 42(3), 164-175. 27 references.

Abstract: Among the many aspects studied by library and information science are linguistic phenomena associated with document content analysis, for purposes of both information organization and retrieval. To this end, terms used in scientific and technical language must be recovered and their area of domain and behavior studied. Through language, society controls the knowledge available to people. Document content analysis, in this case of scientific texts, facilitates gathering knowledge of lexical units and their major applications and separating such specialized terms from the general language, to create indexing languages. The model presented here or other lexicographic resources with similar characteristics may be useful in the near future, in computer-assisted indexing or as corpora monitors, with respect to new text analyses or specialized corpora. Thus, using techniques for document content analysis of a lexicographically labeled general language corpus proposed herein, components which enable the extraction of lexical units from specialized language may be obtained and characterized.

Received: 19 March 2015; Revised: 2 July 2015; Accepted: 3 July 2015

Keywords: language units, language corpus, lexical analysis, specialized terms

1.0 Introduction

The overall goal of this paper is to determine methodologies and strategies for processing a linguistic corpus from the general language in order to extract specialized terms from one discipline and those shared by several, so they may be automatically separated from the mass of lexical units in the general language. This type of work makes it possible to gather terms and later clarify their meaning, learn about their uses in the texts in which they appear and utilize them when constructing indexing languages. Alexiev (2006, 96) wrote:

Terminologies come under various forms (indexed, thesauri, termbanks, specialized dictionaries, glossaries, etc.) and are designed to meet the needs of translation, Language for Specific Purposes (LSP) teaching, information retrieval, controlled indexing, document consulting and navigation, technical authoring, or merely to help the understanding of technical documents.

Is the language of science a secret language? The author of an article titled “The Secret Language of Science” (Ryan 1985, 91) says that:

Too many of the students enrolled in introductory courses at colleges and universities do not understand the concepts being discussed even though they make a conscientious effort to study and to master the technical vocabulary … the secret language of science consists of the many common words that have been appropriated into biology, chemistry, physics, psychology, and even mathematics, with highly specific technical meanings.

Essentially, the problem becomes evident when people try unsuccessfully to understand words, phrases, sentences, etc. found in scientific texts in order to interpret their meaning. Consequently, they are unable to benefit from scientific and technological knowledge. The outcome is that the code of the sciences is opaque, dark and foreign to most of the population, which reinforces the idea that the language of science is practically a secret language. This is actually quite understandable, because individuals engaged in scientific communication need to have the basic elements to codify, transmit, decode and interpret the scientific or technical meanings, and anyone lacking those elements is automatically excluded from the specialized communication held between senders and receivers of science and technology, as discussed in Gemma (2013).

Knowledge organization should clarify terms so as to help users in their information searches. The key words, descriptors or subject headings used to index contents are validated terms from texts written about each subject. Thus, the corpus of a general language dictionary of the Spanish used in Mexico can also be a source for extracting indexing term candidates. The problem is that the lack of clarity would seem to be repeated in texts that are involved in forming the linguistic corpora from the general language. The corpus called Corpus del Español Mexicano Contemporáneo (CEMC, Corpus of Contemporary Mexican Spanish) has been utilized as the basis for developing this study. It has incorporated scientific, as well as formal, undergraduate-level education texts in which terms from the specialized language appear that are retrieved from the corpus mass in order to be analyzed. This paper does not touch on semantic aspects of the terms, only the methodology for isolating them from the general language corpus. According to Alexiev (2006, 14), “The concept <corpus> is used in modern linguistics to refer to both running text and lists of lexical items excerpted from running text for various, including terminographic, purposes.”

2.0 Communication Focus and Other Aspects of the Text

Information studies currently considers, among other things, that to help users access information, the starting point is to make clear that the main idea of the communication of a message is for its meaning to be understood, so it is necessary to observe what happens with the linguistic sign and its components (signifier, referent and meaning) for effective communication to exist. When working with texts, as is usually the case in information studies, it must be established that there are several aspects inherent to the need to communicate something, as explained by such authors as Lara (2001), Temmerman (2000), with her sociocognitive theory of terminology, and Cabré (1999; 2003), with her two propositions: the Theory of Doors and the Communicative Theory of Terminology (TCT, Spanish initials). These specialists argue that the context in which a lexical unit is used and its correlation with the rest of the language must be taken into account in order to understand its true meaning, which, in turn, is designated by common consensus among speakers.

If the theoretical approaches of these specialists are considered pertinent, it would also be important to examine those of sociolinguistics concerning the context of situation, field, tenor and mode (Halliday 1977) and of quantitative urban sociology or variationism (DTCE 2014), which looks at the speaker’s socioeconomic position, as well as cultural background. This will make it clear that scientific language communication encompasses the spatial and temporal circumstances in which it develops. So, studying the object called text must also take into account linguistic context, meaning the factors linked to sentence production. The same context affects the interpretation, adaptation and meaning of the message (through grammar, syntax, vocabulary and context). Furthermore, the context or extra linguistic situation must also be considered, as it is the set of potential participants in the communication, such as the place, type of registry and moment when a linguistic act occurs.

The study and maintenance of linguistic registries is extremely important for clarifying terms, since it includes the set of contextual, sociolinguistic variables that condition the way in which a language is used in a concrete socioeconomic context. In other words, analysis of a linguistic registry must define whether the communiqué is in standard, non-standard, proper or semi-proper language, if it is a formal or informal communication, etc., as established in the CEMC stratification (Lara and Ham 1979, 7-49) from which the results for this article came.

Likewise, when scientific texts are analyzed, it is important to indicate that science is a type of communication based on registries of use and formal situations where the sender chooses suitable linguistic resources, in specialized registries, aimed at a receiver whose common link is an interest in a specific or professional specialized activity. These characteristics help to differentiate it from registries pertaining to other sociocultural contexts such as the one studied in this case. Professional situations typically use a technical vocabulary specific to the area of interest and expressions with a special meaning. The messages transmitted tend to be in writing. Nevertheless, scientific authors in real life are often unable to communicate their message in the way set down by Wüster in the General Theory of Terminology (GTT, see Wüster 1979), that is, with terminological units exclusive to their discipline, since they also need to use lexical units from the general language and even specialized lexical units from other disciplines.

Important aspects must be considered when deciding on analyzing the lexical units in one of a particular author’s works or texts. Since the writer is like an authority to be followed and respected, a productive author (among the most cited in a field) should be chosen. Other aspects should also be considered, such as birthplace, socioeconomic status, individual experiences, culture, ideology, religion, political stance, verbal tradition, language, professional training, previous individual and group research, experience, freedom of expression, individual interests, updatedness, scientific specialization and the type of documents or texts produced, since they could be as varied as letters, communiqués, reports, theses, research papers, articles, books, speeches, decisions, norms, laws, regulations or general documents. In fact, to situate the production of the terms to be analyzed, the type of document or scientific text must be identified, as well as, of course, whether it is by an authority on the subject, is an oral or written communication, was produced quickly or put together slowly, was a freely chosen or assigned topic, etc.

Another aspect to consider is the use of specialized expressions, because scientific authors are generally meticulous when selecting lexical units for their texts, so as to minimize the ambiguities in the scientific and technical communication. Nevertheless, an author may or may not be good at choosing the most precise words to fulfill the communication goal, since countless ideas may guide the choice of lexical and terminological units in the discourse, among them the very situation in which the discourse is produced, the language in which it is written and the correct use of nomenclature, proper names, abbreviations, acronyms, initials, pat expressions, codes, passwords, concepts, numbers written out and in figures, symbols, formulas, conventions, etc., which may or may not contribute to a terminological unit taking shape in specialized texts.

Clearly, numerous factors can influence an author’s selection of lexical and terminological units. Moreover, simple forms, syntagmas, pat expressions and multi-verbal terminological phrases exist. In addition, another type of information often found, and in greater proportion, in academic, technical and specialized texts is the use of quotes and transcriptions, which are studied to find out what others have said, either as thoughts or as scientific proofs (Cabré et al. 2014). They often appear in the language in which they were originally produced, such as Latin, Greek, English, French, etc., along with their respective critical baggage.

3.0 Content Analysis

To solve scientific problems, both the sciences and technical fields carry out their research by means of analysis, with a particular type of analysis used in each discipline or field of human knowledge. Certain analysis methods and techniques are related or complementary to information studies. Countless disciplines offer useful knowledge on the subject, but traditionally information is sought from the closest areas, such as linguistics, the gamut of applied linguistics and computer science, among many others. Analyses are developed in these disciplines that could be applied to multidisciplinary studies, for example: content analysis, analysis of discourse, grammatical analysis, qualitative analysis, quantitative analysis, analysis of analytical definitions, analysis of contrasting phraseology, lexicological analysis, document analysis, analysis of conceptual relationships, analysis of texts, analysis of syntagmatic units, analysis and design of linguistic corpora, analysis of terms and, lastly, the document content analysis method used to transfer information.

In the introduction to the book Text and Context, Van Dijk (1977) explains how discourse analysis is studied by different scientific disciplines and the extent to which there is an interdisciplinary “transverse connection.” Van Dijk starts with the assumption that language use, communication and interaction are produced through texts or discourses. Linguistics studies a part of language use, as do other sciences: sociolinguistics, communications, cognitive psychology, pedagogy, jurisprudence, political science, sociology and, of course, information studies. Textual or discursive relationships are formed among different types of texts, the underlying textual structures, their distinct conditions and functions, the contents and the effects they have on speakers (Van Dijk 1977, 12-13). The various types of text, the relationships among them and with society have various kinds of connections that are analyzed from different viewpoints, depending on the field where they are carried out. The sciences of text attempt to delve into the common properties and characteristics of language use within the spectrum of disciplines that encompass the social and human sciences.

The area of information analysis and systematization that comprises library and information science is confined to describing the types of texts, data and informative contents that lead to their localization in systems. However, the use of processes common to other disciplines is undeniable, among them the terminological and lexicographic analysis used in this paper.

4.0 Documentation in Lexicography

The dictionary compilation method presented by Varantola (2003) is essential for representing the content of the documents that make up the corpus in which the lexicographic units defined in the dictionary are included. The representation of contents allows for consultation and retrieval through different access points. In addition, with the information produced by this type of analysis, new products may almost always be generated to satisfy lexical information needs, such as concordances, statistical data, indices and dictionaries. Document content analysis helps in message decoding and retrieval of information pertinent to users of the Diccionario del Español de México (DEM) indexing system project. (This project was launched in 1973 and from the outset, as Varantola (2003) mentions for other corpora, the DEM also structured its lexicographic information retrieval system taking corpus linguistics as its base.) This is based on the fact that the author previously created his/her message, which is contained in a support document, usually a specialty text. Therefore, it is up to the information centers to ensure that the contents of such documents, term candidates among them, are easy for users to consult and retrieve.

Corpus linguistics provides the underpinnings of the DEM, to maintain general and specialized corpora that offer great capacity and versatility in managing the information it contains, as with any other information system. A corpus of this type, however, presents entries and points of access defined by corpus linguistics. Although multimodal corpora (voice, image, text, etc.) now exist, the corpora used generally by the sciences and technical fields until fairly recently are meant to analyze the words or lexical units contained in general texts or specialized languages in their different modalities and characteristics and especially in the general communications case being studied.

The indexing process in lexicography basically requires fulfilling certain stages for its application, such as:

– Planning activities, including defining goals, organization and methodologies to be implemented.

– Document selection and acquisition mark the beginning of the process. In the case of recordings with informants, transcriptions are made.

– External document treatment involves physically preparing material and thus creating the respective file, for subsequent analysis.

– Next comes the bibliographic description of the document, highlighting the access points that will enable its identification with relation to other documents. The description of printed texts includes authors, title, printer information and physical description of the material. Additionally, in lexicography, data external to the document and of interest to sociolinguistics, pragmatics and semiotics are included. Generally, such data correspond to the communication unit analyzed, in which the sender stands out, the situation in which the communication was generated and the channel utilized. Also, an extra linguistic context or spoken registry is added, with which the formality or informality of the written documents may be ascertained, as well as whether the text targeted a general or specialized audience. Later situational and thematic identification concerning the use of lexical units will depend on these registries as a whole, which will help system users assign meaning to lexical units from the retrieved information.

– Regarding the text or strictly linguistic context, texts written in a scientific discipline should have the components of the linguistic sign (signifier, meaning and referent), with the smallest text equaling a paragraph, or an item. They are analyzed through previously determined programs and algorithms in order to obtain the information contained in the document. Usually, analysis of these texts produces graphic forms of the words or lexical units, as found in natural language texts, either from common or specialized language texts.

In information studies, when pre-coordinated systems of indexing are used, terms are isolated from their contexts. The work method is textual analysis of the scientific document, with subsequent document content analysis, keeping pre-coordinated terms as the main objective. Indexing terms are extracted from the same text. (A corpus may be created ad hoc, or commercial text analysis programs such as WordSmith, AntConc or Notepad, Atlas.ti, Sketch Engine, etc. may be used.) Lists of signifiers or lexical units, separated from their meanings and referents, are thus derived. The linguistic sign is thus fragmented, which causes information retrieval complications for users, meaning they need backup in their searches.

Unlike this method, when terms are extracted to form linguistic corpora in lexicography, various lists are produced, which may be of simple or compound words, with their grammatical category, through morphology, according to their internal structure, according to the number of syllables they have, or also as: collocations, phraseological units, compound syntagmas, phraseological sentences, significant words, key words, stop words, technical terms, neologisms or term candidates. Generally, the lexical units obtained from corpus linguistics are joined by quantitative data (range and frequency), and the realm of their origin may be recognized through the registry of their use, as long as they mainly pertain to specialized languages.

5.0 Terminological Exclusion Through Subsets of the General Language

When intending to extract term candidates from general or specialized texts, it is extremely helpful to keep in mind prior existing information on the language in general, taken from measurement-based information studies such as informetrics, bibliometrics, scientometrics and lexicometry, as well as Luhn’s cut-offs and TF-IDF weights (Schultz 1968), so as to create filters that exclude the common language and mainly retrieve term candidates.

Furthermore, besides the aforementioned indicators, we suggest that other very similar indicators based on the natural language may be used to exclude subsets of the general language, such as: the basic vocabulary (similar to the highest frequency index and the Zipf model (Zipf 1949)), the common language (based on the dispersion index) and the list of grammatical words (the equivalent of empty words), with the goal of isolating the specialized units searched for in the text as much as possible. In other words, the lexicographic knowledge that, in this case, was produced by the DEM project (2012) and its CEMC (1975) may be reutilized as a way of simplifying the information to be analyzed.
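The three exclusion subsets described above can be illustrated with a minimal sketch. It assumes that each text has already been reduced to a list of headwords; the function names, the toy texts and the two thresholds (`top_k`, `min_range_share`) are hypothetical and are not taken from the DEM or CEMC.

```python
from collections import Counter


def frequency_and_range(texts):
    """Count total occurrences and text range for every headword.

    texts: list of headword lists, one list per text (assumed input format).
    Returns (frequency, text_range), where text_range[w] is the number of
    texts in which headword w occurs at least once.
    """
    frequency = Counter()
    text_range = Counter()
    for headwords in texts:
        frequency.update(headwords)
        text_range.update(set(headwords))
    return frequency, text_range


def general_language_filter(frequency, text_range, n_texts,
                            grammatical_words, top_k, min_range_share):
    """Union of the three subsets to exclude (illustrative thresholds)."""
    # Basic vocabulary: the top_k most frequent headwords.
    basic_vocabulary = {w for w, _ in frequency.most_common(top_k)}
    # Common language: headwords present in a large share of the texts.
    common_language = {w for w, r in text_range.items()
                       if r / n_texts >= min_range_share}
    return basic_vocabulary | common_language | set(grammatical_words)


# Toy corpus of three tiny "texts", invented for illustration only.
texts = [
    ["la", "enzima", "activar", "el", "sustrato"],
    ["el", "niño", "tener", "casa"],
    ["el", "médico", "tener", "paciente"],
]
freq, rng = frequency_and_range(texts)
excluded = general_language_filter(freq, rng, len(texts),
                                   grammatical_words={"el", "la"},
                                   top_k=1, min_range_share=0.6)
candidates = sorted(set(freq) - excluded)
print(candidates)
```

With so little data the filter still lets general words through; the point is only the mechanism, since the lists derived from a corpus of close to two million words are far richer.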

Some results of the CEMC content analysis, which was structured with close to two million grammatically labeled words, are used in this paper. That corpus yielded a lexicographic product, the Diccionario Estadístico del Español de México (DEEM 2005). It is a statistical index of natural language with lexical, grammatical and sociolinguistic information, language use registries and quantitative data.

The results from the DEEM regarding empty words, greatest dispersion and highest frequency were:

1) Grammatical lexical units or empty words. They are mainly articles, prepositions, interjections, pronouns, etc. and come to 292 headwords, or 51.60% of the total information in the corpus. This is the third group of excluded terms when seeking scientific and technical terms to extract.

2) The lexical units with the greatest dispersion, or common language (see Anguiano 2013), are 994 different headwords that corresponded to 67.57% of the total information in the corpus. When a specialized term search is carried out, this type of lexical unit tends to be separated from the document content analysis.

3) The most frequent lexical units, or basic vocabulary, as Lara introduced (2007). Here it is helpful to realize that studies of lexicometry, informetrics, the Zipf model, etc. pointed out that there is an economic phenomenon in language use, known as “least effort.” It basically describes how people use a huge amount of graphic words that correspond to a very small amount of headwords, the result being a very small number of lexical units with very high frequency. Following this reasoning, it is understood that the basic vocabulary, or that with the highest frequency index, is the most used in texts and speeches, as in the CEMC, where merely 861 headwords comprise 75% of the total information in the corpus. It is recommended that this type of lexical unit also be eliminated.
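The “least effort” observation in item 3 amounts to asking how many of the top-ranked headwords are needed to cover a given share of all occurrences. The following is a minimal sketch under the assumption that headword frequencies are available as a `Counter`; the function name and the toy Zipf-like distribution are hypothetical and do not reproduce CEMC data.

```python
from collections import Counter


def headwords_needed(frequency, target_share):
    """Smallest number of top-ranked headwords whose occurrences
    reach target_share (0..1) of all occurrences in the corpus."""
    total = sum(frequency.values())
    covered = 0
    for rank, (_, count) in enumerate(frequency.most_common(), start=1):
        covered += count
        if covered / total >= target_share:
            return rank
    return len(frequency)


# Toy Zipf-like distribution: the headword of rank r occurs about 1000 / r times.
toy = Counter({f"w{r}": 1000 // r for r in range(1, 201)})
print(headwords_needed(toy, 0.75), "of", len(toy), "headwords cover 75%")
```

Applied to real headword counts, the same loop would show how small the basic vocabulary is relative to the share of the corpus it accounts for.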

Chart 1 shows the results for these three divisions and how much of the analysis is saved by excluding them.
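Because the same headword can belong to more than one of the three subsets, their joint share of the corpus is the share of their union, not the sum of the three individual shares. The sketch below illustrates this with invented frequencies and invented subset memberships; only the principle, not the numbers, reflects the corpus results.

```python
def coverage(frequency, headwords):
    """Share of all occurrences accounted for by the given headwords."""
    total = sum(frequency.values())
    return sum(frequency.get(w, 0) for w in headwords) / total


# Hypothetical frequencies and overlapping subsets, for illustration only.
frequency = {"de": 50, "la": 30, "tener": 10, "célula": 6, "enzima": 4}
empty_words = {"de", "la"}
common_language = {"la", "tener"}
basic_vocabulary = {"de", "tener"}

separate = [coverage(frequency, s)
            for s in (empty_words, common_language, basic_vocabulary)]
joint = coverage(frequency, empty_words | common_language | basic_vocabulary)

print("separate shares:", separate)
print("sum of shares:  ", sum(separate))
print("joint share:    ", joint)
```

The sum of the separate shares exceeds 100% here, while the joint share stays below it, which is why the combined figure has to be computed on the union.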

Clearly, the shares of the three subgroups cannot simply be added together, because some lexical units are repeated in two, or even all three, subgroups. Considering that together they might total 78.83% of all the information analyzed, it is worthwhile for information retrieval to create filters with the information from the general language prior to content analysis. This would save about 80% of the term candidate retrieval work, which coincides with calculations made in other information retrieval studies. To make the work of retrieving scientific and technical terms more efficient, a minimum number of valid lexical unit appearances is also set, so that very low frequency lexical units are not retrieved, because terms whose meanings have no literary warrant may appear.

6.0 Document Process for Clarifying Meaning and Defining Term Candidate Use

The way we propose to retrieve text of user interest is that once the index of signifiers is obtained, equivalent to the list of terminological unit candidates, they must be simplified and made into headwords. Next, each unit is retrieved through the thematic use registry to which the documents analyzed belong, which makes the practice something like indicating the available vocabulary of the text, as this would help users “clarify the meaning and find the appropriate use of certain words” (Estopà 1998, 360). López (2013, 1) wrote: “The available vocabulary is the set of words that speakers have in their mental language and whose use is conditioned by the concrete topic of communication. The idea is to discover what words a speaker would be able to use for specific topics of communication.” Furthermore, users may subsequently ask the information retrieval system for the referent closest to what is being sought, simplifying searches as much as possible. Nonetheless, and despite all efforts, the true meaning will always be the reader’s interpretation. As in the indexing process, term candidates or key words may be adjusted to a controlled language to improve content retrieval. This can be done by employing subject headings and thesauri. Words from the natural language obtained through indexing are converted into expressions and concepts in a controlled language.
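The two steps just described, reducing signifiers to headwords and then adjusting them to a controlled language, can be sketched as two lookups. Both tables below are hypothetical and hand-made for illustration; an actual system would rely on the lexicographic database for headwords and on an existing thesaurus or subject heading list for preferred terms.

```python
from collections import Counter

# Hypothetical lookup tables, for illustration only.
HEADWORD_OF = {"células": "célula", "enzimas": "enzima"}
PREFERRED_TERM = {"célula": "Células", "enzima": "Enzimas"}


def to_headwords(graphic_words):
    """Group graphic words (as found in the text) under their headwords."""
    return Counter(HEADWORD_OF.get(w.lower(), w.lower()) for w in graphic_words)


def to_controlled(headwords):
    """Split headwords into those with a preferred term and those without."""
    mapped, unmapped = {}, []
    for headword in headwords:
        if headword in PREFERRED_TERM:
            mapped[headword] = PREFERRED_TERM[headword]
        else:
            unmapped.append(headword)
    return mapped, unmapped


counts = to_headwords(["Célula", "células", "enzimas", "membrana"])
mapped, unmapped = to_controlled(counts)
print(counts)
print(mapped)
print(unmapped)
```

Headwords left in `unmapped` are the ones an information specialist would still need to review before they enter the controlled language.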

At the end of the indexing process, the information is disseminated so that it gets to users who can utilize it. In the case of lexicographic projects, there are different information products derived from the document analysis that target internal and external users. They may be separate or joined as a system. The components may be the database of concordances, similar to KWIC (Key Word in Context), quantitative information, index cards, the very dictionary being put together, or the various interfaces generated to consult the lexicographic information. Among the results of the long lexicographic process of document content analysis, what should result upon concluding the indexing or natural language classification is a list of lexical unit signifiers from the general language, but also from the sciences and technical fields, based on their presence in texts related to these fields of work.

Stop words: 51.60% (975921 lexical units); Greatest dispersion: 67.57% (1277637 lexical units); Highest frequency: 75% (1418293 lexical units); The three proposals together: 78.83% (1490699 lexical units)

Chart 1. Proposal of cut-offs; empty words, highest frequency, greatest dispersion and of the three together, with respect to 1,891,058 lexical units (%)

Concept | Graphic words | % of total
Empty or grammatical words | 975921 | 51.60561
Greatest dispersion or common language | 1277637 | 67.57303
Highest frequency or basic vocabulary | 1418293 | 75
The three together | 1490699 | 78.83653

Table 1. Summary of empty words, greatest dispersion, highest frequency and the three together

7.0 Application of Use Labels from Lexicographic Documentation

Based on the results from the DEEM, another database could be put together, the sociolinguistic model of the Spanish language used in Mexico (Anguiano 2006). After assigning the lexical units from the DEEM semi-automated indexing, it was possible to get the sum of the partial results that the prior base showed. Then, with the complete data, the total results of the lexical units used in the general language in Mexico could be identified through their sociolinguistic registries (see Table 2).

Pre-coordinated terms may appear in texts from different specialties, because they are not exclusive to one knowledge area but are, rather, shared terms.

8.0 Proposal for Limiting Term Candidates

For specialized information search and retrieval, we suggest eliminating the following information, drawn from the quantitative data and general language use labels, prior to the document content analysis of general and specialized texts:

– The most frequent lexical units
– The most widely dispersed lexical units
– Lexical units pertaining to the empty word group
– Lexical units from non-standard language
– Lexical units from semi-proper language

Eliminating the quantitative and sociolinguistic units listed above from the analysis would economize substantially on term candidate information retrieval.
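The five exclusion criteria listed above can be expressed as a single filter. This is a minimal sketch, not the authors’ implementation: the `LexicalUnit` record, its field names and the label values are assumptions made for illustration, and the sample entries are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalUnit:
    headword: str
    use_of_spanish: str   # assumed values: "standard" or "non-standard"
    language_level: str   # assumed values: "proper" or "semi-proper"


def limit_candidates(units, most_frequent, most_dispersed, empty_words):
    """Drop units matching any of the five exclusion criteria."""
    excluded = set(most_frequent) | set(most_dispersed) | set(empty_words)
    return [u.headword for u in units
            if u.headword not in excluded
            and u.use_of_spanish != "non-standard"
            and u.language_level != "semi-proper"]


# Invented sample entries and exclusion lists, for illustration only.
units = [
    LexicalUnit("activación", "standard", "proper"),
    LexicalUnit("actividad", "standard", "proper"),
    LexicalUnit("de", "standard", "proper"),
    LexicalUnit("chamba", "non-standard", "semi-proper"),
]
kept = limit_candidates(units,
                        most_frequent={"actividad"},
                        most_dispersed={"actividad"},
                        empty_words={"de"})
print(kept)
```

Keeping the quantitative lists and the sociolinguistic labels as separate inputs makes it possible to reuse lists that were prepared once from the general language corpus.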

But most importantly, after coming up with the list of term candidates, comparisons may be made with the pre-existing language use registries in this same sociolinguistic model of the Mexican Spanish language. Such an examination would help both information users and library and information science professionals in the reconstruction of the meaning of the linguistic sign and the elaboration of a controlled language.

This new correlation exercise could reveal term candidates that are used exclusively in a particular discipline, which would confirm, first off, that they are key words and that later, after expert validation, they might turn into terms in the strict sense. The comparison may also lead to identifying candidates used in two or more disciplines, which would indicate that they are terms in a loose sense and may even have polysemy, in other words that for lexicography they are technical terms. Candidates may also be found that pertain to both the sciences and technical fields, so could be considered technical terms while bearing the (sci.) label in dictionaries, denoting they pertain to the scientific language.
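The comparison described above comes down to counting the disciplines in which a candidate is registered. The following sketch assumes that the use registries of each headword are available as a list of discipline names; the function name and the returned labels are hypothetical, while the sample registries for “activation” and “activated” are the ones shown in Table 2.

```python
def classify_candidate(disciplines):
    """Classify a term candidate by the number of distinct disciplines
    in which its use is registered (illustrative labels)."""
    n = len(set(disciplines))
    if n == 0:
        return "no specialized registry"
    if n == 1:
        return "key word; term in the strict sense after expert validation"
    return "technical term (term in a loose sense)"


registries = {
    "activation": ["Chemistry", "Medicine and veterinary science",
                   "Human medicine"],
    "activated": ["Electronics and electricity", "Human medicine"],
    "action": [],
}
for headword, disciplines in registries.items():
    print(headword, "->", classify_candidate(disciplines))
```

The label returned for a single discipline is only provisional, since the text makes the strict sense conditional on expert validation.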

Furthermore, we propose reutilizing lexicographic processes to differentiate lexical units and extract scientific and technical terms through content analysis of specialized texts, by employing use labels or spoken registries as suggested by Rey-Debove (1971, 91-92), who describes three fundamental aspects for achieving this goal:

1) the set of words (lexical units) that belong to a language;
2) the sociolinguistic information from lexical units; and,
3) the use labels agreed upon by the community.

By incorporating these three guidelines into the information analysis, the expectation is that when the same lexical units from the general language, identified by consensus, are contrasted with the specialized language, they may first be used to classify the technical terms, and since the latter behave very similarly to terminological units, may also be designated term candidates. From a linguistic standpoint, terms may also be called technical terms, as stated in the following definition translated from the Spanish (DMLE 2007 s.v.):

TECHNICAL TERM. m. 1 A term that has a concrete and specific meaning within the language of a trade, science, art or industry: the word “algorithm” is a technical term in mathematics.

9.0 The Search for Terminological Units in Texts

To access specialized terms with the help of a corpus from the general language such as the cited model, first, term candidates with a spoken registry related to a specialized text are separated. At this stage of the term search process, it is normal to find, in the lists produced by the automated analysis, lexical units that pertain to use in a discipline at its various communication levels, even if all these units belong to the standard language, the proper language and a science or technical field. This, in itself, means that the following lexical units from a general or scientific text may be obtained from the content analysis: 1) units from the general language; 2) units pertaining to the style of the analyzed discipline; 3) term candidates in a loose sense and 4) term candidates in the strict sense, as shown in Image 1.


Headword | Gram. cat. | Total frequency | % total | Use of Spanish | Language level | Spoken registry | Greatest frequency | Best distribution | Text key | Use registries
action | noun | 4 | 0.00021 | standard | proper language | | | | |
attitude | noun | 259 | 0.01370 | standard | | | basic vocabulary | | |
activation | noun | 14 | 0.00074 | standard | proper language | sciences | | | 420, 427, 428, 454, 469, 473, 477, 478 | Chemistry; Medicine and veterinary science; Human medicine
activated | adj | 2 | 0.00011 | standard | proper language | sciences | | | 389, 478 | Electronics and electricity; Human medicine
actively | adv | 12 | 0.00063 | standard | proper language | | | | |
activity | noun | 511 | 0.02701 | standard | | | basic vocabulary | | |
activist | adj; s | 6 | 0.00031 | standard | proper language | | | | |
act | noun | 308 | 0.01629 | standard | | | basic vocabulary | | |
actor | adj; s | 133 | 0.00704 | standard | | | | | |

Table 2. Example of Use Registries in the Identification of Term Candidates in the Sociolinguistic Model


To delve further into the discussion from the previous paragraph, a more detailed explanation regarding the units to be found in texts follows:

1) Lexical units that belong to the general language and that appear in texts from the sciences and technical fields but that are also identified sociolinguistically as pertaining to: the general standard language, non-standard language, semi-proper language and proper language. In other words, they are not exclusive to the sciences or technical fields, and it is advisable to exclude them from the term candidate list.

2) Lexical units that correspond to the writing style typical of the discipline under analysis. These units tend to be lexical units from the discipline’s verbal tradition, pat phrases and locutions. Their appearance corresponds to a very low frequency index with respect to an analyzed text, yet since they are characteristic of certain scientific disciplines it is not advisable to eliminate them prior to content analysis. Here we may find locutions, phraseological units, Latinisms, etc.

3) Specialized lexical units or technical terms. Their use and meaning are typical of the discipline to which the

analyzed text corresponds, though they may also have the same signifier in the general language and even in other disciplines; in other words, they may have syno-nyms. This type of lexical units are recorded in general language dictionaries, (as in DRAE 2001 or DEM 2012) and in fact such units are terms in a loose sense. Forms of graphic words, just as they appear in the original text, tend to be few: feminine, masculine, singular and plural; they have a very low document content analysis fre-quency index in the common language, but as lexical units made into headwords (canonically grouped words) their percentage with respect to the total of the ana-lyzed sample increases. In other words, a small number of lexical units is grouped within a large number of headwords. In terms of their dispersion index in the DEEM, what was seen is that while they may be con-centrated by their use in a discipline, they may also show up in other disciplines within the sciences or technical fields or belong to scientific language, which encom-passes both knowledge areas. Among other things, they may be recognized, because even if they have a known signifier, the meaning differs from that in the natural language, which is why the average reader does not

Image 1. Lexical Units in the Document Content Analysis of a Text


grasp their meaning and it seems like a secret. These units may appear simply or multi-verbally as, for example, in syntagmas, pat phrases or as phraseological units.

4) Candidates to be terminological units of the discipline analyzed. Their document behavior is very similar to technical terms, but they do not have synonyms and presuppose a univocal meaning. These units belong to standard and proper language, are used exclusively in the sciences or technical fields and have a spoken registry that situates them in a formal communication mode so they are used exclusively in a specialized language, having no meaning or equivalent in common language. Such candidates may be simple lexical units or units composed of several words. They pertain exclusively to one discipline. In principle, candidates may be considered key words; after being validated by an information specialist they may become part of the indexing language, and in the best-case scenario they may be terms of a particular discipline, in the strict sense (Rey-Debove 1971). With a low appearance frequency level in text analysis, when lexical units are grouped together, they have a high percentage of headwords in relation to the total analysis. Since their data are concentrated in a single discipline, they do not undergo dispersion.

Considering all of this, it is also likely that in any document content analysis of a text, be it general, on science or on technology, and keeping in mind Rey-Debove’s perspective (1971), the lexical units have the following characteristics regarding signifier, meaning and type of communication they belong to (see table 3):

| Signifier* | Meaning** | Language Type |
|---|---|---|
| A common signifier | and a common meaning | form part of the general language. |
| An uncommon signifier | and a common meaning | would be a technical term of signifier, for ex., close up, stock shot, fade out. |
| A common signifier | with an uncommon meaning | is a technical term in a loose sense, for ex., winder, optic, camera. |
| An uncommon signifier | and an uncommon meaning | would be a technical term in the strict sense, for ex., magnetic eraser, projection lamp, translucent screen system, animation technique. |

Table 3.

* Signifier is that which indicates something, in this study a word or lexical unit given to a person, animal, thing or concept, tangible or intangible, concrete or abstract, to distinguish it from others.

** Meaning is that which is indicated, and for our purposes, the representation or mental concept of something.
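The four combinations in table 3 can be read as a simple decision rule. The following is a minimal sketch of that rule, assuming that whether a signifier or a meaning counts as "common" has already been decided by an analyst; the function name `language_type` and its boolean parameters are illustrative and are not part of the authors' model.

```python
def language_type(signifier_is_common: bool, meaning_is_common: bool) -> str:
    """Return the table 3 category for one lexical unit.

    Both flags are assumed to come from prior human analysis; this
    function only encodes the four rows of the table.
    """
    if signifier_is_common and meaning_is_common:
        return "general language"
    if not signifier_is_common and meaning_is_common:
        return "technical term of signifier"
    if signifier_is_common and not meaning_is_common:
        return "technical term in a loose sense"
    return "technical term in the strict sense"


# Examples taken from table 3, with the flags set as the table implies.
examples = {
    "close up": (False, True),
    "camera": (True, False),
    "projection lamp": (False, False),
}
for unit, (sig_common, meaning_common) in examples.items():
    print(unit, "->", language_type(sig_common, meaning_common))
```

The rule itself is trivial; the analytical work lies in deciding the two flags, which is where the spoken registry and dispersion evidence described above would come in.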

Despite this coexistence of lexical and terminological units in a scientific text, they may be differentiated if their spoken registry is verified, confirming whether it is found in a form of communication or in a text that belongs exclusively to a specialized language, that is, whether it is proven to be the product of a formal communication between specialists in a particular discipline to ensure effective communication among themselves.

As the preceding description of the process shows, the lexical units analyzed originate from an empirical study developed by lexicography, which shows that something similar to what happens with any general language text occurs with texts from a specialized language. Both types of texts are composed, to a greater or lesser degree, of lexical units from the general language and not only specialized units from the sciences or technical fields. And while it may not seem so, these differences are actually useful for information retrieval, as the terms to be extracted from the texts are not typical of the common language.

9.1 Example of the type of analysis made with the model

Following the steps proposed in this article and isolating the headwords that correspond to the sciences and technical fields contained in the model, the following results were produced (see chart 2).
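The percentages reported for this example can be checked with a few lines of arithmetic. The sketch below uses the headword counts given in the running text of this section (4,876 headwords exclusive to the sciences, 1,574 exclusive to technical fields, 30,899 headwords in total); the variable and function names are illustrative only.

```python
TOTAL_HEADWORDS = 30_899  # all headwords taken from the corpus

# Headwords identified as exclusive to each area, as reported in the text.
exclusive = {"sciences": 4_876, "technical fields": 1_574}


def share(count: int, total: int = TOTAL_HEADWORDS) -> int:
    """Percentage of the total, rounded to the nearest whole number."""
    return round(100 * count / total)


shares = {area: share(n) for area, n in exclusive.items()}
combined = sum(exclusive.values())

print(shares)                     # per-area percentages
print(combined, share(combined))  # combined count and its percentage
```

Rounded to whole numbers, these counts give 16% for the sciences, 5% for technical fields and 21% for the two together, which agrees with the figures stated in the text.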

This graph illustrates how 346,284 graphic words were automatically extracted from scientific texts and manually indexed as 16,296 headwords in science texts. Of them, however, only 4,876 headwords were identified as exclusive to that area. As shown, the sciences obtained 16% of the total of 30,899 headwords taken from the corpus. With technical texts, 202,667 graphic words were automatically extracted and manually indexed as 10,821 headwords in technical texts, 1,574 of which were defined as exclusive to technical fields. This was 5% of the 30,899 total headwords taken from the corpus. With just 6,450 headwords, sciences and technical fields together covered 21% of the total of 30,899 headwords.

10.0 Final Considerations

The model presented here or other lexicographic resources with similar characteristics may be useful in the near future, in computer-assisted indexing or as corpora monitors, with respect to new text analyses or specialized corpora. Their utilization would facilitate rapidly generating lists of term candidate signifiers, which, besides being useful for representing and retrieving original text content, will also be of great value in the stage of development of the controlled language when working with the terms, uniterms, subject headings or descriptors that comprise the terminology of a


discipline analyzed this way. Finally, we must keep in mind that natural and specialized languages are constantly evolving, so there are, of course, difficulties in controlling and retrieving specialized languages and their terminologies. Consequently, the presence of library and information science and its development are that much more necessary, to help users and readers decode the language of science.

References

Alexiev, Boyan. 2006. “Terminology Structuring for Learner’s Glossaries.” Knowledge Organization 33, no. 2: 96-118.

Anguiano Peña, Gilberto. 2006. Modelo sociolingüístico del léxico del español usado en México. Mexico: El Colegio de México, Centro de Estudios Lingüísticos y Literarios, Diccionario del Español de México.

Anguiano Peña, Gilberto. 2013. El léxico común del español de México. Mexico: El Colegio de México, Centro de Estudios Lingüísticos y Literarios, Diccionario del Español de México.

Cabré, M. Teresa, Iria da Cunha, Eric San Juan, Juan-Manuel Torres-Moreno and Jorge Vivaldi. 2014. “Automatic specialized vs. non-specialized text differentiation: the usability of grammatical features in a Latin multilingual context.” In Languages for Specific Purposes in the Digital Era, edited by E. Bárcena, T. Read and J. Arús, 223-41. Berlin: Springer.

Cabré, Maria Teresa. 1999. Terminology, Theory, Methods, and Applications. Barcelona: Benjamins.

Cabré, Maria Teresa. 2003. “Theories of terminology, their description, prescription and explanation.” Terminology 9, no. 2: 163–99.

CEMC. 1975. Corpus del Español Mexicano Contemporáneo, 1921-1974. María Isabel García Hidalgo, Luis Fernando Lara, Roberto Ham Chande et al. Mexico: Diccionario del Español de México.

CEMC. 2005. Corpus del Español Mexicano Contemporáneo, 1921-1974. Lematizado. Version by Gilberto Anguiano Peña, Francisco Segovia and Erika Flores García. Mexico: El Colegio de México; UNAM, Instituto de Ingeniería. www.corpus.UNAM.mx/cemc/.

DEEM. 2005. Diccionario estadístico del español de México. Lematizado. Gilberto Anguiano Peña, Francisco Segovia and Erika Flores (eds.). Mexico: El Colegio de México, Centro de Estudios Lingüísticos y Literarios, Diccionario del Español de México.

DEM. 2012. Diccionario del español de México. Mexico: El Colegio de México, Centro de Estudios Lingüísticos y Literarios. http://dem.colmex.mx/moduls/Default.aspx?id=8.

DMLE. 2007. Diccionario Manual de la Lengua Española Vox. Larousse Editorial. http://es.thefreedictionary.com/tecnicismo.

DTCE. 2014. Diccionario de términos clave de ELE. Spain, Biblioteca Virtual Cervantes. http://cvc.cervantes.es/ensenanza/biblioteca_ele/diccio_ele/diccionario/sociolinguistica.htm.

Estopà, Rosa. 1998. “El Léxico Especializado en los Diccionarios de Lengua General: Las Marcas Temáticas.” Revista Española de Lingüística 28, no. 2: 359-87.

Gemma, Will. 2013. The Elements of Communications a Theoretical Approach. https://blog.udemy.com/elements-of-communication/#wrap.

Halliday, Michael A. K. 1977. “Text as Semantic Choice in Social Contexts.” In Linguistic Studies of Text and Discourse. Volume 2 in the Collected Works of M.A.K. Halliday, edited by J. J. Webster, 23-81. London: Continuum.

Lara, Luis Fernando. 2001. Ensayos de Teoría Semántica: Lengua Natural y Lenguajes Científicos. Mexico: El Colegio de México.

Lara, Luis Fernando. 2007. Resultados Numéricos del Vocabulario Fundamental del Español de México. Mexico: El Colegio de México. http://dem.colmex.mx/moduls/Default.aspx?id=14.

Lara, Luis Fernando and Ham Chande, Roberto. 1979. “Base Estadística del Diccionario del Español de México.” In Investigaciones Lingüísticas en Lexicografía, edited by Luis Fernando Lara, Roberto Ham Chande and María Isabel García Hidalgo, 7-39. Mexico: El Colegio de México. http://www.scielo.cl/scielo.php?pid=S0718-09342008000200002&script=sci_arttext.

Chart 2. Term Candidates Retrieved from a Total of 30,899 Headwords: 4,871 Lexical Units from the Sciences and 1,574 from Technical Fields

López Morales, Humberto. 2013. ¿Qué es la disponibilidad léxica? In DispoLex: investigación léxica. http://www.dispolex.com/info/la-disponibilidad-lexica.

Rey-Debove, Josette. 1971. Etude Linguistique et Semiotique des Dictionnaires Français Contemporains. The Hague; Paris: Mouton.

Ryan, Janet N. 1985. “The Secret Language of Science or, Radicals in the Classroom.” The American Biology Teacher 47, no. 2: 91.

Schultz, Claire K. 1968. H.P. Luhn: Pioneer of Information Science; Selected Works. New York: Spartan Books.

Temmerman, Rita. 2000. Towards New Ways of Terminology Description: The Sociocognitive Approach. Amsterdam/Philadelphia: John Benjamins.

Van Dijk, Teun A. 1977. Text and Context: Explorations in the Semantics and Pragmatics of Discourse. Linguistics Library, no. 21. London: Longman.

Varantola, Krista. 2003. “Linguistic Corpora (Database) and the Compilation of Dictionaries.” In A Practical Guide to Lexicography, edited by Piet van Sterkenburg, 228-39. Amsterdam: John Benjamins.

Wüster, Eugen. 1979. Introduction to the General Theory of Terminology and Terminological Lexicography. Wien: Springer.

Zipf, George Kingsley. 1949. Human Behavior and the Principle of Least Effort. Oxford, England: Addison-Wesley Press.


Online Consumer Health Information Organization: Users’ Perspectives on Faceted Navigation

Kyong Eun Oh*, Soohyung Joo**, Eun-Ja Jeong***

*School of Library and Information Science, Simmons College, 300 The Fenway, Boston, MA 02446, <[email protected]>

**School of Library and Information Science, University of Kentucky, 352 Lucille Little Library, Lexington, KY 40506, <[email protected]>

***Department of Food and Nutrition, Eulji University, 553 Sanseong-daero, Sujeong-gu, Seongnam-si, Gyeonggi-do 461-713 Korea <[email protected]>

Kyong Eun Oh, PhD, is an assistant professor at the School of Library and Information Science (SLIS) at Simmons College in Boston. Oh’s research interests include categorization, information organizing behavior, and personal information management (PIM).

Soohyung Joo, PhD, is an assistant professor in the School of Library and Information Science at the University of Kentucky. His main research areas include digital libraries, information retrieval, information system design, and data analytics. He can be contacted at: [email protected].

Eun-Ja Jeong, PhD, is a professor in the Department of Food and Nutrition Science at Eulji University in South Korea. Her research interests include nutrition and health, food chemistry, and dietary culture. She is the author of seven books and numerous research articles.

Oh, Kyong Eun, Joo, Soohyung, and Jeong, Eun-Ja. Online Consumer Health Information Organization: Users’ Perspectives on Faceted Navigation. Knowledge Organization. 42(3), 176-186. 32 references.

Abstract: We investigate facets of online health information that are preferred, easy-to-use and useful in accessing online consumer health information from a user’s perspective. In this study, the existing classification structures of 20 top-ranked consumer health information websites in South Korea were analyzed, and nine facets that are used in organizing health information in those websites were identified. Based on the identified facets, an online survey, which asked about participants’ preferences for, as well as the perceived ease-of-use and usefulness of, each facet in accessing online health information, was conducted. The analysis of the survey results showed that among the nine facets, the “diseases & conditions” and “body part” facets were most preferred, and perceived as easy-to-use and useful in accessing online health information. In contrast, “age,” “gender,” and “alternative medicine” facets were perceived as relatively less preferred, easy-to-use and useful. This research study has direct implications for organization and design of health information websites in that it suggests facets to include and avoid in organizing and providing access points to online health information.

Received: 11 February 2015; Revised 25 April 2015; Accepted 1 May 2015

Keywords: information, health, facets, websites, online, consumer, conditions, navigation

1.0 Introduction

Online consumer health information websites, which specifically provide health information to the general public, have become prevalent in our lives with the growing amount of online health information (Robins et al. 2010). A Pew Charitable Trust survey found that approximately 80% of American Internet users, or about 113 million adults, have searched for health information online (Fox 2006), indicating that many people use online health information. When people are curious about certain health problems affecting themselves or others, they can simply type their search keywords on the Web to get search results from various sources, including consumer health websites, or directly visit such websites.

Like any other websites, online consumer health websites are organized in certain ways to help users find the health information that resides in them. However, the organization of consumer health information websites is not standardized, and they are often designed without understanding the theoretical principles of information organization. Although online health information has many advantages, including fast access and anonymity, the use of information can be hindered because of disorganization of information on the Web (Cline and Haynes 2001). Moreover, consumer health information websites are often organized without an in-depth understanding of a user’s preferences, perceived easiness and usefulness in accessing information. In the case of websites that utilize a faceted approach to information organization, facets are often provided without investigating the user’s preferences for facets in accessing health information, although there is a possibility that the users might find those facets difficult to use in accessing needed information. Russell-Rose and Tate (2013) argued that when designing a website to support finding information, it is crucial to identify and understand the users so that a website can be designed in a way that meets users’ needs. Cline and Haynes (2001, 684) also stated that “the basic premise behind the ease of use is designing a website that builds on the user’s perspective,” and “navigability is facilitated by organizing and grouping ideas and information by categories that make sense from the consumer’s perspective.” Thus, it is important to understand which facets users prefer to use, and which they perceive as easy-to-use and useful, in accessing health information on websites.

We aim to examine facets that are preferred, easy-to-use and useful in accessing online consumer health information from a user’s perspective so that online health information can be organized in a way that facilitates users finding needed information. In this study, we analyzed the organization of current online consumer health information websites in Korea to identify facets that are used in organizing online health information. Then, we conducted an online survey to examine users’ preferences, perceived easiness and usefulness of those facets that we identified from the website analysis. The research questions of this study are as follows:

RQ1. What facets are currently used in online consumer health information websites?

RQ2. Which facets do users prefer in accessing online consumer health information?

RQ3. Which facets do users perceive as easy-to-use in accessing online consumer health information?

RQ4. Which access facets do users perceive as useful in accessing online consumer health information?

This paper is organized as follows: In the next section, we provide background information and review the literature and related works on online information organization, online health information, and online health information organization. This is followed by Section 3, which explains our research methodology. In Section 4, we present our results along with supportive data for our findings and include a discussion of these results. Section 5 summarizes our work and presents our conclusions.

2.0 Background and Related Work

2.1 Online information organization

Websites are organized in certain ways in order to facilitate users’ access to information on the websites, either by providing general directories or several access points (Zhang et al. 2009). On the one hand, some websites provide subject directories, which allow people to browse information by categories and sub-categories. This interface is developed based on the principle of hierarchical classification structure, which begins with broad, top-level categories that branch into more specific, subordinate-level categories. On the other hand, other websites use interfaces utilizing the principle of faceted classification by providing multiple access points in which each access point works as a facet, “a generic term used to denote any component of a compound subject, also its ranked forms, terms and numbers” (Ranganathan 1967, 88). On websites, facets are sometimes called “categories,” and typically they are used as navigation or search tools (La Barre 2006b, 181, 183). When a faceted approach is used, users can navigate information on the website by continuously refining their choices in each facet (Russell-Rose and Tate 2013). Compared to hierarchical classification structure, faceted classification structure is more flexible, and provides a more refined representation of information (Hudon 2010).
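The refinement process described above, in which a user narrows a result set by choosing a value in one facet after another, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the records, the facet names `body_part` and `age`, and the helper functions `refine` and `facet_counts` are all hypothetical and do not describe any particular website or library.

```python
from typing import Dict, List

Record = Dict[str, str]

# Hypothetical health information pages, each described by facet values.
RECORDS: List[Record] = [
    {"title": "Migraine overview", "body_part": "Head", "age": "Adults"},
    {"title": "Childhood ear infections", "body_part": "Ears", "age": "Children"},
    {"title": "Tension headache in teens", "body_part": "Head", "age": "Teens"},
]


def refine(records: List[Record], **selected: str) -> List[Record]:
    """Keep only the records that match every selected facet value."""
    return [
        r for r in records
        if all(r.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(records: List[Record], facet: str) -> Dict[str, int]:
    """Count how many records carry each value of one facet."""
    counts: Dict[str, int] = {}
    for r in records:
        value = r.get(facet)
        if value is not None:
            counts[value] = counts.get(value, 0) + 1
    return counts


# A user first picks a body part, then narrows further by age group.
step1 = refine(RECORDS, body_part="Head")
step2 = refine(step1, age="Teens")
print([r["title"] for r in step1])
print([r["title"] for r in step2])
print(facet_counts(RECORDS, "body_part"))
```

Unlike a single hierarchical directory, the facets here are independent of one another, so the same result can be reached by choosing the age group first and the body part second.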
In particular, facets allow the expression of various aspects of information resources and provide multiple access points, which make them especially effective in organizing information on the Web. A faceted display of the classification also encourages users to articulate different aspects of their information need (Tang 2007). Because of such benefits, there has been growing interest in using a faceted approach to organize digital information resources (Priss


and Jacob 1999; Ellis and Vasconcelos 2000; Broughton 2002; Yoo 2004; La Barre 2006a; La Barre 2006b; Tang 2007; Choi 2009). In addition, several researchers developed faceted classification schemes that can be used in classifying information resources on the Web (Van der Walt 2004; Zins and Guttmann 2000; Kwak 2001; Hudon 2010). However, little research has suggested faceted classification for online health information based on the analysis of users’ preferences.

2.2 Online health information

Currently, interest in and use of online health information are increasing. This is not only because the number of people who find health information on the Web is growing (Fox 2006), but also because online health information has various advantages for the users. Online health information is accessible at any time, can be accessed by lay people, and can be acquired anonymously (Fox and Rainie 2000; Thobaben 2002; Eriksson-Backa 2003; Cotton and Gupta 2004). Thus, in recent years, online health information has become a serious area of research.

Among studies on online health information, a number of studies examined the quality and credibility of online health information (Berland et al. 2001; Eastin 2001; Eysenbach et al. 2002; Thobaben 2002; Dutta-Bergman 2004; Stvilia et al. 2009; Robins et al. 2010). In these studies, the researchers found that the quality of health information websites varies, and that many sites lack quality and credibility, so that they often include incomplete, inaccurate, confusing, and conflicting information on a topic (Berland et al. 2001; Eysenbach et al. 2002; Thobaben 2002; Hesse et al. 2005). Some researchers examined the factors that impact the credibility of consumer health information websites. For instance, Robins et al. (2010) investigated the relationship between visual design and the perception of credibility of consumer health information websites. Robins et al. (2010, 25) asked 34 participants to rate preferences for visual designs of 31 consumer health information websites as well as credibility, and found that the visual designs of the consumer health information websites are related to the perceived credibility of the sites. Eastin (2001) investigated the relationship between source expertise, topic knowledge, and perceived credibility of online health information by asking 125 participants to assess health-related websites with a known topic (HIV) and an unknown topic (syphilis), and health information provided from an expert (doctor) and a non-expert (high school freshman). In this study, the author found that both topic knowledge and source expertise affect the perception of the credibility of health information websites. In a similar vein, Dutta-Bergman (2004) found that the completeness of consumer health information strongly impacts judgments of its credibility. Eysenbach et al. (2002) reported that accuracy, completeness, readability, design, disclosures, and references are the most frequently cited criteria in assessing the quality of consumer health information on the Web.

There have also been studies that examined the online health information searching behaviors of various user groups (Eriksson-Backa 2003; Cotton and Gupta 2004; Warner and Procaccino 2004; Crystal and Greenberg 2006; Given et al. 2007; Ybarra and Suman 2008). For instance, Given et al. (2007, 615) examined senior users’ online health information seeking behaviors, with a special focus on the usability of health information websites and senior users’ searching strategies, by interviewing 12 participants. Given et al. reported that senior users regard people (e.g., physicians, pharmacists) and the Internet as good resources for health information. Also, the researchers suggested that online health information websites need to have big and clear images for senior users and include information about drug interactions as well as side effects. Warner and Procaccino (2004) paid special attention to female users and explored their health information seeking behaviors by conducting a survey with 119 female participants. Warner and Procaccino (2004, 720) found that women are active health information seekers, and that they had conflicting opinions about the ease of finding health information, the usefulness of the health information they found, and whether their information questions had been answered. Consumer health information in Warner and Procaccino’s study (2004) included both physical and online health information, and in terms of online health information, the researchers stated that while female users often use online health information, they were not particularly good at finding qualified resources. Ybarra and Suman (2008, 93) investigated gender and age differences in using online health information. By analyzing national data from a telephone survey of Americans, the researchers found differences by gender as well as age in reasons for finding online health information, evaluating online health information and taking actions after finding online health information. Similarly, to explore whether there are any differences in health information seeking behaviors based on users’ gender, age, education, occupation, and health status, Eriksson-Backa (2003, 95) interviewed 50 participants who were (1) pregnant women, (2) people who have diabetes, and (3) people who are not pregnant and do not have diabetes. Eriksson-Backa (2003, 93) found that both pregnant women and people with diabetes were more active in seeking online health information than people who are not pregnant and do not have diabetes. In terms of gender, men were more active in reading while women were more active in discussions. In the case of age, younger participants were more active than older participants.


As shown above, there have been a number of studies on online health information that investigated the quality and credibility of online health information and explored information seeking behaviors of various user groups. However, fewer studies have focused on the organization of online health information, so little is known about the classification structure of health information on websites or about users’ preferences for the organization of online health information. Tang’s (2007) study is one of the few studies related to the organization of health information. In this study, the researcher examined whether an interface utilizing the principle of faceted classification is useful in browsing and searching information in PubMed. Tang (2007) found that participants preferred using query submission methods together with the faceted classification display over using query submission methods alone, indicating that faceted classification supports users’ information access. Zhang et al. (2009) is another study that focused on the organization of online health information. In this study, Zhang’s team analyzed health subject clusters based on the users’ transaction log data on the health subject directory of health information websites. However, which facets facilitate access to online health information for users has not yet been fully investigated. When designing a website using a faceted approach, it is critical to focus on user need, and user research is key in achieving this (La Barre 2006b, 155). Investigating which facets are preferred, easy-to-use and useful in accessing online health information is important to make online health information more accessible so that users can take full advantage of health information on the Web.
3.0 Methodology

To answer the identified research questions, this study employed a two-fold methodological approach: analysis of the classification structures of existing health information websites, and an online user survey.

3.1 Classification structure analysis

To analyze the classification structures that are currently used in organizing consumer health information on websites, the authors selected 20 websites that are the most frequently used consumer health information sites in South Korea. The top 20 sites were identified based on the usage statistics from rankey.com, which offers site usage statistics for websites in various fields in South Korea. The sample health information websites were visited in October 2011. The ranking numbers, the names of the websites, and their URLs are provided in Table 1.

| Ranking | Website Name | URL |
|---|---|---|
| 1 | Health Chosun | health.chosun.com |
| 2 | Kormedi Dot Com | www.kormedi.com |
| 3 | Hidoc | www.hidoc.co.kr |
| 4 | Gungang IN | hi.nhic.or.kr |
| 5 | MK Health | www.mkhealth.co.kr |
| 6 | Joins MSN | healthcare.joinsmsn.com |
| 7 | Vitamin MD | www.vitaminmd.co.kr |
| 8 | eHospital | www.clinic.co.kr |
| 9 | Medcity | www.medcity.com |
| 10 | Doctor Korea | duser.doctorkorea.com |
| 11 | Think Medi | www.thinkmedi.com/ |
| 12 | 365 Homecare | www.365homecare.com/main.html |
| 13 | Health Korea | health.korea.com/ |
| 14 | Cy Medi | www.cymedi.com/ |
| 15 | Wise Women | www.wisewoman.co.kr/ |
| 16 | Health MBC | www.healthmbc.com/ |
| 17 | Korean Medi | www.koreanmedi.com/html/ |
| 18 | My Doctor | www.mydr.or.kr/ |
| 19 | Doctor | www.doctor.co.kr/ |
| 20 | Health Hankyung | health.hankyung.com/ |

Table 1. Sample websites

As consumer health information websites often include information and materials other than health information, such as advertisements, in each site, web pages that specifically contain health information were analyzed for this study. Often, these were under the name of “health/medical information,” “health/medical information center” or “health/medical encyclopedia.” To identify facets that are used in organizing online health information, the researchers developed an initial coding scheme that includes definitions of possible facets and examples of each facet. Then, this coding scheme was further developed by carefully going through the sample websites. In particular, since each website used different terminologies to represent each facet, it was necessary for the authors to develop categories which could group similar facets, and select a terminology for each category in a way that best represented facets in each category. To develop these facet categories, the authors started with a few examples of facets that are used in selected websites, and then continued to review more facets that are used in sample websites. In this process, some categories were eliminated, new categories emerged and some categories were combined or split. Then, the authors gave definitions to each facet category so that facets can be easily categorized for analysis. In addition, examples for each category were included. This became a final coding scheme for facet analysis in current health information websites for this study. The final version of the coding scheme is presented in Table 2.


| Facet Category | Definition | Examples |
|---|---|---|
| Diseases & Conditions | Name of a disease, or a medical condition associated with a specific symptom. | Breast cancer, Depression, HIV/AIDS, High blood pressure |
| Medical Specialty | The branch of medicine. | Otolaryngology, Dentistry, Pediatrics |
| Body Part | Part or organ of a body. | Head, Feet, Eyes |
| Diagnostics | Name of test or process which attempts to identify a possible disease or disorder. | Allergy tests, MRI, Pregnancy test |
| Treatments & Procedures | Name of remediation of a disease or disorder. | Aortic valve surgery, Joint replacement, Radiation therapy |
| Nutrition | Name of nutrients. | Vitamins, Magnesium |
| Age | Specific age group. | Children’s health, Teens’ health, Seniors’ health |
| Gender | Specific gender. | Men’s health, Women’s health |
| Complementary & Alternative Medicine | Name of healing practice that does not fall within the realm of conventional medicine. | Oriental medicine, Osteoarthritis Alternatives |

Table 2. Coding scheme for website analysis

Based on this coding scheme, facets in the sample websites were analyzed. In particular, for each site, the researchers analyzed:

1. Facet categories used in each site
2. The number of facet categories
3. The number of sub-facets in each facet
4. The minimum, maximum, and average number of facets
5. The minimum, maximum, and average number of sub-facets
6. The depth of classification structures

Here, “sub-facet” refers to the sub-categories under each facet category. For example, under the “Gender” facet category, most websites included “women” and “men” as sub-facets. The depth of the classification structure refers to the number of hierarchical levels within each facet category. For instance, if a facet included a sub-facet and a sub-sub-facet, the depth of the classification structure was recorded as “3,” as there were three levels of hierarchy in this facet.
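The depth measure defined above can be illustrated with a short computation. The following is a minimal Python sketch, assuming each facet is represented as a nested dictionary of sub-facets; the function name `classification_depth` and the sample "Gender" hierarchy (in particular its sub-sub-facets) are hypothetical and are not taken from the study.

```python
def classification_depth(children):
    """Return the number of hierarchy levels for one facet.

    `children` maps each sub-facet label to its own (possibly empty)
    mapping of sub-sub-facets. A facet without sub-facets has depth 1.
    """
    if not children:
        return 1
    return 1 + max(classification_depth(sub) for sub in children.values())


# Hypothetical example: facet -> sub-facet -> sub-sub-facet (three levels).
gender = {
    "Women": {"Pregnancy": {}, "Menopause": {}},
    "Men": {},
}

print(classification_depth(gender))  # 3
print(classification_depth({}))      # 1 (a facet with no sub-facets)
```

The number of sub-facets of a facet in this representation is simply `len(children)`, which corresponds to the sub-facet counts the researchers recorded per facet.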

3.2 Online survey

An online user survey was conducted using a questionnaire sent by email to 96 participants representing the general public in South Korea. Convenience sampling was employed, and gender and age were balanced as much as possible to better represent the general public. Twelve incomplete responses were excluded from the analysis, leaving 84 valid responses. Among those 84 participants, 37 (44%) were male and 47 (56%) were female. By age group, 21 participants (25%) were in their 20s, 22 (26%) in their 30s, 14 (17%) in their 40s, 14 (17%) in their 50s, and 13 (15%) in their 60s.

The questionnaire was designed to investigate users’ preferences for, and perceptions of the ease-of-use and usefulness of, the identified facets currently used in consumer health information sites in Korea. Participants were asked to rate their preference for, and the perceived ease-of-use and usefulness of, the selected facets on a five-point scale ranging from 1 to 5. The data were analyzed with descriptive statistics, including means and standard deviations, to identify which facets are preferred, easy-to-use and useful. In addition, a Kendall’s tau-b correlation test was conducted to examine correlations among the three perceptions of facets: preference, ease-of-use, and usefulness.

4.0 Results

4.1 Facets used in current consumer health information sites

By analyzing the organization of 20 consumer health information sites, the following nine facet categories were identified: 1) diseases & conditions (e.g., breast cancer, depression); 2) medical specialty (e.g., otolaryngology, dentistry); 3) body part (e.g., head, feet); 4) diagnostics (e.g., allergy tests); 5) treatments & procedures (e.g., adrenal surgery, aortic valve surgery); 6) nutrition (e.g., vitamins, magnesium); 7) age (e.g., children’s health, seniors’ health); 8) gender (e.g., men’s health, women’s health); and 9) complementary & alternative medicine (e.g., oriental medicine). Table 3 shows the identified facet categories and indicates which websites included each facet category. In this table, websites are identified by their ranking numbers. Facets categorized as “Others” are unique facets that were each used in only one website. Those facets included emergency, drug, topic, and special users.

Among the nine facet categories, “diseases & conditions” was the most frequently used to provide an access point to online health information. Of the 20 health information websites, 10 (50%) used facets that can be categorized as “diseases & conditions.” In addition, “medical specialty” (30%) and “body part” (20%) were also frequently used.

The number of facet categories, the number of sub-facets, and the depth of the classification structure for each website, along with their mean, median and range, are presented in Table 4. In this table, websites are identified by their ranking numbers.

As shown in Table 4, the number of facets used in each site ranged from 1 to 5, with a mean of 2.2 and a median of 2. Thus, only a few facets were being used to provide access to health information on the websites. The number of sub-facets per facet ranged from 0 to 51, so it varied widely. However, the mean of 9 and the median of 5 indicate that a few facets with many sub-facets stretched the range. In fact, among the 45 facets, only 16 (36%) had 10 or more sub-facets, which means that 29 (64%) had fewer than 10. As shown in the table, one website did not use any sub-facets. In this case, if a user clicks one of the facets, such as “Diseases & Conditions,” the user encounters a list of thousands of health information topics that are not organized by any sub-facets. At the other extreme, one website organized its “Nutrition” facet into 51 sub-facets, each of which was a nutrient such as “vitamin B1,” “vitamin C,” or “vitamin K.” The depth of the classification structure ranged from 1 to 5, with a mean of 2.4 and a median of 2, which indicates that most sites have multiple levels of hierarchy for each facet. However, there was only one facet category in which the depth was 5, and most facet categories did not have a deep classification structure. In addition, there appeared to be no differences in the depth of classification structure among the facet categories.

4.2 Users’ perception of preference, ease-of-use and usefulness

First, user preference was investigated using a five-point scale. Table 5 presents users’ ratings of which facets they prefer to use to access health information in a consumer health information site. The results indicate that users preferred “diseases & conditions” (4.11), “body part” (3.76), and “nutrition” (3.33) the most, ranked 1st, 2nd, and 3rd respectively. In contrast, “alternative medicine” (2.62) and “gender” (2.67) were the least preferred facets. The results reveal that widely used facets, such as “diseases & conditions” and “body part,” were also preferred by users. However, facets based on users’ demographic characteristics, such as “age” and “gender,” were relatively less preferred. Interestingly, “alternative medicine” was the least preferred facet even though oriental medicine is widely available in Korea.

Second, the survey participants were asked to rate perceived ease-of-use. Similar to the previous result, “diseases & conditions” (4.11), “body part” (3.81), and “nutrition” (3.21) were ranked 1st, 2nd, and 3rd respectively. “Medical specialty” (3.18) and “diagnostics” (3.18) tied for 4th in terms of ease-of-use, followed by “gender” (3.08) and “age” (3.05), ranked 6th and 7th respectively. However, users thought that “treatments & procedures” (2.83) and “alternative medicine” (2.64) would be less easy-

| Facet category | Number of websites marked (of 20) |
|---|---|
| Diseases & Conditions | 10 |
| Medical Specialty | 6 |
| Body Part | 4 |
| Diagnostics | 3 |
| Treatments & Procedures | 3 |
| Nutrition | 3 |
| Age | 3 |
| Gender | 3 |
| Complementary & Alternative Medicine | 2 |
| Others | 8 |

Table 3. Facets used in 20 top-ranked consumer health information websites.


| Sample Website | Facet Categories | Number of Facet Categories | Number of Sub-facets | Depth of classification structure |
|---|---|---|---|---|
| 1 | Diseases & Conditions | 3 | 20 | 2 |
| | Medical Specialty | | 23 | 2 |
| | Diseases & Conditions | | 2 | 2 |
| 2 | Diseases & Conditions | 3 | 20 | 2 |
| | Medical Specialty | | 22 | 2 |
| | Body Part | | 11 | 3 |
| 3 | Diseases & Conditions | 4 | 0 | 1 |
| | Diagnostics | | 0 | 1 |
| | Treatments & Procedures | | 0 | 1 |
| | Nutrition | | 0 | 1 |
| 4 | Others (Korean alphabets) | 1 | 14 | 2 |
| 5 | Medical Specialty | 1 | 10 | 2 |
| 6 | Body Part | 1 | 7 | 3 |
| 7 | Diseases & Conditions | 5 | 14 | 1 |
| | Diagnostics | | 14 | 1 |
| | Treatments & Procedures | | 2 | 2 |
| | Nutrition | | 51 | 1 |
| | Others (drug) | | 2 | 2 |
| 8 | Treatments & Procedures | 1 | 5 | 1 |
| 9 | Diseases & Conditions | 3 | 5 | 2 |
| | Age | | 2 | 2 |
| | Gender | | 2 | 2 |
| 10 | Body Part | 5 | 12 | 3 |
| | Nutrition | | 3 | 2 |
| | Age | | 2 | 3 |
| | Others (topic) | | 19 | 2 |
| | Others (health functional food) | | 1 | 2 |
| 11 | Medical Specialty | 1 | 25 | 2 |
| 12 | Age | 2 | 1 | 2 |
| | Gender | | 2 | 2 |
| 13 | Diseases & Conditions | 4 | 3 | 4 |
| | Diagnostics | | 3 | 4 |
| | Others (subject) | | 3 | 4 |
| | Others (theme) | | 8 | 3 |
| 14 | Diseases & Conditions | 3 | 1 | 4 |
| | Complementary & Alternative Medicine | | 1 | 4 |
| | Others (medicine) | | 3 | 5 |
| 15 | Diseases & Conditions | 1 | 9 | 4 |
| 16 | Medical Specialty | 2 | 17 | 3 |
| | Complementary & Alternative Medicine | | 7 | 3 |
| 17 | Medical Specialty | 1 | 14 | 2 |
| 18 | Medical Specialty | 1 | 23 | 3 |
| 19 | Others (professional clinic) | 1 | 8 | 3 |
| 20 | Diseases & Conditions | 2 | 8 | 3 |
| | Others (medical information) | | 5 | 2 |
| Mean | | 2.2 | 9.0 | 2.4 |
| Median | | 2 | 5 | 2 |
| Range | | 4 | 51 | 4 |

Table 4. Facet categories, sub-facets, and the depth of classification structure
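As a check on the summary rows, the per-facet values in Table 4 can be re-tabulated. The following is a minimal Python sketch, assuming the (sub-facet count, depth) pairs below are transcribed correctly from the table; the names `TABLE_4` and `summarize` are illustrative only and not part of the study.

```python
from statistics import mean, median

# (number of sub-facets, depth) for every facet of each sample website,
# keyed by the website's ranking number, as read from Table 4.
TABLE_4 = {
    1: [(20, 2), (23, 2), (2, 2)],
    2: [(20, 2), (22, 2), (11, 3)],
    3: [(0, 1), (0, 1), (0, 1), (0, 1)],
    4: [(14, 2)],
    5: [(10, 2)],
    6: [(7, 3)],
    7: [(14, 1), (14, 1), (2, 2), (51, 1), (2, 2)],
    8: [(5, 1)],
    9: [(5, 2), (2, 2), (2, 2)],
    10: [(12, 3), (3, 2), (2, 3), (19, 2), (1, 2)],
    11: [(25, 2)],
    12: [(1, 2), (2, 2)],
    13: [(3, 4), (3, 4), (3, 4), (8, 3)],
    14: [(1, 4), (1, 4), (3, 5)],
    15: [(9, 4)],
    16: [(17, 3), (7, 3)],
    17: [(14, 2)],
    18: [(23, 3)],
    19: [(8, 3)],
    20: [(8, 3), (5, 2)],
}


def summarize(values):
    """Mean, median and range (max minus min) of a list of numbers."""
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
    }


facets_per_site = [len(facets) for facets in TABLE_4.values()]
sub_facets = [count for facets in TABLE_4.values() for count, _ in facets]
depths = [depth for facets in TABLE_4.values() for _, depth in facets]

# Facets that have ten or more sub-facets.
ten_or_more = sum(1 for count in sub_facets if count >= 10)

for label, values in [("facets per site", facets_per_site),
                      ("sub-facets per facet", sub_facets),
                      ("depth per facet", depths)]:
    print(label, summarize(values))
print("facets in total:", len(sub_facets), "| with 10+ sub-facets:", ten_or_more)
```

Under this transcription, the computed values agree with the Mean, Median and Range rows of Table 4 after rounding to one decimal place (the facets-per-site mean is 45/20 = 2.25, reported as 2.2).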


| Facet | Mean | Rank | Standard deviation |
|---|---|---|---|
| Diseases & Conditions | 4.11 | 1 | 0.92 |
| Medical Specialty | 3.02 | 6 | 1.09 |
| Body Part | 3.76 | 2 | 0.90 |
| Diagnostics | 3.19 | 4 | 1.12 |
| Treatments & Procedures | 3.08 | 5 | 1.12 |
| Nutrition | 3.33 | 3 | 1.13 |
| Age | 3.00 | 7 | 1.11 |
| Gender | 2.67 | 8 | 1.13 |
| Alternative Medicine | 2.62 | 9 | 1.12 |

Table 5. Users’ perception of preference of facets

| Facet | Mean | Rank | Standard deviation |
|---|---|---|---|
| Diseases & Conditions | 4.11 | 1 | 0.81 |
| Medical Specialty | 3.18 | 4 | 0.89 |
| Body Part | 3.81 | 2 | 0.90 |
| Diagnostics | 3.18 | 4 | 1.05 |
| Treatments & Procedures | 2.83 | 8 | 1.14 |
| Nutrition | 3.21 | 3 | 1.12 |
| Age | 3.05 | 7 | 0.99 |
| Gender | 3.08 | 6 | 1.01 |
| Alternative Medicine | 2.64 | 9 | 0.93 |

Table 6. Users’ perceived ease-of-use of facets

to-use. Notably, “alternative medicine” also ranked lowest for ease-of-use. Users thus regarded “alternative medicine” as both the least preferred and the least easy-to-use facet when accessing health information on the Web. Table 6 displays users’ ratings of perceived ease-of-use of the selected facets.

| Facet | Mean | Rank | Standard deviation |
|---|---|---|---|
| Diseases & Conditions | 4.27 | 1 | 0.72 |
| Medical Specialty | 3.40 | 4 | 0.84 |
| Body Part | 3.60 | 2 | 0.84 |
| Diagnostics | 3.48 | 3 | 0.96 |
| Treatments & Procedures | 3.35 | 5 | 1.05 |
| Nutrition | 3.33 | 6 | 1.09 |
| Age | 2.98 | 7 | 0.92 |
| Gender | 2.83 | 8 | 1.00 |
| Alternative Medicine | 2.76 | 9 | 1.03 |

Table 7. Users’ perceived usefulness of facets

Third, we investigated which facets users would consider useful when accessing health information online. Table 7 presents users’ ratings of perceived usefulness of the selected facets. The two most useful facets were “diseases & conditions” (4.27) and “body part” (3.60). These two facets were also rated the most preferred and easy-to-use by the participants. In addition, the standard deviations for these two facets were relatively low (“diseases & conditions” s.d. = 0.72; “body part” s.d. = 0.84), which suggests that participants were more consistent in rating these facets as useful access points. “Diagnostics” (3.48; 3rd) and “medical specialty” (3.40; 4th) were also perceived as highly useful facets in using consumer-level health information. “Nutrition” (3.33), which ranked 3rd in both preference and ease-of-use, ranked 6th in terms of usefulness. Users preferred the “nutrition” facet and considered it easy-to-use, but regarded it as comparatively less useful. Both “age” (2.98) and “gender” (2.83) were rated relatively low in their usefulness as a

| Facet | Preference: Mean | Preference: Rank | Ease-of-use: Mean | Ease-of-use: Rank | Usefulness: Mean | Usefulness: Rank |
|---|---|---|---|---|---|---|
| Diseases & Conditions | 4.11 | 1 | 4.11 | 1 | 4.27 | 1 |
| Medical Specialty | 3.02 | 6 | 3.18 | 4 | 3.40 | 4 |
| Body Part | 3.76 | 2 | 3.81 | 2 | 3.60 | 2 |
| Diagnostics | 3.19 | 4 | 3.18 | 4 | 3.48 | 3 |
| Treatments & Procedures | 3.08 | 5 | 2.83 | 8 | 3.35 | 5 |
| Nutrition | 3.33 | 3 | 3.21 | 3 | 3.33 | 6 |
| Age | 3.00 | 7 | 3.05 | 7 | 2.98 | 7 |
| Gender | 2.67 | 8 | 3.08 | 6 | 2.83 | 8 |
| Alternative Medicine | 2.62 | 9 | 2.64 | 9 | 2.76 | 9 |

Table 8. Users’ perceived preference, ease-of-use, and usefulness of facets


facet. Once again, “alternative medicine” (2.76) was perceived as the least useful facet when using consumer health information.

This study also compared users’ ratings across the three different perceptions: preference, ease-of-use, and usefulness. As shown in Table 8, “diseases & conditions” ranked first in all three cases, with mean ratings above 4 out of 5. That is, “diseases & conditions” is considered the most preferred, easy-to-use, and useful facet for organizing information in consumer health information sites. “Body part” ranked second in all three types of perception investigated in this study; its ease-of-use score (3.81) was relatively high compared to its preference (3.76) and usefulness (3.60) scores. The “diagnostics” facet also ranked relatively high in all three perceptions. Users considered “nutrition” preferable and easy-to-use when using consumer health information, but perceived it as less useful, ranking it 6th. On the other hand, the “treatments & procedures” facet ranked 5th in both preference and usefulness, but participants thought it less easy-to-use when accessing health information. The “age” and “gender” facets were relatively less preferred and considered less useful. The “alternative medicine” facet was the least preferred, least easy-to-use, and least useful of the facets examined in this study.
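The rank columns discussed above can be derived mechanically from the mean ratings. The sketch below is one way to do so in Python, assuming the ranks follow standard competition ranking, in which tied means share the better rank and the next rank is skipped (this is consistent with the two facets tied at 4th for ease-of-use, followed by 6th). The names `MEANS` and `competition_ranks` are illustrative only.

```python
# Mean ratings from Table 8: (preference, ease-of-use, usefulness).
MEANS = {
    "Diseases & Conditions": (4.11, 4.11, 4.27),
    "Medical Specialty": (3.02, 3.18, 3.40),
    "Body Part": (3.76, 3.81, 3.60),
    "Diagnostics": (3.19, 3.18, 3.48),
    "Treatments & Procedures": (3.08, 2.83, 3.35),
    "Nutrition": (3.33, 3.21, 3.33),
    "Age": (3.00, 3.05, 2.98),
    "Gender": (2.67, 3.08, 2.83),
    "Alternative Medicine": (2.62, 2.64, 2.76),
}


def competition_ranks(scores):
    """Rank scores from highest to lowest; tied scores share the better rank."""
    ordered = sorted(scores.values(), reverse=True)
    # list.index returns the first position of a value, so ties get the same rank.
    return {name: ordered.index(value) + 1 for name, value in scores.items()}


preference = competition_ranks({f: m[0] for f, m in MEANS.items()})
ease_of_use = competition_ranks({f: m[1] for f, m in MEANS.items()})
usefulness = competition_ranks({f: m[2] for f, m in MEANS.items()})

for facet in MEANS:
    print(f"{facet:26s} {preference[facet]} {ease_of_use[facet]} {usefulness[facet]}")
```

Run on the Table 8 means, this reproduces the rank columns of that table, including the tie at 4th place for ease-of-use and the 6th place of “nutrition” for usefulness.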

A Kendall’s tau-b correlation analysis indicates that all three perceptions (ease-of-use, preference, and usefulness) are closely related to each other (p<0.05). The correlation coefficients were high, exceeding 0.6, for all three pairs. In particular, the correlation between preference and usefulness was the highest (τb = .778, p<0.01), which implies that useful facets would be preferred by users. In contrast, the correlation between ease-of-use and usefulness was relatively low (τb = .648, p<0.05). The correlations are displayed in Table 9.
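The article does not state which data the correlation test was run on. As a sketch under the assumption that it was computed over the nine facet-level mean ratings in Table 8, a plain implementation of Kendall's tau-b gives coefficients that round to the values reported in Table 9. The function `kendall_tau_b` and the variable names are illustrative; the sketch computes the coefficient only and does not attempt the significance test.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b: (C - D) / sqrt((n0 - ties_x) * (n0 - ties_y)).

    C and D count concordant and discordant pairs, n0 is the number of
    pairs, and ties_x / ties_y count the pairs tied in x and in y.
    """
    concordant = discordant = ties_x = ties_y = 0
    pairs = list(combinations(zip(x, y), 2))
    for (x1, y1), (x2, y2) in pairs:
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n0 = len(pairs)
    denominator = sqrt((n0 - ties_x) * (n0 - ties_y))
    if denominator == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denominator


# Facet means from Table 8, in the table's row order.
preference = [4.11, 3.02, 3.76, 3.19, 3.08, 3.33, 3.00, 2.67, 2.62]
ease_of_use = [4.11, 3.18, 3.81, 3.18, 2.83, 3.21, 3.05, 3.08, 2.64]
usefulness = [4.27, 3.40, 3.60, 3.48, 3.35, 3.33, 2.98, 2.83, 2.76]

print(round(kendall_tau_b(preference, ease_of_use), 3))  # 0.761
print(round(kendall_tau_b(preference, usefulness), 3))   # 0.778
print(round(kendall_tau_b(ease_of_use, usefulness), 3))  # 0.648
```

The only tie in these data is the pair of equal ease-of-use means (3.18), which is why the two coefficients involving ease-of-use use the tie-corrected denominator.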

| | Preference | Ease-of-use |
|---|---|---|
| Ease-of-use | .761** | |
| Usefulness | .778** | .648* |

Table 9. Kendall’s tau-b correlation among three perceptions of facets (* p<0.05; ** p<0.01)

5.0 Discussion and Conclusions

We investigated the organization of online health information by analyzing the facets currently used in organizing 20 top-ranked consumer health information websites in South Korea. Each site offered different types and numbers of facets as access points to online health information, and the sites often used varying terminology for similar facets. In this study, by grouping similar facets into categories, nine facets were identified. Among the nine facets, “diseases & conditions” and “medical specialty” were the most widely used among the websites. On average 2.2 facets were provided, which means that only a small number of facets are implemented as access points to online health information in the sites. Further analysis showed that the number of sub-facets varied greatly across websites. The average depth of the organizational structure was 2.4, which means that the hierarchy for each facet is not very deep. Overall, the analysis revealed little consistency in organizational structure across the selected online health information sites, which can make finding health information difficult for users.

Based on the identification of facets, this study surveyed users’ perceptions of those facets with regard to preference, ease-of-use, and usefulness. The survey results showed that, from the users’ perspective, “diseases & conditions” was the most preferred, easy-to-use, and useful facet when accessing health information. The “body part” facet, covering names of organs or body parts, was also preferred and perceived as easy-to-use and useful. Users showed a preference for the “nutrition” facet and also thought it would be easy to use; however, they perceived it as less useful. Considering that the most widely used facets in health information websites were “diseases & conditions” and “medical specialty,” current health information websites seemed to reflect users’ preferences to some extent; however, a more thorough and deeper understanding is necessary. On the other hand, the two key demographic facets, “age” and “gender,” were rated relatively low on preference, ease-of-use, and usefulness. Even though these two facets serve as basic access points, users were less likely to prefer them when using health information. Finally, “alternative medicine” was the least preferred, least easy-to-use, and least useful facet according to users’ perceptions. Even though Korean health information sites include a fair amount of oriental and alternative medicine information, users were less likely to prefer this facet when accessing online health information.

These findings have direct implications for organizing health information. For example, when designing a consumer health information site, “diseases & conditions” and “body part” should be given priority over other facets; these two facets should be the basic access points that help users reach complex health information. “Diagnostics” and “nutrition” should also be emphasized in guiding users to the information they want. The “treatments & procedures” facet was also widely used, and users would benefit from facet categories related to “treatments & procedures.” However, users felt that the “treatments & procedures” facet would be relatively difficult to use. Therefore, users should be helped to use this type of facet more easily by describing treatments and procedures in more user-oriented terms. La Barre (2006b, 187) also emphasized the importance of selecting terminology that is familiar to the user. Although significant correlations were observed, there were gaps among preference, ease-of-use, and usefulness. This implies that different aspects of user perceptions should be considered in designing a faceted structure for a consumer-level health information site. When consumer health information websites are organized and designed based on a thorough understanding of the user’s perspective, health information on websites will not only become more accessible, but will likely also be more consistent across websites, which will further support accessing and using health information on the Web.

References

Berland, Gretchen, Marc Elliott, Leo Morales et al. 2001. “Health Information on the Internet: Accessibility, Quality and Readability in English and Spanish.” Journal of the American Medical Association 285: 2612-21.

Broughton, Vanda. 2002. “Facet Analytical Theory as a Basis for Knowledge Organization Tool in a Subject Portal.” In Proceedings of the 7th International ISKO Conference 10-13 July 2002 Granada, Spain, edited by Maria Lopez-Huertas. Advances in Knowledge Organization 8. Würzburg: Ergon Verlag, 135-42.

Crystal, Abe and Jane Greenberg. 2006. “Relevance Criteria Identified by Health Information Users during Web Searches.” Journal of the American Society for Information Science and Technology 57: 1368-82.

Choi, Yunseon. 2009. “Making Faceted Classification More Acceptable on the Web: A Comparison of Faceted Classification and Ontologies.” In Proceedings of the American Society for Information Science and Technology 45: 1-6.

Cline, Rebecca and Katherine Haynes. 2001. “Consumer Health Information Seeking on the Internet: The State of the Art.” Health Education Research 16: 671-92.

Cotten, Shelia and Sipi Gupta. 2004. “Characteristics of Online and Offline Health Information Seekers and Factors that Discriminate Between Them.” Social Science & Medicine 59: 1795-806.

Dutta-Bergman, Mohan. 2004. “The Impact of Completeness and Web Use Motivation on the Credibility of E-Health Information.” Journal of Communication 54: 253-69.

Eastin, Matthew. 2001. “Credibility Assessments of Online Health Information: The Effects of Source Expertise and Knowledge of Content.” Journal of Computer-Mediated Communication 6, no. 4. http://jcmc.indiana.edu/vol6/issue4/Eastin.html.

Ellis, David and Ana Vasconcelos. 2000. “The Relevance of Facet Analysis for World Wide Web Subject Organization and Searching.” Journal of Internet Cataloging 2: 97-113.

Eriksson-Backa, Kristina. 2003. “Who Uses the Web as a Health Information Source?” Health Informatics Journal 9: 93-101.

Eysenbach, Gunther, John Powell, Oliver Kuss and Eun-Ryoung Sa. 2002. “Empirical Studies Assessing the Quality of Health Information for Consumers on the World Wide Web.” Journal of the American Medical Association 20: 2691-700.

Fox, Susannah. 2006. Online Health Search 2006 (Pew Internet & American Life Project, October 29 2006). http://www.pewinternet.org/Reports/2006/Online-Health-Search-2006.aspx.

Fox, Susannah and Lee Rainie. 2000. The Online Health Care Revolution: How the Web Helps Americans Take Better Care of Themselves. Washington, DC: Pew Charitable Trusts.

Given, Lisa, Stan Ruecker, Heather Simpson, Elizabeth Sadler and Andrea Ruskin. 2007. “Inclusive Interface Design for Seniors: Image-browsing for a Health Information Context.” Journal of the American Society for Information Science and Technology 58: 1610-7.

Hesse, Bradford, David Nelson, Gary Kreps, Robert Croyle, Neeraj Arora, Barbara Rimer and Viswanath Kasisomayajula. 2005. “Trust and Sources of Health Information.” Archives of Internal Medicine 165: 2618-24.

Hudon, Michele. 2010. “Using Facets to Classify and Access Digital Resources: Proposal and Example.” In Digital libraries, edited by F. Papy, 163-85. London: ISTE Ltd.

Kwak, Chul-Wan. 2001. “A Study of Classification Systems in the Internet Shopping Malls.” Journal of Korean Society for Information Management 18: 202-15.

La Barre, Kathryn. 2006a. “A Multi-faceted View: Use of Facet Analysis in the Practice of Website Organization and Access.” In Proceedings of the 9th International ISKO Conference 5-7 July 2006 Vienna, Austria, edited by G. Budin, C. Swertz and K. Mitgutsch. Advances in Knowledge Organization 10. Würzburg: Ergon Verlag, 359-66.

La Barre, Kathryn. 2006b. “The Use of Faceted Analytico-Synthetic Theory as Revealed in the Practice of Website Construction and Design.” Doctoral dissertation, Indiana University.


Priss, Uta and Elin Jacob. 1999. “Utilizing Faceted Structures for Information Systems Design.” In Knowledge: Creation, Organization and Use: Proceedings of the American Society for Information Science Oct. 31 – Nov. 4 1999, Washington, D.C. 1-12.

Russell-Rose, Tony and Tyler Tate. 2013. Designing the Search Experience: The Information Architecture of Discovery. Waltham, MA: Morgan Kaufmann Publishers.

Stvilia, Besiki, Lorri Mon and Yong Jeong Yi. 2009. “A Model for Online Consumer Health Information Quality.” Journal of the American Society for Information Science and Technology 60: 1781-91.

Ranganathan, Shiyali R. 1967. Prolegomena to Library Science. New York, NY: Asia Publishing.

Robins, David, Jason Holmes and Mary Stansbury. 2010. “Consumer Health Information on the Web: The Relationship of Visual Design and Perceptions of Credibility.” Journal of the American Society for Information Science and Technology 61: 13-29.

Tang, Muh-Chyun. 2007. “Browsing and Searching in a Faceted Information Space: A Naturalistic Study of PubMed Users’ Interaction with a Display Tool.” Journal of the American Society for Information Science and Technology 58: 1998-2006.

Thobaben, Marshelle. 2002. “Accessibility, Quality, and Readability of Health Information on the Internet: Implication for Home Health Care Professionals.” Home Health Care Management & Practice 14: 295-6.

Van der Walt, Marthinus. 2004. “A Classification Scheme for the Organization of Electronic Documents in Small, Medium, and Micro Enterprises (SMMEs).” Knowledge Organization 31: 26-38.

Warner, Dorothy and Drew Procaccino. 2004. “Toward Wellness: Women Seeking Health Information.” Journal of the American Society for Information Science and Technology 55: 709-30.

Yoo, Yeong-Jun. 2004. “A Study on Organizing the Web Using Facet Analysis.” Korean Biblia Society for Library and Information Science 15: 23-41.

Ybarra, Michele and Michael Suman. 2008. “Reasons, Assessments and Actions Taken: Sex and Age Differences in Uses of Internet Health Information.” Health Education Research 23: 512-21.

Zhang, Jin, Lu An, Tao Tang and Hong Yi. 2009. “Visual Health Subject Directory Analysis Based on Users’ Traversal Activities.” Journal of the American Society for Information Science and Technology 60: 1977-94.

Zins, Chaim and David Guttmann. 2000. “Structuring Web Bibliographic Resources: An Exemplary Subject Classification Scheme.” Knowledge Organization 29: 20-8.


Brief Communication: The 21st Edition (2014) of the Sears List of Subject Headings: A Brief Introduction

M.P. Satija

Department of Library & Information Science, Guru Nanak Dev University, Amritsar, 143005, India

M.P. Satija recently retired as Professor, Head and then Emeritus Fellow from the Department of Library and Information Science, Guru Nanak Dev University, India. He is a member of the Advisory Board of the UDC Consortium and founder patron of the Satija Research Foundation in Library Science (SRFLIS).

Satija, M.P. Brief Communication: The 21st Edition (2014) of the Sears List of Subject Headings: A Brief Introduction. Knowledge Organization. 42(3), 187-189. 2 references.

Abstract: States in brief the new features of the recently released 21st edition of the Sears List of Subject Headings. Introduces its new editor, Barbara A. Bristow, and the new publisher, EBSCO Information Services, which recently acquired the H.W. Wilson Company, Sears’ founding publisher since 1923. Names a few new subject headings in areas like science, technology, engineering and medicine (STEM). This edition adds 250 new headings, for a total of 10,000 preferred headings meant for small and medium-sized libraries. Critically examines inconsistencies in a few headings. States the additional features of the online edition. Concludes that the new edition maintains its stellar reputation as a handy list of general subject headings.

Received: 13 June 2015; Accepted: 15 June 2015

Keywords: subject headings, Sears List of Subject Headings, EBSCO

1.0 A New Edition

The first since the retirement in December 2011 of its long-term and formidable editor, Dr. Joseph Miller (ed. 15, 1994 to ed. 20, 2010), this new edition is slightly late by the three-year revision cycle followed since Sears-14 (1991). In the transitional period after his retirement, Ms. Eve-Marie Miller took over the editorship. The period immediately after Dr. Miller’s retirement was a bit uncertain; nevertheless, the ship of Sears is now on an even keel. The new edition has the following bibliographical details:

Sears List of Subject Headings, 21st ed. Barbara A. Bristow, Chief editor; Christi Showman Farrar, Associate editor. Ipswich MA: H.W. Wilson Co., a Division of EBSCO Information Services (Print edition by Grey House Publishing, Amenia, NY), 2014. liii, 946p. ISBN 978-1-61925-190-8 (Hb).

This edition by the new editor, Ms. Barbara A. Bristow, is also the first to be published by EBSCO, which in 2011 acquired Sears’ founding publisher and proprietor, the H.W. Wilson Co.; since January 2012 Sears has been edited from the EBSCO office at Ipswich, MA. EBSCO is a well-known research content provider and has more resources in the form of databases, e-books and e-journals. This acquisition provides Sears with a larger and wider literary warrant base to draw upon. One wonders whether this acquisition will prove as beneficial as OCLC’s acquisition was to the research, growth and popularity of the Dewey Decimal Classification (DDC). The time seems auspicious for Sears.


The new editor, Ms. Bristow, worked at H.W. Wilson as indexer and associate editor of the Humanities Index and was also editor of the OmniFiles at Wilson. Currently she is Senior Taxonomist of the Taxonomy Team at EBSCO Information Services. A long-term colleague of the outgoing editor Dr. Miller, she was the associate editor of the 19th edition of Sears (2007). During her tenure she also worked part-time in the Wilson Subject Authority Department and is thus experienced in both indexing and subject headings work. No wonder she toes the line of the previous editor.

Grey House Publishing, Amenia, NY,¹ handles all print editions for H.W. Wilson under a license agreement with EBSCO Publishing. It is responsible for printing, distributing, marketing and selling the print edition of Sears, and for providing the CIP data, which has been outsourced to the Donohue Group, Inc.² It may be mentioned that in the CIP data the catalogue entry has been made with the title as main access point instead of the original author, Sears, Minnie Earl, 1873-1933. The Sears List is neither a composite book with contributions from many credited authors, nor a serial. Hence a main access point under the title is not justified. The DDC is always entered under Melvil Dewey.

As another change, a Sears Advisory Board of nine members has been constituted for the first time. It held its first meeting in June 2013 on the sidelines of the American Library Association (ALA) annual conference. The board has members from public and school libraries and members who have served on the ALA cataloguing committees. There are plans to expand the board, which meets digitally and in person several times a year to advise on revisions and to keep the list inclusive, accurate, useful and popular and, of course, to maintain its stellar reputation.

The major feature of this edition is the inclusion of more than 250 new subject headings, which reflect the changing needs of library users and the emergence of new subjects in many areas of knowledge. Special attention has been paid to science, technology, engineering and mathematics (STEM) subjects: Functional equations, Near-Earth objects, Streaming technology, Virology. In computer science the newly coined headings are: Brain-computer interfaces, Cloud computing, Information theory, iPad (Computer), Linked data (Semantic web), Computer bulletin board, and Internet forums. The recent global economic recession is reflected in the headings: Business writing, Deflation (Finance), Derivative securities, Economic indicators, Financial risk, International economic integration, Subprime mortgages, Systemic risk, and Target marketing. Next in line is education, with subject headings such as Flipped classrooms, Massive open online courses (MOOCs), and Research-Methodology. The new headings in sports are Aikido, Paralympic games and World Cup (Soccer). There are updates reflecting world events and the changing face of society: Occupy protests movements (started in 2011) and Arab Spring, 2010- . More headings in these areas can, of course, be coined by following the instructions. Happily, changes and replacements affect only 34 headings, which are listed on page xliv.

As ever, the new headings have been suggested by practicing catalogers from libraries of various kinds and sizes, by commercial vendors of bibliographic records, by indexers and, of course, by the indexing staff and subject experts of EBSCO. As a major change, the latest, sixth edition (Lighthall 2001) of the Canadian Companion of the Sears List has been merged indistinguishably (mostly on pages 126-144) with the main Sears, perhaps forever, so that all subject headings can be accessed more efficiently in one volume. This also accounts for this edition having about a hundred pages more than the Sears-20. Thus Canada now has the highest number of subject headings of any country after the U.S.A.

The structure of the headings conforms to the Resource Description and Access (RDA) standards, which have been in use since early 2013. The impact is limited to a few areas, e.g., earlier subject headings such as Bible. N.T. and Bible. O.T. have now been replaced by Bible. New Testament and Bible. Old Testament respectively, along with their derivatives. The capitalization and the corporate and geographic names are also RDA compliant, just as the earlier editions complied with the Anglo-American Cataloguing Rules, 2nd edition (AACR2). There are also minor changes in the filing rules, e.g., ‘United States. Army’ is now interfiled with the subdivisions of the main corporate heading, United States, e.g.,

United States – Armed forces
United States. Army – Uniform
United States – Associations
United States – Atlases
United States – Bibliography

Libraries nevertheless have the option of continuing with their earlier word-by-word alphabetical arrangement. This filing rule, however, seems to have been violated in the case of Bible. New Testament and Bible. Old Testament, which have been filed separately instead of being interfiled, e.g.,

Bible. Old Testament
Bible – Animals
Bible – Antiquities
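The interfiling rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the filing algorithm used by the Sears editors: the helper names `filing_key` and `interfile` are hypothetical, and the sketch assumes that a subdivision dash and the period introducing a subordinate body are both treated as plain element breaks, with elements then compared word by word.

```python
import re

# Assumption: " – " / " - " marks a subdivision and ". " a subordinate body;
# for interfiling, both are treated as the same kind of element break.
_SEPARATORS = re.compile(r"\s+[-–]\s+|\.\s+")


def filing_key(heading: str) -> list[list[str]]:
    """Return a word-by-word sort key that ignores the separator type."""
    elements = _SEPARATORS.split(heading.strip())
    return [element.lower().split() for element in elements if element]


def interfile(headings: list[str]) -> list[str]:
    """File subdivisions and subordinate bodies in one alphabetical sequence."""
    return sorted(headings, key=filing_key)


us_headings = [
    "United States – Bibliography",
    "United States. Army – Uniform",
    "United States – Atlases",
    "United States – Armed forces",
    "United States – Associations",
]

bible_headings = [
    "Bible. Old Testament",
    "Bible – Animals",
    "Bible – Antiquities",
]

for heading in interfile(us_headings) + interfile(bible_headings):
    print(heading)
```

Under these assumptions the United States headings come out in the order printed in the list, while Bible. Old Testament would file after Bible – Antiquities rather than before Bible – Animals, which is the inconsistency noted above.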

As before, the class numbers follow the latest abridged WebDewey.


Some changes have been occasioned by the changing information-seeking patterns of library users, by changing usage of the English language, and by the need to stay in tune with the current vocabulary of each subject, although the list always prefers popular terms over technical ones, provided both are true synonyms. For example, Handicapped has been replaced by People with disabilities and Hispanic Americans by Latinos (U.S.), along with their derivatives, e.g., Handicapped children is now Children with disabilities, and Hispanic American authors is Latino authors.

Though there is nothing special to report about this edition, here and there one may disagree with the headings and the terms under them. For example, under Appetizers (p. 51) the BT is Cooking, which should have been Food instead; Cooking is a process, whereas appetizers and foods are entity nouns. Similarly, under Food (p. 361) the NT Spice should have been given as an RT, since spices are not a kind of food but only related to it, as is the icing on the cake. Under Indian philosophy, which was coined in the Sears-20 on this author’s suggestion (Satija 2008) in a review of the Sears-19, Hindu philosophy should have been given as an NT; the latter is a subset of the basket of Indian philosophies, which also includes Jain and Buddhist philosophies. Regrettably, neither of these two has been listed in the Sears. There is confusion regarding the SH Research-Methodology (p. 747), which has been inadvertently printed in light typeface. The SH Research is a hotchpotch in its relationships with the headings given below it: it is neither exhaustive nor systematic.
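The objections above turn on how broader (BT), narrower (NT) and related (RT) terms must mirror one another. As a rough sketch only, the following Python models the arrangement proposed in this review (not the data printed in the Sears List) and checks that every BT has a matching NT and every RT is reciprocated. The `Heading` record and the `missing_reciprocals` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Heading:
    """Illustrative record for one subject heading (not the Sears data model)."""
    label: str
    bt: set[str] = field(default_factory=set)  # broader terms
    nt: set[str] = field(default_factory=set)  # narrower terms
    rt: set[str] = field(default_factory=set)  # related terms


def missing_reciprocals(headings: dict[str, Heading]) -> list[str]:
    """List every BT/NT/RT reference that is not mirrored at its target."""
    problems = []
    for h in headings.values():
        for broader in h.bt:
            target = headings.get(broader)
            if target is None or h.label not in target.nt:
                problems.append(f"{broader} lacks NT {h.label}")
        for narrower in h.nt:
            target = headings.get(narrower)
            if target is None or h.label not in target.bt:
                problems.append(f"{narrower} lacks BT {h.label}")
        for related in h.rt:
            target = headings.get(related)
            if target is None or h.label not in target.rt:
                problems.append(f"{related} lacks RT {h.label}")
    return sorted(problems)


# The arrangement suggested in the review: Appetizers under Food,
# and Spice related to Food rather than narrower than it.
proposed = {
    "Food": Heading("Food", nt={"Appetizers"}, rt={"Spice"}),
    "Appetizers": Heading("Appetizers", bt={"Food"}),
    "Spice": Heading("Spice", rt={"Food"}),
}

print(missing_reciprocals(proposed))
```

A check of this kind only verifies that references are reciprocal; whether Spice is better treated as an NT or an RT of Food remains an editorial judgment.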

Entries are filed alphabetically by main heading, with subheadings and the organs of a corporate entry arranged in one word-by-word sequence. The web-based version,³ available since 2009, includes all the headings and cross references of the 21st print edition and all the new headings added annually. The search results are ranked by relevance and can be shown in brief or full display. It is also possible to browse the entire list and see established subject terms and cross references in one alphabetical order, or by their DDC numbers in one numerical (systematic) order. More than 1,400 scope notes accompany the 10,000 new, revised and existing headings where needed. MARC authority record delivery is available separately. In fact, the online database is the main version and the print edition is simply its derivative.

A suggestion may be offered. The 27-page (pp. xv-xli) “Principles of the Sears List of Subject Headings” is a concise, clear and illustrated manual not only on the Sears List, but on the general theory and practice of subject headings work. A concept index to it would be highly useful to learners, teachers and practitioners of subject indexing.

In all, the new edition, with its high publication standards and some minor typos notwithstanding, maintains the reputation earned over more than nine decades of service to library users and the information retrieval community.

Notes

1. Grey House Publishing http://www.greyhouse.com/.

2. For the last three decades the Donohue Group http://www.dgiinc.com/ has provided contract services to the community of libraries and archives through consulting, cataloging, retro conversion, authority control and tech support.

3. The web version of the Sears is available at https://www.ebscohost.com/public/sears-list-of-subject-headings.

References

Lighthall, Lynne, ed. 2001. Sears List of Subject Headings: Canadian Companion, 6th ed. New York: The H.W. Wilson Co.

Satija, M.P. 2008. “Book Reviews: Sears List of Subject Headings, 19th ed.” Knowledge Organization 35: 55-56.


Book Reviews

Edited by Melodie J. Fox
Book Review Editor

Wissensorganisation: Entwicklung, Aufgabe, Anwendung, Zukunft by Ingetraut Dahlberg. Würzburg: Ergon Verlag, 2014, 175p. ISBN 978-3-95650-065-7, €28.

The Elements of Knowledge Organization by Richard Smiraglia. Cham: Springer, 2014, 101p. ISBN 978-3-319-09356-7, US$109.

1.0 Introduction

Whenever eminent scholars of a scientific community simultaneously present their book-length introductions to the field, the time has come for self-reflection and for asking a seminal question: What is it all about?

In the fall of 2014, two monographs on the foundations of knowledge organization as a field of study were published, namely, Wissensorganisation–Entwicklung, Aufgabe, Anwendung, Zukunft (Knowledge Organization–Development, Task, Application, Future) by Ingetraut Dahlberg, the founder of the International Society for Knowledge Organization (ISKO), as well as The Elements of Knowledge Organization by Richard Smiraglia, the editor-in-chief of the journal Knowledge Organization.

In his introduction, Smiraglia emphasizes the importance of perspective by presenting some photos of an ancient ruin-like palace taken from different angles and with different levels of detail. Each perspective is necessarily limited to a particular stance within a plethora of possible viewpoints. That is as true for tourists as it is for scientists or philosophers. According to Smiraglia, this challenge of multiple perspectives illustrates the work in knowledge organization, a discipline characterized not only as a bridge across different fields of research but as itself the province of different philosophical points of view.

The critical role of viewpoint also becomes evident if one compares the two approaches presented in the new books by Smiraglia (2014) and Dahlberg (2014b). At first glance, both of them treat some very similar topics, including the history of knowledge organization (KO), the typology of knowledge organization systems (KOS), or the crucial role of concept theory and semiotics as a theoretical foundation for any conceptual ordering system. Nevertheless, the two approaches differ significantly in their philosophical point of view. This paired book review will focus on the authors’ particular understandings of knowledge organization and the implications for the direction of future research questions.

2.0 What is Knowledge Organization?

The terminus technicus “knowledge organization” originates from a small German research group around Dahlberg in the 1970s, which in turn refers to Henry E. Bliss’s (1929; 1933) use of the phrase “organization of knowledge” in the titles of two of his books, and to the fact that, avant la lettre, the activity of organizing human knowledge in a systematic way dates back to ancient times.

Dahlberg’s understanding of knowledge organization and its basic elements is already laid out in her dissertation Grundlagen universaler Wissensordnung (Foundations of Universal Organization of Knowledge) in which the German term “Wissensordnung” (knowledge order) is used to describe the conceptual ordering of human knowledge, even though for the English translation the less ambiguous term “knowledge organization” is preferred as it is established internationally today (Dahlberg 1974; 2006; 2014a). For Dahlberg, “knowledge” simply means that which is known and “organization” means the activity of constructing something according to a plan. The best way to understand the organization of knowledge, according to Dahlberg, is to analyze its core units and elements, that is, concepts and their characteristics. Her concept-theoretical methodology is intended to provide a basis for any kind of KOS, but the main field of interest of her lifelong research and the explicit focus of her recent book is the universal classification. As an initial orientation, Dahlberg (2014b, 57) offers a useful disambiguation of at least six meanings of the term “classification”:

1. The final system
2. The process of class-building
3. The result of class-building
4. The process of allocating entities to classes
5. The result of allocating entities to classes
6. The study of classification

Occasionally, Dahlberg equates the study of classification with knowledge organization, although in other works she emphasizes the broader meaning of the latter, including not only the methods of classifying and classing, but also virtually all ways in which knowledge can be understood, described, represented, or organized (Dahlberg 2006; 2014a).

In opposition to the tendency of library and information science (LIS) to subsume knowledge organization, often labeled as information organization, as a subdiscipline beside information retrieval, information management, or information behaviour (Broughton et al., 2005; Rowley and Hartley, 2008; Taylor and Joudrey, 2008; Pattuelli 2010; Bawden and Robinson, 2012; Stock and Stock, 2013), Dahlberg considers KO a multi-, inter-, and transdisciplinary field originating from the science of science, but not yet fully established as a discipline in its own right (Dahlberg 2011; for a similarly broad but far from identical approach see Glushko 2013).

As Smiraglia points out, Dahlberg must be considered the leading figure in the establishment of KO as the domain known today, since she has not only founded an international society and a periodical, the journal Knowledge Organization, formerly known as International Classification, but has also established a publishing house, Indeks Verlag, now taken over by Ergon, as well as a bibliographic classification and a theoretical foundation for the field. Indeed, the strength of Dahlberg’s new publication is that it describes the history and institutionalization of knowledge organization as a field of study from an inside perspective, with revealing autobiographical parentheses. In many respects, Dahlberg’s book can be read as a retrospection on her dissertation project and the impact of its core ideas within the last four decades, in particular, the development of her own Information Coding Classification (ICC), published as a first version in 1982 and further developed up to the last decade (Dahlberg 2008).

At the same time, Dahlberg seems to take her own approach to KO for granted as uncontested. Unfortunately, there is not much discussion of alternative approaches or perspectives, a fact also indicated by the reference list, one-third of which refers to her own writings. For example, Dahlberg claims that in the history of KO the periods of word lists, encyclopedias, and teaching systems are followed by the period of universal library classifications in which we still live, ignoring the fact that in KO discourse a shift has taken place from universal classifications based on modernist assumptions to domain-specific KOS’s based on postmodernist assumptions (Hjørland 1997; Olson 2002; Mai 2011).

In contrast, Smiraglia’s understanding of knowledge organization is well aware of the plurality of theoretical perspectives and seeks to embrace all of them rather than to privilege a particular point of view. Therefore, Smiraglia (2014, 4) begins by asking three foundational questions which seem to be important for any approach: first, “How Do I Know?” related to the study of knowing or epistemology; second, “What Is?” related to the study of being or ontology; and finally, “How Is It Ordered?” related to the heart of knowledge organization. According to Smiraglia, we cannot answer the latter question before we have elaborated on the former ones.

Smiraglia reviews four influential texts on KO theory including Dahlberg (2006) and the concept-theoretical approach, Wilson (1968) and the distinction between descriptive and exploitative approaches to recorded knowledge, Svenonius (2000) and the analysis of different sets of documents and bibliographic languages, and Hjørland (1997) and the widening of KO’s scope to the influence of and the impact on the social dimension of knowledge. Although Smiraglia concedes that there is not yet a fully developed theory of KO, he highlights some milestones in the progress of theory-building. The specific elements of KO, according to Smiraglia, are knowledge organization systems (e.g., typologies, taxonomies, classifications, thesauri, or formal ontologies), metadata for document description, and the methodology of domain analysis, whereas the core elements of KO are epistemology and ontology, which is why the importance of philosophy is emphasized.

3.0 Philosophical Underpinnings

On one hand, ontology is the branch of philosophy that is concerned with the distinction of existence and non-existence, or the inherent properties of entities. Therefore, Smiraglia considers ontological questions closely related to class-building in terms of inclusion and exclusion based on likeness. On the other hand, epistemology is the branch of philosophy that is concerned with the influences and constraints of human knowledge. As in a recently edited book (Smiraglia and Lee 2012, reviewed in KO 42.2), Smiraglia emphasizes the cultural frames of knowledge and refers to, among others, Michel Foucault’s archaeology of knowledge, Edmund Husserl’s transcendental phenomenology, Wittgenstein’s philosophy of language, and Charles S. Peirce’s and Ferdinand de Saussure’s semiotic theories. All of these thinkers demonstrate in different ways that the epistemological dimension is crucial for human understanding of being, and, therefore, for the organization of phenomena as human knowledge.

In general, Smiraglia (2014, 28) follows the turn from universal solutions to “post-modern approaches to classification” as domain-specific, socially relevant KOS’s. Therefore, domain analysis is considered the core methodology for KO, using both quantitative (e.g., citation analysis, co-word analysis, network analysis) and qualitative (e.g., cognitive work analysis) techniques. In the face of the ambiguous use of the term “domain” as a discourse community, discipline, invisible college, or work ecology, Smiraglia (2014, 85) offers the following definition:

A domain is a group that shares an ontology, undertakes common research or work, and also engages in discourse or communication, formally or informally.

In other words, Smiraglia seems to adopt the metatheoretical view commonly referred to as social epistemology, which according to Hjørland’s influential typology of theories of knowledge is related to historicism or pragmatism rather than empiricism and rationalism. Smiraglia’s use of terminology, however, differs in some important ways. For example, Smiraglia (2014, 21) states that “pragmatism is exactly what it sounds like, derived from assumptions about the best means to an end” and considers classificationists like Antonio Panizzi, Charles Ammi Cutter, or Melvil Dewey as pragmatists due to the fact that they provide pragmatic tools for knowledge organization (Smiraglia 2002). Accordingly, the progress of KO theory follows the path from pragmatism to empiricism as applied, for example, in quantitative domain analysis. But this seems to be an upside-down view of Hjørland’s (2008; 2013) original typology, in which pragmatism does not simply mean being pragmatic but refers to a genuine theory of knowledge emphasizing that human knowledge is always situated in a socio-cultural context or practice and discourse community related to specific activities based on explicit or implicit goals and values. Following this view, knowledge cannot be neutral and objective, as often assumed by the historically preceding empiricism or rationalism, but is context-dependent and theory-laden, a fact hardly acknowledged by Panizzi, Cutter, or Dewey. Therefore, purely quantitative or informetric techniques of domain analysis, as favored by Smiraglia, do not seem sufficient to reflect the social epistemological stance that is at least implicit in Smiraglia’s approach (Hjørland 2008). Even though one might argue with Hjørland against Smiraglia that the progress of KO theory follows the path from empiricism/rationalism to pragmatism/historicism, Smiraglia is certainly right in his emphasis on considering all of these metatheories as useful and, to some extent, complementary contributions to theory-building in KO.

In contrast to Smiraglia’s reflection on different epistemological stances, Dahlberg adopts an explicit ontological approach and refers particularly to the philosophies of James Feibleman and Nicolai Hartmann. Dahlberg considers the mostly monohierarchical and inflexible discipline-based universal classifications (e.g., Dewey Decimal Classification, Universal Decimal Classification, Library of Congress Classification, Bliss Classification, Colon Classification, or the Russian Library-Bibliographical Classification) as an outdated approach from the nineteenth century and proposes the Information Coding Classification as an alternative universal system based on a comprehensive, faceted classification of knowledge fields that are constituted by the combination of nine main classes with nine facets. The main classes are derived from ontic structures in terms of levels of being (Dahlberg 2008, 167):

1. Form and structure area
2. Energy and matter area
3. Cosmo and geo area
4. Bio area
5. Human area
6. Socio area
7. Economic and technology area
8. Science and information area
9. Culture area

According to Dahlberg, these main classes present integrative levels, that is, a hierarchy in which the higher levels transcend but include the lower levels; whereas, the following nine facets indicate constitutive aspects (1-3), characteristic manifestations (4-6), and environments (7-9) of particular knowledge fields (Dahlberg 2008, 167):

1. Theories, principles
2. Object, component
3. Activity, process
4. Property, attribute
5. Persons or continued
6. Institutions or continued
7. Technology and production
8. Application and determination
9. Distribution and synthesis

Leaving aside the possibly overemphasized postmodern critique of universal solutions, the basic idea of creating a non-arbitrary and comprehensive schema in order to cover all possible knowledge fields seems quite promising for the advancement of universal classifications. One might question, however, whether the main classes and the facets are sufficiently coherent organizing principles as demanded by such an ambitious project.
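The combination of nine main classes with nine facets can be pictured as a 9 × 9 grid of knowledge fields. The following Python sketch builds that grid from the labels quoted from Dahlberg (2008, 167). It rests on an assumption that the text above does not spell out, namely that a field is identified by a two-digit code made of the main-class digit followed by the facet digit; the names `field_code` and `GRID` are hypothetical.

```python
from itertools import product

# Labels as quoted from Dahlberg (2008, 167).
MAIN_CLASSES = {
    1: "Form and structure area",
    2: "Energy and matter area",
    3: "Cosmo and geo area",
    4: "Bio area",
    5: "Human area",
    6: "Socio area",
    7: "Economic and technology area",
    8: "Science and information area",
    9: "Culture area",
}

FACETS = {
    1: "Theories, principles",
    2: "Object, component",
    3: "Activity, process",
    4: "Property, attribute",
    5: "Persons or continued",
    6: "Institutions or continued",
    7: "Technology and production",
    8: "Application and determination",
    9: "Distribution and synthesis",
}


def field_code(main_class: int, facet: int) -> str:
    """Assumed notation: main-class digit followed by facet digit."""
    if main_class not in MAIN_CLASSES or facet not in FACETS:
        raise ValueError("main class and facet must both be in 1..9")
    return f"{main_class}{facet}"


# Every pairing of a main class with a facet yields one knowledge field.
GRID = {
    field_code(m, f): (MAIN_CLASSES[m], FACETS[f])
    for m, f in product(MAIN_CLASSES, FACETS)
}

print(len(GRID))
print(GRID["82"])
```

The sketch shows only the combinatorial skeleton. It says nothing about which named discipline occupies a given cell, which is exactly where the questions about coherence raised in this review arise.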

On one hand, the main classes appear not to present consistent integrative levels, a concept that, against Dahlberg’s claim, even Hartmann explicitly rejects and that certainly does not apply to the human-related domains (Dahlberg’s levels 5-9). Therefore, the ICC’s basic schema by no means solves the problems once left open by the Classification Research Group in the 1960s (Kleineberg 2013). On the other hand, the facets are used in a rather vague and underdetermined way, as even Dahlberg has to admit. The first three facets, for example, are derived from the philosophy of science, which states that scientific disciplines are constituted by genuine theories, objects, and methods, but Dahlberg’s (2014b, 80-100) use of terminology tends to blur the boundaries of these concepts. It remains unclear, for instance, in which way documentology (second facet) can be considered as an object, or information technology (third facet) as a method of information science, as the recourse to the philosophy of science suggests.

The obvious difference between Smiraglia’s multi-perspectival epistemology and emphasis on domain analysis and Dahlberg’s mono-perspectival ontology and endeavor for a universal solution reflects also different understandings of the basic units of a KOS, that is, the notion of concept.

4.0 Concept Theory and Semiotics

Both authors agree that since knowledge organization is concerned with conceptual ordering systems, theories of language or signs are of utmost importance. While Dahlberg presents her own concept theory, Smiraglia refers particularly to Saussure’s semiology and Peirce’s semiotics. According to Dahlberg, only a well-defined concept should be considered as a knowledge unit for KOS’s. Her so-called referent-oriented analytical approach claims that a concept is always an effigy of something, that is, a referent in a broad sense including, for example, a concrete object, an abstract idea, or a proposition. Furthermore, a concept is considered as constituted by three elements: the referent, the designation (i.e., a word, name, or verbal expression), and the essential characteristics which can be derived analytically by determining correct and true predicates about the referent in terms of a definition.

As Smiraglia rightly suggests, Dahlberg’s concept theory is closely related to semiotics as the study of signs, since even concepts might be considered as signs. Indeed, her concept triangle appears to be quite similar to the semiotic triangle commonly used in information science (Stock and Stock, 2013). But while Dahlberg would agree with Saussure that the relation between the “signifier” (Dahlberg’s designation) and the “signified” (Dahlberg’s referent with essential characteristics) is an arbitrary one, she would probably disagree with Peirce’s dynamism and mutability of the “object” (Dahlberg’s referent) itself which, as described by Smiraglia, might be differently perceived within the process of an unlimited semiosis depending on the particular context. Although Dahlberg concedes that a concept might be defined differently, her requirement of “correct” and “true” predicates for a referent seems to imply that the essential characteristics are inherent properties of the referent and that, in principle, conceptual ordering systems due to their intrinsic relations are more or less self-organizing and, therefore, independent of particular interpretations.

This semantic theory of meaning is challenged, however, by pragmatic theories of meaning as proposed by Peirce or the later Wittgenstein (Andersen and Christensen 2001; Friedman and Thellefsen 2011). According to these views, also stressed by Smiraglia, the meaning of a word depends on its use in language. Surprisingly, even Dahlberg claims to rely on Wittgenstein’s use theory, ignoring the fact that his notion of language game explicitly rejects a mere definition (Dahlberg’s essential characteristics) as the meaning of a linguistic expression. Wittgenstein’s concept of family resemblance refers to a net of similarities in which not a single feature is considered to be essential.

The decisive point is that Dahlberg’s concept theory, which is one of at least five major approaches, as summarized by Stock and Stock (2013), seems not well equipped to deal with different points of view. As a consequence, the fields of knowledge in her universal ICC represent only one particular perspective and tend to marginalize or exclude alternative views. For example, not everyone might agree that KO as a field of knowledge is related to the science of science instead of information science, and even less that Christian religion is privileged over other forms of religion.

In contrast, the strength of Smiraglia’s book is that it creates awareness of the crucial role of context-dependent interpretation of signs or concepts and even works (see also Smiraglia 2001; Smiraglia and Van den Heuvel 2013). At the same time, Smiraglia’s embrace of multiple perspectives, as welcome as it is, tends to underestimate the often cited postmodern condition of a plurality of divergent epistemological stances. For example, how could Foucault’s rejection of the epistemic subject be integrated with Husserl’s dimension of perception originating from the ego? Likewise, how could Dahlberg’s notion of concept based on a definition of essential characteristics be integrated with Hjørland’s notion of concept as a language game based on family resemblance? Maybe the metaphor of multiple facets of a diamond misses the point, and the situation might be better illustrated by a diamond smashed into fragments. In this case, one might doubt that Smiraglia’s hint at the concept of “boundary object,” as useful as it might be as a shared reference point, is already sufficient to glue the fragments of the diamond into a mosaic gem.

5.0 Conclusion

Since both authors differ in their theoretical, metatheoretical, and methodological approaches, one might expect that their proposals for the direction of future research questions in the field of knowledge organization differ as well. Indeed, Dahlberg’s ontology-oriented “modernist” approach comes to the conclusion that universal classifications such as her Information Coding Classification need to be developed further towards a kind of meta-frame in the sense of an upper ontology or switching device that functions as a stable foundation for the translation between terminologies of different KOS’s. In opposition, Smiraglia’s more epistemology-oriented “postmodernist” approach comes to the conclusion that KO research should develop better analytical tools for the investigation of the diachronic dimension or the semantic evolution of a domain across time, and the synchronic dimension or the interoperability between neighboring domains.

In this respect, both approaches are valuable contributions to theory-building in knowledge organization, even though, as pointed out by Gnoli (2008), the open question remains in which way ontology-oriented and epistemology-oriented approaches might be integrated in order to benefit from their possibly complementary character.

Note

This review essay should not end without some final remarks on the editing of the two books, since even a distant and selective reading cannot overlook the fact that the quality of these contributions, to some extent, falls short of the standard for academic publishing. To begin with, the typing errors in both works, particularly in Dahlberg’s book, are significant to such a degree that the reader might doubt that the publishing houses Ergon and Springer paid enough attention to the process of copy editing. Occasionally, the unintended semantic displacements of terms are misleading, as in Dahlberg’s (2014b, 44) use of “counterfactual” (kontrafaktisch) instead of “contradictory” (kontradiktorisch), and sometimes simply wrong, as in Smiraglia’s (2014, 23) mix-up of the terms “signifier” and “signified,” which are correctly described earlier on the very same page. Likewise, in Dahlberg’s book, the formatting style for references or index entries, for example, is not always consistent, while Smiraglia’s book has no index at all.

Furthermore, the graphical material used in both publications is often of low resolution or poor scan quality. Most notably, Smiraglia’s figures do not always fulfill their purpose, for several reasons. First, the lettering is often so small that it is hard, sometimes impossible, to read. Second, all graphics are printed in black and white, although many diagrams use different colors for discrimination, and in one particular case a figure is explicitly intended to illustrate different colors of marbles as a criterion for class-building. Third, the informational value of some figures seems questionable, for example, when three roughly sketched diamonds are presented in order to illustrate the fact that diamonds have facets. This kind of redundancy is even topped by presenting the same figure of a domain-specific ontology twice in different chapters.

Another questionable decision by the publishing houses is the general label under which each work is published. On the one hand, Dahlberg’s book is presented as the third volume of Ergon’s series Textbooks for Knowledge Organization, after Robert Fugmann’s (1993) Subject Analysis and Indexing and Hemalata Iyer’s (1995) Classificatory Structures, in spite of Dahlberg’s introductory statement that her work is not at all intended to be a textbook; instead, she refers to Kiel and Rost (2002) as a prime example of one. On the other hand, Smiraglia’s book is presented under the label of computer science, as the back cover informs us, even though it neither deals with nor directly addresses this particular field.

In general, the editing of both works is not what one should expect from scholarly publications, in particular if a small book of one hundred pages is offered for more than 100 US dollars. We should not blame the authors, and maybe we should not blame the publishing houses either, since these two books might exemplify a much more systemic problem in the field of academic publishing. This is not the place to discuss this issue in any detail, but it might be worth noting that if the pre-publication review process reveals some inadequacies, then the post-publication review process becomes even more important and might be forced to be more critical than initially intended.

Michael Kleineberg
Berlin School of Library and Information Science (BSLIS)
Humboldt-Universität zu Berlin, Germany
[email protected]

References

Andersen, Jack and Frank S. Christensen. 2001. “Wittgenstein and Indexing Theory.” In Proceedings of the 10th ASIS SIG/CR Classification Research Workshop, edited by Hanne Albrechtsen and Jens-Erik Mai. Advances in classification research 10. Medford, NJ: Information Today, 1-21.

Bawden, David and Lyn Robinson. 2012. Introduction to Information Science. London: Facet Publishing.

Bliss, Henry E. 1929. The Organization of Knowledge and the System of the Sciences. New York: Holt.

Bliss, Henry E. 1933. The Organization of Knowledge in Libraries. New York: Wilson.

Broughton, Vanda, et al. 2005. “Knowledge Organization.” In European Curriculum Reflections on Library and Information Science Education, edited by Leif Kajberg and Leif Lørring. Copenhagen: Royal School of Library and Information Science, 133-48.


Dahlberg, Ingetraut. 1974. Grundlagen universaler Wissensordnung: Probleme und Möglichkeiten eines universalen Klassifikationssystems des Wissens. Pullach bei München: Verlag der Dokumentation.

Dahlberg, Ingetraut. 2006. “Knowledge Organization: A New Science?” Knowledge Organization 33, no. 1: 11-9.

Dahlberg, Ingetraut. 2008. “The Information Coding Classification (ICC): A Modern, Theory-Based Fully-Faceted, Universal System of Knowledge Fields.” Axiomathes 18, no. 2: 161-76.

Dahlberg, Ingetraut. 2011. “How to Improve ISKO’s Standing: Ten Desiderata for Knowledge Organization.” Knowledge Organization 38, no. 1: 68-74.

Dahlberg, Ingetraut. 2014a. “What is Knowledge Organization?” Knowledge Organization 41, no. 1: 85-91.

Dahlberg, Ingetraut. 2014b. Wissensorganisation: Entwicklung, Aufgabe, Anwendung, Zukunft. Würzburg: Ergon Verlag.

Friedman, Alon and Martin Thellefsen. 2011. “Concept Theory and Semiotics in Knowledge Organization.” Journal of Documentation 67, no. 4: 644-74.

Fugmann, Robert. 1993. Subject Analysis and Indexing: Theoretical Foundation and Practical Advice. Frankfurt: Indeks Verlag.

Glushko, Robert J. 2013. The Discipline of Organizing. Cambridge, MA: MIT Press.

Gnoli, Claudio. 2008. “Ten Long-Term Research Questions in Knowledge Organization.” Knowledge Organization 35, nos. 2/3: 137-49.

Hjørland, Birger. 1997. Information Seeking and Subject Representation: An Activity-Theoretical Approach to Information Science. Westport, Conn.: Greenwood Press.

Hjørland, Birger. 2008. “What is Knowledge Organization?” Knowledge Organization 35, nos. 2/3: 86-101.

Hjørland, Birger. 2013. “Theories of Knowledge Organization: Theories of Knowledge.” Knowledge Organization 40, no. 3: 169-81.

Iyer, Hemalata. 1995. Classificatory Structures: Concepts, Relations and Representation. Frankfurt: Indeks Verlag.

Kiel, Ewald and Friedrich Rost. 2002. Einführung in die Wissensorganisation: Grundlegende Probleme und Begriffe. Frankfurt: Ergon Verlag.

Kleineberg, Michael. 2013. “The Blind Men and the Elephant: Towards an Organization of Epistemic Contexts.” Knowledge Organization 40, no. 5: 240-62.

Mai, Jens-Erik. 2011. “The Modernity of Classification.” Journal of Documentation 67, no. 4: 710-30.

Olson, Hope A. 2002. The Power to Name: Locating the Limits of Subject Representation in Libraries. Dordrecht: Kluwer.

Pattuelli, M. Cristina. 2010. “Knowledge Organization Landscape: A Content Analysis of Introductory Courses.” Journal of Information Science 36, no. 6: 812-22.

Rowley, Jennifer and Richard J. Hartley. 2008. Organizing Knowledge: An Introduction to Managing Access to Information. Burlington: Ashgate Publishing.

Smiraglia, Richard. 2001. The Nature of ‘a Work’: Implications for the Organization of Knowledge. Lanham: Scarecrow.

Smiraglia, Richard. 2002. “The Progress of Theory in Knowledge Organization.” Library Trends 50, no. 3: 330-49.

Smiraglia, Richard. 2014. The Elements of Knowledge Organization. Cham: Springer.

Smiraglia, Richard and Hur-Li Lee. 2012. The Cultural Frames of Knowledge. Würzburg: Ergon Verlag.

Smiraglia, Richard and Charles van den Heuvel. 2013. “Classifications and Concepts: Towards an Elementary Theory of Knowledge Interaction.” Journal of Documentation 69, no. 3: 360-83.

Stock, Wolfgang G. and Mechthild Stock. 2013. Handbook of Information Science. Berlin: De Gruyter.

Svenonius, Elaine. 2000. Intellectual Foundations of Information Organization. Cambridge, Massachusetts: MIT Press.

Taylor, Arlene G. and Daniel N. Joudrey. 2008. The Organization of Information. Englewood, Colorado: Libraries Unlimited.

Wilson, Patrick. 1968. Two Kinds of Power: An Essay in Bibliographical Control. Berkeley, California: University of California Press.

Intner, Sheila S. and Weihs, Jean. Standard Cataloging for School and Public Libraries, 5th ed. Santa Barbara, CA: ABC-CLIO/Libraries Unlimited, 2015. ix, 239p. ISBN: 978-1-61069-114-7 (pbk). US$55.

First published in 1990 (with subsequent editions in 1996, 2001, and 2007), Intner and Weihs’s Standard Cataloging for School and Public Libraries, by two eminent and variously experienced KO experts (both are winners of the Margaret Mann Citation), has now reached a fifth edition that bears testimony to its popularity and standing in the field. Their two other books on this specific subject are equally popular. The title of the book is a misnomer (at least for an Indian librarian), as the book covers not only cataloguing in all its aspects but also classification and subject indexing in equal measure. Further, the level of treatment of the subject is not restricted to school and (small) public libraries: nothing seems to be lacking that would keep it from being equally useful to university libraries. However, the book seems to cater mostly to the needs of practicing catalogers, to speak in American jargon.


This book of thirteen chapters, which has no preface, can be notionally divided into four sections, in addition to appendixes and a glossary of about 170 terms and acronyms. The first section of four chapters dwells on descriptive cataloging, beginning with the necessary history and theory and going on to illustrate the practice of description according to the Resource Description and Access standard (RDA), which replaced the Anglo-American Cataloguing Rules 2nd edition (AACR2) starting in 2013. Most of the terminology used in the text is indeed in tune with the RDA terms and prescriptions. The fourth chapter is focused on name and title access points. Illustrations of a practical nature abound. Examples are homely, well-chosen, and presented in progressive complexity so that learners can prepare catalogue records according to RDA. For instance, the examples given in the second chapter continue through subsequent chapters with additional features. Similarly, the next four chapters (5-8) deal with subject cataloguing in both theory and practice, culminating with the descriptive use of the Library of Congress Subject Headings (LCSH) and the Sears List of Subject Headings in their latest editions. Unsurprisingly, there is hardly any innovation in subject headings work, whereas descriptive cataloguing is always on the move. The next notional section of three chapters (9-11) is on shelf classification, which obviously deals with the Dewey Decimal Classification (DDC) and the Library of Congress Classification (LCC) in sufficient detail to understand and apply these systems. The next chapter is on MARC21 and other metadata systems such as the Dublin Core.

The text closes with a full chapter on managing the cataloguing department of a library. It is comprehensive in its treatment of all aspects of management and day-to-day administration, namely setting policies and objectives, staffing, budgeting, product management, evaluating performance and output, reporting, marketing and communicating. Lastly, this chapter lists some current issues in knowledge organization management in libraries and media centres. These include the reorganization of the cataloguing department to expand it to subsume acquisition and interlibrary loan work, as well as coordination between cataloguing and metadata staff. In this book full of technicalities, this chapter is embellished with worthwhile advice. Savour some: “Wise managers keep their ears to the ground … so they become aware of … occurrences before they happen. And they resist agreeing to goals and objectives that the department cannot accomplish” (p. 185). The authors also advise communicating to the authorities the importance of the department in support of user services. Indeed, marketing is as important for this department as it is for the library as a whole. The glossary, though brief, goes beyond the terms and acronyms discussed in the book: a few terms not occurring in the text have also been included. At the same time, some are missing: even RDA is not there, and the term “uniform title,” despite its mention in the text, has not been defined.

The user-friendly book has been designed and laid out for progressive learning and easy use. Each chapter has been divided into various sections with apt headings, which are also listed in the table of contents. Each chapter is amply illustrated with original and real examples and closes with a skillful summary, a list of up-to-date but selective additional readings, and notes. The appendix given at the end provides some test questions, which are answered in the next appendix. The book has three indexes, namely the concept index, the name index and the index to figures and examples. The last of these has four sub-indexes for documents by format. The language is pithy, precise and clear. This handy book is indispensable for anyone new to the field who wants to learn classification and cataloging practice in the dynamically changing library and information environment. And no doubt it is equally useful for veteran cataloguers and teachers of the subject, though a chapter on the teaching of cataloguing would have made it even more useful for teachers. Michael Gorman (2011, 162), who considers cataloging one of the fundamental bases of librarianship, describes it as “the way librarians think.” The book seems to subscribe fully to this philosophical view of one of the outstanding pragmatic philosophers, experts, and advocates for knowledge organization of our times.

M P Satija
UGC Emeritus Fellow
Guru Nanak Dev University, Amritsar, India

References

Gorman, Michael. 2011. Broken Pieces: Library Life, 1941-1978. Chicago, IL: American Library Association.

Intner, Sheila S., Joanna F. Fountain and Jean Weihs, eds. 2010. Cataloging Correctly for Kids: An Introduction to Tools. 5th ed. Chicago, IL: ALA Editions.

Weihs, Jean and Sheila S. Intner. 2009. Beginning Cataloging. Santa Barbara, Calif.: Libraries Unlimited.


Books Recently Published

Hlava, Marjorie M. K. 2015. The Taxobook: History, Theories, and Concepts of Knowledge Organization. San Rafael, CA: Morgan & Claypool.

Smiraglia, Richard P. 2015. Domain Analysis for Knowledge Organization: Tools for Ontology Extraction. Amsterdam; Oxford; New York: Chandos Publishing.

Smiraglia, Richard P., and Hur-Li Lee, eds. 2015. Ontology for Knowledge Organization. Würzburg: Ergon-Verlag.


KNOWLEDGE ORGANIZATION KO Official Journal of the International Society for Knowledge Organization ISSN 0943 – 7444

International Journal devoted to Concept Theory, Classification, Indexing and Knowledge Representation

Publisher
ERGON-Verlag GmbH, Keesburgstr. 11, D-97074 Würzburg
Phone: +49 (0)931 280084; FAX +49 (0)931 282872
E-mail: [email protected]; http://www.ergon-verlag.de

Editor-in-chief (Editorial office)
Dr. Richard P. SMIRAGLIA (Editor-in-Chief), School of Information Studies, University of Wisconsin, Milwaukee, Northwest Quad Building B, 2025 E Newport St., Milwaukee, WI 53211 USA. E-mail: [email protected]

Instructions for Authors
Manuscripts should be submitted electronically (in Word format) in English only to the editor-in-chief at http://mc04.manuscriptcentral.com/jisko and should be accompanied by an indicative abstract of 150 to 200 words. Manuscripts of articles should fall within the range 6,000-10,000 words. Longer manuscripts will be considered on consultation with the editor-in-chief.

A separate title page should include the article title and the author’s name, postal address, and E-mail address, if available. Only the title of the article should appear on the first page of the text.

To protect anonymity, the author’s name should not appear on the manuscript, and all references in the body of the text and in footnotes that might identify the author to the reviewer should be removed and cited on a separate page.

Criteria for acceptance will be appropriateness to the field of the journal (see Scope and Aims), taking into account the merit of the contents and presentation. The manuscript should be concise and should conform as much as possible to professional standards of English usage and grammar. Manuscripts are received with the understanding that they have not been previously published, are not being submitted for publication elsewhere, and that if the work received official sponsorship, it has been duly released for publication. Submissions are refereed, and authors will usually be notified within 6 to 10 weeks.

The text should be structured by numbered subheadings. It should contain an introduction, giving an overview and stating the purpose, a main body, describing in sufficient detail the materials or methods used and the results or systems developed, and a conclusion or summary.

Footnotes are accepted only in rare cases and should be limited in number; all narration should be included in the text of the article. Paragraphs should include a topic sentence and some developed narrative; a typical paragraph has several sentences. Italics may not be used for emphasis. Em-dashes should not be used as substitutes for commas.

Reference citations within the text should have the following form: (Author year). For example, (Jones 1990). Specific page numbers are required for quoted material, e.g. (Jones 1990, 100). A citation with two authors would read (Jones and Smith, 1990); three or more authors would be: (Jones et al., 1990). When the author is mentioned in the text, only the date and optional page number should appear in parenthesis – e.g. According to Jones (1990), …

References should be listed alphabetically by author at the end of the article. Author names should be given as found in the sources (not abbreviated). Journal titles should not be abbreviated. Multiple citations to works by the same author should be listed chronologically and should each include the author’s name. Articles appearing in the same year should have the following format: “Jones 2005a, Jones 2005b, etc.” Issue numbers are given only when a journal volume is not through-paginated. Examples:

Dahlberg, Ingetraut. 1978. A referent-oriented, analytical concept theory for INTERCONCEPT. International classification 5: 142-51.

Howarth, Lynne C. 2003. Designing a common namespace for searching metadata-enabled knowledge repositories: an international perspective. Cataloging & classification quarterly 37n1/2: 173-85.

Pogorelec, Andrej and Šauperl, Alenka. 2006. The alternative model of classification of belles-lettres in libraries. Knowledge organization 33: 204-14.

Schallier, Wouter. 2004. On the razor’s edge: between local and overall needs in knowledge organization. In McIlwaine, Ia C. ed., Knowledge organization and the global information society: Proceedings of the Eighth International ISKO Conference 13-16 July 2004 London, UK. Advances in knowledge organization 9. Würzburg: Ergon Verlag, pp. 269-74.

Smiraglia, Richard P. 2001. The nature of ‘a work’: implications for the organization of knowledge. Lanham, Md.: Scarecrow.

Smiraglia, Richard P. 2005. Instantiation: Toward a theory. In Vaughan, Liwen, ed. Data, information, and knowledge in a networked world; Annual conference of the Canadian Association for Information Science … London, Ontario, June 2-4 2005. Available http://www.cais-acsi.ca/2005proceedings.htm.

Illustrations should be kept to a necessary minimum and should be embedded within the document. Photographs (including color and half-tone) should be scanned with a minimum resolution of 600 dpi and saved as .jpg files. Tables and figures should be embedded within the document. Tables should contain a number and title at the bottom, and all columns and rows should have headings. All illustrations should be cited in the text as Figure 1, Figure 2, etc. or Table 1, Table 2, etc.

Keywords: authors may suggest keywords, but the published list will be provided by editorial staff after textual analysis of the manuscript.

The entire manuscript should be double-spaced, including notes and references.

Upon acceptance of a manuscript for publication, authors must provide a wallet-size photo and a one-paragraph biographical sketch (fewer than 100 words). The photograph should be scanned with a minimum resolution of 600 dpi and saved as a .jpg file.

Advertising
Responsible for advertising: ERGON-Verlag GmbH, Keesburgstr. 11, 97074 Würzburg (Germany).

© 2015 by ERGON-Verlag GmbH. All Rights reserved.

KO is published by ERGON-Verlag GmbH.
The price for the print version (8 issues/ann.) is € 329,00/ann. including airmail delivery.
The price for the print version plus access to the online version (PDF) is € 359,00/ann. including airmail delivery.


Scope

The more scientific data is generated in the impetuous present times, the more ordering energy needs to be expended to control these data in a retrievable fashion. With the abundance of knowledge now available the questions of new solutions to the ordering problem and thus of improved classification systems, methods and procedures have acquired unforeseen significance. For many years now they have been the focus of interest of information scientists the world over.

Until recently, the special literature relevant to classification was published in piecemeal fashion, scattered over the numerous technical journals serving the experts of the various fields such as:

philosophy and science of science
science policy and science organization
mathematics, statistics and computer science
library and information science
archivistics and museology
journalism and communication science
industrial products and commodity science
terminology, lexicography and linguistics

Beginning in 1974, KNOWLEDGE ORGANIZATION (formerly INTERNATIONAL CLASSIFICATION) has been serving as a common platform for the discussion of both theoretical background questions and practical application problems in many areas of concern. In each issue experts from many countries comment on questions of an adequate structuring and construction of ordering systems and on the problems of their use in opening the information contents of new literature, of data collections and survey, of tabular works and of other objects of scientific interest. Their contributions have been concerned with

(1) clarifying the theoretical foundations (general ordering theory/science, theoretical bases of classification, data analysis and reduction)
(2) describing practical operations connected with indexing/classification, as well as applications of classification systems and thesauri, manual and machine indexing
(3) tracing the history of classification knowledge and methodology
(4) discussing questions of education and training in classification
(5) concerning themselves with the problems of terminology in general and with respect to special fields.

Aims

Thus, KNOWLEDGE ORGANIZATION is a forum for all those interested in the organization of knowledge on a universal or a domain-specific scale, using concept-analytical or concept-synthetical approaches, as well as quantitative and qualitative methodologies. KNOWLEDGE ORGANIZATION also addresses the intellectual and automatic compilation and use of classification systems and thesauri in all fields of knowledge, with special attention being given to the problems of terminology.

KNOWLEDGE ORGANIZATION publishes original articles, reports on conferences and similar communications, as well as book reviews, letters to the editor, and an extensive annotated bibliography of recent classification and indexing literature.

KNOWLEDGE ORGANIZATION should therefore be available at every university and research library of every country, at every information center, at colleges and schools of library and information science, in the hands of everybody interested in the fields mentioned above and thus also at every office for updating information on any topic related to the problems of order in our information-flooded times.

KNOWLEDGE ORGANIZATION was founded in 1973 by an international group of scholars with a consulting board of editors representing the world’s regions, the special classification fields, and the subject areas involved. From 1974-1980 it was published by K.G. Saur Verlag, München. Back issues of 1978-1992 are available from ERGON-Verlag, too.

As of 1989, KNOWLEDGE ORGANIZATION has become the official organ of the INTERNATIONAL SOCIETY FOR KNOWLEDGE ORGANIZATION (ISKO) and is included for every ISKO-member, personal or institutional in the membership fee.

Rates: From 2015 on for 8 issues/ann. (including indexes) € 329,00 (forwarding costs included) for the print version resp. € 359,00 for the print version plus access to the online version (PDF). Membership rates see above.

ERGON-Verlag GmbH, Keesburgstr. 11, D-97074 Würzburg; Phone: +49 (0)931 280084; FAX +49 (0)931 282872; E-mail: [email protected]; http://www.ergon-verlag.de

Founded under the title International Classification in 1974 by Dr. Ingetraut Dahlberg, the founding president of ISKO. Dr. Dahlberg served as the journal’s editor from 1974 to 1997, and as its publisher (Indeks Verlag of Frankfurt) from 1981 to 1997.

The contents of the journal are indexed and abstracted in Social Sciences Citation Index, Web of Science, Information Science Abstracts, INSPEC, Library and Information Science Abstracts (LISA), Library, Information Science & Technology Abstracts (EBSCO), Library Literature and Information Science (Wilson), PASCAL, Referativnyi Zhurnal Informatika, and Sociological Abstracts.