Library Collections, Acquisitions, & Technical Services 24 (2000) 205–215

Finding the needle: controlled vocabularies, resource discovery, and Dublin Core


Kuang-Hwei (Janet) Lee–Smeltzer*

Bibliographic Control and Electronic Resource Services, Colorado State University Libraries, Fort Collins, CO 80523-1019, USA

* Corresponding author. Tel.: +1-970-491-1849; fax: +1-970-491-4611. E-mail address: [email protected] (K.H. Lee–Smeltzer).

Abstract

The phenomenal growth of digital resources on the Internet, their lack of organization, and the deficiency of the search tools currently available make searching for information on the Internet comparable to looking for the proverbial “needle in a haystack.” Developing more effective means of resource discovery and retrieval on the Internet is increasingly necessary. Dublin Core (DC), a newly developed metadata set for resource description, has the potential to provide more effective resource discovery. One major obstacle remains, however: the lack of a systematic approach to subject access. This paper discusses the need for applying controlled vocabularies to enhance the discovery of document-like objects on the Internet and outlines some options for such a process in a distributed environment, with an emphasis on the enhancement of DC with controlled vocabularies. © 2000 Elsevier Science Ltd. All rights reserved.

Keywords: Cataloging; Metadata; Dublin Core; Controlled vocabularies; Resource discovery

1. Introduction

Many metaphors have been used to describe the phenomenal growth of digital resources on the Internet and the lack of effective ways to discover and access them. The Internet has been compared to a mega bookstore where books are piled everywhere: on tables, chairs, floors, with new ones being continuously added to the piles and with their front matter torn off so there is no way to identify the authorship or what the books are about [1].


Another has described the Internet as “a library in which authors shelve their own books, haphazardly, rewrite them overnight, and move them from place to place without warning” [2]. Most of these metaphors center around the theme of chaos. The chaos is escalating. In April 1998, it was estimated that there were 320 million indexable pages on the Web [3]. By July 1999, the estimated number of Web pages had increased to 800 million [4].

The sheer amount of information on the Internet, the lack of organization, and the deficiency of the searching tools currently available—directories (such as Yahoo®) and search engines (such as AltaVista)—all contribute to the ineffectiveness of resource discovery on the Internet. The problems and deficiencies of both are well documented in the literature. Although directories, such as Yahoo®, offer some sort of categorization, their subject categories often lack cross-references and the list can get long and unwieldy [5]. It is costly and labor-intensive for humans to create subject categories [6].

Neither search engines nor directories cover the entire Internet. A recent NEC Research Institute study found that “most of the major search engines index less than 10% of the Web. Even by combining all the major search engines, only 42% of the Web has been indexed” [7]. Because directories involve human intervention, it is reasonable to assume that they cover even less of the Internet than the search engines. Yahoo®, boasting the largest directory on the Internet, covered only about 1.2 million classified sites as of April 1999 [8]. The gap is widening as it becomes more difficult to keep up with the increase in the number of Web pages.

Full-text indexing, non-contextual keyword searching, and little in the way of relevance ranking result in overly large retrieval sets with low relevancy. Terry Kuny noted, “What is apparent to researchers in distributed indexing is that the current strategy of search engines—that is, to indiscriminately harvest whatever they can find and then do selective indexing on those contents—is an unsustainable architecture for retrieval in a billion document universe” [9]. In addition, search results can be radically different from one search engine to the next.

Why are people both within and outside the information science field so concerned with the discovery and retrieval of Internet resources? It is because the Internet has fundamentally changed the concept of publishing and the way information is disseminated. There are valuable information resources on the Internet that warrant more effective and systematic ways of identifying, describing, and retrieving them. This paper outlines some approaches to applying controlled vocabularies for subject access to enhance resource discovery on the Internet, with an emphasis on the enhancement of Dublin Core (DC), a newly developed metadata set for resource description. The discussion is limited to document-like objects (DLO) on the Internet.

2. The metadata movement and the discovery of Internet resources

There are many discussions and debates among information professionals over different approaches to resource discovery on the Internet. One such approach, which has gained increasing popularity, is the use of metadata for description. Metadata is generally defined as data about data. More specifically, it is a structured set of elements that describes an “information package” for the purposes of identification, discovery, and use of information.


An information package is defined as an instance of recorded information, e.g., book, article, Internet document, electronic journal, etc. [10].

Metadata for description is only one type of metadata, albeit the one we are most familiar with in the library world. There are metadata for rating and content selection, such as PICS (Platform for Internet Content Selection); for rights management and intellectual property (the Interoperability of Data in E-Commerce Systems (INDECS) project is investigating rights management metadata from several different approaches) [11]; and for administrative purposes (A-Core) [12]. Descriptive metadata ranges from very rich, highly technical schema such as the MARC formats and TEI headers to the relatively simple 15-element DC. Metadata can be embedded in the document itself or exist separately.

3. The development of Dublin Core and its potential as an international standard for resource discovery on the Internet

The DC Metadata Element Set is the result of the March 1995 Metadata Workshop sponsored by OCLC and the National Center for Supercomputing Applications. Since the effort was initiated by OCLC, various organizations, such as the Research Libraries Group, the Coalition for Networked Information, the Internet Engineering Task Force, the UK Office for Library and Information Networking, and the National Library of Australia, have participated in the development of DC. It has gained international attention and support from communities concerned with resource discovery on the Internet.

A basic set of DC (DC Simple) consists of 15 descriptive elements that can be categorized into three groups indicating the class or scope of the information stored in them (Table 1). All elements are optional and can be repeated. Qualified DC (DCQ) allows each element to be further refined by a limited set of qualifiers. For example, “Creator” can be further defined by “Type,” such as PersonalName or CorporateName. “Subject” can be further qualified by “Scheme,” such as LCSH (Library of Congress Subject Headings), LCC (Library of Congress Classification), etc. DC records can be generated by document creators, automatically by intelligent software agents, or by trusted third parties such as librarians. DC has the potential to be adopted as an international standard for Internet resource description, not only because it is the result of international collaboration and consensus, but also because of its simplicity, extensibility, and interoperability.

Table 1
Dublin Core elements*

Content        Intellectual property    Instantiation
Title          Creator                  Date
Subject        Publisher                Format
Description    Contributor              Identifier
Type           Rights                   Language
Source
Relation
Coverage

* Source: Weibel S, Kunze J, Lagoze C, Wolf M. Dublin Core metadata for resource discovery. IETF RFC 2413. The Internet Society, September 1998. (http://www.ietf.org/rfc/rfc2413.txt)


The DC element set has already been translated into other languages, such as Arabic, Chinese, French, German, and Spanish [13].
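To make the element structure and the scheme qualifier concrete, here is a minimal sketch, not any prescribed DC encoding: a hypothetical record modeled as a Python dictionary, with element names taken from Table 1 and all values invented for illustration. The lists reflect the fact that every element is optional and repeatable.

    # A hypothetical DC record; element names follow Table 1, values are invented.
    dc_record = {
        # Content elements
        "Title": ["Finding the needle"],
        "Subject": [
            # Qualified DC: a "Scheme" qualifier names the controlled vocabulary.
            {"scheme": "LCSH", "value": "Cataloging of computer network resources"},
            {"scheme": "DDC", "value": "025.04"},
        ],
        # Intellectual property elements
        "Creator": [{"type": "PersonalName", "value": "Lee-Smeltzer, Kuang-Hwei"}],
        # Instantiation elements
        "Date": ["2000"],
        "Format": ["text/html"],
        "Language": ["en"],
    }
    # Elements not present are simply omitted; each present element
    # holds a list of one or more values.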

DC elements are being reviewed for clarification and formatted to conform with the international standard for expressing the semantics of data elements (ISO 11179). A standards proposal will be submitted to NISO (National Information Standards Organization) in 1999 [14]. DC’s simplicity promotes wide and general application, whereas its extensibility allows augmentation by different communities according to their specific needs. The interoperability of DC enables resource description records created using DC to be implemented across disciplines and in different environments.

The simple structure of the set promotes general applicability, and records can easily be created by Internet information suppliers. However, this also suggests a potential problem: a lack of consistency in how the data are entered, which reduces their effectiveness for resource discovery. Recognizing this, a DC User Guide Working Group is in the process of developing guidelines and promoting “best practices” in a non-technical fashion for both DC Simple and DCQ. The use of controlled vocabularies in the Subject element is also encouraged to enhance the search and retrieval of Internet resources.

4. Why controlled vocabularies are still necessary for taming the net: a literature review

Although keyword and full-text searching can be useful in some cases, the biggest challenge for searching and identifying resources on the Internet remains the lack of effective means of providing subject access. Traditional cataloging and library catalogs have long provided an effective means for accessing the print collection. This success is largely due to applying established cataloging rules and controlled vocabularies to records for each item in a library’s collection, thus generating comprehensive and more consistent search results. In his keynote address to the Finding Common Ground Conference, Clifford Lynch stated:

The Web searching engines—Lycos, Alta Vista, and the like—have been identified by many as the future of information organization . . . these systems lack high-quality retrieval capabilities for many applications (when contrasted to systems designed around retrieval from intellectual cataloging, abstracting and indexing by human beings, such as online library cataloging or good A&I database implementations) [15].

For the purpose of this paper, controlled vocabularies are defined as classification schema or lists of terms and phrases selected to express certain concepts, applied to records in a database in a consistent manner. A classification scheme places subjects in categories and assigns a set of numbers, letters, symbols, or a combination of the above, in a hierarchy. Controlled vocabularies can be universal schema, such as DDC (Dewey Decimal Classification), LCC, and LCSH; schema specific to a discipline, such as the National Library of Medicine classification and subject headings (MeSH); or locally devised systems.

A literature review indicates that controlled vocabulary is crucial for effective search and retrieval of Internet resources [16,17,18,19]. It is also interesting to note that, according to the April 1998 Search Engine Report, directory-style architecture for portal sites is gaining in popularity with commercial search engines.


For example, AltaVista and other search engines have added directories to their services. “[D]irectories are on the rise mainly as a response to users continuing to perform overly broad searches, which can leave them feeling lost among a sea of off-target results” [20]. Even with the problems associated with directories mentioned previously, searching through rough subject categories is still considered a more structured approach than the use of full-text keyword searches. Furthermore, the larger the files being searched and the broader the subject coverage, the greater the need for controlled vocabularies [21]. In a language rich in synonyms, such as English, without a controlled form of access to a particular subject term, users will have to search for every variation of that term or run the risk of retrieving too many entries and missing important ones, without even knowing what they have missed [22].
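A small illustration of the synonym problem just described, with invented records: a free-text keyword search retrieves only the records that happen to use the searcher’s variant, whereas a search on the controlled heading retrieves them all.

    # Three invented records on the same subject, each keyword-indexed differently,
    # but all carrying the same controlled heading.
    records = [
        {"keywords": {"films"}, "heading": "Motion pictures"},
        {"keywords": {"movies"}, "heading": "Motion pictures"},
        {"keywords": {"cinema"}, "heading": "Motion pictures"},
    ]
    hits_by_keyword = [r for r in records if "movies" in r["keywords"]]
    hits_by_heading = [r for r in records if r["heading"] == "Motion pictures"]
    print(len(hits_by_keyword), len(hits_by_heading))  # 1 3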

Lassila pointed out that “[m]achines (in this case, search engines) cannot understand a natural-language document and thus cannot always extract specific information from the document such as author, publication date, or topic” [23]. “[K]eywords thus perform poorly in situations where a search index covers multiple subject areas, as is the case with the Web” [24]. Furthermore, Lassila suggested that it would be useful if other means of searching were available to us in addition to full-text search, such as by the author of the document, the date it was published, and the specific topic it discusses, even though the topic might not be expressed as a particular word in the document itself [25]. This suggestion implies some form of content analysis and application of controlled vocabulary to these documents. “The difference between keyword searching and controlled vocabulary searching is the distinction between mechanistic access to isolated terms and intellectual access to unified concepts. This is the key to the utilization of the Internet . . . the presence of a controlled vocabulary greatly enhances the usability of any information retrieval system” [26].

5. Scenarios for applying controlled vocabularies for subject access to Internet resources in a shared environment

Analyzing the subject content of a work and assigning controlled vocabulary are the most labor-intensive and costly elements of the cataloging process. The dynamic nature of resources on the Internet adds further difficulty to this already labor-intensive process. Options for better subject access, however, need to be developed and examined for more effective resource discovery. The following approaches outline some possible scenarios for accomplishing this. They are by no means exhaustive. Some of these approaches are already being used.

5.1. The traditional cataloging approach

Libraries have not been collecting all the print materials that have been published, only the selected ones that fit the mission and the purposes of the institution. The same should apply to electronic resources on the Internet. As Erik Jul has pointed out, not all Internet resources are worth collecting and cataloging. However, if libraries of different types and sizes, with diverse subject emphases and interests, follow the same systematic approach as they have for collecting and providing access to print materials,


the “most worthy” Internet resources will be identified, described, and made accessible to users. Even so, this approach does not address a bigger problem: only a very small percentage of Internet resources will ever receive this labor-intensive treatment.

Libraries will take a more proactive role in selecting “worthy” Web resources for inclusion in their collections in accordance with their collection development policies. Cataloging records with controlled vocabularies for subject access can be created for these resources using the current cataloging rules and MARC formats. These records can be input into the bibliographic utility for sharing with other libraries and for inclusion in the local online catalog. The process is easily integrated into the cataloging workflow. In addition, the OCLC Intercat project and the cataloging community’s continuing efforts to revise the cataloging rules and MARC formats to accommodate some of the fundamental differences posed by electronic resources on the Internet will help shape cataloging standards and practices that are still evolving.

5.2. The DC metadata approach

Although the MARC record provides a rich format for resource description, its use is also labor-intensive and requires special technical knowledge of the rules for cataloging and MARC encoding. DC was developed specifically for describing document-like objects on the Web. Once DC becomes a standard and is widely implemented, it may prove to be a more suitable scheme for resource discovery on the Internet. The creation of a simple DC record does not require special training and can even be done automatically with a template and author-supplied information. DC can be encoded in HTML or XML, both of which are, or will be, more widely used than MARC formats, which are usually limited to the library community. For resources on the Internet, there are other considerations, such as intellectual property, rights management, and administrative information, that may be handled more effectively by DC (or other metadata formats) than by MARC. For these and other reasons, DC may be more attractive than MARC in the commercial marketplace for developing electronic resources.
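To illustrate how a simple DC record can be produced from a template and author-supplied information, the sketch below renders a record as HTML meta tags. The “DC.element” naming follows a convention used in early DC-in-HTML practice; the helper function and the sample values are hypothetical.

    # Render an {element: [values]} DC record as HTML <meta> tags.
    # The "DC.<element>" name form reflects one early embedding convention.
    def dc_to_meta_tags(record):
        tags = []
        for element, values in record.items():
            for value in values:
                tags.append('<meta name="DC.%s" content="%s">' % (element, value))
        return "\n".join(tags)

    print(dc_to_meta_tags({
        "Title": ["Finding the needle"],
        "Creator": ["Lee-Smeltzer, Kuang-Hwei"],
    }))
    # <meta name="DC.Title" content="Finding the needle">
    # <meta name="DC.Creator" content="Lee-Smeltzer, Kuang-Hwei">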

The initial intention for DC is to have a simple DC record created by the information provider at the point of posting the resource on the Internet. Trusted third parties such as librarians can then enhance the record. Using the scheme qualifier under the element “Subject,” the record can be enhanced with various controlled vocabularies such as DDC, LCC, LCSH, etc. To share the enhanced record in a distributed environment, a central registry or database becomes necessary. In fall 1998, OCLC announced the experimental Cooperative Online Resource Catalog (CORC) project to explore the cooperative creation and use of metadata, primarily for online resources. Based on the WorldCat model for print materials, CORC will explore the feasibility of having different record structures, namely MARC and DC, in the same database. By using Resource Description Framework (RDF)-compliant Extensible Markup Language (XML), HTML, and MARC, records can be imported into or exported from CORC. RDF, a standard recommended by the World Wide Web Consortium (W3C), provides a framework for exchanging metadata of many varieties. The CORC system can harvest online resources and create simple descriptions of Web resources automatically.


Participants in CORC can create and contribute records to the database in the same fashion as OCLC member libraries share records in WorldCat. Records can then be exported for local use. CORC has plans to support authority control and the automatic assignment of classification numbers and subject headings. CORC is another step forward in the effort to improve discovery of and access to resources on the Internet. If successful, it also has significant implications for the library world. Current OPAC technology is not capable of containing and processing different types of data structures; the CORC system could serve as a model for the development of future generations of library OPACs.

There are many other DC projects in North America, Europe, Australia, and the Nordic countries: the University of California–Berkeley Digital Library Catalog (http://sunsite.berkeley.edu/Catalog), the Gateway to Educational Materials (GEM) initiated by the US Department of Education and the National Library of Education (http://gem.syr.edu), the Nordic Metadata I project funded by NORDINFO from October 1996 to June 1998 (http://linnea.helsinki.fi/meta/), and the PANDORA project at the National Library of Australia for building an online archive (http://www.nla.gov.au/policy/pandje97.html). The hope for wide implementation of DC has yet to become a reality. Deployment of DC has not reached the critical mass needed for most commercial search engine developers to be interested in incorporating DC into their search algorithms. Early experiments with and implementations of DC have been carried out mostly by libraries and research institutions that collect or create scholarly information, as illustrated by the examples named above.

5.3. Automatic subject analysis and assignment of controlled vocabularies

Classification and the application of subject headings are the most labor-intensive and costly parts of document description. They are also the processes most difficult for humans to carry out in a large-scale distributed information environment such as the Internet. Can options be developed to automate subject analysis and the assignment of controlled vocabularies?

No computer program at present can fully replace the human intellectual process of assigning controlled vocabularies, and none of the past experiments with automatically assigning classification numbers and subject headings was very successful. With the advancement of computing technology and the development of artificial intelligence and natural language processing, however, it is conceivable that such programs could eventually be designed to render acceptable results.

It is possible for a sophisticated computer program to examine the entire contents of an electronic document, or parts thereof, such as the abstract, summary, and bibliographical references, to determine words that are significant within context according to some algorithm. The missing piece in this scenario is a program capable of matching the result to an existing list of controlled terms or a classification scheme and assigning the “appropriate” subject headings and/or classification number automatically, and with a high degree of accuracy.
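A toy sketch of this scenario follows, under loud assumptions: the vocabulary is a tiny hand-built mapping from controlled headings to a few indicative terms, and the “algorithm” is bare term-frequency overlap. Production systems, such as the projects described next, use far richer retrieval techniques.

    from collections import Counter

    STOPWORDS = {"the", "a", "of", "and", "to", "in", "for", "on", "is", "are"}

    # Hypothetical mini-vocabulary: controlled heading -> indicative terms.
    VOCABULARY = {
        "Cataloging of computer network resources": {"cataloging", "internet", "metadata"},
        "Subject headings": {"subject", "headings", "vocabulary"},
        "Information retrieval": {"search", "retrieval", "indexing"},
    }

    def significant_terms(text, n=20):
        """Return the n most frequent non-stopword terms in the text."""
        words = [w.strip(".,;:\"'()").lower() for w in text.split()]
        counts = Counter(w for w in words if w and w not in STOPWORDS)
        return {w for w, _ in counts.most_common(n)}

    def rank_headings(text):
        """Rank controlled headings by term overlap with the document."""
        terms = significant_terms(text)
        scores = {h: len(terms & clues) for h, clues in VOCABULARY.items()}
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

    # The ranked list would then be reviewed by a human cataloger.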

It is important to note that several research and experimental projects in this area are currently underway. These projects approach the automation process with different models (computational linguistic methods, neural network models, etc.), and some are showing promising results.


OCLC Scorpion is a research project designed to build “tools for automatic subject recognition by combining library science and information retrieval techniques” [27]. An electronic document is input as a query against the Scorpion Dewey databases, using the SMART (System for Manipulating And Retrieving Text) ranked retrieval algorithm. The program returns a ranked list of potential Dewey numbers as subjects; the list can then be reviewed by a human cataloger. Scorpion is used for generating classification numbers and subject headings for OCLC NetFirst, a database of Web resources available via the OCLC FirstSearch and OCLC Cataloging services.

Another interesting project is the development of “Entry Vocabulary Modules” (EVMs) at the University of California–Berkeley School of Information Management and Systems. The project experimented with mapping natural-language queries to controlled vocabularies such as LCSH via an association dictionary. Although the experiment was conducted in specific databases as well as in the catalog of the University of California’s online library system, MELVYL [28], further research could test the feasibility of applying the experiment’s concept and methodology to resources on the Internet.

5.4. Subject access to resources on the Internet as one virtual library via subject gateways

Subject gateways are sometimes referred to as subject-based information gateways, subject trees, etc. Many such gateways have been developed to enhance resource discovery and to provide organization of resources on the Internet in specific subject areas. Some of them arrange resources by classification schema and others by subject headings. Most of these subject gateways link to resources that have been through a human reviewing or selection process. Examples of subject gateways include CyberStacks(sm) (http://www.public.iastate.edu/~CYBERSTACKS/), the Art, Design, Architecture & Media Information Gateway (ADAM) (http://www.adam.ac.uk/adam/index.html), and Organizing Medical Networked Information (OMNI) (http://omni.ac.uk/).

Although similar to traditional cataloging—the assignment of classification and subject headings is still performed by humans—subject gateways take a different approach from the traditional library OPAC in that they treat the entire Internet as one virtual library and, in many cases, provide additional searching facilities and subject-related services [29]. Some gateways focus on a single subject area, whereas others cover several different disciplines. These gateways provide better subject access to quality resources on the Internet. However, as more and more subject gateways are created and multiple gateways are devoted to the same subject, searching all of them is time-consuming, and selecting which ones to search can be equally difficult when there is no cross-searching. This is particularly problematic when searching for information in interdisciplinary areas. In an effort to improve the situation and develop a system whereby a cross-search of multiple gateways can be executed with a single query, Kirriemuir et al. described a query routing and forward knowledge approach using the Common Indexing Protocol to facilitate cross-searching in multiple gateways [30].
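In rough outline, and leaving the Common Indexing Protocol itself aside, the query routing idea can be sketched as follows: a registry of “forward knowledge” records which subjects each gateway covers, and a single query is forwarded only to the gateways whose coverage it overlaps. The coverage table and the search_gateway callable here are hypothetical stand-ins, not part of the approach described by Kirriemuir et al.

    # Hypothetical forward knowledge: gateway name -> subjects it covers.
    GATEWAY_COVERAGE = {
        "OMNI": {"medicine", "health"},
        "ADAM": {"art", "design", "architecture"},
    }

    def route_query(query_terms, search_gateway):
        """Forward the query to each gateway whose coverage overlaps it,
        then merge the result lists."""
        results = []
        for gateway, subjects in GATEWAY_COVERAGE.items():
            if query_terms & subjects:
                results.extend(search_gateway(gateway, query_terms))
        return results

    # Example: route_query({"medicine"}, my_search_fn) would query only OMNI.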


5.5. Creating online thesauri for Internet resources

The association dictionary in Berkeley’s EVMs project can be viewed as an online thesaurus. In the first stage of the experiment, an algorithm was developed by studying and learning the associations between the lexical clues found in cataloging records and the subject headings assigned by human catalogers [31]. The advantage of an online thesaurus is that it can be modified and updated more easily and automatically. The results of the mapping can either be transparent to the searcher or be presented as a ranked list of broader, narrower, or related terms from which the searcher can further refine the search. The thesaurus may be capable of mapping natural-language search terms to more than one controlled vocabulary scheme, as well as linking among different controlled vocabulary schema. The creation and maintenance of such an online thesaurus for Internet resources, however, can be a tremendous task. Although most of the process may be done automatically, human intervention will still be needed to decide the structure and mapping mechanism of the thesaurus. If different online thesauri are created for different disciplines, is it beneficial to provide links among them? How can the appropriate links be created and maintained?
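A minimal sketch of such a lookup, with entirely illustrative data: an association dictionary maps a searcher’s natural-language term to controlled terms, and a thesaurus record offers broader (BT), narrower (NT), and related (RT) terms for refining the search.

    # Illustrative association dictionary: user's term -> controlled terms.
    ASSOCIATIONS = {
        "cars": ["Automobiles"],
        "autos": ["Automobiles"],
    }
    # Illustrative thesaurus structure for one controlled term.
    THESAURUS = {
        "Automobiles": {
            "BT": ["Motor vehicles"],  # broader terms
            "NT": ["Sports cars"],     # narrower terms
            "RT": ["Trucks"],          # related terms
        },
    }

    def map_query(term):
        """Map a user's term to controlled terms plus refinement choices."""
        return [(c, THESAURUS.get(c, {})) for c in ASSOCIATIONS.get(term.lower(), [])]

    # map_query("cars") -> [("Automobiles", {"BT": [...], "NT": [...], "RT": [...]})]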

6. Conclusion

We have reviewed various efforts from many different communities (library science, information science, computer science) that attempt to better organize Internet resources and provide more effective methods for resource discovery. These approaches range from traditional cataloging in the library world to the recent metadata movement and the development of the Resource Description Framework for building an architecture that enables the coexistence of different data structures. However, it is clear that one size does not fit all. Traditional library cataloging, MARC records, and the various metadata schema all have their places in the sense- and order-making business of the Internet. Some resources may warrant the most labor-intensive and detailed description, whereas for others a description based on a simple metadata scheme such as DC Simple will suffice.

Librarians, catalogers specifically, have long developed expertise in organizing information stored in physical media and making it accessible in a coherent and consistent manner through the library catalog. Coherence and consistency are achieved by applying established cataloging rules and using controlled vocabularies. Although keywords and full-text indexing can be useful in some situations, controlled vocabularies are still necessary for effective resource discovery in large distributed databases with broad subject coverage, such as the Internet. The skills of analyzing the intellectual content of a work, classifying it, applying subject headings to it, and arranging the library’s collection logically are easily transferred to the organization of information stored electronically. There are issues yet to be resolved, however, and standards to be developed for organizing and managing this new medium.

In addition to using our expertise and sharing our experience in knowledge and information management, librarians need to be actively involved in the development of many other metadata schema, as well as in innovative ways of making resource discovery on the Internet more effective.


Only through collaborative efforts from different communities can order be achieved on the Internet. Clifford Lynch noted: “. . . the librarian’s classification and selection skills must be complemented by the computer scientist’s ability to automate the task of indexing and storing information. Only a synthesis of the differing perspectives brought by both professions will allow this new medium to remain viable” [32]. In this way we can reduce the difficulty of finding and retrieving the needles in the Internet haystack.

References

[1] Taylor AG. Perspectives on the subject of subjects. Journal of Academic Librarianship 1995;21:485.
[2] Mandel CA, Wolven R. Intellectual access to digital documents: joining proven principles with new technologies. In: Ling-yuh WP, Cox BJ, editors. Electronic resources: selection and bibliographic control. New York: Haworth, 1996. p. 26.
[3] Search engine sizes. Search Engine Watch, 1999. http://searchenginewatch.com/reports/sizes.html.
[4] Search engine sizes. Search Engine Watch, 1999. http://searchenginewatch.com/reports/sizes.html.
[5] Vellucci SL. Options for organizing electronic resources: the coexistence of metadata. Bulletin of the American Society for Information Science 1997;24:16.
[6] Filman RE, Sangam P. Searching the Internet. IEEE Internet Computing 1998;2:21.
[7] Dunn A. Most of web beyond scope of search sites. Los Angeles Times Business, 1999. http://www.latimes.com/HOME/BUSINESS/UPDATE/lat_search990708.htm.
[8] Directory sizes. Search Engine Watch. http://www.searchenginewatch.com/reports/directories.html.
[9] A note on Internet search engines and metadata (draft, December 11, 1998), posted on the Knowledge Access Management Discussion Forum KAM@OCLC.org, January 15, 1999.
[10] Taylor AG. The organization of information. Englewood, CO: Libraries Unlimited, 1999. p. 246.
[11] Rust G. Metadata: the right approach: an integrated model for descriptive and rights metadata in e-commerce. D-Lib Magazine, 1998. http://www.dlib.org/dlib/july98/rust/07rust.html.
[12] Internet draft: http://metadata.net/admin/draft-iannella-admin201.txt.
[13] Weibel S. The state of the Dublin Core metadata initiative, April 1999. D-Lib Magazine 1999;5. http://www.dlib.org/dlib/april99/04weibel.html.
[14] Ibid.
[15] Lynch CA. Keynote address: finding common ground. In: LaGuardia C, Mitchell BA, editors. Finding common ground: creating the library of the future without diminishing the library of the past. New York: Neal-Schuman Publishers, 1998. p. 7.
[16] Desai BC. Supporting discovery in virtual libraries. Journal of the American Society for Information Science 1997;48:190–204.
[17] Micco M. Subject authority control in the world of Internet: Part 1. LIBRES: Library and Information Science Research 1996;6. http://www.bubl.ac.uk/journals/lis/kn/libres/v06n0396/micco_1.html.
[18] Micco M. Subject authority control in the world of Internet: Part 2. LIBRES: Library and Information Science Research 1996;6. http://www.bubl.ac.uk/journals/lis/kn/libres/v06n0396/micco_2.html.
[19] Taylor AG. Perspectives on the subject of subjects. Journal of Academic Librarianship 1995;21:484–91.
[20] Directories take center stage. The Search Engine Report, 1998. http://searchenginewatch.internet.com/sereport/9805-directory.html.
[21] Hagler R. The bibliographic record and information technology, 3rd edition. Chicago: American Library Association, 1997. p. 263.
[22] Tillett BB. International shared resource records for controlled access. ALCTS Newsletter Online: From Cataloging to Gateway 1998;10. http://www.ala.org/alcts/alcts_news/v10n1/gateway.html.
[23] Lassila O. Web metadata: a matter of semantics. IEEE Internet Computing 1998;2:30.
[24] Ibid.
[25] Ibid.
[26] Connaway LS, Wallace DP. Organized access to engineering internet resources using indexing principles. In: LaGuardia C, Mitchell BA, editors. Finding common ground: creating the library of the future without diminishing the library of the past. New York: Neal-Schuman Publishers, 1998. p. 393.
[27] Shafer K. Scorpion helps cataloging the web. June 1999. http://orc.rsch.oclc.org:6109/b-asis.html.
[28] Norgard B. Entry vocabulary modules and agents. June 28, 1998; rev. July 14, 1998. http://www.sims.berkeley.edu/research/metadata/agents.html. Project homepage: http://www.sims.berkeley.edu/research/metadata/.
[29] Kirriemuir J, Brickley D, Welsh S, Knight J, Hamilton M. Cross-searching subject gateways: the query routing and forward knowledge approach. D-Lib Magazine, January 1998. http://www.dlib.org/dlib/january98/01kirriemuir.html.
[30] Ibid.
[31] Plaunt C, Norgard BA. An association-based method for automatic indexing with a controlled vocabulary. Journal of the American Society for Information Science 1998;49:888.
[32] Lynch C. Searching the Internet: combining the skills of the librarian and the computer scientist may help organize the anarchy of the internet. Scientific American 1997;276:52.
