Knowledge Organization: Library Tools and Taxonomies for the Web
Jan Herd
Business Reference Services
Science, Technology & Business Division
The Library of Congress
3
Web is too big to organize?
One billion pages
1.5 million pages added daily
Selection of sites by collection development specialists/reference librarians
4
Librarians work in corporate settings
Yahoo.com (directory)
Northern Light.com (search engine)
Amazon.com (e-book seller)
Microsoft.com
6
Traditional Library Tools on the Web
Medical Subject Headings 1996
Web Dewey 2000
Classification Web 2001 (LCSH & LCC)
7
Importance of controlled vocabulary as metadata
American Library Association
Subject Analysis Committee (SAC)
Subcommittee on Metadata and Subject Analysis recommendations
http://www.ala.org/alcts/organization/ccs/metarept2.html
8
Controlled Vocabularies: Why We Need Them
Used “behind” search engines
Standard in online databases
New adherents (e.g., Web content managers utilizing taxonomies)
They work!
9
Sherry Vellucci, Associate Professor, St. John’s Univ., during the Conference on Bibliographic Control for the New Millennium:
“authority control is not only wonderful, but critical. Controlled vocabulary mediating tools should cover Subjects, Genres, Gazetteers, Names and Titles, etc.”
10
Metathesauri/Subject Correlations
Unified Medical Language System (UMLS) maps over 60 medical and health care thesauri in one
http://www.nlm.nih.gov/pubs/factsheets/umlsmeta.html
Classification Web
The Library of Congress subject headings and LC classification correlations
http://classweb.loc.gov
22
Mapping: Standard information exchange systems
Dublin Core to MARC
http://lcweb.loc.gov/marc/dccross.html
MARC to Dublin Core
http://www.loc.gov/marc/marc2dc.html
XML MARC Crosswalk
http://lcweb.loc.gov/marc/marcsgml.html (Must download files)
MARC to XML to MARC Converter
http://www.logos.com/marc/default.asp
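A crosswalk of this kind is essentially a lookup table from one scheme's elements to another's fields. The sketch below shows the idea for a handful of unqualified Dublin Core elements. The tag and subfield pairs are an illustrative subset assumed for this example and should be checked against the Library of Congress crosswalk; the function name and sample record are hypothetical.

```python
# Illustrative subset of an unqualified Dublin Core to MARC 21 mapping.
# The tag/subfield pairs are assumptions for this sketch; verify them
# against the Library of Congress crosswalk before relying on them.
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
}


def dc_to_marc(dc_record):
    """Return sorted (tag, subfield, value) triples for the mapped elements."""
    fields = []
    for element, values in dc_record.items():
        if element not in DC_TO_MARC:
            continue  # unmapped elements are skipped rather than guessed
        tag, subfield = DC_TO_MARC[element]
        for value in values:
            fields.append((tag, subfield, value))
    return sorted(fields)


# Hypothetical sample record; each Dublin Core element may repeat.
record = {
    "title": ["Knowledge Organization: Library Tools and Taxonomies for the Web"],
    "creator": ["Herd, Jan"],
    "subject": ["Taxonomies", "Controlled vocabularies"],
    "format": ["text/html"],  # not in the subset above, so it is dropped
}

for tag, subfield, value in dc_to_marc(record):
    print(f"{tag} ${subfield} {value}")
```

Keeping the mapping in a plain table, separate from the conversion loop, means the table can be corrected or extended without touching the code.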
23
Mapping: Specialized information exchange systems
Standard Industrial Classification (SIC codes)
to
North American Industry Classification System (NAICS codes)
25
SIC Code Example
Major group 73 = Business services
737 = Computer programming, data processing, and other computer related services
7372 = Prepackaged software
Equivalent NAICS codes are:
Sector 51 = Information
511 = Publishing industries
5112 = Software publishers (with cross ref. to Sector 42 for reselling packaged software)
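Both schemes are hierarchical by prefix, so a code's broader categories can be recovered by truncating it. The following is a minimal sketch using only the codes from this example. The function names are hypothetical, and the single-entry mapping table is an assumption that ignores the Sector 42 cross-reference.

```python
# Titles taken from the slide; only the codes shown there are included.
SIC = {
    "73": "Business services",
    "737": "Computer programming, data processing, and other computer related services",
    "7372": "Prepackaged software",
}
NAICS = {
    "51": "Information",
    "511": "Publishing industries",
    "5112": "Software publishers",
}

# Assumed one-to-many mapping table; the Sector 42 cross-reference for
# reselling packaged software is deliberately left out of this sketch.
SIC_TO_NAICS = {"7372": ["5112"]}


def hierarchy(code, titles):
    """List the known ancestors of `code`, broadest first, by taking prefixes."""
    return [
        (code[:n], titles[code[:n]])
        for n in range(2, len(code) + 1)
        if code[:n] in titles
    ]


def convert(sic_code):
    """Return the NAICS hierarchy for each NAICS code mapped from `sic_code`."""
    return [hierarchy(naics, NAICS) for naics in SIC_TO_NAICS.get(sic_code, [])]


for level_code, title in hierarchy("7372", SIC):
    print("SIC", level_code, title)
for path in convert("7372"):
    for level_code, title in path:
        print("NAICS", level_code, title)
```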
27
What is a Taxonomy?
A high-level information search device constructed to provide a means of understanding, navigating, and gaining access to intellectual capital.
28
History of Taxonomies
Aristotle, 384 - 322 B.C.
Kallimachos, 305 - 240 B.C. (Library of Alexandria)
Carl Linnaeus, 1707-1778
34
Service Codes
CODE  TITLE
A     Research and Development
B     Special Studies and Analysis - Not R&D
C     Architect and Engineering Services - Construction
D     Information Technology Services, including Telecommunication Services
E     Purchase of Structures and Facilities
F     Natural Resources and Conservation Services
G     Social Services
H     Quality Control, Testing and Inspection Services
J     Maintenance, Repair, and Rebuilding of Equipment
K     Modification of Equipment
L     Technical Representative Services
M     Operation of Government-Owned Facilities
N     Installation of Equipment
P     Salvage Services
Q     Medical Services
R     Professional, Administrative and Management Support Services
S     Utilities and Housekeeping Services
T     Photographic, Mapping, Printing, and Publication Services
U     Education and Training Services
V     Transportation, Travel and Relocation Services
W     Lease or Rental of Equipment
X     Lease or Rental of Facilities
Y     Construction of Structures and Facilities
Z     Maintenance, Repair or Alteration of Real Property
37
How do we define taxonomies in a wired world?
Taxonomy: A classification of elements within a domain
Domain: a sphere of knowledge, influence, or activity
Classification: the operation of grouping elements and establishing relationships between them (or the product of that operation)
Relationships: a defined linkage between two elements
Element: an object or concept
Crandall, Mike. “Taxonomies for the Real World: The Business Imperative to Simplify Content Access.” TFPL Taxonomies for Business Conference, London, Oct. 23, 2000.
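These definitions map naturally onto a small data model: elements, typed relationships between pairs of elements, and a taxonomy that groups them within a domain. The sketch below is one possible reading, not a standard representation. The class names follow the definitions, while the "is-a" relationship kind and the `narrower` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Element:
    """An object or concept."""
    name: str


@dataclass(frozen=True)
class Relationship:
    """A defined linkage between two elements."""
    source: Element
    kind: str  # e.g. "is-a"; the set of kinds is an assumption of this sketch
    target: Element


@dataclass
class Taxonomy:
    """A classification of elements within a domain."""
    domain: str
    elements: set = field(default_factory=set)
    relationships: list = field(default_factory=list)

    def relate(self, source, kind, target):
        """Register both elements and the linkage between them."""
        s, t = Element(source), Element(target)
        self.elements.update({s, t})
        self.relationships.append(Relationship(s, kind, t))

    def narrower(self, name):
        """Names of elements linked to `name` by an "is-a" relationship."""
        return sorted(
            r.source.name
            for r in self.relationships
            if r.kind == "is-a" and r.target.name == name
        )


taxonomy = Taxonomy(domain="industry classification")
taxonomy.relate("Publishing industries", "is-a", "Information")
taxonomy.relate("Software publishers", "is-a", "Publishing industries")
print(taxonomy.narrower("Information"))
```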
38
What are Taxonomies Good For?
Taxonomies are applied to:
Items (aka resources): individual pieces of information (documents, people...)
By the use of:
Metadata (aka properties, attributes): information describing types of data
Which may or may not use values from a:
Vocabulary: selection of terms, classified or sorted
To create:
Content: an item and its associated metadata
Crandall, Mike. “Taxonomies for the Real World: The Business Imperative to Simplify Content Access.” TFPL Taxonomies for Business Conference, London, Oct. 23, 2000.
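The chain from item to metadata to vocabulary to content can be sketched in a few lines. In this hypothetical example, properties that have a vocabulary are checked against it, and properties without one accept free text, reflecting the "may or may not" above. The vocabulary terms, property names, and file name are all invented for illustration.

```python
# Hypothetical controlled vocabulary: only "subject" is controlled here.
VOCABULARY = {
    "subject": {"Taxonomies", "Metadata", "Classification"},
}


def make_content(item, metadata, vocabulary=VOCABULARY):
    """Pair an item with its metadata, checking controlled properties.

    Properties listed in `vocabulary` must take one of its terms;
    any other property is treated as free text.
    """
    for prop, value in metadata.items():
        allowed = vocabulary.get(prop)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{value!r} is not a {prop} term in the vocabulary")
    return {"item": item, "metadata": dict(metadata)}


content = make_content(
    "report.xml",
    {"subject": "Metadata", "author": "Jan Herd"},  # author is uncontrolled
)
print(content)
```

Rejecting an unknown term at tagging time, rather than at search time, is what keeps a controlled vocabulary controlled.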
39
Challenges
Information management across divisions of your agency
Agency global intranets/Internet portals
Global or national document management including technical documentation
Incorporating taxonomy technology into agency technology + info. policies
Cost of building a taxonomy
Moving a taxonomy from overhead to being a core part of your agency’s information management.
40
More Challenges
Certification of the taxonomy by an authoritative body.
Finding common ground across multiple taxonomies or schemas with similar terms and different meanings.
Ensuring the ongoing integrity of the taxonomy with constant maintenance.
Acceptance by developers of tagging tools.
Integrating with a legacy system and external content.
41
The core expertise required for constructing a taxonomy is:
Systems Analyst who understands specifications for creating taxonomies
Domain expert/Subject expert in the subject of the taxonomy
Computational linguist, AI engineer
Linguist and/or Lexicographer
Database/Application Development Expert
Administrative Support
Review Support
42
Example of a custom taxonomy marked up in XBRL:
<?xml version="1.0" encoding="utf-8"?>
<schema xmlns:xbrl="http://www.xbrl.org/core/2000-07-31"
        targetNamespace="http://www.xbrl.org/us/gaap/ci/2000-07-31">
  <import namespace="http://www.xbrl.org/core/2000-07-31/"
          schemaLocation="http://www.xbrl.org/core/2000-07-31/xbrl-meta-2000-07-31.xsd"/>
  <element name="propertyPlantAndEquipmentGrossNote.purchasedSoftwareForInternalUse" type="monetary">
    <annotation>
      <documentation>this is software that...</documentation>
      <appinfo>
        <xbrl:rollup to="ci:propertyPlantAndEquipmentNetNote.propertyPlantAndEquipmentGrossNote" weight="1" order="7.5" />
        <xbrl:label xml:lang="en">Purchased software for internal use</xbrl:label>
        <xbrl:reference name="GPSI" number="73" chapter="11" paragraph="b" subparagraph="i" />
      </appinfo>
    </annotation>
  </element>
</schema>
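To show how such markup can be consumed, here is a minimal sketch that reads a trimmed copy of the example with Python's standard `xml.etree.ElementTree` and pulls out each element's name, type, label, and roll-up parent. The trimmed XML string and the `read_elements` function are assumptions of this sketch, not part of XBRL or of any XBRL tool.

```python
import xml.etree.ElementTree as ET

XBRL_NS = "http://www.xbrl.org/core/2000-07-31"

# Trimmed copy of the slide's example (XML declaration, import,
# documentation, and reference omitted to keep the sketch short).
TAXONOMY_XML = """
<schema xmlns:xbrl="http://www.xbrl.org/core/2000-07-31"
        targetNamespace="http://www.xbrl.org/us/gaap/ci/2000-07-31">
  <element name="propertyPlantAndEquipmentGrossNote.purchasedSoftwareForInternalUse" type="monetary">
    <annotation>
      <appinfo>
        <xbrl:rollup to="ci:propertyPlantAndEquipmentNetNote.propertyPlantAndEquipmentGrossNote" weight="1" order="7.5" />
        <xbrl:label xml:lang="en">Purchased software for internal use</xbrl:label>
      </appinfo>
    </annotation>
  </element>
</schema>
"""


def read_elements(xml_text):
    """Return one dict per <element>, with its label and roll-up parent."""
    root = ET.fromstring(xml_text.strip())
    ns = {"xbrl": XBRL_NS}
    rows = []
    for element in root.iter("element"):
        label = element.find(".//xbrl:label", ns)
        rollup = element.find(".//xbrl:rollup", ns)
        rows.append({
            "name": element.get("name"),
            "type": element.get("type"),
            "label": label.text if label is not None else None,
            "parent": rollup.get("to") if rollup is not None else None,
        })
    return rows


for row in read_elements(TAXONOMY_XML):
    print(row["label"], "rolls up to", row["parent"])
```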
44
Recommendations:
Actively seek out existing taxonomies in the target discipline or subject area. If your needs are met in part by an existing taxonomy use it and build on it.
Look at the intended purpose of the taxonomy and select appropriate software tools.
Consider scalability of the taxonomy. Look at the big picture and see how the taxonomy will be able to hook into others.
Consider utilizing numerical taxonomy as a schema in the metadata in order to merge documents in foreign languages.
Accommodate new standards whenever possible.
Document “Best Practices” while creating the taxonomy and review them regularly.
Maintain and update the taxonomy continually.
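The recommendation about numerical taxonomies can be illustrated with a small sketch: if subject labels in each language resolve to the same language-neutral code, documents can be merged by code regardless of the language they were tagged in. The label table, the French label, and the sample documents below are hypothetical and serve only to show the mechanism.

```python
from collections import defaultdict

# Hypothetical table resolving (language, label) pairs to a numeric code.
# The codes reuse the NAICS example; the French label is illustrative only.
LABEL_TO_CODE = {
    ("en", "Software publishers"): "5112",
    ("fr", "Éditeurs de logiciels"): "5112",
    ("en", "Publishing industries"): "511",
}


def merge_by_code(documents):
    """Group document titles under the numeric code of their subject label."""
    merged = defaultdict(list)
    for doc in documents:
        code = LABEL_TO_CODE.get((doc["lang"], doc["subject"]))
        if code is not None:  # documents with unknown labels are left out
            merged[code].append(doc["title"])
    return dict(merged)


documents = [
    {"title": "Annual survey", "lang": "en", "subject": "Software publishers"},
    {"title": "Enquête annuelle", "lang": "fr", "subject": "Éditeurs de logiciels"},
    {"title": "Sector overview", "lang": "en", "subject": "Publishing industries"},
]
print(merge_by_code(documents))
```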
45
Diagram (box labels):
Your Agency Taxonomy
Existing Taxonomy in your Field
Related Taxonomy of other agency in same field
Related Taxonomy of other agency hooked to one above
Electronic Document in XML
Core Schema (describes how document is to be created)
Meta Model (describes how taxonomies are created)
46
Efficient Web information retrieval systems in the form of search engines or Web portals require continued support and improvement of:
47
Web-based classification and numerical taxonomic tools to use in
Web-based cataloging tools such as CORC, which provides metadata based on
Taxonomies such as controlled vocabularies/thesauri, which will be hooked together using
Metathesauri and standard information exchange systems such as MARC-XML