Non-MARC Cataloging
Standards Overview:
TEI & EAD, MODS, METS, XML-based MARC
Eric Childress
OCLC
February 10, 2003
Overview
• Fundamentals
  – Metadata and content
  – Types of metadata
  – Document mark-up languages & character encoding
• The Big Picture
• Metadata formats:
  – MARC
  – MODS
  – METS
  – MIX
  – TEI
  – EAD
  – ONIX
Fundamentals: Metadata and content
1. Metadata embedded in content object
  • Title page / CIP
  • HTML header in HTML document
2. Metadata separate from content object
  • Book + catalog card
  • Book + MARC record
3. Metadata linked to content object
  • MARC record with URL for FTP object
4. Metadata embedded and linked
  • MARC record with URL for HTML document
  • PDF document linked to DC-XML record
  • Aggregation of discrete objects linked to record
Fundamentals: Types of metadata
• Administrative metadata: data about the metadata (e.g., record number)
• Descriptive metadata: description of the object for discovery and retrieval (e.g., title)
• Technical metadata: technical characteristics of the object (e.g., file size)
Fundamentals: Markup languages & character encoding
Markup languages:
– Address the structure of a document
– Convey instructions to software that will process the text to:
  • Index the text for searching
  • Render the text (e.g., for screen display or print)
  • Transform the text for some output device (e.g., a voice synthesizer)
– The markup is generally invisible to end users
• Extensible Markup Language (XML):
  – XML is a metalanguage: agencies define their own XML vocabulary to suit their task by creating Document Type Definitions (DTDs) or XML schemas
  – Data is kept separate from presentation instructions (which are recorded in a style sheet)
  – Offers just the right mix of flexibility and structure
Character encoding:
– Used for communicating text characters in a computing environment
– Hundreds of character encoding standards exist
– Character conversion is complex and expensive
• Unicode:
  – A single, “comprehensive” global encoding standard
  – Includes characters from scripts of all major modern, most minor, and selected ancient languages
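The two ideas on this slide, data kept apart from presentation and one character mapping to different bytes under different encodings, can be shown in a few lines. This is only a sketch: the element names (`record`, `title`, `extent`) and the two render functions are invented for illustration and do not come from any standard covered in this presentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the element names are made up for this example.
DOC = "<record><title>Encyclop\u00e9die</title><extent unit='pages'>230</extent></record>"

root = ET.fromstring(DOC)
title = root.findtext("title")


def render_plain(text):
    """One presentation of the data: upper-cased plain text."""
    return text.upper()


def render_html(text):
    """Another presentation of the same data: an HTML heading."""
    return f"<h1>{text}</h1>"


# The markup carries structure only; presentation is decided elsewhere.
plain = render_plain(title)
html = render_html(title)

# The same 12 characters become different byte sequences per encoding.
utf8_bytes = title.encode("utf-8")      # the accented letter takes 2 bytes
latin1_bytes = title.encode("latin-1")  # the accented letter takes 1 byte

# Decoding with the wrong encoding silently corrupts the text.
misread = utf8_bytes.decode("latin-1")
```

Reading UTF-8 bytes as Latin-1 does not raise an error; it just produces the wrong characters, which is one reason character conversion between encodings is costly to get right.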
The Big Picture: Standards in a grid
[Figure: grid placing standards by richness of description (simple to rich) and by scope (item to collections). Standards legible in the figure: Dublin Core, RSLP, OAI set record, TEI, VRA Core, ONIX, MARC 8, CSDGM.]
Library-related standards: MARC 21 (ISO 2709)
• MARC 21 (ISO 2709) MARC 8:
  – Library metadata communications format based on ISO 2709
  – Strengths:
    • Mature standard
    • Widely adopted by libraries (U.S., Canada, and beyond)
    • Large universe of records available
    • Wide choice of software vendors
  – Weaknesses (in the present & future):
    • Virtually unused outside of libraries
    • Field and record size limitations
    • Restricted range of scripts supported (MARC 8 repertoire only)
    • Limited ability to convey hierarchical & complex relationships, attributes
    • No ability to embed related objects (e.g., book cover GIF)
    • Cannot be directly processed by widely-used web applications
• MARC 21 (ISO 2709) Unicode:
  – MARC 21 with Unicode character encoding
  – Limited to 16K characters equivalent to MARC 8 repertoire
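The "field and record size limitations" follow from the ISO 2709 layout: the leader stores the record length in five decimal digits, and each directory entry stores a field length in four. The sketch below builds and re-reads a toy record to make that visible. The function names and sample fields are hypothetical, the leader values are placeholders, and this is not a substitute for a real MARC library.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
SUBFIELD_DELIMITER = b"\x1f"

# Limits implied by the fixed-width decimal slots in the leader and directory.
MAX_RECORD_LENGTH = 99999  # leader positions 00-04 (5 digits)
MAX_FIELD_LENGTH = 9999    # directory entry length-of-field (4 digits)


def build_record(fields):
    """Serialize (tag, bytes) pairs into a minimal ISO 2709 record."""
    directory = b""
    data = b""
    for tag, value in fields:
        body = value + FIELD_TERMINATOR
        if len(body) > MAX_FIELD_LENGTH:
            raise ValueError(f"field {tag} exceeds {MAX_FIELD_LENGTH} bytes")
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset.
        directory += tag.encode("ascii") + b"%04d%05d" % (len(body), len(data))
        data += body
    directory += FIELD_TERMINATOR
    base_address = 24 + len(directory)
    total = base_address + len(data) + 1  # +1 for the record terminator
    if total > MAX_RECORD_LENGTH:
        raise ValueError(f"record exceeds {MAX_RECORD_LENGTH} bytes")
    # 24-byte leader; status/type codes here are illustrative placeholders.
    leader = b"%05d" % total + b"nam a22" + b"%05d" % base_address + b"   4500"
    return leader + directory + data + RECORD_TERMINATOR


def parse_record(raw):
    """Read the leader and directory back into (tag, bytes) pairs."""
    record_length = int(raw[0:5])
    base_address = int(raw[12:17])
    directory = raw[24:base_address - 1]  # drop the directory terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        length = int(entry[3:7])
        start = int(entry[7:12])
        body = raw[base_address + start:base_address + start + length - 1]
        fields.append((tag, body))
    return record_length, fields


sample_fields = [
    ("001", b"12345"),
    ("245", b"10" + SUBFIELD_DELIMITER + b"aA sample title"),
]
raw_record = build_record(sample_fields)
declared_length, parsed_fields = parse_record(raw_record)
```

Because every length is a fixed-width decimal number, a longer field or record cannot be expressed without changing the format itself.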
Library-related standards: MARC and XML
MARC 21 and XML:
– Library of Congress’ MARCXML:
  • LC’s schema provides a lossless conversion of MARC 21 (ISO 2709) to XML
  • LC’s XML framework positions MARCXML as both an end format and an intermediate format to non-MARC formats
– Stanford University’s Lane Medical Library’s XMLMARC:
  • Developed before LC’s MARCXML schema
  • Ignores/simplifies some MARC 21 data
UNIMARC and XML:
– Ministère de la culture et de la communication (France), Board of Research and Technology:
  • BiblioML DTD for converting UNIMARC to XML
  • Conversion tools in development
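To illustrate what "lossless" means here, the sketch below writes a record into the MARCXML element structure (`record`, `leader`, `controlfield`, `datafield`, `subfield` in the `http://www.loc.gov/MARC21/slim` namespace) and reads it back unchanged. The helper names and the sample record are assumptions made for this example. This is not LC's conversion toolkit.

```python
import xml.etree.ElementTree as ET

MARCXML_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARCXML_NS)


def q(name):
    """Qualify an element name with the MARCXML namespace."""
    return f"{{{MARCXML_NS}}}{name}"


def to_marcxml(leader, controlfields, datafields):
    """Build a MARCXML <record> element from plain Python structures."""
    record = ET.Element(q("record"))
    ET.SubElement(record, q("leader")).text = leader
    for tag, value in controlfields:
        ET.SubElement(record, q("controlfield"), {"tag": tag}).text = value
    for tag, ind1, ind2, subfields in datafields:
        df = ET.SubElement(
            record, q("datafield"), {"tag": tag, "ind1": ind1, "ind2": ind2}
        )
        for code, value in subfields:
            ET.SubElement(df, q("subfield"), {"code": code}).text = value
    return record


def from_marcxml(record):
    """Recover the same structures from a MARCXML <record> element."""
    leader = record.findtext(q("leader"))
    controlfields = [
        (cf.get("tag"), cf.text) for cf in record.findall(q("controlfield"))
    ]
    datafields = [
        (
            df.get("tag"),
            df.get("ind1"),
            df.get("ind2"),
            [(sf.get("code"), sf.text) for sf in df.findall(q("subfield"))],
        )
        for df in record.findall(q("datafield"))
    ]
    return leader, controlfields, datafields


# Hypothetical sample record.
leader = "00000nam a2200000   4500"
controlfields = [("001", "12345")]
datafields = [("245", "1", "0", [("a", "A sample title"), ("c", "Jane Doe")])]

xml_bytes = ET.tostring(to_marcxml(leader, controlfields, datafields))
round_trip = from_marcxml(ET.fromstring(xml_bytes))
```

Tags, indicators, and subfield codes all survive as attributes, so nothing from the MARC record has to be interpreted or discarded on the way into XML.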
Library-related standards: MODS
• Metadata Object Description Schema (MODS)
  – Essentially MARC 21 recast in an XML-native framework
    • Text-based tags rather than numeric ones
    • Selected clusters of related MARC 21 attributes condensed into a single MODS element
  – MARC 21 converts readily to MODS, but MODS cannot be converted back to MARC 21 without loss
• Value of MODS:
  – A rich, library-metadata-oriented XML metadata schema
  – Optimized for from-MARC conversion of legacy records
  – Selectively “improves” some of MARC’s mechanisms for representing resource type
  – Well-suited as a metadata format for OAI harvesting
  – Maintained by the same agency (LC) that maintains MARC 21
• Applications of MODS:
  – LC planning to convert 100K American Memory records
  – Minerva project, U of Chicago Press, California Digital Library, and others using or planning to use it for records for web sites and e-texts
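The contrast between numeric MARC tags and MODS text tags, and the reason the reverse conversion loses information, can be sketched as follows. The mapping covers only two fields and is a toy written for this example, not LC's official MARC-to-MODS stylesheet. The sample record and function name are hypothetical.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(name):
    """Qualify an element name with the MODS namespace."""
    return f"{{{MODS_NS}}}{name}"


def marc_to_mods(fields):
    """Map a few MARC fields to MODS elements (toy mapping, two tags only)."""
    mods = ET.Element(q("mods"))
    for tag, _indicators, subfields in fields:
        if tag == "245":  # title statement -> titleInfo
            title_info = ET.SubElement(mods, q("titleInfo"))
            ET.SubElement(title_info, q("title")).text = subfields.get("a", "")
            if "b" in subfields:
                ET.SubElement(title_info, q("subTitle")).text = subfields["b"]
        elif tag == "100":  # personal name main entry -> name
            name = ET.SubElement(mods, q("name"), {"type": "personal"})
            ET.SubElement(name, q("namePart")).text = subfields.get("a", "")
    return mods


# Hypothetical sample record: (tag, indicators, subfields).
marc_fields = [
    ("100", "1 ", {"a": "Doe, Jane"}),
    ("245", "10", {"a": "A sample title", "b": "with a subtitle"}),
]

mods = marc_to_mods(marc_fields)
mods_xml = ET.tostring(mods, encoding="unicode")
mods_title = mods.findtext(f"{q('titleInfo')}/{q('title')}")
```

The indicators and the original tag numbers are simply not carried over in this sketch, which is the kind of detail that makes a MODS-to-MARC round trip lossy.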
Library-related standards: METS
• Metadata Encoding and Transmission Standard (METS)
  – Standard for encoding descriptive, administrative, structural, rights, and other data essential for retrieving, preserving, and serving up digital resources
  – Six modules (header, descriptive metadata, administrative metadata, file section, structural map, behavior section)
  – Header and structural map are required; descriptive, administrative, and behavior metadata may reside in the METS object or be external
• Value of METS:
  – Need for METS identified at DLF metadata experts meetings: varied local approaches to non-descriptive metadata were neither scaling well nor supporting interoperability between agencies
  – Can be used to collect digital resource metadata for submission to a repository, hold metadata in the repository, and inform user access applications
• Applications of METS:
  – LC using it for moving images, audio recordings, and folk life mixed media collections
  – OCLC DPR, RLG, Harvard, and the National Library of Wales exploring or using it for a variety of projects
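As a rough picture of how the modules fit together, the sketch below assembles a small METS document with a header, a file section, and a structural map, then checks that every file pointer in the structural map resolves to a file in the file section. The file URL, IDs, labels, and helper functions are invented for this example. The descriptive, administrative, and behavior sections are left out.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)

# The six modules named on the slide, by element name.
SECTIONS = ["metsHdr", "dmdSec", "amdSec", "fileSec", "structMap", "behaviorSec"]


def q(name):
    """Qualify an element name with the METS namespace."""
    return f"{{{METS_NS}}}{name}"


def build_minimal_mets():
    """Build a METS object describing one page image (hypothetical data)."""
    mets = ET.Element(q("mets"))
    ET.SubElement(mets, q("metsHdr"), {"CREATEDATE": "2003-02-10T00:00:00"})

    file_sec = ET.SubElement(mets, q("fileSec"))
    file_grp = ET.SubElement(file_sec, q("fileGrp"))
    file_el = ET.SubElement(
        file_grp, q("file"), {"ID": "FILE1", "MIMETYPE": "image/tiff"}
    )
    ET.SubElement(
        file_el,
        q("FLocat"),
        {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": "http://example.org/page1.tif"},
    )

    struct_map = ET.SubElement(mets, q("structMap"))
    div = ET.SubElement(struct_map, q("div"), {"TYPE": "page", "LABEL": "Page 1"})
    ET.SubElement(div, q("fptr"), {"FILEID": "FILE1"})
    return mets


def sections_present(mets):
    """List which of the six modules appear as direct children."""
    return [name for name in SECTIONS if mets.find(q(name)) is not None]


def dangling_pointers(mets):
    """Return FILEID values in the structural map with no matching file."""
    file_ids = {f.get("ID") for f in mets.iter(q("file"))}
    return [
        p.get("FILEID") for p in mets.iter(q("fptr"))
        if p.get("FILEID") not in file_ids
    ]


mets_doc = build_minimal_mets()
present = sections_present(mets_doc)
dangling = dangling_pointers(mets_doc)
```

The structural map holds no content of its own; it only points at files by ID, which is what lets the same METS wrapper describe a page image, an audio recording, or a mixed-media collection.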
Library-related standards: MIX
• Metadata for Images in XML (MIX)
  – Collaboration of LC and the NISO Technical Metadata for Digital Still Images Standards Committee
  – XML schema for a set of technical data elements required to manage digital image collections
  – Format for interchange and/or storage of the data specified in the NISO Draft Standard Data Dictionary: Technical Metadata for Digital Still Images (version 1.2)
  – Still in early development and testing phases
• Value of MIX:
  – Provides a common XML schema for expressing technical data particular to still and moving digital images
  – Can be used with other schemas such as METS and MODS as part of a comprehensive approach to managing and preserving digital images
• Applications of MIX:
  – OCLC DPR, LC, others planning or testing
  – MIX still in nascent stage of development and testing
E-text-related standard: TEI
• Text Encoding Initiative (TEI):
  – For complex markup of literary texts
  – Both SGML and (new) XML DTDs available
  – TEI “header” (TEIH) can be used as a descriptive metadata record
  – Maintenance agency: TEI Consortium
    • TEI Consortium has executive offices in Bergen, Norway, and is hosted at four university sites worldwide: the University of Bergen, Brown University, Oxford University, and the University of Virginia
    • Consortium maintains the “P4” Guidelines for Electronic Text Encoding and Interchange
• Value of TEI:
  – Designed to meet the needs of the scholarly research community (esp. in the humanities) for a variety of activities, including:
    • Adding in-line academic commentary to e-texts
    • Aiding research by supporting special indexing points, etc.
• Applications of TEI:
  – Widely used by major humanities electronic text collections such as CETH, the UVa e-text center, and many others
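A small sketch of the two roles mentioned above: the TEI header read as a descriptive metadata record, and in-line commentary pulled out of the encoded text. The sample document is invented. It uses P4-style element names (`TEI.2`, `teiHeader`, `fileDesc`, `titleStmt`) and is not checked against the P4 DTD, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified P4-style document.
TEI_DOC = """<TEI.2>
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>A Sample Poem: an electronic edition</title>
        <author>Doe, Jane</author>
      </titleStmt>
      <publicationStmt><p>Example E-text Center</p></publicationStmt>
      <sourceDesc><p>Transcribed from a printed edition.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>The first line of the poem<note>An editor's comment.</note></p>
    </body>
  </text>
</TEI.2>"""


def header_as_record(root):
    """Read the TEI header as a simple descriptive metadata record."""
    file_desc = root.find("teiHeader/fileDesc")
    return {
        "title": file_desc.findtext("titleStmt/title"),
        "author": file_desc.findtext("titleStmt/author"),
        "publisher": file_desc.findtext("publicationStmt/p"),
        "source": file_desc.findtext("sourceDesc/p"),
    }


def inline_notes(root):
    """Collect in-line commentary from the body of the text."""
    return [note.text for note in root.find("text").iter("note")]


tei_root = ET.fromstring(TEI_DOC)
tei_record = header_as_record(tei_root)
tei_notes = inline_notes(tei_root)
```

The header and the text live in the same file, so the descriptive record travels with the e-text rather than being kept in a separate catalog.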
Archives-related standard: EAD
• Encoded Archival Description (EAD)
  – A format for expressing electronic archival finding aids
  – Created by LC and the Society of American Archivists (SAA)
  – EAD DTD (Version 2002) is designed to function as both an SGML and XML DTD
• Value of EAD:
  – Effectively an organized presentation of a collection of documents
    • EAD header carries metadata for the finding aid
    • Provides for simple or complex mark-up to support varying levels of indexing
    • Well-suited for interweaving narrative with links to specific objects in a collection (either directly to the object or via a record for the object that may link to the object)
• Applications of EAD:
  – Conversion of existing paper finding aids to electronic form
  – Widely used by academic institutions and archives in North America
  – RLG Archival Resources database hosts copies of many EADs
Publishing-related standard: ONIX
• ONIX International (Online Information Exchange):
  – Standard format for publishers to use to distribute electronic information about their publications
  – XML schema with Unicode encoding
  – Based on EPICS (EDItEUR Product Information Communication Standards)
  – Maintenance agency: EDItEUR working with input from the Book Industry Communication (BIC) and the Book Industry Study Group (BISG)
• Value of ONIX:
  – Designed to meet needs of publishers, jobbers, retail sellers for
    • richer book data online (including cover art)
    • a common data exchange format that will allow players to be rid of the burden of costly, custom programming to handle data from individual suppliers
  – Offers two levels of richness (level 1 & level 2)
• Applications of ONIX:
  – Primarily oriented towards jobbers and publishers
  – Most major players (Amazon, Baker & Taylor, etc.) now using/supporting
  – Some interest in implementation in library systems
Questions & Answers
Links: Major emphasis in this presentation
• MARC 21: http://lcweb.loc.gov/marc/marcdocz.html
• MARCXML: http://www.loc.gov/marc/marcxml.html
• XMLMARC: http://laneweb.stanford.edu:2380/wiki/medlane/xmlmarc
• BiblioML (UNIMARC XML): http://www.culture.fr/BiblioML
• MODS: http://www.loc.gov/standards/mods
• METS: http://www.loc.gov/standards/mets
• MIX: http://www.loc.gov/standards/mix
• TEI: http://www.tei-c.org
• EAD: http://www.loc.gov/ead
• ONIX: http://www.editeur.org/onix.html
Further reading on MARCXML, MODS, METS:
“New Metadata Standards for Digital Resources,” Bulletin of the American Society for Information Science and Technology, Dec/Jan 2003, pp. 12-15. http://www.asis.org/Bulletin/Dec-02/ASISTDecJan.pdf
Links: Also appearing (in Big Picture)
• SCORM: http://www.adlnet.org/index.cfm?fuseaction=scormabt
• RSLP: http://www.ukoln.ac.uk/metadata/rslp
• VRA Core: http://www.vraweb.org/vracore3.htm
• IMS LOM: http://www.imsglobal.org/metadata
• CSDGM: http://www.fgdc.gov/metadata/contstan.html
• GEM: http://www.geminfo.org/Workbench
• CIMI: http://www.cimi.org/old_site/standards