Metadata 101 Amy Benson NELINET, Inc. November 7, 2005


Overview

Terms and definitions
– What (the heck) do all those acronyms mean?

Categories of metadata schemes and tools
– How do they relate to each other?

Uses and functions
– What do you do with them?

Staying power
– Which ones do you really have to pay attention to?


Standards

Increase interoperability

Lower use and participation barriers

Build larger communities of users, which can drive creation of a wider range of relevant services and tools (Windows vs. Mac)

Improve chances of long-term survival of materials

Prefer open over proprietary


Categories

Metadata containers
– XML, RDF

Metadata standards
– MARC, MODS, DC, EAD, TEI, ONIX, FGDC, GILS

Metadata content standards

Transmission standards and protocols
– METS, OAI, SOAP, Z39.50, SRW

Identifiers
– URI, URL, PURL, URN, DOI, ISTC


Metadata - What is it?

Data about data

Information about any aspect of a resource: size, location, attributes, topic, origin, use, audience, creator, quality, access rights, reviews… the list is endless

An aid to the discovery, identification, assessment, and management of described entities


Types of Metadata

Descriptive
– What is it?

Discovery
– How can I find it?

Structural
– What files comprise it?

Administrative
– When was it created?


Types of Metadata

Identifiers
– How can I get to it?

Terms & conditions
– Can I use it?

Preservation
– Which key characteristics of the resource need to be maintained?


Metadata Terms

Structured metadata

Extensibility
– Modify to suit local needs

Granularity
– Level at which an item or collection of items is described

Interoperability
– Works with other systems
– Share data across systems


Metadata - Who needs it?

Impact of metadata on collection access
– Without metadata there is no service to users
– Metadata provides the means for resource discovery, grouping, filtering, and matching user needs
– Keyword searching works only for resources that are text-based; it excludes photographs, data sets, objects, maps, audio, video…

Metadata itself as valuable content
– Item descriptions, finding aids, reviews


Metadata

Description vs. discovery
– Full description is important for collection inventory and management, less so for discovery
– Full description of a resource includes much information that will never be part of a user’s search key

Deep vs. shallow
– Basic discovery metadata supports broad, cross-domain searching that can lead users to more complete search mechanisms and descriptions


Interoperability

Interoperability allows different computer systems, networks, and software to work together and share information

Usually achieved by following standards

Generally, an increase in specialization results in a decrease in interoperability

Allows different systems to make use of the same data


Interoperability

Advantages
– Can increase awareness and use of collections
– Reduces geographic and domain-specific isolation of collections
– Creates new avenues for scholarship
– Likely to assist / promote the longevity of data and collections
– Holy Grail = one-stop access to the universe of online resources


Interoperability

Disadvantages
– Consensus
– Compromise
– Delays
– Loss of independence
– Uniformity
– Increased implementation difficulties
– Loss of specificity and detail

Worthy goal?


Interoperability

The NINCH (National Initiative for a Networked Cultural Heritage) Guide to Good Practice lists these as the first two of its six core principles:

1. Optimize interoperability of materials

2. Enable broadest use


Interoperability

Canadian Culture Online (CCO) Technical Standards and Guidelines
– Technical requirements that CCO-funded projects must meet
– Six metadata elements are required when describing objects, to ensure interoperability: title, creator, subject, date created, language, identifier


XML

eXtensible Markup Language
– Based on SGML (Standard Generalized Markup Language)
– Developed by the World Wide Web Consortium (W3C)
– Open standard (non-proprietary)
– Uses natural-language tags, similar to HTML

<title>Gone with the wind</title>

A structure for storing and tagging information, without prescribing how the information is displayed or used


XML

Data stored in XML can be of many types

Its simple syntax is easy for machines to process

Natural-language tags make XML understandable to humans

XML defines the syntax, but not the data elements that make up an XML document


XML

The structure of XML allows for hierarchical relationships – often necessary for complex documents, 3-D objects, archives, etc.

XML is extensible – an important feature that allows tags to be created by users or a community of users

XML-encoded data is easily transformed or re-purposed


XML - Elements Example

<!DOCTYPE list [
  <!ELEMENT list (book+)>
  <!ELEMENT book (title, author*, date+, year, comment*, code*)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (aulast*, aufirst*)>
  <!ELEMENT aulast (#PCDATA)>
  <!ELEMENT aufirst (#PCDATA)>
  <!ELEMENT date (day*, month*)>
  <!ELEMENT day (#PCDATA)>
  <!ELEMENT month (#PCDATA)>
  <!ELEMENT year (#PCDATA)>
  <!ELEMENT comment (#PCDATA)>
  <!ELEMENT code (#PCDATA)>
]>


XML – Record Example

<book>
  <title>Weaving the Web</title>
  <author>
    <aulast>Berners-Lee,</aulast>
    <aufirst>Tim</aufirst>
  </author>
  <date>
    <day>6</day>
    <month>January</month>
  </date>
  <year>2002</year>
  <comment>Interesting topic, but not too well written.</comment>
  <code>nonfiction</code>
</book>
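Because every value in the record is wrapped in a named tag, software can pick fields out by name. A minimal sketch using Python's standard `xml.etree.ElementTree`; the record text is copied from the slide, and `describe_book` is a hypothetical helper name, not part of any standard.

```python
import xml.etree.ElementTree as ET

RECORD = """
<book>
  <title>Weaving the Web</title>
  <author><aulast>Berners-Lee,</aulast><aufirst>Tim</aufirst></author>
  <date><day>6</day><month>January</month></date>
  <year>2002</year>
  <comment>Interesting topic, but not too well written.</comment>
  <code>nonfiction</code>
</book>
"""

def describe_book(xml_text: str) -> dict:
    """Parse one <book> record and return selected fields as a flat dict."""
    book = ET.fromstring(xml_text.strip())
    # findtext() returns the text of the first matching child, or the default.
    authors = [
        f"{a.findtext('aufirst', '')} {a.findtext('aulast', '').rstrip(',')}".strip()
        for a in book.findall("author")
    ]
    return {
        "title": book.findtext("title"),
        "authors": authors,
        "year": book.findtext("year"),
        "code": book.findtext("code"),
    }

if __name__ == "__main__":
    print(describe_book(RECORD))
```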


XML - Partial list of ONIX elements


RecipeML


XML

Usually, tags, definitions, and requirements are defined and adhered to by a specific community
– DTD (Document Type Definition): describes the permissible data structure for an XML file
– Schema: also describes the permissible data structure for an XML file; a newer, XML-based way to define XML document types


XML DTDs and Schemas

DTDs and schemas
– Lay out the logical structure of the data
– Establish rules about which elements a document may have, which are required, which can repeat, etc.
– Establish a root element, parent and child elements, and where data can be placed within the hierarchy
– DTDs can be placed within an XML file, or be external to it and then referenced
– Schemas are external


XML – Simple DTD Example

<!DOCTYPE list [
  <!ELEMENT list (book+)>
  <!ELEMENT book (title, author*, date+, year, comment*, code*)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (aulast*, aufirst*)>
  <!ELEMENT aulast (#PCDATA)>
  <!ELEMENT aufirst (#PCDATA)>
  <!ELEMENT date (day*, month*)>
  <!ELEMENT day (#PCDATA)>
  <!ELEMENT month (#PCDATA)>
  <!ELEMENT year (#PCDATA)>
  <!ELEMENT comment (#PCDATA)>
  <!ELEMENT code (#PCDATA)>
]>
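A DTD content model such as `book (title, author*, date+, year, comment*, code*)` reads much like a regular expression over the names of child elements, which is why rules like "required", "optional", and "repeatable" can be checked mechanically. The sketch below illustrates that idea only; it is not a DTD validator (it handles plain comma sequences with `?`, `*`, `+` and ignores attributes and text), and the helper names are hypothetical. In practice a validating XML parser would do this job.

```python
import re
import xml.etree.ElementTree as ET

# Content model for <book>, copied from the DTD above.
BOOK_MODEL = "(title, author*, date+, year, comment*, code*)"

def model_to_regex(model: str):
    """Translate a simple DTD sequence model into a compiled regex.

    Supports only comma-separated names with optional ?, *, + suffixes
    (no choice groups or nesting), which is all this example needs.
    """
    parts = []
    for item in model.strip("()").split(","):
        item = item.strip()
        name, suffix = (item[:-1], item[-1]) if item[-1] in "?*+" else (item, "")
        # Match each child as "<name>" so that one name cannot partially match another.
        parts.append(f"(?:<{re.escape(name)}>){suffix}")
    return re.compile("".join(parts))

def conforms(xml_text: str, model: str = BOOK_MODEL) -> bool:
    """Return True if the root element's children follow the content model."""
    root = ET.fromstring(xml_text.strip())
    sequence = "".join(f"<{child.tag}>" for child in root)
    return model_to_regex(model).fullmatch(sequence) is not None

if __name__ == "__main__":
    print(conforms("<book><title>T</title><date/><year>2002</year></book>"))
```

Missing the required `year`, or putting `year` before `title`, makes `conforms` return False.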


XML – Ways to use XML

XML-encoded data can be re-purposed: re-used in multiple contexts

Because XML is easy to parse, software can transform it in countless ways, allowing:
– Easy migration paths
– Alternative displays
– On-the-fly response to user needs

Transform XML for display via style sheets (XSL) and transformations (XSLT)


XML - XSL

XML prescribes the structure of a document/record, but not content or display

XSL (eXtensible Stylesheet Language)
– XML uses stylesheets to display the code in user-friendly ways
– Use different stylesheets to render the data in different ways
– Similar to Cascading Style Sheets (CSS) used for HTML


XML - XSLT

XSL Transformations (XSLT)
– A markup language and programming syntax for processing XML
– Most often used to:

Transform XML to HTML for delivery to standard web clients

Transform XML from one set of XML tags to another
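XSLT itself needs an XSLT processor, and the Python standard library does not include one. As a stand-in that shows the same idea (XML in, HTML for a web browser out), the sketch below walks a `<book>` record with `xml.etree.ElementTree` and emits an HTML definition list. It is plain Python, not XSLT, and the output layout is an arbitrary choice for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

def book_to_html(xml_text: str) -> str:
    """Render each child of a <book> record as a <dt>/<dd> pair."""
    book = ET.fromstring(xml_text.strip())
    rows = []
    for child in book:
        # Join all nested text, e.g. <aulast> and <aufirst> inside <author>.
        value = " ".join(t.strip() for t in child.itertext() if t.strip())
        rows.append(f"  <dt>{escape(child.tag)}</dt><dd>{escape(value)}</dd>")
    return "<dl>\n" + "\n".join(rows) + "\n</dl>"

if __name__ == "__main__":
    sample = "<book><title>Weaving the Web</title><year>2002</year></book>"
    print(book_to_html(sample))
```

Swapping in a different rendering function (say, one that emits a table) gives a different display of the same data, which is the point the XSL slides make about stylesheets.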


XML File


XML File Transformation


XML vs Traditional Database Software

If your information is…
– Tightly structured
– Fixed field length
– Massive numbers of individual items

You need a database

If your information is…
– Loosely structured
– Variable field length
– Massive record size

You need XML


XML Software

Software
– XMLSpy: http://www.xmlspy.com/
– XMetal: http://www.xmetal.com/
– AxKit: http://axkit.org/
– Cocoon: http://xml.apache.org/cocoon/

Used to
– Assist with content authoring and coding
– Apply dynamic transformations to XML content
– Render HTML for standard web browsers, PDAs, cell phones, etc.


Namespaces

A namespace identifies a specific set of elements

Namespaces allow metadata terms to be used unambiguously across applications
– Defines what ‘Date’ or ‘Title’ means in a specific usage, or namespace

Each namespace has a unique identifier associated with it


Namespaces - Example

<dc:DC xmlns:dc='http://purl.org/dc/elements/1.1/'>
  <dc:title>Internet Ethics</dc:title>
  <dc:creator>Duncan Langford</dc:creator>
  <dc:format>Book</dc:format>
  <dc:identifier>ISBN 0333776267</dc:identifier>
</dc:DC>


Namespaces - Example

<s:student xmlns:s='http://www.develop.com/student' xmlns:w='urn:schemas.develop.com:workshop'>
  <s:id>3235329</s:id>
  <s:name>Jeff Smith</s:name>
  <w:name>Emerging Metadata Topics</w:name>
  <s:institution>XNL</s:institution>
</s:student>
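A namespace-aware parser keeps the two `name` elements apart, because each is qualified by its namespace URI rather than by its prefix. The sketch below parses a well-formed version of the student example (the root element uses the declared `s:` prefix) with Python's `xml.etree.ElementTree`; the search prefixes in `NS` are local choices and need not match the document's own prefixes.

```python
import xml.etree.ElementTree as ET

DOC = """<s:student xmlns:s='http://www.develop.com/student'
           xmlns:w='urn:schemas.develop.com:workshop'>
  <s:id>3235329</s:id>
  <s:name>Jeff Smith</s:name>
  <w:name>Emerging Metadata Topics</w:name>
  <s:institution>XNL</s:institution>
</s:student>"""

# Prefixes used only for searching; matching is done on the URIs.
NS = {
    "student": "http://www.develop.com/student",
    "workshop": "urn:schemas.develop.com:workshop",
}

root = ET.fromstring(DOC)
student_name = root.findtext("student:name", namespaces=NS)
workshop_name = root.findtext("workshop:name", namespaces=NS)

# ElementTree stores every tag as "{namespace-URI}localname".
tags = [child.tag for child in root]

if __name__ == "__main__":
    print(student_name, "|", workshop_name)
    print(tags)
```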


Resource Description Framework (RDF)

A structured framework for multiple resource description schemas

Problem: data providers offer well organized repositories of metadata, but use different description systems

Solution: RDF - a way for machines to understand multiple description systems or metadata schemas and the relationship(s) between them


RDF

Allows interoperability among multiple resource description methods
– Communities define and state their metadata schemas in XML documents
– Systems use the definitions and statements to “understand” the metadata

In practice, the element sets are namespaces, which are “called” or “stated” within RDF

RDF schemas “owned” by known groups provide a basis for trusted metadata
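RDF represents metadata as statements of subject, predicate, and object, where each predicate is a URI drawn from some namespace, so terms from different description systems can sit side by side about the same resource. The sketch below models a few statements as plain Python tuples instead of using an RDF library. The resource URI, the `LOCAL` namespace, and its `shelfLocation` term are hypothetical; the Dublin Core namespace is the one used in the namespace example earlier.

```python
from typing import List, NamedTuple

DC = "http://purl.org/dc/elements/1.1/"
LOCAL = "http://example.org/local-schema#"           # hypothetical community schema
BOOK = "http://example.org/items/internet-ethics"    # hypothetical resource URI

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

graph: List[Triple] = [
    Triple(BOOK, DC + "title", "Internet Ethics"),
    Triple(BOOK, DC + "creator", "Duncan Langford"),
    # A term from a second description system, attached to the same resource.
    Triple(BOOK, LOCAL + "shelfLocation", "Stacks, Floor 2"),
]

def objects(graph: List[Triple], subject: str, predicate: str) -> List[str]:
    """Return every object stated for a subject/predicate pair."""
    return [t.obj for t in graph if t.subject == subject and t.predicate == predicate]

if __name__ == "__main__":
    print(objects(graph, BOOK, DC + "title"))
```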


RDF Example


MARC

Advantages
– Rich set of descriptive elements
– Highly interoperable within the library community
– Long, established history

Disadvantages
– Low extensibility
– As is, not interoperable beyond the library world
– Weak on administrative, rights, and other kinds of metadata important for digital resources


MARC

Future of MARC
– Must MARC die? No. New life through XML

MARC XML from the Library of Congress (LC)

MODS: an XML-based element set derived from MARC, developed by the Library of Congress

Crosswalks between MARC and many other metadata schemas already exist


MARC XML

LC has developed a MARC XML schema, stylesheets, and tools

The schema allows representation of a complete MARC record in XML
– Lossless conversion

Supports transformation of MARC data for new uses
– MARC to MARCXML to Dublin Core and MODS
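To give a feel for the MARCXML representation, here is a short, hand-written record in the MARCXML namespace (`http://www.loc.gov/MARC21/slim`) and a Python sketch that lists each data field as tag, indicators, and subfields, which is how the format carries the full MARC structure. The record is abbreviated (no leader or control fields) and illustrative rather than taken from a real catalog; `list_fields` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

RECORD = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Langford, Duncan.</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Internet ethics /</subfield>
    <subfield code="c">Duncan Langford.</subfield>
  </datafield>
</record>"""

def list_fields(xml_text: str):
    """Yield (tag, ind1, ind2, [(code, value), ...]) for each datafield."""
    root = ET.fromstring(xml_text)
    for field in root.findall("marc:datafield", MARC_NS):
        subfields = [(sf.get("code"), sf.text)
                     for sf in field.findall("marc:subfield", MARC_NS)]
        yield field.get("tag"), field.get("ind1"), field.get("ind2"), subfields

if __name__ == "__main__":
    for tag, ind1, ind2, subs in list_fields(RECORD):
        print(tag, repr(ind1 + ind2), subs)
```

Tags, indicators (including blank ones), and subfield codes all survive as attributes, which is what makes a round trip back to MARC possible.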


Metadata Object Description Schema (MODS)

Set of 20 bibliographic elements - a subset of the MARC 21 Format for Bibliographic Data

Not as complete as the full MARC format, but richer than Dublin Core (for example)

Highly interoperable with existing MARC records

Uses language-based tags, rather than numbers like MARC 21 (245, 650, etc.)

Under development by the LC Network Development and MARC Standards Office


MODS

XML-based
– Intended to work with/complement other metadata formats

Can be used for conversion of existing MARC records or to create new resource description records

Useful particularly for library applications that want to go beyond the OPAC

Shares features of MARC and Dublin Core


MODS Elements

TitleInfo, Name, TypeOfResource, Genre, PublicationInfo, Language, PhysicalDescription, Abstract, TableOfContents, TargetAudience,

Note, Cartographics, Subject, Classification, RelatedItem, Identifier, Location, AccessCondition, Extension, RecordInfo


MODS Elements

Title element is mandatory; all others are optional

Elements can have subelements and attributes which provide refining detail for the element

Elements and sub-elements are repeatable, except in certain cases

Elements display in any order
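A minimal MODS record, built with Python's `xml.etree.ElementTree` in the MODS version 3 namespace (`http://www.loc.gov/mods/v3`), shows the language-based tags in use; the title check mirrors the rule stated on this slide. Element names follow MODS 3 (`titleInfo`/`title`, `name`/`namePart`, `typeOfResource`); the record values and the helper names `build_mods` and `has_title` are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

MODS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS)

def q(local: str) -> str:
    """Qualify a local element name with the MODS namespace."""
    return f"{{{MODS}}}{local}"

def build_mods(title: str, names=(), resource_type=None) -> ET.Element:
    root = ET.Element(q("mods"))
    title_info = ET.SubElement(root, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = title
    for n in names:                      # name may repeat
        name = ET.SubElement(root, q("name"))
        ET.SubElement(name, q("namePart")).text = n
    if resource_type:                    # optional element
        ET.SubElement(root, q("typeOfResource")).text = resource_type
    return root

def has_title(root: ET.Element) -> bool:
    """True if the record contains a non-empty titleInfo/title."""
    title = root.find(f"{q('titleInfo')}/{q('title')}")
    return title is not None and bool((title.text or "").strip())

if __name__ == "__main__":
    record = build_mods("Weaving the Web", names=["Berners-Lee, Tim"], resource_type="text")
    print(ET.tostring(record, encoding="unicode"))
```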


MODS Example


MODS Implementation

MODS User Guidelines
– http://www.loc.gov/standards/mods/registry.html

MODS Implementation Registry
– Contains descriptions of MODS projects planned, in progress, and fully implemented
– http://www.loc.gov/standards/mods/registry.html


Dublin Core (DC)

A method of describing resources intended to facilitate the discovery of electronic resources

Designed to allow simple description of resources by non-catalogers as well as specialists

National and international standard
– ANSI/NISO standard Z39.85-2001
– ISO standard 15836

Includes 15 “core” elements


Dublin Core Elements

Title, Creator, Subject, Description, Publisher, Contributor, Date, Type,

Format, Identifier, Source, Language, Relation, Coverage, Rights


Dublin Core

All elements optional and repeatable

Elements display in any order

Authority control not required

Simple and Qualified DC

Extensible

Flexible

International


Dublin Core

Simple
– Lowest common denominator
– Less rich
– Discovery role: leads to the resource or to a more complete description of the resource

Qualified
– More precise
– Less interoperable
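One way to see the trade-off between precision and interoperability: a qualified element can be "dumbed down" to its simple DC parent by dropping the refinement, so simple-DC systems can still use it, but the extra precision is lost. The mapping below lists a few DCMI element refinements (for example `created` refines `date`, `alternative` refines `title`); the record format (a list of name/value pairs) and the function name `dumb_down` are assumptions for illustration.

```python
# A few DCMI element refinements and the simple DC element each one refines.
REFINES = {
    "alternative": "title",
    "abstract": "description",
    "tableOfContents": "description",
    "created": "date",
    "issued": "date",
    "extent": "format",
    "isPartOf": "relation",
    "spatial": "coverage",
}

SIMPLE_DC = {
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights",
}

def dumb_down(record):
    """Map (element, value) pairs from qualified DC to simple DC.

    Refined elements fall back to their parent element; unknown elements
    are dropped, so the output uses only the 15 core elements.
    """
    simple = []
    for element, value in record:
        parent = element if element in SIMPLE_DC else REFINES.get(element)
        if parent is not None:
            simple.append((parent, value))
    return simple

if __name__ == "__main__":
    print(dumb_down([("title", "The Sound of Music"), ("created", "1965")]))
```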


Dublin Core Examples

Generic
Title=“The sound of music”

HTML
<meta name="DC.Title" content="The sound of music">

XML
<?xml version="1.0"?>
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>The Sound of Music</dc:title>
</metadata>
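The HTML and XML encodings above can both be generated from a single set of values. A sketch in Python, assuming a simple input format (a dict mapping DC element names to lists of values, since DC elements are repeatable); it follows the slide's `DC.Title` naming for the HTML form and the `http://purl.org/dc/elements/1.1/` namespace for XML. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

def to_html_meta(record: dict) -> str:
    """Emit one <meta> tag per value, named like the slide's DC.Title."""
    lines = []
    for element, values in record.items():
        for value in values:
            lines.append(f'<meta name="DC.{element.capitalize()}" content="{escape(value)}">')
    return "\n".join(lines)

def to_xml(record: dict) -> str:
    """Wrap every value in a dc: element inside a <metadata> root."""
    root = ET.Element("metadata")
    for element, values in record.items():
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    rec = {"title": ["The Sound of Music"], "subject": ["Musicals", "Austria"]}
    print(to_html_meta(rec))
    print(to_xml(rec))
```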


Dublin Core Examples - HTML


Dublin Core Examples - XML


DC Record in OCLC Connexion


Other Metadata Standards

Encoded Archival Description (EAD)

Text Encoding Initiative (TEI)

Visual Resources Association (VRA)

Global Information Locator Service (GILS)

Online Information Exchange (ONIX)

Content Standard for Digital Geospatial Metadata (CSDGM), aka FGDC

Data Documentation Initiative (DDI)


ONline Information eXchange (ONIX)

Developed and maintained by EDItEUR jointly with Book Industry Communication and the Book Industry Study Group

ONIX is the international standard for representing and communicating book industry product information in electronic form

XML-based


ONIX

Highly focused on e-commerce of books

ONIX was developed as a solution to two perceived problems:
– (1) The need for richer book data online to improve sales
– (2) The widely varying format requirements of the major book wholesalers and retailers (interoperability)

May appear in future library applications


CSDGM / FGDC

Primary standard for geospatial metadata

All federal agencies are required to document the geospatial data they produce and collect in this format

Allows for very detailed description
– 334 different metadata elements

Tremendous potential uses

The challenge is to establish interoperability with other metadata standards


Metadata for Images in XML - MIX

An XML-based set of technical data elements required to manage digital image collections

Encodes information such as image source, compression scheme, & image editing software

Currently being developed by LC and the NISO Technical Metadata for Digital Still Images Standards Committee

Draft 0.2 available for review and comment
– http://www.loc.gov/standards/mix/


Data Documentation Initiative (DDI)

International, XML-based standard for the content, presentation, transport, and preservation of documentation for datasets in the social and behavioral sciences

Creating appropriate metadata will enable effective, efficient, and accurate use of the datasets

http://www.icpsr.umich.edu/DDI/codebook/


Crosswalks

Crosswalks map an element from one scheme to its closest equivalent in another scheme
– Example: MARC 1XX field is mapped to DC ‘creator’

Instrumental for converting data in one format to another format - one that is potentially more widely accessible

Support the demand for cross-domain searching and interoperability


Crosswalks

There is rarely a one-to-one correlation between elements of different schemes
– One to many: DC to MARC
– Many to one or none: MARC to DC
– None to one or many

MARC to DC
– http://www.loc.gov/marc/marc2dc.html#unqualif
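A crosswalk can be implemented as a lookup table. The sketch below uses a small, simplified mapping keyed on MARC tag and subfield, following the slide's example of 1XX to `creator`; it is an illustration, not LC's published MARC to Dublin Core crosswalk, and the input format (a list of tag, subfield, value triples) is an assumption. Several MARC fields land on the same DC element (many to one), and fields with no entry are dropped (many to none).

```python
# Simplified, illustrative mapping: (MARC tag, subfield code) -> DC element.
MARC_TO_DC = {
    ("100", "a"): "creator",
    ("110", "a"): "creator",
    ("111", "a"): "creator",
    ("245", "a"): "title",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
    ("650", "a"): "subject",
    ("651", "a"): "subject",
}

def crosswalk(marc_fields):
    """Convert (tag, subfield, value) triples into (dc_element, value) pairs."""
    dc = []
    for tag, code, value in marc_fields:
        element = MARC_TO_DC.get((tag, code))
        if element:
            # Trim trailing ISBD punctuation such as " /" or "."; note this
            # would also strip the period from abbreviations like "Inc.".
            dc.append((element, value.rstrip(" /:;,.")))
    return dc

if __name__ == "__main__":
    print(crosswalk([("100", "a", "Langford, Duncan."), ("300", "a", "286 p.")]))
```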


Content Standards

AACR (Anglo-American Cataloguing Rules)
– “The rules cover the description of, and the provision of access points for, all library materials commonly collected at the present time.”

– The current text is the 2nd ed, 2002 Revision (with 2003, 2004, and 2005 updates)

– The Joint Steering Committee for Revision of AACR (JSC) is working on a new code, “RDA: Resource Description and Access” scheduled to be published in 2008


Content Standards

International Standard Bibliographic Description (ISBD)
– A family of standards to regularize the form and content of bibliographic descriptions
– Available for different material types: monographs, computer files, etc.
– Designed to promote record sharing and exchange


Content Standards

Book Industry Standards and Communications (BISAC)
– The Metadata Committee is responsible for the continued development and maintenance of ONIX for Books in North America
– Developed a Metadata Best Practices document

– Intended as a response to the question, “I’ve downloaded the ONIX documentation. Now what?”

– http://www.bisg.org/docs/Best_Practices_Document.pdf


Content Standards

Describing Archives: A Content Standard (DACS)
– Designed to facilitate consistent, appropriate, and self-explanatory description of archival materials and creators of archival materials

– Replaces Archives, Personal Papers, and Manuscripts (APPM)


Content Standards

Western States Dublin Core Metadata Best Practices
– Provides guidelines for creating metadata records for digitized cultural heritage resources
– Element set based on Dublin Core
– http://www.cdpheritage.org/resource/metadata/wsdcmbp/


Content Standards

Cataloging Cultural Objects (CCO)
– Provides guidelines for selecting, ordering, and formatting data used to populate catalog records
– Designed to promote good descriptive cataloging, shared documentation, and enhanced end-user access
– Feb. 2005 draft available for review
– A project of the Visual Resources Association
– http://www.vraweb.org/ccoweb/


Content Standards

Descriptive Metadata Guidelines for RLG Cultural Materials
– Designed to help institutions with decision making about metadata for online access to collections
– Can be used to create or review local best practice in describing collections of cultural objects, regardless of the specific metadata standard used

– http://www.rlg.org/en/pdfs/RLG_desc_metadata.pdf


Application Profiles

Elements from one or more metadata standards combined to suit the needs of a specific community

May also include usage guidelines
– Example: Title element is required

A Library Application Profile for Dublin Core is under development
– Working draft is available from the DCMI web site
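An application profile can be written down as data and then used to check records. A sketch, assuming a hypothetical profile that combines three Dublin Core elements with one element from a made-up local namespace, and marks Title as required and non-repeatable (the slide's example guideline). The profile structure, the `LOCAL` namespace, and the `check_record` name are illustrative, not taken from any published profile.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ElementRule:
    namespace: str          # the standard the element is borrowed from
    required: bool = False
    repeatable: bool = True

DC = "http://purl.org/dc/elements/1.1/"
LOCAL = "http://example.org/local#"    # hypothetical local namespace

PROFILE = {
    "title": ElementRule(DC, required=True, repeatable=False),
    "creator": ElementRule(DC),
    "date": ElementRule(DC, repeatable=False),
    "shelfLocation": ElementRule(LOCAL),
}

def check_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record fits the profile.

    `record` maps element names to lists of values.
    """
    problems = []
    for name, rule in PROFILE.items():
        values = record.get(name, [])
        if rule.required and not values:
            problems.append(f"missing required element: {name}")
        if not rule.repeatable and len(values) > 1:
            problems.append(f"element may not repeat: {name}")
    for name in record:
        if name not in PROFILE:
            problems.append(f"element not in profile: {name}")
    return problems

if __name__ == "__main__":
    print(check_record({"creator": ["Berners-Lee, Tim"]}))
```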


Authority Control Anyone?

Recommended, but not required, by many schemas

Librarians know its value

Controlled vocabularies: LCSH

Thesauri
– Getty Art & Architecture Thesaurus; LC Thesaurus for Graphic Materials I & II

Pre-set searches


FAST

Faceted Application of Subject Terminology (FAST)

LCSH is by far the most commonly used and widely accepted subject vocabulary for general application

Need for a new approach to subject vocabulary for electronic resources
– Easy to maintain and amenable to automatic authority control and computer manipulation


FAST

Maintains upward compatibility with LCSH; any valid set of LC subject headings can be converted to FAST headings

Retains the advantages of a controlled vocabulary
– Most LCSH headings are synthesized by catalogers based on rules
– For FAST, all headings (except chronological) are established, and only established headings can be assigned


Faceting of LCSH

FAST
648 1775-1783
650 American loyalists
650 Revolution (United States, 1775-1783)
650 Secret service
650 Painters
651 England
651 United States
651 Great Britain
655 Biography
655 History

LCSH
650 American loyalists $z England.
651 United States $x History $y Revolution, 1775-1783 $v Biography.
650 Secret service $z Great Britain.
650 Painters $z United States.
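To make the idea of faceting concrete, the sketch below splits a precoordinated LCSH string into separate headings by subfield code, using the tags from the example: $a keeps the field's own tag, $x goes to topical (650), $y to chronological (648), $z to geographic (651), and $v to form (655). This naive rule is an assumption and does not reproduce FAST's actual conversion; in the example, for instance, FAST treats "History" as a form heading and turns the revolution into a separate topical heading, which this code does not do.

```python
import re

# Naive mapping from LCSH subdivision codes to FAST-style facet tags.
SUBFIELD_FACET = {"x": "650", "y": "648", "z": "651", "v": "655"}

def facet(tag: str, heading: str) -> list:
    """Split e.g. ('650', 'Painters $z United States.') into facet headings."""
    # Treat the leading text as $a, then split on "$<code>" markers.
    # With one capture group, re.split returns ['', code, text, code, text, ...].
    parts = re.split(r"\s*\$([a-z])\s*", "$a " + heading.strip())
    results = []
    for code, text in zip(parts[1::2], parts[2::2]):
        text = text.strip().rstrip(".")
        if not text:
            continue
        facet_tag = tag if code == "a" else SUBFIELD_FACET.get(code, tag)
        results.append((facet_tag, text))
    return results

if __name__ == "__main__":
    print(facet("651", "United States $x History $y Revolution, 1775-1783 $v Biography."))
```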


Authority Control: FAST vs. LCSH

LCSH: Many headings are established; most assigned headings are synthesized by catalogers based on rules
FAST: All headings (except chronological) are established

LCSH: Very large number (billions plus) of possible headings
FAST: Faceting limits the number of possible headings to a few million

LCSH: Most headings are distinct (based on NACO normalization rules*); some conflicts occur, particularly with $x & $v
FAST: All headings are distinct; tagging and subfield coding provide no unique information

*http://www.loc.gov/catdir/pcc/naco/normrule.html


Metadata Encoding & Transmission Standard (METS)

A system for packaging metadata necessary for both the management of digital library objects within a repository and the exchange of such objects between repositories, or between repositories and their users

Used for: digital collection repositories

Developed by the Digital Library Federation (DLF) and Library of Congress (LC)


Metadata Encoding & Transmission Standard (METS)

METS can be understood as a binder that unites metadata about a particular resource

A METS record includes six parts:
– Header
– Descriptive metadata
– Administrative metadata
– File groups
– Structural map
– Behavior section
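As a sketch of the "binder" idea, the Python below assembles a METS skeleton with the six sections in the METS namespace (`http://www.loc.gov/METS/`), using the schema's element names (`metsHdr`, `dmdSec`, `amdSec`, `fileSec`, `structMap`, `behaviorSec`). The IDs, the `USE` value, and the choice to fill in only a file group and a page-level structural map are illustrative assumptions; the metadata and behavior sections are left as empty placeholders, so the output would not validate against the METS schema as is.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS)

def m(local: str) -> str:
    """Qualify a local element name with the METS namespace."""
    return f"{{{METS}}}{local}"

def mets_skeleton(page_files):
    """Build a METS skeleton for a multi-page object.

    page_files: list of (file_id, mime_type) tuples, one per page image.
    """
    root = ET.Element(m("mets"))
    ET.SubElement(root, m("metsHdr"))                  # header
    ET.SubElement(root, m("dmdSec"), ID="DMD1")        # descriptive metadata (empty)
    ET.SubElement(root, m("amdSec"), ID="AMD1")        # administrative metadata (empty)
    file_sec = ET.SubElement(root, m("fileSec"))       # inventory of files
    group = ET.SubElement(file_sec, m("fileGrp"), USE="master")
    struct = ET.SubElement(root, m("structMap"))       # structural map
    whole = ET.SubElement(struct, m("div"), TYPE="document", LABEL="Whole document")
    for n, (file_id, mime) in enumerate(page_files, start=1):
        ET.SubElement(group, m("file"), ID=file_id, MIMETYPE=mime)
        page = ET.SubElement(whole, m("div"), TYPE="page", LABEL=f"Page {n}")
        ET.SubElement(page, m("fptr"), FILEID=file_id)  # links the page to its file
    ET.SubElement(root, m("behaviorSec"))              # behaviors (empty)
    return root

if __name__ == "__main__":
    doc = mets_skeleton([("F1", "image/tiff"), ("F2", "image/tiff")])
    print(ET.tostring(doc, encoding="unicode"))
```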


[Figure: Object components (21 files and counting…): 100-pixel GIF; 800-, 1400-, and 2000-pixel JPGs; TIFF; PDF; TEI; MrSID; AIFF; organized as the whole document and as Page 1 to Page 4]


METS Schema

METS
– metsHdr (METS Header)
– dmdSec (Descriptive Metadata)
– amdSec (Administrative Metadata)
– fileSec (Files)
– structMap (Structure)
– behaviorSec (Viewers)


Open Archives Initiative (OAI)

A tool that supports interoperability among multiple databases

OAI goal: coarse-granularity resource discovery

OAI handles simple discovery from multiple community-specific repositories with metadata crosswalked to unqualified Dublin Core


OAI

Roots are in the science community interested in locating and searching multiple repositories of pre- and e-prints of scientific papers

Not really an archive, the way we traditionally think of the word


OAI

Data providers expose (make available) the metadata for their collections

Service providers harvest the exposed metadata and aggregate it (so that one search does it all) and/or provide additional services related to the harvested metadata, such as providing easy access to recent additions, updated materials, pre-set searches, etc.


OAI

OAI Protocol for Metadata Harvesting
– Metadata content must be encoded in XML and have a corresponding XML schema for validation
– Metadata must be supplied in unqualified Dublin Core format, at least
– Other metadata formats are optional
– Metadata may optionally include a link to the actual content / resource
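A harvester's side of the protocol can be sketched with the Python standard library: build a `ListRecords` request for `oai_dc` (unqualified Dublin Core), then read identifiers, titles, and the `resumptionToken` that signals more records are waiting. The base URL is hypothetical, and to keep the example offline it parses a trimmed sample response (a real one also carries `responseDate` and `request` elements) instead of making a network call; a real harvester would fetch each URL, for example with `urllib.request.urlopen`, and loop while a non-empty token comes back.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE_URL = "https://repository.example.org/oai"   # hypothetical data provider

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1</identifier>
        <datestamp>2005-10-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example preprint</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

def list_records_url(base_url, token=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL."""
    if token:
        # A resumptionToken is exclusive: it replaces the other arguments.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        if from_date:
            params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"

def parse_page(xml_text):
    """Return ([(identifier, title), ...], resumption token or None)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        title = rec.findtext("oai:metadata/oai_dc:dc/dc:title", namespaces=NS)
        records.append((ident, title))
    # An empty <resumptionToken/> marks the last page, so map "" to None.
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=NS)
    return records, (token or None)

if __name__ == "__main__":
    print(list_records_url(BASE_URL))
    print(parse_page(SAMPLE))
```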


OAI Infrastructure

[Figure: a harvester collects Dublin Core (DC) metadata from four repositories and passes it to a service provider]


OAI Infrastructure

[Figure: a user, a search, and one repository]


OAI Infrastructure

[Figure: a user, a search, and two repositories]


OAI Harvesters - Examples

Registered OAI Service Providers
– http://www.openarchives.org/service/listproviders.html

OAIster
– http://oaister.umdl.umich.edu/o/oaister/


OAI - Advantages

Data providers – more exposure of, and therefore, ideally, more access to one’s data

Overcome the geographical and domain-specific isolation that can occur

Service providers – more data in one place is of value to users

Service providers may offer additional services beyond increased access: prints, rights negotiation, etc.


Simple Object Access Protocol (SOAP)

A protocol that defines how to request services, objects, and information in a platform-independent manner using HTTP and XML

The main goal of SOAP is to facilitate interoperability between systems that need to interact
– Can run applications as if a local user

Used for: Web services & e-commerce
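To show what "using HTTP and XML" looks like in practice, the sketch below builds a SOAP 1.1 envelope (namespace `http://schemas.xmlsoap.org/soap/envelope/`) around a request and reads the operation back out of the Body. The service namespace, the `getRecord` operation, and its `recordId` parameter are hypothetical; a real client would POST this XML over HTTP with whatever content type and SOAPAction header the service's description specifies.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
SERVICE = "urn:example:catalog"        # hypothetical service namespace
ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("cat", SERVICE)

def soap_request(operation: str, **params: str) -> str:
    """Wrap an operation and its parameters in a SOAP 1.1 Envelope/Body."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE}}}{name}").text = value
    return ET.tostring(envelope, encoding="unicode")

def read_body(xml_text: str) -> ET.Element:
    """Return the first child of the Body, i.e. the operation element."""
    root = ET.fromstring(xml_text)
    return root.find(f"{{{SOAP_ENV}}}Body")[0]

if __name__ == "__main__":
    print(soap_request("getRecord", recordId="12345"))
```

Because the envelope is ordinary XML, the same message can be produced or consumed on any platform with an XML parser, which is the platform independence the slide describes.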


Z39.50

Z39.50 is a search and retrieval protocol, maintained by LC, capable of operating over TCP/IP

Negotiates queries with multiple, separate databases; it does not harvest records into a new database

Built in to some library software systems

OAI is not intended to replace other approaches, but to provide an easy-to-use alternative for different constituencies and purposes


Search/Retrieve Web Service (SRW)

The primary function of SRW is to allow a user to search remote databases of records

The protocol uses widely available technologies (XML, SOAP, HTTP, URI) to perform tasks traditionally done using proprietary solutions, such as database queries and responses

Builds on Z39.50 and moves it forward
– ZING: Z39.50 International: Next Generation


Functional Requirements for Bibliographic Records (FRBR)

A study by IFLA (International Federation of Library Associations and Institutions) of the full range of functions performed by the bibliographic record
– What do we use bibliographic records for?

Description, access, location, identification, annotations ...

The report provides a framework for the nature of and uses for bibliographic records

A conceptual model that can be used as a means to meet user needs and expectations


Functional Requirements for Bibliographic Records (FRBR)

Tasks we use bibliographic records for:
– Finding
– Identifying
– Selecting
– Obtaining access to resources

FRBR should allow systems to handle bibliographic data in new, useful ways that fulfill these tasks


Functional Requirements for Bibliographic Records (FRBR)

Conceptual model of relationships between bibliographic entities

Hierarchical relationships
– Work: the intellectual product
– Expression: an ‘expression’ of the parent work, such as a translation, edition, revision, or annotated text; expressions entail additional intellectual effort


Functional Requirements for Bibliographic Records (FRBR)

Hierarchical relationships
– Manifestation: published runs of each expression, in multiple formats, over time; this is the level at which we traditionally create a catalog record
– Item: each copy of a specific manifestation; circulation records track items
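The Work / Expression / Manifestation / Item hierarchy maps naturally onto nested data types. This Python sketch shows one possible way to model it; it is not an official FRBR encoding. The field names are illustrative assumptions. The sample data uses two Madame Bovary editions (Gallimard and Penguin, 2001), and their split into a French-original expression and an English-translation expression is assumed for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    copy_label: str  # one physical or digital copy (what circulation tracks)


@dataclass
class Manifestation:
    publisher: str
    year: int
    isbn: str
    items: list[Item] = field(default_factory=list)


@dataclass
class Expression:
    language: str     # language code of this realization of the work
    description: str  # e.g. "original text", "English translation"
    manifestations: list[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str
    creator: str
    expressions: list[Expression] = field(default_factory=list)

    def manifestations(self) -> list[Manifestation]:
        """Every published edition of the work, across all expressions."""
        return [m for e in self.expressions for m in e.manifestations]


# Illustrative data: the expression assignments are assumptions.
bovary = Work("Madame Bovary", "Flaubert, Gustave", [
    Expression("fre", "original French text", [
        Manifestation("Gallimard", 2001, "207041311X", [Item("copy 1")]),
    ]),
    Expression("eng", "English translation", [
        Manifestation("Penguin", 2001, "0140448187", [Item("copy 1"), Item("copy 2")]),
    ]),
])

if __name__ == "__main__":
    for m in bovary.manifestations():
        print(m.publisher, m.year, m.isbn, len(m.items), "item(s)")
```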


Functional Requirements for Bibliographic Records (FRBR)

OCLC is researching the application of FRBR to WorldCat – “FRBRization”

They have created an algorithm that groups records automatically based on the Work/Expression/Manifestation/Item model

http://www.oclc.org/research/projects/frbr/algorithm.htm


OCLC & FRBR

OCLC Research has developed an algorithm to build FRBR “work” sets using author/title keys

Fiction Finder Project: the research team mined content from all WorldCat records for fiction materials and applied the FRBR algorithm to yield:

– An enriched record view for every work of fiction represented in WorldCat

– Better search-result displays for WorldCat fiction records, including links to groups of related WorldCat records by language, format, manifestation/edition, etc.
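The core idea of a work set built from author/title keys can be sketched in a few lines. This is a simplification, not OCLC's algorithm: a production version would also have to cope with uniform titles, variant name forms, translations with different titles, and more. The normalization rules and sample records below are assumptions for illustration. Records whose normalized author and title match end up in the same work set.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    no_punct = re.sub(r"[^\w\s]", " ", no_accents.lower())
    return " ".join(no_punct.split())


def work_key(author: str, title: str) -> str:
    """Author/title key; records that share a key form one work set."""
    return f"{normalize(author)}/{normalize(title)}"


def group_into_works(records: list[dict[str, str]]) -> dict[str, list[dict[str, str]]]:
    works: dict[str, list[dict[str, str]]] = defaultdict(list)
    for record in records:
        works[work_key(record["author"], record["title"])].append(record)
    return dict(works)


# Hypothetical records that differ only in punctuation and capitalization.
records = [
    {"id": "rec1", "author": "Flaubert, Gustave", "title": "Madame Bovary"},
    {"id": "rec2", "author": "FLAUBERT, Gustave.", "title": "Madame Bovary."},
    {"id": "rec3", "author": "Eliot, George", "title": "Middlemarch"},
]

if __name__ == "__main__":
    for key, members in group_into_works(records).items():
        print(key, [r["id"] for r in members])
```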


xISBN

A web service that takes an ISBN as input and returns a list of other ISBNs associated with the same intellectual work

Developed by OCLC’s Office of Research

Results are intended for use by computer systems to generate new searches, such as in an OPAC
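The behavior described for xISBN (one ISBN in, the ISBNs of related editions out) can be mimicked with a local lookup table. This sketch does not call OCLC's service and does not reproduce its request or response format. `WORK_SETS` and the helper functions are hypothetical. The one sample work set pairs two ISBNs for editions of Madame Bovary (Gallimard and Penguin, 2001).

```python
# Hypothetical local index standing in for the xISBN service: each work set
# lists the ISBNs of editions that belong to the same intellectual work.
WORK_SETS: dict[str, list[str]] = {
    "madame-bovary": ["207041311X", "0140448187"],
}

# Invert the index once so any ISBN can be looked up directly.
ISBN_TO_WORK = {isbn: work for work, isbns in WORK_SETS.items() for isbn in isbns}


def normalize_isbn(isbn: str) -> str:
    """Drop hyphens and spaces; uppercase a trailing 'x' check digit."""
    return isbn.replace("-", "").replace(" ", "").upper()


def related_isbns(isbn: str) -> list[str]:
    """Return the other ISBNs in the same work set (empty if unknown)."""
    key = normalize_isbn(isbn)
    work = ISBN_TO_WORK.get(key)
    if work is None:
        return []
    return [other for other in WORK_SETS[work] if other != key]


if __name__ == "__main__":
    # An OPAC could turn the result into a new search for the other editions.
    print(related_isbns("2-07-041311-x"))
```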


RLG’s RedLightGreen

Search interface for the RLG union catalog of 126 million bibliographic records representing 42 million titles

FRBR-esque implementation
– Uses FRBR concepts such as Work, Expression, and Manifestation for record clusters

Designed for the web-savvy undergraduate

Offers filtering and grouping of search results

– http://www.redlightgreen.com


Identifiers

Four potential purposes
– Locator: Where is the document I seek?
– Identifier: a unique label for a resource
– Gatherer: groups like resources, similar to a uniform title
– Differentiator: helps distinguish different versions of the same resource


Identifiers

Uniform Resource Identifiers (URI) – the generic set of all names/addresses that refer to resources on the Web, including:
– Uniform Resource Locator (URL)
– Persistent Uniform Resource Locator (PURL)
– Uniform Resource Name (URN)

OpenURL, DOI, ISTC


Uniform Resource Locator (URL)

Web address or location at which a resource is held, not an identifier for the resource itself

Most common way to locate documents / items on the Web (http, ftp, mailto, etc.)

Not particularly stable or permanent – Error 404: File not Found

No metadata, but an important starting point as we look at some of the related technologies
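Python's standard `urllib.parse.urlparse` shows why a URL is a location rather than a name: every component describes where the file sits. If the file moves to another server or directory, the host or path changes and the old URL stops working. The helper name `describe_location` is illustrative.

```python
from urllib.parse import urlparse


def describe_location(url: str) -> dict[str, str]:
    """Split a URL into the parts that together say *where* a file lives."""
    parts = urlparse(url)
    return {
        "scheme": parts.scheme,  # access method (http, ftp, mailto, ...)
        "host": parts.netloc,    # server currently holding the resource
        "path": parts.path,      # position of the file on that server
    }


if __name__ == "__main__":
    print(describe_location("http://www.oclc.org/research/projects/frbr/algorithm.htm"))
```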


Persistent Uniform Resource Locator (PURL)

The PURL Service is managed by OCLC

Functionally, a PURL is a URL

The PURL remains constant even if the underlying URL changes; its function is to automatically redirect a user to the current URL

The PURL system/resolver is updated by the resource manager to reflect any change in the file's location (URL)
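The redirect mechanism can be modeled as a lookup table. The resource manager updates the table, while the published address stays fixed. The class below is a toy, in-memory sketch under that assumption; it is not OCLC's PURL software. The host `purl.example.org` and the method names are hypothetical. Resolution is reported as a temporary HTTP redirect (302) plus the current location, which is the kind of answer a PURL server gives a browser.

```python
class PurlResolver:
    """Toy PURL-style resolver: a persistent name maps to a changeable URL."""

    def __init__(self, base: str):
        self.base = base.rstrip("/")
        self._targets: dict[str, str] = {}

    def register(self, name: str, target_url: str) -> str:
        """Create a PURL and return the persistent address to publish."""
        self._targets[name] = target_url
        return f"{self.base}/{name}"

    def update(self, name: str, new_target_url: str) -> None:
        """Resource manager records a move; the published PURL is unchanged."""
        if name not in self._targets:
            raise KeyError(f"no PURL registered for {name!r}")
        self._targets[name] = new_target_url

    def resolve(self, name: str) -> tuple[int, str]:
        """Answer like an HTTP redirect: (status code, current location)."""
        return 302, self._targets[name]


if __name__ == "__main__":
    resolver = PurlResolver("http://purl.example.org")  # hypothetical host
    purl = resolver.register("docs/report", "http://www.example.org/old/report.html")
    resolver.update("docs/report", "http://www.example.org/new/report.html")
    print(purl, "->", resolver.resolve("docs/report"))
```

Catalog records and documents cite only the PURL. When a file moves, a single `update` keeps every citation working.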


PURLs

PURLs can be used both in documents and in cataloging systems

PURLs increase the probability of correct resolution and long-term access to resources

Use of PURLs can reduce the burden and expense of catalog maintenance (and business card printing)


PURL - Example

US Government is a big user of PURLs – http://www.ccny.cuny.edu/library/Divisions/Government/iraqbib.html


OpenURL

OpenURL = context-sensitive linking

OpenURL is a method of transporting metadata and identifiers within URLs to allow for the delivery of context-sensitive services

For example, a URL can carry with it information such as author / title from a previous search to allow a system to re-execute a search in a second database without re-entry of the data by the user
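Here is a minimal sketch of how a citation becomes an OpenURL. The metadata is simply encoded as query-string key/value pairs and addressed to a link resolver. The key names (`sid`, `genre`, `aulast`, `atitle`, `issn`, and so on) follow the OpenURL 0.1 convention. The resolver address, source ID, and citation values are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical link resolver for the user's home library.
RESOLVER = "http://resolver.example.edu/openurl"


def make_openurl(citation: dict[str, str], source_id: str) -> str:
    """Encode citation metadata as OpenURL 0.1-style key/value pairs.

    `sid` names the service the user came from; empty values are dropped so
    the resolver only sees metadata that is actually known.
    """
    params = {"sid": source_id}
    params.update({key: value for key, value in citation.items() if value})
    return f"{RESOLVER}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical citation carried over from a search in another database.
    citation = {
        "genre": "article",
        "aulast": "Smith",
        "atitle": "An example article",
        "issn": "1234-5678",
        "date": "2005",
        "volume": "12",
        "spage": "34",
        "issue": "",  # unknown, so it is left out of the URL
    }
    print(make_openurl(citation, "Example:Database"))
```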


OpenURL Metadata


OpenURL Example

OpenURL incorporates data from a citation search

Embeds metadata such as ISSN, date, volume number, pages, etc. in an OpenURL

A valid OpenURL, here carrying a source identifier (sid) and a PubMed ID rather than the full citation: http://sfx.library.yale.edu/sfx_local?sid=Entrez:PubMed&id=pmid:16135848
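Going the other way, a resolver reads the keys back out of the query string. Applied to the Yale example with Python's standard `urllib.parse`, the URL yields only a source ID and a namespaced identifier. A resolver would typically use the PubMed ID to fetch the remaining citation details itself. The helper `openurl_keys` is illustrative.

```python
from urllib.parse import parse_qs, urlparse

EXAMPLE = "http://sfx.library.yale.edu/sfx_local?sid=Entrez:PubMed&id=pmid:16135848"


def openurl_keys(url: str) -> dict[str, str]:
    """Return the first value of each key in an OpenURL's query string."""
    query = parse_qs(urlparse(url).query)
    return {key: values[0] for key, values in query.items()}


if __name__ == "__main__":
    keys = openurl_keys(EXAMPLE)
    print(keys["sid"])                    # the referring source
    namespace, value = keys["id"].split(":", 1)
    print(namespace, value)               # identifier namespace and PubMed ID
```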


Uniform Resource Name (URN)

Uniform Resource Names (URNs) are intended to serve as persistent, location-independent resource identifiers

Globally unique

Never change

Format
– urn:<namespace identifier>:<namespace specific string>

Use a resolver system to indicate the current location of the resource
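The `urn:<namespace identifier>:<namespace specific string>` format splits cleanly on the first two colons. This sketch only separates the parts. It does not enforce the full URN syntax rules or check that the namespace is registered. The example uses the ISBN namespace (`urn:isbn:`) with a Penguin ISBN for Madame Bovary. The helper name is illustrative.

```python
def parse_urn(urn: str) -> tuple[str, str]:
    """Split 'urn:<NID>:<NSS>' into namespace identifier and specific string.

    The 'urn' prefix and the NID are case-insensitive, so the NID is
    lowercased; the namespace-specific string is returned unchanged.
    """
    parts = urn.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn" or not parts[1] or not parts[2]:
        raise ValueError(f"not a URN: {urn!r}")
    return parts[1].lower(), parts[2]


if __name__ == "__main__":
    nid, nss = parse_urn("urn:isbn:0140448187")
    # A resolver would dispatch on the namespace, e.g. to an ISBN lookup service.
    print(nid, nss)
```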


Digital Object Identifier (DOI)

Overseen by the International DOI Foundation

DOIs are persistent, location-independent identifiers of resources

Developed to enable management of copyrightable materials in an electronic environment (locate, buy, sell, track, license)

Specific type / implementation of a URN


DOI

A two-part number, with a prefix identifying the original publisher and a suffix identifying the specific work
– Similar to the ISBN

A DOI resolution request for a specific resource would return one or more URLs, the *locations* where a user could obtain access to the resource
– Appropriate copy: online, text, free, illustrated, etc.
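The prefix and suffix are separated by the first slash, and the suffix may itself contain slashes, so a single partition splits a DOI correctly. The sketch below assumes the current public proxy at `https://doi.org/`; in 2005 the usual address was `http://dx.doi.org/`. The proxy redirects to one registered location. Returning several URLs, as the slide describes, relies on multiple-resolution features of the underlying Handle System and is not shown. The sample DOI `10.1000/182` is that of the DOI Handbook, and the function names are illustrative.

```python
from urllib.parse import quote


def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI at the first '/' into prefix (registrant) and suffix (item)."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def proxy_url(doi: str) -> str:
    """Build the public proxy address; the proxy redirects to a current URL."""
    split_doi(doi)  # validate before building the address
    # Percent-encode characters that are unsafe in a URL, keeping '/'.
    return "https://doi.org/" + quote(doi, safe="/")


if __name__ == "__main__":
    print(split_doi("10.1000/182"))
    print(proxy_url("10.1000/182"))
```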


DOI

Applications of the DOI will require metadata

The basis of the DOI metadata scheme is a minimal "kernel" of elements

DOI minimal kernel elements of metadata:
– DOI, DOI genre, identifier, title, type, origination, primary agent, agent role, and administrative data such as registrant and date of registration
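A registration workflow might check that a record carries every kernel element before submission. In the sketch below, the element names are transliterated from the list on this slide rather than taken from the DOI Foundation's own schema. The sample record is hypothetical.

```python
# Element names transliterated from the slide's list (assumption); the
# DOI Foundation's own schema uses its own names and structure.
KERNEL_ELEMENTS = (
    "doi", "doi_genre", "identifier", "title", "type", "origination",
    "primary_agent", "agent_role", "registrant", "registration_date",
)


def missing_kernel_elements(record: dict[str, str]) -> list[str]:
    """Kernel elements that are absent or empty in a metadata record."""
    return [name for name in KERNEL_ELEMENTS if not record.get(name)]


if __name__ == "__main__":
    # Hypothetical, incomplete record awaiting registration.
    draft = {"doi": "10.1000/example", "title": "Example report", "type": ""}
    print(missing_kernel_elements(draft))
```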


International Standard Text Work Codes (ISTC)

Type of URN

Persistent and unique identifiers for textual works – abstract, conceptual entities rather than specific bibliographic manifestations

International Standard Codes are also being developed for Audiovisual Works (ISAN) and Musical Works (ISWC)

Emerging ISO standard


ISTC

The ISTC Registration Authority will be managed by a consortium made up of CISAC, Nielsen BookData, and R.R. Bowker Inc.

ISTCs will be assigned by the Registration Authority and Regional Agencies

ISTCs can and will be assigned to works retrospectively

Each registered work must include basic metadata such as author, title, subject (ONIX)


ISTC

Similar to ISBN, but focused on the work rather than the manifestation
– Madame Bovary, Gallimard, 2001: 207041311X
– Madame Bovary, Penguin, 2001: 0140448187
– Two ISBNs, one single ISTC for the work, Madame Bovary
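An ISBN identifies a manifestation, and its last character is a check digit. The check digit catches any single-digit error and any transposition of adjacent digits. The sketch below validates ISBN-10s: weights run from 10 down to 1, the weighted sum must be divisible by 11, and `X` stands for 10. ISBNs switched to 13 digits on 1 January 2007, so the sketch also derives the ISBN-13 form (prefix `978`, alternating weights 1 and 3). Both Madame Bovary ISBNs pass the check. The ISTC has its own code format, which is not modeled here, and the function names are illustrative.

```python
def isbn10_is_valid(isbn: str) -> bool:
    """Weights 10..1; the weighted sum must be divisible by 11 ('X' = 10)."""
    chars = isbn.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10 or not chars[:9].isdigit() or chars[9] not in "0123456789X":
        return False
    digits = [int(c) for c in chars[:9]] + [10 if chars[9] == "X" else int(chars[9])]
    total = sum(d * w for d, w in zip(digits, range(10, 0, -1)))
    return total % 11 == 0


def isbn10_to_isbn13(isbn: str) -> str:
    """Prefix '978', drop the old check digit, recompute with 1/3 weights."""
    core = "978" + isbn.replace("-", "").replace(" ", "")[:9]
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)


if __name__ == "__main__":
    for isbn in ("207041311X", "0140448187"):
        print(isbn, isbn10_is_valid(isbn), isbn10_to_isbn13(isbn))
```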


ISTC

The ISTC will allow computer systems to bring together all manifestations of an intellectual work

What’s the point?
– As multiple versions of books, documents, and articles proliferate, systems need a way to control presentation and access for users who generally don’t care about the difference between the Penguin 2001 edition and the Signet Classic 2001 edition


Semantic Web

The mother of all metadata projects, under development by the W3C

An extension of the current Web in which information is given well-defined meaning, understandable to people and computers

This, in turn, provides better integration of existing information on the Web

Key components: URIs, XML, RDF
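At its simplest, the Semantic Web expresses metadata as RDF statements (subject, predicate, object), with each part named by a URI. This Python sketch writes such statements in the line-based N-Triples syntax. It uses Dublin Core element URIs, which are themselves PURLs under `http://purl.org/dc/elements/1.1/`. The subject URI `http://example.org/books/madame-bovary` is hypothetical. The escaping covers only backslashes, quotes, and newlines, not every case in the specification.

```python
# Dublin Core element namespace; its URIs are themselves PURLs.
DC = "http://purl.org/dc/elements/1.1/"


def ntriple(subject: str, predicate: str, obj: str, literal: bool = True) -> str:
    """Serialize one RDF statement as an N-Triples line."""
    if literal:
        # Minimal escaping for string literals (backslash, quote, newline).
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        obj_term = f'"{escaped}"'
    else:
        obj_term = f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_term} ."


if __name__ == "__main__":
    book = "http://example.org/books/madame-bovary"  # hypothetical subject URI
    print(ntriple(book, DC + "title", "Madame Bovary"))
    print(ntriple(book, DC + "creator", "Flaubert, Gustave"))
    print(ntriple(book, DC + "identifier", "urn:isbn:0140448187", literal=False))
```

Because every term is a URI or a plain literal, statements from different sources can be merged simply by concatenating their lines.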


Summary

Planning and goal setting are two important factors for successful metadata implementation

Stick with open standards (non-proprietary), where possible

Keep an eye on XML, DC, OAI, METS - but don’t quote me


Questions?

Amy Benson

Program Director

NELINET Digital Services

NELINET, Inc.

[email protected]

508.597.1937

800.635.4638 x1937