Metadata and electronic information

Michael Day

UKOLN: The UK Office for Library and Information Networking, University of Bath

http://www.ukoln.ac.uk/

m.day@ukoln.ac.uk

Final CIRCE Workshop, The Council House, Birmingham, 15 January 1999.

Presentation Outline

• Metadata - some definitions
• Metadata formats
• The resource discovery context
– Dublin Core
– Resource Description Framework (RDF)
• Interoperability
• Other metadata applications

Metadata: definitions (1)

Metadata = “data about data”

“… the Internet-age term for structured data about data” - Joint NSF-EU Working Group on Metadata (1998)

“… structured data about data that imposes order on a disordered information universe” - Carl Lagoze (Cornell University)

Metadata: definitions (2)

“… machine understandable information about web resources or other things” - Tim Berners-Lee (World Wide Web Consortium)

Roles:
• Provides information about resources
• Supports operations carried out on information objects

Metadata formats

Diversity of metadata formats and frameworks, e.g.:

• Dublin Core
• EAD, CIMI, TEI
• PICS, RDF
• MARC
• GILS, FGDC
• ROADS

http://www.ukoln.ac.uk/metadata/glossary/

Some examples (1)

USMARC:

245 00 Worldnews online $h [computer file].

246 3 World news online

256 Computer online service.

260 Washington, D.C. : $b Worldnews Online, $c [1995-

538 Mode of access: Internet.

500 Title from title frame.

520 “WorldNews OnLine is a service …”

650 0 Newspapers $x Databases.

856 7 $u http://worldnews.net $2 http

Extract from: Nancy B. Olson, ed., Cataloguing Internet resources: a manual and practical guide, 2nd ed. Dublin, Ohio: OCLC Online Computer Library Center, 1997.

http://www.purl.org/oclc/cataloging-internet
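The display form used above (tag, optional indicators, then subfields introduced by `$` and a one-character code) can be split into its parts mechanically. The following is a minimal sketch under stated assumptions: the function name `parse_marc_line` is hypothetical, unlabelled leading text is treated as subfield `a`, and one or two digits directly after the tag are assumed to be indicators, which would misread a field whose data starts with a number. It handles this printed notation only, not the MARC exchange format itself.

```python
import re

# Tag, optional one- or two-digit indicator token, then the field data.
FIELD_RE = re.compile(r"^(\d{3})\s+(?:(\d{1,2})\s+)?(.*)$")
# A subfield marker as printed on the slide: "$" plus a one-character code.
SUBFIELD_RE = re.compile(r"\s*\$(\w)\s+")


def parse_marc_line(line):
    """Split one printed MARC field line into tag, indicators and subfields."""
    match = FIELD_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a MARC display line: {line!r}")
    tag, indicators, rest = match.groups()
    parts = SUBFIELD_RE.split(rest)
    subfields = []
    # Assumption: text before the first "$x" marker is the implicit subfield a.
    if parts[0]:
        subfields.append(("a", parts[0].strip()))
    # After the split, codes and values alternate: code, value, code, value...
    for code, value in zip(parts[1::2], parts[2::2]):
        subfields.append((code, value.strip()))
    return {"tag": tag, "indicators": indicators or "", "subfields": subfields}


if __name__ == "__main__":
    for text in (
        "260 Washington, D.C. : $b Worldnews Online, $c [1995-",
        "650 0 Newspapers $x Databases.",
        "856 7 $u http://worldnews.net $2 http",
    ):
        print(parse_marc_line(text))
```

Keeping subfields as an ordered list of pairs, rather than a dictionary, preserves repeated subfield codes and their original order.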

Some examples (2)

TEI header:

<teiHeader type="aacr2"><fileDesc><titleStmt>

<title type="245">Rubaiyat of Omar Khayyam : the astronomer poet of Persia / rendered into English verse by Edward Fitzgerald ; with drawings by Florence Lundborg</title>

<title type="gmd">[electronic resource]</title>

<author>Omar Khayyam</author> [...]

<respStmt>

<resp>Creation of machine-readable version:</resp>

<name>Stephen Ramsay, Electronic Text Center</name>

<resp>Conversion to TEI.2-conformant markup:</resp>

<name>University of Virginia Library Electronic Text Center</name>

</respStmt> [...]

From: University of Virginia Library, Cataloging Services Department, Cataloging Procedures Manual, Chapter XII. Charlottesville, Va.: University of Virginia Library, 1996-98.

http://www.lib.virginia.edu/cataloging/manual/chapters/chapxiib.html

Some examples (3)

IAFA template:

Template-Type: SERVICE

Handle: 871473886-23884

Title: Wellcome Unit for the History of Medicine

URI-v1: http://units.ox.ac.uk/cgi-bin/safeperl/wuhminfo/p?home.html

Admin-Email-v1: wuhmo@wuhmo.ox.ac.uk

Publisher-Name-v1: Wellcome Unit for the History of Medicine

Publisher-Postal-v1: 45-47 Banbury Road, Oxford, OX2 6PE

Publisher-City-v1: Oxford

Description: The home page of the Wellcome Unit for the History of Medicine, a sub-department of the Modern History Faculty of the University of Oxford, this site provides information on the Unit, seminars, conferences and workshops, research interests, staff, current projects, and the graduate programmes.

Keywords: History of Medicine; Medicine

Language-v1: English

Subject-Descriptor-v1: WZ40 History of Medicine

Subject-Descriptor-Scheme-v1: NLM

Record-Last-Modified-Date: Fri, 10 Oct 1997 19:09:16 +0000

Record-Last-Modified-Email: cataloguer@omni.ac.uk

Record-Created-Date: Fri, 10 Oct 1997 19:09:16 +0000

Record-Created-Email: cataloguer@omni.ac.uk
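Because each line of the template is an attribute name, a colon, and a value, a record of this kind is straightforward to read programmatically. The sketch below is illustrative only: `parse_template` and `split_variant` are hypothetical names, the sample is an abbreviated copy of the record above, and continuation lines (which a full template parser would need to handle) are not supported.

```python
import re

# Attribute names may carry a variant suffix such as "-v1".
VARIANT_RE = re.compile(r"^(.*)-v(\d+)$")

SAMPLE = """\
Template-Type: SERVICE
Handle: 871473886-23884
Title: Wellcome Unit for the History of Medicine
URI-v1: http://units.ox.ac.uk/cgi-bin/safeperl/wuhminfo/p?home.html
Publisher-City-v1: Oxford
Record-Created-Date: Fri, 10 Oct 1997 19:09:16 +0000
"""


def parse_template(text):
    """Read 'Attribute: value' lines into a dictionary."""
    record = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split at the first colon only: URLs and dates contain colons too.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in line: {line!r}")
        record[key.strip()] = value.strip()
    return record


def split_variant(attribute):
    """Separate 'Publisher-City-v1' into ('Publisher-City', 1)."""
    match = VARIANT_RE.match(attribute)
    if match:
        return match.group(1), int(match.group(2))
    return attribute, None


if __name__ == "__main__":
    record = parse_template(SAMPLE)
    for name, value in record.items():
        print(split_variant(name), value)
```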

A metadata typology

Scale: simple to rich

Band One (full text indexes): Proprietary formats

Band Two (simple structured generic formats): Proprietary formats, Dublin Core, ROADS, IAFA/Whois++ templates

Band Three (more complex structure, domain specific; part of larger semantic framework): FGDC, MARC, TEI headers, ICPSR, EAD, CIMI

Adapted from: Lorcan Dempsey and Rachel Heery, “Metadata: a current view of practice and issues”, Journal of Documentation, vol. 54, no. 2, March 1998, pp. 145-172.

Resource discovery

Approaches to Internet resource discovery:
• Robot-based global indexes, e.g. Alta Vista, Lycos, etc.
• Subject gateways, e.g. ROADS-based services
• Library catalogues, e.g. using USMARC 856 field - InterCat project (OCLC), BIBLINK
• Need for “core” metadata for simple resource discovery and interoperability - Dublin Core initiative

Dublin Core (1)

International initiative to define a core set of metadata elements for resource discovery on the Internet

• Six DC workshops (to date):
– DC-1 (Dublin, Ohio) - 1995
– DC-2 (Warwick) - 1996
– DC-3 (Dublin, Ohio) - 1996
– DC-4 (Canberra) - 1997
– DC-5 (Helsinki) - 1997
– DC-6 (Washington, D.C.) - 1998
– DC-7 (Frankfurt am Main) - planned for 1999

http://purl.oclc.org/dc

Dublin Core (2)

15 Elements:

• Title
• Subject
• Description
• Creator
• Publisher
• Contributor
• Date
• Type
• Format
• Identifier
• Source
• Language
• Relation
• Coverage
• Rights

Core elements defined in RFC 2413:

http://src.doc.ic.ac.uk/computing/internet/rfc/rfc2413.txt

Dublin Core (3)

DC Qualifiers:
• TYPE - refines the meaning of elements:
– Relation TYPE=IsPartOf
• SCHEME - associates the value with an externally defined ‘scheme’:
– Subject SCHEME=DDC
– Date SCHEME=ISO 8601
• LANGUAGE - indicates the language of the value:
– Title LANGUAGE=en
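One way to picture the three qualifiers is as optional attributes attached to an element and its value. The sketch below is an assumption-laden illustration, not a Dublin Core syntax: the class name `QualifiedElement`, its field names, and the example values are hypothetical, and the `label` method simply reproduces the notation used on this slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QualifiedElement:
    """A DC element with the optional TYPE, SCHEME and LANGUAGE qualifiers."""

    element: str
    value: str
    type_: Optional[str] = None      # TYPE: refines the element's meaning
    scheme: Optional[str] = None     # SCHEME: external scheme for the value
    language: Optional[str] = None   # LANGUAGE: language of the value

    def label(self):
        """Render the element and its qualifiers in the slide's notation."""
        parts = [self.element]
        if self.type_:
            parts.append(f"TYPE={self.type_}")
        if self.scheme:
            parts.append(f"SCHEME={self.scheme}")
        if self.language:
            parts.append(f"LANGUAGE={self.language}")
        return " ".join(parts)


if __name__ == "__main__":
    # Example values are illustrative only.
    examples = [
        QualifiedElement("Date", "1999-01-15", scheme="ISO 8601"),
        QualifiedElement("Title", "Metadata and electronic information", language="en"),
        QualifiedElement("Relation", "http://www.ukoln.ac.uk/", type_="IsPartOf"),
    ]
    for item in examples:
        print(item.label(), "=", item.value)
```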

Dublin Core (4)

Syntax issues:
• Simple DC can be embedded into HTML Web pages
– Limited functionality
• Web moving to Extensible Markup Language (XML)
• Resource Description Framework
– RDF … described as “an architecture for metadata on the Web”

RDF

Resource Description Framework

• World Wide Web Consortium (W3C)

• Data model and XML-based syntax

• An implementation of the conceptual ‘Warwick Framework’

• Modular interoperability

• Useful for aggregating the different metadata types required for managing digital information over time

http://www.w3.org/RDF/

DC in HTML

Example of DC embedded in HTML:

<HTML>

<HEAD>

<TITLE>UKOLN Home Page</TITLE>

<META NAME="DC.Title" CONTENT="UKOLN: UK Office for Library and Information Networking">

<META NAME="DC.Subject" CONTENT="national centre, network information support, library community, awareness, research, information services, public library networking, bibliographic management, distributed library systems, metadata, resource discovery, conferences, lectures, workshops">

<META NAME="DC.Description" CONTENT="UKOLN is a national centre for support in network information management in the library and information communities. It provides awareness, research and information services">

<META NAME="DC.Creator" CONTENT="UKOLN Information Services Group">

</HEAD>

<BODY> [...]
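Embedded metadata of this kind can be harvested by any tool that reads META elements whose NAME begins with "DC.". The following is a minimal sketch using Python's standard `html.parser` module; the class name `DublinCoreExtractor`, the helper `extract_dc`, and the abbreviated sample page are assumptions made for illustration.

```python
from html.parser import HTMLParser

SAMPLE = """<HTML>
<HEAD>
<TITLE>UKOLN Home Page</TITLE>
<META NAME="DC.Title" CONTENT="UKOLN: UK Office for Library and Information Networking">
<META NAME="DC.Creator" CONTENT="UKOLN Information Services Group">
<META NAME="keywords" CONTENT="not Dublin Core">
</HEAD>
<BODY></BODY>
</HTML>"""


class DublinCoreExtractor(HTMLParser):
    """Collect (element, value) pairs from META tags named 'DC.*'."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.lower().startswith("dc."):
            # Drop the "DC." prefix to leave the element name.
            self.elements.append((name[3:], attributes.get("content") or ""))


def extract_dc(html):
    """Return the Dublin Core pairs found in an HTML document."""
    parser = DublinCoreExtractor()
    parser.feed(html)
    parser.close()
    return parser.elements


if __name__ == "__main__":
    for element, value in extract_dc(SAMPLE):
        print(f"{element}: {value}")
```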

DC in XML-RDF

<rdf:RDF

xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"

xmlns:dc="http://purl.org/dc/elements/1.0/">

<rdf:Description about="http://www.ukoln.ac.uk/metadata/"

dc:Title="UKOLN metadata homepage"

dc:Subject="metadata; BIBLINK; DESIRE; NewsAgent; ROADS;

PRIDE; Cedars; Dublin Core; DC; Z39.50; WHOIS++"

dc:Publisher="UKOLN, University of Bath"

dc:Type="Text"

dc:Format="text/html - 4847 bytes" >

<dc:Creator>

<rdf:Bag rdf:_1="Michael Day"

rdf:_2="Andy Powell" />

</dc:Creator>

<dc:Identifier>

<rdf:Bag rdf:_1="http://purl.org/net/ukoln/metadata"

rdf:_2="http://purl.eu.org/net/ukoln/metadata" />

</dc:Identifier>

</rdf:Description>

</rdf:RDF>
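Since the record is ordinary namespaced XML, a generic XML parser is enough to read it back. The sketch below uses Python's standard `xml.etree.ElementTree` on an abbreviated copy of the record, written with straight quotation marks so that it parses. The function name `dc_properties` is hypothetical, and the code covers only the two patterns shown on this slide (properties as attributes, and repeated values in an `rdf:Bag`), using the working-draft namespace URIs given above rather than those of any later RDF specification.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/TR/WD-rdf-syntax#}"
DC = "{http://purl.org/dc/elements/1.0/}"

SAMPLE = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
  xmlns:dc="http://purl.org/dc/elements/1.0/">
  <rdf:Description about="http://www.ukoln.ac.uk/metadata/"
    dc:Title="UKOLN metadata homepage"
    dc:Publisher="UKOLN, University of Bath">
    <dc:Creator>
      <rdf:Bag rdf:_1="Michael Day"
               rdf:_2="Andy Powell" />
    </dc:Creator>
  </rdf:Description>
</rdf:RDF>"""


def bag_members(bag):
    """Return the values of rdf:_1, rdf:_2, ... in numeric order."""
    members = []
    for name, value in bag.attrib.items():
        index = name[len(RDF) + 1:]
        if name.startswith(RDF + "_") and index.isdigit():
            members.append((int(index), value))
    return [value for _, value in sorted(members)]


def dc_properties(xml_text):
    """Map each described resource to its Dublin Core element values."""
    root = ET.fromstring(xml_text)
    results = {}
    for description in root.findall(RDF + "Description"):
        props = results.setdefault(description.get("about"), {})
        # Pattern 1: properties written as attributes of rdf:Description.
        for name, value in description.attrib.items():
            if name.startswith(DC):
                props.setdefault(name[len(DC):], []).append(value)
        # Pattern 2: child elements holding an rdf:Bag of values.
        for child in description:
            if child.tag.startswith(DC):
                for bag in child.findall(RDF + "Bag"):
                    props.setdefault(child.tag[len(DC):], []).extend(bag_members(bag))
    return results


if __name__ == "__main__":
    for resource, props in dc_properties(SAMPLE).items():
        print(resource, props)
```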

Interoperability

Problem of heterogeneous and distributed resources

• Protocols
– Z39.50
– Whois++ cross-searching (ROADS)
• Metadata conversion
– Nordic Metadata Project
– BIBLINK
• “Layered” approaches
– Arts and Humanities Data Service
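Metadata conversion is often implemented as a crosswalk: a table that maps fields in one format to elements in another. The sketch below is purely illustrative and does not reproduce the mappings used by the Nordic Metadata Project or BIBLINK; the `CROSSWALK` table is a small assumed subset (MARC tag and subfield to Dublin Core element), and `marc_to_dc` is a hypothetical name.

```python
# Assumed, partial mapping from (MARC tag, subfield code) to a DC element.
CROSSWALK = {
    ("245", "a"): "Title",
    ("260", "b"): "Publisher",
    ("520", "a"): "Description",
    ("650", "a"): "Subject",
    ("856", "u"): "Identifier",
}


def marc_to_dc(fields):
    """Convert (tag, subfield code, value) triples into DC element lists.

    Subfields with no entry in the crosswalk are dropped, which is where
    a simpler target format loses detail from a richer source format.
    """
    record = {}
    for tag, code, value in fields:
        element = CROSSWALK.get((tag, code))
        if element is not None:
            record.setdefault(element, []).append(value)
    return record


if __name__ == "__main__":
    # Values taken from the USMARC example earlier in the presentation.
    marc_fields = [
        ("260", "a", "Washington, D.C. :"),
        ("260", "b", "Worldnews Online,"),
        ("650", "a", "Newspapers"),
        ("650", "x", "Databases."),
        ("856", "u", "http://worldnews.net"),
    ]
    print(marc_to_dc(marc_fields))
```

Converting in this direction is lossy: the place of publication and the subject subdivision have no target in the assumed table and are discarded.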

Other applications

Metadata has potential applications in other areas relating to the management of digital resources:

• Digital preservation
• Electronic commerce
• Authentication
• Managing intellectual property rights
• Managing access to resources
• Content rating services

UKOLN

UKOLN is funded by the British Library Research and Innovation Centre (BLRIC), the Joint Information Systems Committee (JISC) of the UK Higher Education Funding Councils, as well as by project funding from the JISC’s Electronic Libraries (eLib) Programme and the European Union. UKOLN also receives support from the University of Bath, where it is based.

http://www.ukoln.ac.uk/

More information on UKOLN’s work on metadata can be found at:

http://www.ukoln.ac.uk/metadata/
