1 CS 430: Information Discovery Lecture 6 Descriptive Metadata 2 Library Catalogs Dublin Core


CS 430: Information Discovery

Lecture 6

Descriptive Metadata 2

Library Catalogs

Dublin Core


Course Administration

Assignment 1

• Submission instructions will be posted soon.

• You will need a csuglab account. If you do not have such an account, go to Upson 311.

Programming in Perl

• First class on Perl is Wednesday night, Hollister 110, 7:30 to 9:00 p.m.


Course Administration

New Course

• LAW 410 Limits on and Protection of Creative Expression - Copyright Law and Its Close Neighbors

This course, offered during fall term 2001, provides an introduction to copyright law and closely related legal regimes for non-law students.


Example: Monograph catalog record

Citation

Caroline R. Arms, editor, Campus strategies for libraries and electronic information. Bedford, MA: Digital Press, 1990.


MARC fields

tag value

001 89-16879 r93

050 Z675.U5C16 1990

082 027.7/0973 20

245 Campus strategies for libraries and electronic information/Caroline Arms, editor. (title statement)

260 {Bedford, Mass.} : Digital Press, c1990. (publisher)

300 xi, 404 p. : ill. ; 24 cm. (collation)

440 EDUCOM strategies series on information technology (series title)

504 Includes bibliographical references (p. {373}-381).

020 ISBN 1-55558-036-X : $34.95


MARC fields (continued)

650 Academic libraries--United States--Automation. (subject heading)

650 Libraries and electronic publishing--United States.

650 Library information networks--United States.

650 Information technology--United States.

700 Arms, Caroline R. (Caroline Ruth)

040 DLC DLC DLC

043 n-us---

955 CIP ver. br02 to SL 02-26-90

985 APIF/MIG


MARC Encoding: For Print and Computer Processing

tag: 260

subfield a: {Bedford, Mass.} :

subfield b: Digital Press,

subfield c: c1990.

MARC encoding:

&2600#abc#{Bedford, Mass.} :#Digital Press,#c1990.%
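
The encoding above can be unpacked mechanically. The following is a minimal sketch in Python, assuming the slide's simplified notation (`&` starts a field, `#` separates subfields, `%` ends the field) rather than the real MARC exchange format on tape. Reading the character after the tag as an indicator is also an assumption, and `parse_field` is a hypothetical helper name.

```python
def parse_field(encoded):
    """Parse one field written in the slide's simplified notation.

    Assumed layout (an interpretation, not the real MARC byte format):
    '&' + 3-character tag + 1 indicator character + '#' + subfield codes
    + '#' + subfield values separated by '#' + '%'.
    """
    if not (encoded.startswith("&") and encoded.endswith("%")):
        raise ValueError("field must start with '&' and end with '%'")
    body = encoded[1:-1]
    tag, indicator = body[:3], body[3]
    # Splitting the remainder gives ['', 'abc', value_a, value_b, value_c].
    parts = body[4:].split("#")
    codes, values = parts[1], parts[2:]
    if len(codes) != len(values):
        raise ValueError("number of subfield codes and values differ")
    return {
        "tag": tag,
        "indicator": indicator,
        "subfields": dict(zip(codes, values)),
    }


field = parse_field("&2600#abc#{Bedford, Mass.} :#Digital Press,#c1990.%")
print(field["tag"], field["subfields"]["b"])
```

Note that the punctuation used for printing (` :` and `,`) stays inside the subfield values, which is why the same record serves both print and computer processing.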


Name authority files

• Caroline R. Arms or Caroline Ruth Arms?

• Which William Phillips of Cardiff?

• Mark Twain or Samuel Clemens?

• Epithets:

of Cardiff

doctor

• Dates:

1832 - 1876

flourished 1860

circa 1832 - 1876
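
The questions above are what an authority file answers: many variant forms of a name map to one authorized heading. The following is a minimal sketch in Python, assuming a tiny in-memory table. The `Twain, Mark` heading, the variant lists, and the `authorized_form` helper are illustrative and are not taken from an actual authority file.

```python
# Hypothetical authority table: each authorized heading lists its known variants.
AUTHORITY = {
    "Arms, Caroline R. (Caroline Ruth)": [
        "Caroline R. Arms",
        "Caroline Ruth Arms",
        "Caroline Arms",
    ],
    "Twain, Mark": ["Mark Twain", "Samuel Clemens"],
}


def normalize(name):
    """Lower-case and collapse whitespace so lookups ignore trivial differences."""
    return " ".join(name.lower().split())


# Invert the table once: normalized variant -> authorized heading.
VARIANT_INDEX = {
    normalize(variant): heading
    for heading, variants in AUTHORITY.items()
    for variant in [heading, *variants]
}


def authorized_form(name):
    """Return the authorized heading for a name, or None if it is unknown."""
    return VARIANT_INDEX.get(normalize(name))


print(authorized_form("samuel  clemens"))
```

A plain name lookup like this cannot tell two people called William Phillips apart, which is why headings carry epithets and dates.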


Shared cataloguing

OCLC -- Large centralized transaction processing database system

When a library catalogs a book, it deposits a MARC record in OCLC

Other libraries can copy the record

• saves duplication of cataloguing

• builds a database of holdings

OCLC database has 43 million records


Subject information

Library of Congress Subject Headings

Academic libraries--United States--Automation

Hierarchical classification

Library of Congress call number: Z675.U5C16

Dewey Decimal Classification: 027.7

Creation and maintenance of lists of subject headings and classifications is a never-ending task.


Notes on MARC

A great achievement:

• Developed in 1960s

• Magnetic tape exchange format for printing catalog records

• The dawn of computing:

mixed upper and lower case

variable length fields, repeated fields

non-Roman scripts

• 100(?) million records with standard content and format

• Thousands of trained librarians (millions?)


Notes on MARC

A great problem:

• Not designed for computer algorithms

• One record per item (poor links between records)

• Tied to traditional materials and traditional practices

• Not Unicode

• 100 million records at $100 each -- $10 billion

A classic legacy system!


Cataloguing Objectives

Functions of catalogs:

finding

collocating (recall and precision)

choosing

acquiring

navigating

... among items in a bibliographic universe

Compare use cases in software design.


IFLA Model

Work. A work is the underlying abstraction, e.g.,

• The Iliad

• The Computer Science departmental web site

• Beethoven's Fifth Symphony

• Unix operating system

• The 1996 U.S. census

This is roughly equivalent to the concept of "literary work" used in copyright law.


IFLA Model

Expression. A work is realized through an expression, e.g.,

• The Iliad has oral expressions and written expressions

• A musical work has score and performance(s).

• Software has source code and machine code

Many works have only a single expression, e.g. a web page, or a book.


IFLA Model

Manifestation. An expression is given form in one or more manifestations, e.g.,

• The text of The Iliad has been manifest in numerous manuscripts and printed books.

• A musical performance can be distributed on CD, or broadcast on television.

• Software is manifest as files, which may be stored or transmitted in any digital medium.


IFLA Model

Item. When many copies are made of a manifestation, each is a separate item, e.g.,

• a specific copy of a book

• a computer file

[Works, expressions, manifestations and items are explored in CS 502, Computing Methods of Digital Libraries.]
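
The four levels of the IFLA model nest naturally. The following is a rough sketch in Python, assuming one simple containment hierarchy; the class and attribute names are illustrative and are not part of the IFLA model itself.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A single copy, e.g. one specific copy of a book."""
    location: str


@dataclass
class Manifestation:
    """A form given to an expression, e.g. a printed book or a CD."""
    form: str
    items: list = field(default_factory=list)


@dataclass
class Expression:
    """A realization of a work, e.g. written text or a performance."""
    form: str
    manifestations: list = field(default_factory=list)


@dataclass
class Work:
    """The underlying abstraction, e.g. The Iliad."""
    title: str
    expressions: list = field(default_factory=list)

    def count_items(self):
        """Count every item reachable from this work."""
        return sum(
            len(m.items) for e in self.expressions for m in e.manifestations
        )


iliad = Work("The Iliad", [
    Expression("written text", [
        Manifestation("printed book", [Item("library copy 1"), Item("library copy 2")]),
        Manifestation("manuscript", [Item("archive copy")]),
    ]),
    Expression("oral recitation"),
])
print(iliad.count_items())  # 3
```

A MARC record describes roughly one level of this hierarchy at a time, which is one reason links between records matter.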


Dublin Core

Simple set of metadata elements for online information

• 15 basic elements

• intended for all types and genres of material

• all elements optional

• all elements repeatable

Developed by an international group chaired by Stuart Weibel since 1995.

(Diane Hillmann and Carl Lagoze of Cornell are very active in this group.)


Dublin Core

publisher: OCLC

creator: Weibel, Stuart L.

creator: Miller, Eric J.

title: Dublin Core Reference Page

date: 1996-05-28

format: text/html (MIME type)

language: en (English)

identifier: http://purl.org/dc/documents/rec-dces-199809.htm#


Dublin Core with Meta Tags

<meta name="publisher" content="OCLC">

<meta name="creator" content="Weibel, Stuart L.">

<meta name="creator" content="Miller, Eric J.">

<meta name="title" content="Dublin Core Reference Page">

<meta name="date" content="1996-05-28">

<meta name="format" content="text/html">

<meta name="language" content="en">

<meta name="identifier" content="http://purl.org/dc/documents/rec-dces-199809.htm#">
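
Tags like these can be generated from a record rather than written by hand. The following is a minimal sketch in Python, assuming the record is held as a list of (element, value) pairs; `dublin_core_meta_tags` is a hypothetical helper name.

```python
from html import escape


def dublin_core_meta_tags(record):
    """Render (element, value) pairs as HTML <meta> tags, one per line.

    A list of pairs is used rather than a dict because elements are repeatable.
    """
    return "\n".join(
        '<meta name="{}" content="{}">'.format(
            escape(name, quote=True), escape(value, quote=True)
        )
        for name, value in record
    )


record = [
    ("publisher", "OCLC"),
    ("creator", "Weibel, Stuart L."),
    ("creator", "Miller, Eric J."),
    ("title", "Dublin Core Reference Page"),
    ("date", "1996-05-28"),
]
print(dublin_core_meta_tags(record))
```

Escaping the values keeps the HTML well formed when a title contains quotation marks or ampersands.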


Dublin Core elements

1. Title The name given to the resource by the creator or publisher.

2. Creator The person or organization primarily responsible for the intellectual content of the resource. For example, authors in the case of written documents; artists, photographers, or illustrators in the case of visual resources.

3. Subject The topic of the resource. Typically, subject will be expressed as keywords or phrases that describe the subject or content of the resource. The use of controlled vocabularies and formal classification schemes is encouraged.


Dublin Core elements

4. Description A textual description of the content of the resource, including abstracts in the case of document-like objects or content descriptions in the case of visual resources.

5. Publisher The entity responsible for making the resource available in its present form, such as a publishing house, a university department, or a corporate entity.

6. Contributor A person or organization not specified in a creator element who has made significant intellectual contributions to the resource but whose contribution is secondary to any person or organization specified in a creator element (for example, editor, transcriber, and illustrator).


Dublin Core elements

7. Date A date associated with the creation or availability of the resource.

8. Type The category of the resource, such as home page, novel, poem, working paper, preprint, technical report, essay, dictionary.

9. Format The data format of the resource, used to identify the software and possibly hardware that might be needed to display or operate the resource.

10. Identifier A string or number used to uniquely identify the resource. Examples for networked resources include URLs and URNs.


Dublin Core elements

11. Source Information about a second resource from which the present resource is derived.

12. Language The language of the intellectual content of the resource.

13. Relation An identifier of a second resource and its relationship to the present resource. This element permits links between related resources and resource descriptions to be indicated. Examples include an edition of a work (IsVersionOf), or a chapter of a book (IsPartOf).


Dublin Core elements

14. Coverage The spatial locations and temporal durations characteristic of the resource.

15. Rights A rights management statement, an identifier that links to a rights management statement, or an identifier that links to a service providing information about rights management for the resource.


Qualifiers

Element qualifier

Example: Date

DC.Date -> Created: 1997-11-01

DC.Date -> Issued: 1997-11-15

DC.Date -> Available: 1997-12-01/1998-06-01

DC.Date -> Valid: 1998-01-01/1998-06-01


Qualifiers

Value qualifiers

Example: Subject

DC.Subject -> DDC: 509.123

DC.Subject -> LCSH: Digital libraries--United States
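
The two kinds of qualifier do different jobs: an element qualifier narrows what the element means, while a value qualifier names the scheme in which the value is written. The following is one possible sketch in Python, assuming a hypothetical `Statement` type with one optional slot for each kind.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Statement:
    element: str                              # one of the 15 elements, e.g. "Date"
    value: str
    element_qualifier: Optional[str] = None   # refines the element, e.g. "Created"
    value_scheme: Optional[str] = None        # names the value's scheme, e.g. "LCSH"

    def unqualified(self):
        """Drop both qualifiers, leaving a plain element/value pair."""
        return (self.element, self.value)


statements = [
    Statement("Date", "1997-11-01", element_qualifier="Created"),
    Statement("Date", "1997-11-15", element_qualifier="Issued"),
    Statement("Subject", "509.123", value_scheme="DDC"),
    Statement("Subject", "Digital libraries--United States", value_scheme="LCSH"),
]

dates = [s.value for s in statements if s.element == "Date"]
print(dates)  # ['1997-11-01', '1997-11-15']
```

Keeping the qualifiers in separate slots means a program that ignores them still sees ordinary Date and Subject elements.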


Dublin Core with qualifiers

<title>Digital Libraries and the Problem of Purpose</title>

<creator>David M. Levy</creator>

<publisher>Corporation for National Research Initiatives</publisher>

<date date-type = "publication">January 2000</date>

<type resource-type = "work">article</type>

<identifier uri-type = "DOI">10.1045/january2000-levy</identifier>

<identifier uri-type = "URL">http://www.dlib.org/dlib/january00/01levy.html</identifier>

<language>English</language>

<rights>Copyright (c) David M. Levy</rights>
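
A record in this form is straightforward to read with a standard XML parser. The following is a minimal sketch in Python using `xml.etree.ElementTree`. The `<record>` wrapper element is an assumption, since the slide shows the elements without a root, and `read_record` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The <record> root is an assumption; the slide lists the elements without one.
XML = """<record>
  <title>Digital Libraries and the Problem of Purpose</title>
  <creator>David M. Levy</creator>
  <date date-type = "publication">January 2000</date>
  <identifier uri-type = "DOI">10.1045/january2000-levy</identifier>
  <identifier uri-type = "URL">http://www.dlib.org/dlib/january00/01levy.html</identifier>
</record>"""


def read_record(xml_text):
    """Return a list of (element, qualifiers, value) triples in document order."""
    root = ET.fromstring(xml_text)
    return [
        (child.tag, dict(child.attrib), (child.text or "").strip())
        for child in root
    ]


triples = read_record(XML)
# Repeated elements are told apart by their qualifier attribute.
identifiers = {q["uri-type"]: v for tag, q, v in triples if tag == "identifier"}
print(identifiers["DOI"])  # 10.1045/january2000-levy
```

The qualifiers arrive as ordinary XML attributes, so a reader that ignores them still gets a usable unqualified record.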


Limits of Dublin Core

Complex objects

• Article within a journal

• A thumbnail of another image

• The March 28 final edition of a newspaper

[Figure: a complete object, its sub-objects, and their metadata records]


Flat v. linked records

Flat record

All information about an item is held in a single Dublin Core record, including information about related items

convenient for access and preservation

information is repeated -- maintenance problem

Linked record

Related information is held in separate records with a link from the item record

less convenient for access and preservation

information is stored once

Compare with normal forms in relational databases
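
The trade-off can be made concrete with two small data layouts. The following is a sketch in Python, assuming plain dictionaries stand in for Dublin Core records; the `"dlib"` key and the `resolve` helper are illustrative.

```python
# Flat: serial details are copied into every article record.
flat_records = [
    {"title": "Digital Libraries and the Problem of Purpose",
     "serial-name": "D-Lib Magazine", "issn": "1082-9873"},
    {"title": "Languages for Dublin Core",
     "serial-name": "D-Lib Magazine", "issn": "1082-9873"},
]

# Linked: serial details are stored once and referenced by a key.
serials = {"dlib": {"serial-name": "D-Lib Magazine", "issn": "1082-9873"}}
linked_records = [
    {"title": "Digital Libraries and the Problem of Purpose", "serial": "dlib"},
    {"title": "Languages for Dublin Core", "serial": "dlib"},
]


def resolve(record):
    """Follow the link so a linked record reads like a flat one."""
    resolved = {k: v for k, v in record.items() if k != "serial"}
    resolved.update(serials[record["serial"]])
    return resolved


# Both layouts carry the same information once links are followed.
assert [resolve(r) for r in linked_records] == flat_records
```

Correcting the serial's details takes one change in the linked layout but one change per article in the flat layout, which is the maintenance problem noted above.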


Dublin Core with flat record extension

Continuation

<relation rel-type = "InSerial">

<serial-name>D-Lib Magazine</serial-name>

<issn>1082-9873</issn>

<volume>6</volume>

<issue>1</issue>

</relation>


Events

Version 1

New material

Version 2

Should Version 2 have its own record or should extra information be added to the Version 1 record?

How are these represented in Dublin Core?


Minimalist versus structuralist

Minimalist

15 elements, no qualifiers, suitable for non-professionals

encourage creators to provide metadata

Structuralist

15 elements, qualifiers, RDF, detailed coding rules

will require trained metadata experts

[For an example of how complex Dublin Core can become, see the source of: http://purl.org/dc/documents/rec-dces-199809.htm#]


Dublin Core in many languages

See:

Thomas Baker, Languages for Dublin Core, D-Lib Magazine, December 1998, http://www.dlib.org/dlib/december98/12baker.html


Dublin Core: Personal Opinion

Dublin Core is a simple way to describe digital content that:

• is a single, self-contained object ("document-like")

• is static with time

• has few relationships

Some web sites satisfy these criteria

Dublin Core is not suitable for digital content that:

• is heavily structured

• changes dynamically