Crosswalks, March 25, 2013, Richard Sapon-White

Metadata crosswalks


Page 1: Metadata crosswalks


Crosswalks

March 25, 2013

Richard Sapon-White

Page 2: Metadata crosswalks


Overview

Crosswalk definition and description
Issues

Page 3: Metadata crosswalks


Interoperability

Search interoperability: the ability to perform a search over diverse sets of metadata records to obtain meaningful results
Today’s session focuses on sets of records using different metadata schemes

Page 4: Metadata crosswalks


Definition

An authoritative mapping from the metadata elements of one scheme to the elements of another

Example:

Dublin Core to MARC Crosswalk
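
A crosswalk can be read as a lookup table from source elements to target fields. The following is a minimal sketch, not an authoritative implementation: it uses the unqualified Dublin Core to MARC mappings given on the later slides of this deck. The names `DC_TO_MARC` and `crosswalk` are hypothetical, and subfield `$a` for the 720 field is an assumption (the slides name only the field).

```python
# Unqualified Dublin Core element -> (MARC tag, subfield code),
# taken from the mappings on the later slides of this deck.
DC_TO_MARC = {
    "Title": ("245", "a"),
    "Creator": ("720", "a"),  # assumption: the slides name field 720 only
    "Subject": ("653", "a"),
    "Publisher": ("260", "b"),
    "Date": ("260", "c"),
    "Identifier": ("024", "a"),
}


def crosswalk(dc_record):
    """Map (element, value) pairs to (tag, subfield, value) triples.

    Elements with no entry in the table are returned separately so the
    caller can see what the crosswalk could not carry over.
    """
    marc_fields = []
    unmapped = []
    for element, value in dc_record:
        target = DC_TO_MARC.get(element)
        if target is None:
            unmapped.append((element, value))
        else:
            tag, subfield = target
            marc_fields.append((tag, subfield, value))
    return marc_fields, unmapped
```

Returning the unmapped elements instead of silently dropping them makes the information loss visible to whoever runs the conversion.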

Page 5: Metadata crosswalks


Reciprocal Crosswalks

Two crosswalks are needed: one to map from metadata scheme A to scheme B AND one to map from scheme B to scheme A
Even with two crosswalks, “round-trip” mapping can result in loss or distortion of information
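
To illustrate the round-trip problem, here is a small sketch under simplifying assumptions: records are plain dictionaries, and the only fields involved are MARC 245 $a/$b and DC Title. The function names and the sample record are hypothetical.

```python
def marc_to_dc(marc):
    # Many-to-one: 245 $a and 245 $b both collapse into the single DC Title.
    parts = [marc.get("245a", ""), marc.get("245b", "")]
    return {"Title": " ".join(p for p in parts if p)}


def dc_to_marc(dc):
    # DC Title cannot tell title from subtitle, so everything lands in 245 $a.
    return {"245a": dc["Title"]}


original = {"245a": "Walden :", "245b": "or, Life in the woods"}
round_trip = dc_to_marc(marc_to_dc(original))
```

After the round trip the text is still present, but the boundary between title and subtitle is gone: `round_trip` has no `245b` entry.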

Page 6: Metadata crosswalks


More Examples

Library of Congress has crosswalks for MARC21 to/from
– DC (Dublin Core)
– FGDC Content Standard for Digital Geospatial Metadata (Federal Geographic Data Committee)
– GILS (Global Information Locator Service)
– ONIX (ONline Information eXchange)

Page 7: Metadata crosswalks


Uses of Crosswalks

Record exchange
Union catalogs
Metadata harvesting
Search engines: query fields with similar content in different databases
Aid to understanding unfamiliar schemes

Page 8: Metadata crosswalks


Complexities of Crosswalk Creation

No standard format for metadata schemes
– Different properties of elements are specified
– Same properties may employ different terms
Some elements may map to multiple elements in a second scheme, or vice versa
Elements may be repeatable in one scheme, non-repeatable in another

Page 9: Metadata crosswalks


Complexities of Crosswalk Creation (cont.)

Source scheme may specify an element for which there is no comparable element in the target scheme
Differences in content rules (e.g., use of a controlled vocabulary) or data representation (e.g., Michał Kowalski vs. Kowalski, Michał)
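
The data-representation difference can be sketched as a pair of conversion functions. This is a deliberately naive illustration: it assumes the last token of a direct-order name is the surname, which fails for compound surnames, particles, and many naming traditions. Both function names are hypothetical.

```python
def invert_name(direct):
    """Turn 'Forename Surname' into 'Surname, Forename'.

    Naive assumption: the last whitespace-separated token is the surname.
    """
    tokens = direct.split()
    if len(tokens) < 2:
        return direct.strip()
    return f"{tokens[-1]}, {' '.join(tokens[:-1])}"


def direct_name(inverted):
    """Turn 'Surname, Forename' back into 'Forename Surname'."""
    if "," not in inverted:
        return inverted.strip()
    surname, _, forenames = inverted.partition(",")
    return f"{forenames.strip()} {surname.strip()}"
```

Because the surname boundary is a guess in one direction, converting names automatically is one of the places where a crosswalk can distort data.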

Page 10: Metadata crosswalks


Issues in Crosswalking Content Metadata Standards

Barriers to creating crosswalks

1. Lack of common terminology between metadata schemes

2. Metadata standards are not organized in the same way

Margaret St. Pierre and William LaPlant

http://www.niso.org/publications/white_papers/crosswalk/ (1998)

Page 11: Metadata crosswalks


St. Pierre and LaPlant (cont.)

Barriers to mapping
One-to-many mapping: source field contains multiple keywords while target field is repeatable with one keyword per field
Many-to-one mapping: results in loss of information
Source element does not map to any element in target
Mandatory element in target without any element in source
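
The one-to-many case can be sketched as splitting one source value into repeated target fields. This sketch assumes the source separates keywords with semicolons and that the target is the repeatable MARC 653 $a field; the delimiter and both function names are assumptions for illustration only.

```python
def split_keywords(source_value, delimiter=";"):
    """Split a multi-keyword source field into clean individual keywords."""
    return [kw.strip() for kw in source_value.split(delimiter) if kw.strip()]


def to_repeated_653(source_value):
    """Emit one (tag, subfield, value) triple per keyword."""
    return [("653", "a", kw) for kw in split_keywords(source_value)]
```

The split only works if the delimiter never occurs inside a keyword, which is a content rule the source scheme may or may not guarantee.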

Page 12: Metadata crosswalks


Example

Dublin Core element “Creator” – an uncontrolled name

Creator did not map to MARC
MARC name fields defined as main or added entries (1xx, 7xx) - content defined by AACR2

To develop a crosswalk, a new 720 field was added to MARC

Page 13: Metadata crosswalks


Mapping DC Subject to MARC

DC Subject
– The topic addressed by the work
– Can be qualified by the scheme (e.g., LCSH)
MARC fields 600, 630, 650, 651, 653
– 600, 630, 650, 651 are controlled vocabulary with indicator for the scheme used
– 653 is uncontrolled vocabulary
If mapped to 653, identification of the controlled vocabulary is lost

Page 14: Metadata crosswalks


Mapping DC Subject to MARC (cont.)

Cannot map to other subject fields since DC doesn’t distinguish between them
Suggestion: create new MARC field for generic subject field (not done)
Unqualified: 653 ##$a (Index Term--Uncontrolled)
Qualified:
Scheme=LCSH: 650 #0$a (Subject added entry--Topical term)
Scheme=MeSH: 650 #2$a (Subject added entry--Topical term)
Scheme=LCC: 050 ##$a (Library of Congress Call Number/Classification number)
Scheme=DDC: 082 ##$a (Dewey Decimal Call Number/Classification number)
Scheme=UDC: 080 ##$a (Universal Decimal Classification Number)
Scheme=(other): 650 #7$a with $2=code from MARC Code List for Relators, Sources, Description Conventions
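
The subject rules on this slide could be expressed as a small dispatch function. This is a sketch only: the tuple layout `(tag, indicators, subfields)` and the names `SUBJECT_SCHEMES` and `map_subject` are hypothetical, and for the "other" case the caller is assumed to pass a valid source code from the MARC code list.

```python
# Qualified DC Subject scheme -> (MARC tag, indicators), as listed on the slide.
SUBJECT_SCHEMES = {
    "LCSH": ("650", "#0"),
    "MeSH": ("650", "#2"),
    "LCC": ("050", "##"),
    "DDC": ("082", "##"),
    "UDC": ("080", "##"),
}


def map_subject(value, scheme=None):
    """Return (tag, indicators, subfields) for one DC Subject value."""
    if scheme is None:
        # Unqualified: uncontrolled index term.
        return ("653", "##", [("a", value)])
    if scheme in SUBJECT_SCHEMES:
        tag, indicators = SUBJECT_SCHEMES[scheme]
        return (tag, indicators, [("a", value)])
    # Any other scheme: 650 #7 with the scheme code carried in $2.
    # Assumption: `scheme` is already a code from the MARC code list.
    return ("650", "#7", [("a", value), ("2", scheme)])
```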

Page 15: Metadata crosswalks


Mapping DC Title to MARC

DC Title does not distinguish between title (245 $a) and subtitle (245 $b) or any other kinds of titles
Unqualified:
– 245 00$a (Title Statement/Title proper)
– If repeated, all titles after the first: 246 33$a (Varying Form of Title/Title proper)
Qualified:
– Alternative: 246 33$a (Varying Form of Title/Title proper)
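
The "first title versus later titles" rule is positional, which a short sketch makes explicit. The function name `map_titles` and the `(tag, indicators, subfield, value)` tuple layout are hypothetical choices for illustration.

```python
def map_titles(titles, alternatives=()):
    """Map DC Title values (and qualified Alternative values) to MARC fields."""
    fields = []
    for position, title in enumerate(titles):
        if position == 0:
            # The first Title becomes the title statement.
            fields.append(("245", "00", "a", title))
        else:
            # Every repeated Title becomes a varying form of title.
            fields.append(("246", "33", "a", title))
    # Qualified "Alternative" titles also map to 246.
    fields.extend(("246", "33", "a", alt) for alt in alternatives)
    return fields
```

Note that a repeated unqualified Title and a qualified Alternative end up in the same MARC field, so the distinction between them does not survive the mapping.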

Page 16: Metadata crosswalks


Mapping DC Publisher to MARC

One-to-one relationship between DC Publisher and MARC 260 $b

EASY!

Page 17: Metadata crosswalks


Mapping DC Date to MARC

Publication date in DC element Date best maps to MARC21 260 $c
Other dates exist in MARC21:
– 008/07-10: date in standardized form
– 260 $c can also include copyright or printing dates
Unqualified: 260 ##$c (Date of publication, distribution, etc.)

Page 18: Metadata crosswalks


Mapping DC Date to MARC (cont.)

Qualified DC:
Available: 307 ##$a (Hours, Etc.)
Created: 260 ##$g (Date of manufacture)
Issued: 260 ##$c (Date of publication, distribution, etc.)
Modified: 583 ##$d with $a=modified
Valid: 518 ##$a (Date/Time and Place of an Event Note)
Text may be generated in $3 to include qualifier name.
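
As a rough sketch of the date rules, the qualifier can select the target field, with the Modified case adding its fixed $a value. The names `DATE_QUALIFIERS` and `map_date` are hypothetical, the optional $3 text is left out, and rejecting unknown qualifiers with an error is a design assumption rather than something the slides prescribe.

```python
# DC Date qualifier -> (MARC tag, subfield code); None means unqualified.
DATE_QUALIFIERS = {
    None: ("260", "c"),
    "Available": ("307", "a"),
    "Created": ("260", "g"),
    "Issued": ("260", "c"),
    "Modified": ("583", "d"),
    "Valid": ("518", "a"),
}


def map_date(value, qualifier=None):
    """Return (tag, indicators, subfields) for one DC Date value."""
    if qualifier not in DATE_QUALIFIERS:
        raise ValueError(f"no mapping for date qualifier {qualifier!r}")
    tag, code = DATE_QUALIFIERS[qualifier]
    subfields = [(code, value)]
    if qualifier == "Modified":
        # 583 carries the action name in $a alongside the date in $d.
        subfields.insert(0, ("a", "modified"))
    return (tag, "##", subfields)
```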

Page 19: Metadata crosswalks


Mapping DC Identifier to MARC

DC Identifier is any string or number used to uniquely identify an object

Could be ISBN, ISSN, LCCN, URL
– Each coded differently in MARC21

MARC 024 (other standard identifier) could be used if type of identifier not specified

Page 20: Metadata crosswalks


Mapping DC Identifier to MARC (cont.)

Unqualified: 024 8#$a (Other Standard Identifier/Standard number or code)
Qualified:
Scheme=URI: 856 40$u (Electronic Location and Access/Uniform Resource Locator)
Scheme=ISBN: 020 ##$a (International Standard Book Number)
Scheme=ISSN: 022 ##$a (International Standard Serial Number)
Scheme=(other): 024 8#$a (Other Standard Identifier/Standard number or code) with $2=scheme value
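
The identifier rules follow the same pattern: a known scheme selects a dedicated field, and everything else falls back to 024. A minimal sketch, with the hypothetical name `map_identifier` and a `(tag, indicators, subfields)` tuple layout chosen only for illustration:

```python
def map_identifier(value, scheme=None):
    """Return (tag, indicators, subfields) for one DC Identifier value."""
    if scheme == "URI":
        return ("856", "40", [("u", value)])
    if scheme == "ISBN":
        return ("020", "##", [("a", value)])
    if scheme == "ISSN":
        return ("022", "##", [("a", value)])
    if scheme is None:
        # Unqualified: type of identifier unknown.
        return ("024", "8#", [("a", value)])
    # Any other scheme: 024 with the scheme value recorded in $2.
    return ("024", "8#", [("a", value), ("2", scheme)])
```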

Page 21: Metadata crosswalks


Resolving Difficulties in Crosswalk Creation: A Summary

Create a new field in MARC
Use qualifiers (Qualified DC) to map to specific MARC fields
If using unqualified DC, then map to closest matching field (with loss of some information)
– Some information maps to a “wrong” field
– Map to an “other” or “uncontrolled” field

Page 22: Metadata crosswalks

Terry Reese

Gray Family Chair for Innovative Library Services

Oregon State University

Email: [email protected]

Introduction to MarcEdit, from first run to philosophy

Page 23: Metadata crosswalks

Getting Started

1. Sample Data Files
– Sample MARC records need to be downloaded.
– Get them from: http://oregonstate.edu/~reeset/marcedit/examples/session_data.zip (~5 MB)
– Unzip the data to the Desktop
• Right click, Extract all to Desktop.
– Worksheet File
• Includes the examples that I’ll be working from: http://oregonstate.edu/~reeset/marcedit/examples/marc_worksheet.docx
– When you start MarcEdit for the first time, it will ask you to update. Don’t. Tell it no – then we’ll turn off the automated update checker.
– We’ll use this information later.

Page 24: Metadata crosswalks

Key points

What is MarcEdit?
– Background
– System Requirements
Installation Notes
– First Run
Understanding the Application Settings
– Editor Settings
– Language settings
Accessing Application Data
MarcEdit Infrastructure
Getting Help
Questions

Page 25: Metadata crosswalks

What is MarcEdit?

Started development in 1999
– Originally coded in 3 programming languages: Assembler (libraries), Visual Basic (UI) and Delphi (COM).
– Initially designed as a replacement for LC’s DOS-based MARCBreakr/MARCMakr software

Page 26: Metadata crosswalks

What is MarcEdit?

Today:
– Written in C#
– Continues to be freely available
– Supports both UTF-8 and MARC-8 character sets
– MARC neutral
– XML aware

Page 27: Metadata crosswalks

Installing MarcEdit

Windows:
– Installing from the Windows Installer
• 32-bit version: http://people.oregonstate.edu/~reeset/marcedit/software/development/MarcEdit_Setup.msi
• 64-bit version: http://people.oregonstate.edu/~reeset/marcedit/software/development/MarcEdit_Setup64.msi
– Installing using a Zip file:
• http://oregonstate.edu/~reeset/marcedit/software/development/marcedit.zip

Page 28: Metadata crosswalks

Setting up MarcEdit

On first run, MarcEdit will ask you to confirm some settings. These are broken down into 5 areas:
– MarcEditor
– Language
– Export
– MARCEngine
– Other

Page 29: Metadata crosswalks

MarcEdit Export Properties

Defines MARC import

Can capture port output from record input (much in the same way OCLC’s Connexion can)

Page 30: Metadata crosswalks

MARC Conversions

Page 31: Metadata crosswalks

MarcEdit: crosswalking design

MarcEdit model:
– So long as a schema has been mapped to MARCXML, any metadata combination could be utilized. This means that no more than two transformations will ever take place. Example: MODS => MARCXML => EAD
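
The hub idea can be sketched independently of MarcEdit itself. In this illustration each schema registers one function into the hub format and one function out of it, and any conversion is the composition of at most two steps. Everything here is a hypothetical stand-in: the dictionaries, the record keys, and the one-field "crosswalks" are toy examples, not MarcEdit's stylesheets or API.

```python
from typing import Callable, Dict

Record = Dict[str, str]

# Toy stand-ins for real crosswalks into and out of the hub format.
TO_HUB: Dict[str, Callable[[Record], Record]] = {
    "MODS": lambda r: {"245a": r["titleInfo/title"]},
    "DC": lambda r: {"245a": r["title"]},
}
FROM_HUB: Dict[str, Callable[[Record], Record]] = {
    "DC": lambda r: {"title": r["245a"]},
    "EAD": lambda r: {"unittitle": r["245a"]},
}


def convert(record: Record, source: str, target: str) -> Record:
    """Convert via the hub: source -> hub -> target (two steps at most)."""
    hub_record = TO_HUB[source](record)
    return FROM_HUB[target](hub_record)


def crosswalks_needed(n_schemas: int):
    """One-way crosswalk counts for n schemas other than the hub.

    Returns (direct pairwise, through the hub).
    """
    return n_schemas * (n_schemas - 1), 2 * n_schemas
```

For five schemas, direct pairwise conversion would need 20 one-way crosswalks, while the hub arrangement needs 10. The trade-off is that anything the hub format cannot represent is lost in every conversion.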

Page 32: Metadata crosswalks

MarcEdit: crosswalking design

MarcEdit Crosswalk model
– Pro
• Crosswalks need not be directly related to each other
• Requires the crosswalker to have detailed knowledge of only one schema
– Con
• Each known crosswalk must be mapped to MARCXML.

Page 33: Metadata crosswalks

MarcEdit Crosswalking model

[Diagram: Dublin Core, MARC, MODS, FGDC, and EAD arranged around MARC21XML]

Page 34: Metadata crosswalks

MarcEdit: Crosswalks for everyone

Page 35: Metadata crosswalks

MarcEdit: Crosswalks for everyone

Example Crosswalks:
– MODS => MARC
– MODS => FGDC
– MODS => Dublin Core
– EAD => MODS
– EAD => HTML

Page 36: Metadata crosswalks

MarcEdit: Crosswalks for everyone

What’s MarcEdit doing?
– Facilitates the crosswalk by:
1. Performing character translations (MARC8-UTF8)
2. Facilitating interaction between binary and XML formats.
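
To give a sense of what the XML side of that interaction looks like, here is a sketch that serializes a few data fields as MARCXML using only the Python standard library. It is not how MarcEdit does it. The function name `to_marcxml` and the input tuple layout are hypothetical, and the sketch omits the leader and control fields that a complete record would carry.

```python
import xml.etree.ElementTree as ET

MARCXML_NS = "http://www.loc.gov/MARC21/slim"


def to_marcxml(fields):
    """Serialize (tag, ind1, ind2, [(code, value), ...]) tuples as MARCXML.

    Simplification: leader and control fields are omitted.
    """
    ET.register_namespace("marc", MARCXML_NS)
    record = ET.Element(f"{{{MARCXML_NS}}}record")
    for tag, ind1, ind2, subfields in fields:
        datafield = ET.SubElement(
            record,
            f"{{{MARCXML_NS}}}datafield",
            {"tag": tag, "ind1": ind1, "ind2": ind2},
        )
        for code, value in subfields:
            subfield = ET.SubElement(
                datafield, f"{{{MARCXML_NS}}}subfield", {"code": code}
            )
            subfield.text = value
    return ET.tostring(record, encoding="unicode")
```

Once a record is in this form, an XSLT stylesheet or any XML tool can act as the crosswalk, which is why the XML representation is convenient as a hub.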

Page 37: Metadata crosswalks

Examples

Project Gutenberg RDF => MARC
EAD => MARC

Page 38: Metadata crosswalks

MarcEdit Demo

http://people.oregonstate.edu/~reeset/marcedit/html/index.php
