Toward a post-MARC view of bibliographic metadata
Jean Godby, Senior Research Scientist
Triangle Research Libraries Network workshop -- Chapel Hill, North Carolina
March 15, 2012
Post-MARC bibliographic metadata 2
Outline for today
1. How did I get to this place?
2. The Library of Congress Bibliographic Framework for Digital Resources
3. The OCLC ‘Beyond MARC’ work agenda
4. Four guiding assumptions
5. Some questions
Translations in the Crosswalk service
[Figure: OCLC MARC at the center. Inputs and outputs: ONIX Books 2.1, ONIX Books 3.0, MODS, Dublin Core, DC-Qualified, MARC, OCLC MARC.]
Problems with mapping to and from MARC
Problem: In a MARC record, some critical information is represented redundantly.
Effect on the Crosswalk: Requires one-to-many mappings, which are semantically opaque and difficult to maintain.
Problem: Some MARC fields are ambiguous.
Effect on the Crosswalk: The distinctions are difficult to recover or may be lost.
Problem: Many MARC free-text fields have formatting requirements.
Effect on the Crosswalk: The formatting must be added in (and taken out).
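The last problem can be made concrete with ISBD-style punctuation, which MARC cataloging practice places inside subfield values (for example, a space-slash at the end of 245 $a when a statement of responsibility follows). Below is a minimal Python sketch, assuming a deliberately small set of marks rather than the full cataloging rules; strip_isbd and build_245 are hypothetical helpers, not part of any OCLC service.

```python
# A minimal sketch of ISBD-style punctuation handling in a MARC crosswalk.
# Assumption: only the marks below are treated as removable; the real
# cataloging rules cover many more cases.
from typing import List, Optional, Tuple

REMOVABLE_MARKS = "/:;=,"


def strip_isbd(value: str) -> str:
    """Take out one trailing ISBD mark when mapping a subfield out of MARC.

    Periods are left alone on purpose (see the note after this block).
    """
    text = value.rstrip()
    if text and text[-1] in REMOVABLE_MARKS:
        text = text[:-1].rstrip()
    return text


def build_245(title: str, subtitle: Optional[str] = None,
              responsibility: Optional[str] = None) -> List[Tuple[str, str]]:
    """Add punctuation back in when mapping a title into MARC field 245."""
    subfields = [["a", title]]
    if subtitle:
        subfields[-1][1] += " :"   # a subtitle ($b) follows a space-colon
        subfields.append(["b", subtitle])
    if responsibility:
        subfields[-1][1] += " /"   # a statement of responsibility ($c) follows a space-slash
        subfields.append(["c", responsibility])
    subfields[-1][1] += "."        # the field ends with a period
    return [(code, value) for code, value in subfields]
```

strip_isbd leaves periods alone deliberately: in 300 $a “320 p.” the period belongs to an abbreviation, so deciding whether a final period is punctuation already requires knowledge of the cataloging rules.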
And so forth… and so on
Problem: Many formatting requirements are explicitly stated only in cataloging rules, not in the data that is algorithmically processed.
Effect on the Crosswalk: Knowledge of the cataloging rules must be embedded in the translation software.
Problem: Some MARC fields are coded with hidden assumptions.
Effect on the Crosswalk: Knowledge of the hidden assumptions must be embedded in the translation software, which requires complex and brittle Boolean logic.
Problem: MARC has a “long tail.”
Effect on the Crosswalk: It is necessary to maintain a large number of mappings that are not used.
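The “hidden assumptions” problem is easy to see in the MARC 008 field, where the meaning of a character position depends on the type of record coded in Leader/06. The following simplified Python sketch assumes a partial summary of the MARC 21 code values (check the MARC 21 Format for Bibliographic Data before relying on it) and shows the Boolean branching a translator needs just to ask whether a resource is electronic; form_of_item and is_electronic are hypothetical names.

```python
# A simplified sketch of a hidden assumption in MARC 008: where "form of item"
# lives depends on the record type in Leader/06.
# Assumption: the code sets below are a partial summary for illustration only.
from typing import Optional

AT_23 = set("atcdijp")   # books, continuing resources, music, mixed materials
AT_29 = set("efgkor")    # maps and visual materials
ELECTRONIC = set("oqs")  # online, direct electronic, electronic


def form_of_item(leader: str, field_008: str) -> Optional[str]:
    """Return the 008 form-of-item code, or None if this sketch cannot tell."""
    record_type = leader[6]
    if record_type in AT_23:
        position = 23
    elif record_type in AT_29:
        position = 29
    else:
        return None  # e.g. computer files are left out of this sketch
    if len(field_008) <= position:
        return None
    return field_008[position]


def is_electronic(leader: str, field_008: str) -> bool:
    """A Boolean test whose answer silently depends on Leader/06."""
    return form_of_item(leader, field_008) in ELECTRONIC


book_leader = "00000nam a2200000 a 4500"
map_leader = "00000nem a2200000 a 4500"
f008 = " " * 23 + "o" + " " * 16  # 40 characters, 'o' (online) at position 23
```

With the same 008 string, is_electronic returns True for book_leader and False for map_leader, because the translator must look at a different position once Leader/06 changes.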
MARC’s complexity needs to be quarantined.
[Figure: The Crosswalk diagram with “RDA or other structured metadata vocabulary” at the center. Inputs and outputs: ONIX Books 2.1, ONIX Books 3.0, MODS, Dublin Core, DC-Qualified, MARC, OCLC MARC.]
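One way to read “quarantine” is as a hub-and-spoke crosswalk: each format maps only to and from the central vocabulary, so MARC-specific knowledge stays inside the MARC spoke. This is a minimal Python sketch, assuming a toy hub that holds only a title and a creator; the format keys, field names, and functions are hypothetical and do not describe the Crosswalk service's actual interface.

```python
# A toy hub-and-spoke crosswalk. The hub is a plain dict standing in for
# "RDA or other structured metadata vocabulary"; each format supplies one
# reader (format -> hub) and one writer (hub -> format). All names are hypothetical.
from typing import Callable, Dict

Record = Dict[str, str]


def dc_reader(rec: Record) -> Record:
    return {"title": rec["title"], "creator": rec["creator"]}


def dc_writer(hub: Record) -> Record:
    return {"title": hub["title"], "creator": hub["creator"]}


def marc_reader(rec: Record) -> Record:
    # MARC quirks (tags, trailing punctuation) stay inside this spoke.
    # Simplified: strips any trailing spaces, slashes, and periods.
    return {"title": rec["245a"].rstrip(" /."), "creator": rec["100a"]}


def marc_writer(hub: Record) -> Record:
    return {"245a": hub["title"] + ".", "100a": hub["creator"]}


READERS: Dict[str, Callable[[Record], Record]] = {"dc": dc_reader, "marc": marc_reader}
WRITERS: Dict[str, Callable[[Record], Record]] = {"dc": dc_writer, "marc": marc_writer}


def translate(record: Record, source: str, target: str) -> Record:
    """Every translation passes through the hub; no format maps directly to another."""
    return WRITERS[target](READERS[source](record))
```

With n formats, a hub needs 2n readers and writers, compared with n(n - 1) direct pairwise mappings; for the seven formats in the diagram that is 14 versus 42.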
In other words, with MARC in the center of our model…
Despite the hundreds of millions of mappings that have been performed on OCLC’s bibliographic data, it is still locked up in a legacy system.
The mapping problem is complex largely because of the need to support MARC.
It is still too difficult to define and implement mappings.
So what is the alternative?
“The new bibliographic framework we are aiming for will broaden participation in the network of resources, librarians will be able to do a much better job of linking their patrons to resources of all kinds (from the library and from many other sources), and costs can be better contained.”-- Library of Congress
“Bibliographic framework is... an environment rather than a ‘format’”
A Bibliographic Framework for the Digital Age (October 31, 2011)
OCLC’s ‘Beyond MARC’ research agenda theme
[Word cloud of agenda terms, including MARC, FRBR, RDA, RDF, UML, linked, authority, identifier, transformation, legacy, and hadoop.]
The OCLC “Beyond MARC” research agenda: who’s involved
• Eric Childress, Consulting Product Manager
• Jean Godby, Senior Research Scientist
• Thom Hickey, Chief Scientist
• Devon Smith, Consulting Software Engineer
• Karen Smith-Yoshimura, Program Officer
• Roy Tennant, Senior Program Officer
• Diane Vizine-Goetz, Senior Research Scientist
• Jeff Young, Software Architect
Assumption 1: There are many moving targets
The OCLC Research response: Some guiding principles
• Don’t add to the complexity.
• Use publicly defined standards wherever possible.
• Leverage the work of others.
• Focus on data preparation, cleanup, and modeling that will support a variety of formats.
Data preparation: principles
Make your stuff available on the web.
Make it available as structured data…
…in a non-proprietary format.
Use URLs to identify things.
Link your data to other people’s data.
Source: W3C
Data, not text
Identifiers, not strings
Statements, not records
Machine-readable schema
Machine-readable lists
Source: Karen Coyle
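Coyle’s “statements, not records” can be illustrated in a few lines of Python: a record is one bundle of fields, while statements are independent subject-predicate-object triples, so data from different sources merges by set union. The example.org URIs and the predicate names are hypothetical placeholders.

```python
# A sketch of "statements, not records" using plain tuples as triples.
# URIs and predicate names are hypothetical placeholders.
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]

BOOK = "http://example.org/book/0892962852"
PERSON = "http://example.org/person/hunter-evan"


def record_to_statements(subject: str, record: Dict[str, List[str]]) -> List[Triple]:
    """Break one record into one statement per field value."""
    return [(subject, field, value)
            for field, values in record.items()
            for value in values]


def merge(*sources: Iterable[Triple]) -> List[Triple]:
    """Combine statements from several sources, dropping duplicates."""
    combined = set()
    for source in sources:
        combined.update(source)
    return sorted(combined)


library = record_to_statements(BOOK, {
    "title": ["McBain's Ladies"],
    "creator": [PERSON],  # an identifier, not the string "Hunter, Evan"
})
publisher = [(BOOK, "publisher", "Mysterious Press"),
             (BOOK, "creator", PERSON)]  # overlaps with the library's data
merged = merge(library, publisher)
```

The shared creator statement appears once in merged, which is only possible because both sources used the same identifier rather than their own spelling of the name.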
Assumption 2: Most bibliographic metadata will not be created by libraries
Why ONIX is interesting
<Product>
  <RecordReference>0892962844</RecordReference>
  <ProductIdentifier>
    <ProductIDType>02</ProductIDType>
    <IDValue>0892962852</IDValue>
  </ProductIdentifier>
  <ProductForm>BB</ProductForm>
  <Title>
    <TitleType>01</TitleType>
    <TitleText>McBain’s Ladies</TitleText>
  </Title>
  <Contributor>
    <ContributorRole>A01</ContributorRole>
    <PersonNameInverted>Hunter, Evan</PersonNameInverted>
  </Contributor>
  <Subject>
    <SubjectSchemeIdentifier>02</SubjectSchemeIdentifier>
    <SubjectHeadingText>Policewomen--Fiction.</SubjectHeadingText>
  </Subject>
</Product>
Leader 00000 jm a22000005 4500
008 g eng
020 $a 0892962852
100 $a Hunter, Evan
245 $a McBain’s ladies
260 $b Mysterious Press $d 1988
300 $a 320 p.
650 #2 $a Policewomen -- Fiction
[Figure labels contrasting “A record” with “A statement,” with elements marked as identifier, string, text, or data.]
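Part of what makes ONIX interesting is that many of its values are codes from published lists rather than free text. This minimal Python sketch reads a shortened copy of the ONIX fragment above with the standard library’s xml.etree.ElementTree and resolves two of the codes; the two-entry CODE_LABELS table is a hand-copied excerpt and an assumption, so check it against the official ONIX code lists before reuse.

```python
# Read coded ONIX values with the standard library and resolve two codes.
# CODE_LABELS is a hand-copied excerpt (an assumption), not the full code lists.
import xml.etree.ElementTree as ET
from typing import Dict, Optional

ONIX_FRAGMENT = """
<Product>
  <RecordReference>0892962844</RecordReference>
  <ProductIdentifier>
    <ProductIDType>02</ProductIDType>
    <IDValue>0892962852</IDValue>
  </ProductIdentifier>
  <Contributor>
    <ContributorRole>A01</ContributorRole>
    <PersonNameInverted>Hunter, Evan</PersonNameInverted>
  </Contributor>
</Product>
"""

CODE_LABELS = {
    ("ProductIDType", "02"): "ISBN-10",
    ("ContributorRole", "A01"): "By (author)",
}


def label(list_name: str, code: Optional[str]) -> Optional[str]:
    """Look up a code in the excerpt; fall back to the raw code if unknown."""
    return CODE_LABELS.get((list_name, code), code)


def describe(xml_text: str) -> Dict[str, Optional[str]]:
    """Pull the coded and free-text values out of one <Product>."""
    product = ET.fromstring(xml_text.strip())
    return {
        "id_type": label("ProductIDType", product.findtext("ProductIdentifier/ProductIDType")),
        "id": product.findtext("ProductIdentifier/IDValue"),
        "role": label("ContributorRole", product.findtext("Contributor/ContributorRole")),
        "name": product.findtext("Contributor/PersonNameInverted"),
    }
```

Only the inverted personal name is free text here; the identifier type and the contributor role come from machine-readable lists, which is what makes them data rather than text.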
A hypothetical bibliographic description expressed as linked data
<Product>
  <RecordReference>http://uri/recordID/0892962844</RecordReference>
  <ProductIdentifier>http://uri/identifier/isbn/0892962852</ProductIdentifier>
  <ProductForm>http://uri/format/paperback</ProductForm>
  <Title>http://uri/title/primaryTitle/McBain’s Ladies</Title>
  <Contributor>
    <ContributorRole>A01</ContributorRole>
    <Person>http://uri/person/Hunter, Evan</Person>
  </Contributor>
  <Subject>http://uri/subject/LCSH/Policewomen--Fiction.</Subject>
</Product>
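Strings such as a personal name contain spaces and commas, so they have to be percent-encoded (or replaced with opaque identifiers) before they can appear in a URI. This minimal sketch uses Python’s urllib.parse.quote to mint URIs and format N-Triples lines; the http://example.org/ base and the predicate URI are hypothetical stand-ins for the slide’s placeholder http://uri/.

```python
# Mint URIs and format N-Triples lines for statements like those on the slide.
# BASE and the predicate URI are hypothetical stand-ins for http://uri/.
from urllib.parse import quote

BASE = "http://example.org/"


def mint(kind: str, label: str) -> str:
    """Build a URI for a thing, percent-encoding characters such as spaces and commas."""
    return f"{BASE}{kind}/{quote(label, safe='')}"


def ntriple(subject: str, predicate: str, obj: str) -> str:
    """Format one statement whose three parts are all URIs."""
    return f"<{subject}> <{predicate}> <{obj}> ."


book = mint("isbn", "0892962852")
author = mint("person", "Hunter, Evan")
line = ntriple(book, BASE + "property/contributor", author)
```

Minting URIs from name strings follows the slide’s illustration; in practice a stable identifier from an authority file is preferable, because two spellings of the same name would otherwise yield two different URIs.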
This list is inadequate for describing the range of material types held by libraries.
Some proposed “library” extensions to Schema.org.
The extensions are derived from MARC data for the WorldCat search interface.
The WorldCat search interface terms reduce a complex MARC concept space to a list.
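The reduction can be pictured as a lookup on two Leader positions, type of record (Leader/06) and bibliographic level (Leader/07). In this minimal Python sketch the display terms and the partial code table are assumptions for illustration, not the actual WorldCat rules, which also draw on other fields.

```python
# A hypothetical reduction of two Leader positions to a short list of terms.
from typing import Optional

# Partial table from Leader/06 (type of record) to a display term (assumed labels).
TYPE_TERMS = {
    "c": "Score", "d": "Score",   # notated music, printed or manuscript
    "e": "Map", "f": "Map",       # cartographic material, printed or manuscript
    "g": "Visual material",       # projected medium
    "i": "Sound recording",       # nonmusical sound recording
    "j": "Music recording",       # musical sound recording
    "m": "Computer file",
}

# Leader/07 values for continuing resources: serial component part,
# integrating resource, serial.
CONTINUING = {"b", "i", "s"}


def format_term(leader: str) -> Optional[str]:
    """Map a MARC leader to one display term, or None if the sketch has no rule."""
    record_type, level = leader[6], leader[7]
    if record_type in ("a", "t"):  # language material, printed or manuscript
        return "Serial" if level in CONTINUING else "Book"
    return TYPE_TERMS.get(record_type)
```

Anything the table leaves out, such as three-dimensional artifacts (Leader/06 = r), falls through to None, which gives a small view of how much of the MARC concept space a short list drops.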
Assumption 3: MARC will be around for a while.
Assumption 4: Mapping is still necessary.
A publishing model
[Figure: Labels include OCLC Abstract Model, model, map, Raw Data, Standard Vocabularies, and “RDA or other structured metadata vocabulary.” Inputs and outputs: ONIX Books 2.1, ONIX Books 3.0, MODS, Dublin Core, DC-Qualified, MARC, OCLC MARC.]
It is not enough to RDF-ify MARC.
The concepts must be extracted.
They eventually emerge.
Some (perhaps uncomfortable) questions
1. How much work will be involved in building out the abstract model? What is the value proposition?
2. How can we engage communities of practice to contribute to the parts of the abstract model that describe their resources?
3. How will mappings be implemented in the post-MARC information landscape?
4. How much information in the MARC record will get lost?
5. What will content standards look like in post-MARC descriptions?
6. How many of the FRBR and RDA concepts are algorithmically recoverable from legacy data?
7. What happens if linked data does not live up to its promise or is not adopted quickly enough?
But maps from many MARC concepts look like this.
Set-theoretic mappings can be implemented elegantly in RDF/OWL.
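Set-theoretic mappings rest on class inclusion: when every member of one class is also a member of another, RDFS states this with rdfs:subClassOf and a reasoner infers the broader types. The stdlib-only Python sketch below implements just that entailment (transitive subclass inference), not full RDF/OWL reasoning; the class names are hypothetical.

```python
# A stdlib-only sketch of rdfs:subClassOf entailment: if x has type C and
# C is a subclass of D, then x also has type D (applied transitively).
# Class names are hypothetical, not drawn from any published vocabulary.
from typing import Dict, Set

SUBCLASS_OF: Dict[str, Set[str]] = {
    "LargePrintBook": {"Book"},
    "EBook": {"Book", "ElectronicResource"},
    "Book": {"TextualResource"},
}


def superclasses(cls: str) -> Set[str]:
    """All classes that cls is a (transitive) subclass of; safe against cycles."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def types_of(asserted: Set[str]) -> Set[str]:
    """Asserted types plus every type implied by the subclass statements."""
    result = set(asserted)
    for cls in asserted:
        result |= superclasses(cls)
    return result
```

Because a mapping is stated once as a subclass relation, a resource typed only as an EBook is also found by a query for Book or ElectronicResource without any extra mapping code.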
References
Coyle, Karen. 2011. MARC 21 as data: a start. http://journal.code4lib.org/articles/5468
---. 2012. Taking library data from here to there. http://lists.w3.org/Archives/Public/public-esw-thes/2012Feb/0001.html
Godby, Carol Jean. 2010. From records to streams: merging library and publisher metadata. http://dcpapers.dublincore.org/ojs/pubs/article/view/1033
Library of Congress. 2011. A bibliographic framework for the digital age. http://www.loc.gov/marc/transition/news/framework-103111.html
Library Linked Data Incubator Group. 2011. Library Linked Data Incubator Group final report. http://www.w3.org/2005/Incubator/lld/XGR-lld-20111025/
OCLC. 2012. FAST Linked Data. http://experimental.worldcat.org/fast/
Schema.org. 2012. http://schema.org/
Smith-Yoshimura, Karen, et al. 2010. Implications of MARC tag usage on library metadata practices. http://www.oclc.org/research/publications/library/2010/2010-06.pdf
Thank you!