Riley, Jenn. “Tools and Techniques for Creating, Maintaining, and Distributing Shareable Metadata.” Yale University, June 19, 2008.
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Metadata
Jenn Riley
Metadata Librarian
Indiana University Digital Library Program
What does this record describe?
<dc:identifier>http://museum.university.edu/unique identifier</dc:identifier>
<dc:publisher>State University Museum of Ichthyology, Fish Field Notes</dc:publisher>
<dc:format>jpeg</dc:format>
<dc:rights>These pages may be freely searched and displayed. Permission must be received for subsequent distribution in print or electronically. Please go to http://museum,univeristy,edu/ for more information.</dc:rights>
<dc:type>image</dc:type>
<dc:description>1926; 0070; 06; Little S. Br. Pere Marquette R.; THL26-68; 71300; 71301; 71302; 71303; 71304; 71305; 71306; 71307; 71308; 71309; 07; 1926/07/06; R12W; S09; Second collector Moody; T16N</dc:description>
<dc:subject>Cottus bairdi; Esox lucius; Cottus cognatus; Etheostoma nigrum; Salmo trutta; Oncorhynchus mykiss; Catostomus commersoni; Pimephales notatus; Margariscus margarita; Rhinichthys atratulus; mottled sculpin; northern pike; slimy sculpin; johnny darter; brown trout; rainbow trout; white sucker; bluntnose minnow; pearl dace; blacknose dace; bairdi; lucius; cognatus; nigrum; trutta; mykiss; commersoni; notatus; margarita; atratulus; Cottus; Esox; Cottus; Etheostoma; Salmo; Oncorhynchus; Catostomus; Pimephales; Margariscus; Rhinichthys; 1926-07-06; ; Boleosoma; Salmo; Hyborhynchus; Semotilus; ; fario; gairdneri--irideus; atronasus--obtusus--meleagris</dc:subject>
<dc:language>UND</dc:language>
<dc:source>Michigan 1926 Langlois, v. 1 1926--1926; </dc:source>
Record harvested via OAI-PMH 2-27-2007
Collection
Registries
?????
GEM
Photograph from Indiana University
Charles W. Cushman Collection
Why we should care
Library/archive/museum data is useful
◦ Even when objects aren’t digitized
It’s our mission to distribute information
We should be leaders in the networked information environment
We have good ideas, but others do too
We should therefore make it easier for our data to be used by others
Shareable Metadata…
Is quality metadata
Promotes search interoperability - “the ability to perform a search over diverse sets of metadata records and obtain meaningful results” (Priscilla Caplan)
Is human understandable outside of its local context
Is useful outside of its local context
Preferably is machine processable
Shareable Metadata as a View
Metadata is not monolithic
Metadata should be a view projected from a single information object
Create multiple views appropriate for groups of important sharing venues
Depends on:
◦ Use
◦ Audience
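One way to picture "metadata as a view" is a single master record from which each sharing venue gets its own projection. The sketch below is illustrative only: the master record, its field names, and the two view definitions are hypothetical and are not taken from any particular system.

```python
# Hypothetical master record; field names and values are illustrative only.
MASTER = {
    "title": "Example field notebook page",
    "date": "1926-07-06",
    "subjects": ["Esox lucius", "northern pike"],
    "internal_status": "QA pending",  # information that is only used locally
}

# Each view maps master field names to element names in the target format.
VIEWS = {
    "oai_dc": {"title": "dc:title", "date": "dc:date", "subjects": "dc:subject"},
    "local_display": {"title": "Title", "date": "Date", "internal_status": "Status"},
}

def project(master, view_name):
    """Return the view of the master record defined by VIEWS[view_name]."""
    mapping = VIEWS[view_name]
    # Fields that are missing or empty in the master are left out of the view.
    return {target: master[source] for source, target in mapping.items() if master.get(source)}

shared = project(MASTER, "oai_dc")
print(shared)
```

Keeping the view definitions as data rather than code means a new sharing venue can be supported by adding a mapping, without touching the master record.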
The 6 Cs & Lots of Ss of Shareable Metadata
Content
Coherence
Context
Communication
Consistency
Conformance to Standards
Content
How element values are structured affects whether the record is shareable
For your institution, the resource, and the defined audience, choose the appropriate:
◦ Vocabularies
◦ Content standards
◦ Granularity of description
◦ Version of the resource to describe
◦ Elements to use
Don’t include empty elements in shared records
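The last point can be enforced mechanically before records are shared. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the record layout and the function name `drop_empty_elements` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def drop_empty_elements(record):
    """Remove direct children that carry no text, no attributes, and no children."""
    for child in list(record):
        has_text = bool((child.text or "").strip())
        if not has_text and len(child) == 0 and not child.attrib:
            record.remove(child)
    return record

# Illustrative record with one empty and one whitespace-only element.
record = ET.fromstring(
    "<record><title>Field notes</title><creator/><date>  </date><type>image</type></record>"
)
drop_empty_elements(record)
remaining = [child.tag for child in record]
print(remaining)
```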
Coherence
A shareable metadata record should make sense on its own, outside of the local institutional context and without access to the resource itself
Place values in appropriate elements
Repeat elements instead of “packing” multiple values into one field
Avoid local jargon, abbreviations and codes
Ensure mappings from local to shared metadata formats result in coherent records
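As an illustration of repeating elements rather than packing values, the sketch below splits a semicolon-packed `dc:subject` into one element per value, dropping blanks and duplicates. It assumes the separator is always a semicolon and that duplicates are unwanted; the wrapper element `record` and the function name `unpack_values` are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

def unpack_values(record, tag, sep=";"):
    """Replace each packed <dc:tag> with one element per distinct, non-blank value."""
    qname = f"{{{DC_NS}}}{tag}"
    seen = set()
    for packed in list(record.findall(qname)):
        position = list(record).index(packed)
        record.remove(packed)
        for value in (packed.text or "").split(sep):
            value = value.strip()
            if not value or value in seen:
                continue  # skip empty fragments and repeated values
            seen.add(value)
            element = ET.Element(qname)
            element.text = value
            record.insert(position, element)
            position += 1
    return record

record = ET.fromstring(
    f'<record xmlns:dc="{DC_NS}">'
    "<dc:subject>Cottus bairdi; Esox lucius; ; mottled sculpin; Esox lucius</dc:subject>"
    "</record>"
)
unpack_values(record, "subject")
subjects = [el.text for el in record.findall(f"{{{DC_NS}}}subject")]
print(subjects)
```

Splitting fixes only the packing. Deciding whether a date or a bare species epithet belongs in a subject element at all remains a mapping decision.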
Context
Appropriate context allows a user to understand a resource based on the metadata record alone
Shareable metadata records should:
◦ Include information not used locally
◦ Exclude information only used locally
Collection level records can help, but don’t rely on them
Communication
Information supplementing your metadata records can be useful to an aggregator
◦ Intended audiences
◦ Record creation methods
◦ Controlled vocabularies used
◦ Content standards used
◦ Accrual practices
◦ Existence of analytical or supplementary materials
◦ Provenance of materials
Can be within or external to a sharing protocol
Consistency
Consistency allows aggregators to apply the same indexing or enhancement logic to an entire group of records
Can be affected by change in policy or personnel over time
Pay special attention to consistency of:
◦ How metadata elements are used
◦ How (and which) vocabularies are used for a particular element
◦ Syntax encoding schemes
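A consistency check on a syntax encoding scheme can be quite small. The sketch below flags date values that do not follow the date-only forms of W3CDTF (YYYY, YYYY-MM, YYYY-MM-DD). Choosing W3CDTF is an assumption made for illustration, and the pattern is a simplification: it ignores the time-of-day forms and does not check that months and days are in range.

```python
import re

# Date-only W3CDTF forms: YYYY, YYYY-MM, YYYY-MM-DD (simplified; no range checks).
W3CDTF_DATE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")

def inconsistent_dates(values):
    """Return the values that do not match the chosen date encoding."""
    return [value for value in values if not W3CDTF_DATE.match(value)]

flagged = inconsistent_dates(["1926-07-06", "1926/07/06", "2-27-2007", "1926"])
print(flagged)
```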
Conformance to Standards
Technical conformance to all types of standards is essential. Without it, processing tools and routines simply break.
◦ Sharing protocols (e.g. OAI-PMH)
◦ Metadata structure standards
◦ Controlled vocabularies and syntax encoding schemes
◦ Content standards
◦ Technical standards (e.g. XML, character encoding)
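The most basic technical check is XML well-formedness, since a single unescaped ampersand is enough to make a harvester's parser reject a record. A minimal sketch with Python's standard library follows; it tests well-formedness only and does not validate against a schema or DTD, which would need an additional tool. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<record><title>Fish &amp; field notes</title></record>"))
print(is_well_formed("<record><title>Fish & field notes</title></record>"))
```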
Generic high-level workflow
Plan
◦ Choose standards for native metadata
◦ Write metadata creation guidelines
◦ Who to share with?
◦ Choose shared metadata formats
Create
◦ Create metadata (thinking about shareability)
Transform
◦ Perform conceptual mapping
◦ Perform technical mapping
◦ Validate transformed metadata
◦ Test shared metadata with protocol conformance tools
Share
◦ Implement sharing protocol
Assess
◦ Communicate with aggregators
◦ See who is collecting your metadata
◦ Review your metadata in aggregations
No single “right” workflow exists for all situations
Our tools sometimes dictate parts of our workflow
◦ Be careful not to let them do this too much - tools serve us, not vice-versa
Start workflow design from well-defined goals (not processes)
Fundamental principles to follow
◦ Put the right information in front of the right person at the right time
◦ Ensure shareability is a common theme underlying it all
◦ Generate multiple views from a single master
Choose the best tools for the job
Important every step of the way
◦ Programming languages
◦ Commercial or open-source software packages
◦ Repository solutions
◦ Metadata creation interfaces
Promotes both efficiency and quality
Define needed functionality, and negotiate (compromise) from there
Thinking big picture
Must find a reasonable balance between the perfect solution for a single set of materials and fully streamlined processes that treat everything the same way
One approach - define categories of material and design reusable workflows for each
Defining categories of material
By resource type
◦ Text
◦ Documentary images
◦ Art images
◦ Musical audio recordings
◦ etc. (including getting more specific)
By managing institution?
◦ May provide barriers for our users - see Elings/Waibel: “Metadata for All” article in First Monday, 2007
◦ But institutional mission is a factor in determining the appropriate views of a resource to share
Reusable parts of workflow
Decisions on metadata structure standards, content standards, controlled vocabularies, etc.
Metadata creation tools
Automated processing techniques
XSLT stylesheets and other data management code
SIP/AIP/DIP architecture
Delivery systems
Generalization is worth the effort
You will have to go back and do it again at some point
◦ Fixing typos, errors, etc.
◦ Adding new content over time
◦ Adding new metadata format or sharing mechanism
◦ Migration to another system
Need both workflow tools and documentation to be accessible
Generalization will allow you to minimize the effort of redoing something and focus more on the new stuff
Make the most of automation
Automate the repetitive tasks as much as feasible, but only where it makes sense
For example:
◦ Create as much technical metadata as possible from the file itself
◦ Derive basic structural metadata from file naming conventions
◦ Develop automated processes that are triggered when an XML file is placed in a “drop box” or submitted via a specialized tool
◦ Develop easy-to-use tools to apply the same metadata to a defined group of records
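Deriving structural metadata from file names, one of the examples above, might look like the sketch below. The naming convention `<collection>_<item>_<page>` is invented for illustration, and a real project would substitute its own pattern. Files that break the convention raise an error rather than being guessed at.

```python
import re
from pathlib import PurePath

# Hypothetical convention: <collection>_<4-digit item>_<3-digit page>.<extension>
FILENAME_PATTERN = re.compile(r"^(?P<collection>[a-z]+)_(?P<item>\d{4})_(?P<page>\d{3})$")

def structural_metadata(path):
    """Parse collection, item number, and page number out of a file name."""
    stem = PurePath(path).stem
    match = FILENAME_PATTERN.match(stem)
    if match is None:
        raise ValueError(f"filename does not follow the convention: {path}")
    return {
        "collection": match["collection"],
        "item": int(match["item"]),
        "page": int(match["page"]),
    }

parsed = structural_metadata("scans/fieldnotes_0070_006.tif")
print(parsed)
```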
Basic workflow at IU (1)
Metadata standards chosen
Metadata creation guidelines written and tools developed/adapted
Fedora content model developed or existing appropriate one identified
Metadata/markup created (and perhaps digitization performed)
◦ Sometimes in phases by different people
Basic workflow at IU (2)
Metadata transformed via XSLT (one per category of material, with some tweaking for each collection) into all desired formats, and loaded into Fedora
Metadata for sharing loaded into OAI-PMH data provider
Appropriate staff alerted for parallel metadata creation for OPAC (generally collection level)
Note several opportunities for greater efficiency
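Once records are in an OAI-PMH data provider, an aggregator retrieves them with plain HTTP requests. The sketch below builds a `ListRecords` request for simple Dublin Core (`oai_dc`), which is useful for spot-checking what a harvester will actually see. The base URL and set name are placeholders, not a real IU endpoint.

```python
from urllib.parse import urlencode

def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for the given data provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional: restrict the harvest to one set
    return f"{base_url}?{urlencode(params)}"

# Placeholder base URL and set name, for illustration only.
url = list_records_url("http://repository.example.edu/oai", set_spec="fieldnotes")
print(url)
```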
One step at a time
Implementing shareable metadata practices likely will be done incrementally
We’re still learning how to best achieve effective shareability
Best practices grow and change over time
Must be positioned to respond quickly to new metadata standards and technologies as they evolve
Shareable metadata isn’t just about OAI-PMH
Some other options:
◦ Lightweight APIs (e.g., OpenLibrary)
◦ Google SiteMaps
◦ OpenURL
◦ SRU
◦ OAI-ORE
◦ Linked data
Jim Michalko, RLG: library data sharing mechanisms are “high value and low participation”
Notice Z39.50 isn’t on this list.
Promoting new uses
The academic institution-built metadata (and/or content) aggregation seems to have plateaued
◦ See Ricky Erway RLG report “Seeking Sustainability”
We must provide a variety of options for accessing our data, to support a variety of uses
We shouldn’t necessarily stop collaboration and aggregation, but we should allow others to do this too, with our metadata (and maybe even our content)
Terminologies services
Sharing our authority data is potentially even more useful than sharing our descriptive data
RLG/OCLC doing some work in this area
◦ Moving terminologies to the “network level”
Some possible uses
◦ Give me more information on this concept/person/etc.
◦ What are this term’s broader, narrower, related terms?
◦ What are all the synonyms for this term?
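The questions above map onto a very small lookup interface. The sketch below stands in for a terminology service with an in-memory dictionary; the data structure and function name are hypothetical, and a real service would answer the same questions over the network from a full authority file.

```python
# Hypothetical in-memory vocabulary; a real service would hold full authority records.
THESAURUS = {
    "Esox lucius": {"broader": ["Esox"], "narrower": [], "synonyms": ["northern pike"]},
    "Esox": {"broader": [], "narrower": ["Esox lucius"], "synonyms": []},
}

def related(term, relation):
    """Return the terms linked to `term` by `relation` (broader, narrower, synonyms)."""
    return THESAURUS.get(term, {}).get(relation, [])

print(related("Esox lucius", "broader"))
print(related("Esox lucius", "synonyms"))
print(related("unknown term", "narrower"))
```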
Tools supporting the creation of shareable metadata
Our existing metadata creation tools are embarrassingly bad
Current technologies provide many opportunities for improvement
Good tools make it easy to do the right thing and hard to do the wrong thing
Can operate when metadata is first created or in a later review step
Here are some ideas…
Directly in XML
Generally only a good idea for markup languages, rather than metadata structure standards
◦ And often not even then
Some supplemental tools can help
◦ Validation to Schema/DTD (of course)
◦ “Preview” function
◦ “Report card” function, e.g., with Schematron
Modularize
All metadata for a resource doesn’t have to be created at once
◦ Transcription vs. authority work vs. subject analysis
◦ Descriptive vs. technical vs. structural
◦ Us vs. users!
Provide optimized views for each metadata creation function
◦ Perhaps even different systems
◦ But always provide metadata creators with a way to see how the metadata will be used
Abandon the record-centric approach
Patterns (and outliers) emerge from data in the aggregate
Reporting capabilities
◦ Sortable, deduplicated lists of values from a given field or set of fields
◦ How many of this field per record
◦ How many distinct values used in this field
◦ Data overlap between fields
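The reports listed above are straightforward to compute once records are treated as a set. The sketch below assumes each record is a dictionary mapping a field name to a list of values; the function names and the sample records are illustrative.

```python
from collections import Counter

def field_report(records, field):
    """Summarize how one field is used across a whole set of records."""
    per_record = [len(record.get(field, [])) for record in records]
    values = Counter(value for record in records for value in record.get(field, []))
    return {
        "records": len(records),
        "min_per_record": min(per_record, default=0),
        "max_per_record": max(per_record, default=0),
        "distinct_values": len(values),
        "values": sorted(values.items()),  # sortable, deduplicated list with counts
    }

def overlap(records, field_a, field_b):
    """Return the values that appear in both fields anywhere in the set."""
    in_a = {value for record in records for value in record.get(field_a, [])}
    in_b = {value for record in records for value in record.get(field_b, [])}
    return sorted(in_a & in_b)

records = [
    {"subject": ["Esox lucius", "northern pike"], "description": ["1926-07-06"]},
    {"subject": ["Esox lucius", "1926-07-06"]},
    {"subject": []},
]
report = field_report(records, "subject")
print(report)
print(overlap(records, "subject", "description"))
```

An overlap report like the last call is what would surface a date sitting in a subject field, a pattern that is easy to miss when reviewing records one at a time.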
Useful features
Data type validation (while entering data in that field!)
Auto-complete
Record-level validation
Spell check
Integration of metadata creation guidelines into software tools
Integration of controlled vocabularies
Should be seamless
Provide access to entire authority record rather than just the heading
For short vocabularies, provide a combo box
For longer vocabularies
◦ Auto-complete
◦ Ajax-y interactions with hierarchical and alphabetical views
Similar features could be used to perform maintenance of vocabularies
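The core of auto-complete against a controlled vocabulary is a case-insensitive prefix match. The sketch below shows that logic on its own, with a tiny illustrative term list; a production tool would query the vocabulary service and return the full authority record, not only the heading.

```python
def autocomplete(vocabulary, prefix, limit=5):
    """Return up to `limit` vocabulary terms that start with `prefix`, ignoring case."""
    prefix = prefix.casefold()
    ordered = sorted(vocabulary, key=str.casefold)
    return [term for term in ordered if term.casefold().startswith(prefix)][:limit]

# Illustrative vocabulary; a real one would come from an authority file.
vocabulary = ["Salmo trutta", "Cottus cognatus", "Esox lucius", "Cottus bairdi"]
suggestions = autocomplete(vocabulary, "cot")
print(suggestions)
```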
Working around system limitations
Many digital asset management systems don’t support a second shareable copy of records
Do your best to split the difference with system records
Use creative interface design for your local system
Use extra-protocol documentation for communicating with aggregators
Lobby your vendor!
Good practice requires collaboration
One person can’t do it all
Implementing shareable metadata requires a primary advocate to ensure shareability is a consideration at all steps of the workflow
Many people will need to be involved
Role of metadata specialists
Often are the shareable metadata advocate
Choose standards and sharing protocols
Write metadata creation guidelines
Be prepared to compromise!
Role of technical staff
Evaluate feasibility of technical plans
Help with prioritization of options
Locate and evaluate existing code to minimize duplication of effort
Abstract specific processes for general use
Other collaborators
Collection managers
User specialists
Project managers
Catalogers/metadata creators
Reference staff
Granting agencies
Final thoughts about sharing
Shareable metadata represents a fundamental shift in thinking
◦ Your metadata is no longer a destination, it is information that will serve as building blocks for other services
◦ Your metadata must operate effectively in an increasingly decontextualized environment
Creating shareable metadata
◦ Will require more work on your part
◦ Will require our software to support (more) standards
◦ Is no longer an option, it’s a requirement
Yes, this is hard…
…and we’re just starting to learn how to do it effectively and efficiently
There’s plenty of room for leadership in this area.