Metadata considerations for the
curation and preservation of research
data
Matt Carruthers
Metadata Projects Librarian
University of Michigan
Traditional view: Focus on end products of research (articles, books, reports, etc.)
Reproducible research is hard to come by
(http://www.nytimes.com/2015/08/28/science/many-social-science-findings-not-as-strong-as-claimed-study-says.html?_r=0)
Grant-funding institutions are beginning to
mandate that recipients have plans to manage
and preserve their data sets.
• Alfred P. Sloan Foundation
• National Science Foundation
• Department of Energy
• Gulf of Mexico Research Initiative
• Gordon and Betty Moore Foundation
• Institute of Museum and Library Services
• Department of Education
• Joint Fire Science Program
• National Endowment for the Humanities
• National Institutes of Health
• National Oceanic and Atmospheric Administration
• U.S. Geological Survey
• U.S. Department of Agriculture
Deep Blue Data
University of Michigan Research Data
Repository
(UMRDR)
What is metadata?
“Data about data”
1. Metadata describes the content, quality,
condition, and other characteristics of data.
2. Metadata is standardized, structured information
about an object that facilitates functions
associated with that object.
(Discovery, management, rights and access
control, reuse.)
Describing collections of digital objects
• Digital collections
• Online finding aids
• Digital exhibits
Encoded Archival Description (EAD)
Text Encoding Initiative (TEI)
Dublin Core
Metadata Object Description Schema (MODS)
Metadata Encoding and Transmission Standard (METS)
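To make one of the standards above concrete, here is a minimal sketch of a simple Dublin Core record built with Python's standard library. The namespace URI is the Dublin Core element set namespace. The wrapper element name `metadata`, the helper `build_dc_record`, and the `subject` value are assumptions chosen for illustration, not something a repository prescribes.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Return an element holding one dc:* child per (name, value) pair.

    The wrapper element name "metadata" is an assumption for this sketch.
    """
    root = ET.Element("metadata")
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Title and creator come from this presentation; the subject is illustrative.
record = build_dc_record([
    ("title", "Metadata considerations for the curation and preservation of research data"),
    ("creator", "Matt Carruthers"),
    ("subject", "Research data"),
])

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Richer schemas such as MODS or METS follow the same idea with more structure: named elements from a published vocabulary, so that other systems can interpret the description.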
New things to consider:
• How do you represent context in a digital
environment?
• How do you facilitate long-term preservation of
digital files with vastly different characteristics?
• How do you track changes to digital objects over
time?
Levels of Metadata and Documentation:
Study-level: provides an overview of the research
context and design, data collection methods, data
preparation and results or findings, etc.
Data-level: provides labelling and documentation of
individual data items, such as names and descriptions
of variables, and explanations of codes and
classification schemes used. It can be embedded
within a data collection or recorded in an
accompanying document.
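The two levels can be sketched as a single documentation file that travels with the data. Everything in this example is hypothetical: the study, the variable names, the codes, and the helper `describe_code` are placeholders that show the shape of the information, not a required format.

```python
import json

# Hypothetical documentation for an imaginary survey data set.
documentation = {
    # Study-level: overview of context, design and collection methods.
    "study_level": {
        "title": "Example commuting survey (hypothetical)",
        "research_context": "Pilot study of commuting habits",
        "collection_method": "Online questionnaire",
        "data_preparation": "Incomplete responses removed",
    },
    # Data-level: names and descriptions of variables, plus code explanations.
    "data_level": {
        "variables": [
            {
                "name": "age_group",
                "description": "Age of respondent, grouped",
                "codes": {"1": "18-29", "2": "30-49", "3": "50 and over"},
            },
            {
                "name": "commute_min",
                "description": "One-way commute time in minutes",
                "codes": {"-9": "Not answered"},
            },
        ]
    },
}


def describe_code(doc, variable, code):
    """Look up the label for a coded value in the data-level documentation."""
    for var in doc["data_level"]["variables"]:
        if var["name"] == variable:
            return var["codes"][str(code)]
    raise KeyError(variable)


# Serialize so the documentation can be stored next to the data files.
as_json = json.dumps(documentation, indent=2)
print(describe_code(documentation, "age_group", 2))
```

Keeping the code explanations machine-readable means a future user can decode values without guessing what a 2 or a -9 stands for.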
What difference does metadata make?
• Scholarly Communication
o Fight the “Digital Data Deluge”
• Increase the “long tail” of research data
• Potential for increase in data citations
• Meet funding agency requirements
Training Resources and Guides:
• Digital Curation Centre disciplinary metadata standards
o http://www.dcc.ac.uk/resources/metadata-standards
• Managing Research Data 101 – Documentation and Metadata (MIT):
o http://libraries.mit.edu/data-management/store/documentation/
• Data Management Course Module for Graduate Students (University of Minnesota Libraries):
o https://www.lib.umn.edu/datamanagement/workshops (particularly Module 3)
• MANTRA (Research Data Management Training): Documentation, Metadata, Citation (University of Edinburgh):
o http://datalib.edina.ac.uk/mantra/documentation_metadata_citation/
• “Practical guidance for anyone working with research data”: Chapters 4 and 5 (UK Data Service):
o http://ukdataservice.ac.uk/manage-data/handbook.aspx
• School of Data:
o http://schoolofdata.org/courses/
• Metadata Guide Working Level (Australian National Data Service):
o http://ands.org.au/guides/metadata-working.html
• Create & Manage Data: Documenting Your Data (UK Data Service):
o http://www.data-archive.ac.uk/create-manage/document
• Guide to writing “ReadMe” style metadata (Cornell University):
o http://data.research.cornell.edu/content/readme
• “Understanding Metadata” (National Information Standards Organization):
o http://www.niso.org/publications/press/UnderstandingMetadata.pdf
• Best Practices in Creating Metadata (ICPSR):
o http://www.icpsr.umich.edu/icpsrweb/content/deposit/guide/chapter3docs.html
Tools for Metadata Creation:
For lists of discipline-specific metadata tools, visit: http://www.dcc.ac.uk/resources/metadata-standards/tools
• Automated extraction of technical metadata from files:
o File Information Tool Set (FITS): http://projects.iq.harvard.edu/fits
• The File Information Tool Set (FITS) identifies, validates and extracts technical metadata for a wide range of file formats.
o JHOVE: http://sourceforge.net/projects/jhove/
• JHOVE provides functions to perform format-specific identification, validation, and characterization of digital objects.
o JHOVE2: https://bitbucket.org/jhove2/main/wiki/Home
• JHOVE2 is open source software for format-aware characterization of digital objects.
o Exiftool: http://www.sno.phy.queensu.ca/~phil/exiftool/
• ExifTool is a platform-independent Perl library plus a command-line application for reading, writing and editing meta information in a wide variety of files. ExifTool is also available as a stand-alone Windows executable and a Macintosh OS X package.
o Apache Tika: http://tika.apache.org/
• The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.
• Adding metadata to Microsoft documents:
o Microsoft Document Properties: http://www.lib.cam.ac.uk/dataman/resources/Cambridge_documentproperties_factsheet.pdf
• The Document Properties feature in Microsoft Office applications such as Word, PowerPoint, Access or Excel allows you to attach information about your document to the file.
o Colectica for Excel: http://www.colectica.com/software/colecticaforexcel
• Colectica for Microsoft Excel is a free tool to document your spreadsheet data using DDI (Data Documentation Initiative), the open standard for data documentation.
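To show the kind of technical metadata the automated extraction tools above report, here is a rough sketch using only Python's standard library. It is not FITS, JHOVE, ExifTool or Tika and does not call them. The function name `technical_metadata` and the field names are assumptions for illustration. Unlike those tools, this sketch guesses the format from the file extension alone and does no format validation.

```python
import hashlib
import mimetypes
import os
from datetime import datetime, timezone


def technical_metadata(path):
    """Collect basic technical metadata for one file (illustrative only)."""
    stat = os.stat(path)

    # Fixity information: a SHA-256 checksum computed in chunks,
    # so large files do not have to fit in memory.
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            sha256.update(chunk)

    # Guess based on the file name only; may be None for unknown extensions.
    mime_type, _encoding = mimetypes.guess_type(path)

    return {
        "file_name": os.path.basename(path),
        "size_bytes": stat.st_size,
        "modified_utc": datetime.fromtimestamp(
            stat.st_mtime, tz=timezone.utc
        ).isoformat(),
        "mime_type_guess": mime_type,
        "sha256": sha256.hexdigest(),
    }
```

Calling `technical_metadata("survey.csv")` on a hypothetical file returns a dictionary that can be stored alongside the descriptive metadata. The checksum in particular supports long-term preservation, since it lets a repository detect later changes to the file.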
mcarruth@umich.edu