Metadata considerations for the curation and preservation of research data

Matt Carruthers

Metadata Projects Librarian

University of Michigan

Traditional view: Focus on end products of research (articles, books, reports, etc.)

Reproducible research is hard to come by

(http://www.nytimes.com/2015/08/28/science/many-social-science-findings-not-as-strong-as-claimed-study-says.html?_r=0)

Grant-funding institutions are beginning to mandate that recipients have plans to manage and preserve their data sets.

• Alfred P. Sloan Foundation
• National Science Foundation
• Department of Energy
• Gulf of Mexico Research Initiative
• Gordon and Betty Moore Foundation
• Institute of Museum and Library Services
• Department of Education
• Joint Fire Science Program
• National Endowment for the Humanities
• National Institutes of Health
• National Oceanic and Atmospheric Administration
• U.S. Geological Survey
• U.S. Department of Agriculture

Deep Blue Data

University of Michigan Research Data Repository (UMRDR)

What is metadata?

“Data about data”

1. Metadata describes the content, quality, condition, and other characteristics of data.

2. Metadata is standardized, structured information about an object that facilitates functions associated with that object.

(Discovery, management, rights and access control, reuse.)

Describing collections of digital objects

• Digital collections

• Online finding aids

• Digital Exhibits

Encoded Archival Description (EAD)

Text Encoding Initiative (TEI)

Dublin Core

Metadata Object Description Schema (MODS)

Metadata Encoding and Transmission Standard (METS)
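To make one of these standards concrete, here is a minimal sketch of a Dublin Core description serialized as XML with Python's standard library. The record values (title, creator, and so on) and the wrapping element name `metadata` are placeholders assumed for illustration; they do not describe a real data set or a required container format.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core Metadata Element Set
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dublin_core(fields):
    """Return a wrapper element holding one dc:* child per (name, value) pair."""
    root = ET.Element("metadata")  # wrapper name is an assumption for this sketch
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Hypothetical data set description; every value is a placeholder.
record = build_dublin_core([
    ("title", "Example survey data set"),
    ("creator", "Doe, Jane"),
    ("date", "2015"),
    ("type", "Dataset"),
    ("rights", "CC BY 4.0"),
])

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

The same handful of elements (title, creator, date, type, rights) already covers the discovery and rights functions that metadata is meant to support.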

New things to consider:

• How do you represent context in a digital environment?

• How do you facilitate long-term preservation of digital files with vastly different characteristics?

• How do you track changes to digital objects over time?

Levels of Metadata and Documentation:

Study-level: provides an overview of the research context and design, data collection methods, data preparation and results or findings, etc.

Data-level: provides labelling and documentation of individual data items, such as names and descriptions of variables, and explanations of codes and classification schemes used. It can be embedded within a data collection or recorded in an accompanying document.
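The two levels can be sketched as plain data structures. The example below is illustrative only: the study, the variable names, the codes, and the helper `undocumented_variables` are all hypothetical, and a real project would follow whatever schema its discipline or repository expects.

```python
import json

# Study-level metadata: context for the data set as a whole (hypothetical values).
study_level = {
    "title": "Example household survey",
    "principal_investigator": "Doe, Jane",
    "collection_method": "Online questionnaire",
    "collection_period": "2015-01 to 2015-03",
}

# Data-level metadata: one entry per variable, including explanations of codes.
data_level = {
    "age": {"label": "Age of respondent in years", "type": "integer"},
    "employed": {
        "label": "Employment status",
        "type": "coded",
        "codes": {"1": "Employed", "2": "Not employed", "9": "No answer"},
    },
}


def undocumented_variables(column_names, dictionary):
    """Return the columns that have no entry in the data-level dictionary."""
    return [name for name in column_names if name not in dictionary]


# Columns found in a (hypothetical) data file; "income" has no documentation yet.
columns = ["age", "employed", "income"]
missing = undocumented_variables(columns, data_level)
print("Undocumented variables:", missing)

# Record both levels in an accompanying document, here as JSON text.
documentation = json.dumps({"study": study_level, "variables": data_level}, indent=2)
```

A check like this is one simple way to notice variables that were added to a data file without a matching description.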

What difference does metadata make?

• Scholarly Communication
  o Fight the “Digital Data Deluge”

• Increase the “long tail” of research data

• Potential for increase in data citations

• Meet funding agency requirements

Training Resources and Guides:

• Digital Curation Centre disciplinary metadata standards
  o http://www.dcc.ac.uk/resources/metadata-standards

• Managing Research Data 101 – Documentation and Metadata (MIT):
  o http://libraries.mit.edu/data-management/store/documentation/

• Data Management Course Module for Graduate Students (University of Minnesota Libraries):
  o https://www.lib.umn.edu/datamanagement/workshops (particularly Module 3)

• MANTRA (Research Data Management Training): Documentation, Metadata, Citation (University of Edinburgh):
  o http://datalib.edina.ac.uk/mantra/documentation_metadata_citation/

• “Practical guidance for anyone working with research data”: Chapters 4 and 5 (UK Data Service):
  o http://ukdataservice.ac.uk/manage-data/handbook.aspx

• School of Data:
  o http://schoolofdata.org/courses/

• Metadata Guide Working Level (Australian National Data Service):
  o http://ands.org.au/guides/metadata-working.html

• Create & Manage Data: Documenting Your Data (UK Data Service):
  o http://www.data-archive.ac.uk/create-manage/document

• Guide to writing “ReadMe” style metadata (Cornell University):
  o http://data.research.cornell.edu/content/readme

• “Understanding Metadata” (National Information Standards Organization):
  o http://www.niso.org/publications/press/UnderstandingMetadata.pdf

• Best Practices in Creating Metadata (ICPSR):
  o http://www.icpsr.umich.edu/icpsrweb/content/deposit/guide/chapter3docs.html

Tools for Metadata Creation:

For lists of discipline-specific metadata tools, visit: http://www.dcc.ac.uk/resources/metadata-standards/tools

• Automated extraction of technical metadata from files:

  o File Information Tool Set (FITS): http://projects.iq.harvard.edu/fits
    • The File Information Tool Set (FITS) identifies, validates and extracts technical metadata for a wide range of file formats.

  o JHOVE: http://sourceforge.net/projects/jhove/
    • JHOVE provides functions to perform format-specific identification, validation, and characterization of digital objects.

  o JHOVE2: https://bitbucket.org/jhove2/main/wiki/Home
    • JHOVE2 is open source software for format-aware characterization of digital objects.

  o Exiftool: http://www.sno.phy.queensu.ca/~phil/exiftool/
    • ExifTool is a platform-independent Perl library plus a command-line application for reading, writing and editing meta information in a wide variety of files. ExifTool is also available as a stand-alone Windows executable and a Macintosh OS X package.

  o Apache Tika: http://tika.apache.org/
    • The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.

• Adding metadata to Microsoft documents:

  o Microsoft Document Properties: http://www.lib.cam.ac.uk/dataman/resources/Cambridge_documentproperties_factsheet.pdf
    • The Document Properties feature in Microsoft Office applications such as Word, PowerPoint, Access or Excel allows you to attach information about your document to the file.

  o Colectica for Excel: http://www.colectica.com/software/colecticaforexcel
    • Colectica for Microsoft Excel is a free tool to document your spreadsheet data using the open standard for data documentation.
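As a rough illustration of what "automated extraction of technical metadata" means, the sketch below gathers a few basic properties of a file using only Python's standard library. It is not FITS, JHOVE, ExifTool, or Tika and does none of their format validation or characterization; the function name `technical_metadata` and the sample file are assumptions made for this example.

```python
import hashlib
import mimetypes
import os
import tempfile
from datetime import datetime, timezone


def technical_metadata(path):
    """Collect basic technical metadata for one file (standard library only)."""
    stat = os.stat(path)
    with open(path, "rb") as handle:
        checksum = hashlib.sha256(handle.read()).hexdigest()
    # The type is guessed from the file extension only, not from the content.
    guessed_type, _ = mimetypes.guess_type(path)
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        "filename": os.path.basename(path),
        "size_bytes": stat.st_size,
        "modified_utc": modified.isoformat(),
        "mime_type_guess": guessed_type,
        "sha256": checksum,
    }


# Demonstration on a small temporary file with placeholder content.
with tempfile.TemporaryDirectory() as folder:
    sample_path = os.path.join(folder, "sample.txt")
    with open(sample_path, "wb") as handle:
        handle.write(b"id,value\n1,42\n")
    info = technical_metadata(sample_path)

print(info)
```

The dedicated tools listed above go much further, for example by identifying formats from file content and validating files against format specifications, which is why they are preferred for preservation workflows.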

mcarruth@umich.edu